From: Ludovic Courtès
To: zimoun
Cc: arunisaac@systemreboot.net, mail@ambrevar.xyz, 39258@debbugs.gnu.org
Subject: Re: [PATCH v4 0/3] Faster cache generation (similar as v3)
Date: Sun, 03 May 2020 18:43:41 +0200
Message-ID: <87r1w1ynnm.fsf@gnu.org>
In-Reply-To: <20200503150154.26532-1-zimon.toutoune@gmail.com>
Hello!

zimoun writes:

> The aim of this version v4 is to keep the same search performance as
> the previous version v3 but to drastically reduce the cache generation
> time. On my laptop, the overhead is now 4 seconds, compared to more
> than 20 seconds for v2 and v3.
>
> # default
> time guix build /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check
> # v4
> time guix build /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check
>
> |      | default  | v4        |
> |------+----------+-----------|
> | real | 0m6.012s | 0m10.244s |
> | user | 0m0.541s | 0m0.542s  |
> | sys  | 0m0.033s | 0m0.032s  |

Not bad!

> In version v3, the cache is built using 'cons' and 'fold-packages' (a
> wrapper around 'fold-module-public-variables'). Version v4 instead
> extends the function 'generate-package-cache', which uses 'vhash' and
> 'fold-module-public-variables*', so that it stores additional
> information.
>
> Therefore the cache '/lib/guix/package.cache' contains more
> information.

This breaks the binary interface, so we’ll have to analyze the impact of
such a change and devise a strategy.

> (The v4 structure of 'package.cache' is a quick draft, so details
> should be discussed, and an interesting move would be to have a
> structured (binary and all strings) S-exp, because it should become an
> entry point to export the packages list to JSON. WDYT?)

It’s on purpose that this cache is an object file: it just needs to be
mmap’d, and that’s it. It’s the cheapest possible way to do it. Parsing
sexps would be more costly, and since we’re talking about startup time,
this is sensitive.

> Now, we are comparing apples to apples, and the cost to compute BM25
> (v2) is not free at all.
> Remember that BM25 is the state of the art in information retrieval
> (relevance ranking), and it is delegated to Xapian (v2). I do not know
> if there is a performance bottleneck between Guix, Guile-Xapian, and
> Xapian itself, but the computation of BM25 is certainly not free. More
> about that soon.
>
> To be clear about BM25 and caching, what I have in mind is:
>  1. "guix search --build-index", optionally run by users who want, for
>     example, the BM25 ranking.

Something that must be done explicitly doesn’t seem great to me. As a
user, I’d rather not think about search indexes and all. But I don’t
know, maybe if it happened automatically on the first ‘guix search’
invocation that’d be fine.

>  2. Use BM25 metrics to detect poor package metadata (synopsis and
>     description); if it is worth it, why not add another checker to
>     "guix lint".

That’d be interesting!

>  1. The name 'fold-packages*' may be misleading since it does not
>     return "true" packages.

Did you see ‘fold-available-packages’? It seems you could extend it
instead of introducing ‘fold-packages*’, no?

>  2. The function 'package->recutils' in 'guix/ui.scm' is modified, but
>     not in the best way:
>
>      (match (package-supported-systems p)
>        (('cache supported-systems)
>         (string-join supported-systems))
>        (_
>         (string-join (package-transitive-supported-systems p)))))
>
>     However, unlike version v3, it avoids duplicating code.

I made suggestions to Arun’s v3 about the API here. Essentially, I
think I proposed having a procedure that takes the list of fields as
keyword parameters, and ‘package->recutils’ would just delegate to that.

>  3. Deprecated packages are displayed (bug in v3 too).
>
>  4. The impolite '@@' is used to access the private license constructor.

(guix licenses) could provide a ‘string->license’ procedure.
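For reference on why BM25 is "not free": the textbook Okapi BM25 score of a document D (presumably a package’s synopsis and description here) for a query Q with terms q_1, ..., q_n is shown below. Xapian’s BM25 weighting uses its own parameterization and defaults, so treat this as the general shape rather than exactly what Guile-Xapian computes.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```

Here f(q_i, D) is the number of occurrences of q_i in D, |D| is the length of D in terms, avgdl is the average length over all N documents, n(q_i) is the number of documents containing q_i, and k_1 and b are tuning parameters (k_1 between 1.2 and 2.0 and b = 0.75 are common choices). Because N, n(q_i) and avgdl are corpus-wide statistics, scoring needs an index built over all packages, which is where the extra cost comes from.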
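A minimal Guile sketch of the keyword-parameter approach suggested for ‘package->recutils’. The procedure name ‘fields->recutils’ and its keyword names are hypothetical, not an existing Guix API. The idea is that ‘package->recutils’ would pull the fields out of a package record, code reading the cache would pass the cached strings directly, and neither would need to ‘match’ on a ('cache ...) marker.

```scheme
;; Hypothetical helper, not part of Guix: the keyword names are
;; illustrative.  Writes one recutils record to PORT, skipping any
;; field left at #f (and an empty list of systems).
(define* (fields->recutils port
                           #:key name version synopsis
                           (supported-systems '()))
  ;; Emit "KEY: VALUE" only when VALUE is present.
  (define (field key value)
    (when value
      (format port "~a: ~a~%" key value)))
  (field "name" name)
  (field "version" version)
  (field "systems" (and (pair? supported-systems)
                        (string-join supported-systems)))
  (field "synopsis" synopsis)
  ;; recutils records are separated by a blank line.
  (newline port))

;; Example call with illustrative values.
(fields->recutils (current-output-port)
                  #:name "hello"
                  #:version "2.10"
                  #:supported-systems '("x86_64-linux" "i686-linux"))
```

Both callers would share the formatting logic, so the cached and uncached paths cannot drift apart.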
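A hedged sketch of what a ‘string->license’ lookup could look like, assuming the cache stores license names as plain strings. So that the example runs on its own, the record type is a local stand-in mirroring the shape of the <license> record in (guix licenses). ‘%licenses’ and the two license values are illustrative, not the module’s actual definitions.

```scheme
(use-modules (srfi srfi-1)   ; find
             (srfi srfi-9))  ; define-record-type

;; Stand-in for the <license> record of (guix licenses); there the
;; constructor is private, which is why the patch resorts to '@@'.
(define-record-type <license>
  (license name uri comment)
  license?
  (name    license-name)
  (uri     license-uri)
  (comment license-comment))

;; Illustrative values only.
(define gpl3+
  (license "GPL 3+" "https://www.gnu.org/licenses/gpl-3.0.html" #f))
(define expat
  (license "Expat" "https://directory.fsf.org/wiki/License:Expat" #f))

;; Hypothetical registry of the known license objects.
(define %licenses (list gpl3+ expat))

(define (string->license name)
  ;; Return the license object whose name is NAME, or #f if none matches.
  (find (lambda (l) (string=? (license-name l) name))
        %licenses))
```

With such a procedure exported, code reading the cache gets back the very same license objects, e.g. (string->license "Expat") is ‘expat’ itself, without ever calling the private constructor.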
Stopping here for now because I’m sorta drowning in patch review. :-)

Thanks for exploring this design space, we’re making progress!

Ludo’.