From: Ludovic Courtès
To: Arun Isaac
Cc: mail@ambrevar.xyz, 39258@debbugs.gnu.org, zimon.toutoune@gmail.com
Subject: Re: [PATCH v2 0/3] Xapian for Guix package search
Date: Mon, 09 Mar 2020 11:35:35 +0100
Message-ID: <87r1y13jew.fsf@gnu.org>
References: <20200307133116.11443-1-arunisaac@systemreboot.net> <87sgijgb1v.fsf@gnu.org> <875zffcc87.fsf@gnu.org>

Hello!

Arun Isaac wrote:

>>> This could be accomplished even with pre-rendering. Xapian provides
>>> "slots" to store arbitrary strings with a document. Instead of storing
>>> the pre-rendered document as a whole, we could store pre-rendered fields
>>> in separate slots. Then, during `guix search` time, we can assemble the
>>> result from these pre-rendered fields.
>>
>> I’m not sure I understand. The index wouldn’t store pre-rendered
>> strings for every possible terminal width, right?
>
> No, it wouldn't. It would store a partially pre-rendered string, that is
> without fill-paragraph. We run fill-paragraph at `guix search` time to
> complete the rendering.

Note that Texinfo rendering doesn’t use (@ (guix ui) fill-paragraph).
It has its own paragraph-filling code. We cannot use ‘fill-paragraph’
after Texinfo rendering anyway, since Texinfo knows where things can be
filled and where they cannot (e.g., @example).

>> I think we need to take the whole user experience into account, not
>> just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a
>> fairly common operation. Conversely, ‘guix search’ takes roughly
>> between 0.5 and 2 seconds and is an uncommon operation on a “slow
>> path” (in the sense that when you’re searching for software, you’ll
>> probably have to spend more than a couple of seconds to find what
>> you’re looking for.)
>
> I agree we can't compromise too much on `guix pull` performance.
>
>> To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/
>
> Maybe I'm missing something here. guix pull takes around 40 minutes on
> my machine. In comparison to that, is another 20-50 seconds (roughly 1
> minute) a big deal? How much time would it be acceptable to spend on
> building the Xapian index?

On my laptop, in the best case, when all the substitutes are available
(not uncommon), it takes 2 minutes. Sometimes, when some substitutes
are missing, it takes 15 minutes.

So of course, the 20–50 seconds matter only in the best case. But they
matter primarily because that index build may not be substitutable: it’s
possibly unique to each profile (see below). That means we know we’re
often going to pay for it.

> Also, is it possible to somehow provide substitutes for the Xapian index
> so that the user does not have to actually build it locally during `guix
> pull` time?

We could provide a substitute for users who use only the official 'guix
channel. However, as soon as users combine multiple channels, they’ll
have to build the index locally.

>> I’m not sufficiently familiar with Xapian’s query language. The
>> examples I had in mind were:
>>
>> It’s not so much about regexps as it is about selecting individual
>> fields.
>
> I have totally not tested this, but I imagine that equivalent Xapian
> queries might look something like:
>
>> guix search | recsel -p name -e 'license ~ "LGPL 3"'
>
> guix search license:LGPL3

Nice.

>> guix search crypto library | \
>>   recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis
>
> guix search crypto library AND (NOT ghc) AND (NOT perl) AND (NOT python)
> AND (NOT ruby)

This one is not quite equivalent I guess, but yeah. :-)

>> What I meant was that we could use (statprof) to see whether/how Texinfo
>> rendering/parsing can be optimized.
>
> Oh, ok. I'll try this if we decide not to pre-render.

It’d be beneficial anyways.

Thank you!

Ludo’.
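
The point about filling after rendering can be illustrated with a small
sketch. It is written in Python rather than Guile purely for brevity, and
it is not Guix's or Texinfo's code: the block tags, STORED_DESCRIPTION,
render and naive_render are all hypothetical names. The assumption is that
the index would store each field as tagged blocks rather than one flat
string, so that the display step still knows which blocks may be re-filled.

```python
import textwrap

# Hypothetical stored record (an assumption, not Guix's format): a field is
# kept as (kind, text) blocks so the display step can tell them apart.
STORED_DESCRIPTION = [
    ("para", "GNU Hello prints a friendly greeting. It serves as an "
             "example of standard GNU coding practices."),
    ("example", "$ hello --greeting=hi\nhi"),
]


def render(blocks, width):
    """Fill 'para' blocks to WIDTH; emit 'example' blocks verbatim."""
    out = []
    for kind, text in blocks:
        if kind == "para":
            out.append(textwrap.fill(text, width=width))
        else:
            # Preformatted text keeps its own line breaks, like @example.
            out.append(textwrap.indent(text, "  "))
    return "\n\n".join(out)


def naive_render(blocks, width):
    """Flatten everything first, then fill: the example layout is lost."""
    flat = " ".join(text for _, text in blocks)
    return textwrap.fill(flat, width=width)


if __name__ == "__main__":
    print(render(STORED_DESCRIPTION, 40))
    print("---")
    print(naive_render(STORED_DESCRIPTION, 40))
```

With a flat string, the filling step has no way to know that the last two
lines were preformatted, which is why filling has to happen where that
structure is still known.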