Faster guix search using an sqlite cache

OpenSubmitted by Arun Isaac.
Details
5 participants
  • Arun Isaac
  • Jonathan Brielmaier
  • Ludovic Courtès
  • Pierre Neidhardt
  • zimoun
Owner
unassigned
Severity
important
A
A
Arun Isaac wrote on 23 Jan 20:51 +0100
(address . guix-patches@gnu.org)
cu7pnfaar36.fsf@systemreboot.net
Hi,
As discussed on guix-devel athttps://lists.gnu.org/archive/html/guix-devel/2020-01/msg00310.html, Iam working on an sqlite cache to improve guix search performance. I haveattached a highly incomplete WIP patch. The patch attempts toreimplement the package-cache-file hook in guix/channels.scm using asqlite database. To this end, it rewrites most of thegenerate-package-cache and cache-lookup functions in gnu/packages.scm. Iam yet to hook this up to guix search.
At the moment, I am having some difficulty populating the sqlitedatabase. generate-package-cache populates the database correctly wheninvoked from a normal guile REPL using geiser, but fails to do so whenrun by the guix daemon during guix pull.
I ran guix pull using
$ ./pre-inst-env guix pull --url=$PWD --branch=search -p /tmp/test
where search is the branch I am working on.
Running
$ ls /tmp/test/lib/guix -lh
shows
total 2.1M-r--r--r-- 2 root root 2.1M ஜன. 1 1970 package-cache.sqlite-r--r--r-- 2 root root 26K ஜன. 1 1970 package-cache.sqlite-journal
On examining package-cache.sqlite, I find that no records have beenwritten. And, there is a lingering journal file that shouldn't bethere. For some reason, populating the sqlite database does not workwith guix pull. sqlite probably crashes and leaves the journal file.
If I try to populate the database with each package record beinginserted in its own transaction, at least some of the insertionswork. But the journal file still lingers. My unverified guess is thateverything except the last transaction was successful.
Any ideas what's going on?
Also, inserting each package in its own transaction is ridiculously slowand so that is out of the question. See https://www.sqlite.org/faq.html#q19
From d1305351a90a84eb75e4769284d5e06927eade3e Mon Sep 17 00:00:00 2001From: Arun Isaac <arunisaac@systemreboot.net>Date: Tue, 21 Jan 2020 20:45:43 +0530Subject: [PATCH] fast search
--- build-aux/build-self.scm | 5 + gnu/packages.scm | 207 +++++++++++++++++++++++---------------- 2 files changed, 128 insertions(+), 84 deletions(-)
Toggle diff (306 lines)diff --git a/build-aux/build-self.scm b/build-aux/build-self.scmindex fc13032b73..c123ad3b11 100644--- a/build-aux/build-self.scm+++ b/build-aux/build-self.scm@@ -264,6 +264,9 @@ interface (FFI) of Guile.") (define fake-git (scheme-file "git.scm" #~(define-module (git)))) + (define fake-sqlite3+ (scheme-file "sqlite3.scm" #~(define-module (sqlite3))))+ (with-imported-modules `(((guix config) => ,(make-config.scm)) @@ -278,6 +281,8 @@ interface (FFI) of Guile.") ;; (git) to placate it. ((git) => ,fake-git) + ((sqlite3) => ,fake-sqlite3)+ ,@(source-module-closure `((guix store) (guix self) (guix derivations)diff --git a/gnu/packages.scm b/gnu/packages.scmindex d22c992bb1..4e2c52e62d 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -43,6 +43,7 @@ #:use-module (srfi srfi-34) #:use-module (srfi srfi-35) #:use-module (srfi srfi-39)+ #:use-module (sqlite3) #:export (search-patch search-patches search-auxiliary-file@@ -204,10 +205,8 @@ PROC is called along these lines: PROC can use #:allow-other-keys to ignore the bits it's not interested in. When a package cache is available, this procedure does not actually load any package module."- (define cache- (load-package-cache (current-profile)))-- (if (and cache (cache-is-authoritative?))+ (if (and (cache-is-authoritative?)+ (current-profile)) (vhash-fold (lambda (name vector result) (match vector (#(name version module symbol outputs@@ -220,7 +219,7 @@ package module." #:supported? supported? #:deprecated? deprecated?)))) init- cache)+ (cache-lookup (current-profile))) (fold-packages (lambda (package result) (proc (package-name package) (package-version package)@@ -252,31 +251,7 @@ is guaranteed to never traverse the same package twice." (define %package-cache-file ;; Location of the package cache.- "/lib/guix/package.cache")--(define load-package-cache- (mlambda (profile)- "Attempt to load the package cache. On success return a vhash keyed by-package names. Return #f on failure."- (match profile- (#f #f)- (profile- (catch 'system-error- (lambda ()- (define lst- (load-compiled (string-append profile %package-cache-file)))- (fold (lambda (item vhash)- (match item- (#(name version module symbol outputs- supported? deprecated?- file line column)- (vhash-cons name item vhash))))- vlist-null- lst))- (lambda args- (if (= ENOENT (system-error-errno args))- #f- (apply throw args))))))))+ "/lib/guix/package-cache.sqlite") (define find-packages-by-name/direct ;bypass the cache (let ((packages (delay@@ -297,25 +272,57 @@ decreasing version order." matching) matching))))) -(define (cache-lookup cache name)+(define* (cache-lookup profile #:optional name) "Lookup package NAME in CACHE. Return a list sorted in increasing version order." (define (package-version<? v1 v2) (version>? (vector-ref v2 1) (vector-ref v1 1))) - (sort (vhash-fold* cons '() name cache)- package-version<?))+ (define (int->boolean n)+ (case n+ ((0) #f)+ ((1) #t)))++ (define (string->list str)+ (call-with-input-string str read))++ (define select-statement+ (string-append+ "SELECT name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn from packages"+ (if name " WHERE name = :name" "")))++ (define cache-file+ (string-append profile %package-cache-file))++ (let* ((db (sqlite-open cache-file SQLITE_OPEN_READONLY))+ (statement (sqlite-prepare db select-statement)))+ (when name+ (sqlite-bind-arguments statement #:name name))+ (let ((result (sqlite-fold (lambda (v result)+ (match v+ (#(name version module symbol outputs supported superseded file line column)+ (cons+ (vector name+ version+ (string->list module)+ (string->symbol symbol)+ (string->list outputs)+ (int->boolean supported)+ (int->boolean superseded)+ (list file line column))+ result))))+ '() statement)))+ (sqlite-finalize statement)+ (sqlite-close db)+ (sort result package-version<?)))) (define* (find-packages-by-name name #:optional version) "Return the list of packages with the given NAME. If VERSION is not #f, then only return packages whose version is prefixed by VERSION, sorted in decreasing version order."- (define cache- (load-package-cache (current-profile)))-- (if (and (cache-is-authoritative?) cache)- (match (cache-lookup cache name)- (#f #f)+ (if (and (cache-is-authoritative?)+ (current-profile))+ (match (cache-lookup (current-profile) name) ((#(_ versions modules symbols _ _ _ _ _ _) ...) (fold (lambda (version* module symbol result) (if (or (not version)@@ -331,12 +338,9 @@ decreasing version order." (define* (find-package-locations name #:optional version) "Return a list of version/location pairs corresponding to each package matching NAME and VERSION."- (define cache- (load-package-cache (current-profile)))-- (if (and cache (cache-is-authoritative?))- (match (cache-lookup cache name)- (#f '())+ (if (and (cache-is-authoritative?)+ (current-profile))+ (match (cache-lookup (current-profile) name) ((#(name versions modules symbols outputs supported? deprecated? files lines columns) ...)@@ -372,6 +376,9 @@ VERSION." ;; Prevent Guile 3 from inlining this procedure so we can mock it in tests. (set! find-best-packages-by-name find-best-packages-by-name) +(define (list->string x)+ (call-with-output-string (cut write x <>)))+ (define (generate-package-cache directory) "Generate under DIRECTORY a cache of all the available packages. @@ -381,49 +388,81 @@ reducing the memory footprint." (define cache-file (string-append directory %package-cache-file)) - (define (expand-cache module symbol variable result+seen)+ (define schema+ "CREATE TABLE packages (name text,+version text,+module text,+symbol text,+outputs text,+supported int,+superseded int,+locationFile text,+locationLine int,+locationColumn int);+CREATE VIRTUAL TABLE packageSearch USING fts5(name, searchText);")++ (define insert-statement+ "INSERT INTO packages(name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn)+VALUES(:name, :version, :module, :symbol, :outputs, :supported, :superseded, :locationfile, :locationline, :locationcolumn)")++ (define insert-package-search-statement+ "INSERT INTO packageSearch(name, searchText) VALUES(:name, :searchtext)")++ (define (boolean->int x)+ (if x 1 0))++ (define (list->string x)+ (call-with-output-string (cut write x <>)))++ (define (insert-package db module symbol variable seen) (match (false-if-exception (variable-ref variable)) ((? package? package)- (match result+seen- ((result . seen)- (if (or (vhash-assq package seen)- (hidden-package? package))- (cons result seen)- (cons (cons `#(,(package-name package)- ,(package-version package)- ,(module-name module)- ,symbol- ,(package-outputs package)- ,(->bool (supported-package? package))- ,(->bool (package-superseded package))- ,@(let ((loc (package-location package)))- (if loc- `(,(location-file loc)- ,(location-line loc)- ,(location-column loc))- '(#f #f #f))))- result)- (vhash-consq package #t seen))))))- (_- result+seen)))-- (define exp- (first- (fold-module-public-variables* expand-cache- (cons '() vlist-null)- (all-modules (%package-module-path)- #:warn- warn-about-load-error))))+ (cond+ ((or (vhash-assq package seen)+ (hidden-package? package))+ seen)+ (else+ (let ((statement (sqlite-prepare db insert-statement)))+ (sqlite-bind-arguments statement+ #:name (package-name package)+ #:version (package-version package)+ #:module (list->string (module-name module))+ #:symbol (symbol->string symbol)+ #:outputs (list->string (package-outputs package))+ #:supported (boolean->int (supported-package? package))+ #:superseded (boolean->int (package-superseded package))+ #:locationfile (cond+ ((package-location package) => location-file)+ (else #f))+ #:locationline (cond+ ((package-location package) => location-line)+ (else #f))+ #:locationcolumn (cond+ ((package-location package) => location-column)+ (else #f)))+ (sqlite-fold cons '() statement)+ (sqlite-finalize statement))+ (let ((statement (sqlite-prepare db insert-package-search-statement)))+ (sqlite-bind-arguments statement+ #:name (package-name package)+ #:searchtext (package-description package))+ (sqlite-fold cons '() statement)+ (sqlite-finalize statement))+ (vhash-consq package #t seen))))+ (_ seen))) (mkdir-p (dirname cache-file))- (call-with-output-file cache-file- (lambda (port)- ;; Store the cache as a '.go' file. This makes loading fast and reduces- ;; heap usage since some of the static data is directly mmapped.- (put-bytevector port- (compile `'(,@exp)- #:to 'bytecode- #:opts '(#:to-file? #t)))))+ (let ((db (sqlite-open cache-file)))+ (sqlite-exec db schema)+ (sqlite-exec db "BEGIN")+ (fold-module-public-variables* (cut insert-package db <> <> <> <>)+ vlist-null+ (all-modules (%package-module-path)+ #:warn+ warn-about-load-error))+ (sqlite-exec db "COMMIT;")+ (sqlite-close db))+ cache-file) -- 2.23.0
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl4p+V0ACgkQLiXui2GAK7Oe/wf/VUd/Gcd+KFyJxXTfML9vGIxR7xUXl6M92mevuADXCxI4JECqMmjIuj9fZbC6o/D+XcnJz7XyttQZi/iyjbwZIA0DwUbdAg5BRP8cK6ZkCflPfjamWNQ2RVYu2S+oITgatidZTLDTFGP6RYeXN27I+fkK5P28XSJHa69aE34bVor0R3bb7Ki57OVS+cEYu6nlGbADqpFpLT6VjB7ewgr9wt0tQyq721JevZzi3PNb+WVq6Pi2N69nHDdOkslYfUVi2kWnN9i3gtBnEwVo2cj2uaD2eSYd5YoA2c5kxHVomo5/CiuDUqdCyxd+/Xbjbug+6C7SeCcBbfVW0RgRo2Dvng===HLpI-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 30 Jan 00:33 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)(address . 39258@debbugs.gnu.org)
CAJ3okZ05JXVgbKJd_WeJKERJd6rc4SwxLRrRR0AyW48QDyX8eQ@mail.gmail.com
Hi Arun,
Thank you for the patch!Cool! :-)
I have not tested it yet. Sorry.

On Thu, 23 Jan 2020 at 20:53, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (5 lines)> At the moment, I am having some difficulty populating the sqlite> database. generate-package-cache populates the database correctly when> invoked from a normal guile REPL using geiser, but fails to do so when> run by the guix daemon during guix pull.
[...]
Toggle quote (5 lines)> On examining package-cache.sqlite, I find that no records have been> written. And, there is a lingering journal file that shouldn't be> there. For some reason, populating the sqlite database does not work> with guix pull. sqlite probably crashes and leaves the journal file.
Hum? weird...Is it possible that a module is loaded when Guile repl is used and notwith Guix pull?What about "guix repl"?

Toggle quote (3 lines)> If I try to populate the database with each package record being> inserted in its own transaction, at least some of the insertions
You mean 'commit' the database after each insertion, right?

Toggle quote (3 lines)> work. But the journal file still lingers. My unverified guess is that> everything except the last transaction was successful.
And this does not happen with the repl, right?

Toggle quote (2 lines)> Any ideas what's going on?
I have no idea.Weird.
What about adding 'last-insert-row-id' from 'guix/store/database.scm'?I mean without really understanding and just grepping'sqlite-finalize' spots that 'last-insert-row-id' seems often around.:-)

Toggle quote (3 lines)> Also, inserting each package in its own transaction is ridiculously slow> and so that is out of the question. See https://www.sqlite.org/faq.html#q19
Agree that it is not an option. :-)

Otherwise, 'list->string' is defined twice. And the first one is notnecessary, I guess.The docstring of 'cache-lookup' is not coherent anymore. :-)

Cheers,simon
A
A
Arun Isaac wrote on 30 Jan 14:48 +0100
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 39258@debbugs.gnu.org)
cu74kwdawc7.fsf@systemreboot.net
Toggle quote (4 lines)> Hum? weird...> Is it possible that a module is loaded when Guile repl is used and not> with Guix pull?
It could be. But I don't know how to confirm this theory.
Toggle quote (2 lines)> What about "guix repl"?
I just tried 'guix repl'. It worked correctly, just like guile repl.
Toggle quote (5 lines)>> If I try to populate the database with each package record being>> inserted in its own transaction, at least some of the insertions>> You mean 'commit' the database after each insertion, right?
Yes, that is what I mean.
Toggle quote (5 lines)>> work. But the journal file still lingers. My unverified guess is that>> everything except the last transaction was successful.>> And this does not happen with the repl, right?
No, this does not happen with the repl.
Toggle quote (5 lines)>> Any ideas what's going on?>> I have no idea.> Weird.
Do you know of any way sqlite can create an error log to report what'sgoing on? That might really help debug this issue.
Toggle quote (5 lines)> What about adding 'last-insert-row-id' from 'guix/store/database.scm'?> I mean without really understanding and just grepping> 'sqlite-finalize' spots that 'last-insert-row-id' seems often around.> :-)
I tried this just now, but still the journal lingers.
Toggle quote (3 lines)> Otherwise, 'list->string' is defined twice. And the first one is not> necessary, I guess.
Ah, thanks for catching this!
Toggle quote (2 lines)> The docstring of 'cache-lookup' is not coherent anymore. :-)
Yes, I haven't gotten around to fixing up all those yet. I thought I'llget the code working first.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl4y3sgACgkQLiXui2GAK7O5mQf/QRSZdYPR4DAVXlfg04OEoCKgTi4m9SLL3ByJjMazxfuR9gXS6o97Sr/p5IH2tchH2WwXM/ZN1R8a5utlrWR49azy9Z9uW+b2Bk48aQkp2l5uLz+uQDekBZ+zo7gIYA+dnuGtJMlLG90kB2P6TP1LoaOm0rZyOCRPb4GoNnIjv3v9yxEU2Z7vMcbQ6AOd/lj2Owr4Ur3BrXn5+XNfZO34uXMHHleZq/R7cwxPqmK8xj7xsArPG0/5o2XNvJYVe9AC8J2DMEcyU5wafxs9DebxnuZZAB3umSQeUuFiLIT34fkBSQmW2vEIvaC2R5zEZ9E6PqcpRz8K7rp18z8hcX+jVw===Akh8-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 31 Jan 13:48 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)(address . 39258@debbugs.gnu.org)
CAJ3okZ0GXS9ibh__1DFsuCCjO82R6-YXtPAOF3GoquN=sSsaWA@mail.gmail.com
Hi,
On Thu, 30 Jan 2020 at 14:49, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (1 lines)> >> Any ideas what's going on?
[...]
Toggle quote (3 lines)> Do you know of any way sqlite can create an error log to report what's> going on? That might really help debug this issue.
Danny told me something like:
(catch (sqlite-error
I have not tried yet.


Toggle quote (5 lines)> > The docstring of 'cache-lookup' is not coherent anymore. :-)>> Yes, I haven't gotten around to fixing up all those yet. I thought I'll> get the code working first.
Yes, I imagine. Just to notice. :-)


All the best,simon
A
A
Arun Isaac wrote on 2 Feb 22:16 +0100
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 39258@debbugs.gnu.org)
cu7y2tkadwi.fsf@systemreboot.net
Toggle quote (6 lines)> Danny told me something like:>> (catch (sqlite-error>> I have not tried yet.
Thank you, this was useful. I was able to catch and report the error. Ialso found the log file for the guix-package-cache profile hook. It says
(repl-version 0 0)Generating package cache for '/gnu/store/b6f9b5qbcn4r932whrr6m15rdimbgrhs-profile'...(exception sqlite-error (value sqlite-open) (value 14) (value "Unable to open the database file"))
This could be a permission error, or something to do with the existenceor lack thereof of certain directories (such as /var) in the chroot ofthe build daemon. I'm still figuring it out.
I'm also in half a mind to get some guile xapian bindings ready so wecan just do that instead of messing with sqlite here. But, let'ssee. :-P
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl43PB4ACgkQLiXui2GAK7O0Hwf/bpqqdDy9FynrZNWUqne4w/IQ5nqJLIxZJ30IcbBu8TxWZp31+UjrBE5MlJI9GQglb+K1atySjm1ZPhQ/uzsiheXF+ENULTP6HeC1bHgZaWQpPhzmknMEfeoGd1NU3ePQHLiE5IjPm6PptdYnbzCRmcmGupD23VHCIPbr1LVFSS7ajsFiKofehDytLgpANYLVDokhe0w99Oa6Htf4C3WwkICMPCVJA01LborzdVs65Kso+HpQkBh3egjw+m8sdF+xoPBYKSR+GlhZjXTWFKv8l/btMu6UCJmjRFflN+jqHT01HCiWmeXtESV1zowEz7TtWQOjPhfJeW7igm73ECKkGA===62li-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 4 Feb 11:19 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)(address . 39258@debbugs.gnu.org)
CAJ3okZ37rFh=SB-f6nqc0kMpCai=pw2D0-GFiakXY1p4FHHftA@mail.gmail.com
Hi,
On Sun, 2 Feb 2020 at 22:16, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (2 lines)> Thank you, this was useful. I was able to catch and report the error. I
Where have you reported the error?

Toggle quote (10 lines)> also found the log file for the guix-package-cache profile hook. It says>> (repl-version 0 0)> Generating package cache for '/gnu/store/b6f9b5qbcn4r932whrr6m15rdimbgrhs-profile'...> (exception sqlite-error (value sqlite-open) (value 14) (value "Unable to open the database file"))>> This could be a permission error, or something to do with the existence> or lack thereof of certain directories (such as /var) in the chroot of> the build daemon. I'm still figuring it out.
Hum? And this should explain why it is working with the REPL and notwith the CLI, right?

Toggle quote (4 lines)> I'm also in half a mind to get some guile xapian bindings ready so we> can just do that instead of messing with sqlite here. But, let's> see. :-P
Cool!Let me know if you push something somewhere.

Cheers,simon
A
A
Arun Isaac wrote on 6 Feb 02:58 +0100
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 39258@debbugs.gnu.org)
cu7v9oka341.fsf@systemreboot.net
Toggle quote (4 lines)>> Thank you, this was useful. I was able to catch and report the error. I>> Where have you reported the error?
I reported the error to the derivation log. For example, if thederivation for the guix-package-cache derivation is/gnu/store/cyf2h3frcjxm147dii5qic8d6kpm39nq-guix-package-cache.drv, thelog file will be at/var/log/guix/drvs/cy/f2h3frcjxm147dii5qic8d6kpm39nq-guix-package-cache.drv.bz2. Noticethat the directory name under drvs is the first two letters of the hash,and the file name under that directory is the remaining letters.
Also please find attached a dump of my code so far.
Toggle quote (7 lines)>> This could be a permission error, or something to do with the existence>> or lack thereof of certain directories (such as /var) in the chroot of>> the build daemon. I'm still figuring it out.>> Hum? And this should explain why it is working with the REPL and not> with the CLI, right?
This could expalin it, but I am not sure if this is the correctexplanation.
Toggle quote (7 lines)>> I'm also in half a mind to get some guile xapian bindings ready so we>> can just do that instead of messing with sqlite here. But, let's>> see. :-P>> Cool!> Let me know if you push something somewhere.
Sure, will let you know.
From 4c883fcff1f44339b28df6ccdb2b10c906439e3d Mon Sep 17 00:00:00 2001From: Arun Isaac <arunisaac@systemreboot.net>Date: Tue, 21 Jan 2020 20:45:43 +0530Subject: [PATCH] fast search
--- build-aux/build-self.scm | 5 + gnu/packages.scm | 234 +++++++++++++++++++++++++-------------- 2 files changed, 155 insertions(+), 84 deletions(-)
Toggle diff (333 lines)diff --git a/build-aux/build-self.scm b/build-aux/build-self.scmindex fc13032b73..c123ad3b11 100644--- a/build-aux/build-self.scm+++ b/build-aux/build-self.scm@@ -264,6 +264,9 @@ interface (FFI) of Guile.") (define fake-git (scheme-file "git.scm" #~(define-module (git)))) + (define fake-sqlite3+ (scheme-file "sqlite3.scm" #~(define-module (sqlite3))))+ (with-imported-modules `(((guix config) => ,(make-config.scm)) @@ -278,6 +281,8 @@ interface (FFI) of Guile.") ;; (git) to placate it. ((git) => ,fake-git) + ((sqlite3) => ,fake-sqlite3)+ ,@(source-module-closure `((guix store) (guix self) (guix derivations)diff --git a/gnu/packages.scm b/gnu/packages.scmindex d22c992bb1..0ae5b84284 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -43,6 +43,7 @@ #:use-module (srfi srfi-34) #:use-module (srfi srfi-35) #:use-module (srfi srfi-39)+ #:use-module (sqlite3) #:export (search-patch search-patches search-auxiliary-file@@ -204,10 +205,8 @@ PROC is called along these lines: PROC can use #:allow-other-keys to ignore the bits it's not interested in. When a package cache is available, this procedure does not actually load any package module."- (define cache- (load-package-cache (current-profile)))-- (if (and cache (cache-is-authoritative?))+ (if (and (cache-is-authoritative?)+ (current-profile)) (vhash-fold (lambda (name vector result) (match vector (#(name version module symbol outputs@@ -220,7 +219,7 @@ package module." #:supported? supported? #:deprecated? deprecated?)))) init- cache)+ (cache-lookup (current-profile))) (fold-packages (lambda (package result) (proc (package-name package) (package-version package)@@ -252,31 +251,7 @@ is guaranteed to never traverse the same package twice." (define %package-cache-file ;; Location of the package cache.- "/lib/guix/package.cache")--(define load-package-cache- (mlambda (profile)- "Attempt to load the package cache. On success return a vhash keyed by-package names. Return #f on failure."- (match profile- (#f #f)- (profile- (catch 'system-error- (lambda ()- (define lst- (load-compiled (string-append profile %package-cache-file)))- (fold (lambda (item vhash)- (match item- (#(name version module symbol outputs- supported? deprecated?- file line column)- (vhash-cons name item vhash))))- vlist-null- lst))- (lambda args- (if (= ENOENT (system-error-errno args))- #f- (apply throw args))))))))+ "/lib/guix/package-cache.sqlite") (define find-packages-by-name/direct ;bypass the cache (let ((packages (delay@@ -297,25 +272,57 @@ decreasing version order." matching) matching))))) -(define (cache-lookup cache name)+(define* (cache-lookup profile #:optional name) "Lookup package NAME in CACHE. Return a list sorted in increasing version order." (define (package-version<? v1 v2) (version>? (vector-ref v2 1) (vector-ref v1 1))) - (sort (vhash-fold* cons '() name cache)- package-version<?))+ (define (int->boolean n)+ (case n+ ((0) #f)+ ((1) #t)))++ (define (string->list str)+ (call-with-input-string str read))++ (define select-statement+ (string-append+ "SELECT name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn from packages"+ (if name " WHERE name = :name" "")))++ (define cache-file+ (string-append profile %package-cache-file))++ (let* ((db (sqlite-open cache-file SQLITE_OPEN_READONLY))+ (statement (sqlite-prepare db select-statement)))+ (when name+ (sqlite-bind-arguments statement #:name name))+ (let ((result (sqlite-fold (lambda (v result)+ (match v+ (#(name version module symbol outputs supported superseded file line column)+ (cons+ (vector name+ version+ (string->list module)+ (string->symbol symbol)+ (string->list outputs)+ (int->boolean supported)+ (int->boolean superseded)+ (list file line column))+ result))))+ '() statement)))+ (sqlite-finalize statement)+ (sqlite-close db)+ (sort result package-version<?)))) (define* (find-packages-by-name name #:optional version) "Return the list of packages with the given NAME. If VERSION is not #f, then only return packages whose version is prefixed by VERSION, sorted in decreasing version order."- (define cache- (load-package-cache (current-profile)))-- (if (and (cache-is-authoritative?) cache)- (match (cache-lookup cache name)- (#f #f)+ (if (and (cache-is-authoritative?)+ (current-profile))+ (match (cache-lookup (current-profile) name) ((#(_ versions modules symbols _ _ _ _ _ _) ...) (fold (lambda (version* module symbol result) (if (or (not version)@@ -331,12 +338,9 @@ decreasing version order." (define* (find-package-locations name #:optional version) "Return a list of version/location pairs corresponding to each package matching NAME and VERSION."- (define cache- (load-package-cache (current-profile)))-- (if (and cache (cache-is-authoritative?))- (match (cache-lookup cache name)- (#f '())+ (if (and (cache-is-authoritative?)+ (current-profile))+ (match (cache-lookup (current-profile) name) ((#(name versions modules symbols outputs supported? deprecated? files lines columns) ...)@@ -372,6 +376,33 @@ VERSION." ;; Prevent Guile 3 from inlining this procedure so we can mock it in tests. (set! find-best-packages-by-name find-best-packages-by-name) +;; (generate-package-cache "/tmp/test")++;; XXX: missing in guile-sqlite3@0.1.0+(define SQLITE_BUSY 5)++(define (call-with-transaction db proc)+ "Start a transaction with DB (make as many attempts as necessary) and run+PROC. If PROC exits abnormally, abort the transaction, otherwise commit the+transaction after it finishes."+ (catch 'sqlite-error+ (lambda ()+ ;; We use begin immediate here so that if we need to retry, we+ ;; figure that out immediately rather than because some SQLITE_BUSY+ ;; exception gets thrown partway through PROC - in which case the+ ;; part already executed (which may contain side-effects!) would be+ ;; executed again for every retry.+ (sqlite-exec db "begin immediate;")+ (let ((result (proc)))+ (sqlite-exec db "commit;")+ result))+ (lambda (key who error description)+ (if (= error SQLITE_BUSY)+ (call-with-transaction db proc)+ (begin+ (sqlite-exec db "rollback;")+ (throw 'sqlite-error who error description))))))+ (define (generate-package-cache directory) "Generate under DIRECTORY a cache of all the available packages. @@ -381,49 +412,84 @@ reducing the memory footprint." (define cache-file (string-append directory %package-cache-file)) - (define (expand-cache module symbol variable result+seen)+ (define schema+ "CREATE TABLE packages (name text,+version text,+module text,+symbol text,+outputs text,+supported int,+superseded int,+locationFile text,+locationLine int,+locationColumn int);+CREATE VIRTUAL TABLE packageSearch USING fts5(name, searchText);")++ (define insert-statement+ "INSERT INTO packages(name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn)+VALUES(:name, :version, :module, :symbol, :outputs, :supported, :superseded, :locationfile, :locationline, :locationcolumn)")++ (define insert-package-search-statement+ "INSERT INTO packageSearch(name, searchText) VALUES(:name, :searchtext)")++ (define (boolean->int x)+ (if x 1 0))++ (define (list->string x)+ (call-with-output-string (cut write x <>)))++ (define (insert-package db module symbol variable seen) (match (false-if-exception (variable-ref variable)) ((? package? package)- (match result+seen- ((result . seen)- (if (or (vhash-assq package seen)- (hidden-package? package))- (cons result seen)- (cons (cons `#(,(package-name package)- ,(package-version package)- ,(module-name module)- ,symbol- ,(package-outputs package)- ,(->bool (supported-package? package))- ,(->bool (package-superseded package))- ,@(let ((loc (package-location package)))- (if loc- `(,(location-file loc)- ,(location-line loc)- ,(location-column loc))- '(#f #f #f))))- result)- (vhash-consq package #t seen))))))- (_- result+seen)))-- (define exp- (first- (fold-module-public-variables* expand-cache- (cons '() vlist-null)- (all-modules (%package-module-path)- #:warn- warn-about-load-error))))+ (cond+ ((or (vhash-assq package seen)+ (hidden-package? package))+ seen)+ (else+ (let ((statement (sqlite-prepare db insert-statement)))+ (sqlite-bind-arguments statement+ #:name (package-name package)+ #:version (package-version package)+ #:module (list->string (module-name module))+ #:symbol (symbol->string symbol)+ #:outputs (list->string (package-outputs package))+ #:supported (boolean->int (supported-package? package))+ #:superseded (boolean->int (package-superseded package))+ #:locationfile (cond+ ((package-location package) => location-file)+ (else #f))+ #:locationline (cond+ ((package-location package) => location-line)+ (else #f))+ #:locationcolumn (cond+ ((package-location package) => location-column)+ (else #f)))+ (sqlite-fold cons '() statement)+ (sqlite-finalize statement))+ (let ((statement (sqlite-prepare db insert-package-search-statement)))+ (sqlite-bind-arguments statement+ #:name (package-name package)+ #:searchtext (package-description package))+ (sqlite-fold cons '() statement)+ (sqlite-finalize statement))+ (vhash-consq package #t seen))))+ (_ seen))) (mkdir-p (dirname cache-file))- (call-with-output-file cache-file- (lambda (port)- ;; Store the cache as a '.go' file. This makes loading fast and reduces- ;; heap usage since some of the static data is directly mmapped.- (put-bytevector port- (compile `'(,@exp)- #:to 'bytecode- #:opts '(#:to-file? #t)))))+ (let ((tmp (string-append (dirname cache-file) "/tmp")))+ (mkdir-p tmp)+ (setenv "SQLITE_TMPDIR" tmp))+ (let ((db (sqlite-open cache-file)))+ (sqlite-exec db schema)+ (call-with-transaction db+ (lambda ()+ (fold-module-public-variables* (cut insert-package db <> <> <> <>)+ vlist-null+ (all-modules (%package-module-path)+ #:warn+ warn-about-load-error))))+ (sqlite-close db))+ cache-file) -- 2.23.0
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl47cr4ACgkQLiXui2GAK7OLPwf9FgQ/N8AazOSXRwuXn8N+1QZ0b0G1bq2ZpQYppHpBl0e8SgmTyYXpE4zaU2tBfJEMTUXW7YUZD/36DKNpN5JUD1c4Hnsn/BDqRQFFXt4agzTE3PhtOV2gFLjCwYH/i9xtLoRM+7g36/ZYAmTfAZoiFro4A3kq1ndy8nQCcf++v/v3tQvFVWn3ywKTFig63/wm8hXz7kP0+fi6MVKlhKe05VDBXxZCNfM7tSjmEw2qFNmwALUncWUyN9SzuF0aU+/JzPUF6kvXMZR9cQxBI5ISZ+gxsdg7rikmwso9jvyxRbEDFAck4Etv9j32drcNfNH4ZSUaZY/NcxIpzu/yI9dKKA===M4NZ-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 11 Feb 17:29 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
8736bhytn9.fsf@gnu.org
Hello Arun!
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (5 lines)> From 4c883fcff1f44339b28df6ccdb2b10c906439e3d Mon Sep 17 00:00:00 2001> From: Arun Isaac <arunisaac@systemreboot.net>> Date: Tue, 21 Jan 2020 20:45:43 +0530> Subject: [PATCH] fast search
[...]
Toggle quote (40 lines)> --- a/gnu/packages.scm> +++ b/gnu/packages.scm> @@ -43,6 +43,7 @@> #:use-module (srfi srfi-34)> #:use-module (srfi srfi-35)> #:use-module (srfi srfi-39)> + #:use-module (sqlite3)> #:export (search-patch> search-patches> search-auxiliary-file> @@ -204,10 +205,8 @@ PROC is called along these lines:> PROC can use #:allow-other-keys to ignore the bits it's not interested in.> When a package cache is available, this procedure does not actually load any> package module."> - (define cache> - (load-package-cache (current-profile)))> -> - (if (and cache (cache-is-authoritative?))> + (if (and (cache-is-authoritative?)> + (current-profile))> (vhash-fold (lambda (name vector result)> (match vector> (#(name version module symbol outputs> @@ -220,7 +219,7 @@ package module."> #:supported? supported?> #:deprecated? deprecated?))))> init> - cache)> + (cache-lookup (current-profile)))> (fold-packages (lambda (package result)> (proc (package-name package)> (package-version package)> @@ -252,31 +251,7 @@ is guaranteed to never traverse the same package twice."> > (define %package-cache-file> ;; Location of the package cache.> - "/lib/guix/package.cache")> -> -(define load-package-cache
[...]
Toggle quote (21 lines)> +(define* (cache-lookup profile #:optional name)> "Lookup package NAME in CACHE. Return a list sorted in increasing version> order."> (define (package-version<? v1 v2)> (version>? (vector-ref v2 1) (vector-ref v1 1)))> > - (sort (vhash-fold* cons '() name cache)> - package-version<?))> + (define (int->boolean n)> + (case n> + ((0) #f)> + ((1) #t)))> +> + (define (string->list str)> + (call-with-input-string str read))> +> + (define select-statement> + (string-append> + "SELECT name, version, module, symbol, outputs, supported, superseded, locationFile, locationLine, locationColumn from packages"> + (if name " WHERE name = :name" "")))
I would rather keep the current package cache as-is instead of insertingsqlite in here. I don’t expect it to bring much comparedperformance-wise to the current simple cache (especially if we look atload time), and it does increase complexity quite a bit.
However, using sqlite for keyword search as you initially proposed onguix-devel does sound like a great idea to me.
WDYT?
Thanks,Ludo’.
Z
Z
zimoun wrote on 11 Feb 19:21 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ2EF5244EN3XcO-U9Uz8me42byuPMeUGeGQ_mWi6r864A@mail.gmail.com
Hi Ludo,
On Tue, 11 Feb 2020 at 17:29, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (5 lines)> I would rather keep the current package cache as-is instead of inserting> sqlite in here. I don’t expect it to bring much compared> performance-wise to the current simple cache (especially if we look at> load time), and it does increase complexity quite a bit.
Complexity is about taste. ;-)About performance, the idea was to first implement something withsqlite and then see if it makes the difference. I mean I haveunderstood that.
Toggle quote (3 lines)> However, using sqlite for keyword search as you initially proposed on> guix-devel does sound like a great idea to me.
If I understand correctly, you are proposing 2 caches, right?Or are you proposing an inverted index (VHash/VList table) based ontrigrams to speed up the lookup?
Cheers,simon
L
L
Ludovic Courtès wrote on 11 Feb 19:39 +0100
(name . zimoun)(address . zimon.toutoune@gmail.com)
87sgjhx92g.fsf@gnu.org
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (9 lines)> On Tue, 11 Feb 2020 at 17:29, Ludovic Courtès <ludo@gnu.org> wrote:>>> I would rather keep the current package cache as-is instead of inserting>> sqlite in here. I don’t expect it to bring much compared>> performance-wise to the current simple cache (especially if we look at>> load time), and it does increase complexity quite a bit.>> Complexity is about taste. ;-)
It’s measurable to some extent (lines of code, cyclomatic complexity,etc.)
Toggle quote (4 lines)> About performance, the idea was to first implement something with> sqlite and then see if it makes the difference. I mean I have> understood that.
Yes. But keep in mind that this package cache is used exclusively forpackage lookups by name. Namely, the goal is to speed package lookup inoperations like “guix install foo” (mapping “foo” to the right <package>in the right module without walking through all the modules) and “guixpackage -A” (which is what the shell completion hooks use).
Currently “guix package -A” runs in .5s on my laptop, and I suspect it’sgoing to be hard to do better just by touching the cache.
Toggle quote (7 lines)>> However, using sqlite for keyword search as you initially proposed on>> guix-devel does sound like a great idea to me.>> If I understand correctly, you are proposing 2 caches, right?> Or are you proposing an inverted index (VHash/VList table) based on> trigrams to speed up the lookup?
Arun started the discussion on guix-devel with the idea of an invertedindex, and I thought this would become a second index (possiblyimplemented using SQLite). Perhaps I misunderstood the discussion allalong though, let me know! :-)
Thanks,Ludo’.
A
A
Arun Isaac wrote on 11 Feb 20:07 +0100
(address . 39258@debbugs.gnu.org)
cu7zhdplz84.fsf@systemreboot.net
Toggle quote (10 lines)>>> I would rather keep the current package cache as-is instead of inserting>>> sqlite in here. I don’t expect it to bring much compared>>> performance-wise to the current simple cache (especially if we look at>>> load time), and it does increase complexity quite a bit.>>>> Complexity is about taste. ;-)>> It’s measurable to some extent (lines of code, cyclomatic complexity,> etc.)
I agree with Ludo here. I think it does increase the complexity, andprobably unnecessarily so.
Toggle quote (5 lines)> Arun started the discussion on guix-devel with the idea of an inverted> index, and I thought this would become a second index (possibly> implemented using SQLite). Perhaps I misunderstood the discussion all> along though, let me know! :-)
No, you didn't misunderstand. That's where it began. But, whileimplementing it, I thought I might as well replace the existing cache.
Also, I've started working on guile-xapian bindings. With that, it seemssimpler to keep the current package cache and add a xapian index only tospeed up package search.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5C+2sACgkQLiXui2GAK7O3DQgAnz3GCW1FVXQowVB/RqkTb4meDytN8r/LXnA06VVUHhjPuMMSV6d2xPj8B9p0lagCeCYN8ivFvBq0X5qFwT4gBtmdXWkn2VmU4nBvUAw+uFd9d5S8bFjNeXe4IGnmWOBuokF6ttYRkGH9Z4wStz49a/9CJcVW4jpCKyjEyodM/PJ9bVOiOTFgqpFATTQhMSQM1GuJO8OKhctsyZ8xpNejTW/LFPe2CgoIEQavdlMnzgL7Nl3d5NOa2rS8VyIamqbofxI9Uj9QBXjn8HxIV5FaA2On/XmWXF/uCr44Fw1xGLyy/AFlReE9KgJEjP/TUwhXMr/s9/e8k0bEDQMESnEfFw===L3bY-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 11 Feb 21:13 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ0sXg7xqFLahk+Ej_aE8bR8GmUMX9DFaRjgNV417xvAsw@mail.gmail.com
Hi Ludo,
On Tue, 11 Feb 2020 at 19:39, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (7 lines)> > About performance, the idea was to first implement something with> > sqlite and then see if it makes the difference. I mean I have> > understood that.>> Yes. But keep in mind that this package cache is used exclusively for> package lookups by name. Namely, the goal is to speed package lookup in
I agree that some confusion happens here. And this cache cannot be improved.

Toggle quote (12 lines)> >> However, using sqlite for keyword search as you initially proposed on> >> guix-devel does sound like a great idea to me.> >> > If I understand correctly, you are proposing 2 caches, right?> > Or are you proposing an inverted index (VHash/VList table) based on> > trigrams to speed up the lookup?>> Arun started the discussion on guix-devel with the idea of an inverted> index, and I thought this would become a second index (possibly> implemented using SQLite). Perhaps I misunderstood the discussion all> along though, let me know! :-)
Well, your suggestion is very welcome and 2 caches are required: onefor the lookup by name, as it is already (and does a good job);another one for "guix search" speeds up, SQLite or whatever (based oninverted index or whatever).

Thanks,simon
Z
Z
zimoun wrote on 11 Feb 21:20 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ3nTmv+dD_2m7mtcHXQRsoVcxcqW1bJyRzOZfKWaq0Opw@mail.gmail.com
Hi Arun,
On Tue, 11 Feb 2020 at 20:07, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (8 lines)> > Arun started the discussion on guix-devel with the idea of an inverted> > index, and I thought this would become a second index (possibly> > implemented using SQLite). Perhaps I misunderstood the discussion all> > along though, let me know! :-)>> No, you didn't misunderstand. That's where it began. But, while> implementing it, I thought I might as well replace the existing cache.
An inverted index backed by Guile as you did on guix-devel or backedby SQLite seems a good improvement for "guix search".
Toggle quote (4 lines)> Also, I've started working on guile-xapian bindings. With that, it seems> simpler to keep the current package cache and add a xapian index only to> speed up package search.
Xapian would be cool!And an SQLite based index seems easier to index locally the packagesand their history. The Guix Data Service is already doing such thingbut AFAIK using PostgreSQL and some magic. :-)
http://data.guix.gnu.org/repository/1/branch/master/package/git

All the best,simon
A
A
Arun Isaac wrote on 15 Feb 15:50 +0100
cu7h7zrucpe.fsf@systemreboot.net
Toggle quote (4 lines)> Also, I've started working on guile-xapian bindings. With that, it seems> simpler to keep the current package cache and add a xapian index only to> speed up package search.
I have published the first version of guile-xapian! Feedback iswelcome. :-)
https://git.systemreboot.net/guile-xapian/about/
I will now move on to building a Xapian index for Guix's package search.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5IBS0ACgkQLiXui2GAK7NmlQf/YWcix4M3tpDfOfHAyQiMGkgM++FG9f9tLdaDd3NJxcwnSzmiLPbKXwzJc1UuAIq8LVyFqDvzp23ZWI2+61HrIFwpEO0ClZhh5/l1n1gHQaGNjxMthgD0GzMVzNenJISebJW2tlbV2+RTpaVKd6DqcN/Ww7hxZbPdETl72Y9gxuIx6Z5xUsqLrtTcQiSQo7My6aIG3dPJdnhMQUESga+2Sr2Cns8Z3j+NTKJQmpxipG4JDTgbyJ98tIH1lT8V8opTbJ4kncQUG8FfheTWnoemWymc76uHLuR1XSQqNTvrlGB4goNKs1nBK0y7T8ST01zuEQdDQcDR0z7OdCpoqA4fdw===i8Qu-----END PGP SIGNATURE-----
A
A
Arun Isaac wrote on 27 Feb 21:41 +0100
[PATCH 0/4] Xapian for Guix package search
(address . 39258@debbugs.gnu.org)
20200227204150.30985-1-arunisaac@systemreboot.net
Hi,
I have finally got xapian working for package search. Some comments follow.
* Speed improvement
Despite search-package-index in gnu/packages.scm taking only around 1.5ms, Isee an overall speedup in `guix search` of only a factor of 2 -- from around2s to around 1s. I wonder what else in `guix search` is taking up so muchtime.
* Currently indexing only the package descriptions
In this patchset, I have only indexed the package descriptions. In the nextversion of this patchset, I will index all other terms as specified in%package-metrics of guix/ui.scm.
* Should I add guile-xapian as a propagated input to guix in gnu/packages/package-management.scm?
* Drop regexp search support
In this patchset, I have retained the older regexp search support. But, Ithink we should drop it and only have xapian search. In cases where the searchindex is not authoritative, we can build an in-memory xapian search index onthe fly and use it to search. This will slow down the search, but will ensureour search results are consistent and do not depend on the authoritativenessof the search index.
* Commit messages
Except for patch 1, I am not sure what prefixes (build-self, gnu, etc.) to usein the first line of the commit message. Some advice there would be helpful.
Regards,Arun.
Arun Isaac (4): gnu: Add guile-xapian. build-self: Add guile-xapian to Guix dependencies. gnu: Generate xapian package search index. gnu: Use xapian index for package search.
build-aux/build-self.scm | 11 ++++++++ gnu/packages.scm | 44 ++++++++++++++++++++++++++++- gnu/packages/guile-xyz.scm | 50 ++++++++++++++++++++++++++++++++- guix/channels.scm | 34 ++++++++++++++++++++++- guix/scripts/package.scm | 57 ++++++++++++++++++++++---------------- guix/self.scm | 7 ++++- 6 files changed, 175 insertions(+), 28 deletions(-)
-- 2.23.0
A
A
Arun Isaac wrote on 27 Feb 21:41 +0100
[PATCH 3/4] gnu: Generate xapian package search index.
(address . 39258@debbugs.gnu.org)
20200227204150.30985-4-arunisaac@systemreboot.net
* gnu/packages.scm (%package-search-index): New variable.(generate-package-search-index): New function.* guix/channels.scm (package-search-index): New function.(%channel-profile-hooks): Add package-search-index.--- gnu/packages.scm | 29 ++++++++++++++++++++++++++++- guix/channels.scm | 34 +++++++++++++++++++++++++++++++++- 2 files changed, 61 insertions(+), 2 deletions(-)
Toggle diff (117 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex d22c992bb1..e91753e2a8 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -4,6 +4,7 @@ ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2016, 2017 Alex Kost <alezost@gmail.com> ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -43,6 +44,7 @@ #:use-module (srfi srfi-34) #:use-module (srfi srfi-35) #:use-module (srfi srfi-39)+ #:use-module (xapian xapian) #:export (search-patch search-patches search-auxiliary-file@@ -64,7 +66,8 @@ specification->location specifications->manifest - generate-package-cache))+ generate-package-cache+ generate-package-search-index)) ;;; Commentary: ;;;@@ -426,6 +429,30 @@ reducing the memory footprint." #:opts '(#:to-file? #t))))) cache-file) +(define %package-search-index+ ;; Location of the package search-index+ "/lib/guix/package-search.index")++(define (generate-package-search-index directory)+ "Generate under DIRECTORY a xapian index of all the available packages."+ (define db-path+ (string-append directory %package-search-index))++ (mkdir-p (dirname db-path))+ (call-with-writable-database db-path+ (lambda (db)+ (fold-packages (lambda (package _)+ (let* ((idterm (string-append "Q" (package-name package)))+ (doc (make-document #:data (package-name package)+ #:terms `((,idterm . 0))))+ (term-generator (make-term-generator #:stem (make-stem "en")+ #:document doc)))+ (index-text! term-generator (package-description package))+ (replace-document! db idterm doc)))+ #f)))++ db-path)+ (define %sigint-prompt ;; The prompt to jump to upon SIGINT.diff --git a/guix/channels.scm b/guix/channels.scmindex f0261dc2da..c70c70938c 100644--- a/guix/channels.scm+++ b/guix/channels.scm@@ -2,6 +2,7 @@ ;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2018 Ricardo Wurmus <rekado@elephly.net> ;;; Copyright © 2019 Jan (janneke) Nieuwenhuizen <janneke@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -581,9 +582,40 @@ be used as a profile hook." (hook . package-cache)) #:local-build? #t))) +(define (package-search-index manifest)+ "Build a package search index for the instance in MANIFEST. This is meant+to be used as a profile hook."+ (mlet %store-monad ((profile (profile-derivation manifest+ #:hooks '())))++ (define build+ #~(begin+ (use-modules (gnu packages))++ (if (defined? 'generate-package-search-index)+ (begin+ ;; Delegate package search index generation to the inferior.+ (format (current-error-port)+ "Generating package search index for '~a'...~%"+ #$profile)+ (generate-package-search-index #$output))+ (mkdir #$output))))++ (gexp->derivation-in-inferior "guix-package-search-index" build+ profile++ ;; If the Guix in PROFILE is too old and+ ;; lacks 'guix repl', don't build the cache+ ;; instead of failing.+ #:silent-failure? #t++ #:properties '((type . profile-hook)+ (hook . package-search-index))+ #:local-build? #t)))+ (define %channel-profile-hooks ;; The default channel profile hooks.- (cons package-cache-file %default-profile-hooks))+ (cons* package-cache-file package-search-index %default-profile-hooks)) (define (channel-instances->derivation instances) "Return the derivation of the profile containing INSTANCES, a list of-- 2.23.0
A
A
Arun Isaac wrote on 27 Feb 21:41 +0100
[PATCH 4/4] gnu: Use xapian index for package search.
(address . 39258@debbugs.gnu.org)
20200227204150.30985-5-arunisaac@systemreboot.net
* gnu/packages.scm (search-package-index): New function.* guix/scripts/package.scm (find-packages-by-description): Search using thexapian package index if search patterns are literal strings. Else, searchusing fold-packages.--- gnu/packages.scm | 17 +++++++++++- guix/scripts/package.scm | 57 +++++++++++++++++++++++----------------- 2 files changed, 49 insertions(+), 25 deletions(-)
Toggle diff (122 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex e91753e2a8..5b5b29bf84 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -67,7 +67,8 @@ specifications->manifest generate-package-cache- generate-package-search-index))+ generate-package-search-index+ search-package-index)) ;;; Commentary: ;;;@@ -453,6 +454,20 @@ reducing the memory footprint." db-path) +(define (search-package-index profile querystring)+ (let ((offset 0)+ (pagesize 10))+ (call-with-database (string-append profile %package-search-index)+ (lambda (db)+ (let ((query (parse-query querystring #:stemmer (make-stem "en"))))+ (mset-fold (lambda (item result)+ (match (find-packages-by-name+ (document-data (mset-item-document item)))+ ((package _ ...)+ (append result `((,package . ,(mset-item-weight item)))))))+ '()+ (enquire-mset (enquire db query) offset pagesize)))))))+ (define %sigint-prompt ;; The prompt to jump to upon SIGINT.diff --git a/guix/scripts/package.scm b/guix/scripts/package.scmindex 1cb0d382bf..6a3b9002dd 100644--- a/guix/scripts/package.scm+++ b/guix/scripts/package.scm@@ -7,6 +7,7 @@ ;;; Copyright © 2016 Benz Schenk <benz.schenk@uzh.ch> ;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -178,31 +179,40 @@ hooks\" run when building the profile." ;;; Package specifications. ;;; -(define (find-packages-by-description regexps)+(define (find-packages-by-description patterns) "Return a list of pairs: packages whose name, synopsis, description, or output matches at least one of REGEXPS sorted by relevance, and its non-zero relevance score."- (let ((matches (fold-packages (lambda (package result)- (if (package-superseded package)- result- (match (package-relevance package- regexps)- ((? zero?)- result)- (score- (cons (cons package score)- result)))))- '())))- (sort matches- (lambda (m1 m2)- (match m1- ((package1 . score1)- (match m2- ((package2 . score2)- (if (= score1 score2)- (string>? (package-full-name package1)- (package-full-name package2))- (> score1 score2))))))))))+ (define (regexp? str)+ (string-any+ (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)+ str))++ (if (and (current-profile)+ (not (any regexp? patterns)))+ (search-package-index (current-profile) (string-join patterns " "))+ (let* ((regexps (map (cut make-regexp* <> regexp/icase) patterns))+ (matches (fold-packages (lambda (package result)+ (if (package-superseded package)+ result+ (match (package-relevance package+ regexps)+ ((? zero?)+ result)+ (score+ (cons (cons package score)+ result)))))+ '())))+ (sort matches+ (lambda (m1 m2)+ (match m1+ ((package1 . score1)+ (match m2+ ((package2 . score2)+ (if (= score1 score2)+ (string>? (package-full-name package1)+ (package-full-name package2))+ (> score1 score2))))))))))) (define (transaction-upgrade-entry store entry transaction) "Return a variant of TRANSACTION that accounts for the upgrade of ENTRY, a@@ -777,8 +787,7 @@ processed, #f otherwise." (('query 'search rx) rx) (_ #f)) opts))- (regexps (map (cut make-regexp* <> regexp/icase) patterns))- (matches (find-packages-by-description regexps)))+ (matches (find-packages-by-description patterns))) (leave-on-EPIPE (display-search-results matches (current-output-port))) #t))-- 2.23.0
A
A
Arun Isaac wrote on 27 Feb 21:41 +0100
[PATCH 1/4] gnu: Add guile-xapian.
(address . 39258@debbugs.gnu.org)
20200227204150.30985-2-arunisaac@systemreboot.net
* gnu/packages/guile-xyz.scm (guile-xapian, guile3.0-xapian): New variables.--- gnu/packages/guile-xyz.scm | 50 +++++++++++++++++++++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-)
Toggle diff (76 lines)diff --git a/gnu/packages/guile-xyz.scm b/gnu/packages/guile-xyz.scmindex 37a5198e4e..75aba83593 100644--- a/gnu/packages/guile-xyz.scm+++ b/gnu/packages/guile-xyz.scm@@ -17,7 +17,7 @@ ;;; Copyright © 2017 ng0 <ng0@n0.is> ;;; Copyright © 2017, 2018 Tobias Geerinckx-Rice <me@tobias.gr> ;;; Copyright © 2018 Maxim Cournoyer <maxim.cournoyer@gmail.com>-;;; Copyright © 2018, 2019 Arun Isaac <arunisaac@systemreboot.net>+;;; Copyright © 2018, 2019, 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; Copyright © 2018 Pierre-Antoine Rouby <pierre-antoine.rouby@inria.fr> ;;; Copyright © 2018 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2019 swedebugia <swedebugia@riseup.net>@@ -80,8 +80,10 @@ #:use-module (gnu packages python) #:use-module (gnu packages readline) #:use-module (gnu packages sdl)+ #:use-module (gnu packages search) #:use-module (gnu packages slang) #:use-module (gnu packages sqlite)+ #:use-module (gnu packages swig) #:use-module (gnu packages tex) #:use-module (gnu packages texinfo) #:use-module (gnu packages tls)@@ -3109,3 +3111,49 @@ currently a re-implementation of the lentes library for Clojure. Lenses provide composable procedures, which can be used to focus, apply functions over, or update a value in arbitrary data structures.") (license license:gpl3+))))++(define-public guile-xapian+ (let ((commit "bfad1b0e2a88bfe1d4c100046da0d585b96d2a73")+ (revision "1"))+ (package+ (name "guile-xapian")+ (version (git-version "0.1.0" revision commit))+ (home-page "https://git.systemreboot.net/guile-xapian")+ (source+ (origin+ (method git-fetch)+ (uri (git-reference (url home-page)+ (commit commit)))+ (file-name (git-file-name name version))+ (sha256+ (base32+ "1nrs23abb0lxx7gw14jw5k8jgbma0gi21gzahw0jgv6b25d9jdwp"))))+ (build-system gnu-build-system)+ (arguments+ '(#:make-flags '("GUILE_AUTO_COMPILE=0"))) ; to prevent guild warnings+ (inputs+ `(("guile" ,guile-2.2)+ ("xapian" ,xapian)+ ("zlib" ,zlib)))+ (native-inputs+ `(("autoconf" ,autoconf)+ ("autoconf-archive" ,autoconf-archive)+ ("automake" ,automake)+ ("libtool" ,libtool)+ ("pkg-config" ,pkg-config)+ ("swig" ,swig)))+ (synopsis "Guile bindings for Xapian")+ (description "@code{guile-xapian} provides Guile bindings for Xapian, a+search engine library. Xapian is a highly adaptable toolkit which allows+developers to easily add advanced indexing and search facilities to their own+applications. It has built-in support for several families of weighting+models and also supports a rich set of boolean query operators.")+ (license license:gpl2+))))++(define-public guile3.0-xapian+ (package+ (inherit guile-xapian)+ (name "guile3.0-xapian")+ (inputs+ `(("guile" ,guile-next)+ ,@(alist-delete "guile" (package-inputs guile-xapian))))))-- 2.23.0
A
A
Arun Isaac wrote on 27 Feb 21:41 +0100
[PATCH 2/4] build-self: Add guile-xapian to Guix dependencies.
(address . 39258@debbugs.gnu.org)
20200227204150.30985-3-arunisaac@systemreboot.net
* build-aux/build-self.scm (build-program): Import fake guile-xapian module.* guix/self.scm (compiled-guix): Add guile-xapian to Guix dependencies.--- build-aux/build-self.scm | 11 +++++++++++ guix/self.scm | 7 ++++++- 2 files changed, 17 insertions(+), 1 deletion(-)
Toggle diff (75 lines)diff --git a/build-aux/build-self.scm b/build-aux/build-self.scmindex f2e785b7f1..05d0353ccf 100644--- a/build-aux/build-self.scm+++ b/build-aux/build-self.scm@@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -261,6 +262,10 @@ interface (FFI) of Guile.") #~(define-module (gcrypt hash) #:export (sha1 sha256)))) + (define fake-xapian-hash+ ;; Fake (xapian xapian) module; see below.+ (scheme-file "xapian.scm" #~(define-module (xapian xapian))))+ (define fake-git (scheme-file "git.scm" #~(define-module (git)))) @@ -273,6 +278,12 @@ interface (FFI) of Guile.") ;; adjust %LOAD-PATH later on. ((gcrypt hash) => ,fake-gcrypt-hash) + ;; To avoid relying on 'with-extensions', which was+ ;; introduced in 0.15.0, provide a fake (xapian+ ;; xapian) just so that we can build modules, and+ ;; adjust %LOAD-PATH later on.+ ((xapian xapian) => ,fake-xapian-hash)+ ;; (guix git-download) depends on (git) but only ;; for peripheral functionality. Provide a dummy ;; (git) to placate it.diff --git a/guix/self.scm b/guix/self.scmindex 6b633f9bc0..a4f40574d1 100644--- a/guix/self.scm+++ b/guix/self.scm@@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -54,6 +55,7 @@ ("guile-git" (ref '(gnu packages guile) 'guile3.0-git)) ("guile-sqlite3" (ref '(gnu packages guile) 'guile3.0-sqlite3)) ("guile-gcrypt" (ref '(gnu packages gnupg) 'guile3.0-gcrypt))+ ("guile-xapian" (ref '(gnu packages guile-xyz) 'guile3.0-xapian)) ("gnutls" (ref '(gnu packages tls) 'guile3.0-gnutls)) ("zlib" (ref '(gnu packages compression) 'zlib)) ("lzlib" (ref '(gnu packages compression) 'lzlib))@@ -682,6 +684,9 @@ Info manual." (define guile-gcrypt (specification->package "guile-gcrypt")) + (define guile-xapian+ (specification->package "guile-xapian"))+ (define gnutls (specification->package "gnutls")) @@ -690,7 +695,7 @@ Info manual." (cons (list "x" package) (package-transitive-propagated-inputs package))) (list guile-gcrypt gnutls guile-git guile-json- guile-ssh guile-sqlite3))+ guile-ssh guile-sqlite3 guile-xapian)) (((labels packages _ ...) ...) packages))) -- 2.23.0
P
P
Pierre Neidhardt wrote on 28 Feb 09:04 +0100
Re: [bug#39258] [PATCH 3/4] gnu: Generate xapian package search index.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87k1475e94.fsf@ambrevar.xyz
Arun Isaac <arunisaac@systemreboot.net> writes:
Toggle quote (17 lines)> +(define (generate-package-search-index directory)> + "Generate under DIRECTORY a xapian index of all the available packages."> + (define db-path> + (string-append directory %package-search-index))> +> + (mkdir-p (dirname db-path))> + (call-with-writable-database db-path> + (lambda (db)> + (fold-packages (lambda (package _)> + (let* ((idterm (string-append "Q" (package-name package)))> + (doc (make-document #:data (package-name package)> + #:terms `((,idterm . 0))))> + (term-generator (make-term-generator #:stem (make-stem "en")> + #:document doc)))> + (index-text! term-generator (package-description package))> + (replace-document! db idterm doc)))
I guess these non-functional functions (index-text!, replace-document!)represent how Xapian works at the C++ level. Would it be possible tomake more functional bindings nonetheless?
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl5YyZcACgkQm9z0l6S7zH9EkAf/eCddrqNRJXNrLpUYJ2kOAuqshemvMAF4v0WSF3enTf7Y59qDF56FS9V7SsvxrUL9CF1LzJAquLLSNUnTEDLcmLb1p6O/ksSq0ZMQ9Yr4MGBeLJU36m1+WGv3n+03tUCFvSLDupAlyBxNxCX4QBmzGd79shu3X5NZLVp06G842chKFB6GXlQyFRhhiJLDq1kynhsW6eZIPONvHqQxTzEfb8Wh5wbNbzPvvN8fIWKPoBpP5yxXHLDGC2vMVcodopQOoXebt1XAh4WIz4Tr3KaF3/B5m9OunrdLmLj3ETZQwTbifkObhnPG7tOpNI1rOx5CP47ckZtbdK9HnfvSkvO0Tw===UV3T-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 28 Feb 09:11 +0100
Re: [bug#39258] [PATCH 4/4] gnu: Use xapian index for package search.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87h7zb5dxo.fsf@ambrevar.xyz
Arun Isaac <arunisaac@systemreboot.net> writes:
Toggle quote (6 lines)> @@ -453,6 +454,20 @@ reducing the memory footprint."> > db-path)> > +(define (search-package-index profile querystring)
Maybe `query-string'?
Toggle quote (21 lines)> > --- a/guix/scripts/package.scm> +++ b/guix/scripts/package.scm> @@ -7,6 +7,7 @@> ;;; Copyright © 2016 Benz Schenk <benz.schenk@uzh.ch>> ;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com>> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr>> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>> ;;;> ;;; This file is part of GNU Guix.> ;;;> @@ -178,31 +179,40 @@ hooks\" run when building the profile."> ;;; Package specifications.> ;;;> > -(define (find-packages-by-description regexps)> +(define (find-packages-by-description patterns)> "Return a list of pairs: packages whose name, synopsis, description,> or output matches at least one of REGEXPS sorted by relevance, and its> non-zero relevance score."
Need to update the docstring.
Toggle quote (29 lines)> - (let ((matches (fold-packages (lambda (package result)> - (if (package-superseded package)> - result> - (match (package-relevance package> - regexps)> - ((? zero?)> - result)> - (score> - (cons (cons package score)> - result)))))> - '())))> - (sort matches> - (lambda (m1 m2)> - (match m1> - ((package1 . score1)> - (match m2> - ((package2 . score2)> - (if (= score1 score2)> - (string>? (package-full-name package1)> - (package-full-name package2))> - (> score1 score2))))))))))> + (define (regexp? str)> + (string-any> + (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)> + str))> +> + (if (and (current-profile)> + (not (any regexp? patterns)))
I would not put characters like ".", "$", or "+" here, lest we mistake aXapian pattern for a regexp.
As you said, I don't think both are compatible without ambiguityanyways, so we should probably drop regexp (or at least toggle them witha command line argument).

-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl5YyzMACgkQm9z0l6S7zH8myAf/dd2HZVdNaDmPZ/rEzf9eveD0wrhxS4pEwO8lJyBnobVHXfzcpDQ5ZjRxu/ZWqMIjbmjz8VFAPvQGMDIxQubxoDXii5ps94ZQNgitlhJfb4qq8REHC5rhuZHYbGxq4qTGKQYCXC3Yakg/uhRlQH4PhYvhVZDgWreJ2ay19JQV4fnfDeshdCq/oAUWIGti/XiBt50KWOBmRIctI3hYhEdA1mISQqh4RoPA9xKEQvnWSwS5hs1OZwcRCzGimL3tgY7OKOmlojVVDCVo/r8Q9oXKlNiy+/sSilGCvO5AbsyFMsE3DqFYbQdFKpZqwFyI6XkXFHHmiKSVgYLct89Qc5Lgiw===tOlQ-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 28 Feb 09:13 +0100
Re: [PATCH 0/4] Xapian for Guix package search
87eeuf5dty.fsf@ambrevar.xyz
Fantastic, thank you so much for this great feature!
I can't build your patch though:
Toggle snippet (13 lines)[ 30%] LOAD guix/scripts/package.scm;;; note: source file ./guix/scripts/package.scm;;; newer than compiled /home/ambrevar/projects/guix/guix/scripts/package.go;;; note: source file ./guix/scripts/package.scm;;; newer than compiled /home/ambrevar/projects/guix/guix/scripts/package.go;;; note: source file ./gnu/packages.scm;;; newer than compiled /home/ambrevar/projects/guix/gnu/packages.go;;; note: source file ./gnu/packages.scm;;; newer than compiled /home/ambrevar/projects/guix/gnu/packages.goerror: failed to load 'gnu/packages.scm':ice-9/eval.scm:293:34: no code for module (xapian xapian)
Beside this issue, how do you test it? I guess we first need to installa bunch of package with `pre-inst-env guix ...` then to a `pre-inst-env search`?
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl5Yy7kACgkQm9z0l6S7zH8QWwf+IGDJgU0kdoe0i5VpNxWdDDL7J+MOttmJhSsJEDS1jMG6XTEPC8UPoTtMftuf5t5LB+KEispRXIsg5mIUwd2qGvP9N5dbEmKqRrnpLyD6T8qalH9neYAN+++FJed8csUTyrMD7BlseFOEFzUrQCiPabHK/+hOWfP5rAsecwr07NYt5yZPLgqgqLQUHMMJGjGzG8Zd2FEZGiRFhINlRh+WD8rMCyLmIsRNjXQDY153LPzC1KqCNHE2fIIs5Naxy2dejwD0hos5Nn7EF9F8GiecnKKnludcQmFa1oCE7mZrGL0JPqrJZa+ZhpZEN9kbwFiswZVu+avqGKi/FTxdZdF2KA===XIu9-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 28 Feb 13:36 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ1cda2ypFfo550C=DvvsqagYsweKi1xH-8kbQSBGYXVFA@mail.gmail.com
Hi Arun,
Really cool! Thank you!

On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (7 lines)> * Speed improvement>> Despite search-package-index in gnu/packages.scm taking only around 1.5ms, I> see an overall speedup in `guix search` of only a factor of 2 -- from around> 2s to around 1s. I wonder what else in `guix search` is taking up so much> time.
Interesting... maybe an hidden 'fold-packages'?Well, I have not yet looked into your code.

Toggle quote (6 lines)> * Currently indexing only the package descriptions>> In this patchset, I have only indexed the package descriptions. In the next> version of this patchset, I will index all other terms as specified in> %package-metrics of guix/ui.scm.
Yes, it appears to me a detail that should be easy to fix. I mean, itdoes not seems blocking.

Toggle quote (3 lines)> * Should I add guile-xapian as a propagated input to guix in> gnu/packages/package-management.scm?
IMHO, yes.I mean, I guess. :-)

Toggle quote (9 lines)> * Drop regexp search support>> In this patchset, I have retained the older regexp search support. But, I> think we should drop it and only have xapian search. In cases where the search> index is not authoritative, we can build an in-memory xapian search index on> the fly and use it to search. This will slow down the search, but will ensure> our search results are consistent and do not depend on the authoritativeness> of the search index.
I understand why you have turned off the regexp support. It is notnecessary at the first experimentation to see if it is worth theaddition or not.So, before investigating how some better regexp could be used withXapian, let start to benchmark Xapian vs plain 'fold-packages'.

Toggle quote (5 lines)> * Commit messages>> Except for patch 1, I am not sure what prefixes (build-self, gnu, etc.) to use> in the first line of the commit message. Some advice there would be helpful.
I cannot help. )-:

All the best,simon
Z
Z
zimoun wrote on 28 Feb 13:39 +0100
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
CAJ3okZ1twVzgfxRKv8MyNYY17Davd7QdkVHSp9+jb_9FV35HhQ@mail.gmail.com
Hi Pierre,
On Fri, 28 Feb 2020 at 09:13, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
Toggle quote (3 lines)> Beside this issue, how do you test it? I guess we first need to install> a bunch of package with `pre-inst-env guix ...` then to a `pre-inst-env search`?
It is not searching in the installed packages but in all the packages.So, to test it, you need to "./pre-inst-env guix pull -p" or somethinglike that to populate the Xapian index database. Then "./pre-inst-envguix search" will lookup into.I mean, it is how I understand it should work. I have not yet lookedinto the code.

Cheers,simon
P
P
Pierre Neidhardt wrote on 28 Feb 13:49 +0100
(name . zimoun)(address . zimon.toutoune@gmail.com)
87zhd2ho6c.fsf@ambrevar.xyz
zimoun <zimon.toutoune@gmail.com> writes:
Toggle quote (14 lines)> Hi Pierre,>> On Fri, 28 Feb 2020 at 09:13, Pierre Neidhardt <mail@ambrevar.xyz> wrote:>>> Beside this issue, how do you test it? I guess we first need to install>> a bunch of package with `pre-inst-env guix ...` then to a `pre-inst-env search`?>> It is not searching in the installed packages but in all the packages.> So, to test it, you need to "./pre-inst-env guix pull -p" or something> like that to populate the Xapian index database. Then "./pre-inst-env> guix search" will lookup into.> I mean, it is how I understand it should work. I have not yet looked> into the code.
What I meant with "install a bunch of packages" is "guix pull -p", isyou said. Xapian cacheis populated as a hook of guix pull if I got it correctly.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl5ZDFsACgkQm9z0l6S7zH/zOgf8CNjylikzLFmq6wV4BSdcmPP4sAdqlQSPsea2Gl6pwybKHC6nAWWWC/PnX79HMHiQk9lcywIsTcyW+hqHMQzWIQfFY24zndSAfAfQoRkyfEA8Icukwe0BVqmnbpVVZvaCyUKJ8ZUpoTGJitanqbeSSQHMpKMbg9AvYq9o1ccFTwPar4f696Hm0hAMTgaf2p1XKf3z5XxMStF6ov+nXtM/KjWhBR3FrWvg4tRczmzkRUZItaH9ds6yT6wkuk3kDGscvz2mzfvJaERb2jEaRpkbCnkHJC2HN7O7k/8jvuS+gLUhMw+mQs+ZuK6VxtUrCqzg0KviCBoYNgmEPItz6sRknA===eYIZ-----END PGP SIGNATURE-----
A
A
Arun Isaac wrote on 28 Feb 16:36 +0100
cu7d09ypvu1.fsf@systemreboot.net
Toggle quote (4 lines)> I can't build your patch though:>> ice-9/eval.scm:293:34: no code for module (xapian xapian)
Sorry, I forgot to mention this in my patch cover letter. The aboveerror is happening because of the new guile-xapian dependency. It's alittle tricky to get right at the moment. Here goes.
Drop into a guix development environment.
$ guix environment guix
Commit patch 1 (the patch that adds guile-xapian) alone, and build.
$ git am 0001-gnu-Add-guile-xapian.patch$ make
Then, drop into an environment where guile-xapian is available.
$ ./pre-inst-env guix environment guix --ad-hoc guile-xapian
Apply the other 3 patches and build.
$ git am 0002-build-self-Add-guile-xapian-to-Guix-dependencies.patch 0003-gnu-Generate-xapian-package-search-index.patch 0004-gnu-Use-xapian-index-for-package-search.patch$ make
Now, the build should have completed successfully. Let's do a test guixpull to actually test the new guix search.
$ ./pre-inst-env guix pull -p /tmp/test
Then, run the guix search in /tmp/test.
$ /tmp/test/bin/guix search game
That's it! :-)
This whole process will be simpler if the guile-xapian package is pushedto master and guile-xapian added as an input to the guix package ingnu/packages/package-management.scm. But, for now...
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5ZM5YACgkQLiXui2GAK7OmQwf+M9OBdWM5FdfL+jv9P6BB+KsTqEHEM/UhQGhjgXQVj0TBP9F0L/dIKLlR1fSe8k4ZUgXNJVeMzkt9qUt+Q7TfDzu9WRkQ3lh+dh7d4fPIvYUchmg3eqJ5lvw0jvYhiwCu/PNguvbACpV0FRRHspV6gZLtICfxPTg+3LYVRgIkEknxmekbCcrYQWa10x7FJ0vUkWylzn35Sim0sEeInCaTtJFwlBqRwxSd1+0r8BoYwsj8g8f3YKRlV9jT9rRPNLdrqP88RBXMu19u4uAzHag61irD/k8cSQSl4o1hBLNDX7OLxDdhmN6MON1neRhv5MrzYMVdcSvSlW4VZivTyBrT1Q===GMio-----END PGP SIGNATURE-----
A
A
Arun Isaac wrote on 28 Feb 17:04 +0100
cu74kvapuja.fsf@systemreboot.net
Toggle quote (2 lines)> $ ./pre-inst-env guix pull -p /tmp/test
One mistake. This command should be
./pre-inst-env guix pull --url=$PWD --branch=xapian -p /tmp/test
where xapian is the name of the branch you committed the patches to.
Also, I acknowledge the corrections you both suggested. I willincorporate them in v2 of the patchset.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5ZOikACgkQLiXui2GAK7PEnAgAkIWWspGpqT7hMeD88YDFkXWEX4qJb5Z8C9Gk90AsAMlA/GCZYWVOM/FKHOuw/KhM1FEmnKX1TX3O6BV1EZgY+gjtgQKdy0Yf++kdH9kQPb+jBFCWk2mFJ72S/RY5SZ6SFCfCSEJg9elcEtCK9edAcysHNz5oY1SL0oGPSxsM+0+CVi2C+476gOJljB9u7DPpjpnrOsmyhMd5/H6ojJ6MkndLtl3srAHokkbyrhG7KpqOOfM/UEnFZb3SyxgQY5g9IHEHUU+sVgFUE2XAA3hVcxThFjYrtgPpnru6Cd3JKxHoediJJKM6LmdlxoM2UBchjEHAf6OnIBQ74jjUALPyNA===d4UJ-----END PGP SIGNATURE-----
A
A
Arun Isaac wrote on 29 Feb 09:25 +0100
Re: [bug#39258] [PATCH 0/4] Xapian for Guix package search
(address . zimon.toutoune@gmail.com)
cu7k14594wi.fsf@systemreboot.net
Toggle quote (4 lines)> This whole process will be simpler if the guile-xapian package is pushed> to master and guile-xapian added as an input to the guix package in> gnu/packages/package-management.scm. But, for now...
Shall I push patch 1 (add guile-xapian) alone to master?
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5aH+0ACgkQLiXui2GAK7Ms1wf/fH0DfN6qH0mqwpS0RNHgrPznYh16rcDMqniHkeX7Hy+Lnlx5c9nzC6Tk8sfBrywZfIaSRl1DVwolbfPKJdlJnY1bleW3xAoVFfgidm8zgiO4QKuU0GNE/w7ywm/c5HgtFp7FRwp3sihWhqv3gZBQQR5dBWZmE430XnGqJphNH+w72KG1SRHRnMWNjGkvdXsU7IqnW2vj9gDaPpBLobOP0AVKKJcT/cj2tn7lqAOvSn6Xs38ZcPtX0QxGacBuf+KHo5sDTj6Okjgm2gMARJ0Jjr0Rc7juKasWrFNVdOkEcUfgL539xJ7q7S7sI05AvH0KgOGd9YIggzgYfxbZwwTrWg===am5M-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 2 Mar 19:27 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ3sD9kPkvDNOgTzns-uh4CQWANenWQ372HSY3=2=D5PKQ@mail.gmail.com
Hi Arun,
On Sat, 29 Feb 2020 at 09:25, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (2 lines)> Shall I push patch 1 (add guile-xapian) alone to master?
Yes, it seems a good idea and it will ease the process for buildingand then benchmarking the "guix search" via Xapian.

All the best,simon
Z
Z
zimoun wrote on 2 Mar 19:37 +0100
Re: [PATCH 0/4] Xapian for Guix package search
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ2g8iDRoAt40MXzPD8vP58ntiGyZsPqG6pZ+ZXBPaquEA@mail.gmail.com
Hi Arun,
Do you have some benchmark in mind?

On Fri, 28 Feb 2020 at 17:05, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (2 lines)> ./pre-inst-env guix pull --url=$PWD --branch=xapian -p /tmp/test
We need to benchmark on different machines the new "guix pull". Well,it is nothing compared to the derivation computations. :-)And more importantly, 'make as-derivations' to avoid a "guix pull" breakage,
Then on cold caches, the new "guix search" for a couple of query.
There is no so much inspiration in tests/. :-)Ah do not forget to adapt some tests.

All the best,simon
Z
Z
zimoun wrote on 2 Mar 20:13 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ0=jbtutBFFYTxa8S=d3MDNvJRuS_19WKp-Nvqpmx7eVQ@mail.gmail.com
Hi,
After a quick benchmark:
a. It is faster. Between x2 and x3. Really? b. The xapian relevance should truncated and examined in more details.
Toggle snippet (26 lines)time guix search emacs | recsel -p name,relevance | head -n18name: emacsrelevance: 33
name: emacs-with-editorrelevance: 19
name: emacs-restart-emacsrelevance: 19
name: emacs-epkgrelevance: 18
name: guile-emacsrelevance: 17
name: emacs-xwidgetsrelevance: 17

real 0m1.530suser 0m1.827ssys 0m0.074s

Toggle snippet (26 lines)time /tmp/test/bin/guix search emacs | recsel -p name,relevance | head -n18name: emacs-helm-passrelevance: 5.0774748262821685
name: emacs-sparkrelevance: 4.898640632723127
name: emacs-evil-smartparensrelevance: 4.898640632723127
name: emacs-howmrelevance: 4.8638448958830685
name: emacs-el-mockrelevance: 4.8638448958830685
name: emacs-strace-moderelevance: 4.693676055650271

real 0m0.440suser 0m0.482ssys 0m0.058s

Here for example, Xapian does not return the package 'emacs' itself asthe first. And worse, it is not returned at all.That's said, I do not know if it is really faster since:
Toggle snippet (4 lines)guix search emacs | recsel -C -P name | wc -l829
and
Toggle snippet (4 lines)/tmp/test/bin/guix search emacs | recsel -C -P name | wc -l10
Maybe I am doing a mistake.

Well, thank you Arun for the Xapian bindings which will improve thesearching experience. :-)And now it needs some polishing.

All the bestsimo
Z
Z
zimoun wrote on 3 Mar 17:29 +0100
Re: [PATCH 1/4] gnu: Add guile-xapian.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ1XAysirS_w1Z_3FUnwZYUyfUqTEUuHQaWTyEp40DNFLw@mail.gmail.com
Hi Arun,
On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (2 lines)> * gnu/packages/guile-xyz.scm (guile-xapian, guile3.0-xapian): New variables.
I am a bit lost with the Guile update. Now the convention should notbe the opposite: guile-xapian using 3.0 and guile2.2-xapian using 2.2(or simply 2.2 since 2.0 seems not really used).
Otherwise, feel free to push it. :-)(It will ease to reach a large audience of testers for "guix search" ;-))
All the best,simon
Z
Z
zimoun wrote on 3 Mar 19:29 +0100
Re: [PATCH 3/4] gnu: Generate xapian package search index.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ2TxN=mxQKhP7S7-yeAmBhYFiBkGHDYbzjD2kO_X7igFQ@mail.gmail.com
Hi Arun,
In the commit message, I would capitalize Xapian.

On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (51 lines)>> * gnu/packages.scm (%package-search-index): New variable.> (generate-package-search-index): New function.> * guix/channels.scm (package-search-index): New function.> (%channel-profile-hooks): Add package-search-index.> ---> gnu/packages.scm | 29 ++++++++++++++++++++++++++++-> guix/channels.scm | 34 +++++++++++++++++++++++++++++++++-> 2 files changed, 61 insertions(+), 2 deletions(-)>> diff --git a/gnu/packages.scm b/gnu/packages.scm> index d22c992bb1..e91753e2a8 100644> --- a/gnu/packages.scm> +++ b/gnu/packages.scm> @@ -4,6 +4,7 @@> ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org>> ;;; Copyright © 2016, 2017 Alex Kost <alezost@gmail.com>> ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org>> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>> ;;;> ;;; This file is part of GNU Guix.> ;;;> @@ -43,6 +44,7 @@> #:use-module (srfi srfi-34)> #:use-module (srfi srfi-35)> #:use-module (srfi srfi-39)> + #:use-module (xapian xapian)> #:export (search-patch> search-patches> search-auxiliary-file> @@ -64,7 +66,8 @@> specification->location> specifications->manifest>> - generate-package-cache))> + generate-package-cache> + generate-package-search-index))>> ;;; Commentary:> ;;;> @@ -426,6 +429,30 @@ reducing the memory footprint."> #:opts '(#:to-file? #t)))))> cache-file)>> +(define %package-search-index> + ;; Location of the package search-index> + "/lib/guix/package-search.index")> +> +(define (generate-package-search-index directory)> + "Generate under DIRECTORY a xapian index of all the available packages."
Xapian with capital.

Toggle quote (14 lines)> + (define db-path> + (string-append directory %package-search-index))> +> + (mkdir-p (dirname db-path))> + (call-with-writable-database db-path> + (lambda (db)> + (fold-packages (lambda (package _)> + (let* ((idterm (string-append "Q" (package-name package)))> + (doc (make-document #:data (package-name package)> + #:terms `((,idterm . 0))))> + (term-generator (make-term-generator #:stem (make-stem "en")> + #:document doc)))> + (index-text! term-generator (package-description package))
Instead, this:
(index-term! term-generator (string-append (package-synopsis package)(package-description package)))
should index both 'synopsis' and 'description'.

Is (make-stem "en") for the locale?

Toggle quote (56 lines)> + (replace-document! db idterm doc)))> + #f)))> +> + db-path)> +>> (define %sigint-prompt> ;; The prompt to jump to upon SIGINT.> diff --git a/guix/channels.scm b/guix/channels.scm> index f0261dc2da..c70c70938c 100644> --- a/guix/channels.scm> +++ b/guix/channels.scm> @@ -2,6 +2,7 @@> ;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>> ;;; Copyright © 2018 Ricardo Wurmus <rekado@elephly.net>> ;;; Copyright © 2019 Jan (janneke) Nieuwenhuizen <janneke@gnu.org>> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>> ;;;> ;;; This file is part of GNU Guix.> ;;;> @@ -581,9 +582,40 @@ be used as a profile hook."> (hook . package-cache))> #:local-build? #t)))>> +(define (package-search-index manifest)> + "Build a package search index for the instance in MANIFEST. This is meant> +to be used as a profile hook."> + (mlet %store-monad ((profile (profile-derivation manifest> + #:hooks '())))> +> + (define build> + #~(begin> + (use-modules (gnu packages))> +> + (if (defined? 'generate-package-search-index)> + (begin> + ;; Delegate package search index generation to the inferior.> + (format (current-error-port)> + "Generating package search index for '~a'...~%"> + #$profile)> + (generate-package-search-index #$output))> + (mkdir #$output))))> +> + (gexp->derivation-in-inferior "guix-package-search-index" build> + profile> +> + ;; If the Guix in PROFILE is too old and> + ;; lacks 'guix repl', don't build the cache> + ;; instead of failing.> + #:silent-failure? #t> +> + #:properties '((type . profile-hook)> + (hook . package-search-index))> + #:local-build? #t)))> +
package-search-index and package-cache-file could be refactoredbecause they share all the same code.

Toggle quote (10 lines)> (define %channel-profile-hooks> ;; The default channel profile hooks.> - (cons package-cache-file %default-profile-hooks))> + (cons* package-cache-file package-search-index %default-profile-hooks))>> (define (channel-instances->derivation instances)> "Return the derivation of the profile containing INSTANCES, a list of> --> 2.23.0>
Z
Z
zimoun wrote on 3 Mar 20:21 +0100
Re: [PATCH 4/4] gnu: Use xapian index for package search.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ0XiJcAwq8bqOWL7bPWMXrZ6zo9D0JMx86x0nJa4apBpQ@mail.gmail.com
Hi Arun,

On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (32 lines)>> * gnu/packages.scm (search-package-index): New function.> * guix/scripts/package.scm (find-packages-by-description): Search using the> xapian package index if search patterns are literal strings. Else, search> using fold-packages.> ---> gnu/packages.scm | 17 +++++++++++-> guix/scripts/package.scm | 57 +++++++++++++++++++++++-----------------> 2 files changed, 49 insertions(+), 25 deletions(-)>> diff --git a/gnu/packages.scm b/gnu/packages.scm> index e91753e2a8..5b5b29bf84 100644> --- a/gnu/packages.scm> +++ b/gnu/packages.scm> @@ -67,7 +67,8 @@> specifications->manifest>> generate-package-cache> - generate-package-search-index))> + generate-package-search-index> + search-package-index))>> ;;; Commentary:> ;;;> @@ -453,6 +454,20 @@ reducing the memory footprint.">> db-path)>> +(define (search-package-index profile querystring)> + (let ((offset 0)> + (pagesize 10))
Why this value of 10?This fix the number of packages returned. Hum?I have tried to replace by 100 and I got 100 packages. :-)

Toggle quote (5 lines)> + (call-with-database (string-append profile %package-search-index)> + (lambda (db)> + (let ((query (parse-query querystring #:stemmer (make-stem "en"))))> + (mset-fold (lambda (item result)
I do not know what is the convention for the bindings.But there is 'fold-packages' so I would be inclined to 'fold-msets' orsomething in this flavour.

Toggle quote (57 lines)> + (match (find-packages-by-name> + (document-data (mset-item-document item)))> + ((package _ ...)> + (append result `((,package . ,(mset-item-weight item)))))))> + '()> + (enquire-mset (enquire db query) offset pagesize)))))))> +>> (define %sigint-prompt> ;; The prompt to jump to upon SIGINT.> diff --git a/guix/scripts/package.scm b/guix/scripts/package.scm> index 1cb0d382bf..6a3b9002dd 100644> --- a/guix/scripts/package.scm> +++ b/guix/scripts/package.scm> @@ -7,6 +7,7 @@> ;;; Copyright © 2016 Benz Schenk <benz.schenk@uzh.ch>> ;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com>> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr>> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>> ;;;> ;;; This file is part of GNU Guix.> ;;;> @@ -178,31 +179,40 @@ hooks\" run when building the profile."> ;;; Package specifications.> ;;;>> -(define (find-packages-by-description regexps)> +(define (find-packages-by-description patterns)> "Return a list of pairs: packages whose name, synopsis, description,> or output matches at least one of REGEXPS sorted by relevance, and its> non-zero relevance score."> - (let ((matches (fold-packages (lambda (package result)> - (if (package-superseded package)> - result> - (match (package-relevance package> - regexps)> - ((? zero?)> - result)> - (score> - (cons (cons package score)> - result)))))> - '())))> - (sort matches> - (lambda (m1 m2)> - (match m1> - ((package1 . score1)> - (match m2> - ((package2 . score2)> - (if (= score1 score2)> - (string>? (package-full-name package1)> - (package-full-name package2))> - (> score1 score2))))))))))> + (define (regexp? str)> + (string-any> + (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)> + str))
Instead of reverting this, I would let the current'find-packages-by-description' and would add'find-packages-by-description-indexed' doing just'(search-package-index (current-profile) (string-join patterns " "))'.And maybe refactoring the sort of scores. Then I would put the testbranch in 'guix/scripts/packages.scm'...

Toggle quote (9 lines)> + (if (and (current-profile)> + (not (any regexp? patterns)))> + (search-package-index (current-profile) (string-join patterns " "))> + (let* ((regexps (map (cut make-regexp* <> regexp/icase) patterns))> + (matches (fold-packages (lambda (package result)> + (if (package-superseded package)> + result> + (match (package-relevance package
Note that I am in the process of implementing the BM25 weights as'package-relevance'; at least really thinking about it! :-)I have already talked about TF-IDF as relevance, for example here [1].And reading the Xapian documentation [2], it seems affordable. Or not;-) because of the regexp... Need some thoughts... I mean "in theprocess". ;-)And in this case, it is almost a drop-in replacement of'fold-packages' by 'mset-fold'; well it should add some flexibilityand a more unified code.
(Aside the searching, IMHO 'package-relevance' should help too in thelinting process of bad written descriptions, another story. ;-)
[1] https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00252.html[2] https://xapian.org/docs/bm25.html

Toggle quote (22 lines)> + regexps)> + ((? zero?)> + result)> + (score> + (cons (cons package score)> + result)))))> + '())))> + (sort matches> + (lambda (m1 m2)> + (match m1> + ((package1 . score1)> + (match m2> + ((package2 . score2)> + (if (= score1 score2)> + (string>? (package-full-name package1)> + (package-full-name package2))> + (> score1 score2)))))))))))>> (define (transaction-upgrade-entry store entry transaction)> "Return a variant of TRANSACTION that accounts for the upgrade of ENTRY, a> @@ -777,8 +787,7 @@ processed, #f otherwise."
...here.
+ (define (regexp? str)+ (string-any+ (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)+ str))
Toggle quote (7 lines)> (('query 'search rx) rx)> (_ #f))> opts))>> - (regexps (map (cut make-regexp* <> regexp/icase) patterns))> - (matches (find-packages-by-description regexps)))
+ (if (any regexp? patterns)+ (matches (find-packages-by-description regexps))+ (matches (find-packages-by-description-indexed patterns))
I mean something like that.
Toggle quote (7 lines)> (leave-on-EPIPE> (display-search-results matches (current-output-port)))> #t))> --> 2.23.0

All the best,simon
Z
Z
zimoun wrote on 3 Mar 20:51 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ0zoD6N74Wrx29Thp-Hx+kMWVKoPQ-eZSt84R4UiBa_Gw@mail.gmail.com
On Tue, 3 Mar 2020 at 20:21, zimoun <zimon.toutoune@gmail.com> wrote:
Toggle quote (10 lines)> On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
> > +(define (search-package-index profile querystring)> > + (let ((offset 0)> > + (pagesize 10))>> Why this value of 10?> This fix the number of packages returned. Hum?> I have tried to replace by 100 and I got 100 packages. :-)
I propose the value of 4294967295 for pagesize.
Z
Z
zimoun wrote on 3 Mar 21:04 +0100
Re: [PATCH 0/4] Xapian for Guix package search
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ2ugJ9B6-pwcjezCfL2F+KpgmTt=mr1LH3ugMdnWYkwYg@mail.gmail.com
Hi,
On Mon, 2 Mar 2020 at 20:13, zimoun <zimon.toutoune@gmail.com> wrote:
Toggle quote (7 lines)> --8<---------------cut here---------------start------------->8---> /tmp/test/bin/guix search emacs | recsel -C -P name | wc -l> 10> --8<---------------cut here---------------end--------------->8--->> Maybe I am doing a mistake.
I think this issue is fixed when changing the 'pagesize' value.
Well, with '(pagesize 4294967295)' and using the same commit(c1febbbf94), I get:
Toggle snippet (8 lines)guix time-machine --commit=c1febbbf94 -- guix search games | recsel -C-p name | wc -l247
./pre-inst-env guix search games | recsel -C -p name | wc -l236
(I modified the patches in order to pull once to generate the index atcommit c1febbbf94 and then do some stuff.)

Note that the old "guix search" does not output blender and Xapiandoes even if the term 'games' is not in the description but 'game' is.Well, I am comparing the different list, i.e., "guix search games |recsel -C -P name | sort" to see which one is in one list and not theother one.
But before going more ahead, let polish a bit the patches to moreeasily test without the double environment etc.And because I am using good old HDD and some SSD comparison should be welcome.

All the best,simon
L
L
Ludovic Courtès wrote on 5 Mar 17:46 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87h7z292d2.fsf@gnu.org
Hello Arun,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (7 lines)> * Speed improvement>> Despite search-package-index in gnu/packages.scm taking only around 1.5ms, I> see an overall speedup in `guix search` of only a factor of 2 -- from around> 2s to around 1s. I wonder what else in `guix search` is taking up so much> time.
Note that ‘guix search’ time is largely dominated by I/O. On my laptop,I get (first measurement is cold cache, second one is warm cache):
Toggle snippet (13 lines)$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'$ time guix search foo >/dev/null
real 0m2.631suser 0m1.134ssys 0m0.124s$ time guix search foo >/dev/null
real 0m0.836suser 0m1.027ssys 0m0.053s
It’s hard to do better on the warm cache case because at this level,there may be other things to optimize having little to do with searchingitself.
Note that this is on an SSD; the cold-cache case must be worse on NFS oron a spinning disk, and there we could gain a lot.
I think we should weigh the pros and cons on all these aspects: speed,complexity and maintenance cost, search result quality, search features,etc.
Thanks,Ludo’.
PS: I have not yet looked at the whole series as I’m just coming back to the keyboard. :-)
A
A
Arun Isaac wrote on 5 Mar 21:26 +0100
Re: [bug#39258] [PATCH 3/4] gnu: Generate xapian package search index.
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
cu7k13yeefl.fsf@systemreboot.net
Toggle quote (13 lines)>> + (fold-packages (lambda (package _)>> + (let* ((idterm (string-append "Q" (package-name package)))>> + (doc (make-document #:data (package-name package)>> + #:terms `((,idterm . 0))))>> + (term-generator (make-term-generator #:stem (make-stem "en")>> + #:document doc)))>> + (index-text! term-generator (package-description package))>> + (replace-document! db idterm doc)))>> I guess these non-functional functions (index-text!, replace-document!)> represent how Xapian works at the C++ level. Would it be possible to> make more functional bindings nonetheless?
I somehow overlooked this particular email and am reading it justnow. Yes, the non-functional bindings are a bit ugly. But, I'm not ableto think of a clean way to make functional bindings without supportingall features offered by xapian. Any suggestions you have in this regardwould be useful. Look through xapian/termgenerator.h for moredetails. In particular, look at functions increase_termpos,index_text_without_positions.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5hYG8ACgkQLiXui2GAK7PkRwf+NvdP6lMsIbZH4CZRl0J/rfHZv4wQD6P4VA+JejO0wuiT9L7lYezPz06UVdRGo6pt/GGt/XIA0qKA3+HumBtaWJ5YLCRFbgJKa/iqHaMUEfl1LU/rWo4peUDkDRrSP4tGFhYJtxGiHuPU/ia7KyWaCzoERhFRfQgKUM69/WSkpxKjQpo6tzciBBuvLVvRTcy0CcPyUww210W2qwL50fzQ27dgbFaX4WXLHmzpQGPDMXbmRQCXPsbCsTFXiLPJzdPHiu1BCl15nevLCcfVCdmHKelyIU23LmhD8LtrtuLnIaYlb4UE4fMPlE9U5wubfuszWe90QibFjZmjuKHZRoLTeA===WJVN-----END PGP SIGNATURE-----
A
A
Arun Isaac wrote on 7 Mar 14:31 +0100
[PATCH v2 1/3] build-self: Add guile-xapian to Guix dependencies.
(address . 39258@debbugs.gnu.org)
20200307133116.11443-2-arunisaac@systemreboot.net
* build-aux/build-self.scm (build-program): Import fake guile-xapian module.* guix/self.scm (compiled-guix): Add guile-xapian to Guix dependencies.--- build-aux/build-self.scm | 11 +++++++++++ guix/self.scm | 7 ++++++- 2 files changed, 17 insertions(+), 1 deletion(-)
Toggle diff (75 lines)diff --git a/build-aux/build-self.scm b/build-aux/build-self.scmindex f2e785b7f1..05d0353ccf 100644--- a/build-aux/build-self.scm+++ b/build-aux/build-self.scm@@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -261,6 +262,10 @@ interface (FFI) of Guile.") #~(define-module (gcrypt hash) #:export (sha1 sha256)))) + (define fake-xapian-hash+ ;; Fake (xapian xapian) module; see below.+ (scheme-file "xapian.scm" #~(define-module (xapian xapian))))+ (define fake-git (scheme-file "git.scm" #~(define-module (git)))) @@ -273,6 +278,12 @@ interface (FFI) of Guile.") ;; adjust %LOAD-PATH later on. ((gcrypt hash) => ,fake-gcrypt-hash) + ;; To avoid relying on 'with-extensions', which was+ ;; introduced in 0.15.0, provide a fake (xapian+ ;; xapian) just so that we can build modules, and+ ;; adjust %LOAD-PATH later on.+ ((xapian xapian) => ,fake-xapian-hash)+ ;; (guix git-download) depends on (git) but only ;; for peripheral functionality. Provide a dummy ;; (git) to placate it.diff --git a/guix/self.scm b/guix/self.scmindex 6b633f9bc0..a4f40574d1 100644--- a/guix/self.scm+++ b/guix/self.scm@@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -54,6 +55,7 @@ ("guile-git" (ref '(gnu packages guile) 'guile3.0-git)) ("guile-sqlite3" (ref '(gnu packages guile) 'guile3.0-sqlite3)) ("guile-gcrypt" (ref '(gnu packages gnupg) 'guile3.0-gcrypt))+ ("guile-xapian" (ref '(gnu packages guile-xyz) 'guile3.0-xapian)) ("gnutls" (ref '(gnu packages tls) 'guile3.0-gnutls)) ("zlib" (ref '(gnu packages compression) 'zlib)) ("lzlib" (ref '(gnu packages compression) 'lzlib))@@ -682,6 +684,9 @@ Info manual." (define guile-gcrypt (specification->package "guile-gcrypt")) + (define guile-xapian+ (specification->package "guile-xapian"))+ (define gnutls (specification->package "gnutls")) @@ -690,7 +695,7 @@ Info manual." (cons (list "x" package) (package-transitive-propagated-inputs package))) (list guile-gcrypt gnutls guile-git guile-json- guile-ssh guile-sqlite3))+ guile-ssh guile-sqlite3 guile-xapian)) (((labels packages _ ...) ...) packages))) -- 2.25.1
A
A
Arun Isaac wrote on 7 Mar 14:31 +0100
[PATCH v2 2/3] gnu: Generate Xapian package search index.
(address . 39258@debbugs.gnu.org)
20200307133116.11443-3-arunisaac@systemreboot.net
* guix/ui.scm: Export %package-metrics.* gnu/packages.scm (%package-search-index): New variable.(generate-package-search-index): New function.* guix/channels.scm (package-search-index): New function.(%channel-profile-hooks): Add package-search-index.--- gnu/packages.scm | 42 +++++++++++++++++++++++++++++++++++++++++- guix/channels.scm | 34 +++++++++++++++++++++++++++++++++- guix/ui.scm | 2 ++ 3 files changed, 76 insertions(+), 2 deletions(-)
Toggle diff (150 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex d22c992bb1..c8e221de68 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -4,6 +4,7 @@ ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2016, 2017 Alex Kost <alezost@gmail.com> ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -43,6 +44,7 @@ #:use-module (srfi srfi-34) #:use-module (srfi srfi-35) #:use-module (srfi srfi-39)+ #:use-module (xapian xapian) #:export (search-patch search-patches search-auxiliary-file@@ -64,7 +66,8 @@ specification->location specifications->manifest - generate-package-cache))+ generate-package-cache+ generate-package-search-index)) ;;; Commentary: ;;;@@ -426,6 +429,43 @@ reducing the memory footprint." #:opts '(#:to-file? #t))))) cache-file) +(define %package-search-index+ ;; Location of the package search-index+ "/lib/guix/package-search.index")++(define (generate-package-search-index directory)+ "Generate under DIRECTORY a Xapian index of all the available packages."+ (define db-path+ (string-append directory %package-search-index))++ (mkdir-p (dirname db-path))+ (call-with-writable-database db-path+ (lambda (db)+ (fold-packages (lambda (package _)+ (let* ((idterm (string-append "Q" (package-name package)))+ (doc (make-document #:data (string-trim-right+ (call-with-output-string+ (cut package->recutils package <>))+ #\newline)+ #:terms `((,idterm . 0))))+ (term-generator (make-term-generator #:stem (make-stem "en")+ #:document doc)))+ (for-each (match-lambda+ ((field . weight)+ (match (field package)+ ((? string? str)+ (index-text! term-generator str+ #:wdf-increment weight))+ ((lst ...)+ (for-each (cut index-text! term-generator <>+ #:wdf-increment weight)+ lst)))+ (replace-document! db idterm doc)))+ %package-metrics)))+ #f)))++ db-path)+ (define %sigint-prompt ;; The prompt to jump to upon SIGINT.diff --git a/guix/channels.scm b/guix/channels.scmindex f0261dc2da..c70c70938c 100644--- a/guix/channels.scm+++ b/guix/channels.scm@@ -2,6 +2,7 @@ ;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2018 Ricardo Wurmus <rekado@elephly.net> ;;; Copyright © 2019 Jan (janneke) Nieuwenhuizen <janneke@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -581,9 +582,40 @@ be used as a profile hook." (hook . package-cache)) #:local-build? #t))) +(define (package-search-index manifest)+ "Build a package search index for the instance in MANIFEST. This is meant+to be used as a profile hook."+ (mlet %store-monad ((profile (profile-derivation manifest+ #:hooks '())))++ (define build+ #~(begin+ (use-modules (gnu packages))++ (if (defined? 'generate-package-search-index)+ (begin+ ;; Delegate package search index generation to the inferior.+ (format (current-error-port)+ "Generating package search index for '~a'...~%"+ #$profile)+ (generate-package-search-index #$output))+ (mkdir #$output))))++ (gexp->derivation-in-inferior "guix-package-search-index" build+ profile++ ;; If the Guix in PROFILE is too old and+ ;; lacks 'guix repl', don't build the cache+ ;; instead of failing.+ #:silent-failure? #t++ #:properties '((type . profile-hook)+ (hook . package-search-index))+ #:local-build? #t)))+ (define %channel-profile-hooks ;; The default channel profile hooks.- (cons package-cache-file %default-profile-hooks))+ (cons* package-cache-file package-search-index %default-profile-hooks)) (define (channel-instances->derivation instances) "Return the derivation of the profile containing INSTANCES, a list ofdiff --git a/guix/ui.scm b/guix/ui.scmindex fbe2b70485..3bc82111a5 100644--- a/guix/ui.scm+++ b/guix/ui.scm@@ -14,6 +14,7 @@ ;;; Copyright © 2019 Chris Marusich <cmmarusich@gmail.com> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr> ;;; Copyright © 2019 Simon Tournier <zimon.toutoune@gmail.com>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -120,6 +121,7 @@ relevance package-relevance display-search-results+ %package-metrics with-profile-lock string->generations-- 2.25.1
A
A
Arun Isaac wrote on 7 Mar 14:31 +0100
[PATCH v2 3/3] gnu: Use Xapian index for package search.
(address . 39258@debbugs.gnu.org)
20200307133116.11443-4-arunisaac@systemreboot.net
* gnu/packages.scm (search-package-index): New function.* guix/ui.scm (display-package-search-results): New function.* guix/scripts/package.scm (process-query): Search using the Xapian packageindex if current profile is available. Else, search using regexps.--- gnu/packages.scm | 22 +++++++++++++++++++++- guix/scripts/package.scm | 7 +++++-- guix/ui.scm | 35 +++++++++++++++++++++++++++++++++++ 3 files changed, 61 insertions(+), 3 deletions(-)
Toggle diff (121 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex c8e221de68..3cbd7c63e3 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -67,7 +67,8 @@ specifications->manifest generate-package-cache- generate-package-search-index))+ generate-package-search-index+ search-package-index)) ;;; Commentary: ;;;@@ -466,6 +467,25 @@ reducing the memory footprint." db-path) +(define (search-package-index profile query-string)+ "Search Xapian index in PROFILE for packages matching the Xapian query+QUERY-STRING. Return a list of search result texts each corresponding to one+matching package."+ (call-with-database (string-append profile %package-search-index)+ (lambda (db)+ (let ((query (parse-query query-string #:stemmer (make-stem "en"))))+ (mset-fold (lambda (item result)+ (let ((search-result-text+ (call-with-output-string+ (cut format <> "~a~%relevance: ~a~%~%"+ (document-data (mset-item-document item))+ ;; Round score to one decimal place.+ (/ (round (* 10 (mset-item-weight item))) 10)))))+ (append result (list search-result-text))))+ '()+ (enquire-mset (enquire db query)+ #:maximum-items (database-document-count db)))))))+ (define %sigint-prompt ;; The prompt to jump to upon SIGINT.diff --git a/guix/scripts/package.scm b/guix/scripts/package.scmindex d2f4f1ccd3..91c975b168 100644--- a/guix/scripts/package.scm+++ b/guix/scripts/package.scm@@ -7,6 +7,7 @@ ;;; Copyright © 2016 Benz Schenk <benz.schenk@uzh.ch> ;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -781,9 +782,11 @@ processed, #f otherwise." (_ #f)) opts)) (regexps (map (cut make-regexp* <> regexp/icase) patterns))- (matches (find-packages-by-description regexps)))+ (matches (if (current-profile)+ (search-package-index (current-profile) (string-join patterns " "))+ (find-packages-by-description regexps)))) (leave-on-EPIPE- (display-search-results matches (current-output-port)))+ (display-package-search-results matches (current-output-port))) #t)) (('show requested-name)diff --git a/guix/ui.scm b/guix/ui.scmindex 3bc82111a5..163042054c 100644--- a/guix/ui.scm+++ b/guix/ui.scm@@ -121,6 +121,7 @@ relevance package-relevance display-search-results+ display-package-search-results %package-metrics with-profile-lock@@ -1490,6 +1491,40 @@ to view all the results.") (() #t)))) +(define* (display-package-search-results search-results port+ #:key+ (command "guix search"))+ "Display SEARCH-RESULTS, a list of search result texts each corresponding to+one matching package. If PORT is a terminal, print at most a full screen of+results."+ (define first-line+ (port-line port))++ (define max-rows+ (and first-line (isatty? port)+ (terminal-rows port)))++ (define (line-count str)+ (string-count str #\newline))++ (let loop ((search-results search-results))+ (match search-results+ ((text rest ...)+ (if (and (not (getenv "INSIDE_EMACS"))+ max-rows+ (> (port-line port) first-line) ;print at least one result+ (> (+ 4 (line-count text) (port-line port))+ max-rows))+ (unless (null? rest)+ (display-hint (format #f (G_ "Run @code{~a ... | less} \+to view all the results.")+ command)))+ (begin+ (display text port)+ (loop rest))))+ (()+ #t))))+ (define (string->generations str) "Return the list of generations matching a pattern in STR. This function-- 2.25.1
A
A
Arun Isaac wrote on 7 Mar 14:31 +0100
[PATCH v2 0/3] Xapian for Guix package search
(address . 39258@debbugs.gnu.org)
20200307133116.11443-1-arunisaac@systemreboot.net
Hi,
Here is the second iteration of my Xapian Guix package search patchset. I havefound the reason the earlier patchset did not show significant speedup. Itturns out that most of the time is spent in printing and texinfo rendering ofthe search results. So, in this patchset, I pre-render the search resultswhile building the Xapian index and stuff them into the Xapian databaseitself. Therefore, during `guix search`, I just pull out the pre-renderedsearch results and print it on the screen. This is much faster. See comparisonbelow.
Toggle snippet (8 lines)With a warm cache,$ time guix search inkscape
real 0m1.787suser 0m1.745ssys 0m0.111s
Toggle snippet (7 lines)$ time /tmp/test/bin/guix search inkscape
real 0m0.199suser 0m0.182ssys 0m0.024s
If most of the speedup comes from pre-rendering the results, it might seemthat the Xapian search is not so useful. We might as well have stuffed thepre-rendered search results into the existing package cache generated bygenerate-package-cache, or so it might seem. But, there are the followingarguments in favor of Xapian.
- The package cache would grow in size, and lookup would be slowed down because we need to load the entire cache into memory. Xapian, on the other hand, need only look up the specific packages that match the search query.- Xapian can provide superior search results due to it stemming and language models.- Xapian can provide spelling correction and query expansion -- that is, suggest search terms to improve search results. Note that I haven't implemented this yet and is out of scope in this patchset.
* Simplify our package search results
Why not use a simpler package search results format like Arch Linux or Debiandoes? We could just display the package name, version and synopsis like so.
inkscape 0.92.4 Vector graphics editorinklingreader 0.8 Wacom Inkling sketch format conversion and manipulation
Why do we need the entire recutils format? If the user is interested, they canalways use `guix package --show` to get the full recutils formattedinfo. Having shorter search results will make everything even faster and muchmore readable. WDYT?
* How to test this patchset
To get guile-xapian, run a `guix pull`, if you haven't already. Then in yourGuix source directory, drop into an environment with guix dependencies andguile-xapian.
$ guix environment guix --ad-hoc guile-xapian
Apply patches and build.
$ git am v2-0000-cover-letter.patch v2-0002-gnu-Generate-Xapian-package-search-index.patch v2-0001-build-self-Add-guile-xapian-to-Guix-dependencies.patch v2-0003-gnu-Use-Xapian-index-for-package-search.patch$ make
Run a test guix pull.
$ ./pre-inst-env guix pull --url=$PWD --branch=xapian -p /tmp/test
where xapian is the name of the branch you committed the patches to.
Then, run the guix search in /tmp/test.
$ /tmp/test/bin/guix search game
* Comments
Pierre Neidhardt <mail@ambrevar.xyz> writes:
Toggle quote (4 lines)>> +(define (search-package-index profile querystring)>> Maybe `query-string'?
Done in this patchset.
Toggle quote (15 lines)>> + (define (regexp? str)>> + (string-any>> + (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)>> + str))>> +>> + (if (and (current-profile)>> + (not (any regexp? patterns)))>> I would not put characters like ".", "$", or "+" here, lest we mistake a> Xapian pattern for a regexp.>> As you said, I don't think both are compatible without ambiguity> anyways, so we should probably drop regexp (or at least toggle them with> a command line argument).
I agree.
zimoun <zimon.toutoune@gmail.com> writes:
Toggle quote (2 lines)> In the commit message, I would capitalize Xapian.
Done in this patchset.
Toggle quote (5 lines)>> +(define (generate-package-search-index directory)>> + "Generate under DIRECTORY a xapian index of all the available packages.">> Xapian with capital.
Done in this patchset.
Toggle quote (2 lines)> Is (make-stem "en") for the locale?
I still have English hard-coded. I haven't yet figured out how to detect thelocale and stem accordingly. But, there is a larger problem. Since we cannotanticipate what locale the user will run guix search with, should we build theXapian index for all locales? That is, should we index not only the Englishversions of the packages but also all other translations as well?
Toggle quote (3 lines)> package-search-index and package-cache-file could be refactored> because they share all the same code.
Yes, they could be. However, I'll postpone to the next iteration of thepatchset.
Toggle quote (4 lines)> I do not know what is the convention for the bindings.> But there is 'fold-packages' so I would be inclined to 'fold-msets' or> something in this flavour.
Well, everywhere else in guile we have such things as vhash-fold, string-fold,hash-fold, stream-fold, etc. That's why I went with mset-fold. Also, we arefolding over a single mset (match-set). So, mset should be in the singular.
Toggle quote (3 lines)> And more importantly, 'make as-derivations' to avoid a "guix pull" breakage,> Ah do not forget to adapt some tests.
Will do this once we have consensus about the other features of this patchset.
Toggle quote (2 lines)> b. The xapian relevance should truncated
Done in this patchset.
Toggle quote (3 lines)> Xapian does not return the package 'emacs' itself as the first. And worse,> it is not returned at all.
In this patchset, since we're indexing the package name as well, emacs isreturned but it is still far from the beginning.
Toggle quote (2 lines)> I propose the value of 4294967295 for pagesize.
In this patchset, I pass (database-document-count db) as the #:maximum-itemskeyword argument to enquire-mset. This is the upstream recommended way to getall search results. I hadn't done this earlier since I hadn't yet wrappeddatabase-document-count in guile-xapian.
Toggle quote (7 lines)>> In this patchset, I have only indexed the package descriptions. In the next>> version of this patchset, I will index all other terms as specified in>> %package-metrics of guix/ui.scm.>> Yes, it appears to me a detail that should be easy to fix. I mean, it> does not seems blocking.
Done in this patchset.
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (2 lines)> Note that ‘guix search’ time is largely dominated by I/O.
Yes, `guix search` is I/O intensive. That is why I expect Xapian to do bettersince it only needs to access matching packages not all packages. Also, theXapian index is fast at all times. It is not very dependent on a warmfilesystem cache.
Toggle quote (24 lines)> On my laptop,> I get (first measurement is cold cache, second one is warm cache):>> --8<---------------cut here---------------start------------->8---> $ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'> $ time guix search foo >/dev/null>> real 0m2.631s> user 0m1.134s> sys 0m0.124s> $ time guix search foo >/dev/null>> real 0m0.836s> user 0m1.027s> sys 0m0.053s> --8<---------------cut here---------------end--------------->8--->> It’s hard to do better on the warm cache case because at this level,> there may be other things to optimize having little to do with searching> itself.>> Note that this is on an SSD; the cold-cache case must be worse on NFS or> on a spinning disk, and there we could gain a lot.
My laptop is quite old with a particularly slow HDD. Hence my motivation toimprove guix search performance!
Toggle quote (4 lines)> I think we should weigh the pros and cons on all these aspects: speed,> complexity and maintenance cost, search result quality, search features,> etc.
I agree.
Toggle quote (3 lines)> PS: I have not yet looked at the whole series as I’m just coming back to> the keyboard. :-)
Welcome back! :-)
Arun Isaac (3): build-self: Add guile-xapian to Guix dependencies. gnu: Generate Xapian package search index. gnu: Use Xapian index for package search.
build-aux/build-self.scm | 11 +++++++ gnu/packages.scm | 62 +++++++++++++++++++++++++++++++++++++++- guix/channels.scm | 34 +++++++++++++++++++++- guix/scripts/package.scm | 7 +++-- guix/self.scm | 7 ++++- guix/ui.scm | 37 ++++++++++++++++++++++++ 6 files changed, 153 insertions(+), 5 deletions(-)
-- 2.25.1
L
L
Ludovic Courtès wrote on 7 Mar 21:33 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87sgijgb1v.fsf@gnu.org
Hello,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (22 lines)> Here is the second iteration of my Xapian Guix package search patchset. I have> found the reason the earlier patchset did not show significant speedup. It> turns out that most of the time is spent in printing and texinfo rendering of> the search results. So, in this patchset, I pre-render the search results> while building the Xapian index and stuff them into the Xapian database> itself. Therefore, during `guix search`, I just pull out the pre-rendered> search results and print it on the screen. This is much faster. See comparison> below.>> With a warm cache,> $ time guix search inkscape>> real 0m1.787s> user 0m1.745s> sys 0m0.111s>> $ time /tmp/test/bin/guix search inkscape>> real 0m0.199s> user 0m0.182s> sys 0m0.024s
Nice!
In general, pre-rendering doesn’t seem practical to me: the output of‘guix search’ is locale-dependent (it speaks the user’s language) andadjusts to the terminal width (well, this is temporarily broken onGuile 3.0.0, but see ‘%text-width’ in (guix ui)).
Also, if the 12K+ descriptions need to be rendered at the time the userruns ‘guix pull’, the experience may not be great, because it could takea bit of time.
WDYT?
Toggle quote (13 lines)> Why not use a simpler package search results format like Arch Linux or Debian> does? We could just display the package name, version and synopsis like so.>> inkscape 0.92.4> Vector graphics editor> inklingreader 0.8> Wacom Inkling sketch format conversion and manipulation>> Why do we need the entire recutils format? If the user is interested, they can> always use `guix package --show` to get the full recutils formatted> info. Having shorter search results will make everything even faster and much> more readable. WDYT?
What I like about the recutils format in this context is that it’s bothhuman- and machine-readable. The examples in the manual show how it canbe useful to select the information displayed or to refine the search(info "(guix) Invoking guix package").
Also: I’d recommend tackling one thing at a time. :-)
Toggle quote (9 lines)> Ludovic Courtès <ludo@gnu.org> writes:>>> Note that ‘guix search’ time is largely dominated by I/O.>> Yes, `guix search` is I/O intensive. That is why I expect Xapian to do better> since it only needs to access matching packages not all packages. Also, the> Xapian index is fast at all times. It is not very dependent on a warm> filesystem cache.
Yes, indeed.
Toggle quote (27 lines)>> On my laptop,>> I get (first measurement is cold cache, second one is warm cache):>>>> --8<---------------cut here---------------start------------->8--->> $ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'>> $ time guix search foo >/dev/null>>>> real 0m2.631s>> user 0m1.134s>> sys 0m0.124s>> $ time guix search foo >/dev/null>>>> real 0m0.836s>> user 0m1.027s>> sys 0m0.053s>> --8<---------------cut here---------------end--------------->8--->>>> It’s hard to do better on the warm cache case because at this level,>> there may be other things to optimize having little to do with searching>> itself.>>>> Note that this is on an SSD; the cold-cache case must be worse on NFS or>> on a spinning disk, and there we could gain a lot.>> My laptop is quite old with a particularly slow HDD. Hence my motivation to> improve guix search performance!
Were you able to measure the cost of rendering specifically?
Here’s what I see when I turn ‘package->recutils’ into a no-op:
Toggle snippet (13 lines)$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'$ time ./pre-inst-env guix search foo
real 0m1.617suser 0m0.812ssys 0m0.094s$ time ./pre-inst-env guix search foo
real 0m0.595suser 0m0.747ssys 0m0.043s
To compare with:
Toggle snippet (7 lines)$ time ./pre-inst-env guix search foo >/dev/null
real 0m0.829suser 0m1.026ssys 0m0.046s
I think we should look at a profile of ‘package->recutils’, there’sprobably room for improvement there.
Thoughts?
Ludo’.
A
A
Arun Isaac wrote on 8 Mar 10:01 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
cu7tv2zcj9l.fsf@systemreboot.net
Toggle quote (3 lines)>> It turns out that most of the time is spent in printing and texinfo>> rendering of the search results.
Also, when we put all package metadata into the Xapian index, we don'thave to look up any of the package variables in (gnu packages *) during`guix search` time. This also contributes substantially to the speedup.
Toggle quote (3 lines)> In general, pre-rendering doesn’t seem practical to me: the output of> ‘guix search’ is locale-dependent (it speaks the user’s language) and
Note that we already need to index package synopses and descriptions inall languages. I still haven't implemented this, though.
Toggle quote (3 lines)> adjusts to the terminal width (well, this is temporarily broken on> Guile 3.0.0, but see ‘%text-width’ in (guix ui)).
This could be accomplished even with pre-rendering. Xapian provides"slots" to store arbitrary strings with a document. Instead of storingthe pre-rendered document as a whole, we could store pre-rendered fieldsin separate slots. Then, during `guix search` time, we can assemble theresult from these pre-rendered fields.
Toggle quote (4 lines)> Also, if the 12K+ descriptions need to be rendered at the time the user> runs ‘guix pull’, the experience may not be great, because it could take> a bit of time.
This is a problem, but I would see it as a necessary "compilation"step. :-P In fact, this whole patchset speeds up `guix search` by doingpart of the work of `guix search` ahead of time. So, some such cost isunavoidable.
Toggle quote (5 lines)> What I like about the recutils format in this context is that it’s both> human- and machine-readable. The examples in the manual show how it can> be useful to select the information displayed or to refine the search> (info "(guix) Invoking guix package").
Xapian's query language is much more natural (as in natural language)than the regexp based techniques we need to use with recutils. I havehardly ever used the regexp based search and I suspect many othershaven't either. Also, refining the search query should be easier to dowith Xapian. We could even use Xapian's query expansion feature tosuggest improved queries to the user.
That said, if we want the recutils format, we can still keep it in asimplified form like so.
name: inkscapeversion: 0.92.4synopsis: Vector graphics editor
name: inklingreaderversion: 0.8synopsis: Wacom Inkling skecth format conversion and manipulation
Toggle quote (2 lines)> Also: I’d recommend tackling one thing at a time. :-)
I totally agree, but I'm tempted to say that pre-rendering would be alot cheaper with the simplified form of search results. :-)
Toggle quote (2 lines)> Were you able to measure the cost of rendering specifically?
generate-package-search-index takes around 50 seconds. If I modifygenerate-package-search-index to not pre-render but simply store thepackage description alone, it takes around 20 seconds. That gives us arough idea of the cost of pre-rendering.
Toggle quote (3 lines)> I think we should look at a profile of ‘package->recutils’, there’s> probably room for improvement there.
On quick inspection, most of the time in package->recutils is spent intexinfo rendering the description. Unless we use the simplified searchresults format as discussed above, we cannot avoid it.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5ktHYACgkQLiXui2GAK7NqqQgAnPZx1ilBTQfl8kAoHTyQrdcdhL//PI4LHFMtfYqi+s+AnYvu10jNVM9KlPX2AL3hDRcDYSJL1r2KQ+uicc/FZwY9qWVPg5axflohUnwa6pzn7LDnJ3U5ClU/ON5uY7Vh6nixBjsIMwMqklRSWb2Lk/VEYTM9+HYUXCPgpvTzUz1XuJN5IP6YeXMoEIIMpOUVFz3Qxg3bZqPXSZ29gxAg+KdS3DhtK4zgmeFZTBeT+81Lld6rDy1q68Y3kwyFKmiLSsg0XWRStK5tai7J0h7snN+/EM4SkORjOGBlEnsUPQSYfeRnAUXy2f5Ykxw+OnaWgsrF+UtGYrWAg4+SLo0ocw===rzzL-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 8 Mar 12:33 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
875zffcc87.fsf@gnu.org
Hi,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (7 lines)>>> It turns out that most of the time is spent in printing and texinfo>>> rendering of the search results.>> Also, when we put all package metadata into the Xapian index, we don't> have to look up any of the package variables in (gnu packages *) during> `guix search` time. This also contributes substantially to the speedup.
Yup.
Toggle quote (6 lines)>> In general, pre-rendering doesn’t seem practical to me: the output of>> ‘guix search’ is locale-dependent (it speaks the user’s language) and>> Note that we already need to index package synopses and descriptions in> all languages. I still haven't implemented this, though.
Oh, right. Tricky!
Toggle quote (9 lines)>> adjusts to the terminal width (well, this is temporarily broken on>> Guile 3.0.0, but see ‘%text-width’ in (guix ui)).>> This could be accomplished even with pre-rendering. Xapian provides> "slots" to store arbitrary strings with a document. Instead of storing> the pre-rendered document as a whole, we could store pre-rendered fields> in separate slots. Then, during `guix search` time, we can assemble the> result from these pre-rendered fields.
I’m not sure I understand. The index wouldn’t store pre-renderedstrings for every possible terminal width, right?
Toggle quote (9 lines)>> Also, if the 12K+ descriptions need to be rendered at the time the user>> runs ‘guix pull’, the experience may not be great, because it could take>> a bit of time.>> This is a problem, but I would see it as a necessary "compilation"> step. :-P In fact, this whole patchset speeds up `guix search` by doing> part of the work of `guix search` ahead of time. So, some such cost is> unavoidable.
Yeah. I think we need to take the whole user experience into account,not just ‘guix search’. ‘guix pull’ already feels very slow, and it’s afairly common operation. Conversely, ‘guix search’ takes roughlybetween 0.5 and 2 seconds and is an uncommon operation on a “slow path”(in the sense that when you’re searching for software, you’ll probablyhave to spend more than a couple of seconds to find what you’re lookingfor.)
Toggle quote (12 lines)>> What I like about the recutils format in this context is that it’s both>> human- and machine-readable. The examples in the manual show how it can>> be useful to select the information displayed or to refine the search>> (info "(guix) Invoking guix package").>> Xapian's query language is much more natural (as in natural language)> than the regexp based techniques we need to use with recutils. I have> hardly ever used the regexp based search and I suspect many others> haven't either. Also, refining the search query should be easier to do> with Xapian. We could even use Xapian's query expansion feature to> suggest improved queries to the user.
I’m not sufficiently familiar with Xapian’s query language. Theexamples I had in mind were:
guix search malloc | recsel -p name,version,relevance guix search | recsel -p name -e 'license ~ "LGPL 3"' guix search crypto library | \ recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis
It’s not so much about regexps than it is about selecting individualfields.
Toggle quote (7 lines)>> Were you able to measure the cost of rendering specifically?>> generate-package-search-index takes around 50 seconds. If I modify> generate-package-search-index to not pre-render but simply store the> package description alone, it takes around 20 seconds. That gives us a> rough idea of the cost of pre-rendering.
To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/
Toggle quote (7 lines)>> I think we should look at a profile of ‘package->recutils’, there’s>> probably room for improvement there.>> On quick inspection, most of the time in package->recutils is spent in> texinfo rendering the description. Unless we use the simplified search> results format as discussed above, we cannot avoid it.
What I meant was that we could use (statprof) to see whether/how Texinforendering/parsing can be optimized.
Thanks,Ludo’.
Z
Z
zimoun wrote on 8 Mar 21:27 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ0tDs9s2Ju+0oKX2uf-JuRdFHiSXSpQonfF1wrmJYKvaQ@mail.gmail.com
Hi Arun,
Thank you for that.I will be probably far from keyboard for a couple of days for medicalreasons* and I will not be able to look into this second set patchessoon.
Cheers,simon
*nothing to worry, even if I am currently typing that in an hospital,just the bad consequence of sports. ;-)
A
A
Arun Isaac wrote on 8 Mar 21:27 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
cu7imjed22r.fsf@systemreboot.net
Toggle quote (9 lines)>> This could be accomplished even with pre-rendering. Xapian provides>> "slots" to store arbitrary strings with a document. Instead of storing>> the pre-rendered document as a whole, we could store pre-rendered fields>> in separate slots. Then, during `guix search` time, we can assemble the>> result from these pre-rendered fields.>> I’m not sure I understand. The index wouldn’t store pre-rendered> strings for every possible terminal width, right?
No, it wouldn't. It would store a partially pre-rendered string, that iswithout fill-paragraph. We run fill-paragraph at `guix search` time tocomplete the rendering.
Toggle quote (8 lines)> I think we need to take the whole user experience into account, not> just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a> fairly common operation. Conversely, ‘guix search’ takes roughly> between 0.5 and 2 seconds and is an uncommon operation on a “slow> path” (in the sense that when you’re searching for software, you’ll> probably have to spend more than a couple of seconds to find what> you’re looking for.)
I agree we can't compromise too much on `guix pull` performance.
Toggle quote (2 lines)> To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/
Maybe I'm missing something here. guix pull takes around 40 minutes onmy machine. In comparison to that, is another 20-50 seconds (roughly 1minute) a big deal? How much time would it be acceptable to spend onbuilding the Xapian index?
Also, is it possible to somehow provide substitutes for the Xapian indexso that the user does not have to actually build it locally during `guixpull` time?
Toggle quote (5 lines)> I’m not sufficiently familiar with Xapian’s query language. The> examples I had in mind were:> It’s not so much about regexps than it is about selecting individual> fields.
I have totally not tested this, but I imagine that equivalent Xapianqueries might look something like:
Toggle quote (2 lines)> guix search | recsel -p name -e 'license ~ "LGPL 3"'
guix search license:LGPL3
Toggle quote (3 lines)> guix search crypto library | \> recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis
guix search crypto library AND (NOT ghc) AND (NOT perl) AND (NOT python)AND (NOT ruby)
Toggle quote (3 lines)> What I meant was that we could use (statprof) to see whether/how Texinfo> rendering/parsing can be optimized.
Oh, ok. I'll try this if we decide not to pre-render.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5lVTwACgkQLiXui2GAK7MufAf+PuuYkEHruEk5UrGI/8ofRfpKs0DL5GUdhc09xLrYI7GnBu3IkOB/2ockUfNDd1Vkgs0UKPs6rbkxer/IJ8Vf/Eat5B7r+RlDoUxByICgj/IRhgBYz42IUrfSoZuM5XxeBS158WvNvQcFfYgrXrkUl/WpxyoWZP64xEZBhgSEJYbV783Mf6trYiOYTLwuRi4vzZQru1RAHu8L+GoAUE5SGTf6jLb5xPZ5cYdNXwlRq2zSZcm8+UWv7RCRv9MVlydIsOtb++TSI/n/xwCL9eeSl8Sz5VTphVpHrZ74VKKakI0DEiubg3bTzxR9WD9peO5+8BTwuI3kI/hmmKDx3vmQQw===XhPo-----END PGP SIGNATURE-----
A
A
Arun Isaac wrote on 8 Mar 21:40 +0100
(name . zimoun)(address . zimon.toutoune@gmail.com)
cu7fteid1gm.fsf@systemreboot.net
zimoun <zimon.toutoune@gmail.com> writes:
Toggle quote (4 lines)> I will be probably far from keyboard for a couple of days for medical> reasons* and I will not be able to look into this second set patches> soon.
No problem, take care! :-)
Toggle quote (2 lines)> *nothing to worry, even if I am currently typing that in an hospital,> just the bad consequence of sports. ;-)
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5lWFkACgkQLiXui2GAK7MigQgAk+jg+2ta+gYnG+Iubpd5GFdoOT0ndCBqtHn3gULZTOypW8qqc5HHz5K63WXzL5IAdW0bhpuvGQIsmqUXMKAKqiIPzOsjFfexRllAvXGmqUhORnCd7RG5w+TAdVOPhrB2/xOIi3Hs9RJwGWruf3715QOZ7qUCSYpuf/gBj0/htTIiMBDqjxk96RoS/GnYTB8IpLXM5gFU1JONvlAtEMUZQ0UHjHrEqTdiGNvg7PwygIB9nCu4A0ey7eoYy6a6s8eXcEyUdZ6M/WOFfc74HQBTrvqlr7FBSSyiR+LZQPrD6L4XNv8ohyRuVIG0yMzQnApc4RckPIK5HYlkQlPsYZmhRg===k6ET-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 9 Mar 08:42 +0100
87a74qvusm.fsf@ambrevar.xyz
Arun Isaac <arunisaac@systemreboot.net> writes:
Toggle quote (18 lines)>> I’m not sufficiently familiar with Xapian’s query language. The>> examples I had in mind were:>> It’s not so much about regexps than it is about selecting individual>> fields.>> I have totally not tested this, but I imagine that equivalent Xapian> queries might look something like:>>> guix search | recsel -p name -e 'license ~ "LGPL 3"'>> guix search license:LGPL3>>> guix search crypto library | \>> recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis>> guix search crypto library AND (NOT ghc) AND (NOT perl) AND (NOT python)> AND (NOT ruby)
Indeed, if you look at the notmuch-search-terms man page, you'll seethat you can select fields.In my opinion, the recsel format is fully superseded by Xapian.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl5l81kACgkQm9z0l6S7zH8T8Af9HItQX5qR3TAF/dqk4XxKqjZFl+hq17yIpC6ZTIa12RA8GnKhDhLgCxi9IyCbRxbCmeqLSRyBictizH/pnnRwKVNoZdgDtLvxiL6AO750AWy7I/Ly4j4KJ+/Y9LLK9U6eOXiCCtggfutIxyYApeQYUvLjzvcjtyNooUfXzEa6Od962/BZaOAfveti07W/IGCi1PIPIZJTOTSF0kbiiqvs91Phty+a1f1Me4ZY5PuJxhETDJFUtkZ8j+zj0oym8CKCbWLAgIsb0v/8Vv/pWavTWJqxxjALLa3mqVwn1JHeufbLN6YRf1bIOrO56AWkufRWzP+KxdkpwDUMq8zklWkSpg===z+V4-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 9 Mar 08:50 +0100
877dzuvues.fsf@ambrevar.xyz
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (8 lines)> Yeah. I think we need to take the whole user experience into account,> not just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a> fairly common operation. Conversely, ‘guix search’ takes roughly> between 0.5 and 2 seconds and is an uncommon operation on a “slow path”> (in the sense that when you’re searching for software, you’ll probably> have to spend more than a couple of seconds to find what you’re looking> for.)
I think I disagree with "guix search" being an uncommon operation and aslow path.
- The slowness of `guix search' (and the awkwardness of recutils) is maybe what makes it uncommon: users refrain from using it because it's too impractical.
- Searches are typically refined, i.e. you run a search multiple times by precising the terms, so in that sense I believe `guix search` is a very common operation. Or should be.
Anyways, one of the key issues here is the inherent limitation of theshell interface that does not allow us to directly and contextuallyprocess the output of a command (at least not without rerunning it).
This issue can only be tackled with a GUI: there the user would be ableto interactively act with the result of the search, without having tore-run the search.
Concretely, the GUI search would only return the package name, versionand synopses. No need for the Texinfo / recutils juggling.
Then the user would select the packages of interest to display moredetails. This allows us to query the full details just-in-time.


Back to the topic: I believe that Xapian is a huge win both for theshell and the future GUI :)
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl5l9UsACgkQm9z0l6S7zH8LzQf/W/eTpJMUphwkd4KFmmhNeUQeXk4K07i+1b2zm2BipqYVN2DOGMdFfAzOMiVg/G0fNJQ/AMAr7mSjJnoSP0OPQ3o25/FBu2O5wLcw/8CNRUOL85tfEq/4iBlSvm+J33eiJZBGXDzRTxXV2wY/tE0pL7Jc98vy/2d5yEBP2yc87gr2WgEUO9AwqyV9xSCmu+/ZNI381zT3hWGfR2FkctLpXHnWt6roHcIiIbuFiVxhLkQ8/VYcxAkB+jNKKJEfohyQoXPuefeWCHQLqTuk5XakAtXpVV7NChq2jf3Yl6676A53VBddGSj7ctoocyUL78LbDWheHbHvT/8n4aXynYt1bQ===M5Cf-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 9 Mar 11:28 +0100
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87blp54yag.fsf@gnu.org
Hello,
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (13 lines)> Ludovic Courtès <ludo@gnu.org> writes:>>> Yeah. I think we need to take the whole user experience into account,>> not just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a>> fairly common operation. Conversely, ‘guix search’ takes roughly>> between 0.5 and 2 seconds and is an uncommon operation on a “slow path”>> (in the sense that when you’re searching for software, you’ll probably>> have to spend more than a couple of seconds to find what you’re looking>> for.)>> I think I disagree with "guix search" being an uncommon operation and a> slow path.
(Not “and” but “on” a slow path.)
Toggle quote (4 lines)> - The slowness of `guix search' (and the awkwardness of recutils) is> maybe what makes it uncommon: users refrain from using it because it's> too impractical.
I think “slowness” and “awkwardness” are overstatements. I’m not sayingthis is perfect, but to me it’s not bad. (Of course I’m biased :-), butI’ve used other similar tools and this one looks rather good compared towhat I’ve used.)
Toggle quote (8 lines)> - Searches are typically refined, i.e. you run a search multiple times> by precising the terms, so in that sense I believe `guix search` is a> very common operation. Or should be.>> Anyways, one of the key issues here is the inherent limitation of the> shell interface that does not allow us to directly and contextually> process the output of a command (at least not without rerunning it).
I agree, but ‘guix search’ is a shell command, so we have to adapt tothat context.
Toggle quote (6 lines)> Concretely, the GUI search would only return the package name, version> and synopses. No need for the Texinfo / recutils juggling.>> Then the user would select the packages of interest to display more> details. This allows us to query the full details just-in-time.
Note that Emacs-Guix does that, although it doesn’t use the searchfacility of (guix ui) with relevance metrics.
Toggle quote (3 lines)> Back to the topic: I believe that Xapian is a huge win both for the> shell and the future GUI :)
It could be, but we need to consider all the aspects of the story,including the maintenance cost and overhead moved to ‘guix pull’. Soit’s not so much about “beliefs” at this point, but rather aboutdemonstrating what can be done, and I’m glad Arun is exploring thatspace!
Ludo’.
L
L
Ludovic Courtès wrote on 9 Mar 11:35 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87r1y13jew.fsf@gnu.org
Hello!
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (13 lines)>>> This could be accomplished even with pre-rendering. Xapian provides>>> "slots" to store arbitrary strings with a document. Instead of storing>>> the pre-rendered document as a whole, we could store pre-rendered fields>>> in separate slots. Then, during `guix search` time, we can assemble the>>> result from these pre-rendered fields.>>>> I’m not sure I understand. The index wouldn’t store pre-rendered>> strings for every possible terminal width, right?>> No, it wouldn't. It would store a partially pre-rendered string, that is> without fill-paragraph. We run fill-paragraph at `guix search` time to> complete the rendering.
Note that Texinfo rendering doesn’t use (@ (guix ui) fill-paragraph).It has its own paragraph-filling code. We cannot use ‘fill-paragraph’after Texinfo rendering anyway, since Texinfo knows where things can befilled and where they cannot—e.g., @example.
Toggle quote (17 lines)>> I think we need to take the whole user experience into account, not>> just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a>> fairly common operation. Conversely, ‘guix search’ takes roughly>> between 0.5 and 2 seconds and is an uncommon operation on a “slow>> path” (in the sense that when you’re searching for software, you’ll>> probably have to spend more than a couple of seconds to find what>> you’re looking for.)>> I agree we can't compromise too much on `guix pull` performance.>>> To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/>> Maybe I'm missing something here. guix pull takes around 40 minutes on> my machine. In comparison to that, is another 20-50 seconds (roughly 1> minute) a big deal? How much time would it be acceptable to spend on> building the Xapian index?
On my laptop, in the best case, when all the substitutes are available(not uncommon), it takes 2 minutes. Sometimes, when some substitutesare missing, it takes 15 minutes.
So of course, the 20–50 seconds matter only in the best case. But theymatter primarily because that index build may not be substitutable: it’spossibly unique to each profile (see below). That means we know we’reoften going to pay for it.
Toggle quote (4 lines)> Also, is it possible to somehow provide substitutes for the Xapian index> so that the user does not have to actually build it locally during `guix> pull` time?
We could provide a substitute for users who use only the official 'guixchannel. However, as soon as users combine multiple channels, they’llhave to build the index locally.
Toggle quote (12 lines)>> I’m not sufficiently familiar with Xapian’s query language. The>> examples I had in mind were:>> It’s not so much about regexps than it is about selecting individual>> fields.>> I have totally not tested this, but I imagine that equivalent Xapian> queries might look something like:>>> guix search | recsel -p name -e 'license ~ "LGPL 3"'>> guix search license:LGPL3
Nice.
Toggle quote (6 lines)>> guix search crypto library | \>> recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis>> guix search crypto library AND (NOT ghc) AND (NOT perl) AND (NOT python)> AND (NOT ruby)
This one is not quite equivalent I guess, but yeah. :-)
Toggle quote (5 lines)>> What I meant was that we could use (statprof) to see whether/how Texinfo>> rendering/parsing can be optimized.>> Oh, ok. I'll try this if we decide not to pre-render.
It’d be beneficial anyways.
Thank you!
Ludo’.
Z
Z
zimoun wrote on 9 Mar 13:28 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ1C-9EKjgX=2B5t9wz1_wnXwuLeHAXBQocEEsHeXMv8GA@mail.gmail.com
Hi,
On Sat, 7 Mar 2020 at 14:31, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (17 lines)> --8<---------------cut here---------------start------------->8---> With a warm cache,> $ time guix search inkscape>> real 0m1.787s> user 0m1.745s> sys 0m0.111s> --8<---------------cut here---------------end--------------->8--->> --8<---------------cut here---------------start------------->8---> $ time /tmp/test/bin/guix search inkscape>> real 0m0.199s> user 0m0.182s> sys 0m0.024s> --8<---------------cut here---------------end--------------->8---
IMHO, it is interesting to compare the list of results and the orderof the both query; as i did with Emacs.Speed is one thing, the initial motivation. But accuracy is maybe moreimportant.

Toggle quote (4 lines)> - The package cache would grow in size, and lookup would be slowed down> because we need to load the entire cache into memory. Xapian, on the other> hand, need only look up the specific packages that match the search query.
I agree that 'fold-packages' could become soon a bottleneck.
IMHO, 'mset-fold' should be a drop-in replacement of 'fold-package' inthe search function.

Toggle quote (6 lines)> - Xapian can provide superior search results due to it stemming and language> models.> - Xapian can provide spelling correction and query expansion -- that is,> suggest search terms to improve search results. Note that I haven't> implemented this yet and is out of scope in this patchset.
I agree too that Xapian should improve the user experience when searching.

Toggle quote (15 lines)> * Simplify our package search results>> Why not use a simpler package search results format like Arch Linux or Debian> does? We could just display the package name, version and synopsis like so.>> inkscape 0.92.4> Vector graphics editor> inklingreader 0.8> Wacom Inkling sketch format conversion and manipulation2>> Why do we need the entire recutils format? If the user is interested, they can> always use `guix package --show` to get the full recutils formatted> info. Having shorter search results will make everything even faster and much> more readable. WDYT?
I disagree.
What I proposed some time ago was to have different flavour of theouput of search; as e.g., 'git log --pretty=oneline' etc..For example by default, it should be what you suggest. Then "guixsearch --format=full" should output the current. And we could imaginemimick the Git log strategy: "guix search --format="%name%version\n%license" etc.
WDYT?


Toggle quote (8 lines)> > Is (make-stem "en") for the locale?>> I still have English hard-coded. I haven't yet figured out how to detect the> locale and stem accordingly. But, there is a larger problem. Since we cannot> anticipate what locale the user will run guix search with, should we build the> Xapian index for all locales? That is, should we index not only the English> versions of the packages but also all other translations as well?
I understand. Let consider that for the next round.

Toggle quote (6 lines)> > package-search-index and package-cache-file could be refactored> > because they share all the same code.>> Yes, they could be. However, I'll postpone to the next iteration of the> patchset.
Ok.

Toggle quote (8 lines)> > I do not know what is the convention for the bindings.> > But there is 'fold-packages' so I would be inclined to 'fold-msets' or> > something in this flavour.>> Well, everywhere else in guile we have such things as vhash-fold, string-fold,> hash-fold, stream-fold, etc. That's why I went with mset-fold. Also, we are> folding over a single mset (match-set). So, mset should be in the singular.
I understand.

Toggle quote (5 lines)> > And more importantly, 'make as-derivations' to avoid a "guix pull" breakage,> > Ah do not forget to adapt some tests.>> Will do this once we have consensus about the other features of this patchset.
And we should test that on different machines and states.


Toggle quote (6 lines)> > Xapian does not return the package 'emacs' itself as the first. And worse,> > it is not returned at all.>> In this patchset, since we're indexing the package name as well, emacs is> returned but it is still far from the beginning.
This is an issue.
IMHO, it is because of the BM25 score. It is too rough and some weightshould be applied. But that another story.The fix is: a- provide a scoring function to Xapian as the doc explains b- adapt 'fold-package' to 'mset-fold' in'find-packages-by-description' and implement our version of BM25 thenuse it in 'relevance'

Toggle quote (7 lines)> > I propose the value of 4294967295 for pagesize.>> In this patchset, I pass (database-document-count db) as the #:maximum-items> keyword argument to enquire-mset. This is the upstream recommended way to get> all search results. I hadn't done this earlier since I hadn't yet wrapped> database-document-count in guile-xapian.
Cool!


Toggle quote (3 lines)> My laptop is quite old with a particularly slow HDD. Hence my motivation to> improve guix search performance!
I agree.But performance is not all. Accuracy counts more! :-)

Toggle quote (6 lines)> > I think we should weigh the pros and cons on all these aspects: speed,> > complexity and maintenance cost, search result quality, search features,> > etc.>> I agree.
I agree too.We should write a benchmark. For example, using Emacs as query or morecomplex we could think of.

All the best,simon
Z
Z
zimoun wrote on 9 Mar 13:34 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ3aHikno_vxt=Ox8yzQs4T7Dmx815HQiPX1_0Xt7TBZhA@mail.gmail.com
Hi,
On Sat, 7 Mar 2020 at 21:33, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (22 lines)> Arun Isaac <arunisaac@systemreboot.net> skribis:
> > Why not use a simpler package search results format like Arch Linux or Debian> > does? We could just display the package name, version and synopsis like so.> >> > inkscape 0.92.4> > Vector graphics editor> > inklingreader 0.8> > Wacom Inkling sketch format conversion and manipulation> >> > Why do we need the entire recutils format? If the user is interested, they can> > always use `guix package --show` to get the full recutils formatted> > info. Having shorter search results will make everything even faster and much> > more readable. WDYT?>> What I like about the recutils format in this context is that it’s both> human- and machine-readable. The examples in the manual show how it can> be useful to select the information displayed or to refine the search> (info "(guix) Invoking guix package").>> Also: I’d recommend tackling one thing at a time. :-)
I agree with Ludo.
And IMHO, we should add "guix search --format=<options>" mimicking how"git log" works.By default, displays as Arun proposes. Using '--format=full" as it isdone now by default.And we could imagine "--format=%name \t %version \n %description" etc.


Toggle quote (3 lines)> I think we should look at a profile of ‘package->recutils’, there’s> probably room for improvement there.
Interesting. Note that speed was the initial motivation but accuracyis another important one. As we discussed earlier when I showed anexample with TF-IDF. And Xapian implemets the state-of-art (BM25) forscoring.

All the best,simon
Z
Z
zimoun wrote on 9 Mar 13:40 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ2pxP6ALANTxtEpp9Gnaejj69D5ULficdmmH0CfROSLiQ@mail.gmail.com
On Sun, 8 Mar 2020 at 10:02, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (7 lines)> >> It turns out that most of the time is spent in printing and texinfo> >> rendering of the search results.>> Also, when we put all package metadata into the Xapian index, we don't> have to look up any of the package variables in (gnu packages *) during> `guix search` time. This also contributes substantially to the speedup.
Yes, magic power of inverted index. ;-)


Toggle quote (9 lines)> > Also, if the 12K+ descriptions need to be rendered at the time the user> > runs ‘guix pull’, the experience may not be great, because it could take> > a bit of time.>> This is a problem, but I would see it as a necessary "compilation"> step. :-P In fact, this whole patchset speeds up `guix search` by doing> part of the work of `guix search` ahead of time. So, some such cost is> unavoidable.
Currently "guix pull" is rather long on my machine. I would accept acouple of seconds more (even minutes).So this compilation step could be done at the "guix pull" time.Or even we could imagine something indexing in the background.

Toggle quote (28 lines)> > What I like about the recutils format in this context is that it’s both> > human- and machine-readable. The examples in the manual show how it can> > be useful to select the information displayed or to refine the search> > (info "(guix) Invoking guix package").>> Xapian's query language is much more natural (as in natural language)> than the regexp based techniques we need to use with recutils. I have> hardly ever used the regexp based search and I suspect many others> haven't either. Also, refining the search query should be easier to do> with Xapian. We could even use Xapian's query expansion feature to> suggest improved queries to the user.>> That said, if we want the recutils format, we can still keep it in a> simplified form like so.>> name: inkscape> version: 0.92.4> synopsis: Vector graphics editor>> name: inklingreader> version: 0.8> synopsis: Wacom Inkling skecth format conversion and manipulation>> > Also: I’d recommend tackling one thing at a time. :-)>> I totally agree, but I'm tempted to say that pre-rendering would be a> lot cheaper with the simplified form of search results. :-)
IMHO, we "just" need to propose different outputs mimicking "git log--format". Soemthing like "guix search --format=".
What do you think?


All the best,simon
Z
Z
zimoun wrote on 9 Mar 13:47 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ3ebRtuZa=zRRQvKbwESvdwcAJ+B-QTNMF=WwAwi3Z8fA@mail.gmail.com
On Sun, 8 Mar 2020 at 12:33, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (15 lines)> Arun Isaac <arunisaac@systemreboot.net> skribis:
> > This is a problem, but I would see it as a necessary "compilation"> > step. :-P In fact, this whole patchset speeds up `guix search` by doing> > part of the work of `guix search` ahead of time. So, some such cost is> > unavoidable.>> Yeah. I think we need to take the whole user experience into account,> not just ‘guix search’. ‘guix pull’ already feels very slow, and it’s a> fairly common operation. Conversely, ‘guix search’ takes roughly> between 0.5 and 2 seconds and is an uncommon operation on a “slow path”> (in the sense that when you’re searching for software, you’ll probably> have to spend more than a couple of seconds to find what you’re looking> for.)
We could imagine something doing the job of indexing in thebackground; using the daemon or whatever.

Toggle quote (20 lines)> >> What I like about the recutils format in this context is that it’s both> >> human- and machine-readable. The examples in the manual show how it can> >> be useful to select the information displayed or to refine the search> >> (info "(guix) Invoking guix package").> >> > Xapian's query language is much more natural (as in natural language)> > than the regexp based techniques we need to use with recutils. I have> > hardly ever used the regexp based search and I suspect many others> > haven't either. Also, refining the search query should be easier to do> > with Xapian. We could even use Xapian's query expansion feature to> > suggest improved queries to the user.>> I’m not sufficiently familiar with Xapian’s query language. The> examples I had in mind were:>> guix search malloc | recsel -p name,version,relevance> guix search | recsel -p name -e 'license ~ "LGPL 3"'> guix search crypto library | \> recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis
I think these examples are good ones to benchmark the different approaches.Because the speed is one thing, the accuracy is another one.
Let cut the "slow path" by providing a better experience when searching. ;-)

Toggle quote (3 lines)> It’s not so much about regexps than it is about selecting individual> fields.
The regexp should be provided directly to "guix search" actually and'recsel' is only a "filter" allowing to deal differently with thefields.


Toggle quote (2 lines)> To me, adding 20–50 seconds on ‘guix pull’ would be undesirable. :-/
Ok, at least it is clear. :-)And computing in the background?

All the best,simon
Z
Z
zimoun wrote on 9 Mar 13:50 +0100
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
CAJ3okZ3ozYEhajKb3=CKpVu5AUsFTd-BiD3+zHf28yjy6LUhBw@mail.gmail.com
On Mon, 9 Mar 2020 at 08:42, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
Toggle quote (25 lines)>> Arun Isaac <arunisaac@systemreboot.net> writes:>> >> I’m not sufficiently familiar with Xapian’s query language. The> >> examples I had in mind were:> >> It’s not so much about regexps than it is about selecting individual> >> fields.> >> > I have totally not tested this, but I imagine that equivalent Xapian> > queries might look something like:> >> >> guix search | recsel -p name -e 'license ~ "LGPL 3"'> >> > guix search license:LGPL3> >> >> guix search crypto library | \> >> recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis> >> > guix search crypto library AND (NOT ghc) AND (NOT perl) AND (NOT python)> > AND (NOT ruby)>> Indeed, if you look at the notmuch-search-terms man page, you'll see> that you can select fields.> In my opinion, the recsel format is fully superseded by Xapian.
No!Because implementing the "fields" using Xapian is not done and it isnot as straightforward as it seems.For sure, Xapian could do a lot of thing. But we should move one stepafter one step.
Let first focus on speed and accuracy. For example, the fact that"guix search emacs" does not returns first the package 'emacs' usingXapian is really an issue.
Cheers,simon
Z
Z
zimoun wrote on 9 Mar 13:53 +0100
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
CAJ3okZ2a=+04ZoEnG=1DrzpQfVJsLkXyd4XqCKPaaeY1kbpoKA@mail.gmail.com
On Mon, 9 Mar 2020 at 08:50, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
Toggle quote (3 lines)> Back to the topic: I believe that Xapian is a huge win both for the> shell and the future GUI :)
I agree.The big win is to test the strategy of the inverted index strategy andin the same time the state-of-art of scoring (relevance).
Z
Z
zimoun wrote on 9 Mar 14:03 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ2r5WDEF7jnaFSbg4Lznt_kjo6XtUA-2oEe74RztmN1ZQ@mail.gmail.com
On Mon, 9 Mar 2020 at 11:29, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (9 lines)> > Back to the topic: I believe that Xapian is a huge win both for the> > shell and the future GUI :)>> It could be, but we need to consider all the aspects of the story,> including the maintenance cost and overhead moved to ‘guix pull’. So> it’s not so much about “beliefs” at this point, but rather about> demonstrating what can be done, and I’m glad Arun is exploring that> space!
I agree.What is currently tested with Xapian is: 1- speeding up (or not) using an inverted index 2- the accuracy using the state-of-art of information retrieval (BM25)
About 1- I do not have a strong opinion; even if I find "guix search"terribly slow as I mentioned earlier (one year ago ;-)).
About 2- as I mentioned earlier, the 'relevance' function could beimproved. Currently, the score is computed only considering thepackage itself and not the other packages (the words they use, theirnumber etc.). BM25 is the state-of-art using what I tried to explainedsome time ago when I showed for example TF-IDF. The question is sowhat the best move to improve the accuracy. And the improvementnecessarily uses a global index (of terms, at least). But on the otherhand, the improvement should not pay off because it would addcomplexity and burden, more than the improvement itself.
Without testing, we cannot say. Thank you Arun for pushing forward.

All the best,simon
Z
Z
zimoun wrote on 9 Mar 19:14 +0100
Re: [PATCH v2 1/3] build-self: Add guile-xapian to Guix dependencies.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ05nva16fAAjhUeY=-Fd+RT-F118UXG-PmiuFTEDHNmGw@mail.gmail.com
On Sat, 7 Mar 2020 at 14:31, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (20 lines)> diff --git a/build-aux/build-self.scm b/build-aux/build-self.scm> index f2e785b7f1..05d0353ccf 100644> --- a/build-aux/build-self.scm> +++ b/build-aux/build-self.scm> @@ -1,5 +1,6 @@> ;;; GNU Guix --- Functional package management for GNU> ;;; Copyright © 2014, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>> ;;;> ;;; This file is part of GNU Guix.> ;;;> @@ -261,6 +262,10 @@ interface (FFI) of Guile.")> #~(define-module (gcrypt hash)> #:export (sha1 sha256))))>> + (define fake-xapian-hash> + ;; Fake (xapian xapian) module; see below.> + (scheme-file "xapian.scm" #~(define-module (xapian xapian))))> +
Why 'fake-xapian-hash' and not simply 'fake-xapian'?
Z
Z
zimoun wrote on 9 Mar 19:19 +0100
Re: [PATCH v2 2/3] gnu: Generate Xapian package search index.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ1n=RTexsb5vCCsXdnBO0KEy6Hm0BuSXh1yKsL2GnHTJA@mail.gmail.com
On Sat, 7 Mar 2020 at 14:31, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (5 lines)> diff --git a/gnu/packages.scm b/gnu/packages.scm> index d22c992bb1..c8e221de68 100644> --- a/gnu/packages.scm> +++ b/gnu/packages.scm
[...]
Toggle quote (41 lines)> @@ -426,6 +429,43 @@ reducing the memory footprint."> #:opts '(#:to-file? #t)))))> cache-file)>> +(define %package-search-index> + ;; Location of the package search-index> + "/lib/guix/package-search.index")> +> +(define (generate-package-search-index directory)> + "Generate under DIRECTORY a Xapian index of all the available packages."> + (define db-path> + (string-append directory %package-search-index))> +> + (mkdir-p (dirname db-path))> + (call-with-writable-database db-path> + (lambda (db)> + (fold-packages (lambda (package _)> + (let* ((idterm (string-append "Q" (package-name package)))> + (doc (make-document #:data (string-trim-right> + (call-with-output-string> + (cut package->recutils package <>))> + #\newline)> + #:terms `((,idterm . 0))))> + (term-generator (make-term-generator #:stem (make-stem "en")> + #:document doc)))> + (for-each (match-lambda> + ((field . weight)> + (match (field package)> + ((? string? str)> + (index-text! term-generator str> + #:wdf-increment weight))> + ((lst ...)> + (for-each (cut index-text! term-generator <>> + #:wdf-increment weight)> + lst)))> + (replace-document! db idterm doc)))> + %package-metrics)))> + #f)))> +> + db-path)
If I understand correctly, the index is stored with a weight comingfrom '%package-metrics', right? Well, I am not convinced it is thecorrect way but I have not tried by myself yet. :-)
J
J
Jonathan Brielmaier wrote on 10 Mar 00:40 +0100
Re: [bug#39258] [PATCH v2 1/3] build-self: Add guile-xapian to Guix dependencies.
67edd6fa-b17f-7a86-2e43-a8d4e9397e3c@web.de
On 07.03.20 14:31, Arun Isaac wrote:
Toggle quote (3 lines)> * build-aux/build-self.scm (build-program): Import fake guile-xapian module.> * guix/self.scm (compiled-guix): Add guile-xapian to Guix dependencies.
Could you please trigger a release (e.g. 0.0.1) for guile-xapian? Thiswould make it more easy for distributions like openSUSE to package it.
A
A
Arun Isaac wrote on 10 Mar 06:24 +0100
(name . Jonathan Brielmaier)(address . jonathan.brielmaier@web.de)(address . 39258@debbugs.gnu.org)
cu78sk8dbp0.fsf@systemreboot.net
Jonathan Brielmaier <jonathan.brielmaier@web.de> writes:
Toggle quote (3 lines)> Could you please trigger a release (e.g. 0.0.1) for guile-xapian? This> would make it more easy for distributions like openSUSE to package it.
Done! :-) Seehttps://git.systemreboot.net/guile-xapian/snapshot/guile-xapian-0.1.0.tar.xz
But I must warn you that guile-xapian is terribly incomplete andterribly unstable. The API may change at any time without notice.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5nJIsACgkQLiXui2GAK7OwOAf+PiKPV9ZajPUC1nGaA2wvaUOaBlpFNk6vS9HGspnmX2kY7eVaf6mIjW0XMffQwA8lZalTl8psqVzC4991aQvyQVVDIH+5AeyFZZX5MeiOCcYwzu7r6LA8oHOjNCtXX2fEtUPlFn++k0+IoEpivxFHyOk8dSDe4hT8ztyD6hz2ZkBg6Ib5HE0Tl3DGB+73ENuvAJ+GIqdfbh/BwrzwNZQrtvK3pj2gOvt/APsLgz3CvwGY/SVv3YOVZueES1cpLdPkO2+9KcCgICgSZClTI2SfSe+yVN2HCEu5923FvyiVdc8A0B4gDmfNp+yLuaZm0UiuvrC/2Ch7Ehr1MIBe/leXLQ===uSoh-----END PGP SIGNATURE-----
A
A
Arun Isaac wrote on 10 Mar 15:17 +0100
Re: [PATCH v2 0/3] Xapian for Guix package search
(name . Ludovic Courtès)(address . ludo@gnu.org)
cu736agcmzu.fsf@systemreboot.net
Toggle quote (5 lines)> Note that Texinfo rendering doesn’t use (@ (guix ui) fill-paragraph).> It has its own paragraph-filling code. We cannot use ‘fill-paragraph’> after Texinfo rendering anyway, since Texinfo knows where things can be> filled and where they cannot—e.g., @example.
True, I did not think of this.
Toggle quote (4 lines)> We could provide a substitute for users who use only the official 'guix> channel. However, as soon as users combine multiple channels, they’ll> have to build the index locally.
We could build a separate Xapian database for each channel. Xapian doessupport searching across multiple databases at once and will handlemerging the results together appropriately. If I understand correctly,this means we can provide substitutes for at least the official guixchannel and let the user build the index locally for other channels. Isthat correct?
Also, could someone please build the patchset v2 on their machine andmeasure the time taken by generate-package-search-index? My laptop,particularly my HDD is slow even as far as HDDs go. So, my figure of20-50 seconds may not be representative.
Thanks,Arun.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5noZUACgkQLiXui2GAK7Nw9gf/b/wKWyQ4b6s7K0/oUeiQOOY54Mg45SpV58vTQs3/n97iLOmhEDsRUoJLQjSpl/BwbpeyjQTeAToBom6UT2QW0rWQ/x4DOlCeFnBlL6LFhGvkNEptHFb2ag7GOpgdO+ljsgO5BkFlU31HvHbe7wLZ7KWSmREXWOw3enGymQD6e/VdDUZz71AwMx/c95SCGtEY2wKUc0ZceBsgPr7+LyHCQ3h3ga+SX/G9HFyJ5/rU2+7ytzjri6oXaDXJvWycwBmvutrNZASSuZ/XZaLy7nnK9HWpYABPiiKUTsYei62WtVyiTvzDkFpI0kkYG8Qn7VU6ZJ95cZb4lY60ETYbgdsnnw===fOJU-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 10 Mar 15:33 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
CAJ3okZ0aK7+oCaOEdXReC0Br=A0LSab1yttp8VGOyn_FkDcr_g@mail.gmail.com
Hi Arun,
On Tue, 10 Mar 2020 at 15:18, Arun Isaac <arunisaac@systemreboot.net> wrote:
Toggle quote (11 lines)> > We could provide a substitute for users who use only the official 'guix> > channel. However, as soon as users combine multiple channels, they’ll> > have to build the index locally.>> We could build a separate Xapian database for each channel. Xapian does> support searching across multiple databases at once and will handle> merging the results together appropriately. If I understand correctly,> this means we can provide substitutes for at least the official guix> channel and let the user build the index locally for other channels. Is> that correct?
To complement your words, you could also imagine index all the historyas any other channels. It needs some thoughts but it seems a path thatI would to go.

Toggle quote (5 lines)> Also, could someone please build the patchset v2 on their machine and> measure the time taken by generate-package-search-index? My laptop,> particularly my HDD is slow even as far as HDDs go. So, my figure of> 20-50 seconds may not be representative.
I will do when I will be fully back. :-)

All the best,simon
L
L
Ludovic Courtès wrote on 11 Mar 14:50 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87sgifj8zo.fsf@gnu.org
Hello!
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (8 lines)>> We could provide a substitute for users who use only the official 'guix>> channel. However, as soon as users combine multiple channels, they’ll>> have to build the index locally.>> We could build a separate Xapian database for each channel. Xapian does> support searching across multiple databases at once and will handle> merging the results together appropriately.
Nice!
Toggle quote (4 lines)> If I understand correctly, this means we can provide substitutes for> at least the official guix channel and let the user build the index> locally for other channels. Is that correct?
I’m afraid not, or at least not trivially.
Currently, profile hooks such as ‘%channel-profile-hooks’, receive acomplete profile—in this case, the composition of all the channels theuser chose.
So if we want to achieve what you propose, we’d need to find another wayto hook database generation.

BTW, there’s also the problem of modules added dynamically with$GUIX_PACKAGE_PATH or ‘-L’. With the proposed scheme, it seems thatthey could no longer be searched. Is that correct?
(Conversely the package cache is optional: it’s only used when it’sconsidered authoritative, see (gnu packages). The API and behavior areexactly the same whether or not the package cache is used.)
Thanks,Ludo’.
A
A
Arun Isaac wrote on 13 Mar 06:37 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
cu7o8t0byt6.fsf@systemreboot.net
Toggle quote (7 lines)> Currently, profile hooks such as ‘%channel-profile-hooks’, receive a> complete profile—in this case, the composition of all the channels the> user chose.>> So if we want to achieve what you propose, we’d need to find another way> to hook database generation.
Hmmm. Tough luck, I suppose. Do you have suggestions for anywhere elseto hook database generation?
Toggle quote (4 lines)> BTW, there’s also the problem of modules added dynamically with> $GUIX_PACKAGE_PATH or ‘-L’. With the proposed scheme, it seems that> they could no longer be searched. Is that correct?
Unfortunately, that is correct. To address this, we discussed retainingthe current search implementation along with the new xapianimplementation. But, that changes the search query behaviour andadds a lot of complexity. I'll think of some other way out.
Toggle quote (4 lines)> (Conversely the package cache is optional: it’s only used when it’s> considered authoritative, see (gnu packages). The API and behavior are> exactly the same whether or not the package cache is used.)
Thanks,Arun
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEf3MDQ/Lwnzx3v3nTLiXui2GAK7MFAl5rHAYACgkQLiXui2GAK7OAdgf/cYuLpIA4tY7koTlO4E5nwmfyUxAS8J1lpM6H2xaCCwKjyEWXg8SCxMk2nSoj57D1jJIZoRQpqXyxu2tVFkyF1S/L1YuBSS4MX1w9dcjsyCKHiNkr5qdV6WT/5z1t7rxdzVWBLfHUYJexfjTxepyxApu9jwAoXIpRDSHdDKhSC7s4N/vtxW8TSwaekpH+ferfN5TBRPBkIvVlrm28Q9M7IcJxVktePtMgX3/R0PNhO5p/8l4p/pKkKf2WwuwPjKuRJhYwDjyNBFqX8dBv6TGSeFNX6Lmp34wa009ucshYpvrkaXHKfEAkn9UG/FhrvGeHSotyjWXJ0bSYHg1zkowQfA===yBNx-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 15 Mar 21:40 +0100
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87imj5ic7p.fsf@gnu.org
Hi Arun,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (10 lines)>> Currently, profile hooks such as ‘%channel-profile-hooks’, receive a>> complete profile—in this case, the composition of all the channels the>> user chose.>>>> So if we want to achieve what you propose, we’d need to find another way>> to hook database generation.>> Hmmm. Tough luck, I suppose. Do you have suggestions for anywhere else> to hook database generation?
For the core database (packages that come with Guix), (guix self) couldtake care of it.
Toggle quote (9 lines)>> BTW, there’s also the problem of modules added dynamically with>> $GUIX_PACKAGE_PATH or ‘-L’. With the proposed scheme, it seems that>> they could no longer be searched. Is that correct?>> Unfortunately, that is correct. To address this, we discussed retaining> the current search implementation along with the new xapian> implementation. But, that changes the search query behaviour and> adds a lot of complexity. I'll think of some other way out.
Yeah, I think we’d want to have roughly a single implementation.
I wonder if the relevant metrics that Xapian implements, like zimounmentioned, could be available directly in Scheme in a way that allows usto compute them at run time when the pre-built cache is unavailable. Orwould that be necessarily too slow?
If so, perhaps a slightly less fancy metric could work with betterperformance?
Thanks,Ludo’.
A
A
Arun Isaac wrote on 27 Mar 17:26 +0100
[PATCH v3 0/3] Package metadata cache for guix search
(address . 39258@debbugs.gnu.org)
20200327162654.18785-1-arunisaac@systemreboot.net
Hi everyone,
This is v3 of my attempt to make guix search faster. In this version, I haveabandoned use of xapian. Instead I build a cache of the metadata of allpackages in a profile hook. Then, I use that cache to search and displaysearch results. This way, package guile modules are not loaded during guixsearch.
Speedup is around 2x. Both measurements below are with a warm cache.
Toggle snippet (7 lines)$ time guix search inkscape
real 0m1.722suser 0m1.776ssys 0m0.097s
Toggle snippet (7 lines)$ time /tmp/test/bin/guix search inkscape
real 0m0.749suser 0m0.770ssys 0m0.020s
This patchset does not affect the search API nor does it improve the relevanceof search results. If there is interest in this approach, I'll complete thispatchset properly. But, in the long run, I do think we should aim to getxapian or the like for guix search. WDYT?
Unfortunately, generate-package-metadata-cache takes 43 seconds to build thecache on my relatively slow computer. Performance should be better on otherpeople's machines.
Meanwhile, it would still be useful if someone built patchset v2 on theirmachine and reported the time it took to build the xapian index.
* How to test this patchset
Apply patches and build as usual. Do a guix pull into a temporary profile.
$ ./pre-inst-env guix pull --url=$PWD --branch=the-name-of-the-branch-you-applied-patches-to -p /tmp/test
Then, run guix search from the built profile
$ /tmp/test/bin/guix search inkscape
Thanks!
Arun Isaac (3): guix: Generate package metadata cache. guix: Search package metadata cache. guix: Use package metadata cache for package search.
gnu/packages.scm | 88 +++++++++++++++++++++++++- guix/channels.scm | 34 +++++++++- guix/packages.scm | 32 ++++++++++ guix/scripts/package.scm | 5 +- guix/ui.scm | 132 ++++++++++++++++++++++++++++++++++++--- 5 files changed, 277 insertions(+), 14 deletions(-)
-- 2.25.1
A
A
Arun Isaac wrote on 27 Mar 17:26 +0100
[PATCH v3 2/3] guix: Search package metadata cache.
(address . 39258@debbugs.gnu.org)
20200327162654.18785-3-arunisaac@systemreboot.net
* gnu/packages.scm (search-packages): New function.* guix/packages.scm (<package-metadata>): New record type.--- gnu/packages.scm | 38 ++++++++++++++++++++++++++++++++++++++ guix/packages.scm | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+)
Toggle diff (115 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex c0b527acf0..2510b1fe49 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -59,6 +59,7 @@ find-packages-by-name find-package-locations find-best-packages-by-name+ search-packages specification->package specification->package+output@@ -474,6 +475,43 @@ package modules." #:opts '(#:to-file? #t))))) cache-file) +(define (search-packages profile regexps)+ "Return a list of pairs: <package-metadata> objects corresponding to+packages whose name, synopsis, description, or output matches at least one of+REGEXPS sorted by relevance, and its non-zero relevance score."+ (define cache-file+ (string-append profile %package-metadata-cache-file))++ (define cache+ (catch 'system-error+ (lambda ()+ (map (match-lambda+ (#(name version dependencies outputs systems+ synopsis description home-page (file line column))+ (make-package-metadata+ name version dependencies outputs systems+ synopsis description home-page+ (location file line column))))+ (load-compiled cache-file)))+ (lambda args+ (if (= ENOENT (system-error-errno args))+ #f+ (apply throw args)))))++ (let ((matches+ (filter-map (lambda (package-metadata)+ (let ((score (package-relevance package-metadata regexps)))+ (and (positive? score)+ (cons package-metadata score))))+ cache)))+ (sort matches+ (lambda (m1 m2)+ (match m1+ ((package1 . score1)+ (match m2+ ((package2 . score2)+ (> score1 score2)))))))))+ (define %sigint-prompt ;; The prompt to jump to upon SIGINT.diff --git a/guix/packages.scm b/guix/packages.scmindex 70b1478c91..bb06baa1ee 100644--- a/guix/packages.scm+++ b/guix/packages.scm@@ -5,6 +5,7 @@ ;;; Copyright © 2016 Alex Kost <alezost@gmail.com> ;;; Copyright © 2017, 2019 Efraim Flashner <efraim@flashner.co.il> ;;; Copyright © 2019 Marius Bakke <mbakke@fastmail.com>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -115,6 +116,21 @@ transitive-input-references + package-metadata+ make-package-metadata+ package-metadata?+ this-package-metadata+ package-metadata-name+ package-metadata-version+ package-metadata-dependencies+ package-metadata-outputs+ package-metadata-synopsis+ package-metadata-description+ package-metadata-license+ package-metadata-home-page+ package-metadata-supported-systems+ package-metadata-location+ %supported-systems %hurd-systems %hydra-supported-systems@@ -310,6 +326,22 @@ name of its URI." package) 16))))) +(define-record-type* <package-metadata>+ package-metadata make-package-metadata+ package-metadata?+ this-package-metadata+ (name package-metadata-name)+ (version package-metadata-version)+ (dependencies package-metadata-dependencies)+ (outputs package-metadata-outputs)+ (supported-systems package-metadata-supported-systems)+ (synopsis package-metadata-synopsis)+ (description package-metadata-description)+ ;; TODO: Add license+ ;; (license package-metadata-license)+ (home-page package-metadata-home-page)+ (location package-metadata-location))+ (define (package-upstream-name package) "Return the upstream name of PACKAGE, which could be different from the name it has in Guix."-- 2.25.1
A
A
Arun Isaac wrote on 27 Mar 17:26 +0100
[PATCH v3 1/3] guix: Generate package metadata cache.
(address . 39258@debbugs.gnu.org)
20200327162654.18785-2-arunisaac@systemreboot.net
* gnu/packages.scm (%package-metadata-cache-file): New variable.(generate-package-metadata-cache): New function.* guix/channels.scm (package-metadata-cache-file): New function.(%channel-profile-hooks): Add package-metadata-cache-file.--- gnu/packages.scm | 50 ++++++++++++++++++++++++++++++++++++++++++++++- guix/channels.scm | 34 +++++++++++++++++++++++++++++++- 2 files changed, 82 insertions(+), 2 deletions(-)
Toggle diff (131 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex d22c992bb1..c0b527acf0 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -4,6 +4,7 @@ ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2016, 2017 Alex Kost <alezost@gmail.com> ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -64,7 +65,8 @@ specification->location specifications->manifest - generate-package-cache))+ generate-package-cache+ generate-package-metadata-cache)) ;;; Commentary: ;;;@@ -426,6 +428,52 @@ reducing the memory footprint." #:opts '(#:to-file? #t))))) cache-file) +(define %package-metadata-cache-file+ ;; Location of the package metadata cache.+ "/lib/guix/package-metadata.cache")++(define (generate-package-metadata-cache directory)+ "Generate under DIRECTORY a cache of the metadata of all available packages.++The primary purpose of this cache is to speed up package metadata lookup+during package search so that we don't have to traverse and load all the+package modules."+ (define cache-file+ (string-append directory %package-metadata-cache-file))++ (define (package<? p1 p2)+ (string<? (package-full-name p1) (package-full-name p2)))++ (define (expand-cache package result)+ (cons `#(,(package-name package)+ ,(package-version package)+ ,(delete-duplicates+ (map package-full-name+ (sort (filter package? (package-direct-inputs package))+ package<?)))+ ,(package-outputs package)+ ,(package-supported-systems package)+ ,(package-synopsis package)+ ,(package-description package)+ ,(package-home-page package)+ ,(let ((location (package-location package)))+ (list (location-file location)+ (location-line location)+ (location-column location))))+ result))++ (define exp+ (fold-packages expand-cache '()))++ (mkdir-p (dirname cache-file))+ (call-with-output-file cache-file+ (lambda (port)+ (put-bytevector port+ (compile `'(,@exp)+ #:to 'bytecode+ #:opts '(#:to-file? #t)))))+ cache-file)+ (define %sigint-prompt ;; The prompt to jump to upon SIGINT.diff --git a/guix/channels.scm b/guix/channels.scmindex f0261dc2da..c4efaa7300 100644--- a/guix/channels.scm+++ b/guix/channels.scm@@ -2,6 +2,7 @@ ;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2018 Ricardo Wurmus <rekado@elephly.net> ;;; Copyright © 2019 Jan (janneke) Nieuwenhuizen <janneke@gnu.org>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -581,9 +582,40 @@ be used as a profile hook." (hook . package-cache)) #:local-build? #t))) +(define (package-metadata-cache-file manifest)+ "Build a package metadata cache file for the instance in MANIFEST. This is+meant to be used as a profile hook."+ (mlet %store-monad ((profile (profile-derivation manifest+ #:hooks '())))++ (define build+ #~(begin+ (use-modules (gnu packages))++ (if (defined? 'generate-package-metadata-cache)+ (begin+ ;; Delegate package cache generation to the inferior.+ (format (current-error-port)+ "Generating package metadata cache for '~a'...~%"+ #$profile)+ (generate-package-metadata-cache #$output))+ (mkdir #$output))))++ (gexp->derivation-in-inferior "guix-package-metadata-cache" build+ profile++ ;; If the Guix in PROFILE is too old and+ ;; lacks 'guix repl', don't build the cache+ ;; instead of failing.+ #:silent-failure? #t++ #:properties '((type . profile-hook)+ (hook . package-cache))+ #:local-build? #t)))+ (define %channel-profile-hooks ;; The default channel profile hooks.- (cons package-cache-file %default-profile-hooks))+ (cons* package-cache-file package-metadata-cache-file %default-profile-hooks)) (define (channel-instances->derivation instances) "Return the derivation of the profile containing INSTANCES, a list of-- 2.25.1
A
A
Arun Isaac wrote on 27 Mar 17:26 +0100
[PATCH v3 3/3] guix: Use package metadata cache for package search.
(address . 39258@debbugs.gnu.org)
20200327162654.18785-4-arunisaac@systemreboot.net
* guix/scripts/package.scm (process-query): Call search-packages anddisplay-package-search-results instead of find-packages-by-description anddisplay-search-results respectively.* guix/ui.scm (package-metadata->recutils): New function.(%package-metrics): Use package-metadata record field accessors.(package-relevance): Rename argument package to package-metadata.(display-package-search-results): New function.--- guix/scripts/package.scm | 5 +- guix/ui.scm | 132 ++++++++++++++++++++++++++++++++++++--- 2 files changed, 125 insertions(+), 12 deletions(-)
Toggle diff (215 lines)diff --git a/guix/scripts/package.scm b/guix/scripts/package.scmindex 110d4f2977..c11f92f5a2 100644--- a/guix/scripts/package.scm+++ b/guix/scripts/package.scm@@ -7,6 +7,7 @@ ;;; Copyright © 2016 Benz Schenk <benz.schenk@uzh.ch> ;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -770,9 +771,9 @@ processed, #f otherwise." (_ #f)) opts)) (regexps (map (cut make-regexp* <> regexp/icase) patterns))- (matches (find-packages-by-description regexps)))+ (matches (search-packages (current-profile) regexps))) (leave-on-EPIPE- (display-search-results matches (current-output-port)))+ (display-package-search-results matches (current-output-port))) #t)) (('show requested-name)diff --git a/guix/ui.scm b/guix/ui.scmindex 1e24fe5dca..934699f065 100644--- a/guix/ui.scm+++ b/guix/ui.scm@@ -14,6 +14,7 @@ ;;; Copyright © 2019 Chris Marusich <cmmarusich@gmail.com> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr> ;;; Copyright © 2019 Simon Tournier <zimon.toutoune@gmail.com>+;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net> ;;; ;;; This file is part of GNU Guix. ;;;@@ -112,6 +113,7 @@ package-synopsis-string string->recutils package->recutils+ package-metadata->recutils package-specification->name+version+output supports-hyperlinks?@@ -122,6 +124,7 @@ relevance package-relevance display-search-results+ display-package-search-results with-profile-lock string->generations@@ -1484,6 +1487,75 @@ HYPERLINKS? is true, emit hyperlink escape sequences when appropriate." extra-fields) (newline port)) +(define* (package-metadata->recutils p port #:optional (width (%text-width))+ #:key+ (hyperlinks? (supports-hyperlinks? port))+ (extra-fields '()))+ "Write to PORT a `recutils' record of <package-metadata> object P, arranging+to fit within WIDTH columns. EXTRA-FIELDS is a list of symbol/value pairs to+emit. When HYPERLINKS? is true, emit hyperlink escape sequences when+appropriate."+ (define width*+ ;; The available number of columns once we've taken into account space for+ ;; the initial "+ " prefix.+ (if (> width 2) (- width 2) width))++ ;; Note: Don't i18n field names so that people can post-process it.+ (format port "name: ~a~%" (package-metadata-name p))+ (format port "version: ~a~%" (package-metadata-version p))+ (format port "outputs: ~a~%" (string-join (package-metadata-outputs p)))+ (format port "systems: ~a~%"+ (string-join (package-metadata-supported-systems p)))+ (format port "dependencies: ~a~%"+ (string-join (package-metadata-dependencies p) " "))+ (format port "location: ~a~%"+ (or (and=> (package-metadata-location p)+ (if hyperlinks? location->hyperlink location->string))+ (G_ "unknown")))++ ;; Note: Starting from version 1.6 or recutils, hyphens are not allowed in+ ;; field identifiers.+ (format port "homepage: ~a~%" (package-metadata-home-page p))++ ;; TODO: Print license+ ;; (format port "license: ~a~%"+ ;; (match (package-metadata-license p)+ ;; (((? license? licenses) ...)+ ;; (string-join (map license-name licenses)+ ;; ", "))+ ;; ((? license? license)+ ;; (let ((text (license-name license))+ ;; (uri (license-uri license)))+ ;; (if (and hyperlinks? uri (string-prefix? "http" uri))+ ;; (hyperlink uri text)+ ;; text)))+ ;; (x+ ;; (G_ "unknown"))))+ (format port "synopsis: ~a~%"+ (string-map (match-lambda+ (#\newline #\space)+ (chr chr))+ (or (and=> (package-metadata-synopsis p) P_)+ "")))+ (format port "~a~%"+ (string->recutils+ (string-trim-right+ (parameterize ((%text-width width*))+ (texi->plain-text+ (string-append "description: "+ (or (and=> (package-metadata-description p) P_)+ ""))))+ #\newline)))+ (for-each (match-lambda+ ((field . value)+ (let ((field (symbol->string field)))+ (format port "~a: ~a~%"+ field+ (fill-paragraph (object->string value) width*+ (string-length field))))))+ extra-fields)+ (newline port))+ ;;; ;;; Searching.@@ -1528,34 +1600,74 @@ score, the more relevant OBJ is to REGEXPS." (define %package-metrics ;; Metrics used to compute the "relevance score" of a package against a set ;; of regexps.- `((,package-name . 4)+ `((,package-metadata-name . 4) ;; Match against uncommon outputs.- (,(lambda (package)+ (,(lambda (package-metadata) (filter (lambda (output) (not (member output ;; Some common outpus shared by many packages. '("out" "doc" "debug" "lib" "include" "bin"))))- (package-outputs package)))+ (package-metadata-outputs package-metadata))) . 1) ;; Match regexps on the raw Texinfo since formatting it is quite expensive ;; and doesn't have much of an effect on search results.- (,(lambda (package)- (and=> (package-synopsis package) P_)) . 3)- (,(lambda (package)- (and=> (package-description package) P_)) . 2)+ (,(lambda (package-metadata)+ (and=> (package-metadata-synopsis package-metadata) P_)) . 3)+ (,(lambda (package-metadata)+ (and=> (package-metadata-description package-metadata) P_)) . 2) (,(lambda (type)- (match (and=> (package-location type) location-file)+ (match (and=> (package-metadata-location type) location-file) ((? string? file) (basename file ".scm")) (#f ""))) . 1))) -(define (package-relevance package regexps)+(define (package-relevance package-metadata regexps) "Return a score denoting the relevance of PACKAGE for REGEXPS. A score of zero means that PACKAGE does not match any of REGEXPS."- (relevance package regexps %package-metrics))+ (relevance package-metadata regexps %package-metrics))++(define* (display-package-search-results matches port+ #:key+ (command "guix search"))+ "Display MATCHES, a list of <package-metadata>/score pairs. If PORT is a+terminal, print at most a full screen of results."+ (define first-line+ (port-line port))++ (define max-rows+ (and first-line (isatty? port)+ (terminal-rows port)))++ (define (line-count str)+ (string-count str #\newline))++ (let loop ((matches matches))+ (match matches+ (((package-metadata . score) rest ...)+ (let* ((links? (supports-hyperlinks? port))+ (text (call-with-output-string+ (lambda (port)+ (package-metadata->recutils package-metadata port+ #:hyperlinks? links?+ #:extra-fields+ `((relevance . ,score)))))))+ (if (and (not (getenv "INSIDE_EMACS"))+ max-rows+ (> (port-line port) first-line) ;print at least one result+ (> (+ 4 (line-count text) (port-line port))+ max-rows))+ (unless (null? rest)+ (display-hint (format #f (G_ "Run @code{~a ... | less} \+to view all the results.")+ command)))+ (begin+ (display text port)+ (loop rest)))))+ (()+ #t)))) (define* (display-search-results matches port #:key-- 2.25.1
L
L
Ludovic Courtès wrote on 5 Apr 16:08 +0200
Re: [PATCH v3 0/3] Package metadata cache for guix search
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87blo6hvoj.fsf@gnu.org
Hi Arun,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (8 lines)> This is v3 of my attempt to make guix search faster. In this version, I have> abandoned use of xapian. Instead I build a cache of the metadata of all> packages in a profile hook. Then, I use that cache to search and display> search results. This way, package guile modules are not loaded during guix> search.>> Speedup is around 2x. Both measurements below are with a warm cache.
Sorry for the delay! Just to say that I like the approach, and I’lltake a closer look once the release is out…
Thank you!
Ludo’.
L
L
Ludovic Courtès wrote on 24 Apr 22:38 +0200
control message for bug #39258
(address . control@debbugs.gnu.org)
87r1wchb4v.fsf@gnu.org
severity 39258 importantquit
L
L
Ludovic Courtès wrote on 24 Apr 22:48 +0200
Re: [bug#39258] [PATCH v3 1/3] guix: Generate package metadata cache.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87h7x8haor.fsf@gnu.org
Hi Arun,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (5 lines)> * gnu/packages.scm (%package-metadata-cache-file): New variable.> (generate-package-metadata-cache): New function.> * guix/channels.scm (package-metadata-cache-file): New function.> (%channel-profile-hooks): Add package-metadata-cache-file.
This is short and sweet, nice!
Toggle quote (17 lines)> + (define (expand-cache package result)> + (cons `#(,(package-name package)> + ,(package-version package)> + ,(delete-duplicates> + (map package-full-name> + (sort (filter package? (package-direct-inputs package))> + package<?)))> + ,(package-outputs package)> + ,(package-supported-systems package)> + ,(package-synopsis package)> + ,(package-description package)> + ,(package-home-page package)> + ,(let ((location (package-location package)))> + (list (location-file location)> + (location-line location)> + (location-column location))))
I was wondering if we could omit inputs, which are not that useful.
Apart from that it LGTM.
Note that this is probably the place where we could eventually add thecomputation of an inverted index like zimoun suggested inhttps://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html.
Toggle quote (3 lines)> + #:properties '((type . profile-hook)> + (hook . package-cache))
‘package-metadata-cache’, even (it’s for UI purposes).
Nitpick: I’d use “packages:” as the prefix in the subject line.
Thanks,Ludo’.
L
L
Ludovic Courtès wrote on 24 Apr 22:58 +0200
Re: [bug#39258] [PATCH v3 2/3] guix: Search package metadata cache.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
874kt8ha89.fsf@gnu.org
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (3 lines)> * gnu/packages.scm (search-packages): New function.> * guix/packages.scm (<package-metadata>): New record type.
[...]
Toggle quote (7 lines)> +(define (search-packages profile regexps)> + "Return a list of pairs: <package-metadata> objects corresponding to> +packages whose name, synopsis, description, or output matches at least one of> +REGEXPS sorted by relevance, and its non-zero relevance score."> + (define cache-file> + (string-append profile %package-metadata-cache-file))
Here we’re missing something that checks if the cache is authoritativeand falls back to the old method if it’s not, akin to what‘fold-available-packages’ does.
Toggle quote (12 lines)> + (define cache> + (catch 'system-error> + (lambda ()> + (map (match-lambda> + (#(name version dependencies outputs systems> + synopsis description home-page (file line column))> + (make-package-metadata> + name version dependencies outputs systems> + synopsis description home-page> + (location file line column))))> + (load-compiled cache-file)))
I realize the other cache also has that problem, but it would be nice toadd a version tag to the cache. Basically emit something like:
(package-metadata-cache (version 0) VECTOR …)
instead of just:
(VECTOR …)
Toggle quote (16 lines)> +(define-record-type* <package-metadata>> + package-metadata make-package-metadata> + package-metadata?> + this-package-metadata> + (name package-metadata-name)> + (version package-metadata-version)> + (dependencies package-metadata-dependencies)> + (outputs package-metadata-outputs)> + (supported-systems package-metadata-supported-systems)> + (synopsis package-metadata-synopsis)> + (description package-metadata-description)> + ;; TODO: Add license> + ;; (license package-metadata-license)> + (home-page package-metadata-home-page)> + (location package-metadata-location))
I’m not comfortable with this data structure duplication, especiallyright in (guix packages, but I’m not sure it’s avoidable.‘fold-available-packages’ avoids it by passing all the fields asarguments to the fold procedure, I’m not sure if it’s applicable here.
Thanks,Ludo’.
L
L
Ludovic Courtès wrote on 24 Apr 23:03 +0200
Re: [bug#39258] [PATCH v3 3/3] guix: Use package metadata cache for package search.
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87v9lofvg1.fsf@gnu.org
Hello,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (8 lines)> * guix/scripts/package.scm (process-query): Call search-packages and> display-package-search-results instead of find-packages-by-description and> display-search-results respectively.> * guix/ui.scm (package-metadata->recutils): New function.> (%package-metrics): Use package-metadata record field accessors.> (package-relevance): Rename argument package to package-metadata.> (display-package-search-results): New function.
[...]
Toggle quote (9 lines)> +(define* (package-metadata->recutils p port #:optional (width (%text-width))> + #:key> + (hyperlinks? (supports-hyperlinks? port))> + (extra-fields '()))> + "Write to PORT a `recutils' record of <package-metadata> object P, arranging> +to fit within WIDTH columns. EXTRA-FIELDS is a list of symbol/value pairs to> +emit. When HYPERLINKS? is true, emit hyperlink escape sequences when> +appropriate."
I think we should avoid copy/paste of ‘package->recutils’.
How about factorizing by having a common procedure that takes the fieldsas keyword arguments instead of taking a record?
Toggle quote (6 lines)> (define %package-metrics> ;; Metrics used to compute the "relevance score" of a package against a set> ;; of regexps.> - `((,package-name . 4)> + `((,package-metadata-name . 4)
Here we would also need to arrange so that this can apply to both a<package> and <package-metadata> (or whatever), perhaps by defining thetwo sets of metrics at once, or defining the second one by mapping overthe first one.
Thanks,Ludo’.
L
L
Ludovic Courtès wrote on 24 Apr 23:05 +0200
Re: [bug#39258] [PATCH v3 0/3] Package metadata cache for guix search
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87pnbwfvc9.fsf@gnu.org
Hi,
Arun Isaac <arunisaac@systemreboot.net> skribis:
Toggle quote (23 lines)> Speedup is around 2x. Both measurements below are with a warm cache.>> $ time guix search inkscape>> real 0m1.722s> user 0m1.776s> sys 0m0.097s>> $ time /tmp/test/bin/guix search inkscape>> real 0m0.749s> user 0m0.770s> sys 0m0.020s>> This patchset does not affect the search API nor does it improve the relevance> of search results. If there is interest in this approach, I'll complete this> patchset properly. But, in the long run, I do think we should aim to get> xapian or the like for guix search. WDYT?>> Unfortunately, generate-package-metadata-cache takes 43 seconds to build the> cache on my relatively slow computer. Performance should be better on other> people's machines.
43 seconds is a lot. How do these 43 seconds compare to current ‘guixsearch’ on your computer? (Both do roughly the same thing.)
Thanks,Ludo’.
Z
Z
zimoun wrote on 26 Apr 05:54 +0200
benchmark search: default vs v2 vs v3
CAJ3okZ0KP0bYuwShZoNkkFWrhGNPsUGOOfiL5azj_x8Q+GNqLA@mail.gmail.com
Hi,
Thank you Arun for the patches and all the work. Sorryfor the delay.

TLDR:
1) around 25 seconds added to "guix pull"... but I am more than oftenwaiting around 10 minutes when pulling. 2) the speedup is clear: more than 2x.

The question is the tradeoff between: the slowdown of pull vs thespeedup of search. What is acceptable?

Here let benchmark 3 versions of Guix:
- default is a357849f5b - v2 rebased on default and based on Xapian - v3 rebased on default too and based on "custom" index
and let compare the time of "guix pull" and then "guix search".Because v2 uses Xapian, the accuracy is different and so the list ofoutputs is different depending on the query; the impact on theperformance seems minimal. Let discuss elsewhere about accuracy andBM25 and let focus on performance for now.

* guix pull-----------
The idea is: measure if computing the new index is expensive or not,compared to all of what "guix pull" computes.

** Reference------------
Maybe, I should have misconfigured something or my laptop is reallynot powerful at all, but here some numbers.
(Note: /proc/cpuinfo says 4 times Intel(R) Core(TM) i5-5200U CPU @ 2.20GHzand /sys/block/sda/queue/rotational says 0 which is SSD.)
Toggle snippet (8 lines)$ guix describeGeneration 8 Apr 25 2020 09:00:01 (current) guix f84b036 repository URL: https://git.savannah.gnu.org/git/guix.git branch: master commit: f84b0363053e5479464f6ce6ded45f80360d90fc
Toggle snippet (29 lines)$ time guix pull -C ~/.config/guix/default-channels.scmUpdating channel 'guix' from Git repository at'https://git.savannah.gnu.org/git/guix.git'...Building from this channel: guix https://git.savannah.gnu.org/git/guix.git 8cf6d15downloading fromhttps://ci.guix.gnu.org/nar/gzip/xgakzpfs3rz57m666hsk1v3d3zcy7wgn-config.scm... config.scm
[...]
building fonts directory...building directory of Info manuals...building database for manual pages...building profile with 1 package...building /gnu/store/kq1zlj5rxz8wrxc3ha8vck2wv2iakfnb-inferior-script.scm.drv...building package cache...building profile with 1 package...New in this revision: 2 new packages: cl-osicat, sbcl-osicat

real 13m37.997suser 1m38.129ssys 0m0.856s

And because "guix search" is used say 10 times more than "guix pull",an increase of 10% of "guix pull" will ease the experience of the userif "guix search" is faster, IMHO.
Therefore, because "guix pull" takes around 13 minutes, the extra costto index all the packages can be roughly 1min30s (at most).

Then, if I pull back from 8cf6d15 to '--commit=a357849f5b' then it takes:
real 2m13.693suser 1m37.418ssys 0m0.666s
so in this case 10% means around 7s. But after 1 minute waiting, thecommand feels too long to me and personally I am already waiting so Ido not mind much if it would take 2m13s or 3m00s.

Well, it is hard to draw a clear line about what could be accepted asthe time of indexing because the time of pulling is already highlyvariable.

What is the average of "guix pull"?
It could be really interresting to probe the users. They could report: - guix describe - time guix pullwhatever which channels are up.
Just to have an idea about what should be the acceptable extra timeadded by indexing. For sure it depends on the hardware but it wouldprovide an idea and help to see if the extra time is worth or not.
WDYT?


** Let's compare the index time-------------------------------
Let pull for the 3 cases and populate the store by all the necessaryitems. Could be looooonng! (20minutes) For example, for the version2 of patches -- living in my branch 'search-v2' using a worktree.
Toggle snippet (5 lines)time ./pre-inst-env guix pull -p /tmp/v2 \ --url=$PWD --branch=search-v2 \ -C ~/.config/guix/default-channels.scm
and then let spot the index file for each version:
Toggle snippet (10 lines)# ls -l /tmp/default/lib/guix/gnu/store/g5c08vqsv31nkn2r0hr32dbrkhf3cvd8-guix-package-cache
readlink /tmp/v2/lib/guix/package-search.index/gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index/lib/guix/package-search.index
readlink /tmp/v3/lib/guix/package-metadata.cache/gnu/store/8j78b5c4ddic21gcx7wpbq2akjn7x7mr-guix-package-metadata-cache/lib/guix/package-metadata.cache
Well, let remove the profiles and garbage collect the index files:
Toggle snippet (8 lines)rm /tmp/default /tmp/v{2,3}*guix gc -D \ /gnu/store/g5c08vqsv31nkn2r0hr32dbrkhf3cvd8-guix-package-cache \ /gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index \ /gnu/store/8j78b5c4ddic21gcx7wpbq2akjn7x7mr-guix-package-metadata-cache

And then re-run "guix pull". We are now comparing apple to apple, I guess.

| time | default | v2 | v3 ||------+-----------+-----------+-----------|| real | 1m11.899s | 1m30.806s | 1m34.341s || user | 1m23.845s | 1m24.160s | 1m24.233s || sys | 0m0.570s | 0m0.563s | 0m0.529s |

Therefore less than extra 20s and 25s for v2 and v3.

All the question is an extra 25s compared to which time of "guix pull": - more than 13m: adding 25s is acceptable - less than 2m: adding 25s is questionable
Usually, my feeling about "guix pull" is... I am waiting! Therefore,I will not see this extra 25s because it is masked by all the otherwork "guix pull" is doing.

* guix search-------------
Let compare cold (sudo echo 3 > /proc/sys/vm/drop_caches) and warmcache. For example for the query 'inkscape'.

| time | default | v2 | v3 ||------+----------+----------+----------|| real | 0m1.842s | 0m0.331s | 0m0.437s || user | 0m1.270s | 0m0.179s | 0m0.336s || sys | 0m0.142s | 0m0.047s | 0m0.052s ||------+----------+----------+----------|| real | 0m0.898s | 0m0.132s | 0m0.292s || user | 0m1.069s | 0m0.168s | 0m0.353s || sys | 0m0.072s | 0m0.008s | 0m0.019s |

Therefore the speedup is at least 3.
| cache | default-vs-v2 | default-vs-v3 ||-------+---------------+---------------|| cold | 5.6 | 4.2 || warm | 6.8 | 3.1 |

Another query:
Toggle snippet (3 lines)time guix search crypto library | recsel -P name | grep libb2
| time | default | v2 | v3 ||------+----------+----------+----------|| real | 0m2.216s | 0m1.109s | 0m0.689s || user | 0m1.655s | 0m1.309s | 0m0.683s || sys | 0m0.193s | 0m0.073s | 0m0.035s ||------+----------+----------+----------|| real | 0m1.197s | 0m0.490s | 0m0.491s || user | 0m1.448s | 0m0.819s | 0m0.625s || sys | 0m0.089s | 0m0.034s | 0m0.039s |

| cache | default-vs-v2 | default-vs-v3 ||-------+---------------+---------------|| cold | 2.0 | 3.2 || warm | 2.4 | 2.4 |



Before going further, especially about any other more sophisticatedinverted index (BM25), it appears to me important to fix what is"cost" on "guix pull" that the users are ready to pay. Becausesomehow the inverted index has to be computed. And without aninverted index, it seems difficult to improve the accurary.
One solution should be: let compute the inverted index in thebackground with a low priority. If the index is not done yet when"guix search" is called, then fallback to the current defaultbehaviour.

WDYT?

Cheers,simon
P
P
Pierre Neidhardt wrote on 26 Apr 09:29 +0200
877dy2902m.fsf@ambrevar.xyz
Hi Simon,
Thanks for taking the time to benchmark this, this is very insightful!
Toggle quote (4 lines)> Usually, my feeling about "guix pull" is... I am waiting! Therefore,> I will not see this extra 25s because it is masked by all the other> work "guix pull" is doing.
I agree and this is a very good point in my opinion.While I don't expect nor do I need "guix pull" to complete immediately,this is not true of "guix search".
As Simon suggested, maybe we can wrap a benchmark script together, postit on the mailing list and ask member to report their results. Maybea few dozen results would give us a better idea of the numbers we aredealing with.
Cheers!
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl6lOGEACgkQm9z0l6S7zH/HUwf/YH8o+NEQOqbycpcKL4sJe3GXB1HFmwVALSGRggRdguaqBcabj8JygHjx/Oi0n47C69K27bJWsdu1BvfmHHdmmXtHIWIvMl4t1o5NT1Z11sDw7cUL4FDnfJ9YqV1gS0qVIfx5ccZAbtSSeAhg31IrrzmgXbfBIPksWIWzCF8US6s2ftSh3/QkbJnL2AodteVHlm9rKaAQPJVQGcRVka7VdaDaH2zUXs1w6e5Q6cp/71oeoeok+QtAnqg57jwZceCUu3OA3sMCzabMAadnzC9w/xPJI5knR6Lb5HqtQla4slHgYBTVLtNdXxol1itksg9vevWgqqwzHdJFr+jADoe1Ew===prUD-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 26 Apr 11:48 +0200
Re: [bug#39258] [PATCH v3 1/3] guix: Generate package metadata cache.
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ1M=xvMS7ZQpTV5PgPKn83skLx7Vn6ty7ezY1MwsA=qaw@mail.gmail.com
On Fri, 24 Apr 2020 at 22:48, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (19 lines)> > + (define (expand-cache package result)> > + (cons `#(,(package-name package)> > + ,(package-version package)> > + ,(delete-duplicates> > + (map package-full-name> > + (sort (filter package? (package-direct-inputs package))> > + package<?)))> > + ,(package-outputs package)> > + ,(package-supported-systems package)> > + ,(package-synopsis package)> > + ,(package-description package)> > + ,(package-home-page package)> > + ,(let ((location (package-location package)))> > + (list (location-file location)> > + (location-line location)> > + (location-column location))))>> I was wondering if we could omit inputs, which are not that useful.
Agree.

Toggle quote (4 lines)> Note that this is probably the place where we could eventually add the> computation of an inverted index like zimoun suggested in> <https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>.
We should first agree on the extra cost (time) we are ready to pay tobuild improvements.See the lengthy message [1] about only the caching "inverted index"using the current 'relevance' scoring function.
[1] http://issues.guix.gnu.org/39258#78


Cheers,simon
L
L
Ludovic Courtès wrote on 26 Apr 16:35 +0200
(name . zimoun)(address . zimon.toutoune@gmail.com)
87r1wa48n0.fsf@gnu.org
Hi Simon,
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (2 lines)> On Fri, 24 Apr 2020 at 22:48, Ludovic Courtès <ludo@gnu.org> wrote:
[...]
Toggle quote (7 lines)>> Note that this is probably the place where we could eventually add the>> computation of an inverted index like zimoun suggested in>> <https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00243.html>.>> We should first agree on the extra cost (time) we are ready to pay to> build improvements.
It’s complicated. As it stands, I’d rather not add overhead to ‘guixpull’, especially since current ‘guix search’ on my SSD is fast enoughand can hardly be made any faster.
Realistically though, I understand that things are different on slowermachines and/or spinning disks. That’s why I’m interested in seeing howArun’s proposed changes can affect such machines.
If, as a bonus, it allows us to have an inverted index and thus improvethe quality of search results, that’s great!
Thanks,Ludo’.
P
P
Pierre Neidhardt wrote on 26 Apr 16:54 +0200
87r1wa70wj.fsf@ambrevar.xyz
Hi Ludo!
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (4 lines)> It’s complicated. As it stands, I’d rather not add overhead to ‘guix> pull’, especially since current ‘guix search’ on my SSD is fast enough> and can hardly be made any faster.
The question is, what is fast enough? I have an NVMe here that has athroughput of some 2GB/s, and yet
Toggle snippet (6 lines)time guix search emacs > /dev/nullreal 0m1.545suser 0m1.938ssys 0m0.080s
on a hot cache, which is too slow in my opinion :p
Mildly impatient users might be slightly discouraged from iteratingsearch queries.
It also makes `guix search` very impractical to use in (non-guile)script. Which is too bad considering that the recsel-formatting makes`guix search` a very good candidate for scripting.
Cheers!
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl6loKwACgkQm9z0l6S7zH8XxAf+N5eNs9ZFVdD/bZ6EjRN8lm4vH3L8FobMWtGptX86qlvgayEwferhobfEogycDjCFVp0pXmYw2O60T9mY5lQn6cghfJS2zZqjFrgLMo16PtGh7sQVu28AMQOqy2j1plUYkXlRPyw0qQRONjKPaLxZ9p0fBtnQubFvjTAXMTf28TSHX+ZJfE5/ESvdcgZ7y4nMkYrBXqdBKyWLJ3RNqyeLCgY0TeeE3blYHu4iMv+2xzUIDLy1rsg5Xlb0ttsIXAnmVBrx4nSCG4Kj7irGW6YTlg0PC+73Dt2VU4yuT7CFUEpApA7f88D1U4vspQuJDRuwnQrC4toal1EYLedup5beLw===B9Hq-----END PGP SIGNATURE-----
Z
Z
zimoun wrote on 26 Apr 17:05 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ1p5OE6451Ze+qSn2q7Xh-U5b7oqMbbuBkGxWM+tQ-GYQ@mail.gmail.com
Hi Ludo,
On Sun, 26 Apr 2020 at 16:35, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (4 lines)> Realistically though, I understand that things are different on slower> machines and/or spinning disks. That’s why I’m interested in seeing how> Arun’s proposed changes can affect such machines.
I understand. I have done a small benchmark [1] of the 3 ways: thecurrent, the v2 using Xapian (which is not an option on the long term)and the v3.
My "slower" machine is at my office... but it provides alreadyinteresting numbers, IMHO.
[1] http://issues.guix.gnu.org/39258#78

Toggle quote (3 lines)> If, as a bonus, it allows us to have an inverted index and thus improve> the quality of search results, that’s great!
This "issue" is: any improvement on both sides performance andaccuracy would add an somehow extra cost. The question is what is themaximum users would accept to pay for?

Well, it is complicated as you said. :-)A trade off between extra cost, maintenance, complexity, etc is noteasy to draw, as you said too elsewhere.I am seeing all that as experimental: explore ideas to see if they areworth or not.And what should be concluded now could change in the (near) future;for example if the computations of derivations are faster, resultingon "guix pull" faster, etc..

Cheer,simon
L
L
Ludovic Courtès wrote on 26 Apr 17:33 +0200
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87368q45ym.fsf@gnu.org
Hey!
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (14 lines)> Ludovic Courtès <ludo@gnu.org> writes:>>> It’s complicated. As it stands, I’d rather not add overhead to ‘guix>> pull’, especially since current ‘guix search’ on my SSD is fast enough>> and can hardly be made any faster.>> The question is, what is fast enough? I have an NVMe here that has a> throughput of some 2GB/s, and yet>> time guix search emacs > /dev/null> real 0m1.545s> user 0m1.938s> sys 0m0.080s
That accounts for the time to render 864 entries:
$ guix search emacs| grep ^name| wc -l 864
Compare with:
$ time guix search emacs | head -100 > /dev/null
real 0m0.674s user 0m0.802s sys 0m0.048s
Again, this is not to say it cannot be improved, but it’s quite achallenge to do better on such hardware.
Though as discussed with Arun, there may be low-hanging optimizationfruits in Texinfo parsing and rendering. I guess we need to go aheadfire up statprof now. :-)
Ludo’.
L
L
Ludovic Courtès wrote on 26 Apr 17:49 +0200
Re: benchmark search: default vs v2 vs v3
(name . zimoun)(address . zimon.toutoune@gmail.com)
87lfmi2qod.fsf@gnu.org
Hi,
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (4 lines)> 1) around 25 seconds added to "guix pull"... but I am more than often> waiting around 10 minutes when pulling.> 2) the speedup is clear: more than 2x.
Nice!
It does seem like Arun’s v3 (or maybe even v2) would work nicely.
Toggle quote (3 lines)> The question is the tradeoff between: the slowdown of pull vs the> speedup of search. What is acceptable?
That’s only one criterion among others. I hear the argument that 25s is“nothing” compared to the rest, but it’s really a tradeoff. Like, if Ispent a day optimizing ‘guix pull’ and managed to save 25s, I would findit nice. :-)
Toggle quote (2 lines)> $ time guix pull -C ~/.config/guix/default-channels.scm
It also depends on what’s in that file, of course.
Toggle quote (6 lines)> Then, if I pull back from 8cf6d15 to '--commit=a357849f5b' then it takes:>> real 2m13.693s> user 1m37.418s> sys 0m0.666s
For me:
Toggle snippet (18 lines)$ guix describeGeneracio 139 Apr 13 2020 21:50:08 (nuna) guix bad368b repository URL: https://git.savannah.gnu.org/git/guix.git branch: master commit: bad368b0d794689f3a8a11b58f1ea4987938682e$ time guix pull -p /tmp/test --commit=bad368b0d794689f3a8a11b58f1ea4987938682eUpdating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...Building from this channel: guix https://git.savannah.gnu.org/git/guix.git bad368b
[...]
real 0m57.916suser 1m1.017ssys 0m0.609s
(On a 2.6 GHz i7 though.)
Toggle quote (8 lines)> Well, let remove the profiles and garbage collect the index files:>> rm /tmp/default /tmp/v{2,3}*> guix gc -D \> /gnu/store/g5c08vqsv31nkn2r0hr32dbrkhf3cvd8-guix-package-cache \> /gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index \> /gnu/store/8j78b5c4ddic21gcx7wpbq2akjn7x7mr-guix-package-metadata-cache
Could you do, for v2 and v3:
time guix build /gnu/store/…-package-metadata-cache.drv --check
?
That we’ll give us the exact cost of that part. It’ll be interestingespecially in the Xapian case, which we expected to be higher.
Thanks for the insightful benchmarks!
Ludo’.
Z
Z
zimoun wrote on 26 Apr 19:01 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ1AeYdprjmntvSYOq1-NNyvibJOatwPm20WVNfkm-RxkA@mail.gmail.com
Hi Ludo,
On Sun, 26 Apr 2020 at 17:49, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (2 lines)> It does seem like Arun’s v3 (or maybe even v2) would work nicely.
The v3 is more interesting because it does not change the relevancescoring and does not add other dependency.However v2 is interesting to easily test BM25 which is anotherrelevance scoring... work in progress. :-)

Toggle quote (8 lines)> > The question is the tradeoff between: the slowdown of pull vs the> > speedup of search. What is acceptable?>> That’s only one criterion among others. I hear the argument that 25s is> “nothing” compared to the rest, but it’s really a tradeoff. Like, if I> spent a day optimizing ‘guix pull’ and managed to save 25s, I would find> it nice. :-)
And I expect that the middle-term roadmap would even decrease more thecomputations of derivations. ;-)


Toggle quote (4 lines)> > $ time guix pull -C ~/.config/guix/default-channels.scm>> It also depends on what’s in that file, of course.
Contains only one line: %default-channels
See my wishlist ;-)https://lists.gnu.org/archive/html/guix-devel/2020-04/msg00393.html


me: 2m13.693syou: 0m57.916s
As we already discussed elsewhere, it is hard to "test" 'guix pull'.Does it make sense to measure "guix pull"? As Chris (Marusich) did forCDN.

Toggle quote (12 lines)> > Well, let remove the profiles and garbage collect the index files:> >> > rm /tmp/default /tmp/v{2,3}*> > guix gc -D \> > /gnu/store/g5c08vqsv31nkn2r0hr32dbrkhf3cvd8-guix-package-cache \> > /gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index \> > /gnu/store/8j78b5c4ddic21gcx7wpbq2akjn7x7mr-guix-package-metadata-cache>> Could you do, for v2 and v3:>> time guix build /gnu/store/…-package-metadata-cache.drv --check
Newbie me! :-)
Two points:
1. It may not be reproducible... I am checking. 2. The time seems similar (v2=26s and v3=29s) considering the timeto start Guile and so on.
Toggle snippet (23 lines)guix gc --list-live | grep metadatatime /tmp/v3/bin/guix build/gnu/store/jxs0abica8kjz1ppym95df97jk0qa9by-guix-package-metadata-cache.drv--checkThe following profile hook will be built: /gnu/store/jxs0abica8kjz1ppym95df97jk0qa9by-guix-package-metadata-cache.drvbuilding package cache...(repl-version 0 1 1)Generating package metadata cache for'/gnu/store/95mi525syinh08jmcd3q7a7a8mr1sykb-profile'...(values (value "/gnu/store/zhp7wv87vr6iis0fa3ff925i5r04i08q-guix-package-metadata-cache/lib/guix/package-metadata.cache"))guix build: error: derivation`/gnu/store/jxs0abica8kjz1ppym95df97jk0qa9by-guix-package-metadata-cache.drv'may not be deterministic: output`/gnu/store/zhp7wv87vr6iis0fa3ff925i5r04i08q-guix-package-metadata-cache'differs
real 0m29.788suser 0m0.535ssys 0m0.025s

Toggle quote (3 lines)> That we’ll give us the exact cost of that part. It’ll be interesting> especially in the Xapian case, which we expected to be higher.
Toggle snippet (21 lines)time /tmp/v2/bin/guix build/gnu/store/w0dhl2n3ngi4v2ld8lprkqjl1g1q2m4p-guix-package-search-index.drv--checkThe following profile hook will be built: /gnu/store/w0dhl2n3ngi4v2ld8lprkqjl1g1q2m4p-guix-package-search-index.drvrunning profile hook of type 'package-search-index'...(repl-version 0 1 1)Generating package search index for'/gnu/store/wiinj9nrb45wlf2cgbgkjl9chxz9cb9b-profile'...(values (value "/gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index/lib/guix/package-search.index"))guix build: error: derivation`/gnu/store/w0dhl2n3ngi4v2ld8lprkqjl1g1q2m4p-guix-package-search-index.drv'may not be deterministic: output`/gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index'differs
real 0m26.552suser 0m0.626ssys 0m0.046s
It is not higher. Why should it be?

Considering aside the issue of reproducibility -- which should be one!-- well, should be possible to download the index file as any othersubstitute?

Cheers,simon
L
L
Ludovic Courtès wrote on 26 Apr 22:22 +0200
(name . zimoun)(address . zimon.toutoune@gmail.com)
87pnbuyp25.fsf@gnu.org
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (12 lines)>> Could you do, for v2 and v3:>>>> time guix build /gnu/store/…-package-metadata-cache.drv --check>> Newbie me! :-)>> Two points:>> 1. It may not be reproducible... I am checking.> 2. The time seems similar (v2=26s and v3=29s) considering the time> to start Guile and so on.
Good, so it means that’s not Xapian taking time here.
Thanks again!
Ludo’.
Z
Z
zimoun wrote on 30 Apr 15:10 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ04SYqisveQf5QFNx57s1bZ5iCGu7Szmx1drF5+-S_5Ng@mail.gmail.com
Hi Ludo,
On Sun, 26 Apr 2020 at 17:49, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (5 lines)> That’s only one criterion among others. I hear the argument that 25s is> “nothing” compared to the rest, but it’s really a tradeoff. Like, if I> spent a day optimizing ‘guix pull’ and managed to save 25s, I would find> it nice. :-)
I am not sure to understand all what "guix pull" does.Does "guix pull" compile all the scheme files under 'gnu/'? Probablyonly recompiles the "new" files?
I do not know if it makes sense, but I just note this difference:
1. Search without compiling of all files under 'gnu/packages/' 2. Compile all the files under 'gnu/packages/' then search 3. Search with only the file gnu/packages/emacs-xyz.scm not compiled(all the other files are compiled) 4. Compile the file above and then search
3b and 4b with gnu/packages/cobol.scm which is smaller than emacs-xyz.scm.

Results:
1) 1m43.312s2) 0m1.301s (but 9m51.801s compiling)
3) 0m6.526s4) 0m1.389s (1m8.670s compiling)
3b) 0m0.921s4b) 0m0.924s (0m1.884s compiling)
Therefore, an option to reduce the time when pulling should to relaxthe "compilation" for 'gnu/packages/' and 'gnu/services'; somethingless optimized since the packages and services "just" need to betransformed into bytecode to improve IO when reading them. Perhaps Imiss a point...
And maybe, it is similar than what Andy Wingo is proposing in [1].
[1] https://lists.gnu.org/archive/html/guix-devel/2020-04/msg00444.html

Cheers,simon
Toggle snippet (16 lines)find gnu/packages -name "*.scm" -type f -exec touch {} \;time ./pre-inst-env guix search gmsh | recsel -C -p name
;;; note: source file /home/simon/src/guix/wk/tmp/gnu/packages/abduco.scm;;; newer than compiled /home/simon/src/guix/wk/tmp/gnu/packages/abduco.go
[...]
;;; note: source file /home/simon/src/guix/wk/tmp/gnu/packages/zwave.scm;;; newer than compiled /home/simon/src/guix/wk/tmp/gnu/packages/zwave.goname: gmsh
real 1m43.312suser 2m19.318s
Toggle snippet (32 lines)find gnu/packages -name "*.scm" -type f -exec touch {} \;time make -j4 && time ./pre-inst-env guix search gmsh | recsel -C -p name
make all-recursivemake[1]: Entering directory '/home/simon/src/guix/wk/tmp'Making all in po/guixmake[2]: Entering directory '/home/simon/src/guix/wk/tmp/po/guix'make[2]: Leaving directory '/home/simon/src/guix/wk/tmp/po/guix'Making all in po/packagesmake[2]: Entering directory '/home/simon/src/guix/wk/tmp/po/packages'make[2]: Leaving directory '/home/simon/src/guix/wk/tmp/po/packages'make[2]: Entering directory '/home/simon/src/guix/wk/tmp'Compiling Scheme modules...[ 0%] LOAD gnu/packages/abduco.scm;;; note: source file ./gnu/packages/abduco.scm;;; newer than compiled /home/simon/src/guix/wk/tmp/gnu/packages/abduco.go
[...]
[100%] GUILEC gnu/packages/zwave.gomake[2]: Leaving directory '/home/simon/src/guix/wk/tmp'make[1]: Leaving directory '/home/simon/src/guix/wk/tmp'
real 9m51.801suser 29m18.938ssys 0m5.822sname: gmsh
real 0m1.301suser 0m1.266ssys 0m0.101s
Z
Z
zimoun wrote on 3 May 17:01 +0200
[PATCH v4 0/3] Faster cache generation (similar as v3)
(address . 39258@debbugs.gnu.org)
20200503150154.26532-1-zimon.toutoune@gmail.com
Dear,
The aim of this version v4 is to keep the same searching performances as the previous version v3 but to drastically reduce the generation of the cache. On my laptop, the overhead is now 4 seconds; compared to more than 20 seconds for v2 and v3.
Toggle snippet (6 lines)# defaulttime guix build /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check# v4time guix build /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check
| | default | v4 ||------+----------+-----------|| real | 0m6.012s | 0m10.244s || user | 0m0.541s | 0m0.542s || sys | 0m0.033s | 0m0.032s |

In the version v3, the cache is built using 'cons' and 'fold-packages' (wrapper to 'fold-module-public-variables'). The version v4 modifies -- by adding other information -- the function 'generate-package-cache' which uses 'vhash' and 'fold-module-public-variables*'.
Therefore the cache '/lib/guix/package.cache' contains more information. (The v4 structure of 'package.cache' is a quick draft, so details should be discussed and an interesting move should to have a structured (binary and all strings) S-exp; because it should become an entry point to export the packages list to JSON. WDYT?)

Now, we are comparing apples to apples and the cost to compute BM25 (v2) is not free at all. Remember that BM25 is the state-of-the-art of information retrieval (relevance ranking) and it is delegated to Xapian (v2). I do not know if there is perfomance bottleneck between Guix, Guile-Xapian and Xapian itself but for sure the computation of BM25 is not free. More about that soon.
To be clear about BM25 and caching, what I have in mind is: 1. "guix search --build-index" optionally done by the user if they wants for example the BM25 ranking. 2. Use BM25 metrics to detect poor package meta-data (synopsis and description); if it worth why not add another checker to "guix lint".
However, ranking is another story and I am not convinced yet if BM25 fits Guix needs or not.


* Details~~~~~~~~~
The pacthes applies against the commit a357849f5b (and it is not yet rebased).
Toggle snippet (4 lines)time ./pre-env-inst guix pull --branch=search-v4 --url=$PWD -p /tmp/v4

Similar test than the previous benchmark (cold cache).
Toggle snippet (9 lines)time ./pre-env-inst /tmp/v4/bin/guix search crypto library \ | recsel -P name | grep libb2name: libb2
real 0m0.784suser 0m0.810ssys 0m0.037s
And the option '--load-path' turns off the cache and it fallbacks to the usual 'fold-package'.
Toggle snippet (11 lines)time ./pre-inst-env /tmp/v4/bin/guix search -L /tmp/my-pkgs crypto library \ | recsel -C -p name | grep libb2name: libb2
real 0m2.446suser 0m1.872ssys 0m0.187s


* Still draft~~~~~~~~~~~~~
1. The name of 'fold-packages*' should be misleading since it does not return "true" packages.
Toggle snippet (11 lines)(define get-hello (p r) (if (string=? (package-name p) "hello") p r))(define no-cache (fold-packages get-hello '()))(define from-cache (fold-packages* get-hello '()))
(equal? no-cache from-cache);;; #f
Another name for the procedure is welcome if it is an issue.
2. The function 'package->recutils' in 'guix/ui.scm' is modified but it is not the better.
Toggle snippet (7 lines) (match (package-supported-systems p) (('cache supported-systems) (string-join supported-systems)) (_ (string-join (package-transitive-supported-systems p)))))
However it avoids to duplicate code; as it is done in version v3.

3. Deprecated packages are displayed (bug in v3 too).
4. Impolite '@@' is used to access the private license construction.
5. Commit messages are incomplete, copyright header too, etc..


* Next?~~~~~~~
IMHO, simply caching improves the current situation:
- a bit of extra time at pull time (less than 5s on my machine) + speed up at search time (2x faster) * maintainable code?
Is it in the right direction?Could you advise for a more compliant code?Could you test on your machines to have another point of comparison?


Best regards,simon

zimoun (3): DRAFT packages: Add fields to packages cache. DRAFT packages: Add new procedure 'fold-packages*'. DRAFT guix package: Use cache in 'find-packages-by-description'.
gnu/packages.scm | 98 ++++++++++++++++++++++++++++++++++++++-- guix/scripts/package.scm | 2 +- guix/ui.scm | 29 +++++++----- tests/packages.scm | 31 +++++++++++++ 4 files changed, 143 insertions(+), 17 deletions(-)
-- 2.26.1
Z
Z
zimoun wrote on 3 May 17:01 +0200
[PATCH v4 1/3] DRAFT packages: Add fields to packages cache.
(address . 39258@debbugs.gnu.org)
20200503150154.26532-2-zimon.toutoune@gmail.com
--- gnu/packages.scm | 51 +++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 46 insertions(+), 5 deletions(-)
Toggle diff (99 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex d22c992bb1..fa18f81487 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -33,6 +33,8 @@ #:use-module (guix profiles) #:use-module (guix describe) #:use-module (guix deprecation)+ #:use-module (guix build-system)+ #:use-module (guix licenses) #:use-module (ice-9 vlist) #:use-module (ice-9 match) #:use-module (ice-9 binary-ports)@@ -212,7 +214,8 @@ package module." (match vector (#(name version module symbol outputs supported? deprecated?- file line column)+ file line column+ _ _ _ _ _ _ _ _ _ _) (proc name version result #:outputs outputs #:location (and file@@ -269,7 +272,11 @@ package names. Return #f on failure." (match item (#(name version module symbol outputs supported? deprecated?- file line column)+ file line column+ synopsis description home-page+ build-system-name build-system-description+ supported-systems direct-inputs+ license-name license-uri license-comment) (vhash-cons name item vhash)))) vlist-null lst))@@ -316,7 +323,8 @@ decreasing version order." (if (and (cache-is-authoritative?) cache) (match (cache-lookup cache name) (#f #f)- ((#(_ versions modules symbols _ _ _ _ _ _) ...)+ ((#(_ versions modules symbols _ _ _ _ _ _+ _ _ _ _ _ _ _ _ _ _) ...) (fold (lambda (version* module symbol result) (if (or (not version) (version-prefix? version version*))@@ -339,7 +347,8 @@ matching NAME and VERSION." (#f '()) ((#(name versions modules symbols outputs supported? deprecated?- files lines columns) ...)+ files lines columns+ _ _ _ _ _ _ _ _ _ _) ...) (fold (lambda (version* file line column result) (if (and file (or (not version)@@ -401,7 +410,39 @@ reducing the memory footprint." `(,(location-file loc) ,(location-line loc) ,(location-column loc))- '(#f #f #f))))+ '(#f #f #f)))++ ,(package-synopsis package)+ ,(package-description package)+ ,(package-home-page package)++ ,@(let ((build-system+ (package-build-system package)))+ `(,(symbol->string+ (build-system-name build-system))+ ,(build-system-description build-system)))++ ,(package-transitive-supported-systems package)++ ,(delete-duplicates+ (sort (map package-full-name+ (match (package-direct-inputs package)+ (((labels inputs . _) ...)+ (filter package? inputs))))+ string<?))++ ,@(match (package-license package)+ (((? license? licenses) ...) ; multilicenses+ `(,(string-join (map license-name licenses)+ ", ")+ ,(license-uri (car licenses)) ;TODO: names>uris?+ ;; see gpl1+ comment #f+ ,(license-comment (car licenses))))+ ((? license? license)+ `(,(license-name license)+ ,(license-uri license)+ ,(license-comment license)))+ (_ '(#f #f #f)))) result) (vhash-consq package #t seen)))))) (_-- 2.26.1
Z
Z
zimoun wrote on 3 May 17:01 +0200
[PATCH v4 2/3] DRAFT packages: Add new procedure 'fold-packages*'.
(address . 39258@debbugs.gnu.org)
20200503150154.26532-3-zimon.toutoune@gmail.com
--- gnu/packages.scm | 47 ++++++++++++++++++++++++++++++++++++++++++++++ guix/ui.scm | 29 +++++++++++++++++----------- tests/packages.scm | 31 ++++++++++++++++++++++++++++++ 3 files changed, 96 insertions(+), 11 deletions(-)
Toggle diff (157 lines)diff --git a/gnu/packages.scm b/gnu/packages.scmindex fa18f81487..a0c5835b8b 100644--- a/gnu/packages.scm+++ b/gnu/packages.scm@@ -55,6 +55,7 @@ fold-packages fold-available-packages+ fold-packages* find-newest-available-packages find-packages-by-name@@ -253,6 +254,52 @@ is guaranteed to never traverse the same package twice." init modules)) +(define (fold-packages* proc init)+ "Fold (PROC PACKAGE RESULT) over the list of available packages. When a+package cache is available, this procedure does not actually load any package+module. Moreover when package cache is available, this procedure+re-constructs a new package skipping some package record field. The usage of+this procedure is User Interface (ui) only."+ (define cache+ (load-package-cache (current-profile)))++ (define license (@@ (guix licenses) license))++ (if (and cache (cache-is-authoritative?))+ (vhash-fold (lambda (name vector result)+ (match vector+ (#(name version module symbol outputs+ supported? deprecated?+ file line column+ synopsis description home-page+ build-system-name build-system-description+ supported-systems direct-inputs+ license-name license-uri license-comment)+ (proc (package+ (name name)+ (version version)+ (source #f) ;TODO: ?+ (build-system+ (build-system+ (name (string->symbol build-system-name))+ (description build-system-description)+ (lower #f))) ; never used by ui+ (inputs ; list of "full-name@version"+ (list 'cache direct-inputs))+ (outputs outputs)+ (synopsis synopsis)+ (description description)+ (license (license+ license-name license-uri license-comment))+ (home-page home-page)+ (supported-systems (list 'cache supported-systems))+ (location (location+ file line column)))+ result))))+ init+ cache)+ (fold-packages proc init)))+ (define %package-cache-file ;; Location of the package cache. "/lib/guix/package.cache")diff --git a/guix/ui.scm b/guix/ui.scmindex 1e24fe5dca..257d119798 100644--- a/guix/ui.scm+++ b/guix/ui.scm@@ -1416,13 +1416,10 @@ HYPERLINKS? is true, emit hyperlink escape sequences when appropriate." ;; the initial "+ " prefix. (if (> width 2) (- width 2) width)) - (define (dependencies->recutils packages)- (let ((list (string-join (delete-duplicates- (map package-full-name- (sort packages package<?))) " ")))- (string->recutils- (fill-paragraph list width*- (string-length "dependencies: ")))))+ (define (dependencies->string packages)+ (string-join (delete-duplicates+ (map package-full-name+ (sort packages package<?))) " ")) (define (package<? p1 p2) (string<? (package-full-name p1) (package-full-name p2)))@@ -1432,11 +1429,21 @@ HYPERLINKS? is true, emit hyperlink escape sequences when appropriate." (format port "version: ~a~%" (package-version p)) (format port "outputs: ~a~%" (string-join (package-outputs p))) (format port "systems: ~a~%"- (string-join (package-transitive-supported-systems p)))+ (match (package-supported-systems p)+ (('cache supported-systems)+ (string-join supported-systems))+ (_+ (string-join (package-transitive-supported-systems p))))) (format port "dependencies: ~a~%"- (match (package-direct-inputs p)- (((labels inputs . _) ...)- (dependencies->recutils (filter package? inputs)))))+ (let ((dependencies+ (match (package-direct-inputs p)+ (('cache inputs)+ (string-join inputs))+ (((labels inputs . _) ...)+ (dependencies->string (filter package? inputs))))))+ (string->recutils+ (fill-paragraph dependencies width*+ (string-length "dependencies: "))))) (format port "location: ~a~%" (or (and=> (package-location p) (if hyperlinks? location->hyperlink location->string))diff --git a/tests/packages.scm b/tests/packages.scmindex 7a8b5e4a2d..4504f6cf33 100644--- a/tests/packages.scm+++ b/tests/packages.scm@@ -1169,6 +1169,37 @@ ((one) (eq? one guile-2.0)))) +(test-assert "fold-packages* hello with/without cache"+ (let ()+ (define (equal-package? p1 p2)+ ;; fold-package* re-constructs a new package skipping 'source' and 'lower'+ ;; so equal? does not apply+ (and (equal? (package-full-name p1) (package-full-name p2))+ (equal? (package-description p1) (package-description p2))))++ (define no-cache+ (fold-packages* (lambda (p r)+ (if (string=? (package-name p) "hello")+ p+ r))+ #f))++ (define from-cache+ (call-with-temporary-directory+ (lambda (cache)+ (generate-package-cache cache)+ (mock ((guix describe) current-profile (const cache))+ (mock ((gnu packages) cache-is-authoritative? (const #t))+ (fold-packages* (lambda (p r)+ (if (string=? (package-name p) "hello")+ p+ r))+ #f))))))++ (and (equal? no-cache hello)+ (equal-package? from-cache hello)+ (equal-package? no-cache from-cache))))+ (test-assert "fold-available-packages with/without cache" (let () (define no-cache-- 2.26.1
Z
Z
zimoun wrote on 3 May 17:01 +0200
[PATCH v4 3/3] DRAFT guix package: Use cache in 'find-packages-by-description'.
(address . 39258@debbugs.gnu.org)
20200503150154.26532-4-zimon.toutoune@gmail.com
--- guix/scripts/package.scm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Toggle diff (15 lines)diff --git a/guix/scripts/package.scm b/guix/scripts/package.scmindex badb1dcd38..6b982eb172 100644--- a/guix/scripts/package.scm+++ b/guix/scripts/package.scm@@ -174,7 +174,7 @@ hooks\" run when building the profile." "Return a list of pairs: packages whose name, synopsis, description, or output matches at least one of REGEXPS sorted by relevance, and its non-zero relevance score."- (let ((matches (fold-packages (lambda (package result)+ (let ((matches (fold-packages* (lambda (package result) (if (package-superseded package) result (match (package-relevance package-- 2.26.1
L
L
Ludovic Courtès wrote on 3 May 18:43 +0200
Re: [PATCH v4 0/3] Faster cache generation (similar as v3)
(name . zimoun)(address . zimon.toutoune@gmail.com)
87r1w1ynnm.fsf@gnu.org
Hello!
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (13 lines)> The aim of this version v4 is to keep the same searching performances as the previous version v3 but to drastically reduce the generation of the cache. On my laptop, the overhead is now 4 seconds; compared to more than 20 seconds for v2 and v3.>> # default> time guix build /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check> # v4> time guix build /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check>> | | default | v4 |> |------+----------+-----------|> | real | 0m6.012s | 0m10.244s |> | user | 0m0.541s | 0m0.542s |> | sys | 0m0.033s | 0m0.032s |
Not bad!
Toggle quote (5 lines)> In the version v3, the cache is built using 'cons' and 'fold-packages' (wrapper to 'fold-module-public-variables'). The version v4 modifies -- by adding other information -- the function 'generate-package-cache' which uses 'vhash' and 'fold-module-public-variables*'.>> Therefore the cache '/lib/guix/package.cache' contains more> information.
This breaks the binary interface, so we’ll have to analyze the impact ofsuch a change and devise a strategy.
Toggle quote (5 lines)> (The v4 structure of 'package.cache' is a quick draft, so details> should be discussed and an interesting move should to have a> structured (binary and all strings) S-exp; because it should become an> entry point to export the packages list to JSON. WDYT?)
It’s on purpose that this cache is an object file: it just needs to bemmap’d, and that’s it. It’s the cheapest possible way to do it.Parsing sexps would be more costly, and since we’re talking aboutstartup time, this is sensitive.
Toggle quote (5 lines)> Now, we are comparing apples to apples and the cost to compute BM25 (v2) is not free at all. Remember that BM25 is the state-of-the-art of information retrieval (relevance ranking) and it is delegated to Xapian (v2). I do not know if there is perfomance bottleneck between Guix, Guile-Xapian and Xapian itself but for sure the computation of BM25 is not free. More about that soon.>> To be clear about BM25 and caching, what I have in mind is:> 1. "guix search --build-index" optionally done by the user if they wants for example the BM25 ranking.
Something that must be done explicitly doesn’t seem great to me. As auser, I’d rather not think about search indexes and all. But I don’tknow, maybe if it happened automatically on the first ‘guix search’invocation that’d be fine.
Toggle quote (2 lines)> 2. Use BM25 metrics to detect poor package meta-data (synopsis and description); if it worth why not add another checker to "guix lint".
That’d be interesting!
Toggle quote (2 lines)> 1. The name of 'fold-packages*' should be misleading since it does not return "true" packages.
Did you see ‘fold-available-packages’? It seems you could extend itinstead of introducing ‘fold-packages*’, no?
Toggle quote (10 lines)> 2. The function 'package->recutils' in 'guix/ui.scm' is modified but it is not the better.>> (match (package-supported-systems p)> (('cache supported-systems)> (string-join supported-systems))> (_> (string-join (package-transitive-supported-systems p)))))>> However it avoids to duplicate code; as it is done in version v3.
I made suggestions to Arun’s v3 about the API here. Essentially, Ithink I proposed having a procedure that takes the list of fields askeyword parameters, and ‘package->recutils’ would just delegate to that.

Toggle quote (4 lines)> 3. Deprecated packages are displayed (bug in v3 too).>> 4. Impolite '@@' is used to access the private license construction.
(guix licenses) could provide a ‘string->license’ procedure.
Stopping here for now because I’m sorta drowning in patch review. :-)
Thanks for exploring this design space, we’re making progress!
Ludo’.
Z
Z
zimoun wrote on 3 May 20:10 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ1GS3aMjX3kGBYOkJi03MzGe2qgfAznWE5aGNn+zKonrw@mail.gmail.com
Hi Ludo,
On Sun, 3 May 2020 at 18:43, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (6 lines)> > Therefore the cache '/lib/guix/package.cache' contains more> > information.>> This breaks the binary interface, so we’ll have to analyze the impact of> such a change and devise a strategy.
Interface between what and what?
Because from my understanding, this file is only used by only oneguix. What do I miss?

Note that I have read your comment in v3 2/3 but I did not understand it. Sorry.
Toggle snippet (12 lines)I realize the other cache also has that problem, but it would be nice toadd a version tag to the cache. Basically emit something like:
(package-metadata-cache (version 0) VECTOR …)
instead of just:
(VECTOR …)


Toggle quote (10 lines)> > (The v4 structure of 'package.cache' is a quick draft, so details> > should be discussed and an interesting move should to have a> > structured (binary and all strings) S-exp; because it should become an> > entry point to export the packages list to JSON. WDYT?)>> It’s on purpose that this cache is an object file: it just needs to be> mmap’d, and that’s it. It’s the cheapest possible way to do it.> Parsing sexps would be more costly, and since we’re talking about> startup time, this is sensitive.
I agree and I have badly worded or I misunderstand something.For example, 'supported-systems' is saved as a list of strings,whereas 'license' is expanded as 3 strings without be packed in a listof strings. From my point of view, it is inconsistent and I do notknow what is the best (readibility, startup time, etc.).

Toggle quote (8 lines)> > To be clear about BM25 and caching, what I have in mind is:> > 1. "guix search --build-index" optionally done by the user if they wants for example the BM25 ranking.>> Something that must be done explicitly doesn’t seem great to me. As a> user, I’d rather not think about search indexes and all. But I don’t> know, maybe if it happened automatically on the first ‘guix search’> invocation that’d be fine.
I do not think it is an option to build the BM25 the first time "guixsearch" is called. Back-to-envelop estimation, it needs ~25 secondsto Xapian* to do so.
From my point of view, two options: a) "guix pull" does this extra ~25 seconds (compared to 10 seconds tobuild the v4 cache) b) the user manually build the index (I agree it is awkward!)
Well, the first question is to evaluate if it is worth -- I am usingthe v2 version based on Xapian to have an idea. Please if you havesuggestions about query (terms an user could type) and results(packages an user could expect), there are welcome.

*Xapian: I do not think we could do better but I have not checked yetif there is a bottleneck Guix, Guile-Xapian and Xapian.

Toggle quote (5 lines)> > 1. The name of 'fold-packages*' should be misleading since it does not return "true" packages.>> Did you see ‘fold-available-packages’? It seems you could extend it> instead of introducing ‘fold-packages*’, no?
Yes and no.
a) 'fold-available-packages' requires to modify the 'lambda' in'find-package-by-description', b) 'fold-package*' returning a 'package' is less tweaks, IMHO.
Well, I agree that on the long term, what 'fold-package*' does couldbe done by 'fold-available-packages' with the adequate 'proc'.
Thank you for the suggestion; even if once re-read correctly v3 2/3you already mentioned it. :-)

Toggle quote (14 lines)> > 2. The function 'package->recutils' in 'guix/ui.scm' is modified but it is not the better.> >> > (match (package-supported-systems p)> > (('cache supported-systems)> > (string-join supported-systems))> > (_> > (string-join (package-transitive-supported-systems p)))))> >> > However it avoids to duplicate code; as it is done in version v3.>> I made suggestions to Arun’s v3 about the API here. Essentially, I> think I proposed having a procedure that takes the list of fields as> keyword parameters, and ‘package->recutils’ would just delegate to that.
Yes, it was already your suggestion in v3 3/3. Do you suggest torefactor 'package->recutils'? For example,
Toggle snippet (10 lines)(define* (package->recutils name version ... all-the-other-fields ... port #:optional (width (%text-width)) #:key (hyperlinks? (supports-hyperlinks? port)) (extra-fields '()))


Toggle quote (4 lines)> > 4. Impolite '@@' is used to access the private license construction.>> (guix licenses) could provide a ‘string->license’ procedure.
Well, do you suggest:
(define (string->license name) (license name #f #f))
? Skipping 'uri' and 'comment'? Naive question: what is the purposeof these 2 fields? Because there are not exposed at the CLI level,AFAIK, and I do not think an user evaluate '(license-uri pkg)' in ascript.
Well, I think that the hyperlink feature could be used to display thelicense URI too. WDYT?


Toggle quote (2 lines)> Stopping here for now because I’m sorta drowning in patch review. :-)
Thank you for all the comments.

Toggle quote (2 lines)> Thanks for exploring this design space, we’re making progress!
My pleasure. Scheme is designed to explore. ;-)

Cheers,simon
L
L
Ludovic Courtès wrote on 3 May 21:49 +0200
(name . zimoun)(address . zimon.toutoune@gmail.com)
87tv0wyf2l.fsf@gnu.org
Hi,
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (10 lines)> On Sun, 3 May 2020 at 18:43, Ludovic Courtès <ludo@gnu.org> wrote:>>> > Therefore the cache '/lib/guix/package.cache' contains more>> > information.>>>> This breaks the binary interface, so we’ll have to analyze the impact of>> such a change and devise a strategy.>> Interface between what and what?
Guix revision N creates a cache that will be read by revision N+1, upon‘guix pull’ completion.
Toggle quote (11 lines)> Note that I have read your comment in v3 2/3 but I did not understand it. Sorry.>> I realize the other cache also has that problem, but it would be nice to> add a version tag to the cache. Basically emit something like:>> (package-metadata-cache (version 0) VECTOR …)>> instead of just:>> (VECTOR …)
Yes, it would be better.
Toggle quote (5 lines)> For example, 'supported-systems' is saved as a list of strings,> whereas 'license' is expanded as 3 strings without be packed in a list> of strings. From my point of view, it is inconsistent and I do not> know what is the best (readibility, startup time, etc.).
I guess both ‘license’ and ‘supported-systems’ should be list ofstrings. It doesn’t really have an impact on startup time (I thoughtyou were suggesting storing the cache as an sexp instead of an objectfile.)
Toggle quote (19 lines)>> Something that must be done explicitly doesn’t seem great to me. As a>> user, I’d rather not think about search indexes and all. But I don’t>> know, maybe if it happened automatically on the first ‘guix search’>> invocation that’d be fine.>> I do not think it is an option to build the BM25 the first time "guix> search" is called. Back-to-envelop estimation, it needs ~25 seconds> to Xapian* to do so.>> From my point of view, two options:> a) "guix pull" does this extra ~25 seconds (compared to 10 seconds to> build the v4 cache)> b) the user manually build the index (I agree it is awkward!)>> Well, the first question is to evaluate if it is worth -- I am using> the v2 version based on Xapian to have an idea. Please if you have> suggestions about query (terms an user could type) and results> (packages an user could expect), there are welcome.
Yeah, dunno. Maybe an option would be to create the index in such a waythat it is substitutable.

[...]
Toggle quote (10 lines)> Yes, it was already your suggestion in v3 3/3. Do you suggest to> refactor 'package->recutils'? For example,>> (define* (package->recutils name version> ... all-the-other-fields ...> port #:optional (width (%text-width))> #:key> (hyperlinks? (supports-hyperlinks? port))> (extra-fields '()))
Yes.
Toggle quote (8 lines)>> > 4. Impolite '@@' is used to access the private license construction.>>>> (guix licenses) could provide a ‘string->license’ procedure.>> Well, do you suggest:>> (define (string->license name) (license name #f #f))
No; rather, it would look up the license in a dictionary and return thecorresponding object or #f if it’s not a known license.
Thanks,Ludo’.
?