Dear Arun,

Here I am speaking only about the first patch: the cut-off.

TL;DR:

 1. I was wrong about the bottleneck.
 2. The queries were not the right ones to show a clear effect -- on my
    machine.

On Sat, 13 Jun 2020 at 22:51, Arun Isaac wrote:

> Yes, I did read your earlier mail. And, I tried again, this time with
> patch 1 alone. It certainly makes a difference on my machine. It is
> clear from the code logic that it should make a difference on your
> machine as well, at least for longer queries. But, somehow it isn't and
> I do not understand why. :-(

Well, I spent some hours* doing statistics (Student's t-test). Roughly
speaking, on my machine the standard deviation (stddev) hides the
effect -- depending on the query -- and that is why I am not always
seeing the improvement, I guess.

*ah, all my Sunday in fact. ;-)

I compared different conditions for the query "game strategy":

 - cold vs warm cache
 - xterm vs shell inside Emacs (my config vs -q)
 - no pipe vs pipe

and I ran each experiment 10 times in a row.

The conclusion is: on average -- on my machine -- the cut-off is an
improvement. But with only 3 repeats in a row, the improvement is
sometimes not obvious (looking at the mean), because the two tails of
the distributions overlap a bit on my machine, so it is kind of bad
luck. And it is ``worse'' depending on which commit your patch is
rebased against: a357849 (old) vs e782756. The t-test captures this
variation, even with only 3 repeats, but I had not done it in my
previous email and had only compared the visible means. Sorry about
that.

Moreover, printing increases the stddev, so the results fluctuate more
inside Emacs than in xterm, and piping helps in that case. Piping does
not change the final conclusion -- fortunately. :-) It adds some extra
time, but on average it is the same.

About cold vs warm cache, I notice that the improvement is not the same
(on average). Considering the raw time, there is a difference of about
10% (with "good" confidence); it could be worth understanding why.

Well, considering all that, I did more statistics with other queries,
and the conclusion for my machine is that *the patch improves things*
on average by reducing the timing for typical usages. Which is really
cool! :-)

I was definitely wrong about the bottleneck, and this could well be
one. One way to get an idea is to use "statprof", but the results are
hard for me to read (I believe Guile master has a fix improving the
'anon #addr' entries, but I do not really know more).
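For the record, the same kind of report can be produced
programmatically instead of with the ,profile (,pr) REPL meta-command.
This is only a sketch: it assumes it runs inside "guix repl" (so that
the Guix modules are on the load path) and that the patched
(guix scripts search) provides the guix-search procedure used in the
sessions below.

--8<---------------cut here---------------start------------->8---
;; Sketch: profile a search programmatically; the (statprof) module
;; ships with Guile and prints a report like the ones below.
(use-modules (statprof)
             (guix scripts search))

;; 'statprof' runs the thunk, samples the stack while it runs, and
;; writes the profile to the current output port.
(statprof (lambda ()
            (guix-search "game" "strategy")))
--8<---------------cut here---------------end--------------->8---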
--8<---------------cut here---------------start------------->8---
$ /tmp/v5-1/bin/guix repl
scheme@(guix-user)> ,use(guix scripts search)
scheme@(guix-user)> ,pr (guix-search "game" "strategy")
%     cumulative   self
time   seconds     seconds  procedure
 17.81      0.29      0.27  anon #xe40178
 12.33      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
 12.33      0.18      0.18  anon #xe3c770
  5.48      0.08      0.08  ice-9/boot-9.scm:1396:0:symbol-append
  4.11      1.57      0.06  guix/memoization.scm:100:0
  4.11      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.74      0.55      0.04  guix/ui.scm:1511:12
  2.74      0.33      0.04  ice-9/regex.scm:170:0:fold-matches
  2.74      0.04      0.04  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  2.74      0.04      0.04  texinfo/string-utils.scm:98:5
  2.74      0.04      0.04  ice-9/vlist.scm:539:0:vhash-assq
  1.37     69.81      0.02  ice-9/threads.scm:388:4
[...]
---
Sample count: 73
Total time: 1.490955132 seconds (0.387756476 seconds in GC)
--8<---------------cut here---------------end--------------->8---

To compare with the default:

--8<---------------cut here---------------start------------->8---
%     cumulative   self
time   seconds     seconds  procedure
 24.47      0.49      0.46  anon #x1d89178
 21.28      0.40      0.40  anon #x1d85770
  9.57      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
  3.19      4.71      0.06  ice-9/boot-9.scm:1673:4:with-exception-handler
  3.19      1.64      0.06  guix/memoization.scm:100:0
  3.19      0.06      0.06  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  3.19      0.06      0.06  anon #x1d84c78
  3.19      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.13      1.01      0.04  guix/ui.scm:1511:12
  2.13      0.08      0.04  ice-9/boot-9.scm:1396:0:symbol-append
  2.13      0.04      0.04  anon #x1d83248
  1.06      0.30      0.02  anon #x7f057e6c90e8
[...]
--8<---------------cut here---------------end--------------->8---

So clearly the patch has an effect!  If someone knows what

 - ice-9/boot-9.scm:2201:0:%load-announce
 - ice-9/boot-9.scm:1396:0:symbol-append

are and where they could come from, it would help. :-)

Well, I am interested to know which part is the regex engine and which
is the string search. :-)  This links back to the discussion about KMP
and other string-search algorithms.

> Here are more fresh results. Could you try for longer queries like
> "strategy game caesar" and without the output being piped to recsel,
> grep, etc.? For simplicity, let's talk only about warm cache results.
>
> |----------------------------------+--------+-------|
> | query                            | before | after |
> |----------------------------------+--------+-------|
> | guix search strategy game        | 2.58   | 1.96  |
> | guix search strategy game caesar | 2.95   | 1.76  |
> |----------------------------------+--------+-------|

At first, I was confused about why adding one more term returns a
result faster. It is because the query "caesar" returns only one
package, so with the query "strategy game caesar" almost all the
packages are already cut off when the terms "game" and then "strategy"
are searched (see the rough sketch of the idea in the PS below). I
mean that

  guix search julius

should take as long as

  guix search strategy game caesar

and it does, on average, on my machine.

Secondly, I was confused because the timing of the query "caesar
strategy game" is almost the same (2.8% +/- 2.5% with 99.0%
confidence; 10 repeats). Well, that is because in one case the term
"caesar" is applied to 15 packages and in the other case the terms
"strategy" and "game" are applied to 1 package. Add some stddev error
and not enough repeats (nor proper statistics), and the confusion is
complete and my earlier conclusion was wrong.

That said, the effect of the cut-off is clear (on my machine, even
with one shot) with the queries:

 - game strategy the
 - the game strategy

Thank you,
simon
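PS: To make the cut-off idea above a bit more concrete, here is a
rough sketch -- not the code of your patch, just the general idea --
where package->text is a hypothetical accessor for whatever text
(name, synopsis, description, ...) a package is matched against:

--8<---------------cut here---------------start------------->8---
;; Each term only scans the packages that survived the previous terms,
;; so a very selective term such as "caesar" makes the remaining terms
;; cheap; an empty candidate list cuts the search off entirely.
(use-modules (ice-9 regex)
             (srfi srfi-1))

(define (search-with-cut-off package->text terms packages)
  (fold (lambda (term candidates)
          (if (null? candidates)
              '()                       ;cut off: nothing left to scan
              (filter (lambda (package)
                        (string-match term (package->text package)))
                      candidates)))
        packages
        terms))
--8<---------------cut here---------------end--------------->8---

With such a shape, the order of the terms does not change the final
result, only the intermediate work -- which is the 15-packages vs
1-package situation described above.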