service urandom-seed takes too long on boot

OpenSubmitted by Robert Vollmert.
Details
5 participants
  • Alex Sassmannshausen
  • Brice Waegeneire
  • Leo Famulari
  • Ludovic Courtès
  • Robert Vollmert
Owner
unassigned
Severity
important
R
R
Robert Vollmert wrote on 25 Jun 2019 20:12
(address . bug-guix@gnu.org)
F88CEF04-9BFA-4886-8A2D-AD84AE278D07@vllmrt.net
On my VPS, booting takes forever (long enough that for a longtime I thought the install had failed). I just rebooted again,and it took over 7 minutes, see attached screenshot.
I would suggest skipping the seeding from /dev/hwrng by defaultif /var/lib/random-seed is available. I’m assuming here that myproblem is not too rare — if it is, an option to disable theseeding from /dev/hwrng seems like a good idea.
Attachment: file
A
A
Alex Sassmannshausen wrote on 26 Jun 2019 11:41
(address . bug-guix@gnu.org)(address . 36380@debbugs.gnu.org)
871rzg3cpw.fsf@gmail.com
Hi Robert,
Robert Vollmert <rob@vllmrt.net> writes:
Toggle quote (9 lines)> On my VPS, booting takes forever (long enough that for a long> time I thought the install had failed). I just rebooted again,> and it took over 7 minutes, see attached screenshot.>> I would suggest skipping the seeding from /dev/hwrng by default> if /var/lib/random-seed is available. I’m assuming here that my> problem is not too rare — if it is, an option to disable the> seeding from /dev/hwrng seems like a good idea.
I'm not sure I'm qualified on best practices with regard to urandom, butanecdotally, my servers are booting pretty fast, within a minute,consistently. This is even when booting a qemu virtual image ratherthan a VPS.
Perhaps something else is going on here?
Alex
L
L
Leo Famulari wrote on 26 Jun 2019 17:47
(name . Robert Vollmert)(address . rob@vllmrt.net)(address . 36380@debbugs.gnu.org)
20190626154721.GA2999@jasmine.lan
On Tue, Jun 25, 2019 at 08:12:28PM +0200, Robert Vollmert wrote:
Toggle quote (4 lines)> On my VPS, booting takes forever (long enough that for a long> time I thought the install had failed). I just rebooted again,> and it took over 7 minutes, see attached screenshot.
Yikes, that's way too long. Can you say what VPS it is?
Toggle quote (5 lines)> I would suggest skipping the seeding from /dev/hwrng by default> if /var/lib/random-seed is available. I’m assuming here that my> problem is not too rare — if it is, an option to disable the> seeding from /dev/hwrng seems like a good idea.
Originally I added the HWRNG read specifically the for VM / VPS use case[0], where the first boot environment is relatively deterministic. Iagree it's superfluous if the random-seed file is handled properly butit's nice to unconditionally have this other entropy source that avoidsthe pitfalls of file-based entropy seeding.
Ideally the hypervisor would seed the guest's HWRNG interface with thehost's /dev/urandom, which would avoid significant delays. It seems theyare using some other more limited resource instead.
Does anyone else have an opinion or experience with this issue? It wouldbe great to know how widespread it is.
[0]https://git.savannah.gnu.org/cgit/guix.git/commit/?id=9a56cf2b5b4970843c215091ea9823a67e077310https://lists.gnu.org/archive/html/guix-devel/2017-12/msg00096.html
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEsFFZSPHn08G5gDigJkb6MLrKfwgFAl0Tk4QACgkQJkb6MLrKfwhzQQ/8Dec8zVia6JlNfy5x142pZNTseg3wV2ngdEeJrpViVjhtXRvfMY/UYOPF9mct+VviPHSae8oSJG5S+rPA7xQwXedMdMRsDjdORajEoB7WUZ7FKYWmkRvuhatBbJPisHnYkZXE/+Un4hQEYQV8Ntpbr1hmBmC2DTzqpLbL13nD1lxfjolRg67ShywtTkQOMt81waqRQdyY2tNK6whjgFMfzAyTdsW/kaMzGWgtyI8ze4vus1F4wG1LWVgH47O7q8uRq+y94jKfTN3RzRLhdK7jRtClAZi5nLETwbh+mCO82fiq+/5jCMKHo63EJGpL7LoIDwZCLKC9K+VVpHfUIUpw8nV42eY/2VW6NyF5n/dFszAsbnwNa4vJVpD/w6YAr83y1LwiThz0cq5e/kgW8PWuDTCzUtJCXJ+9fMqZjhxkLbY21yZHyLwu1yKlAP0GY3+77AcDbqvXXF7br19l5B2KNF2TjV5uhssEnnTSy80RWufQQ8N3HeoGbHKFC68Ls/XQ42Zyfy5r+lulPOF0C/2d+pNJJ7aFVqFTw8wpWApwhexafDqdmBoNyNH331J+gzdNZ+SukUoE0x+NkBkgIgedLU/D4tYh7kgPJKGFfZpomLGsry83sgy1wMlE/JvJqQQsgPRp6YD6P+37Vil5ob6KzlV1QmUhH4H4XI/WiUjJnlI==JjKb-----END PGP SIGNATURE-----

R
R
Robert Vollmert wrote on 26 Jun 2019 18:02
(name . Leo Famulari)(address . leo@famulari.name)(address . 36380@debbugs.gnu.org)
51B21C84-982D-4DC0-AEA7-A32EA0F855B8@vllmrt.net
Toggle quote (9 lines)> On 26. Jun 2019, at 17:47, Leo Famulari <leo@famulari.name> wrote:> > On Tue, Jun 25, 2019 at 08:12:28PM +0200, Robert Vollmert wrote:>> On my VPS, booting takes forever (long enough that for a long>> time I thought the install had failed). I just rebooted again,>> and it took over 7 minutes, see attached screenshot.> > Yikes, that's way too long. Can you say what VPS it is?
It’s with arpnetworks.com, their default “small" VPS. I don’treally know more; here’s some dmesg output that might be relevant,happy to provide more info.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1.1 04/01/2014[ 0.000000] Hypervisor detected: KVM[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00[ 0.000000] kvm-clock: cpu 0, msr 2a782001, primary cpu clock[ 0.000000] kvm-clock: using sched offset of 1160634602574609 cycles[ 0.000002] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns[ 0.000004] tsc: Detected 3066.774 MHz processor
L
L
Ludovic Courtès wrote on 27 Jun 2019 17:20
(name . Leo Famulari)(address . leo@famulari.name)
87zhm3xdfu.fsf@gnu.org
Hi Leo,
Leo Famulari <leo@famulari.name> skribis:
Toggle quote (7 lines)> On Tue, Jun 25, 2019 at 08:12:28PM +0200, Robert Vollmert wrote:>> On my VPS, booting takes forever (long enough that for a long>> time I thought the install had failed). I just rebooted again,>> and it took over 7 minutes, see attached screenshot.>> Yikes, that's way too long. Can you say what VPS it is?
We had a “bug report” athttps://distrowatch.com/weekly.php?issue=20190624#guixsd, which may bedue to the same issue:
The first time I loaded Guix the boot process took an unusually long time. At one point the system appeared to lock up for about five minutes before continuing. In the end, from boot menu to graphical login screen, the start-up time totalled about ten minutes.
The author says they were running Guix in VirtualBox. I’m glad Robert’sbug report has more info than that one. :-)
What should we do?
Ludo’.
L
L
Ludovic Courtès wrote on 27 Jun 2019 17:20
control message for bug #36380
(address . control@debbugs.gnu.org)
87y31nxdfe.fsf@gnu.org
severity 36380 importantquit
L
L
Leo Famulari wrote on 27 Jun 2019 21:03
Re: bug#36380: service urandom-seed takes too long on boot
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190627190314.GA7403@jasmine.lan
On Thu, Jun 27, 2019 at 05:20:21PM +0200, Ludovic Courtès wrote:
Toggle quote (9 lines)> We had a “bug report” at> <https://distrowatch.com/weekly.php?issue=20190624#guixsd>, which may be> due to the same issue:> > The first time I loaded Guix the boot process took an unusually long> time. At one point the system appeared to lock up for about five> minutes before continuing. In the end, from boot menu to graphical> login screen, the start-up time totalled about ten minutes.
Perhaps, but if the reason for the slowness on their first boot was asuboptimal /dev/hwrng source, I would expect it to be equally slow foreach boot, since we unconditionally read 64 bytes each time.
However, their next sentence says, "Curiously, after the first boot,Guix started up considerably faster, generally taking less than a minuteto arrive at the login screen."
They are using an old machine with a spinning disk, so who knows...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEsFFZSPHn08G5gDigJkb6MLrKfwgFAl0VEvAACgkQJkb6MLrKfwi72Q/+KFZ5pVsyeDr/ouu9JIGSCfc+cCpSu88cOW0MhtWqLN+J38Gt+mtHHPIqn847ae2WKe+Sa/snMFmqRAkUgUw4vcjJgYFPwPU2f+EQP6Hl422mgdKknbMJbsaNpoOQJyf/I5sY96BPYHICegFfHUVd8NHwDTFbcPnh6TfU9WN/fx4CMuG6fy5Ten3/NSZ6m7QFYncrwvb9J9JJmNFl3LFhulA4Y68/lFnGBL3xVSHNDndOMBUMssg6EmsIy+oLIYiUsfdjfW/OsvZQNqdbGszgk0Ifany8n+u4KOTlMjv7mhIlXPyL0TfkU3XW5Vc5R7ga7f4jxRo2cDAIP55HhZ+lc3Y5nr59mnxzw/bIPWZpdtqYqE0zOlq4EIQ5eQFvSZdQv4yLXp5pD8hSxuLlh3Gj3qjqIqgFqf54d3XL+bWd+X1M6F1u+pKKLaQajs/O3yoR1xr11TH9fK2psA3FWwZWVal7mc/ujDHtd+njvf56jrbBZ10wqZlAkhPf2mbecVMNw9udQCrY+crivuiF+oCjYjNBQ85l7mUed92xK1eLcraPYJLpN/8ZEsdJHvBqSQOBkhQvstvC1GaSAo5xAZj+mX0UblRrM5CqVyZEDgBvqnX3cuayjRWUUthEmoHJvT0CpCvQHDhYI/gSNFJhLPBOY11fZtoV5oS3XYKrmcgz0/Q==iTHL-----END PGP SIGNATURE-----

L
L
Leo Famulari wrote on 27 Jun 2019 21:19
(name . Robert Vollmert)(address . rob@vllmrt.net)(address . 36380@debbugs.gnu.org)
20190627191915.GA9591@jasmine.lan
On Wed, Jun 26, 2019 at 06:02:03PM +0200, Robert Vollmert wrote:
Toggle quote (4 lines)> It’s with arpnetworks.com, their default “small" VPS. I don’t> really know more; here’s some dmesg output that might be relevant,> happy to provide more info.
Okay, I've asked them about it on IRC:
http://irclogger.arpnetworks.com/irclogger_log/arpnetworks?date=2019-06-27,Thu
Let's wait and see what they think about the issue.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEsFFZSPHn08G5gDigJkb6MLrKfwgFAl0VFrIACgkQJkb6MLrKfwjsXBAA71rlRFfbPpKY+ReKmJXfl9BygIKBSdiyI3Bm02DDUNQthEmXixJlZ/BmmBPaU7tnj83qlmCuF1z08oDj0hXgrvNTX9V4+BGK7IXLGPiG1ij3+/c6n+gW1ywGBhanbNXiarEZ3gQ1fN0f25qOqmqRQnJM+qx+RgelIJUsJ1TNHxE056fi1AiayqBcG70FgAVIn5ChrgiooK36S7FbM4d0KbHjcQSH7v594/iVXPyc5oOZKuIyWDuybBTtY8SdVT6Y151CwNXVia8+Y0g+esl3Pl20JF/V1CJUI7ZSLtTx0hvnDOuIa86xYkU5ELfZ6D4h+BhaOKnUCTkvfc2RaEsAL9+oVLvteAraW62lN3+7Mq7vtXvlNvBXsuXtb9FHQA96m8zwChJPb0lZIsQ1NBPqF7ToahfZLl9XzlKgyaD8Slca5+v3MnqelefAr/+1rizG3a0KJIuZ95P3NSemPe/R3CNuiCmzgv/njS9V7MuvZivFxPKcy6L9U5zJC/GItmv07UriYoSwT4mqBFTQVAf00dKUVXelDKpIsvJPb54/s8ZUVV+xSruefoRWl7FHDWgu2Q1tmHLA/G7oIKI6PVUmL/CtgR5eRGPzOKm/c8Waq2XIsgshugvBkf6SFlQWQ7BOQhxDnHBWUpYv/4P41F7NXXXeuAdCqkC5qfgUp4QdoMM==Vv23-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 27 Jun 2019 22:00
(name . Leo Famulari)(address . leo@famulari.name)
87r27evlwa.fsf@gnu.org
Leo Famulari <leo@famulari.name> skribis:
Toggle quote (14 lines)> On Thu, Jun 27, 2019 at 05:20:21PM +0200, Ludovic Courtès wrote:>> We had a “bug report” at>> <https://distrowatch.com/weekly.php?issue=20190624#guixsd>, which may be>> due to the same issue:>> >> The first time I loaded Guix the boot process took an unusually long>> time. At one point the system appeared to lock up for about five>> minutes before continuing. In the end, from boot menu to graphical>> login screen, the start-up time totalled about ten minutes.>> Perhaps, but if the reason for the slowness on their first boot was a> suboptimal /dev/hwrng source, I would expect it to be equally slow for> each boot, since we unconditionally read 64 bytes each time.
Perhaps VirtualBox behaves this way? For instance, the VM was rebootedbut VirtualBox itself was still running, and thus it had a good randomseed to start from on the second boot (does that make sense?). I guesswe should try.
Toggle quote (2 lines)> They are using an old machine with a spinning disk, so who knows...
Still, what else could be taking this long?
Thanks,Ludo’.
R
R
Robert Vollmert wrote on 28 Jun 2019 08:47
(name . Leo Famulari)(address . leo@famulari.name)
EA773339-2CC9-486E-99B8-A2E71B34467E@vllmrt.net
Toggle quote (6 lines)> On 27. Jun 2019, at 21:03, Leo Famulari <leo@famulari.name> wrote:
> Perhaps, but if the reason for the slowness on their first boot was a> suboptimal /dev/hwrng source, I would expect it to be equally slow for> each boot, since we unconditionally read 64 bytes each time.
It’s 512 bytes, not that that should fundamentally change anything.
L
L
Leo Famulari wrote on 28 Jun 2019 19:24
(name . Robert Vollmert)(address . rob@vllmrt.net)
20190628172401.GA17073@jasmine.lan
On Fri, Jun 28, 2019 at 08:47:35AM +0200, Robert Vollmert wrote:
Toggle quote (7 lines)> > On 27. Jun 2019, at 21:03, Leo Famulari <leo@famulari.name> wrote:> > Perhaps, but if the reason for the slowness on their first boot was a> > suboptimal /dev/hwrng source, I would expect it to be equally slow for> > each boot, since we unconditionally read 64 bytes each time.> > It’s 512 bytes, not that that should fundamentally change anything.
Oh right, my bad. It's been a while...
Anyways, this should either work immediately or fail. Aside fromgetrandom(2), which we aren't using here, nothing related to this stuffshould ever block, and if it does then it's a bug we need to workaround.
So, I suggest we add a 1 second timeout to this read.
I can work on that next week.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEsFFZSPHn08G5gDigJkb6MLrKfwgFAl0WTSsACgkQJkb6MLrKfwg/Fw//TuhAY/itosOCYb0BXcYPmNwKHSNM0BpI5nZk1v7vbyUTlEO2Hvr11ST2sbuAyvwvD7CER2dNp2ep9YeSGQ+jh2f0ZV/FI/VlSJo3pu53wCvPndwaJWfuUl+D1NI4v7m4Zgl4ZUnQRdcxaGHBf/waw0z3uRXK5lIxMQBB5895Bt/REkxLuz6E8JmqPJw86Wofhl6gR4CqznEv8GWraaNQvJmQxGMYAuENyxP+HTrtUS9BB46X7sOLO2jRSXMmp9UWDddsCs3FXEMxbnGZXVxLgwmY2NVVlNDU5dcrKy4qtwnAg73PriLT0XXBUXD2dWk6oZv/O7zLibsKacpZ5+dIoamyH+Y9fjGfGhmohVdYgYdB82I2cZw7OcYhD9RdcIKU1b+N56i+zVJirXOPg5iLDv+nMG6xJuo52JucH2HVhY4RNux2OTaUQxPopnHj4iKYYDJ8+JpA/1DhD0L3DZdigLSqFWmRAkAwrXCvZMotl5oOxwBozlcdNR6MOa11OrNUdL29G4Bo9VMNLZWgd8Mcpb6HAUfU/kzM4e6IauUFcakvKF+v8otwujUOfYSuim286GefG85PEJDxS7ak7G1LslrjVts8J/4CyQ5FYlvJ9TkG/Uh8dGA+M8bvbk96AOQxRyyF/j/Y3Pll9U8ltdr+vhU8DlOWZRihct5FYmid1Gk==//G7-----END PGP SIGNATURE-----

L
L
Leo Famulari wrote on 11 Jul 2019 19:44
(name . Robert Vollmert)(address . rob@vllmrt.net)
20190711174455.GA30457@jasmine.lan
On Fri, Jun 28, 2019 at 01:24:01PM -0400, Leo Famulari wrote:
Toggle quote (4 lines)> So, I suggest we add a 1 second timeout to this read.> > I can work on that next week.
I did try working on this, after reading the code in (guix scriptsoffload (call-with-timeout)).
But, I couldn't make it work at all — it always fails, with all theservices depending on urandom-seed-service failing, breaking the boot.
I don't know how to debug or work interactively in this part of thesystem.
Can anybody help?
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEsFFZSPHn08G5gDigJkb6MLrKfwgFAl0ndZQACgkQJkb6MLrKfwj4mQ//So6Z5OLmFEBI0QRaB1NuH1t12FKqGucuQvtuWPIWr9m1sreYDgHMk5oY35b7p74t3tasrD8B8ePI2kuKvLX7+rIORPXlARIBLtSvojmXTzpaslBr+FiIxzXSkFoLEFr/aT3BpxqVL0I3GG+ppz64B8aMUWcWRcFNQkUsg/FZttEx00AQ3dDrC8w3SSTgsVhWArIt1C8Ypr0Va5ALok6LMIBPa78PdDUS6W5sGXsskAjc+InAG8OKD/0whOtCLjwCTCTETElwHpDN1lKsINvvDWFpoqXiwzzcVlDp+7XONXAHdGlaq+b3PIGw7doOXFiTvaTNLqNRKfXJg2TRFTcrWEU8vmoNw/Stvyf2QeCDFtRoZ1hLbauYjzsJ+fJbrn80VZrEhIvQu/3ZZhjI8jTX81i/VvjxWKrXetRvgsb6WLL2s+L7O9BG8iRkBbyork1OjvXBv+Up0PKbDwx+rLDAvlFrrrxd1G4eVgwYAsHB8TdRb/8sfoclhIUaIDevyOr1MyVQHSbdn8iV2ag0Gbq9oAxMnk6l3p2p62Hv4r//ZpLfjVM01xC5BvbeW8GoH2YJuFzylEluJlvPcO3LauqlXvbWUaRoTZ3BrIOdPtrbNrXCJJw/IZoLcOCrwu/7Phf4Teq0tI22VY4PX4Q+OaGwLyLmdajJwspLGHH+dlEFybg==gN/V-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 11 Jul 2019 23:33
(name . Leo Famulari)(address . leo@famulari.name)
87k1contnw.fsf@gnu.org
Hi Leo,
Leo Famulari <leo@famulari.name> skribis:
Toggle quote (8 lines)> On Fri, Jun 28, 2019 at 01:24:01PM -0400, Leo Famulari wrote:>> So, I suggest we add a 1 second timeout to this read.>> >> I can work on that next week.>> I did try working on this, after reading the code in (guix scripts> offload (call-with-timeout)).
The ‘start’ method of the ‘urandom-seed’ Shepherd service runs in PID 1,so we certainly don’t want to fiddle with SIGALRM in that context, whichis what ‘call-with-timeout’ does.
Instead, I think we should use ‘select’ with a timeout:
(call-with-input-file "/dev/hwrng" (lambda (port) (match (select (list port) '() '() 3) …)))
I think we can then use ‘get-bytevector-n!’, assuming it doesn’t blockif less than COUNT bytes are available (I’m not sure this is the case.)
HTH!
Ludo’.
R
R
Robert Vollmert wrote on 17 Jul 2019 23:04
related article (Debian)
(address . 36380@debbugs.gnu.org)
B4E22190-D378-45C8-A2B8-DC7D59EF9B52@vllmrt.net
Just ran across this article about Debian dealing with similar issues.
https://daniel-lange.com/archives/152-hello-buster.html
One of the suggested work-arounds is to rely on virtio_rng onvirtual machines, but it seems Guix already uses that on my machine,so perhaps it doesn’t help:
rob@garp ~$ cat /sys/devices/virtual/misc/hw_random/rng_available virtio_rng.0 rob@garp ~$ cat /sys/devices/virtual/misc/hw_random/rng_current virtio_rng.0
B
B
Brice Waegeneire wrote on 22 Mar 09:43 +0100
Re: bug#36380: service urandom-seed takes too long on boot
(address . 36380@debbugs.gnu.org)
a24b84c858011117fd9ea7129af7232c@waegenei.re
Hello,
It would be nice if we could reproduce this issue.
Robert Vollmert <rob@vllmrt.net> writes:
Toggle quote (4 lines)> Just ran across this article about Debian dealing with similar issues.> > https://daniel-lange.com/archives/152-hello-buster.html
This article has been updated since then with a section[0] about a fixauthored by Linus[1][2] and merged in Linux 5.4. The gist of it that now`getrandom()' will actively try to collect entropy in early boot, if itis missing, by using the CPU jitter. The Debian wiki is saying the same[3].
Since we are using actually using Linux-libre 5.4 as the default kernelthis issue should probably be closed.
[0]: https://daniel-lange.com/archives/152-hello-buster.html#linustotherescue[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3f2dc2798b81531fd93a3b9b7c39da47ec689e55[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=50ee7529ec4500c88f8664560770a7a1b65db72b[3]: https://wiki.debian.org/BoottimeEntropyStarvation
Brice.
L
L
Leo Famulari wrote on 22 Mar 21:19 +0100
(name . Brice Waegeneire)(address . brice@waegenei.re)(address . 36380@debbugs.gnu.org)
20200322201919.GC16716@jasmine.lan
On Sun, Mar 22, 2020 at 08:43:33AM +0000, Brice Waegeneire wrote:
Toggle quote (5 lines)> This article has been updated since then with a section[0] about a fix> authored by Linus[1][2] and merged in Linux 5.4. The gist of it that now> `getrandom()' will actively try to collect entropy in early boot, if it> is missing, by using the CPU jitter. The Debian wiki is saying the same[3].
The issue here is not related to getrandom() or our kernel. I think thebug is still relevant.
The Guix system unconditionally reads from /dev/hwrng if it exists, andthere is no reason for that to take a noticeable amount of time.
But this bug report revealed that some VPS providers have a brokendeployment that does cause delays. Who knows how they are feeding/dev/hwrng... they would not reply to my questions.
It doesn't really matter though, the problem is ours to fix.
We need to make this read time out after a second, but in the past Icould not figure out how to do this without crashing the system (I'm nota strong Schemer).
Help is still wanted!
?