guix system hangs on boot with LUKS /home partition

  • Done
Details
7 participants
  • aurtzy
  • Benjamin Slade
  • Fulbert
  • Lilah Tascheter
  • Ludovic Courtès
  • Adrien 'neox' Bourmault
  • Remco van 't Veer
Owner
unassigned
Submitted by
Fulbert
Severity
important
Merged with
70266
Fulbert wrote on 28 Mar 12:24 +0100
(address . bug-guix@gnu.org)
ZgVTiUVfDlM20s8K@bluewin.ch
Hello,

Up to guix 9b84b36, my system booted properly with a LUKS2 partition
mounted on /home. Starting with guix d5f857a (22 Mar 2024), the boot hangs on
the same system using the same configuration.scm file. The only way out I found
when it hangs is a hardware shutdown. No console or SSH server is available
to help troubleshoot, and nothing is written to /var/log/messages
when it hangs.

I have tried transferring my /home data to a brand new LUKS1 partition (and
removed the references to the old LUKS2 partition from my config.scm, of course), and
the problem remains exactly the same, including these error messages (obtained from
a video capture of the screen at boot, after removing 'quiet' from the kernel
command line in GRUB):

#+begin_src boot
shepherd[1]: Starting service device-mapping-luks-homes...
shepherd[1]: Service device-mapping-luks-homes failed to start.
shepherd[1]: Exception caught while starting device-mapping-luks-homes: (unbound-variable #f "Unbound variable: ~S" (bytevector?) #f)
#+end_src

It may be worth mentioning that I then tried one 'mapped-device'
configuration with 'luks-device-mapping' and another with
'luks-device-mapping-with-options #:key-file "/…"'. I also tried one
configuration with the 'source' declared as a plain "/dev/..." and another
declared with the LUKS '(uuid "…")', but this did not change the
"symptoms".
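
For readers unfamiliar with these options, the variants described above could look roughly like the fragment below. This is only a sketch: the target name "luks-homes" is taken from the configuration quoted later in this report, the elided values ("/…", "…", "/dev/...") are left exactly as elided, and the way the variants are paired up is an assumption made for illustration.

```scheme
;; Sketch of the variants mentioned above; elided values stay elided.

;; Variant with a key file, source given by LUKS UUID.
(mapped-device
 (source (uuid "…"))
 (target "luks-homes")
 (type (luks-device-mapping-with-options #:key-file "/…")))

;; Variant with a passphrase prompt, source given as a plain device path.
(mapped-device
 (source "/dev/...")
 (target "luks-homes")
 (type luks-device-mapping))
```

According to the report, all of these combinations hung in the same way, which points at the shared code that opens the device rather than at any one option.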

So, although I have learned in the process that LUKS2 is not yet fully
supported in guix, this problem also prevents booting using a LUKS1 /home
partition in my case.

Transferring the /home data to a clear (unencrypted) partition is my current
workaround for this problem.

Below is the configuration that has worked for several weeks, if not months, using my LUKS2 /home:

(mapped-devices
 (list
  (mapped-device
   (source (uuid "<the uuid>"))
   (target "luks-homes")
   (type luks-device-mapping))))

(file-systems
 (append
  (list
   […]
   (file-system (mount-point "/home")
                (device (file-system-label "luks-homes"))
                (type "ext4")
                (dependencies mapped-devices))
   […]

Best regards and thanks for guix !
Fulbert wrote on 28 Mar 12:49 +0100
bug#70051
(address . 70051@debbugs.gnu.org)
ZgVZLnoaTNJr4AiT@bluewin.ch
… And I forgot to mention that, when the boot hangs, shepherd still responds to ctrl-alt-del by stopping some services; the system then hangs again, leaving the hardware power button as the last resort.
F
Remco van 't Veer wrote on 30 Mar 16:25 +0100
Re: system hangs at boot - LUKS /home/ problem(?)
(name . Fulbert)(address . fulbert@bluewin.ch)
87v853214c.fsf@remworks.net
Hi,

Confirmed on a couple of my installs. I too have an unencrypted root
and encrypted home filesystems. The passphrase prompt never appears and
the system seems to be waiting for something or is halted.

I've git bisected it down to:

commit 6f9d844d2ece7b369d17bbe678978462425f869c (HEAD)
Author: Ludovic Courtès <ludo@gnu.org>
Date:   Wed Mar 20 18:48:38 2024 +0100

    services: shepherd: Load each service file in a fresh module.

    Fixes <https://issues.guix.gnu.org/67649>.

    * gnu/home/services/shepherd.scm (home-shepherd-configuration-file)[config]:
    Define ‘make-user-module’.  Call ‘load’ in ‘save-module-excursion’.
    * gnu/services/shepherd.scm (shepherd-configuration-file): Likewise.

Commit 2b052fe3c0fa85e9faa8873a581568ad4c78e151 still works.

Cheers,
Remco
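
As an illustration of how one might stay on the last commit reported to work above, a channels file can pin the guix channel to it. This is a sketch, not something taken from the thread: the file location and the channel URL are assumptions (the URL is the default one in use at the time of this report), and only the commit hash comes from the message above.

```scheme
;; Hypothetical ~/.config/guix/channels.scm pinning the last known-good commit.
(list (channel
       (name 'guix)
       ;; Assumption: default channel URL at the time of the report.
       (url "https://git.savannah.gnu.org/git/guix.git")
       ;; Commit reported above as still booting correctly.
       (commit "2b052fe3c0fa85e9faa8873a581568ad4c78e151")))
```

Pulling to an older commit than the current one may additionally require passing --allow-downgrades to "guix pull", and other channels that expect a recent guix may not build against the pinned commit.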
Lilah Tascheter wrote on 31 Mar 01:56 +0100
same
(address . 70051@debbugs.gnu.org)
ce06e57d3bdec2ec09810ae34849cb0750063afb.camel@lunabee.space
yep I got the same issue too. but, in my case, I have an encrypted root
with three other encrypted partitions, none of them my home. initrd
decryption succeeds, but shepherd device-mapper services fail as usual
aurtzy wrote on 2 Apr 08:23 +0200
[PATCH] gnu: open-luks-device: Fix unbound variables.
(address . 70051@debbugs.gnu.org)(name . aurtzy)(address . aurtzy@gmail.com)
6b8484d383512fe8f9c2bc65ff495395f6fa0abf.1712036470.git.aurtzy@gmail.com
It seems like `use-modules' never actually worked due to the way it is eval'd
by the Shepherd; this only became apparent after a change that prevented other
module imports from leaking into the namespace. This is fixed by using direct
references instead.

* gnu/system/mapped-devices.scm (open-luks-device): Use direct references for
variables from other modules.


Change-Id: I993798e161c4b4fca6e8a4f14eea5042b184ebc9
---

Hi!

I encountered this issue as well, and think I've figured out what was
happening: `use-modules' appears not to work due to the way g-expressions are
evaluated by Shepherd, so once services became properly isolated in their own
modules, the variable references were no longer available. There's a
comment further down in the file that seems to confirm this ("XXX: We're not
at the top level here...").

Can anyone confirm this patch works for them too?

Cheers,

aurtzy

gnu/system/mapped-devices.scm | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/gnu/system/mapped-devices.scm b/gnu/system/mapped-devices.scm
index c19a818453..e46e2c7954 100644
--- a/gnu/system/mapped-devices.scm
+++ b/gnu/system/mapped-devices.scm
@@ -201,14 +201,12 @@ (define* (open-luks-device source targets #:key key-file)
        #~(let ((source #$(if (uuid? source)
                              (uuid-bytevector source)
                              source))
-                (keyfile #$key-file))
-            ;; XXX: 'use-modules' should be at the top level.
-            (use-modules (rnrs bytevectors) ;bytevector?
-                         ((gnu build file-systems)
-                          #:select (find-partition-by-luks-uuid
-                                    system*/tty))
-                         ((guix build utils) #:select (mkdir-p)))
-
+                (keyfile #$key-file)
+                (bytevector? (@ (rnrs bytevectors) bytevector?))
+                (find-partition-by-luks-uuid (@ (gnu build file-systems)
+                                                find-partition-by-luks-uuid))
+                (system*/tty (@ (gnu build file-systems) system*/tty))
+                (mkdir-p (@ (guix build utils) mkdir-p)))
            ;; Create '/run/cryptsetup/' if it does not exist, as device locking
            ;; is mandatory for LUKS2.
            (mkdir-p "/run/cryptsetup/")

base-commit: 6e2db85ca83528199a46b002d2592bd4bef017c8
--
2.41.0
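
The difference between a direct `@' reference and a bare reference that depends on the evaluating module's imports can be reproduced in plain Guile. The sketch below is not the Shepherd's actual loading code: it uses Guile's `make-fresh-user-module' as a stand-in for an isolated per-service module, and `fold' from (srfi srfi-1) as a stand-in for `bytevector?' and the other procedures named in the patch. The variable names are hypothetical.

```scheme
;; A module that only sees Guile's core bindings, standing in for the
;; isolated module each service file is now loaded into.
(define fresh (make-fresh-user-module))

;; Direct reference: '@' names the exporting module explicitly, so the
;; lookup does not depend on what FRESH imports.
(define direct-result
  (eval '((@ (srfi srfi-1) fold) + 0 '(1 2 3)) fresh))

;; Bare reference: 'fold' is not a core binding and FRESH does not import
;; (srfi srfi-1), so evaluation raises 'unbound-variable'.
(define bare-result
  (catch 'unbound-variable
    (lambda () (eval '(fold + 0 '(1 2 3)) fresh))
    (lambda (key . args) 'unbound)))

(display direct-result) (newline)   ; 6
(display bare-result) (newline)     ; unbound
```

The direct reference works wherever the code ends up being evaluated, which is why the patch above can drop `use-modules' altogether.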
Remco van 't Veer wrote on 2 Apr 14:14 +0200
87zfucneq8.fsf@remworks.net
2024/04/02, aurtzy:

> Can anyone confirm this patch works for them too?

Yes, it does.

Cheers,
Remco
Remco van 't Veer wrote on 2 Apr 22:19 +0200
Re: system hangs at boot - LUKS /home/ problem(?)
(name . Benjamin Slade)(address . beoram@gmail.com)
87bk6rjz5u.fsf@remworks.net
2024/04/02, Benjamin Slade:

> I can't roll back to the earlier commit mentioned by Remco because
> other things/channels depend on me being roughly up-to-date on the
> main guix channel.

Reverting the commit on a local checkout of guix worked for me, but isn't
workable in the long run, of course. I tested the patch provided by aurtzy
(https://issues.guix.gnu.org/70051#5) and that worked too.

For now I won't reconfigure my system until this issue is fixed; if I
really need to deploy some configuration change, I'll try
"guix pull --switch-generation" to go back to an earlier state.

Remco
Benjamin Slade wrote on 2 Apr 22:00 +0200
87y19v4jst.fsf@gmail.com
I can't roll back to the earlier commit mentioned by Remco because other things/channels depend on me being roughly up-to-date on the main guix channel.

However, I can confirm the issue, as changing my configuration *not* to mount an encrypted /home resolves the boot issue.

I note two things:

a. when I try to configure with an encrypted /home, I get error/warning messages at the end (earlier I also got a message about the "find-crypthome-by-uuid" process failing; I changed to specifying a /dev/sXN device instead):

guix system: warning: exception caught while executing 'start' on service 'device-mapping-crypthome':
error: system*/tty: unbound variable
guix system: warning: some services could not be upgraded
hint: to allow changes to all the systems to take effect, you will need to reboot.

b. no `crypttab' is created (I don't remember how Guix handles encrypted /home's to know whether or not this is expected).


--B.

On Sat, 30 Mar 2024 16:25:07 +0100 (3 days, 4 hours, 30 minutes ago), Remco van 't Veer <remco@remworks.net> wrote:

> Hi,

> Confirmed on a couple of my installs. I too have an unencrypted root
> and encrypted home filesystems. The passphrase prompt never appears and
> the system seems to be waiting for something or is halted.

> I've git bisected it down to:

> commit 6f9d844d2ece7b369d17bbe678978462425f869c (HEAD)
> Author: Ludovic Courtès <ludo@gnu.org>
> Date: Wed Mar 20 18:48:38 2024 +0100

> services: shepherd: Load each service file in a fresh module.

> Fixes <https://issues.guix.gnu.org/67649>.

> * gnu/home/services/shepherd.scm (home-shepherd-configuration-file)[config]:
> Define ‘make-user-module’. Call ‘load’ in ‘save-module-excursion’.
> * gnu/services/shepherd.scm (shepherd-configuration-file): Likewise.

> Commit 2b052fe3c0fa85e9faa8873a581568ad4c78e151 still works.

> Cheers,
> Remco
Adrien 'neox' Bourmault wrote on 3 Apr 20:01 +0200
guix system hangs on boot with LUKS /home partition
(address . 70051@debbugs.gnu.org)
8764972d22962dc64f3d6b6ccaf4bbfc2a52517b.camel@gnu.org
I can confirm aurtzy's patch works (just tested on top of
7af70efd7633b0d70091762cf43ce01a86176e8e)
Ludovic Courtès wrote on 8 Apr 01:13 +0200
control message for bug #70266
(address . control@debbugs.gnu.org)
877ch8ojgf.fsf@gnu.org
merge 70266 70051
quit
Ludovic Courtès wrote on 8 Apr 01:43 +0200
Re: bug#70266: Failure to open LUKS devices from a Shepherd service
(name . aurtzy)(address . aurtzy@gmail.com)
87wmp8n3h4.fsf@gnu.org
Hi aurtzy,

aurtzy <aurtzy@gmail.com> skribis:

> This bug has also been reported here: https://issues.guix.gnu.org/70051
>
> I sent a patch that a few others have confirmed fixes the issue:
> https://issues.guix.gnu.org/70051#5

Oops, sorry for not noticing it earlier! (That was a hard-to-debug one
so kudos for the work you and others put in it.)

I pushed these two commits to address the problem:

49f82fca41 mapped-devices: luks: Specify modules needed at the top-level.
6062339156 mapped-devices: <mapped-device-type> can specify modules to import.

It works well in my tests but please let me know if something’s amiss.

Thanks,
Ludo’.
aurtzy wrote on 8 Apr 03:05 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
a0e3d327-822e-4c8e-84cb-37f91369621c@gmail.com
Hi Ludo',

On 4/7/24 19:43, Ludovic Courtès wrote:
> Oops, sorry for not noticing it earlier! (That was a hard-to-debug one
> so kudos for the work you and others put in it.)
>
> I pushed these two commits to address the problem:
>
> 49f82fca41 mapped-devices: luks: Specify modules needed at the top-level.
> 6062339156 mapped-devices: <mapped-device-type> can specify modules to import.
>
> It works well in my tests but please let me know if something’s amiss.

Just pulled+reconfigured, and my machine boots just fine with the
problem LUKS device being decrypted as expected again. Thanks!

Cheers,

aurtzy
Ludovic Courtès wrote on 8 Apr 14:19 +0200
control message for bug #70266
(address . control@debbugs.gnu.org)
87jzl8qc73.fsf@gnu.org
severity 70266 important
quit
Ludovic Courtès wrote on 8 Apr 14:19 +0200
Re: bug#70266: Failure to open LUKS devices from a Shepherd service
(name . aurtzy)(address . aurtzy@gmail.com)
87le5oqc7b.fsf@gnu.org
Hi,

aurtzy <aurtzy@gmail.com> skribis:

> On 4/7/24 19:43, Ludovic Courtès wrote:
>> Oops, sorry for not noticing it earlier! (That was a hard-to-debug one
>> so kudos for the work you and others put in it.)
>>
>> I pushed these two commits to address the problem:
>>
>> 49f82fca41 mapped-devices: luks: Specify modules needed at the top-level.
>> 6062339156 mapped-devices: <mapped-device-type> can specify modules to import.
>>
>> It works well in my tests but please let me know if something’s amiss.
>
> Just pulled+reconfigured, and my machine boots just fine with the
> problem LUKS device being decrypted as expected again. Thanks!

Awesome, thanks for confirming, and apologies for introducing this
regression in the first place!

Ludo’.
Closed
Fulbert wrote on 8 Apr 15:20 +0200
(no subject)
(address . 70051@debbugs.gnu.org)
ZhPvBjCL75qDGDaP@bluewin.ch
guix pull + reconfigure worked for me as well.

Thank you very much.