ISO installer image is broken on i686

  • Done
Details
6 participants
  • Gábor Boskovits
  • Brice Waegeneire
  • Ludovic Courtès
  • pelzflorian (Florian Pelz)
  • Thomas Schmitt
  • swedebugia
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
serious
Ludovic Courtès wrote on 6 Dec 2018 01:02
(name . Bug Guix)(address . bug-guix@gnu.org)
87d0qfwmih.fsf@gnu.org
Hello,

The ISO installer image as produced on commit
4a0b87f0ec5b6c2dcf82b372dd20ca7ea6acdd9c by

guix system disk-image --file-system-type=iso9660 \
-s i686-linux gnu/system/install.scm

contains unreadable file(s), at least /var/guix/db/db.sqlite.

The build at https://hydra.gnu.org/build/3151513 (2018-11-12,
64461ba20a07a0cf3197de3e97cb44e0f66cebdc) seems to be the only occurrence
of the problem I could find on the build farms: while running the
installation off the ISO image, it fails like this:

+ guix --version
guix (GNU Guix) 0.15.0-6.f9a8fce
Copyright (C) 2018 the Guix authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ export GUIX_BUILD_OPTIONS=--no-grafts
+ GUIX_BUILD_OPTIONS=--no-grafts
+ guix build isc-dhcp
[ 95.076694] attempt to access beyond end of device
[ 95.080672] sr0: rw=524288, want=2118580, limit=2115840
[ 95.082317] attempt to access beyond end of device
[ 95.083730] sr0: rw=0, want=2118332, limit=2115840
[ 95.097050] attempt to access beyond end of device
[ 95.098175] sr0: rw=0, want=2118332, limit=2115840
guix build: error: build failed: cannot open Nix database `/var/guix/db/db.sqlite'

Indeed, if you spawn the image and run “cat /var/guix/db/db.sqlite”, it
fails with EIO and “attempt to access beyond end of device.” I suspect
the bugs Mark reported at https://issues.guix.info/issue/33362 and
https://issues.guix.info/issue/33555 are related.

My guess is that the bug has always existed on ‘core-updates’ since
https://berlin.guixsd.org/build/662745 (‘master’, 2018-11-30, i.e.,
just before ‘core-updates’ was merged) shows a successful installation.

I tried running the ISO image in qemu-system-{x86_64,i386}, with and
without KVM, and the I/O errors are always there, including with a
pre-core-updates QEMU.

I tried reverting xorriso to 1.4.8 to no avail (which is not surprising
since xorriso was upgraded on 2018-09-18 and the successful installation
above dates from 2018-11-30.)

At this point I can only suspect a toolchain issue, probably binutils or
libc since gcc didn’t change.

Thoughts?

This is holding the 0.16.0 release and I’m unavailable to do it next
week and with little time over the next few days. Thus I’m considering
exceptionally releasing without the i686 GuixSD install image; thoughts?
The rest is all fine and ready to ship.

Thanks,
Ludo’.
Ludovic Courtès wrote on 6 Dec 2018 08:15
control message for bug #33639
(address . control@debbugs.gnu.org)
87in07m8h2.fsf@gnu.org
severity 33639 serious
Ludovic Courtès wrote on 6 Dec 2018 08:19
Re: bug#33639: ISO installer image is broken on i686
(address . 33639@debbugs.gnu.org)
87efavm8b3.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> The ISO installer image as produced on commit
> 4a0b87f0ec5b6c2dcf82b372dd20ca7ea6acdd9c by
>
> guix system disk-image --file-system-type=iso9660 \
> -s i686-linux gnu/system/install.scm
>
> contains unreadable file(s), at least /var/guix/db/db.sqlite.

I can reproduce the I/O error by mounting the image:

ludo@ribbon ~/src/guix$ sudo losetup /dev/loop0 /gnu/store/1yanxg3cz5wi6vhpvhipxvmjwm201fbm-image.iso
ludo@ribbon ~/src/guix$ sudo mount -t iso9660 /dev/loop /mnt/disk/
mount: /mnt/disk: WARNING: device write-protected, mounted read-only.
ludo@ribbon ~/src/guix$ cat < /mnt/disk/var/guix/db/db.sqlite > /dev/null
cat: -: Input/output error
ludo@ribbon ~/src/guix$ dmesg |tail
[ 41.186408] shepherd[1]: Service guix-daemon has been started.
[ 45.725418] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[ 45.933911] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[ 49.496112] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 49.496165] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[ 203.358136] ISO 9660 Extensions: RRIP_1991A
[ 215.199352] attempt to access beyond end of device
[ 215.199357] loop0: rw=524288, want=1903876, limit=1899264
[ 215.199362] attempt to access beyond end of device
[ 215.199363] loop0: rw=0, want=1903532, limit=1899264

So the problem lies with the VM that creates the image.

Ludo’.
swedebugia wrote on 6 Dec 2018 10:35
(name . Ludovic Courtès)(address . ludo@gnu.org)
8047bf42762c6c4f8106689097afa32d@riseup.net
On 2018-12-06 01:02, Ludovic Courtès wrote:
snip

> Indeed, if you spawn the image and run “cat /var/guix/db/db.sqlite”, it
> fails with EIO and “attempt to access beyond end of device.” I suspect
> the bugs Mark reported at <https://issues.guix.info/issue/33362> and
> <https://issues.guix.info/issue/33555> are related.
>
> My guess is that the bug has always existed on ‘core-updates’ since
> <https://berlin.guixsd.org/build/662745> (‘master’, 2018-11-30, i.e.,
> just before ‘core-updates’ was merged) shows a successful installation.
>
> I tried running the ISO image in qemu-system-{x86_64,i386}, with and
> without KVM, and the I/O errors are always there, including with a
> pre-core-updates QEMU.
>
> I tried reverting xorriso to 1.4.8 to no avail (which is not surprising
> since xorriso was upgraded on 2018-09-18 and the successful installation
> above dates from 2018-11-30.)
>
> At this point I can only suspect a toolchain issue, probably binutils or
> libc since gcc didn’t change.
>
> Thoughts?
>
> This is holding the 0.16.0 release and I’m unavailable to do it next
> week and with little time over the next few days. Thus I’m considering
> exceptionally releasing without the i686 GuixSD install image; thoughts?

Ok, I see.

Has anybody tested that guix pull from 0.15 -> 0.16 works on an install
ISO? (I don't know if we want/agreed to support this at all but 1 bug
suggests problems related to https: )

I say go for release and note it on the download page and provide
0.15-i686 image for now.

I'm using i686 GuixSD on my devlaptop.

--
Cheers
Swedebugia
Ludovic Courtès wrote on 6 Dec 2018 11:34
874lbrkkog.fsf@gnu.org
Dear Xorriso hackers,

While building an ISO for i686, running Xorriso 1.5.0 built for i686
(actually ‘grub-mkrescue’, but that’s just a wrapper around Xorriso) in
qemu-system-i386, we end up with an ISO image containing files that lead
to I/O errors (“attempt to access beyond end of device”):

ludo@ribbon ~/src/guix$ sudo losetup /dev/loop0 /gnu/store/1yanxg3cz5wi6vhpvhipxvmjwm201fbm-image.iso
ludo@ribbon ~/src/guix$ sudo mount -t iso9660 /dev/loop /mnt/disk/
mount: /mnt/disk: WARNING: device write-protected, mounted read-only.
ludo@ribbon ~/src/guix$ cat < /mnt/disk/var/guix/db/db.sqlite > /dev/null
cat: -: Input/output error
ludo@ribbon ~/src/guix$ dmesg |tail
[ 41.186408] shepherd[1]: Service guix-daemon has been started.
[ 45.725418] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[ 45.933911] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[ 49.496112] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 49.496165] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[ 203.358136] ISO 9660 Extensions: RRIP_1991A
[ 215.199352] attempt to access beyond end of device
[ 215.199357] loop0: rw=524288, want=1903876, limit=1899264
[ 215.199362] attempt to access beyond end of device
[ 215.199363] loop0: rw=0, want=1903532, limit=1899264

The output of Xorriso and the kernel when it builds the image looks
good.


Using the exact same build process for x86_64 leads to valid ISO images.

Does that ring a bell or would you have advice to further debug it?

Thanks,
Ludo’.
Thomas Schmitt wrote on 6 Dec 2018 15:08
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
22800682362436954162@scdbackup.webframe.org
Hi,

> [ 215.199357] loop0: rw=524288, want=1903876, limit=1899264

This looks much like a truncated ISO image. (For whatever reason.)

There are at least 4612 blocks = ~ 9 MiB missing.
In the original message of https://issues.guix.info/issue/33639
the minimum missing size is about 5 MiB.

Please consider local reasons for truncated ISO images.

In the following i will concentrate on a potential program bug.


> [...] running Xorriso 1.5.0 built for i686 [...] I/O errors [...]
> Using the exact same build process for x86_64 leads to valid ISO images.

Well, this would explain why 1.5.0 passed a regression test on my 64 bit
system with repacking about 200 ISOs, mounting them, and comparing them
with the mounted original ISOs.
I currently lack opportunities to build 32 bit xorriso.

Is there such a damaged ISO available for download ?

How much effort would it be to create a Guix installation for building
xorriso, running your ISO production, and possibly running xorriso under
gdb ?
(Something for a run like

qemu-system-i386 \
-enable-kvm \
-nographic \
-m 512 \
-net nic \
-net user,hostfwd=tcp::5555-:22 \
-hda guix_on_qemu.img

with the opportunity to login from the host machine via SSH.
)

What do you get from this xorriso inspection run on a damaged ISO ?
(I tested it with the ISO from https://www.gnu.org/software/guix/download/):

xorriso -indev guixsd-install-0.15.0.i686-linux.iso \
-find / -sort_lba -exec report_lba -- \
>/tmp/xorriso_indev_find.txt 2>&1

In a preliminary test with
guixsd-install-0.15.0.i686-linux.iso
i get in /tmp/xorriso_indev_find.txt :

...
Media summary: 1 session, 454094 data blocks, 887m data, 384g free
...
Report layout: xt , Startlba , Blocks , Filesize , ISO image path
File data lba: 0 , 8527 , 1440 , 2949120 , '/efi.img'
... many other files ...
File data lba: 0 , 453781 , 122 , 249856 , '/var/guix/db/db.sqlite'

The ISO image file size is 929984512 bytes = 454094 blocks.
The image by its inner size counter also claims 454094 blocks.
The data file with the highest storage address ends before block
453781 + 122 = 453903.
That's 191 blocks before the image end. Padding and GPT backup follow.
(The data block size is 2048 bytes.)
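
The block accounting above can be replayed as a small Python sketch. It assumes the report_lba line layout shown in the listing (xt, Startlba, Blocks, Filesize, path) and uses only the two entries quoted in this message; the function name `last_data_block_end` is made up for illustration.

```python
import re

BLOCK_SIZE = 2048  # data block size in bytes, as stated in the message

# Assumed layout of one report_lba line: xt , Startlba , Blocks , Filesize , 'path'
REPORT_LINE = re.compile(
    r"File data lba:\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*'(.*)'"
)


def last_data_block_end(report_text):
    """Return (end_block, path) of the file extent that ends at the highest block."""
    best = (0, None)
    for match in REPORT_LINE.finditer(report_text):
        start_lba = int(match.group(2))
        blocks = int(match.group(3))
        end = start_lba + blocks
        if end > best[0]:
            best = (end, match.group(5))
    return best


# Only the two entries quoted above; the real report has many more lines.
report = """\
File data lba:  0 ,     8527 ,     1440 ,  2949120 , '/efi.img'
File data lba:  0 ,   453781 ,      122 ,   249856 , '/var/guix/db/db.sqlite'
"""

image_bytes = 929984512                      # ISO file size quoted above
image_blocks = image_bytes // BLOCK_SIZE     # blocks the file can actually hold
end_block, path = last_data_block_end(report)
slack = image_blocks - end_block             # blocks left after the last file

print(image_blocks, end_block, slack, path)
```

A healthy image leaves a non-negative slack after the last file; a negative value would mean that file data lies beyond the end of the image file.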

So this image looks ok. Let's read all its files:

# mount guixsd-install-0.15.0.i686-linux.iso /mnt/iso
mount: /dev/loop0 is write-protected, mounting read-only
$ tar cf - /mnt/iso | wc
tar: Removing leading `/' from member names
7116387 35887498 1042391040
$

No i/o error.


Unrelated observation:
xorriso command -pvd_info reports that the ISO was made with xorriso-1.4.8
with
Creation Time: 1970010119010649
This means "1 Jan 1970 19:01:06". Something seems to be wrong with the
system clock of the producer machine.
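
For readers who want to decode such a value themselves, here is a minimal Python sketch. It assumes the 16 digits follow the layout YYYYMMDDhhmmss plus two digits of hundredths of a second; `parse_pvd_time` is a hypothetical helper, not part of xorriso.

```python
from datetime import datetime


def parse_pvd_time(stamp):
    """Split a 16-digit YYYYMMDDhhmmsscc string into (datetime, hundredths)."""
    if len(stamp) != 16 or not stamp.isdigit():
        raise ValueError("expected 16 decimal digits")
    moment = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    return moment, int(stamp[14:])


moment, hundredths = parse_pvd_time("1970010119010649")
print(moment.isoformat(), hundredths)
```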


Have a nice day :)

Thomas
Ludovic Courtès wrote on 6 Dec 2018 16:34
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87va46is9h.fsf@gnu.org
Hi Thomas,

Thanks for the quick and insightful reply!

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

>> [ 215.199357] loop0: rw=524288, want=1903876, limit=1899264
>
> This looks much like a truncated ISO image. (For whatever reason.)
>
> There are at least 4612 blocks = ~ 9 MiB missing.
> In the original message of https://issues.guix.info/issue/33639
> the minimum missing size is about 5 MiB.

OK.

> Please consider local reasons for truncated ISO images.

I’ve thought about this but that seems highly unlikely at this point.

> Is there such a damaged ISO available for download ?

No.

> How much effort would it be to create a Guix installation for building
> xorriso, running your ISO production, and possibly running xorriso under
> gdb ?
> (Something for a run like
>
> qemu-system-i386 \
> -enable-kvm \
> -nographic \
> -m 512 \
> -net nic \
> -net user,hostfwd=tcp::5555-:22 \
> -hda guix_on_qemu.img

You could install Guix on top of your distro following the instructions
at
Then you would need to run “guix pull” to get a current Guix (0.15.0
itself didn’t have this bug.) And finally, run:

guix system disk-image --file-system-type=iso9660 \
-s i686-linux \
~/.config/guix/current/share/guile/site/2.2/gnu/system/install.scm

(This command works on an x86_64 machine.)

The result will be an ISO that’s corrupt.

> What do you get from this xorriso inspection run on a damaged ISO ?
> (I tested it with the ISO from https://www.gnu.org/software/guix/download/):
>
> xorriso -indev guixsd-install-0.15.0.i686-linux.iso \
> -find / -sort_lba -exec report_lba -- \
> >/tmp/xorriso_indev_find.txt 2>&1

I get:

GNU xorriso 1.5.0 : RockRidge filesystem manipulator, libburnia project.

libisoburn: WARNING : ISO image size 475636s larger than readable size 473456s
xorriso : NOTE : Loading ISO image tree from LBA 0
libburn : SORRY : Read start address 475635s larger than number of readable blocks 473456
xorriso : UPDATE : 46803 nodes read in 1 seconds
xorriso : NOTE : Detected El-Torito boot information which currently is set to be discarded
Drive current: -indev '/gnu/store/v13bryy1mrgrs694drsrknryf204q30j-image.iso'
Media current: stdio file, overwriteable
Media status : is written , is appendable
Boot record : El Torito , MBR protective-msdos-label grub2-mbr cyl-align-off GPT APM
Media summary: 1 session, 473456 data blocks, 925m data, 45.6g free
Volume id : 'GUIXSD_IMAGE'
xorriso : NOTE : Tolerated problem event of severity 'SORRY'
Report layout: xt , Startlba , Blocks , Filesize , ISO image path
File data lba: 0 , 8612 , 720 , 1474560 , '/efi.img'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/1zzgag2ca7xzklss2j6phh4580cgkbl2-flac-1.3.2/share/doc/flac-1.3.2/FLAC.tag'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/55m1dng1zw7fq7ni73nm2v7b84wghpka-libx11-1.6.6/share/X11/locale/am_ET.UTF-8/XI18N_OBJS'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/55m1dng1zw7fq7ni73nm2v7b84wghpka-libx11-1.6.6/share/X11/locale/cs_CZ.UTF-8/XI18N_OBJS'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/55m1dng1zw7fq7ni73nm2v7b84wghpka-libx11-1.6.6/share/X11/locale/el_GR.UTF-8/XI18N_OBJS'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/55m1dng1zw7fq7ni73nm2v7b84wghpka-libx11-1.6.6/share/X11/locale/fi_FI.UTF-8/XI18N_OBJS'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/746645dl4fmz9h12x247nyznalswqyzp-groff-minimal-1.22.3/share/groff/1.22.3/tmac/mm/locale'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/746645dl4fmz9h12x247nyznalswqyzp-groff-minimal-1.22.3/share/groff/1.22.3/tmac/mm/se_locale'
File data lba: 0 , 25032 , 0 , 0 , '/gnu/store/a1vpwa7wkxbxw18sz70rmp3cdfnf3jdj-libvorbis-1.3.6/share/doc/libvorbis-1.3.6/doxygen-build.stamp'
File data lba: 0 , 25032 , 0 , 0 , '/mach_kernel'
File data lba: 0 , 25034 , 1173 , 2400500 , '/boot/grub/fonts/unicode.pf2'
File data lba: 0 , 26207 , 1 , 1520 , '/boot/grub/grub.cfg'
File data lba: 0 , 26207 , 1 , 1520 , '/gnu/store/3zq39lvf12a87zcfrg87xgkllgfsyw3b-grub.cfg'
File data lba: 0 , 26208 , 5 , 9928 , '/boot/grub/i386-efi/acpi.mod'

[…]

File data lba: 0 , 475300 , 1 , 1651 , '/gnu/store/zrg4c2d0lvyw8z9xgh0darzglbxrm6b7-iptables-1.6.2/share/man/man8/iptables-restore.8.gz'
File data lba: 0 , 475301 , 1 , 1137 , '/gnu/store/zrg4c2d0lvyw8z9xgh0darzglbxrm6b7-iptables-1.6.2/share/man/man8/iptables-save.8.gz'
File data lba: 0 , 475302 , 4 , 7837 , '/gnu/store/zrg4c2d0lvyw8z9xgh0darzglbxrm6b7-iptables-1.6.2/share/man/man8/iptables.8.gz'
File data lba: 0 , 475306 , 47 , 96256 , '/System/Library/CoreServices/boot.efi'
File data lba: 0 , 475353 , 1 , 236 , '/System/Library/CoreServices/SystemVersion.plist'
File data lba: 0 , 475354 , 1 , 1399 , '/System/Library/CoreServices/.disk_label'
File data lba: 0 , 475355 , 1 , 10 , '/System/Library/CoreServices/.disk_label.contentDetails'
File data lba: 0 , 475356 , 88 , 180224 , '/var/guix/db/db.sqlite'
xorriso : NOTE : -return_with SORRY 32 triggered by problem severity SORRY

Something’s fishy, and Xorriso is sorry. :-)

Let me know if I can provide more info.

In the meantime I’ll see if I can build the image from x86_64 instead.

> Unrelated observation:
> xorriso command -pvd_info reports that the ISO was made with xorriso-1.4.8
> with
> Creation Time: 1970010119010649
> This means "1 Jan 1970 19:01:06". Something seems to be wrong with the
> system clock of the producer machine.

For reproducibility purposes we set timestamps and related things to the
Epoch. This pseudo-UUID/timestamp is actually derived from the config
of the operating system in the image. It’s expected. :-)

Thank you!

Ludo’.
Ludovic Courtès wrote on 6 Dec 2018 17:28
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87k1kmipqk.fsf@gnu.org
Hi again,

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

>> [ 215.199357] loop0: rw=524288, want=1903876, limit=1899264
>
> This looks much like a truncated ISO image. (For whatever reason.)
>
> There are at least 4612 blocks = ~ 9 MiB missing.
> In the original message of https://issues.guix.info/issue/33639
> the minimum missing size is about 5 MiB.

Based on this and on a suggestion Ricardo made on IRC, I passed
“-padding 10m” and that solved the problem. \o/

I suppose you’ll have a scientific explanation, but I’m happy this
simple hack works (and indeed, the documentation of “-padding” suggests
that this kind of problem is not uncommon.)

Thanks to both of you!

Ludo’.
Thomas Schmitt wrote on 6 Dec 2018 17:59
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
13661682393159200289@scdbackup.webframe.org
Hi,

i see that probably the kernel log talks of blocks of 512 bytes.
So the minimum missing size shrinks to 2.3 and 1.4 MiB, respectively.
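
The corrected arithmetic can be checked with a short Python sketch. It assumes, as this message does, that "want" and "limit" in the kernel log count 512-byte units; `minimum_missing_bytes` is a name chosen here for illustration.

```python
import re

SECTOR_SIZE = 512  # assumption from the message: the kernel log counts 512-byte units

LOG_LINE = re.compile(r"want=(\d+), limit=(\d+)")


def minimum_missing_bytes(kernel_log):
    """Largest (want - limit) found in the log, converted to bytes."""
    worst = 0
    for match in LOG_LINE.finditer(kernel_log):
        want, limit = int(match.group(1)), int(match.group(2))
        worst = max(worst, want - limit)
    return worst * SECTOR_SIZE


# Lines taken from the two kernel logs quoted earlier in this thread.
loop0_log = """\
loop0: rw=524288, want=1903876, limit=1899264
loop0: rw=0, want=1903532, limit=1899264
"""
sr0_log = """\
sr0: rw=524288, want=2118580, limit=2115840
sr0: rw=0, want=2118332, limit=2115840
"""

for name, log in (("loop0", loop0_log), ("sr0", sr0_log)):
    missing = minimum_missing_bytes(log)
    print(name, missing, "bytes =", round(missing / 2**20, 2), "MiB")
```

This yields 2361344 bytes (about 2.25 MiB) for the loop0 log and 1402880 bytes (about 1.34 MiB) for the sr0 log, in line with the rounded figures above.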


I wrote:
> > Please consider local reasons for truncated ISO images.

Ludovic Courtès wrote:
> I’ve thought about this but that seems highly unlikely at this point.

It still looks like writing of the ISO image aborted prematurely.
Do you have the xorriso messages from the grub-mkrescue run ?

(If there are none, add the following three arguments to the grub-mkrescue
run:
-- -- -report_about update
The second "--" shall work around an intermediate version of grub-mkrescue
which ate the first "--" instead of forwarding it to xorriso.
)


Reasoning:

> libisoburn: WARNING : ISO image size 475636s larger than readable size 473456s
> File data lba: 0 , 475356 , 88 , 180224 , '/var/guix/db/db.sqlite'

When the ISO is assessed by libisoburn, its nominal block count is
192 blocks higher than the end of the last file. Insofar ok. But the
ISO image file is smaller than that.

After the warning, libisoburn corrects the displayed size to the readable
size. So the number in this subsequent message is rather insignificant:
> Media summary: 1 session, 473456 data blocks, 925m data, 45.6g free
(It is good that you also showed the warning message above.)


The nominal count is recorded in the Primary Volume Descriptor, the
equivalent of a superblock. (Byte offset in the ISO file is 32768+80,
first as 4 byte little-endian, then again as 4 byte big-endian.)

The readable size is based on the byte size of the ISO file.
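
The comparison between nominal and readable size can be sketched in Python as follows. The offset 32768+80 and the little-endian/big-endian pair come from the paragraph above; the function names and the synthetic demo file are assumptions made for illustration, not libisoburn code.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 2048
PVD_SIZE_OFFSET = 32768 + 80  # byte offset of the block count, as quoted above


def nominal_blocks(iso_path):
    """Read the block count recorded in the Primary Volume Descriptor."""
    with open(iso_path, "rb") as iso:
        iso.seek(PVD_SIZE_OFFSET)
        raw = iso.read(8)
    if len(raw) != 8:
        raise ValueError("file too short to hold a Primary Volume Descriptor")
    little = struct.unpack("<I", raw[:4])[0]  # first copy, little-endian
    big = struct.unpack(">I", raw[4:])[0]     # second copy, big-endian
    if little != big:
        raise ValueError("the two copies of the block count disagree")
    return little


def readable_blocks(iso_path):
    """Number of whole blocks the image file can actually deliver."""
    return os.path.getsize(iso_path) // BLOCK_SIZE


def missing_blocks(iso_path):
    """How many nominal blocks lie beyond the end of the file (0 if none)."""
    return max(0, nominal_blocks(iso_path) - readable_blocks(iso_path))


# Synthetic demo: a file that claims 20 blocks but only holds 18.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.iso")
    with open(demo_path, "wb") as demo:
        demo.write(b"\0" * PVD_SIZE_OFFSET)
        demo.write(struct.pack("<I", 20) + struct.pack(">I", 20))
        demo.truncate(18 * BLOCK_SIZE)
    demo_missing = missing_blocks(demo_path)

print(demo_missing)
```

Run against a real image, a non-zero result from `missing_blocks` would correspond to the "larger than readable size" warning quoted above.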

At ISO production time, the nominal block count is determined by libisofs
in a first dry-run. In the subsequent real production run, libisofs sticks
to the determined file sizes of the first run, even if some file changed
size in between. It would truncate or pad the copied file bytes to the
planned size. Directory data are written as assessed in the first run.

So from normal operation of libisofs it is guaranteed that the written
amount of data is the same as the nominal amount.

-----------------------------------------------------------------------

Possible glitches would be that libisofs skips to write some scheduled
data blocks, or that libburn drops blocks which were submitted by libisofs.
Both scenarios do not give me an idea how the difference between 32 bit
and 64 bit systems could be involved.

The theory of intermediately missing data blocks could be verified or
refuted by checking the content of the last file which sits in the
readable area. If it bears the expected content, then no blocks were
skipped or dropped in between.

So please look in the file listing for the last file which begins before
block 473456 and does not step over that limit by adding its "Blocks"
count (exact hit on the limit is ok).
If the filesystem refuses to obtain it, then use
dd bs=2048 skip=$Startlba count=$Blocks
to cut it out from the ISO and then truncate it to the reported "Filesize".

In any case compare its content with the original.
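
The selection and extraction steps described above could look like this in Python. It is only a sketch: the entries are the few lines visible in the listing earlier in the thread (the elided middle part is unknown), skipping zero-block files is an assumption, and both function names are hypothetical.

```python
import os
import tempfile

BLOCK_SIZE = 2048


def last_fully_readable(entries, readable_blocks):
    """Pick the highest-placed file that ends at or before the readable limit.

    entries: iterable of (start_lba, blocks, filesize, path).
    Files with zero blocks are skipped, since they hold no data to compare.
    """
    candidates = [e for e in entries
                  if e[1] > 0 and e[0] + e[1] <= readable_blocks]
    return max(candidates, key=lambda e: e[0], default=None)


def cut_out(iso_path, start_lba, blocks, filesize):
    """Same idea as dd bs=2048 skip=start_lba count=blocks, then truncation."""
    with open(iso_path, "rb") as iso:
        iso.seek(start_lba * BLOCK_SIZE)
        data = iso.read(blocks * BLOCK_SIZE)
    return data[:filesize]


# A few entries copied from the listing earlier in the thread.
entries = [
    (8612, 720, 1474560, "/efi.img"),
    (26208, 5, 9928, "/boot/grub/i386-efi/acpi.mod"),
    (475356, 88, 180224, "/var/guix/db/db.sqlite"),
]
chosen = last_fully_readable(entries, readable_blocks=473456)

# Synthetic demo of the extraction on a tiny three-block file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.iso")
    with open(demo_path, "wb") as demo:
        demo.write(b"\0" * (2 * BLOCK_SIZE))           # blocks 0 and 1
        demo.write(b"hello".ljust(BLOCK_SIZE, b"\0"))  # block 2 holds a 5-byte file
    demo_bytes = cut_out(demo_path, start_lba=2, blocks=1, filesize=5)

print(chosen, demo_bytes)
```

With the full report as input, `chosen` would name the file to compare against its original in the store.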

If the contents match, then we have a flat premature end of file.
In this case there should be error messages from xorriso or its libraries.
(In case of GNU xorriso, the libraries are statically compiled in from source.)


Have a nice day :)

Thomas
Thomas Schmitt wrote on 6 Dec 2018 18:29
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
12559682391379993357@scdbackup.webframe.org
Hi,

> Based on this and on a suggestion Ricardo made on IRC, I passed
> “-padding 10m” and that solved the problem. \o/

Ouchers. Do all files bear their expected content ?
Especially the last one: /var/guix/db/db.sqlite

If so, then something truncates the output stream of libisofs via libburn.
The only component that comes to my mind is the fifo between them.
The default fifo size is 4 MiB. Quite suspicious.

Try to reduce its size to the minimum by adding these grub-mkrescue
arguments:

-- -- -fs 64k -padding 64k

If the fifo is to blame, then a padding of 64k should suffice to protect
the valuable blocks from a premature end.


--------------------------------------------------------------------
A bit off topic:

> the documentation of “-padding” suggests
> that this kind of problem is not uncommon.

Its normal purpose is to work around a traditional Linux kernel bug:

CDs written with write type Track-At-Once bear two unreadable blocks at
the end. Most CD drives report these blocks as part of the data range.
When Linux shall read a single block for isofs, it reads a larger chunk.
The chunk is not large enough to reach over the nominal end of the data
range, but it can reach the unreadable end blocks by mistake.
In this case Linux does not only miss the end blocks but also valid
payload blocks which are part of the filesystem. This yields I/O error.

The developer of cdrecord and the kernel people mistake this problem
for a "fuzziness" of a CD end by at most 2 seconds of audio play time.
This is wrong from reading the specs and from making experiments.
However, cdrecord introduced the tradition to add 150 blocks of padding
which would be 2 seconds of sound.
As long as the read chunk of Linux is smaller than that, the padding
protects the operating system from touching the lead-out blocks of the
TAO track.

This cannot happen on hard disk or any optical media type other than CD.
If you write the CD by Session-At-Once it cannot happen. If you have one
of the rare CD drives which do not count the lead-out blocks to the
readable size of the CD, it cannot happen. (Currently 1 of my 7 drives
tells the truth.)

But who am i to stand against all others ?
So xorriso, too, adds 300 KiB of padding by default.
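
The relation between the 150 blocks, the 2 seconds and the 300 KiB can be verified with a few lines of Python, assuming the usual CD rate of 75 sectors per second and 2048 payload bytes per data block.

```python
SECTORS_PER_SECOND = 75   # CD sector rate at single speed
DATA_BLOCK_SIZE = 2048    # payload bytes per data block

padding_blocks = 150
padding_seconds = padding_blocks / SECTORS_PER_SECOND
padding_kib = padding_blocks * DATA_BLOCK_SIZE / 1024

print(padding_seconds, "seconds,", padding_kib, "KiB")
```

Both results agree with the figures in the message: 2 seconds of play time and 300 KiB of padding.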


Have a nice day :)

Thomas
Ludovic Courtès wrote on 7 Dec 2018 23:51
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87k1klar3e.fsf@gnu.org
Hello!

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

>> Based on this and on a suggestion Ricardo made on IRC, I passed
>> “-padding 10m” and that solved the problem. \o/
>
> Ouchers. Do all files bear their expected content ?
> Especially the last one: /var/guix/db/db.sqlite

It looks good, and there are no I/O errors left (I mounted it and ran
“tar” over it.)

Note that the image is now available here:


(I haven’t tried smaller padding.)

> If so, then something truncates the output stream of libisofs via libburn.
> The only component that comes to my mind is the fifo between them.
> The default fifo size is 4 MiB. Quite suspicious.
>
> Try to reduce its size to the minimum by adding these grub-mkrescue
> arguments:
>
> -- -- -fs 64k -padding 64k
>
> If the fifo is to blame, then a padding of 64k should suffice to protect
> the valuable blocks from a premature end.

OK, I’ll try to test this, but note that I’ll be largely unavailable for
a week.

>> the documentation of “-padding” suggests
>> that this kind of problem is not uncommon.
>
> Its normal purpose is to work around a traditional Linux kernel bug:
>
> CDs written with write type Track-At-Once bear two unreadable blocks at
> the end. Most CD drives report these blocks as part of the data range.
> When Linux shall read a single block for isofs, it reads a larger chunk.
> The chunk is not large enough to reach over the nominal end of the data
> range, but it can reach the unreadable end blocks by mistake.
> In this case Linux does not only miss the end blocks but also valid
> payload blocks which are part of the filesystem. This yields I/O error.
>
> The developer of cdrecord and the kernel people mistake this problem
> for a "fuzziness" of a CD end by at most 2 seconds of audio play time.
> This is wrong from reading the specs and from making experiments.
> However, cdrecord introduced the tradition to add 150 blocks of padding
> which would be 2 seconds of sound.
> As long as the read chunk of Linux is smaller than that, the padding
> protects the operating system from touching the lead-out blocks of the
> TAO track.
>
> This cannot happen on hard disk or any optical media type other than CD.
> If you write the CD by Session-At-Once it cannot happen. If you have one
> of the rare CD drives which do not count the lead-out blocks to the
> readable size of the CD, it cannot happen. (Currently 1 of my 7 drives
> tells the truth.)
>
> But who am i to stand against all others ?
> So xorriso, too, adds 300 KiB of padding by default.

I see, thanks for explaining!

Ludo’.
Thomas Schmitt wrote on 8 Dec 2018 13:42
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
14249682530673393275@scdbackup.webframe.org
Hi,

> (I haven’t tried smaller padding.)

I downloaded it and get on "xorriso -indev":
libisoburn: WARNING : ISO image size 481129s larger than readable size 479184s

So the lack of 2k blocks is 1945 = nearly 4 MiB.
This is suspiciously near to the default fifo size.
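
A minimal Python sketch of that comparison, assuming the wording of the libisoburn warning stays as quoted in this thread; `truncated_blocks` is a hypothetical helper name.

```python
import re

BLOCK_SIZE = 2048
DEFAULT_FIFO_BYTES = 4 * 2**20  # default fifo size of 4 MiB, as stated earlier in the thread

WARNING = re.compile(r"ISO image size (\d+)s larger than readable size (\d+)s")


def truncated_blocks(message):
    """Return nominal minus readable block count, or 0 if there is no warning."""
    match = WARNING.search(message)
    if match is None:
        return 0
    return int(match.group(1)) - int(match.group(2))


# The two warnings quoted in this thread: padded image, then the first image.
padded = "libisoburn: WARNING : ISO image size 481129s larger than readable size 479184s"
first = "libisoburn: WARNING : ISO image size 475636s larger than readable size 473456s"

for label, message in (("padded image", padded), ("first image", first)):
    blocks = truncated_blocks(message)
    print(label, blocks, "blocks =", round(blocks * BLOCK_SIZE / 2**20, 2), "MiB")

fifo_blocks = DEFAULT_FIFO_BYTES // BLOCK_SIZE
print("fifo holds", fifo_blocks, "blocks")
```

The padded image lacks 1945 blocks (about 3.8 MiB) and the first image 2180 blocks (about 4.26 MiB), both close to the 2048 blocks that a 4 MiB fifo holds.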

The content of cleartext files near the payload end looks plausible:
/System/Library/CoreServices/.disk_label
/System/Library/CoreServices/SystemVersion.plist
Whether the last file's content is as expected can only be told by
its reader program, i guess:
/var/guix/db/db.sqlite

So for now it indeed looks like plain truncation and not like a hiccup
somewhere in the middle of ISO writing.

Several distros use xorriso to build their 32 bit ISOs. No complaints.
So i asked on debian-cd and debian-live mailing lists whether the ISOs
for 32-bit systems are indeed made on 32-bit systems. The answer is
"All our images have been made on amd64 for years now."

So i need a 32-bit GNU/Linux VM for regression tests.

Since i am an untalented sysadmin, this can take a while. (First searching
for old cheat sheets and then stepping into any possible puddle ...)


I would still appreciate a test with minimally sized fifo. Its outcome would
be a strong indication whether the Guix problem is related to the fifo
at all. The result can be checked by executing

xorriso -indev ...path.to.iso...

and looking for message
libisoburn: WARNING : ISO image size ...s larger than readable size ...s
If the difference is in the range of only 32s, then the fifo stays
the main suspect.

Also, the xorriso messages of a run with grub-mkrescue add-on arguments

-- -- -report_about all

would be very welcome.

--------------------------------------------------------------------------
(Be invited to stop reading here. Only code musings follow.)

I reviewed the fifo code in libisofs and found no obvious opportunity for
a bug that would drop the final fifo content rather than offering it to
libburn:

(iso_ring_buffer_read() is exposed to libburn via libisofs/ecma119.c
function bs_read() which serves as struct burn_source member (*read)()
as defined in libburn/libburn.h.)

The condition for end of reading is a combination of
- no data are available in the ring buffer
- the writer has set the flag for having ended its work

while (buf->size == 0) {
...
if (buf->wend) {

The member buf->size is of type size_t. I.e. good for at least 4 GiB - 1
before it rolls over. Neither the fifo size nor the transaction size come
near to that number.
buf->wend is unsigned int :2 with defined values
0 not finished, 1 finished ok, 2 finish with error
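
To make the end-of-reading condition concrete, here is a toy Python model of such a fifo. It is not the libisofs implementation: only the names `size` and `wend` and the rule "stop when the buffer is empty and the writer has finished" are taken from the description above; everything else is an assumption for illustration.

```python
import threading


class RingBuffer:
    """Toy byte fifo between one writer thread and one reader thread."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()
        self.wend = 0  # 0 not finished, 1 finished ok, 2 finished with error
        self.cond = threading.Condition()

    @property
    def size(self):
        return len(self.data)

    def write(self, chunk):
        """Block until the whole chunk has been stored in the buffer."""
        offset = 0
        with self.cond:
            while offset < len(chunk):
                while self.size == self.capacity:
                    self.cond.wait()          # buffer full: wait for the reader
                room = self.capacity - self.size
                piece = chunk[offset:offset + room]
                self.data += piece
                offset += len(piece)
                self.cond.notify_all()

    def close(self, error=False):
        """Writer announces that no more data will come."""
        with self.cond:
            self.wend = 2 if error else 1
            self.cond.notify_all()

    def read(self, count):
        """Return up to count bytes; b'' only if empty AND the writer ended."""
        with self.cond:
            while self.size == 0:
                if self.wend:
                    return b""
                self.cond.wait()              # buffer empty: wait for the writer
            piece = bytes(self.data[:count])
            del self.data[:count]
            self.cond.notify_all()
            return piece


payload = bytes(range(256)) * 1000
fifo = RingBuffer(capacity=4096)


def producer():
    for start in range(0, len(payload), 1000):
        fifo.write(payload[start:start + 1000])
    fifo.close()


writer = threading.Thread(target=producer)
writer.start()
received = bytearray()
while True:
    piece = fifo.read(2048)
    if not piece:
        break
    received += piece
writer.join()

print(len(received) == len(payload))
```

Because the reader checks the finished flag only while the buffer is empty, bytes still queued when the writer finishes are delivered before end of data is reported, which is the property the code review above was looking for.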


Have a nice day :)

Thomas
Thomas Schmitt wrote on 15 Dec 2018 19:40
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
23902682647998386729@scdbackup.webframe.org
Hi,

just to report that i did not forget this problem:

I have now a qemu-system-i386 VM with Debian GNU/Linux from
debian-9.6.0-i386-netinst.iso without desktop environment and reachable
via SSH. Very minimal. (I only did "apt-get install build-essential" to
feel not lonely without C compiler and friends.)

Then i followed the instructions of
with
up to step 7 ("guix archive --authorize ...").

Then i made the mistake to do the proposed

guix package -i hello

It downloads and builds and blows away the free space on the virtual 8 GB
disk ... /gnu is growing steadily and /tmp breathes between 50 MB and 2 GB.
I abort this after 100 minutes before the virtual disk gets too full and
my CPU melts.

"guix pull" happily begins to build that gcc-5.5.0 which is too much for my
feeble VM.

Back to step 0 ("rm -r /gnu /var/guix") and again to step 7.
(A small fight starts between me and systemd, to get guix-daemon running.
"start" did not help. It had to be "restart".)

Then

# guix system disk-image --file-system-type=iso9660 \
> -s i686-linux \
> ~/.config/guix/current/share/guile/site/2.2/gnu/system/install.scm

and the activities to build the world start again. Extra verbose.
This time i abort after 30 minutes.

Everything i do ends up in enormous production of gcc-5.5.0 related
software.

-------------------------------------------------------------------------

So for xorriso and a 32-bit system:

# apt-get install xorriso
...
# xorriso -version
xorriso 1.4.6 : RockRidge filesystem manipulator, libburnia project.
...

I try what happens if i pack up the /gnu tree:

# xorriso -as mkisofs -o /tmp/test.iso -J /gnu
...
ISO image produced: 643046 sectors
Written to medium : 643046 sectors at LBA 0
Writing to 'stdio:/tmp/test.iso' completed successfully.

Inspection shows that the size ideas of xorriso match the image file size:

# xorriso -indev /tmp/test.iso
... no warning about size mismatch ...
Media summary: 1 session, 643046 data blocks, 1256m data, 3234m free

# ls -l /tmp/test.iso
-rw-r--r-- 1 root root 1316958208 Dec 15 19:17 /tmp/test.iso

# expr 1316958208 / 2048
643046

Now with GNU xorriso 1.5.0:

...
$ tar xzf xorriso-1.5.0.tar.gz
$ cd xorriso-1.5.0
$ ./configure && make
...
$ xorriso/xorriso -version
GNU xorriso 1.5.0 : RockRidge filesystem manipulator, libburnia project.
...

# rm /tmp/test.iso
# xorriso/xorriso -as mkisofs -o /tmp/test.iso -J /gnu
GNU xorriso 1.5.0 : RockRidge filesystem manipulator, libburnia project.
...
ISO image produced: 643046 sectors
Written to medium : 643046 sectors at LBA 0
...

Inspection yields the same result. No truncation.

-------------------------------------------------------------------------

If i shall try again with "guix system disk-image", then i need more
guidance. E.g. about the required disk size and ways to curb the build
effort.


Have a nice day :)

Thomas
Thomas Schmitt wrote on 15 Dec 2018 20:24
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
16569682655711134021@scdbackup.webframe.org
Hi,

it comes to me that i can get nearer to the Guix ISO production:

# apt-get install grub-pc grub-efi-amd64-bin grub-efi-ia32-bin mtools
...
# grub-mkrescue -o /tmp/test.iso /gnu
xorriso 1.4.6 : RockRidge filesystem manipulator, libburnia project.
...
ISO image produced: 652920 sectors
Written to medium : 652920 sectors at LBA 0

# ls -l /tmp/test.iso
-rw-r--r-- 1 root root 1337180160 Dec 15 20:09 /tmp/test.iso

# expr 1337180160 / 2048
652920

# xorriso -indev /tmp/test.iso
... no complaints ...

And with GNU xorriso 1.5.0 :

# rm /tmp/test.iso
# grub-mkrescue --xorriso=/home/thomas/xorriso-1.5.0/xorriso/xorriso \
> -o /tmp/test.iso /gnu
GNU xorriso 1.5.0 : RockRidge filesystem manipulator, libburnia project.
...
ISO image produced: 652920 sectors
Written to medium : 652920 sectors at LBA 0

# ls -l /tmp/test.iso
-rw-r--r-- 1 root root 1337180160 Dec 15 20:15 /tmp/test.iso

# xorriso -indev /tmp/test.iso
... no complaints ...

All looks well.


Have a nice day :)

Thomas
Ludovic Courtès wrote on 16 Dec 2018 16:52
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87ftuxtqn9.fsf@gnu.org
Hi Thomas,

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

> I have now a qemu-system-i386 VM with Debian GNU/Linux from
> debian-9.6.0-i386-netinst.iso without desktop environment and reachable
> via SSH. Very minimal. (I only did "apt-get install build-essential" to
> feel not lonely without C compiler and friends.)

If you’re testing in a VM you might just as well download the GuixSD VM
image from <https://www.gnu.org/software/guix/download/>. That would be
simpler than installing Debian and then installing Guix on top of
Debian.

> Then i followed the instructions of
> https://www.gnu.org/software/guix/manual/en/html_node/Binary-Installation.html
> with
> https://alpha.gnu.org/gnu/guix/guix-binary-0.16.0.i686-linux.tar.xz
> https://alpha.gnu.org/gnu/guix/guix-binary-0.16.0.i686-linux.tar.xz.sig
> up to step 7 ("guix archive --authorize ...").
>
> Then i made the mistake to do the proposed
>
> guix package -i hello
>
> It downloads and builds and blows away the free space on the virtual 8 GB
> disk ... /gnu is growing steadily and /tmp breathes between 50 MB and 2 GB.
> I abort this after 100 minutes before the virtual disk gets too full and
> my CPU melts.

Did you actually run “guix archive --authorize < …/ci.guix.info.pub”?


If you didn’t, then you are not getting pre-built binaries and thus you
end up building the world.

HTH,
Ludo’.
Thomas Schmitt wrote on 16 Dec 2018 17:52
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
2198768286307958861@scdbackup.webframe.org
Hi,

Ludovic Courtès wrote:
> If you’re testing in a VM you might just as well download the GuixSD VM
> image from <https://www.gnu.org/software/guix/download/>.

There i see only "x86_64" for QEMU, not "i686" like with ISO or Binary.


> Did you actually run “guix archive --authorize < …/ci.guix.info.pub”?

I did step 7 of Binary-Installation.html:

guix archive --authorize < \
~root/.config/guix/current/share/guix/hydra.gnu.org.pub

The text "ci.guix.info.pub" does not appear in
https://www.gnu.org/software/guix/manual/en/html_node/Binary-Installation.html

Looking at the existing state:

# ls -l ~root/.config/guix/current/share/guix/
total 12
-r--r--r-- 1 root root 118 Jan 1 1970 berlin.guixsd.org.pub
-r--r--r-- 1 root root 118 Jan 1 1970 ci.guix.info.pub
-r--r--r-- 1 root root 1083 Jan 1 1970 hydra.gnu.org.pub

Shall i authorize the others too ?
If so: Is there need for clean-up actions after the aborted build runs ?


(If you find a bit of time, please run grub-mkrescue with some arbitrary
input tree of about the size of the Guix ISO and check whether it gets
truncated. If so, the messages from xorriso would be very interesting.)


Have a nice day :)

Thomas
Ludovic Courtès wrote on 18 Dec 2018 12:16
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
877eg7hypx.fsf@gnu.org
Hi Thomas,

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

> Ludovic Courtès wrote:
>> If you’re testing in a VM you might just as well download the GuixSD VM
>> image from <https://www.gnu.org/software/guix/download/>.
>
> There i see only "x86_64" for QEMU, not "i686" like with ISO or Binary.

You’re right, my bad.

>> Did you actually run “guix archive --authorize < …/ci.guix.info.pub”?
>
> I did step 7 of Binary-Installation.html:
>
> guix archive --authorize < \
> ~root/.config/guix/current/share/guix/hydra.gnu.org.pub
>
> The text "ci.guix.info.pub" does not appear in
> https://www.gnu.org/software/guix/manual/en/html_node/Binary-Installation.html

Oops, that was an omission that I’ve just fixed.

So yes, please authorize “ci.guix.info.pub” since https://ci.guix.info
is now the default substitute server.

HTH!

Ludo’.
Thomas Schmitt wrote on 18 Dec 2018 22:45
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
10322683426128283104@scdbackup.webframe.org
Hi,

> Oops, that was an omission that I’ve just fixed.

Sometimes you need a clueless test user to clean the pipes.

I now succeeded in running the ISO production command, but the truncation
problem is not reproducible here.

Please re-consider local reasons ... yada yada ... my main suspect would
be the immediate end of VM after the xorriso run. Maybe some buffers don't
get flushed down to the real disk ?

------------------------------------------------------------------------
What i did in detail:

I removed /gnu and /var/guix to get to a halfways clean state for
repeating steps 2 and 7 of
https://www.gnu.org/software/guix/manual/en/html_node/Binary-Installation.html
I.e. i unpacked the tarball, moved the trees to /gnu and /var/guix,
and authorized ci.guix.info.pub.

Then i did step 8
guix package -i glibc-locales
This lasted 12 minutes (mainly with building 7 packages).

Now the proposed command to "confirm that Guix is working":
guix package -i hello
lasted only about 30 seconds.

Scrolling back in my mailbox to
Date: Thu, 06 Dec 2018 16:34:02 +0100
Message-ID: <87va46is9h.fsf@gnu.org>

> Then you would need to run “guix pull” to get a current Guix (0.15.0
> itself didn’t have this bug.)

Do i still need this ? My tarball was already "0.16.0":
guix-binary-0.16.0.i686-linux.tar.xz

I bet on omitting this step and go on with:

> guix system disk-image --file-system-type=iso9660 \
> -s i686-linux \
> ~/.config/guix/current/share/guile/site/2.2/gnu/system/install.scm

After 5 minutes i see boot messages of a Linux kernel.
Oh. Qemu running on qemu. (The local power plant shifts one gear up.)

12 minutes elapsed and xorriso has started. Sloowly adding files:

registering 302 items
GNU xorriso 1.5.0 : RockRidge filesystem manipulator, libburnia project.
...
45981 files added in 94 seconds
...
xorriso : UPDATE : Thank you for being patient. Working since 265 seconds.
ISO image produced: 500069 sectors
Written to medium : 500069 sectors at LBA 0
Writing to 'stdio:/xchg/guixsd.iso' completed successfully.

So far the xorriso run looks ok.
...
/gnu/store/a8wwjfihb161maww0c8x4r797prdn8rr-image.iso

So this is where the ISO ended up.

# ls -l /gnu/store/a8wwjfihb161maww0c8x4r797prdn8rr-image.iso
-r--r--r-- 2 root root 1024141312 Jan 1 1970 /gnu/store/a8wwjfihb161maww0c8x4r797prdn8rr-image.iso

# expr 1024141312 / 2048
500069

# xorriso -indev /gnu/store/a8wwjfihb161maww0c8x4r797prdn8rr-image.iso
... no complaints about size mismatch ...
Media summary: 1 session, 500069 data blocks, 977m data, 3052m free

Well, then with
guix pull
and then again
guix system disk-image ...
lasts 30 minutes,

# time guix system disk-image --file-system-type=iso9660 \
-s i686-linux \
~/.config/guix/current/share/guile/site/2.2/gnu/system/install.scm
...
GUILEC gnu/packages/emacs.go
GC Warning: Failed to expand heap by 8388608 bytes
...
GC Warning: Out of Memory! Heap size: 943 MiB. Returning NULL!
...
guix system: error: build failed: build of `/gnu/store/vr5mhnh430qabrrc1a82pv954b89axws-guix-0.16.0-4.60b0402.drv' failed
real 21m55.875s
user 0m5.816s
sys 0m1.384s
#

Looks like my VM needs more memory for that stunt.
So again with 2 GiB.

... it seems that "guix pull" brought back the addiction to world building.
I abort after 50 minutes while it is doing some qemu tests.

------------------------------------------------------------------------

Have a nice day :)

Thomas
Ludovic Courtès wrote on 19 Dec 2018 15:05
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87tvj9wr1v.fsf@gnu.org
Hello,

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

>> Oops, that was an omission that I’ve just fixed.
>
> Sometimes you need a clueless test user to clean the pipes.
>
> I now succeeded in running the ISO production command, but the truncation
> problem is not reproducible here.

It’s not reproducible because I “fixed” it:
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=178be030c0e4fdeac5e1c968b5c99d84bb4691db

You should be able to reproduce it by running Guix from the parent
commit:

guix pull --commit=676c3adc14f63df0f7a549e518ac87481c0f3e37

‘guix pull’ populates ~/.config/guix/current/bin/guix so you’ll have to
make sure this is the one you’re running when you try to reproduce the
issue.

Thanks for your help and perseverance!

Ludo’.
Thomas Schmitt wrote on 19 Dec 2018 15:51
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
17182683634737195681@scdbackup.webframe.org
Hi,

> It’s not reproducible because I “fixed” it:
> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=178be030c0e4fdeac5e1c968b5c99d84bb4691db
(This adds "-padding 10m" to the run of xorriso.)

No. The padding only moves the missing end piece into a region of the
image file where it does not matter for the filesystem payload files.
The ISO filesystem's meta data and the partition tables would still claim
the missing bytes of the image file, if the problem occurred.

E.g. xorriso notices the mismatch in the ISO to which you pointed
me for download and which was most probably produced with -padding 10m:

$ xorriso -indev guixsd-install-0.16.0.i686-linux.iso
...
libisoburn: WARNING : ISO image size 481129s larger than readable size 479184s
...
libburn : SORRY : Read start address 481128s larger than number of readable blocks 479184
...

The GPT in the ISO says that its backup header is at 512-byte block
1,924,515 = block 481,128.75 in units of 2048 bytes.

Highest file block is 475879 + 87 = 475966
File data lba: 0 , 475879 , 88 , 180224 , '/var/guix/db/db.sqlite'
which is a bit more than 10 MiB before the expected image file end.
Given the lack of 1945 blocks at the image file end, the payload file end
is still more than 6 MB away from the escarpment.
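
The block arithmetic above can be re-done as a small Python sketch. All numbers are copied from this message; the variable names are made up for illustration only.

```python
BLOCK = 2048  # bytes per ISO 9660 block

# Values quoted from the xorriso messages and the "File data lba" line above.
expected_blocks = 481129      # size the ISO metadata claims
readable_blocks = 479184      # size of the image file as actually readable
gpt_backup_lba_512 = 1924515  # GPT backup header, in 512-byte blocks
file_start = 475879           # first block of /var/guix/db/db.sqlite
file_blocks = 88              # number of blocks of that file

# Blocks missing at the end of the image file.
missing_blocks = expected_blocks - readable_blocks

# Last block occupied by the highest payload file.
last_file_block = file_start + file_blocks - 1

# Distance from that block to the expected and to the actual image end.
gap_to_expected_end = (expected_blocks - last_file_block) * BLOCK
gap_to_actual_end = (readable_blocks - last_file_block) * BLOCK

print(missing_blocks)          # 1945
print(last_file_block)         # 475966
print(gpt_backup_lba_512 / 4)  # 481128.75
print(gap_to_expected_end)     # 10573824 bytes, a bit more than 10 MiB
print(gap_to_actual_end)       # 6590464 bytes, more than 6 MB
```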

-----------------------------------------------------------------------

But the ISO which i produced myself is healthy in that aspect.
The used software version is obviously before the 10 MiB padding.

The ISO contains as many bytes

-r--r--r-- 2 root root 1024141312 Jan 1 1970 before_guix_pull.iso

as the ISO filesystem believes to cover, including the padding:

Media summary: 1 session, 500069 data blocks, 977m data, 2187m free

Highest data file block is 499788 + 87 = 499875 :

File data lba: 0 , 499788 , 88 , 180224 , '/var/guix/db/db.sqlite'

which means that at most 194 blocks are expected to follow the end of
this file, not 10 MiB.
The GPT in the image says that its backup header block is at 512-byte
address 2,000,275 which is 500,068.75 in blocks of 2048 bytes.

So the inner size counters and image file size do match exactly.

This was done with guix from
guix-binary-0.16.0.i686-linux.tar.xz
and with authorized ci.guix.info.pub.


> guix pull --commit=676c3adc14f63df0f7a549e518ac87481c0f3e37

After "guix pull" the ISO production command indulges in building and
testing endlessly.
You will have to give me instructions how to get back to the ~ 12 minutes
of ISO production time which i had before trying "guix pull".


Have a nice day :)

Thomas
Thomas Schmitt wrote on 20 Dec 2018 14:38
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
30813683585630400731@scdbackup.webframe.org
Hi,

aside from my problems with the building and testing after "guix pull"
i also stand puzzled in front of the 8 files named "/gnu/.../build/vm.scm"
which all start grub-mkrescue.

If i'd succeed in reproducing the ISO image file truncation:
Which vm.scm file would i have to modify in order to report the size of
the freshly emerged ISO image in the filesystem of the upper VM ?
(I would suspect that this size is still untruncated and that the file
in the underlying VM's filesystem is then truncated.)

And how to say "ls -l $target" in Guile ?


Have a nice day :)

Thomas
Ludovic Courtès wrote on 21 Dec 2018 21:44
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87pntumwy8.fsf@gnu.org
Hi,

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

> aside from my problems with the building and testing after "guix pull"
> i also stand puzzled in front of the 8 files named "/gnu/.../build/vm.scm"
> which all start grub-mkrescue.
>
> If i'd succeed in reproducing the ISO image file truncation:
> Which vm.scm file would i have to modify in order to report the size of
> the freshly emerged ISO image in the filesystem of the upper VM ?

None of those under /gnu/store. /gnu/store is explicitly read-only.
The actual source code you’d edit is a checkout of Guix. See

> And how to say "ls -l $target" in Guile ?

In Scheme? You could use ‘scandir’:

scheme@(guile-user)> ,use (ice-9 ftw)
scheme@(guile-user)> (scandir "/")
$2 = ("." ".." "bin" "boot" "data" "dev" "etc" "gnu" "home" "lost+found" "mnt" "proc" "root" "run" "sys" "tmp" "var")

and also ‘lstat’, etc., but that’s not quite a “shell”.

HTH,
Ludo’.
Thomas Schmitt wrote on 21 Dec 2018 22:42
(address . bug-xorriso@gnu.org)(address . 33639@debbugs.gnu.org)
25824683177226565276@scdbackup.webframe.org
Hi,

> ‘lstat’

Probably this.


> but that’s not quite a “shell”.

If i could reproduce the problem then i would want a long time visible
message about how large the ISO image file is after grub-mkrescue has
ended successfully.
This would give an opportunity to compare the size as produced in the VM
with the size later perceived on the host machine (which is a VM, too,
in my case).
If the sizes differ, then the VM contraption is to blame.
If the size is too small already in the VM that ran grub-mkrescue, then
xorriso or the VM operating system are to blame.
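
The decision rule described above could be sketched like this. It is only an illustration in Python, not code from Guix or xorriso; the function name `blame` and its return strings are made up for this sketch.

```python
SECTOR_SIZE = 2048  # bytes per ISO 9660 block


def blame(xorriso_sectors, size_in_vm, size_on_host):
    """Guess where a truncation happened from three size observations.

    xorriso_sectors: the "ISO image produced: N sectors" value
    size_in_vm:      file size in bytes as seen inside the VM after the run
    size_on_host:    file size in bytes as later seen on the host
    """
    expected = xorriso_sectors * SECTOR_SIZE
    if size_in_vm < expected:
        # Already too small where grub-mkrescue ran.
        return "xorriso or the VM operating system"
    if size_on_host != size_in_vm:
        # Fine in the VM, different afterwards on the host.
        return "the VM contraption"
    return "no truncation"


# The healthy ISO reported earlier in this thread: 500069 sectors, 1024141312 bytes.
print(blame(500069, 1024141312, 1024141312))
```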

Since i am not yet able to reproduce the problem, i propose that you add
the necessary code to the end of make-iso9660-image in gnu/build/vm.scm.
Such a report message cannot harm, given the existing verbosity of the
ISO build command.

Next time you make an ISO, retrieve the last size messages of xorriso:
ISO image produced: 500069 sectors
Written to medium : 500069 sectors at LBA 0
the new message about the ISO image file size in bytes, and the size of
the ISO image file size when it is finally ready for exposure in the web.

(I have to stress that the problem is not fixed but only got a band aid
of which it is not known whether its size will always be large enough.)


Have a nice day :)

Thomas
pelzflorian (Florian Pelz) wrote on 7 Apr 2019 22:18
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
20190407201849.74qtwvazknbsaklg@pelzflorian.localdomain
I have what may be the same problem on my x86_64 machine building for
x86_64 when creating an ISO install image by running

guix system disk-image --file-system-type=iso9660 gnu/system/install.scm

Since commit 45c0d1d790f01ebc020fc4b2787a6abcdaa3f383 increased the
RAM for the VM that builds the iso image from 256 to 512 MiB, iso files
consistently were corrupt, until I added an lstat call, see below. On
a second and third attempt to build with lstat I got a corrupt image
again. Guix install iso files I tested from before that commit were
fine.


florian@florianmacbook ~$ fdisk /gnu/store/4nrwajlpab4s8pdph4d77ww7716sa3ir-image.iso
[…]
GPT PMBR size mismatch (3231107 != 3200391) will be corrected by write.

xorriso is sorry exactly like in Ludo’s message from December 06. The
numbers reported and file sizes are not consistent between corrupt
rebuilds.



On Fri, Dec 21, 2018 at 10:42:14PM +0100, Thomas Schmitt wrote:
> […]
> Next time you make an ISO, retrieve the last size messages of xorriso:
> ISO image produced: 500069 sectors
> Written to medium : 500069 sectors at LBA 0

For the corrupt iso with lstat call:

ISO image produced: 807777 sectors
Written to medium : 807777 sectors at LBA 0



> the new message about the ISO image file size in bytes,

Within the VM lstat consistently reports 1654327296 for non-corrupt
and corrupt images alike.



> and the size of
> the ISO image file size when it is finally ready for exposure in the web.
>

ls -l on the result reports 1638600704.

On the non-corrupt image after adding the lstat call, both lstat
within the VM and ls -l outside the VM print the same size: 1654327296
in this case, i.e. the same as lstat reported on the corrupt images
within the VM.


(To be precise, for lstat I added the following local git commit to my
copy of the Guix repo at the end of the G-expression executed by the
VM:

diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index db9b1707d7..18ccb8970e 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -309,7 +309,8 @@ INPUTS is a list of inputs (as for packages)."
#:closures graphs
#:volume-id #$file-system-label
#:volume-uuid #$(and=> file-system-uuid
- uuid-bytevector))))))
+ uuid-bytevector))
+ (error (lstat "/xchg/guixsd.iso"))))))
#:system system
;; Keep a local file system for /tmp so that we can populate it directly as



and then reconfigured the system after customizing the guix package to
use said commit and disabling tests on the guix package. This
reported an lstat Scheme object as an error. Note that the error
procedure does not cause a failed build.)

Regards,
Florian
Thomas Schmitt wrote on 7 Apr 2019 23:35
(address . bug-xorriso@gnu.org)
2660367208964033194@scdbackup.webframe.org
Hi,

Florian Pelz wrote:
> fdisk /gnu/store/4nrwajlpab4s8pdph4d77ww7716sa3ir-image.iso
> [...]
> GPT PMBR size mismatch (3231107 != 3200391) will be corrected by write.

The GPT Protective MBR counts with block size 512 up to the GPT backup
header block, not counting itself at block 0. So in blocks of 2048, the
expected size is
3231108 / 4 = 807777 ISO 9660 blocks
But the perceived size is
3200392 / 4 = 800098 ISO 9660 blocks
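
As a sketch, the same conversion in Python (the function name `pmbr_to_iso_blocks` is made up for this illustration):

```python
def pmbr_to_iso_blocks(pmbr_size_512):
    """Convert a GPT Protective MBR size value to a count of 2048-byte blocks.

    The PMBR value counts 512-byte blocks without block 0 itself, so one
    block is added before dividing by 4 (2048 / 512).
    """
    total_512 = pmbr_size_512 + 1
    blocks, remainder = divmod(total_512, 4)
    if remainder:
        raise ValueError("not a whole number of 2048-byte blocks")
    return blocks


# The two numbers from the fdisk complaint.
expected = pmbr_to_iso_blocks(3231107)
perceived = pmbr_to_iso_blocks(3200391)
print(expected, perceived, expected - perceived)
```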

I wrote:
> > retrieve the last size messages of xorriso:

> For the corrupt iso with lstat call:
> ISO image produced: 807777 sectors
> Written to medium : 807777 sectors at LBA 0
> Within the VM lstat consistently reports 1654327296 for non-corrupt
> and corrupt images alike.

1654327296 / 2048 = 807777
So from the view of the VM the ISO is as large as xorriso believes to have
written and as the GPT announces as position of the backup header block.


> > and the size of
> > the ISO image file size when it is finally ready for exposure in the web.

> ls -l on the result reports 1638600704.

1638600704 / 2048 = 800098
This matches the perceived size from the fdisk complaint.


> On the non-corrupt image after adding the lstat call, both lstat
> within the VM and ls -l outside the VM print the same size: 1654327296

The fact that the VM always sees the expected size but the host sees varying
sizes supports the suspicion that at the end of the VM its i/o buffers or
virtual disk are not always properly flushed to the i/o system of the host.
The varying success smells like a race condition.


Have a nice day :)

Thomas
Ludovic Courtès wrote on 8 Apr 2019 10:50
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87h8b8284q.fsf@gnu.org
Hello,

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

> The fact that the VM always sees the expected size but the host sees varying
> sizes supports the suspicion that at the end of the VM its i/o buffers or
> virtual disk are not always properly flushed to the i/o system of the host.
> The varying success smells like a race condition.

Indeed, that rings a bell: I fixed a similar issue in commit
0dc7d298a33f83d5f02a962b5f1bd24ee0e8ef07.

Florian: could you check whether the patch below solves the problem for
you?

Thanks,
Ludo’.
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index db9b1707d7..3ee03c84a0 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -240,7 +240,11 @@ made available under the /xchg CIFS share."
#:target-arm32? #$(target-arm32?)
#:disk-image-format #$disk-image-format
#:disk-image-size size
- #:references-graphs graphs))))))
+ #:references-graphs graphs)
+
+ ;; Make sure I/O buffers get flushed. This is particularly
+ ;; important when MAKE-DISK-IMAGE? is true.
+ (sync))))))
(gexp->derivation name builder
;; TODO: Require the "kvm" feature.
@@ -530,10 +534,7 @@ should set REGISTER-CLOSURES? to #f."
#$os
#:compressor '(#+(file-append gzip "/bin/gzip") "-9n")
#:creation-time (make-time time-utc 0 1)
- #:transformations `((,root-directory -> "")))
-
- ;; Make sure the tarball is fully written before rebooting.
- (sync))))))
+ #:transformations `((,root-directory -> ""))))))))
(expression->derivation-in-linux-vm
name build
#:make-disk-image? #f
pelzflorian (Florian Pelz) wrote on 10 Apr 2019 00:13
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190409221313.b3uzvcj5bluoygp5@pelzflorian.localdomain
On Mon, Apr 08, 2019 at 10:50:29AM +0200, Ludovic Courtès wrote:
> Hello,
>
> "Thomas Schmitt" <scdbackup@gmx.net> skribis:
>
> > The fact that the VM always sees the expected size but the host sees varying
> > sizes supports the suspicion that at the end of the VM its i/o buffers or
> > virtual disk are not always properly flushed to the i/o system of the host.
> > The varying success smells like a race condition.
>
> Indeed, that rings a bell: I fixed a similar issue in commit
> 0dc7d298a33f83d5f02a962b5f1bd24ee0e8ef07.
>
> Florian: could you check whether the patch below solves the problem for
> you?
>
> Thanks,
> Ludo’.
>

No, sadly not. I reconfigured to a commit with the Guix package
changed to use your patch and I again got this:

GPT PMBR size mismatch (3231103 != 3187775) will be corrected by write.
libburn : SORRY : Read start address 807775s larger than number of readable blocks 796944
Thomas Schmitt wrote on 10 Apr 2019 13:17
(address . bug-xorriso@gnu.org)
16217671677318139528@scdbackup.webframe.org
Hi,

Ludovic Courtès wrote:
> > Florian: could you check whether the patch below solves the problem for
> > you?

Florian Pelz wrote:
> No, sadly not.

Given the smell of a race condition, i would next try to let the VM
wait 10 or 15 seconds after xorriso is finished and before it shuts down.

Not as a final remedy but just as proof that the VM end is really the
culprit. (It could also be an i/o problem between VM and host which
is unrelated to the VM end.)


Have a nice day :)

Thomas
pelzflorian (Florian Pelz) wrote on 10 Apr 2019 23:23
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
20190410212310.iv2t72rblhupcmkt@pelzflorian.localdomain
On Wed, Apr 10, 2019 at 01:17:14PM +0200, Thomas Schmitt wrote:
> Given the smell of a race condition, i would next try to let the VM
> wait 10 or 15 seconds after xorriso is finished and before it shuts down.
>

I added a (sleep 15) after ludo’s (sync). The first image worked but
now I got

libburn : SORRY : Read start address 807777s larger than number of readable blocks 798640

again.
Ludovic Courtès wrote on 12 Apr 2019 23:26
87o95alxtn.fsf@gnu.org
Hello Florian & Thomas,

I was able to reproduce the issue: ‘guix system disk-image
--file-system-type=iso9660’ would create partly unreadable images.

Since this was pretty much like the issue I had encountered with ‘guix
system docker-image’, which would produce truncated tarballs, and since
calling ‘sync’ wasn’t enough, I looked at our file system mount options…

The attached patch fixes the problem for me. In hindsight, it’s not
surprising that “cache=loose” on the /xchg mount point (used to exchange
data between the host and the guest) would have this effect.

Florian, it would be great if you could confirm. Just apply it on
‘master’, and then run:

./pre-inst-env guix system disk-image --file-system-type=iso9660 \
gnu/system/install.scm

Thanks, and apologies for blaming Xorriso, which presumably never had
anything to do with it!

Ludo’.
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index db9b1707d7..22e3fcc522 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -94,6 +94,12 @@
(define %linux-vm-file-systems
;; File systems mounted for 'derivation-in-linux-vm'. These are shared with
;; the host over 9p.
+ ;;
+ ;; The 9p documentation says that cache=loose is "intended for exclusive,
+ ;; read-only mounts", without additional details. It's much faster than the
+ ;; default cache=none, especially when copying and registering store items.
+ ;; Thus, use cache=loose, except for /xchg where we want to ensure
+ ;; consistency.
(list (file-system
(mount-point (%store-prefix))
(device "store")
@@ -102,18 +108,12 @@
(flags '(read-only))
(options "trans=virtio,cache=loose")
(check? #f))
-
- ;; The 9p documentation says that cache=loose is "intended for
- ;; exclusive, read-only mounts", without additional details. In
- ;; practice it seems to work well for these, and it's much faster than
- ;; the default cache=none, especially when copying and registering
- ;; store items.
(file-system
(mount-point "/xchg")
(device "xchg")
(type "9p")
(needed-for-boot? #t)
- (options "trans=virtio,cache=loose")
+ (options "trans=virtio")
(check? #f))
(file-system
(mount-point "/tmp")
@@ -530,10 +530,7 @@ should set REGISTER-CLOSURES? to #f."
#$os
#:compressor '(#+(file-append gzip "/bin/gzip") "-9n")
#:creation-time (make-time time-utc 0 1)
- #:transformations `((,root-directory -> "")))
-
- ;; Make sure the tarball is fully written before rebooting.
- (sync))))))
+ #:transformations `((,root-directory -> ""))))))))
(expression->derivation-in-linux-vm
name build
#:make-disk-image? #f
Thomas Schmitt wrote on 13 Apr 2019 08:37
(address . bug-xorriso@gnu.org)
1173672442521511321@scdbackup.webframe.org
Hi,

> apologies for blaming Xorriso, which presumably never had
> anything to do with it!

I will not complain that this time it was not my fault.


Have a nice day :)

Thomas
pelzflorian (Florian Pelz) wrote on 13 Apr 2019 15:46
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190413134609.kwmx53hyawgtaaza@pelzflorian.localdomain
On Fri, Apr 12, 2019 at 11:26:28PM +0200, Ludovic Courtès wrote:
> Florian, it would be great if you could confirm. Just apply it on
> ‘master’, and then run:
>
> ./pre-inst-env guix system disk-image --file-system-type=iso9660 \
> gnu/system/install.scm
>

Yes, it seems fixed, I can confirm. Four rebuilds seem fine and are
bootable in QEMU. They have the same size and `xorriso -indev` is
happy. The content is different at the beginning of the ISO image
(maybe padding or timestamps in the file system) and in the EFI
partition at the very end of the ISO, but this seems insignificant.

Regards,
Florian
Thomas Schmitt wrote on 13 Apr 2019 18:20
(address . bug-xorriso@gnu.org)
3867672606037906126@scdbackup.webframe.org
Hi,

Florian Pelz wrote:
> Yes, it seems fixed, I can confirm.

Way back in december, Ludovic Courtès wrote:
>...> Based on this and on a suggestion Ricardo made on IRC, I passed
>...> -padding 10m and that solved the problem. \o/

Please do not forget to remove this -padding command.


Florian Pelz wrote:
> The content is different at the beginning of the ISO image
> (maybe padding or timestamps in the file system)

That's to be expected if the environment variable SOURCE_DATE_EPOCH is not
set and exported.

SOURCE_DATE_EPOCH belongs to the specs of reproducible-builds.org. It
is supposed to be either undefined or to contain a decimal number which
tells the seconds since january 1st 1970. If it contains a number, then
it is used for all timestamps and as seed of pseudo-random numbers like
MBR id or GPT UUIDs.

If all files and directories have the same names and the same content,
then xorriso runs with the same arguments and the same SOURCE_DATE_EPOCH
value are supposed to create byte-identical result ISOs.
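
A minimal sketch of how a tool might honor the variable, assuming only what is said above (undefined, or a decimal number of seconds since 1970). This is an illustration in Python and not xorriso's actual code; the function name `build_timestamp` is made up.

```python
import os
import time


def build_timestamp(environ=None):
    """Return the timestamp to use for all time fields of a build.

    If SOURCE_DATE_EPOCH is defined, it must be a plain decimal number and
    is used as-is, which makes repeated runs agree. Otherwise the current
    time is used, so two runs will normally differ.
    """
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())
    if not (value.isascii() and value.isdigit()):
        raise ValueError("SOURCE_DATE_EPOCH is not a decimal number: %r" % value)
    return int(value)


# Two "runs" with the same value agree on the timestamp.
fixed = {"SOURCE_DATE_EPOCH": "1555311212"}
print(build_timestamp(fixed) == build_timestamp(fixed))
```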

In december, i wrote:
>...> > Creation Time: 1970010119010649
Ludovic Courtès wrote:
>...> For reproducibility purposes we set timestamps and related things
>...> to the Epoch.

Is this independent of SOURCE_DATE_EPOCH ?


Have a nice day :)

Thomas
Ludovic Courtès wrote on 14 Apr 2019 17:03
control message for bug #33639
(address . control@debbugs.gnu.org)
874l70ljdn.fsf@gnu.org
merge 33639 35136
Ludovic Courtès wrote on 14 Apr 2019 17:47
Re: bug#33639: ISO installer image is broken on i686
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87tvf0io7a.fsf@gnu.org
Hello,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Fri, Apr 12, 2019 at 11:26:28PM +0200, Ludovic Courtès wrote:
>> Florian, it would be great if you could confirm. Just apply it on
>> ‘master’, and then run:
>>
>> ./pre-inst-env guix system disk-image --file-system-type=iso9660 \
>> gnu/system/install.scm
>>
>
> Yes, it seems fixed, I can confirm. Four rebuilds seem fine and are
> bootable in QEMU.

This is a happy end. :-)
Committed as 66ec389580d4f1e4b81e1c72afe2749a547a0e7c.

Thank you!

Ludo’.
Closed
Ludovic Courtès wrote on 14 Apr 2019 23:43
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
87h8b0i7ol.fsf@gnu.org
Hi Thomas,

"Thomas Schmitt" <scdbackup@gmx.net> skribis:

> Florian Pelz wrote:
>> Yes, it seems fixed, I can confirm.
>
> Way back in december, Ludovic Courtès wrote:
>>...> Based on this and on a suggestion Ricardo made on IRC, I passed
>>...> -padding 10m and that solved the problem. \o/
>
> Please do not forget to remove this -padding command.

Done in f6e3f0f9b1287eca120517a0161e3d0b1ed6ed44.

> If all files and directories have the same names and the same content,
> then xorriso runs with the same arguments and the same SOURCE_DATE_EPOCH
> value are supposed to create byte-identical result ISOs.

I’ve tried setting it but that doesn’t make any difference.

How did you visualize differences, Florian? Diffoscope fails for me
here (missing tools and scalability issue.)

> In december, i wrote:
>>...> > Creation Time: 1970010119010649
> Ludovic Courtès wrote:
>>...> For reproducibility purposes we set timestamps and related things
>>...> to the Epoch.
>
> Is this independent of SOURCE_DATE_EPOCH ?

Yes.

Thanks,
Ludo’.
pelzflorian (Florian Pelz) wrote on 15 Apr 2019 08:07
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190415060737.aw2msuviarkrd66a@pelzflorian.localdomain
On Sun, Apr 14, 2019 at 11:43:54PM +0200, Ludovic Courtès wrote:
> How did you visualize differences, Florian? Diffoscope fails for me
> here (missing tools and scalability issue.)
>

For me diffoscope failed too. I used cmp as described here:


and then looked at the addresses in ghex. It is not a nice method.
Sorry. It works though.
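
A minimal sketch of the cmp approach, assuming POSIX cmp and awk are available. The function name diff_offsets is hypothetical and not taken from this thread. "cmp -l" lists every differing byte with a 1-based decimal offset and the two byte values in octal; the awk step turns the offset into a 0-based hexadecimal address, which is the form a hex editor such as ghex displays.

```shell
#!/bin/sh
# diff_offsets FILE1 FILE2
# Print one line per differing byte: 0-based hex offset, then the byte
# value (octal) in FILE1 and in FILE2.
diff_offsets() {
    # cmp -l reports 1-based decimal offsets, hence the "- 1".
    cmp -l "$1" "$2" | awk '{ printf "%08x %s %s\n", $1 - 1, $2, $3 }'
}

# Example (file names are placeholders):
#   diff_offsets test.iso test2.iso | head
```

The output can be long for large images, so piping it through head or counting lines with wc -l first is usually more practical than reading it whole.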

Regards,
Florian
Thomas Schmitt wrote on 15 Apr 2019 10:16
(address . bug-xorriso@gnu.org)
3082867220863987596@scdbackup.webframe.org
Hi,

I wrote:
> > If all files and directories have the same names and the same content,
> > then xorriso runs with the same arguments and the same SOURCE_DATE_EPOCH
> > value are supposed to create byte-identical result ISOs.

Ludovic Courtès wrote:
> I’ve tried setting it but that doesn’t make any difference.

We should investigate this ...
... yes, there is some problem. But not always.

Timestamps of the root directory differ after mapping to an address
that is not the ISO root directory (here: /x):

xorriso -outdev test.iso -map x /x
xorriso -outdev test2.iso -map x /x

but not after mapping to the root directory:

xorriso -outdev test.iso -map x /
xorriso -outdev test2.iso -map x /

This would explain why my tests for Debian ISOs do not show this problem.

Do i get it right that gnu/build/vm.scm maps no files to "/" but all to
deeper paths:
"etc=/tmp/root/etc"
"var=/tmp/root/var"
"run=/tmp/root/run"
I am unsure about
"-path-list" "-"


I will now dig into the source to find the reason and maybe a preliminary
remedy.


> How did you visualize differences, Florian?

(I'm aware that i am not Florian.)

I made myself a little program "hxd" for combined hex-cleartext-decimal dump,
positional diff, and (not to be focused too much) CD-Text decoding.

===========================================================================

$ export SOURCE_DATE_EPOCH=$(date +%s)
$ xorriso -outdev test.iso -map x /x
...
xorriso : NOTE : Environment variable SOURCE_DATE_EPOCH encountered with value 1555311212
...
$ xorriso -outdev test2.iso -map x /x
...
xorriso : NOTE : Environment variable SOURCE_DATE_EPOCH encountered with value 1555311212
...
$ hxd -diff test.iso test2.iso

32944 : 15 7 38 43 0 2 0 0 1 0 0 1 1 0 32 32
& +
000080b0 : 0f 07 26 2b 00 02 00 00 01 00 00 01 01 00 20 20
###
000080b0 : 0f 07 26 36 00 02 00 00 01 00 00 01 01 00 20 20
& 6
32944 : 15 7 38 54 0 2 0 0 1 0 0 1 1 0 32 32

... more differences ...

===========================================================================

It looks like the root directory got the current timestamp. The other
differences are with the ".." directory entries of the directories in
the first level under "/".


The source of "hxd" is pure C, no special dependencies, 8141 bytes.
Shall i upload it somewhere ?


Have a nice day :)

Thomas
Thomas Schmitt wrote on 15 Apr 2019 10:35
(address . bug-xorriso@gnu.org)
3171667222963526138@scdbackup.webframe.org
Hi,

it seems to help if you explicitly set the timestamps of the "/" directory:

export SOURCE_DATE_EPOCH=1555311212

xorriso -outdev test.iso -map x /x \
-alter_date b-c 1970010100000000 / -- \
-alter_date c 1970010100000000 / --

ISOs made with these xorriso commands match perfectly.

A bit more elegant than 1970 would be to use the seconds value from
SOURCE_DATE_EPOCH (prefix "=" announces date +%s format):

-alter_date b-c =$SOURCE_DATE_EPOCH / -- \
-alter_date c =$SOURCE_DATE_EPOCH / --

The -alter_date commands should be performed after all -map commands,
just to make sure that the timestamps do not get changed again.
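
One way to check whether the workaround holds is to build twice and compare. The following is only a sketch: check_reproducible and build_iso are hypothetical names, and build_iso uses nothing beyond the xorriso commands quoted above, with "x" being the example input directory.

```shell
#!/bin/sh
# check_reproducible BUILDER [ARGS...]
# Run "BUILDER ARGS... OUTFILE" twice with the same SOURCE_DATE_EPOCH and
# compare the results. Returns 0 if they are byte-identical, 1 if they
# differ, 2 if a build failed.
check_reproducible() {
    tmp=$(mktemp -d) || return 2
    # Fall back to the Epoch when the caller has not set a value.
    SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH:-0}
    export SOURCE_DATE_EPOCH
    if "$@" "$tmp/1.iso" && "$@" "$tmp/2.iso"; then
        if cmp -s "$tmp/1.iso" "$tmp/2.iso"; then status=0; else status=1; fi
    else
        status=2
    fi
    rm -rf "$tmp"
    return "$status"
}

# Hypothetical builder: maps the directory "x" to /x, then pins the
# timestamps of "/" after all -map commands, as advised above.
build_iso() {
    xorriso -outdev "$1" -map x /x \
        -alter_date b-c "=$SOURCE_DATE_EPOCH" / -- \
        -alter_date c "=$SOURCE_DATE_EPOCH" / --
}

# Example: check_reproducible build_iso && echo "byte-identical"
```

Passing the builder as an argument keeps the comparison logic independent of xorriso, so the same check can wrap any other image-producing command.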

I still need to find out where the current time sneaks in.
But this workaround should do no harm once the bug is fixed.


Have a nice day :)

Thomas
pelzflorian (Florian Pelz) wrote on 15 Apr 2019 18:54
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190415165451.dpzngealeisbibc7@pelzflorian.localdomain
On Sat, Apr 13, 2019 at 03:46:09PM +0200, pelzflorian (Florian Pelz) wrote:
> Yes, it seems fixed, I can confirm.

Well this is strange. I got fine ISO images each time (fine with no
complaints from xorriso or fdisk and bootable in QEMU without errors),
but after dd’ing them to different USB flash drives each time I get
kernel output when inserting the flash drive:

[ 10.025223] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 10.026735] GPT:3220583 != 7831551
[ 10.028235] GPT:Alternate GPT header not at the end of the disk.
[ 10.029764] GPT:3220583 != 7831551
[ 10.031290] GPT: Use GNU Parted to correct GPT errors.


Having such a USB flash drive inside my computer makes UEFI get stuck
on some computers but not on others.

Why is this? Are all my USB drives bad? I presume this is a
different bug, or is it?

Regards,
Florian
Thomas Schmitt wrote on 15 Apr 2019 19:55
(address . bug-xorriso@gnu.org)
1582867226375139246@scdbackup.webframe.org
Hi,

Florian Pelz wrote:
> Well this is strange. I got fine ISO images each time (fine with no
> complaints from xorriso or fdisk and bootable in QEMU without errors),
> but after dd’ing them to different USB flash drives each time I get
> kernel output when inserting the flash drive:
> [ 10.025223] GPT:Primary header thinks Alt. header is not at the end of
> the disk.

The alternative/backup header is a property of GPT which makes it
rather unsuitable for disk images. xorriso puts it correctly into the
last 512-byte block of the image. But when copied to a storage device,
it should move up to the last block of the device.
Even worse, the main GPT header at 512-byte LBA 1 needs to learn the
new address.
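
The numbers in the kernel message fit this explanation. As a rough sketch, assuming 512-byte logical sectors, and assuming the two values printed are the backup-header address recorded in the primary header and the last block of the device, the last block number is the size in bytes divided by 512, minus one. The helper names below are hypothetical.

```shell
#!/bin/sh
# last_lba SIZE_IN_BYTES
# Print the number of the last 512-byte block of an image or device.
last_lba() {
    echo $(( $1 / 512 - 1 ))
}

# backup_header_at_end IMAGE_BYTES DEVICE_BYTES
# Succeed only if a backup GPT header placed in the last block of the
# image would also sit in the last block of the device.
backup_header_at_end() {
    [ "$(last_lba "$1")" -eq "$(last_lba "$2")" ]
}

# Example with sizes read from real files (paths are placeholders,
# "stat -c" is the GNU coreutils form):
#   backup_header_at_end "$(stat -c %s image.iso)" \
#                        "$(blockdev --getsize64 /dev/sdX)"
```

Sizes of 1648939008 and 4009754624 bytes, back-computed from the log rather than stated in the thread, give 3220583 and 7831551, the pair reported by the kernel. The check can only succeed when the image is exactly as large as the target device, which is why a GPT image copied to a bigger drive triggers the warning.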

So i would rather advise to use a MBR partition table. Wonderfully dumb
and open ended.

I see from
that program grub-mkrescue is in control of xorrisofs boot options.
Vladimir Serbinenko decided for GPT with no mountable ISO partition.

The libisoburn repo and tarball have a wrapper script by which other
boot layouts can be derived from the options which grub-mkrescue hands
over to xorrisofs:


To get MBR instead of GPT do:

export MKRESCUE_SED_MODE=mbr_only
export MKRESCUE_SED_PROTECTIVE=""

and maybe

export MKRESCUE_SED_XORRISO=/...path/to/the/xorriso/binary/if/exotic...

Then start grub-mkrescue with the wrapper in the role of "xorriso":

grub-mkrescue --xorriso=...path/to/grub-mkrescue-sed.sh \
-partition_offset 16 \
-iso_mbr_part_type 0x83 \
\
...all.other.usual.arguments...

The mode "mbr_only" will move the EFI partition image out of the ISO
filesystem and rather append it after the ISO's end.

The option
-partition_offset 16
costs the space of a second superblock and directory tree. But it brings
as benefits:
- More normal partition layout with partition 1 starting at block 64
rather than at block 0.
- Nevertheless the partition 1 is mountable and shows the ISO content.
- The base device is mountable as the same ISO too.
(The ISO superblock of the base device also serves on CD or DVD.)
- The base device superblock claims not only the ISO in partition 1 but
also the EFI partition 2. So "/sbin/isosize" will tell the size of the
image file, not only of the ISO filesystem.

Option
-iso_mbr_part_type 0x83
chooses for partition 1 the MBR partitions type "Linux". (This is
purely ornamental. Nobody cares. But it looks good in partition editors.)

The partition layout of the ISO produced by the wrapper run above will look like:

$ /sbin/fdisk -l output.iso
Disk output.iso: 16.5 MiB, 17338368 bytes, 33864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device       Boot Start   End Sectors  Size Id Type
output.iso1  *       64 28103   28040 13.7M 83 Linux
output.iso2       28104 33863    5760  2.8M ef EFI (FAT-12/16/32)

$ expr $(/sbin/isosize output.iso) / 512
33864


Have a nice day :)

Thomas
Gábor Boskovits wrote on 16 Apr 2019 11:57
(name . Thomas Schmitt)(address . scdbackup@gmx.net)
CAE4v=phJmiS77k_YZ25ObxQ14J3f1y+H65+AjJ9om42OCUs=5g@mail.gmail.com
Hello people,

Thomas Schmitt <scdbackup@gmx.net> wrote (on Mon, 15 Apr 2019 at 19:54):
>
> Hi,
>
> Florian Pelz wrote:
> > Well this is strange. I got fine ISO images each time (fine with no
> > complaints from xorriso or fdisk and bootable in QEMU without errors),
> > but after dd’ing them to different USB flash drives each time I get
> > kernel output when inserting the flash drive:
> > [ 10.025223] GPT:Primary header thinks Alt. header is not at the end of
> > the disk.
>
> The alternative/backup header is a property of GPT which makes it
> rather unsuitable for disk images. xorriso puts it correctly into the
> last 512-byte block of the image. But when copied to a storage device,
> it should move up to the last block of the device.
> Even worse, the main GPT header at 512-byte LBA 1 needs to learn the
> new address.
>

Yes, this is a really painful point.

Could we create a simple tool to write the disk images to a disk
correcting this problem?

That does not look too hard, does it?

I am also forwarding this to guix devel. I removed the xorriso bug
list, as I feel this does not belong there.

Best regards,
g_bor
Ludovic Courtès wrote on 16 Apr 2019 23:01
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87zhopbr5y.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Sat, Apr 13, 2019 at 03:46:09PM +0200, pelzflorian (Florian Pelz) wrote:
>> Yes, it seems fixed, I can confirm.
>
> Well this is strange. I got fine ISO images each time (fine with no
> complaints from xorriso or fdisk and bootable in QEMU without errors),
> but after dd’ing them to different USB flash drives each time I get
> kernel output when inserting the flash drive:
>
> [ 10.025223] GPT:Primary header thinks Alt. header is not at the end of the disk.
> [ 10.026735] GPT:3220583 != 7831551
> [ 10.028235] GPT:Alternate GPT header not at the end of the disk.
> [ 10.029764] GPT:3220583 != 7831551
> [ 10.031290] GPT: Use GNU Parted to correct GPT errors.

Could it be simply due to the incorrect location of the GPT backup as
Thomas explained?

> Having such a USB flash drive inside my computer makes UEFI get stuck
> on some computers but not on others.

So you cannot boot from these USB drives at all?

Thanks,
Ludo’.
pelzflorian (Florian Pelz) wrote on 17 Apr 2019 11:03
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190417090358.6l6g5xuzpyjs5q7v@pelzflorian.localdomain
On Tue, Apr 16, 2019 at 11:01:45PM +0200, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > Having such a USB flash drive inside my computer makes UEFI get stuck
> > on some computers but not on others.
>
> So you cannot boot from these USB drives at all?
>

No, I cannot boot from them on this Macbook. I wonder how I installed
Guix System here; it may have been on a Debian ISO.

Regards,
Florian
Brice Waegeneire wrote on 11 Dec 2019 18:19
(no subject)
(address . control@debbugs.gnu.org)
f62a2457e39b00adc81ae457baa0a950@waegenei.re
unarchive 33639
Brice Waegeneire wrote on 11 Dec 2019 18:21
Fixing the GPT errors from an installer on a USB stick
ed8be43c383b4c8291e1b456ff47ee1e@waegenei.re
I have the same issue as pelzflorian about the GPT errors.

To fix the GPT mismatch you just need to execute the following command,
where device is something like "/dev/sdc":
sudo fdisk "$device" <<EOF
w
EOF