Segfault in libssh during ‘guix copy’

  • Done
  • quality assurance status badge
Details
3 participants
  • Ludovic Courtès
  • maxim.cournoyer
  • Artyom V. Poptsov
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
serious
Merged with
L
L
Ludovic Courtès wrote on 7 Aug 2020 11:24
Segfault in libssh during ‘guix copy’
(address . bug-guix@gnu.org)
871rkin6zi.fsf@inria.fr
Hi,

I observe the following segfault:

Toggle snippet (214 lines)
$ guix copy --to=olimex /gnu/store/… -v3 --debug=3
sending 66 store items (166 MiB) to 'A20-OLinuXino.local'...
exporting path `/gnu/store/als3v92k7l6ny44sci1x0p9x6d7z0ivp-mesa-20.0.8'
Adres-eraro(nekropsio el?utita)
$ guix describe
Generacio 152 Aug 04 2020 17:34:23 (nuna)
guix abe3c5e
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: abe3c5ed7d04985c987e6c81aeb1284354ea0c77
$ gdb $(type -P guile) core

[...]

[Current thread is 1 (Thread 0x7f947b084b80 (LWP 12777))]
(gdb) bt
#0 0x00007f94751c0185 in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#1 0x00007f94751c253d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#2 0x00007f9474a8db4a in gzip_compress (session=session@entry=0x1ff0b10, source=source@entry=0x1ff07e0,
level=<optimized out>) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
#3 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10, buf=0x1ff07e0)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
#4 0x00007f9474a6ab5f in packet_send2 (session=session@entry=0x1ff0b10)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
#5 0x00007f9474a6ac32 in ssh_packet_send (session=session@entry=0x1ff0b10)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
#6 0x00007f9474a54639 in channel_write_common (channel=0x1ff43a0, data=0x7f9477995020, len=65536, is_stderr=0)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1488
#7 0x00007f9474ad8a9e in write_to_channel_port ()
from /gnu/store/vj92bd6lcknylwka9v4n4h0i360n6vn4-guile-ssh-0.13.0/lib/libguile-ssh.so.13
#8 0x00007f947b749edc in scm_i_write_bytes (port=#<port #<port-type channel 7f9476af56c0> 7f9477c23e00>,
src="#<vu8vector>" = {...}, start=0, count=65536) at ports.c:2865
#9 0x00007f947b75186f in scm_put_bytevector (port=#<port #<port-type channel 7f9476af56c0> 7f9477c23e00>,
bv="#<vu8vector>" = {...}, start=<optimized out>, count=<optimized out>) at r6rs-ports.c:676
#10 0x00007f94750be427 in ?? ()
#11 0x00007f947ad60d80 in ?? ()
#12 0x00007f947b7ee620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/lib/libguile-3.0.so.1
#13 0x00007f947ad60d80 in ?? ()
#14 0x00007f947b72743b in scm_jit_enter_mcode (thread=0x7f947ad60d80, thread@entry=0x28,
mcode=0x7f94750c63a0 "I\211\314I)\304I\203\374\020\017\214k ") at jit.c:5852
#15 0x00007f947b7828e9 in vm_regular_engine (thread=0x28) at vm-engine.c:360
#16 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff14699b38, nargs=nargs@entry=1)
at vm.c:1608
#17 0x00007f947b700c97 in scm_primitive_eval (exp=<optimized out>) at eval.c:671
#18 0x00007f947b7290fb in scm_primitive_load (filename=<optimized out>) at load.c:131
#19 0x00007f947b782151 in vm_regular_engine (thread=0x7f947ad60d80) at vm-engine.c:972
#20 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff14699d08, nargs=nargs@entry=1)
at vm.c:1608
#21 0x00007f947b700c97 in scm_primitive_eval (exp=<optimized out>,
exp@entry=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/home/ludo/.config/guix/current/bin/guix") (quit)))) at eval.c:671
#22 0x00007f947b700cf3 in scm_eval (
exp=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/home/ludo/.config/guix/current/bin/guix") (quit))), module_or_state=module_or_state@entry="#<struct module>" = {...}) at eval.c:705
#23 0x00007f947b7595a0 in scm_shell (argc=8, argv=0x7fff1469a378) at script.c:357
#24 0x00007f947b7186ad in invoke_main_func (body_data=0x7fff1469a210) at init.c:308
#25 0x00007f947b6fab2a in c_body (d=0x7fff1469a150) at continuations.c:430
#26 0x00007f947b782151 in vm_regular_engine (thread=0x7f947ad60d80) at vm-engine.c:972
#27 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff14699f10, nargs=nargs@entry=2)
at vm.c:1608
#28 0x00007f947b6ffb2a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
at eval.c:503
#29 0x00007f947b70132a in scm_c_with_exception_handler (type=type@entry=#t,
handler=handler@entry=0x7f947b7787e0 <catch_post_unwind_handler>,
handler_data=handler_data@entry=0x7fff1469a080, thunk=thunk@entry=0x7f947b778920 <catch_body>,
thunk_data=thunk_data@entry=0x7fff1469a080) at exceptions.c:170
#30 0x00007f947b778b1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f947b6fab20 <c_body>,
body_data=body_data@entry=0x7fff1469a150, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
handler_data=handler_data@entry=0x7fff1469a150,
pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
pre_unwind_handler_data=0x7f9479037240) at throw.c:168
#31 0x00007f947b6fb0d3 in scm_i_with_continuation_barrier (body=body@entry=0x7f947b6fab20 <c_body>,
body_data=body_data@entry=0x7fff1469a150, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
handler_data=handler_data@entry=0x7fff1469a150,
pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
pre_unwind_handler_data=0x7f9479037240) at continuations.c:368
#32 0x00007f947b6fb165 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
at continuations.c:464
#33 0x00007f947b7775bf in with_guile (base=0x7fff1469a1b8, data=0x7fff1469a1e0) at threads.c:645
#34 0x00007f947b65ca68 in GC_call_with_stack_base ()
from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
#35 0x00007f947b7778d8 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7fff1469a1e0,
func=func@entry=0x7f947b718690 <invoke_main_func>) at threads.c:688
#36 scm_with_guile (func=func@entry=0x7f947b718690 <invoke_main_func>, data=data@entry=0x7fff1469a210)
at threads.c:694
#37 0x00007f947b718822 in scm_boot_guile (argc=argc@entry=8, argv=argv@entry=0x7fff1469a378,
main_func=main_func@entry=0x401240 <inner_main>, closure=closure@entry=0x0) at init.c:291
#38 0x0000000000401100 in main (argc=8, argv=0x7fff1469a378) at guile.c:95
(gdb) frame 3
#3 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10, buf=0x1ff07e0)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
112 /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c: Dosiero a? dosierujo ne ekzistas.
(gdb) p *session
$1 = {common = {error = {error_code = 0, error_buffer = '\000' <repeats 1023 times>}, callbacks = 0x0,
log_verbosity = 0}, socket = 0x1ff0720, serverbanner = 0x1ff1e00 "SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u1",
clientbanner = 0x1ff15d0 "SSH-2.0-libssh_0.9.3", protoversion = 0, server = 0, client = 1, openssh = 395008,
send_seq = 437, recv_seq = 157, last_rekey_time = {seconds = 0, useconds = 0}, connected = 1, alive = 1,
flags = 3, extensions = 0, banner = 0x0, discon_msg = 0x0, in_buffer = 0x1ff0810, in_packet = {valid = 1,
len = 0, type = 93 ']'}, out_buffer = 0x1ff07e0, out_queue = 0x1ff03f0,
pending_call_state = SSH_PENDING_CALL_NONE, session_state = SSH_SESSION_STATE_AUTHENTICATED,
packet_state = PACKET_STATE_INIT, dh_handshake_state = DH_STATE_FINISHED,
global_req_state = SSH_CHANNEL_REQ_STATE_NONE, agent_state = 0x0, auth = {auto_state = 0x1ff72b0,
service_state = SSH_AUTH_SERVICE_ACCEPTED, state = SSH_AUTH_STATE_SUCCESS, supported_methods = 6,
current_method = 0}, first_kex_follows_guess_wrong = 0, in_hashbuf = 0x0, out_hashbuf = 0x0,
current_crypto = 0x1ff0480, next_crypto = 0x1ff7b90, channels = 0x1ff7290, maxchannel = 45, agent = 0x1ff0840,
kbdint = 0x0, gssapi = 0x0, srv = {rsa_key = 0x0, dsa_key = 0x0, ecdsa_key = 0x0, ed25519_key = 0x0,
hostkey = SSH_KEYTYPE_UNKNOWN, hostkey_digest = SSH_DIGEST_AUTO}, ssh_message_list = 0x0,
ssh_message_callback = 0x0, ssh_message_callback_data = 0x0, server_callbacks = 0x0,
ssh_connection_callback = 0x7f9474a58b30 <ssh_client_connection_callback>, default_packet_callbacks = {
start = 1 '\001', n_callbacks = 100 'd', callbacks = 0x7f9474acb380 <default_packet_handlers>,
user = 0x1ff0b10}, packet_callbacks = 0x1ff1610, socket_callbacks = {userdata = 0x1ff0b10,
data = 0x7f9474a6b030 <ssh_packet_socket_callback>,
controlflow = 0x7f9474a6a0e0 <ssh_packet_socket_controlflow_callback>,
exception = 0x7f9474a72830 <ssh_socket_exception_callback>, connected = 0x0}, default_poll_ctx = 0x1ff1dd0,
pcap_ctx = 0x0, opts = {identity = 0x1ff1340, username = 0x1ff14a0 "olimex",
host = 0x1ff15f0 "A20-OLinuXino.local", bindaddr = 0x0, sshdir = 0x1ff1660 "/home/ludo/.ssh",
knownhosts = 0x1ff1680 "/home/ludo/.ssh/known_hosts",
global_knownhosts = 0x1ff15a0 "/etc/ssh/ssh_known_hosts", wanted_methods = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x1ff1520 "zlib@openssh.com,zlib", 0x1ff1630 "zlib@openssh.com,zlib", 0x0, 0x0},
pubkey_accepted_types = 0x0, ProxyCommand = 0x0, custombanner = 0x0, timeout = 3600, timeout_usec = 0,
port = 22, fd = -1, StrictHostKeyChecking = 1, compressionlevel = 3 '\003', gss_server_identity = 0x0,
gss_client_identity = 0x0, gss_delegate_creds = 0, flags = 15, nodelay = 1, config_processed = true,
options_seen = "\000\000\001\000\001", '\000' <repeats 24 times>, rekey_data = 0, rekey_time = 0},
socket_counter = 0x0, raw_counter = 0x0}
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f947b084b80 (LWP 12777) 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10,
buf=0x1ff07e0) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
2 Thread 0x7f947ad5e700 (LWP 12778) warning: Unexpected size of section `.reg-xstate/12778' in core file.
0x00007f947b62b94c in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x7f947b6836ec <mark_cv+44>) at ../sysdeps/nptl/futex-internal.h:183
3 Thread 0x7f9478a9b700 (LWP 12781) warning: Unexpected size of section `.reg-xstate/12781' in core file.
0x00007f94751c0782 in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
4 Thread 0x7f947a3cd700 (LWP 12779) warning: Unexpected size of section `.reg-xstate/12779' in core file.
0x00007f947b62b94c in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x7f947b6836ec <mark_cv+44>) at ../sysdeps/nptl/futex-internal.h:183
5 Thread 0x7f94767d6700 (LWP 12782) warning: Unexpected size of section `.reg-xstate/12782' in core file.
0x00007f947b62f0a4 in __libc_read (fd=11, buf=buf@entry=0x7f94767d5660,
nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:26
6 Thread 0x7f9479a3c700 (LWP 12780) warning: Unexpected size of section `.reg-xstate/12780' in core file.
0x00007f947b62b94c in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x7f947b6836ec <mark_cv+44>) at ../sysdeps/nptl/futex-internal.h:183
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f9478a9b700 (LWP 12781))]
#0 0x00007f94751c0782 in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
(gdb) bt
#0 0x00007f94751c0782 in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#1 0x00007f94751c253d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#2 0x00007f9474a8db4a in gzip_compress (session=session@entry=0x1ff0b10, source=source@entry=0x1ff07e0,
level=<optimized out>) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
#3 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10, buf=0x1ff07e0)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
#4 0x00007f9474a6ab5f in packet_send2 (session=session@entry=0x1ff0b10)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
#5 0x00007f9474a6ac32 in ssh_packet_send (session=session@entry=0x1ff0b10)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
#6 0x00007f9474a56178 in ssh_channel_send_eof (channel=channel@entry=0x1ff1fd0)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1250
#7 0x00007f9474a5623b in ssh_channel_close (channel=0x1ff1fd0)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1301
#8 0x00007f9474ad8a56 in ptob_close ()
from /gnu/store/vj92bd6lcknylwka9v4n4h0i360n6vn4-guile-ssh-0.13.0/lib/libguile-ssh.so.13
#9 0x00007f947b747153 in release_port (port=#<port #<port-type channel 7f9476af56c0> 7f9474e9d060>)
at ports.c:165
#10 0x00007f947b74a19b in close_port (port=#<port #<port-type channel 7f9476af56c0> 7f9474e9d060>,
explicit=<optimized out>) at ports.c:893
#11 0x00007f947b70132a in scm_c_with_exception_handler (type=type@entry=#t,
handler=handler@entry=0x7f947b7787e0 <catch_post_unwind_handler>,
handler_data=handler_data@entry=0x7f9478a9a970, thunk=thunk@entry=0x7f947b778920 <catch_body>,
thunk_data=thunk_data@entry=0x7f9478a9a970) at exceptions.c:170
#12 0x00007f947b778b1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f947b74a200 <do_close>,
body_data=<optimized out>, handler=<optimized out>, handler_data=handler_data@entry=0x0,
pre_unwind_handler=pre_unwind_handler@entry=0x0, pre_unwind_handler_data=0x0) at throw.c:168
#13 0x00007f947b778b3e in scm_internal_catch (tag=tag@entry=#t, body=body@entry=0x7f947b74a200 <do_close>,
body_data=<optimized out>, handler=<optimized out>, handler_data=handler_data@entry=0x0) at throw.c:177
#14 0x00007f947b745d84 in finalize_port (ptr=<optimized out>, data=<optimized out>) at ports.c:710
#15 0x00007f947b6556ef in GC_invoke_finalizers ()
from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
#16 0x00007f947b709e79 in scm_run_finalizers () at finalizers.c:399
#17 0x00007f947b709efd in finalization_thread_proc (unused=<optimized out>) at finalizers.c:234
#18 0x00007f947b6fab2a in c_body (d=0x7f9478a9ae50) at continuations.c:430
#19 0x00007f947b782151 in vm_regular_engine (thread=0x7f947ad60b40) at vm-engine.c:972
#20 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7f9478a9ac10, nargs=nargs@entry=2)
at vm.c:1608
#21 0x00007f947b6ffb2a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
at eval.c:503
#22 0x00007f947b70132a in scm_c_with_exception_handler (type=type@entry=#t,
handler=handler@entry=0x7f947b7787e0 <catch_post_unwind_handler>,
handler_data=handler_data@entry=0x7f9478a9ad80, thunk=thunk@entry=0x7f947b778920 <catch_body>,
thunk_data=thunk_data@entry=0x7f9478a9ad80) at exceptions.c:170
#23 0x00007f947b778b1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f947b6fab20 <c_body>,
body_data=body_data@entry=0x7f9478a9ae50, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
handler_data=handler_data@entry=0x7f9478a9ae50,
pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
pre_unwind_handler_data=0x7f9479037240) at throw.c:168
#24 0x00007f947b6fb0d3 in scm_i_with_continuation_barrier (body=body@entry=0x7f947b6fab20 <c_body>,
body_data=body_data@entry=0x7f9478a9ae50, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
handler_data=handler_data@entry=0x7f9478a9ae50,
pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
pre_unwind_handler_data=0x7f9479037240) at continuations.c:368
#25 0x00007f947b6fb165 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
at continuations.c:464
#26 0x00007f947b7775bf in with_guile (base=0x7f9478a9aeb8, data=0x7f9478a9aee0) at threads.c:645
#27 0x00007f947b65ca68 in GC_call_with_stack_base ()
from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
#28 0x00007f947b7778d8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>,
func=<optimized out>) at threads.c:688
#29 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#30 0x00007f947b624f64 in start_thread (arg=0x7f9478a9b700) at pthread_create.c:477
#31 0x00007f947b17c9af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This looks similar to https://issues.guix.gnu.org/26976#11: we have a
finalization thread closing a channel port that is still being used.
This may be a regression in Guile-SSH 0.13.0.

Does that ring a bell, Artyom?

Thanks,
Ludo’.
A
A
Artyom V. Poptsov wrote on 7 Aug 2020 20:36
Re: bug#42740: Segfault in libssh during ‘guix c opy’
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 42740@debbugs.gnu.org)
87364yb8vr.fsf@gmail.com
Hello Ludovic,

Toggle quote (2 lines)
> Does that ring a bell, Artyom?

Yes. I'll see what I can do to fix that.

- Artyom

--
Artyom "avp" Poptsov <poptsov.artyom@gmail.com>
CADR Hackerspace co-founder: https://cadrspace.ru/
GPG: D0C2 EAC1 3310 822D 98DE B57C E9C5 A2D9 0898 A02F
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEE0MLqwTMQgi2Y3rV86cWi2QiYoC8FAl8tn0gACgkQ6cWi2QiY
oC+wcggApAtdVdeszxXgqD9q/I2TD+jbarKlyIzmiqH/XyTMOKxjyTjaBi8oL7jN
mhf8VAxzISDGurT3kGoeDG6QUm8myLFVFdRggt8Flon+ThZl6yTYw56yE1Ie8nCZ
ONBn2w3vuDqG5RB83ayT1hi/K5hBxSixrwWnrcvgPrilIdc0sgGXxiw6LO64KXxR
sQ3ia1bF6vX9GagTNlpn6PHKb0IJeCBEV5Lxm3z8oXit+VY9CTsRrlJ2V8lQBa9h
l9JFEjTXSDqZYwoRegXthxWOCZGtsC65letpoxZJwIxJFw8UnUfrqqlYRmNOE6Wm
nMg58foSBFopmbZkosRAyCOtEMcBTw==
=2QA1
-----END PGP SIGNATURE-----

A
A
Artyom Poptsov wrote on 9 Aug 2020 10:48
Re: bug#42740: Segfault in libssh during ‘guix cop y’
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 42740@debbugs.gnu.org)
CAN9MppvbwojKtz8gLgxSRwscM3H_VDxfZ6MGkyinASx+H2GAxQ@mail.gmail.com
Hello Ludovic,

please check if this branch will work without segfaults in Guix:

Key changes:

- Channels are now protecting the parent session from GC'ing.

- Every channel procedure now ensures that the parent session is
connected before calling any libssh procedures upon a channel
instance. The idea is that a channel cannot be created when a session
is disconnected and when channel is present and the session is closed,
it means that the session is disconnected and freed.

- Artyom

On Fri, 7 Aug 2020 at 12:25, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (232 lines)
>
> Hi,
>
> I observe the following segfault:
>
> --8<---------------cut here---------------start------------->8---
> $ guix copy --to=olimex /gnu/store/… -v3 --debug=3
> sending 66 store items (166 MiB) to 'A20-OLinuXino.local'...
> exporting path `/gnu/store/als3v92k7l6ny44sci1x0p9x6d7z0ivp-mesa-20.0.8'
> Adres-eraro(nekropsio el?utita)
> $ guix describe
> Generacio 152 Aug 04 2020 17:34:23 (nuna)
> guix abe3c5e
> repository URL: https://git.savannah.gnu.org/git/guix.git
> branch: master
> commit: abe3c5ed7d04985c987e6c81aeb1284354ea0c77
> $ gdb $(type -P guile) core
>
> [...]
>
> [Current thread is 1 (Thread 0x7f947b084b80 (LWP 12777))]
> (gdb) bt
> #0 0x00007f94751c0185 in deflate_fast ()
> from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
> #1 0x00007f94751c253d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
> #2 0x00007f9474a8db4a in gzip_compress (session=session@entry=0x1ff0b10, source=source@entry=0x1ff07e0,
> level=<optimized out>) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
> #3 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10, buf=0x1ff07e0)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
> #4 0x00007f9474a6ab5f in packet_send2 (session=session@entry=0x1ff0b10)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
> #5 0x00007f9474a6ac32 in ssh_packet_send (session=session@entry=0x1ff0b10)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
> #6 0x00007f9474a54639 in channel_write_common (channel=0x1ff43a0, data=0x7f9477995020, len=65536, is_stderr=0)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1488
> #7 0x00007f9474ad8a9e in write_to_channel_port ()
> from /gnu/store/vj92bd6lcknylwka9v4n4h0i360n6vn4-guile-ssh-0.13.0/lib/libguile-ssh.so.13
> #8 0x00007f947b749edc in scm_i_write_bytes (port=#<port #<port-type channel 7f9476af56c0> 7f9477c23e00>,
> src="#<vu8vector>" = {...}, start=0, count=65536) at ports.c:2865
> #9 0x00007f947b75186f in scm_put_bytevector (port=#<port #<port-type channel 7f9476af56c0> 7f9477c23e00>,
> bv="#<vu8vector>" = {...}, start=<optimized out>, count=<optimized out>) at r6rs-ports.c:676
> #10 0x00007f94750be427 in ?? ()
> #11 0x00007f947ad60d80 in ?? ()
> #12 0x00007f947b7ee620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/lib/libguile-3.0.so.1
> #13 0x00007f947ad60d80 in ?? ()
> #14 0x00007f947b72743b in scm_jit_enter_mcode (thread=0x7f947ad60d80, thread@entry=0x28,
> mcode=0x7f94750c63a0 "I\211\314I)\304I\203\374\020\017\214k ") at jit.c:5852
> #15 0x00007f947b7828e9 in vm_regular_engine (thread=0x28) at vm-engine.c:360
> #16 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff14699b38, nargs=nargs@entry=1)
> at vm.c:1608
> #17 0x00007f947b700c97 in scm_primitive_eval (exp=<optimized out>) at eval.c:671
> #18 0x00007f947b7290fb in scm_primitive_load (filename=<optimized out>) at load.c:131
> #19 0x00007f947b782151 in vm_regular_engine (thread=0x7f947ad60d80) at vm-engine.c:972
> #20 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff14699d08, nargs=nargs@entry=1)
> at vm.c:1608
> #21 0x00007f947b700c97 in scm_primitive_eval (exp=<optimized out>,
> exp@entry=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/home/ludo/.config/guix/current/bin/guix") (quit)))) at eval.c:671
> #22 0x00007f947b700cf3 in scm_eval (
> exp=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/home/ludo/.config/guix/current/bin/guix") (quit))), module_or_state=module_or_state@entry="#<struct module>" = {...}) at eval.c:705
> #23 0x00007f947b7595a0 in scm_shell (argc=8, argv=0x7fff1469a378) at script.c:357
> #24 0x00007f947b7186ad in invoke_main_func (body_data=0x7fff1469a210) at init.c:308
> #25 0x00007f947b6fab2a in c_body (d=0x7fff1469a150) at continuations.c:430
> #26 0x00007f947b782151 in vm_regular_engine (thread=0x7f947ad60d80) at vm-engine.c:972
> #27 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff14699f10, nargs=nargs@entry=2)
> at vm.c:1608
> #28 0x00007f947b6ffb2a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
> at eval.c:503
> #29 0x00007f947b70132a in scm_c_with_exception_handler (type=type@entry=#t,
> handler=handler@entry=0x7f947b7787e0 <catch_post_unwind_handler>,
> handler_data=handler_data@entry=0x7fff1469a080, thunk=thunk@entry=0x7f947b778920 <catch_body>,
> thunk_data=thunk_data@entry=0x7fff1469a080) at exceptions.c:170
> #30 0x00007f947b778b1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f947b6fab20 <c_body>,
> body_data=body_data@entry=0x7fff1469a150, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
> handler_data=handler_data@entry=0x7fff1469a150,
> pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
> pre_unwind_handler_data=0x7f9479037240) at throw.c:168
> #31 0x00007f947b6fb0d3 in scm_i_with_continuation_barrier (body=body@entry=0x7f947b6fab20 <c_body>,
> body_data=body_data@entry=0x7fff1469a150, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
> handler_data=handler_data@entry=0x7fff1469a150,
> pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
> pre_unwind_handler_data=0x7f9479037240) at continuations.c:368
> #32 0x00007f947b6fb165 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
> at continuations.c:464
> #33 0x00007f947b7775bf in with_guile (base=0x7fff1469a1b8, data=0x7fff1469a1e0) at threads.c:645
> #34 0x00007f947b65ca68 in GC_call_with_stack_base ()
> from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
> #35 0x00007f947b7778d8 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7fff1469a1e0,
> func=func@entry=0x7f947b718690 <invoke_main_func>) at threads.c:688
> #36 scm_with_guile (func=func@entry=0x7f947b718690 <invoke_main_func>, data=data@entry=0x7fff1469a210)
> at threads.c:694
> #37 0x00007f947b718822 in scm_boot_guile (argc=argc@entry=8, argv=argv@entry=0x7fff1469a378,
> main_func=main_func@entry=0x401240 <inner_main>, closure=closure@entry=0x0) at init.c:291
> #38 0x0000000000401100 in main (argc=8, argv=0x7fff1469a378) at guile.c:95
> (gdb) frame 3
> #3 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10, buf=0x1ff07e0)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
> 112 /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c: Dosiero a? dosierujo ne ekzistas.
> (gdb) p *session
> $1 = {common = {error = {error_code = 0, error_buffer = '\000' <repeats 1023 times>}, callbacks = 0x0,
> log_verbosity = 0}, socket = 0x1ff0720, serverbanner = 0x1ff1e00 "SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u1",
> clientbanner = 0x1ff15d0 "SSH-2.0-libssh_0.9.3", protoversion = 0, server = 0, client = 1, openssh = 395008,
> send_seq = 437, recv_seq = 157, last_rekey_time = {seconds = 0, useconds = 0}, connected = 1, alive = 1,
> flags = 3, extensions = 0, banner = 0x0, discon_msg = 0x0, in_buffer = 0x1ff0810, in_packet = {valid = 1,
> len = 0, type = 93 ']'}, out_buffer = 0x1ff07e0, out_queue = 0x1ff03f0,
> pending_call_state = SSH_PENDING_CALL_NONE, session_state = SSH_SESSION_STATE_AUTHENTICATED,
> packet_state = PACKET_STATE_INIT, dh_handshake_state = DH_STATE_FINISHED,
> global_req_state = SSH_CHANNEL_REQ_STATE_NONE, agent_state = 0x0, auth = {auto_state = 0x1ff72b0,
> service_state = SSH_AUTH_SERVICE_ACCEPTED, state = SSH_AUTH_STATE_SUCCESS, supported_methods = 6,
> current_method = 0}, first_kex_follows_guess_wrong = 0, in_hashbuf = 0x0, out_hashbuf = 0x0,
> current_crypto = 0x1ff0480, next_crypto = 0x1ff7b90, channels = 0x1ff7290, maxchannel = 45, agent = 0x1ff0840,
> kbdint = 0x0, gssapi = 0x0, srv = {rsa_key = 0x0, dsa_key = 0x0, ecdsa_key = 0x0, ed25519_key = 0x0,
> hostkey = SSH_KEYTYPE_UNKNOWN, hostkey_digest = SSH_DIGEST_AUTO}, ssh_message_list = 0x0,
> ssh_message_callback = 0x0, ssh_message_callback_data = 0x0, server_callbacks = 0x0,
> ssh_connection_callback = 0x7f9474a58b30 <ssh_client_connection_callback>, default_packet_callbacks = {
> start = 1 '\001', n_callbacks = 100 'd', callbacks = 0x7f9474acb380 <default_packet_handlers>,
> user = 0x1ff0b10}, packet_callbacks = 0x1ff1610, socket_callbacks = {userdata = 0x1ff0b10,
> data = 0x7f9474a6b030 <ssh_packet_socket_callback>,
> controlflow = 0x7f9474a6a0e0 <ssh_packet_socket_controlflow_callback>,
> exception = 0x7f9474a72830 <ssh_socket_exception_callback>, connected = 0x0}, default_poll_ctx = 0x1ff1dd0,
> pcap_ctx = 0x0, opts = {identity = 0x1ff1340, username = 0x1ff14a0 "olimex",
> host = 0x1ff15f0 "A20-OLinuXino.local", bindaddr = 0x0, sshdir = 0x1ff1660 "/home/ludo/.ssh",
> knownhosts = 0x1ff1680 "/home/ludo/.ssh/known_hosts",
> global_knownhosts = 0x1ff15a0 "/etc/ssh/ssh_known_hosts", wanted_methods = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x1ff1520 "zlib@openssh.com,zlib", 0x1ff1630 "zlib@openssh.com,zlib", 0x0, 0x0},
> pubkey_accepted_types = 0x0, ProxyCommand = 0x0, custombanner = 0x0, timeout = 3600, timeout_usec = 0,
> port = 22, fd = -1, StrictHostKeyChecking = 1, compressionlevel = 3 '\003', gss_server_identity = 0x0,
> gss_client_identity = 0x0, gss_delegate_creds = 0, flags = 15, nodelay = 1, config_processed = true,
> options_seen = "\000\000\001\000\001", '\000' <repeats 24 times>, rekey_data = 0, rekey_time = 0},
> socket_counter = 0x0, raw_counter = 0x0}
> (gdb) info threads
> Id Target Id Frame
> * 1 Thread 0x7f947b084b80 (LWP 12777) 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10,
> buf=0x1ff07e0) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
> 2 Thread 0x7f947ad5e700 (LWP 12778) warning: Unexpected size of section `.reg-xstate/12778' in core file.
> 0x00007f947b62b94c in futex_wait_cancelable (private=<optimized out>,
> expected=0, futex_word=0x7f947b6836ec <mark_cv+44>) at ../sysdeps/nptl/futex-internal.h:183
> 3 Thread 0x7f9478a9b700 (LWP 12781) warning: Unexpected size of section `.reg-xstate/12781' in core file.
> 0x00007f94751c0782 in deflate_fast ()
> from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
> 4 Thread 0x7f947a3cd700 (LWP 12779) warning: Unexpected size of section `.reg-xstate/12779' in core file.
> 0x00007f947b62b94c in futex_wait_cancelable (private=<optimized out>,
> expected=0, futex_word=0x7f947b6836ec <mark_cv+44>) at ../sysdeps/nptl/futex-internal.h:183
> 5 Thread 0x7f94767d6700 (LWP 12782) warning: Unexpected size of section `.reg-xstate/12782' in core file.
> 0x00007f947b62f0a4 in __libc_read (fd=11, buf=buf@entry=0x7f94767d5660,
> nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:26
> 6 Thread 0x7f9479a3c700 (LWP 12780) warning: Unexpected size of section `.reg-xstate/12780' in core file.
> 0x00007f947b62b94c in futex_wait_cancelable (private=<optimized out>,
> expected=0, futex_word=0x7f947b6836ec <mark_cv+44>) at ../sysdeps/nptl/futex-internal.h:183
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f9478a9b700 (LWP 12781))]
> #0 0x00007f94751c0782 in deflate_fast ()
> from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
> (gdb) bt
> #0 0x00007f94751c0782 in deflate_fast ()
> from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
> #1 0x00007f94751c253d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
> #2 0x00007f9474a8db4a in gzip_compress (session=session@entry=0x1ff0b10, source=source@entry=0x1ff07e0,
> level=<optimized out>) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
> #3 0x00007f9474a8de83 in compress_buffer (session=session@entry=0x1ff0b10, buf=0x1ff07e0)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
> #4 0x00007f9474a6ab5f in packet_send2 (session=session@entry=0x1ff0b10)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
> #5 0x00007f9474a6ac32 in ssh_packet_send (session=session@entry=0x1ff0b10)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
> #6 0x00007f9474a56178 in ssh_channel_send_eof (channel=channel@entry=0x1ff1fd0)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1250
> #7 0x00007f9474a5623b in ssh_channel_close (channel=0x1ff1fd0)
> at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1301
> #8 0x00007f9474ad8a56 in ptob_close ()
> from /gnu/store/vj92bd6lcknylwka9v4n4h0i360n6vn4-guile-ssh-0.13.0/lib/libguile-ssh.so.13
> #9 0x00007f947b747153 in release_port (port=#<port #<port-type channel 7f9476af56c0> 7f9474e9d060>)
> at ports.c:165
> #10 0x00007f947b74a19b in close_port (port=#<port #<port-type channel 7f9476af56c0> 7f9474e9d060>,
> explicit=<optimized out>) at ports.c:893
> #11 0x00007f947b70132a in scm_c_with_exception_handler (type=type@entry=#t,
> handler=handler@entry=0x7f947b7787e0 <catch_post_unwind_handler>,
> handler_data=handler_data@entry=0x7f9478a9a970, thunk=thunk@entry=0x7f947b778920 <catch_body>,
> thunk_data=thunk_data@entry=0x7f9478a9a970) at exceptions.c:170
> #12 0x00007f947b778b1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f947b74a200 <do_close>,
> body_data=<optimized out>, handler=<optimized out>, handler_data=handler_data@entry=0x0,
> pre_unwind_handler=pre_unwind_handler@entry=0x0, pre_unwind_handler_data=0x0) at throw.c:168
> #13 0x00007f947b778b3e in scm_internal_catch (tag=tag@entry=#t, body=body@entry=0x7f947b74a200 <do_close>,
> body_data=<optimized out>, handler=<optimized out>, handler_data=handler_data@entry=0x0) at throw.c:177
> #14 0x00007f947b745d84 in finalize_port (ptr=<optimized out>, data=<optimized out>) at ports.c:710
> #15 0x00007f947b6556ef in GC_invoke_finalizers ()
> from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
> #16 0x00007f947b709e79 in scm_run_finalizers () at finalizers.c:399
> #17 0x00007f947b709efd in finalization_thread_proc (unused=<optimized out>) at finalizers.c:234
> #18 0x00007f947b6fab2a in c_body (d=0x7f9478a9ae50) at continuations.c:430
> #19 0x00007f947b782151 in vm_regular_engine (thread=0x7f947ad60b40) at vm-engine.c:972
> #20 0x00007f947b7835b5 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7f9478a9ac10, nargs=nargs@entry=2)
> at vm.c:1608
> #21 0x00007f947b6ffb2a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
> at eval.c:503
> #22 0x00007f947b70132a in scm_c_with_exception_handler (type=type@entry=#t,
> handler=handler@entry=0x7f947b7787e0 <catch_post_unwind_handler>,
> handler_data=handler_data@entry=0x7f9478a9ad80, thunk=thunk@entry=0x7f947b778920 <catch_body>,
> thunk_data=thunk_data@entry=0x7f9478a9ad80) at exceptions.c:170
> #23 0x00007f947b778b1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f947b6fab20 <c_body>,
> body_data=body_data@entry=0x7f9478a9ae50, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
> handler_data=handler_data@entry=0x7f9478a9ae50,
> pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
> pre_unwind_handler_data=0x7f9479037240) at throw.c:168
> #24 0x00007f947b6fb0d3 in scm_i_with_continuation_barrier (body=body@entry=0x7f947b6fab20 <c_body>,
> body_data=body_data@entry=0x7f9478a9ae50, handler=handler@entry=0x7f947b6fadc0 <c_handler>,
> handler_data=handler_data@entry=0x7f9478a9ae50,
> pre_unwind_handler=pre_unwind_handler@entry=0x7f947b6fac20 <pre_unwind_handler>,
> pre_unwind_handler_data=0x7f9479037240) at continuations.c:368
> #25 0x00007f947b6fb165 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
> at continuations.c:464
> #26 0x00007f947b7775bf in with_guile (base=0x7f9478a9aeb8, data=0x7f9478a9aee0) at threads.c:645
> #27 0x00007f947b65ca68 in GC_call_with_stack_base ()
> from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
> #28 0x00007f947b7778d8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>,
> func=<optimized out>) at threads.c:688
> #29 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
> #30 0x00007f947b624f64 in start_thread (arg=0x7f9478a9b700) at pthread_create.c:477
> #31 0x00007f947b17c9af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> --8<---------------cut here---------------end--------------->8---
>
> This looks similar to <https://issues.guix.gnu.org/26976#11>: we have a
> finalization thread closing a channel port that is still being used.
> This may be a regression in Guile-SSH 0.13.0.
>
> Does that ring a bell, Artyom?
>
> Thanks,
> Ludo’.
>
>


--
Artyom V. Poptsov <poptsov.artyom@gmail.com>
CADR Hackerspace co-founder: https://cadrspace.ru/
GPG: D0C2 EAC1 3310 822D 98DE B57C E9C5 A2D9 0898 A02F
M
M
maxim.cournoyer wrote on 14 Aug 2020 07:24
control message for bug #42858
(address . control@debbugs.gnu.org)
87o8ndu7e1.fsf@hurd.i-did-not-set--mail-host-address--so-tickle-me
merge 42858 42740
quit
M
M
Maxim Cournoyer wrote on 17 Aug 2020 05:31
Re: bug#42740: Segfault in libssh during ‘guix c opy’
(name . Artyom Poptsov)(address . poptsov.artyom@gmail.com)
87zh6uvtga.fsf@gmail.com
Hello Artyom and Ludovic,

Artyom Poptsov <poptsov.artyom@gmail.com> writes:

Toggle quote (5 lines)
> Hello Ludovic,
>
> please check if this branch will work without segfaults in Guix:
> https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-segfaults-on-gc

I tested this branch and it indeed resolves the issue. I went ahead and
updated the guile-ssh package in Guix to this commit. It seems a
guile-ssh release is proper :-)

Closing. Thanks for fixing the issue!

Maxim
Closed
L
L
Ludovic Courtès wrote on 23 Aug 2020 18:31
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
877dtpb9yj.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (13 lines)
> Artyom Poptsov <poptsov.artyom@gmail.com> writes:
>
>> Hello Ludovic,
>>
>> please check if this branch will work without segfaults in Guix:
>> https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-segfaults-on-gc
>
> I tested this branch and it indeed resolves the issue. I went ahead and
> updated the guile-ssh package in Guix to this commit. It seems a
> guile-ssh release is proper :-)
>
> Closing. Thanks for fixing the issue!

Awesome, thanks to the two of you for the quick fixes!

Ludo’.
Closed
L
L
Ludovic Courtès wrote on 29 Aug 2020 15:00
control message for bug #42740
(address . control@debbugs.gnu.org)
87mu2dli8q.fsf@gnu.org
severity 42740 serious
quit
L
L
Ludovic Courtès wrote on 29 Aug 2020 15:31
Re: bug#42740: Segfault in libssh during ‘guix c opy’
(name . Artyom Poptsov)(address . poptsov.artyom@gmail.com)
874kollgst.fsf@gnu.org
Hi Artyom!

Artyom Poptsov <poptsov.artyom@gmail.com> skribis:

Toggle quote (13 lines)
> please check if this branch will work without segfaults in Guix:
> https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-segfaults-on-gc
>
> Key changes:
>
> - Channels are now protecting the parent session from GC'ing.
>
> - Every channel procedure now ensures that the parent session is
> connected before calling any libssh procedures upon a channel
> instance. The idea is that a channel cannot be created when a session
> is disconnected and when channel is present and the session is closed,
> it means that the session is disconnected and freed.

Looks like the problem is still there, after all:

Toggle snippet (104 lines)
$ guix describe
Generacio 154 Aug 29 2020 14:49:14 (nuna)
guix 0ec6b8a
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 0ec6b8afd7e7a6c288fbf48c5779f2e0bdaffb55
$ guix copy --to=olimex coreutils-minimal
sending 3 store items (86 MiB) to 'A20-OLinuXino.local'...
Adres-eraro(nekropsio el?utita)
$ gdb $(type -P guile) core

[...]

Core was generated by `/gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/bin/guile --no-auto-com'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Unexpected size of section `.reg-xstate/25533' in core file.
#0 0x00007f1ba90e4185 in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
[Current thread is 1 (Thread 0x7f1baefb9b80 (LWP 25533))]
(gdb) bt
#0 0x00007f1ba90e4185 in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#1 0x00007f1ba90e653d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#2 0x00007f1ba89b1b4a in gzip_compress (session=session@entry=0x12a4b20, source=source@entry=0x12a5580,
level=<optimized out>) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
#3 0x00007f1ba89b1e83 in compress_buffer (session=session@entry=0x12a4b20, buf=0x12a5580)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
#4 0x00007f1ba898eb5f in packet_send2 (session=session@entry=0x12a4b20)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
#5 0x00007f1ba898ec32 in ssh_packet_send (session=session@entry=0x12a4b20)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
#6 0x00007f1ba8978639 in channel_write_common (channel=0x12b0e90, data=0x7f1b9dba7020, len=65536, is_stderr=0)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1488
#7 0x00007f1ba89fce7a in write_to_channel_port ()
from /gnu/store/hw2wb78q8zxza1p1kdi8bffdbi1hb19n-guile-ssh-0.13.1/lib/libguile-ssh.so.13
#8 0x00007f1baf67eedc in scm_i_write_bytes (port=#<port #<port-type channel 7f1baaa1c6c0> 7f1ba7f25300>,
src="#<vu8vector>" = {...}, start=0, count=65536) at ports.c:2865
#9 0x00007f1baf68686f in scm_put_bytevector (port=#<port #<port-type channel 7f1baaa1c6c0> 7f1ba7f25300>,
bv="#<vu8vector>" = {...}, start=<optimized out>, count=<optimized out>) at r6rs-ports.c:676

[...]

(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f1baefb9b80 (LWP 25533) 0x00007f1ba90e4185 in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
2 Thread 0x7f1baec93700 (LWP 25534) warning: Unexpected size of section `.reg-xstate/25534' in core file.
0x00007f1baf56094c in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x7f1baf5b86e8 <mark_cv+40>) at ../sysdeps/nptl/futex-internal.h:183
3 Thread 0x7f1bac9d0700 (LWP 25537) warning: Unexpected size of section `.reg-xstate/25537' in core file.
0x00007f1ba90e479f in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
4 Thread 0x7f1bae302700 (LWP 25535) warning: Unexpected size of section `.reg-xstate/25535' in core file.
0x00007f1baf56094c in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x7f1baf5b86e8 <mark_cv+40>) at ../sysdeps/nptl/futex-internal.h:183
5 Thread 0x7f1baa6f9700 (LWP 25538) warning: Unexpected size of section `.reg-xstate/25538' in core file.
0x00007f1baf5640a4 in __libc_read (fd=10, buf=buf@entry=0x7f1baa6f8660,
nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:26
6 Thread 0x7f1bad971700 (LWP 25536) warning: Unexpected size of section `.reg-xstate/25536' in core file.
0x00007f1baf56094c in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x7f1baf5b86e8 <mark_cv+40>) at ../sysdeps/nptl/futex-internal.h:183
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f1bac9d0700 (LWP 25537))]
#0 0x00007f1ba90e479f in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
(gdb) bt
#0 0x00007f1ba90e479f in deflate_fast ()
from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#1 0x00007f1ba90e653d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#2 0x00007f1ba89b1b4a in gzip_compress (session=session@entry=0x12a4b20, source=source@entry=0x12a5580,
level=<optimized out>) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
#3 0x00007f1ba89b1e83 in compress_buffer (session=session@entry=0x12a4b20, buf=0x12a5580)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
#4 0x00007f1ba898eb5f in packet_send2 (session=session@entry=0x12a4b20)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
#5 0x00007f1ba898ec32 in ssh_packet_send (session=session@entry=0x12a4b20)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
#6 0x00007f1ba897a178 in ssh_channel_send_eof (channel=channel@entry=0x12b0930)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1250
#7 0x00007f1ba897a23b in ssh_channel_close (channel=0x12b0930)
at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1301
#8 0x00007f1ba89fcc36 in ptob_close ()
from /gnu/store/hw2wb78q8zxza1p1kdi8bffdbi1hb19n-guile-ssh-0.13.1/lib/libguile-ssh.so.13
#9 0x00007f1baf67c153 in release_port (port=#<port #<port-type channel 7f1baaa1c6c0> 7f1ba8e73400>)
at ports.c:165
#10 0x00007f1baf67f19b in close_port (port=#<port #<port-type channel 7f1baaa1c6c0> 7f1ba8e73400>,
explicit=<optimized out>) at ports.c:893
#11 0x00007f1baf63632a in scm_c_with_exception_handler (type=type@entry=#t,
handler=handler@entry=0x7f1baf6ad7e0 <catch_post_unwind_handler>,
handler_data=handler_data@entry=0x7f1bac9cf970, thunk=thunk@entry=0x7f1baf6ad920 <catch_body>,
thunk_data=thunk_data@entry=0x7f1bac9cf970) at exceptions.c:170
#12 0x00007f1baf6adb1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f1baf67f200 <do_close>,
body_data=<optimized out>, handler=<optimized out>, handler_data=handler_data@entry=0x0,
pre_unwind_handler=pre_unwind_handler@entry=0x0, pre_unwind_handler_data=0x0) at throw.c:168
#13 0x00007f1baf6adb3e in scm_internal_catch (tag=tag@entry=#t, body=body@entry=0x7f1baf67f200 <do_close>,
body_data=<optimized out>, handler=<optimized out>, handler_data=handler_data@entry=0x0) at throw.c:177
#14 0x00007f1baf67ad84 in finalize_port (ptr=<optimized out>, data=<optimized out>) at ports.c:710
#15 0x00007f1baf58a6ef in GC_invoke_finalizers ()
from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
#16 0x00007f1baf63ee79 in scm_run_finalizers () at finalizers.c:399
#17 0x00007f1baf63eefd in finalization_thread_proc (unused=<optimized out>) at finalizers.c:234

So we have the finalization thread closing a channel of session
0x12a4b20 (which causes a write on the channel), and the main thread
writing to a channel of that same session. This is exactly what I

AIUI, that means there’s one output compression buffer per session,
and it’s not thread-safe (in Guile 2.2 finalizers are called from a
separate thread.)

I think the fix, in Guile-SSH, is to associate each libssh object
(session, channel, etc.) with a mutex, and to protect all uses of the
libssh object by that mutex.

Artyom, WDYT? Do you think you could take a look into that?

In the meantime, I’ll look for the origin of the channel port that’s not
explicitly closed and see if we can work around it.

Ludo’.
L
L
Ludovic Courtès wrote on 29 Aug 2020 16:31
(name . Artyom Poptsov)(address . poptsov.artyom@gmail.com)
87h7sljzgn.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (18 lines)
> So we have the finalization thread closing a channel of session
> 0x12a4b20 (which causes a write on the channel), and the main thread
> writing to a channel of that same session. This is exactly what I
> described at <https://issues.guix.gnu.org/26976#11>:
>
> AIUI, that means there’s one output compression buffer per session,
> and it’s not thread-safe (in Guile 2.2 finalizers are called from a
> separate thread.)
>
> I think the fix, in Guile-SSH, is to associate each libssh object
> (session, channel, etc.) with a mutex, and to protect all uses of the
> libssh object by that mutex.
>
> Artyom, WDYT? Do you think you could take a look into that?
>
> In the meantime, I’ll look for the origin of the channel port that’s not
> explicitly closed and see if we can work around it.

I’ve pushed this change on our side to explicitly close channels and
sessions:


This workaround is similar to that of 2017:


It’s really just a workaround so I think we should fix the core issue in
Guile-SSH (or libssh) so it doesn’t pop up again next month—it’s hard to
ensure code that opens a channel explicitly closes it.

Anyway, I would welcome tests using ‘guix copy’, ‘guix deploy’, and
offloading. (For offloading, make sure to run the daemon from your
build tree.)

Ludo’.
M
M
Maxim Cournoyer wrote on 1 Sep 2020 15:56
(name . Ludovic Courtès)(address . ludo@gnu.org)
87eenlzjkn.fsf@gmail.com
Hi Ludovic and Artyom,

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (33 lines)
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> So we have the finalization thread closing a channel of session
>> 0x12a4b20 (which causes a write on the channel), and the main thread
>> writing to a channel of that same session. This is exactly what I
>> described at <https://issues.guix.gnu.org/26976#11>:
>>
>> AIUI, that means there’s one output compression buffer per session,
>> and it’s not thread-safe (in Guile 2.2 finalizers are called from a
>> separate thread.)
>>
>> I think the fix, in Guile-SSH, is to associate each libssh object
>> (session, channel, etc.) with a mutex, and to protect all uses of the
>> libssh object by that mutex.
>>
>> Artyom, WDYT? Do you think you could take a look into that?
>>
>> In the meantime, I’ll look for the origin of the channel port that’s not
>> explicitly closed and see if we can work around it.
>
> I’ve pushed this change on our side to explicitly close channels and
> sessions:
>
> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=61fe9ced7da7eefceb931af0cb7363b721f5bdd6
>
> This workaround is similar to that of 2017:
>
> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=8e469b67f95cfe5b95405b503b8ee315fdf8ce66
>
> It’s really just a workaround so I think we should fix the core issue in
> Guile-SSH (or libssh) so it doesn’t pop up again next month—it’s hard to
> ensure code that opens a channel explicitly closes it.

Do you think the issue lies in guile-ssh or in libssh itself? Sorry for
not having caught these problems earlier; it seemed to work reliably
when I last tested it.

Toggle quote (4 lines)
> Anyway, I would welcome tests using ‘guix copy’, ‘guix deploy’, and
> offloading. (For offloading, make sure to run the daemon from your
> build tree.)

While attempting to use offload on the core-updates branch, I
encountered stalls and file errors, but with your patch it seems to work
reliable (it's been offloading builds for the last 15 minutes or so
without interruption).

So your workaround fixes seem to work as intended.

I also agree that it'd be much nicer and future proof if we could fix
the root issue.

Thanks!

Maxim
L
L
Ludovic Courtès wrote on 1 Sep 2020 22:57
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87y2lt6wq1.fsf@gnu.org
Hello!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (2 lines)
> Ludovic Courtès <ludo@gnu.org> writes:

[...]

Toggle quote (17 lines)
>> I’ve pushed this change on our side to explicitly close channels and
>> sessions:
>>
>> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=61fe9ced7da7eefceb931af0cb7363b721f5bdd6
>>
>> This workaround is similar to that of 2017:
>>
>> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=8e469b67f95cfe5b95405b503b8ee315fdf8ce66
>>
>> It’s really just a workaround so I think we should fix the core issue in
>> Guile-SSH (or libssh) so it doesn’t pop up again next month—it’s hard to
>> ensure code that opens a channel explicitly closes it.
>
> Do you think the issue lies in guile-ssh or in libssh itself? Sorry for
> not having caught these problems earlier; it seemed to work reliably
> when I last tested it.

No worries; these issues are non-deterministic and we can spend months
without noticing.

Perhaps libssh is documented as not being thread-safe, in which case
it’s up to its users to do the right thing. In that case, the problem
would be in Guile-SSH, since in Guile 2.2/3.0 finalizers can be called
from a separate thread.

It’s also probably easier to fix in Guile-SSH (technically & socially).

Toggle quote (11 lines)
>> Anyway, I would welcome tests using ‘guix copy’, ‘guix deploy’, and
>> offloading. (For offloading, make sure to run the daemon from your
>> build tree.)
>
> While attempting to use offload on the core-updates branch, I
> encountered stalls and file errors, but with your patch it seems to work
> reliable (it's been offloading builds for the last 15 minutes or so
> without interruption).
>
> So your workaround fixes seem to work as intended.

Good, thanks for testing. We should update the ‘guix’ package again
soonish.

Ludo’.
L
L
Ludovic Courtès wrote on 2 Sep 2020 23:19
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87mu273mhh.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (3 lines)
> Good, thanks for testing. We should update the ‘guix’ package again
> soonish.

Done in commit eea36f8ecae818d1eed1a115ec6ff6b4db4a13f4. I’ve just
reconfigured berlin, which is now running this commit. Let’s monitor
and hopefully we’ll see fewer offloading-related build failures!

Ludo’.
M
M
Maxim Cournoyer wrote on 15 Sep 2020 04:05
(name . Ludovic Courtès)(address . ludo@gnu.org)
875z8f4ww5.fsf@gmail.com
Hello,

What is the manifestation of the current guile-ssh remaining problems?

Is it like the error message below?

Toggle snippet (3 lines)
guix offload: error: corrupt input while restoring archive from #<input: string 7ff371d9e7e0>

I just had this today during guix offload.

Maxim
L
L
Ludovic Courtès wrote on 15 Sep 2020 10:14
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87tuvz8ni8.fsf@gnu.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (2 lines)
> What is the manifestation of the current guile-ssh remaining problems?

To me, the Guile-SSH problems were successfully worked around on our
side with 61fe9ced7da7eefceb931af0cb7363b721f5bdd6. It could be that
you found an issue we didn’t address.

Toggle quote (6 lines)
> Is it like the error message below?
>
> guix offload: error: corrupt input while restoring archive from #<input: string 7ff371d9e7e0>
>
> I just had this today during guix offload.

Hmm. Is it reproducible? Can you show the 10-or-so lines above, to see
if the build completed, whether the error happened while sending build
results, etc.?

Thanks,
Ludo’.
M
M
Maxim Cournoyer wrote on 15 Sep 2020 15:48
(name . Ludovic Courtès)(address . ludo@gnu.org)
87h7rz1783.fsf@gmail.com
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (20 lines)
> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> What is the manifestation of the current guile-ssh remaining problems?
>
> To me, the Guile-SSH problems were successfully worked around on our
> side with 61fe9ced7da7eefceb931af0cb7363b721f5bdd6. It could be that
> you found an issue we didn’t address.
>
>> Is it like the error message below?
>>
>> guix offload: error: corrupt input while restoring archive from #<input: string 7ff371d9e7e0>
>>
>> I just had this today during guix offload.
>
> Hmm. Is it reproducible? Can you show the 10-or-so lines above, to see
> if the build completed, whether the error happened while sending build
> results, etc.?

Sorry for the noise, I cannot reproduce after restarting guix-daemon. I
guess I just hadn't restarted it in a long time (the recent testing I
did for the fix was with a locally started daemon in my guix checkout).

Thank you,

Maxim
?