From: Ludovic Courtès
To: 31785@debbugs.gnu.org
Cc: Mathieu Othacehe
Subject: Re: bug#31785: Multiple client 'build-paths' RPCs can lead to daemon deadlock
Date: Wed, 04 Nov 2020 16:21:39 +0100

Hi,

ludo@gnu.org (Ludovic Courtès) wrote:

> This comes from the fact that ‘LocalStore::buildPaths’ takes the
> user-supplied derivation list as is, without sorting it, and then
> acquires locks in that order in ‘Worker::run’.

This diagnosis is incorrect: ‘Goals’ is a set sorted according to
‘CompareGoalPtrs’, a lexical sort arranged so that substitution goals
come before derivation goals.  Thus, ‘_topGoals’ and ‘awake2’ in
‘Worker::run’ are sorted in a deterministic fashion.

The problem is that ‘Worker::waitForAWhile’ reshuffles the order of
goals by temporarily moving goals out of the way.  This can happen when
offloading replies “postpone”, which is inherently non-deterministic
(which goals are put to sleep varies from one session to another).

When those goals are eventually woken up from ‘Worker::waitForInput’,
they’re reprocessed in sorted order, but potentially with “holes”
compared to other ‘guix-daemon’ processes.

That’s only a partial explanation; we need to go further to come up with
an actual deadlock scenario.

Ludo’.
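
PS: To make the ordering argument concrete, here is a minimal sketch.  It
is not the daemon’s code: ‘Goal’ is reduced to a bare key, the “a$”/“b$”
key prefixes are an assumption about how substitution goals end up sorted
before derivation goals, and ‘makeGoal’ and ‘processingOrder’ are
hypothetical helpers that only model “postponed goals are set aside and
processed after waking up”.  It shows two sessions diverging in
processing order, not a deadlock.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the daemon's goal type: only the sort key matters.
struct Goal {
    std::string key;  // assumed form: "a$..." (substitution), "b$..." (derivation)
};

using GoalPtr = std::shared_ptr<Goal>;

// Order goals by key rather than by pointer value, so that iterating over
// the set is deterministic across processes.
struct CompareGoalPtrs {
    bool operator()(const GoalPtr & a, const GoalPtr & b) const {
        return a->key < b->key;
    }
};

using Goals = std::set<GoalPtr, CompareGoalPtrs>;

GoalPtr makeGoal(const std::string & key)
{
    return std::make_shared<Goal>(Goal{key});
}

// Toy model of one session: awake goals are processed in sorted order;
// postponed goals are moved aside and processed, again in sorted order,
// once they are woken up.
std::vector<std::string> processingOrder(const Goals & all,
                                         const std::set<std::string> & postponed)
{
    std::vector<std::string> order;
    Goals sleeping;
    for (const auto & goal : all) {
        if (postponed.count(goal->key))
            sleeping.insert(goal);       // set aside, leaving a "hole"
        else
            order.push_back(goal->key);
    }
    for (const auto & goal : sleeping)
        order.push_back(goal->key);
    return order;
}
```

With goals “a$foo”, “b$x.drv” and “b$y.drv”, a session where nothing is
postponed processes them in exactly that order, whereas a session where
“b$x.drv” is postponed processes “b$y.drv” before “b$x.drv”, even though
both sessions started from the same sorted set.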