From: zimoun
Date: Mon, 20 Jul 2020 17:52:09 +0200
Subject: Re: Recovering source tarballs
To: Ludovic Courtès
Cc: 42162@debbugs.gnu.org, Maurice Brémond

Hi,

On Mon, 20 Jul 2020 at 10:39, Ludovic Courtès wrote:
> zimoun skribis:
> > On Sat, 11 Jul 2020 at 17:50, Ludovic Courtès wrote:

> There are many many comments in your message, so I took the liberty to
> reply only to the essence of it.  :-)

Many comments because many open topics. ;-)

> However, the two examples above are good ideas as to the way forward: we
> could start a url-fetch-to-git-fetch migration in these two cases, and
> perhaps more.

Well, to be honest, I tried to probe such a migration when I opened
this thread:

https://lists.gnu.org/archive/html/guix-devel/2020-05/msg00224.html

and I tried to summarize the pros/cons arguments here:

https://lists.gnu.org/archive/html/guix-devel/2020-05/msg00448.html

> > What about in addition push to IPFS?  Feasible?  Lookup issue?
>
> Lookup issue.  :-)  The hash in a CID is not just a raw blob hash.
> Files are typically chunked beforehand, assembled as a Merkle tree, and
> the CID is roughly the hash to the tree root.  So it would seem we can’t
> use IPFS as-is for tarballs.

With the Git-repo map/table, it becomes an option, right?  SWH would
be one backend and IPFS could be another.  Or any "cloudy" storage
system that might appear in the future, right?

> >>   • If we no longer deal with tarballs but upstreams keep signing
> >>     tarballs (not raw directory hashes), how can we authenticate our
> >>     code after the fact?
> >
> > Does Guix automatically authenticate code using signed tarballs?
>
> Not automatically; packagers are supposed to authenticate code when they
> add a package (‘guix refresh -u’ does that automatically).

So I do not see the point of having this authentication information in
a future where upstream has disappeared.  The authentication is done at
packaging time.  Once it is done, merged into master and then pushed to
SWH, being able to authenticate again does not really matter.

And if it does matter, then everything should be updated each time
vulnerabilities are discovered, so I am not sure SWH makes sense for
this use case.

> But today, we store tarball hashes, not directory hashes.

We store what "guix hash" returns. ;-)
So it is easy to migrate from tarball hashes to whatever else. :-)
I mean, today it is "(sha256 (base32" and it would be easy to also have
"(sha256-tree (base32" or something like that, in the case where the
integrity hash is also used as the lookup key.

> > The format of metadata (disassemble) that you propose is schemish
> > (obviously! :-)) but we could propose something more JSON-like.
>
> Sure, if that helps get other people on-board, why not (though sexps
> have lived much longer than JSON and XML together :-)).

Lived much longer and still far, far less used than JSON or XML alone. ;-)

I have not yet done a clear back-of-the-envelope computation.
Roughly, there are ~23 commits on average per day updating packages; if,
say, 70% of them are url-fetch, that is ~16 new tarballs per day on
average.  How will the model using a Git repo scale?  Naively, the
output of "disassemble-archive" in full text (pretty-print format) for
hello-2.10.tar is 120 KB, so 16*365*120 KB = ~700 MB per year, without
considering all the Git internals.  Obviously, it depends on the number
of files, and I do not know whether hello is a representative example.
Nor do I know how Git handles binary files, if the disassembled tarball
were stored as a .go file or in any other binary format.

All the best,
simon

ps:
Just in case someone wants to check where my estimates come from.

--8<---------------cut here---------------start------------->8---
# Dates of package-updating commits since v1.0.0, counted per day.
# A revision range is used here: --after= expects a date, not a tag.
for ci in $(git log v1.0.0.. --oneline \
                | grep "gnu:" | grep -E "(Add|Update)" \
                | cut -f1 -d' ')
do
    git --no-pager log -1 "$ci" --format="%cs"
done | uniq -c > /tmp/commits

# Summary statistics (mean, median, ...) of the per-day counts.
guix environment --ad-hoc r-minimal \
     -- R -e 'summary(read.table("/tmp/commits"))'

# Uncompress the hello source tarball.
gzip -dc < "$(guix build -S hello)" > /tmp/hello.tar

# Assumes the module defining disassemble-archive lives under /tmp/tar/
# and has been imported in the REPL.
guix repl -L /tmp/tar/
scheme@(guix-user)> (call-with-input-file "/tmp/hello.tar"
                      (lambda (port) (disassemble-archive port)))
--8<---------------cut here---------------end--------------->8---
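
pps:
For what it is worth, the storage estimate can be replayed with a few
lines of shell.  This is only a sketch: the three inputs are the rough
figures quoted in this message (about 23 commits per day, a guessed 70%
url-fetch share, 120 KB of metadata for hello), and the variable names
are made up for illustration.  Git internals and compression are
ignored.

```shell
#!/bin/sh
# Back-of-the-envelope estimate of the yearly growth of the metadata repo.
# All inputs are rough figures from the message, not measurements.

commits_per_day=23      # average package-updating commits per day
url_fetch_percent=70    # assumed share of those commits using url-fetch
metadata_kb=120         # disassemble-archive output for hello-2.10.tar, in KB

# Integer arithmetic: 23 * 70 / 100, rounded down to whole tarballs per day.
tarballs_per_day=$(( commits_per_day * url_fetch_percent / 100 ))

# New tarballs per day, times days per year, times metadata size per tarball.
yearly_kb=$(( tarballs_per_day * 365 * metadata_kb ))
yearly_mb=$(( yearly_kb / 1000 ))

echo "tarballs per day: $tarballs_per_day"
echo "metadata per year: ${yearly_kb} KB (~${yearly_mb} MB)"
```

Changing metadata_kb is the quickest way to see how sensitive the total
is to whether hello is representative: the result scales linearly with
it.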