From: Timothy Sample
To: zimoun
Cc: 42162@debbugs.gnu.org, Maurice Brémond, Ludovic Courtès
Subject: Re: bug#42162: Recovering source tarballs
Date: Wed, 26 Aug 2020 17:11:50 -0400
Message-ID: <87k0xlaz8p.fsf@ngyro.com>
In-Reply-To: <86blixyb7c.fsf@gmail.com> (zimoun's message of "Wed, 26 Aug 2020 12:04:55 +0200")
References: <87mu4iv0gc.fsf@inria.fr> <86h7uq8fmk.fsf@gmail.com> <87d05etero.fsf@gnu.org> <87r1tit5j6.fsf_-_@gnu.org> <875za4ykej.fsf@ngyro.com> <86blixyb7c.fsf@gmail.com>

Hi zimoun,

zimoun writes:

> One question is how this database scales?
>
> For example, a quick back-to-envelop estimation leads to ~1.2GB metadata
> for ~14k packages and then an increase of ~700MB per year, both with the
> Ludo’s code [1].
>
> [1]

It’s a good question.  A good part of the size comes from the
representation rather than the data.  Compression helps a lot here.  I
have a database of 3,912 packages.  It’s 295M uncompressed (which is a
little better than your estimation).  If I pass each file through Lzip,
it shrinks down to 60M.
That’s more like 15.5K per package, which is almost an order of
magnitude smaller than the estimation you used (120K).  I think that
makes the numbers rather pleasant, but it comes at the expense of easy
storing in Git.

> As mentioned [2], should this service be part of SWH (download cooking
> task)?  Or project side?
>
> [2]

It would be interesting to just have SWH absorb the project.  Since
other distros already know how to produce a “sources.json” and how to
query the SWH archive, it would mean that they benefit for free (and so
would Guix, for that matter).  I’m open to that, but right now having
the freedom to experiment is important.


-- Tim