Furthermore, it uses it in (guix git-download) to download code from SWH when it is unavailable upstream and on our servers. This bit relies on the “vault” API of SWH, which allows you to fetch a checkout as a tarball. Not all revisions are readily available as tarballs, understandably, so the vault API has a mechanism that allows you to request the “cooking” of a specific checkout. Cooking is asynchronous and can take some time.
When downloading over SWH, the ‘swh-download’ procedure first resolves the tag (if it’s a tag), then tries to download the corresponding tarball from the vault. If the vault doesn’t have it yet, it sends a cooking request and waits for it to complete by periodically checking the cooking status.
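To make the control flow concrete, here is a minimal Python sketch of the resolve/fetch/cook/poll sequence described above. It is not the actual ‘swh-download’ code: the function name, the four injected callables, and the status values ("done", "failed", None for "not in the vault") are all assumptions made for illustration, and no real SWH endpoint is contacted.

```python
import time


def swh_download_sketch(reference, resolve_tag, vault_status, request_cooking,
                        fetch_tarball, is_tag=True, poll_interval=5.0,
                        max_polls=60, sleep=time.sleep):
    """Return the tarball for REFERENCE, asking the vault to cook it if needed.

    All callables are hypothetical stand-ins for HTTP requests:
      resolve_tag(tag)        -> ID of the checkout the tag points to
      vault_status(id)        -> None (unknown), "pending", "done" or "failed"
      request_cooking(id)     -> asks the vault to prepare the tarball
      fetch_tarball(id)       -> tarball contents
    """
    # Step 1: a tag is first resolved to the checkout it designates.
    checkout_id = resolve_tag(reference) if is_tag else reference

    # Step 2: if the vault does not have the tarball yet, request cooking.
    status = vault_status(checkout_id)
    if status is None:
        request_cooking(checkout_id)
        status = vault_status(checkout_id)

    # Step 3: cooking is asynchronous, so check the status periodically.
    polls = 0
    while status != "done":
        if status == "failed":
            raise RuntimeError(f"cooking failed for {checkout_id}")
        if polls >= max_polls:
            raise TimeoutError(f"gave up waiting for {checkout_id}")
        sleep(poll_interval)
        polls += 1
        status = vault_status(checkout_id)

    # Step 4: the tarball is now available for download.
    return fetch_tarball(checkout_id)
```

Bounding the number of polls is a design choice of this sketch; since cooking can take some time, a real client would need to pick the interval and the limit with care.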
In the future, we should provide a “lister” and “loader” so that SWH can regularly obtain a list of Guix packages with their source URL and commit/tag:
The SWH team is also considering pre-cooking all VCS tags such that every time we refer to a tag, we can be sure its contents are already available in the vault:
Ludovic Courtès (2):
  Add (guix swh).
  git-download: Download from Software Heritage as a last resort.
> When downloading over SWH, the ‘swh-download’ procedure first resolves
> the tag (if it’s a tag), then tries to download the corresponding tarball
Speaking of tags, it’s not news, but tags are bad from a reproducibility standpoint: they are mutable and per-repository. Tag lookup is necessarily relative to a repository URL (and to a snapshot of the repository, since it can be mutated):
So if, say, SWH archived a mirror of <https://git.savannah.gnu.org/git/guix.git> but not <https://git.savannah.gnu.org/git/guix.git> itself, then tag lookup will fail, which is sad given that the code is actually there.
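The failure mode can be shown with a toy model in Python. Everything here is made up for illustration (the mirror URL, the tag name, the "commit-abc" ID, and the two dictionaries); it only mimics the shape of the problem, not how SWH stores data.

```python
# Toy archive: tags are recorded per origin URL, objects are global.
ARCHIVE_TAGS = {
    # Only a (hypothetical) mirror was archived, not the canonical repository.
    "https://mirror.example.org/guix.git": {"v0.15.0": "commit-abc"},
}
ARCHIVE_OBJECTS = {"commit-abc"}


def lookup_tag(origin_url, tag):
    """Tag lookup is relative to an origin: return the commit ID or None."""
    return ARCHIVE_TAGS.get(origin_url, {}).get(tag)


def lookup_object(object_id):
    """Lookup by ID does not depend on which origin was archived."""
    return object_id in ARCHIVE_OBJECTS


canonical = "https://git.savannah.gnu.org/git/guix.git"
by_tag = lookup_tag(canonical, "v0.15.0")   # None: that origin is unknown
by_id = lookup_object("commit-abc")         # True: the code is there
```

The tag lookup against the canonical URL fails even though the commit it would resolve to is present in the archive.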
To address this, possible options include:
1. Always store commit IDs rather than tags, effectively giving us “normal” Git content-addressability. This is not great for code readability and review though.
2. Store ‘sha1_git’ hashes (SHA1s of Git trees) instead of or in addition to nar sha256 hashes so we can perform lookups by content hash on SWH or Git mirrors.
#2 might be the best long-term option, though it would require daemon support to compute, store, and check these Git-style hashes.
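For reference, here is a rough Python sketch of how a Git-style tree hash is computed, which is what a ‘sha1_git’ of a directory amounts to. It follows Git's object format (a "type length" header, a NUL byte, then the payload), but it is deliberately simplified: the in-memory dictionary representation and the function names are assumptions of this sketch, and it handles only regular non-executable files and subdirectories (no symlinks, executables, or submodules).

```python
import hashlib


def git_object_id(kind, payload):
    """SHA-1 of a Git object: header "<kind> <size>", a NUL byte, the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def blob_id(content):
    """ID of a file's contents (bytes)."""
    return git_object_id("blob", content)


def tree_id(entries):
    """ID of a directory given as {name: bytes (file) or dict (subdirectory)}."""
    def sort_key(item):
        name, value = item
        # Git orders entries by name, treating directories as if the name
        # ended with "/".
        return (name + "/" if isinstance(value, dict) else name).encode()

    payload = b""
    for name, value in sorted(entries.items(), key=sort_key):
        if isinstance(value, dict):
            mode, oid = b"40000", tree_id(value)    # subdirectory
        else:
            mode, oid = b"100644", blob_id(value)   # regular file
        # Each entry is "<mode> <name>", a NUL byte, then the raw 20-byte ID.
        payload += mode + b" " + name.encode() + b"\0" + oid
    return git_object_id("tree", payload)


checkout = {"README": b"hello world\n", "src": {"main.scm": b"(display 1)\n"}}
print(tree_id(checkout).hex())
```

Because the result depends only on file names, modes, and contents, two checkouts with the same content get the same ID whatever repository or tag they came from, which is what makes lookup by content hash possible.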