That is, in a way that tightly controls which assets they pull in, so that things can be rebuilt later without internet access: for air-gapped networks, or simply to avoid inadvertently pulling in upstream changes while making a small tweak to a private script.
It works by starting a local HTTP/HTTPS proxy server, then launching a subprocess with the appropriate proxy environment variables and certificate files set. It has special support for passing these into the RUN context when building images, so that existing Dockerfiles can be used without modification.
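To make that concrete, here's a rough Go sketch of just the wrapping step - not htvend's actual code - showing the general mechanism: the wrapped command only needs to honour the conventional proxy environment variables and trust the proxy's CA certificate. The proxy address, CA path and build command below are placeholder assumptions.

    // Illustration only (not htvend's code): run a build command so its
    // HTTP(S) traffic is routed through a local intercepting proxy.
    package main

    import (
        "log"
        "os"
        "os/exec"
    )

    func main() {
        proxyURL := "http://127.0.0.1:8080" // assumed local proxy address
        caBundle := "/tmp/proxy-ca.pem"     // assumed path to the proxy's CA cert

        cmd := exec.Command("make", "build") // placeholder build command to wrap
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        cmd.Env = append(os.Environ(),
            "HTTP_PROXY="+proxyURL,
            "HTTPS_PROXY="+proxyURL,
            // Which CA variables matter depends on the tooling in the build:
            "SSL_CERT_FILE="+caBundle,  // Go, OpenSSL-based tools
            "CURL_CA_BUNDLE="+caBundle, // curl
        )
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }

For the Docker case, docker build already treats HTTP_PROXY/HTTPS_PROXY as predefined build args, so they can be passed with --build-arg without touching the Dockerfile; getting the CA certificate trusted inside RUN steps is the fiddlier part, which is presumably what the special RUN support takes care of.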
Let me know what you think.
compressedgas•6mo ago
Why does it not retain the request and response headers, and support more than one response per URL, as a proper archiving proxy would?
aeijdenberg•6mo ago
The intent was to support basic build systems accessing package ecosystems that tend to always serve the same response for the same URL.
Docker registries do this reasonably well, as do Maven repos.
It wasn't intended to be a proper, full-on archiving proxy (and I'll admit I hadn't heard that term - I'll look into it and see what else exists in that space).
The main use case I had in mind is private projects that are developed on workstations with internet access, but deployed to other environments using CI/CD systems with less network access. If both have access to a common blob store, it can be populated with htvend build on a workstation and replayed at build time with htvend offline.
For that, there's no need to capture additional request information, as the focus wasn't on taking a manifest file and reliably re-downloading all the blobs from the internet (those responses may well have changed in the interim). For the same reason, I'd expect to need only one response per URL, per assets.json file.
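In other words, the replay side can be as simple as: look each requested URL up in the manifest and serve the matching blob from the store. A minimal Go sketch of that idea (the assets.json shape and blob layout here are assumptions for illustration, not htvend's actual format):

    // Sketch of the offline-replay idea, not htvend's implementation.
    package main

    import (
        "encoding/json"
        "log"
        "net/http"
        "os"
        "path/filepath"
    )

    func main() {
        // Assumed manifest shape: {"https://example.com/pkg.tgz": "<sha256>"}
        raw, err := os.ReadFile("assets.json")
        if err != nil {
            log.Fatal(err)
        }
        manifest := map[string]string{}
        if err := json.Unmarshal(raw, &manifest); err != nil {
            log.Fatal(err)
        }

        blobDir := "blobs" // assumed content-addressed store: blobs/<sha256>

        handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            // Exactly one stored response per URL, keyed by the full URL.
            hash, ok := manifest[r.URL.String()]
            if !ok {
                http.Error(w, "not in manifest", http.StatusNotFound)
                return
            }
            http.ServeFile(w, r, filepath.Join(blobDir, hash))
        })

        // Plain HTTP proxy requests carry the absolute URL; HTTPS needs
        // CONNECT handling plus the CA trick above, omitted here.
        log.Fatal(http.ListenAndServe("127.0.0.1:8080", handler))
    }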
Does that make sense?