Cache Go module downloads in a persistent named cache (GOMODCACHE) #23424
Open
rdeknijf wants to merge 1 commit into
Previously every sandbox received a fresh GOMODCACHE, causing all
third-party modules to be re-downloaded from the network on every cold
build. This makes third-party module downloads use the `go_mod_cache`
named cache as an accelerator: the download process writes modules into
the shared cache and copies them into a sandbox-local `gopath/pkg/mod`
tree before capture. Captured digests remain the source of truth, so
results are byte-identical whether the cache is warm or cold.
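
A minimal sketch of the warm-vs-cold equality claim, in plain Python rather than Pants engine code. The names `tree_digest`, `write_module`, and `build_sandbox`, and the module `example.com/lib@v1.0.0`, are hypothetical stand-ins. The digest covers only relative paths and file bytes under the sandbox-local `gopath/`, so it cannot depend on where the shared cache lives or on whether it was already populated.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def tree_digest(root: Path) -> str:
    """Hash relative paths and file bytes under root, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode())
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def write_module(dest: Path) -> None:
    """Stand-in for a network download into the cache (hypothetical content)."""
    mod = dest / "example.com" / "lib@v1.0.0"
    mod.mkdir(parents=True, exist_ok=True)
    (mod / "go.mod").write_text("module example.com/lib\n")
    (mod / "lib.go").write_text("package lib\n")


def build_sandbox(sandbox: Path, cache: Path) -> str:
    """Fill the cache only if it is cold, copy it into the sandbox, digest it."""
    if not any(cache.iterdir()):
        write_module(cache)
    shutil.copytree(cache, sandbox / "gopath" / "pkg" / "mod")
    return tree_digest(sandbox / "gopath")


def compare_cold_and_warm() -> Tuple[str, str]:
    """Run once against an empty cache and once against the populated one."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        cache = base / "cache"
        cache.mkdir()
        cold = build_sandbox(base / "sandbox_cold", cache)
        warm = build_sandbox(base / "sandbox_warm", cache)
        return cold, warm


if __name__ == "__main__":
    cold, warm = compare_cold_and_warm()
    print("identical" if cold == warm else "different")
```

In this sketch the two sandboxes live at different absolute paths, so an absolute path written into any captured file would make the digests differ.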
Key design choices:
- Download and copy happen in a single process (the `__PANTS_GO_FETCH_MODULE`
mode in `__run_go.sh`) to satisfy the self-healing invariant: if the
named cache is wiped between steps, the process that captures `gopath/`
must itself be able to re-populate it. A two-process shape would violate
this invariant.
- GOMODCACHE is set to `__gomodcache` (sibling of `gopath/`, outside the
captured tree) when `use_module_cache=True`. `output_directories=("gopath",)`
can therefore never accidentally walk into the shared cache.
- `-modcacherw` is injected via `GOFLAGS` so Pants can prune or clear the
named cache without `rm: cannot remove ... Permission denied` errors.
- Only the two `allow_downloads=True` processes in `third_party_pkg.py`
get the named cache. Compile/link/vet processes run `GOPROXY=off` against
captured digests and must not see a shared cache that could silently satisfy
a missing input.
- The fetch mode uses only shell builtins and /bin tools (no grep/sed/PATH
assumptions), emits the module metadata on stdout, and propagates the
exit code of `go mod download`, so failures surface through the normal
fallible-process path and the engine never materializes the module tree
just to read the metadata.
- Remote execution without `--remote-execution-append-only-caches-base-path`
degrades gracefully: the cache mount is absent, `mkdir -p $GOMODCACHE`
creates a plain sandbox-local directory, and behavior is identical to today.
- `ModuleDescriptors.go_mods_digest` is removed (zero consumers, verified).
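
The single-process shape and the exit-code propagation from the list above can be sketched as follows. This is an illustration in Python, not the shell code in `__run_go.sh`; `fetch_module`, `fake_download`, and `demo_self_healing` are hypothetical names, and the `download` callable stands in for `go mod download`. As a simplification, the sketch skips the download when the module is already cached instead of invoking the tool again.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def fetch_module(cache: Path, sandbox: Path, rel: str,
                 download: Callable[[Path], int]) -> int:
    """Ensure the module is in the cache, then copy it into the sandbox."""
    # Like `mkdir -p`: works whether or not a shared cache is mounted here.
    cache.mkdir(parents=True, exist_ok=True)
    src = cache / rel
    if not src.exists():
        code = download(src)
        if code != 0:
            # Propagate the failure; nothing is copied into gopath/.
            return code
    dst = sandbox / "gopath" / "pkg" / "mod" / rel
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return 0


def fake_download(dest: Path) -> int:
    """Hypothetical downloader that writes a one-file module."""
    dest.mkdir(parents=True)
    (dest / "go.mod").write_text("module example.com/lib\n")
    return 0


def demo_self_healing() -> bool:
    """Wipe the cache between runs; the second run must still succeed."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        cache = base / "__gomodcache"
        rel = "example.com/lib@v1.0.0"
        fetch_module(cache, base / "run1", rel, fake_download)
        shutil.rmtree(cache)  # simulate the named cache being wiped
        code = fetch_module(cache, base / "run2", rel, fake_download)
        captured = base / "run2" / "gopath" / "pkg" / "mod" / rel / "go.mod"
        return code == 0 and captured.is_file()
```

Because the step that copies into `gopath/` is also the step that can repopulate the cache, a wiped cache costs only a re-download. If download and copy were separate steps, the copy could find an empty cache with no way to refill it.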
One-time invalidation: the `__run_go.sh` script change invalidates all
previously cached Go process results.
See pantsbuild#13390.
Confirming the description's framing with a real-repo measurement. On a 24-module monorepo A/B (warm daemon and store, remote cache off), cold wall time is flat, exactly as the description says: on a fast connection, downloading was never the slow part. This PR only touches the two download processes, so it does not affect compile, link, or the warm incremental build at all. The win remains cold CI and slow or rate-limited proxies, where a cold build otherwise pays the full download set every time. Methodology and the per-lever breakdown are on #20274.
This was referenced Jun 16, 2026
Disclaimer: Like my previous Go PR, I'm still not primarily a golang developer. However, the fact that Golang doesn't work properly in Pants has been the bane of my existence for like 2 years now. The moment that Mythos/Fable dropped I immediately had it dig deeply into whether the whole road to proper-golang-in-pants could be opened up. So this is all by Claude Code with Fable 5 (xhigh), with consults to GPT 5.5 (xhigh) and Gemini 3.1 Pro. I've had it check and recheck, I ran many different roles over it, and I had it explain and re-explain it to me, and then I checked myself.
So, as much as I dislike AI slop and am worried about AI PR overload, I've done my very best to avoid exactly that while still using AI. I hope we can get this road unblocked.
The rest is Fable talking:
Every Go sandbox currently gets a fresh GOMODCACHE, so a cold build re-downloads every third-party module from the network, and so does any change that invalidates the cached download processes. This is #13390; I posted the design there in #13390 (comment) and benjyw signed off on it, deferring the Go specifics to tdyas.
This PR gives the two download processes in `third_party_pkg.py` (the only Go processes that run with `allow_downloads=True`) a `go_mod_cache` named cache, used purely as a download accelerator. Captured digests remain the source of truth: modules are copied out of the shared cache into the sandbox's `gopath/pkg/mod` before capture, so process results are byte-identical whether the cache is warm or cold, and compile, link, and vet sandboxes never see the shared cache at all. They keep running with `GOPROXY=off` against captured digests, exactly as today.

Mechanics:

- `__run_go.sh` runs `go mod download -json`, emits the metadata on stdout, and copies the extracted module and its go.mod out of the cache into `gopath/pkg/mod`. Keeping it one process preserves self-healing: wipe the named cache and the next run of the same process repopulates it.
- GOMODCACHE is set to `__gomodcache`, a sibling of the captured `gopath/` tree, so `output_directories=("gopath",)` can never capture the shared cache by accident.
- `-modcacherw` is added via GOFLAGS so the cache can be pruned with plain `rm -rf`.
- `-modcacherw`. The trust model of the shared cache is the same as the default `~/go/pkg/mod` on a developer machine.
- Without `--remote-execution-append-only-caches-base-path` the cache mount is absent, the fetch falls back to a sandbox-local directory, and behavior is identical to today.
- `ModuleDescriptors.go_mods_digest` is removed; it had no consumers.

On a 3-go.mod / 206-module reproducer with a cold engine store, a warm module cache eliminates all 103 network downloads (the slowest module went from 8.6 s of network fetch to 3.1 s of local copy) and module-graph analysis (`go list -m`) stops hitting the network entirely. Cold wall clock on that reproducer is unchanged within noise, because compile time dominates there and proxy.golang.org is fast from my machine. The practical win is cold CI builds and slow or rate-limited proxies, where today every cold build pays the full download set again.

Tests: warm-vs-cold digest equality (which also catches absolute paths leaking into captured digests), named cache populated and writable after a download, and module analysis served from the cache. Release note in `docs/notes/2.33.x.md`, including the one-time invalidation from the `__run_go.sh` change.

Note: this touches the same `__run_go.sh` heredoc as #23420 (GOTOOLCHAIN pin); whichever lands second needs a trivial rebase.
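
For readers who want the environment handling from the mechanics above in concrete form, here is a rough sketch. `download_env` and its parameters are hypothetical and do not come from the PR's code. The sketch assumes Go needs an absolute GOMODCACHE path, so it joins `__gomodcache` onto the sandbox root; how the real script resolves the path is not stated here.

```python
from pathlib import PurePosixPath
from typing import Dict, Mapping


def download_env(sandbox_root: str, base_env: Mapping[str, str],
                 use_module_cache: bool) -> Dict[str, str]:
    """Build the environment for a download process (illustrative only)."""
    env = dict(base_env)
    if not use_module_cache:
        # Compile, link, and vet processes would take this path unchanged.
        return env

    # `__gomodcache` is a sibling of `gopath/`, so capturing `gopath/`
    # can never walk into the shared cache.
    env["GOMODCACHE"] = str(PurePosixPath(sandbox_root) / "__gomodcache")

    # GOFLAGS is a space-separated list; append the flag only once.
    flags = env.get("GOFLAGS", "").split()
    if "-modcacherw" not in flags:
        flags.append("-modcacherw")
    env["GOFLAGS"] = " ".join(flags)
    return env
```

With `-modcacherw`, Go leaves module directories writable instead of read-only, which is what lets the cache be cleared without permission errors.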