ci(cache): re-add ccache as artifact-cache fallback#360
Merged
hedgar2017 merged 21 commits into main on Apr 23, 2026
Conversation
Pull request overview
This PR reintroduces ccache as a fallback layer behind the existing SHA-keyed GitHub Actions artifact cache for LLVM/solc builds, and updates the cache warmup workflow to “touch” ccache entries so they don’t get evicted during quiet periods.
Changes:
- Add `hendrikmuhs/ccache-action` setup + `--ccache-variant=ccache` for LLVM and solc builds on artifact-cache miss.
- Add “Touch LLVM/solc ccache” steps to `cache-warmup.yaml` jobs using `actions/cache/restore` with `lookup-only: true`.
- Configure a per-scope ccache size cap (`max-size: "10G"`).
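In outline, the fallback layering might look like this inside a composite action (a hedged sketch: the step ids, the cache-key shape, and the `solx-dev` invocation are illustrative, not the repo's exact files):

```yaml
# ccache only runs when the SHA-keyed artifact cache misses.
- name: Restore LLVM artifact cache
  id: llvm-cache
  uses: actions/cache/restore@v4
  with:
    path: target-llvm/target-final        # install tree consumed via LLVM_SYS_211_PREFIX
    key: llvm-${{ runner.os }}-${{ runner.arch }}-<solx-llvm-sha>   # placeholder key

- name: Set up ccache
  if: steps.llvm-cache.outputs.cache-hit != 'true'
  uses: hendrikmuhs/ccache-action@v1
  with:
    max-size: "10G"
    save: ${{ github.event_name != 'pull_request' }}

- name: Build LLVM
  if: steps.llvm-cache.outputs.cache-hit != 'true'
  shell: bash
  run: solx-dev build-llvm --ccache-variant=ccache   # illustrative invocation
```

On a warm artifact cache both ccache steps are skipped, which is what makes the later "Touch" warmup steps necessary.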
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `.github/workflows/cache-warmup.yaml` | Adds lookup-only cache restores to refresh the ccache LRU for LLVM/solc warmup jobs. |
| `.github/actions/build-solc/action.yml` | Re-adds ccache setup/stats and passes `--ccache-variant=ccache` when the artifact cache misses. |
| `.github/actions/build-llvm/action.yml` | Re-adds ccache setup/stats, defines per-config ccache keys, and passes `--ccache-variant=ccache` when the artifact cache misses. |
hedgar2017 reviewed Apr 21, 2026
nebasuke added a commit that referenced this pull request Apr 21, 2026
Five fixes from hedgar2017's review on #360:

1. Touch-step path fix (was a silent no-op). The `Touch {LLVM,solc} ccache` steps used `${{ runner.temp }}/ccache-{touch-llvm,touch-solc}`, but ccache-action saves with `path = CCACHE_DIR = ${{ runner.temp }}/ccache-{llvm,solc}`. GHA includes `path` in the cache version hash, so the Touch restore-keys prefix match never saw the real ccache entries. After 7 quiet days the cron would silently fail to refresh the LRU and the entries would age out. Fixed to the real paths across all six Touch steps.
2. solc ccache key missing `cmake-build-type`. `warm-solc` builds `RelWithDebInfo`; `warm-llvm-integration` builds `Release`. Same key → the two configs evict each other. Added the build-type component and the `-end` terminator (same rationale as a239dea) to the solc key and the two solc Touch restore-keys.
3. `Show ccache stats` → `continue-on-error: true` so a missing `ccache` binary (e.g. ccache-action install failed upstream) can't mask the real build failure in the step summary.
4. `CCACHE_*` env deduplication via YAML anchor (`&ccache-{llvm,solc}-env`) and `<<:` merge keys. Composite-action sibling steps don't share env, so four variables were duplicated three times per action, with no guardrail against drift. PyYAML verified the merge resolves to the same env sets previously written by hand.
5. `apt update` → `sudo apt-get update -qq`. Works in both root (current Docker container) and non-root (hosted runner) environments; `-qq` silences the default chatter.
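For fix 4, the anchor + merge-key shape would look roughly like this (variable names beyond `CCACHE_DIR` are illustrative; note that a later commit in this thread finds GHA's action-manifest loader rejects anchors in `action.yml`, so this form only ever parsed under PyYAML):

```yaml
# Define the env set once...
x-ccache-llvm-env: &ccache-llvm-env
  CCACHE_DIR: ${{ runner.temp }}/ccache-llvm
  CCACHE_COMPRESS: "true"        # illustrative; the real set has four variables

steps:
  - name: Build LLVM
    env:
      <<: *ccache-llvm-env       # ...and merge it into each sibling step
  - name: Show ccache stats
    env:
      <<: *ccache-llvm-env
```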
Force-pushed 13d56d5 to 08feef9
nebasuke added a commit that referenced this pull request Apr 22, 2026
Force-pushed 90df14b to 9659de6
nebasuke added a commit that referenced this pull request Apr 22, 2026
Force-pushed 9659de6 to a444822
When the solx-llvm submodule is bumped, the SHA-keyed artifact cache misses and every platform does a ~3.5 h cold LLVM build. PR #246 removed ccache on the assumption that artifact-cache hits made it redundant; that doesn't hold for the bump case we're now in. Restore the pre-removal ccache steps (commit 1320d1b), layered behind the existing artifact cache so ccache only runs on artifact-cache miss:

- build-llvm/action.yml: define the ccache key, install ccache, pass --ccache-variant=ccache to solx-dev, report --show-stats.
- build-solc/action.yml: same pattern, separate ccache dir.
- Cap max-size at 4G (was 10G) for a tighter cache budget.
- Align the ccache key schema with the current artifact-cache key (includes -no-assertions; matches the 4-config matrix introduced by #246).

Expected: 3.5 h cold builds drop to ~45-90 min on submodule bumps; the warm-cache fast path is unchanged.
build-llvm and build-solc now wire ccache behind the artifact cache, but the gate (steps.<artifact>-cache.outputs.cache-hit != 'true') means that when the daily cache-warmup cron fires on an unchanged submodule SHA, the artifact cache hits, the composite skips its ccache steps, and the ccache entry's LRU timer is never refreshed. After 7 days of hits-only it gets evicted, and the next submodule bump finds ccache also cold. Add a Touch step after each build in cache-warmup.yaml that resolves the ccache entry by prefix via actions/cache/restore with lookup-only: true. The lookup alone resets the 7-day access timer without downloading the 2-4 GB entry.
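A Touch step of that shape might look like this (the path matches ccache-action's save path from the thread; the key strings are illustrative, and `lookup-only` is a real input of `actions/cache/restore`):

```yaml
- name: Touch LLVM ccache
  uses: actions/cache/restore@v4
  with:
    path: ${{ runner.temp }}/ccache-llvm   # must equal ccache-action's save path,
                                           # since GHA hashes `path` into the cache version
    key: touch-${{ github.run_id }}        # never matches exactly; forces the
                                           # restore-keys prefix lookup below
    restore-keys: |
      llvm-${{ runner.os }}-${{ runner.arch }}-
    lookup-only: true                      # resets the 7-day access timer without
                                           # downloading the 2-4 GB entry
```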
4G is tight enough that a single full LLVM EVM-target build (~4000 TUs, ~2.4 GB of cached objects on average) plus one re-build with slightly different inputs can push ccache over the cap and trigger LRU eviction of still-useful entries, hurting hit rate on the next submodule bump. 10G (matching the pre-PR-#246 value) leaves plenty of headroom. Worst-case 13 scopes × 10G = ~130 GB on-disk (compressed: ~50-80 GB), well under the 150 GB repo quota.
ccache-action's restore-keys does prefix matching, and the previous key shape let shorter variant keys (e.g. `llvm-Linux-X64-RelWithDebInfo-mlir`) prefix-match longer ones (`...-mlir-coverage-no-assertions`). On Linux x64 where both dev and coverage warm-ups run, the newer entry would win and the ccache dir would be cross-restored with differently-compiled objects — zero hit rate plus cache churn. Append a literal `-end` terminator to every llvm ccache key and to the corresponding Touch restore-keys in cache-warmup.yaml. `build-solc`'s single-variant key is unaffected.
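Concretely (key components illustrative), the prefix collision and its fix:

```yaml
# Without a terminator, the first key is a strict prefix of the second,
# so a restore-keys prefix lookup for one variant can resolve to the other:
#   llvm-Linux-X64-RelWithDebInfo-mlir
#   llvm-Linux-X64-RelWithDebInfo-mlir-coverage-no-assertions
# Appending a literal `-end` makes every full key a non-prefix of the others:
key: llvm-Linux-X64-RelWithDebInfo-mlir-end
```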
Two changes to validate ccache end-to-end on this branch without waiting for a post-merge submodule bump:

1. Disable the SHA-keyed artifact cache restore in build-llvm and build-solc (`if: false`). Forces the ccache path to run on every CI run. The matching Save steps are already guarded by `github.event_name != 'pull_request'`, so main's artifact cache is not polluted.
2. Flip ccache-action `save` from `github.event_name != 'pull_request'` to `true`. Lets this PR's runs populate ccache so a second run can restore from the first and `Show ccache stats` reports hit rate.

Expected: first run cold (ccache miss, saves); second run warm (prefix match on the `-end`-terminated key, high hit rate on LLVM/solc builds).
The hosted `ubuntu-24.04` runner has ~14 GB free at start. A cold
LLVM+MLIR RelWithDebInfo build fills ~12 GB, and the new ccache adds
another ~1.1 GB — on this run cargo-checks tipped over with a SIGBUS
while linking llvm-opt-fuzzer ("no space left on device" moments later
during the ccache save). Linux x86 gnu happened to squeak by on the
same commit.
Reclaim ~24 GB of preinstalled host tooling solx doesn't use: .NET SDK,
Android SDK+NDK, Haskell (ghc + ghcup), CodeQL bundles. These live on
the runner VM's disk outside the container's view, so callers bind-mount
each host path into `/mnt/free-disk-space/<name>` via `container.volumes:`
and the composite action rms them from inside. The action refuses to
operate on anything outside that prefix, and skips missing or empty
mounts so a forgotten volume can't cause harm.
Wired into the six hosted-ubuntu-x64 jobs that cold-build LLVM:
- test.yaml::cargo-checks
- test.yaml::build-and-test (gated to containerized Linux legs)
- cache-warmup.yaml::warm-llvm (gated to containerized Linux legs)
- cache-warmup.yaml::warm-llvm-sanitizer
- cache-warmup.yaml::warm-llvm-coverage
- cache-warmup.yaml::warm-llvm-integration
Boost intentionally not removed — solc's `--build-boost` builds its own,
but the preinstalled `/usr/local/share/boost` headers may be pulled in
transitively by other tooling. CodeQL confirmed unused in this repo.
With ~38 GB free during build (vs ~14 GB before), the cold-build path
has a comfortable margin for the LLVM build, ccache, and future growth.
Deduplicates the five-entry `container.volumes:` list that was repeated on six jobs. YAML anchor-alias within each file collapses the four single-job cache-warmup definitions to one-liners and the two test.yaml jobs' definitions to the anchor + one alias. Cross-file sharing isn't possible (YAML anchors are per-document), so test.yaml and cache-warmup.yaml each carry their own anchor definition with a sync comment pointing at the other.
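Within one file the anchor-alias shape is roughly (host paths are illustrative stand-ins for the .NET/Android/Haskell/CodeQL trees):

```yaml
jobs:
  warm-llvm:
    container:
      volumes: &free-disk-volumes   # KEEP-IN-SYNC: test.yaml carries its own copy
        - /usr/share/dotnet:/mnt/free-disk-space/dotnet
        - /usr/local/lib/android:/mnt/free-disk-space/android
        - /usr/local/.ghcup:/mnt/free-disk-space/haskell
  warm-llvm-sanitizer:
    container:
      volumes: *free-disk-volumes   # alias collapses the repeated list to one line
```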
Run 24745149414 failed at action-load time with:
/home/runner/.../.github/actions/build-llvm/action.yml:
Anchors are not currently supported. Remove the anchor
'ccache-llvm-env'
GHA's composite-action manifest loader (ActionManifestManagerLegacy)
explicitly rejects YAML anchors, independent of whether the underlying
YAML library supports them (PyYAML parses these fine; the workflow
parser apparently also does — only action.yml is restricted).
Restore the explicit duplicated CCACHE_* env blocks on the Build and
Show-ccache-stats steps in build-llvm/action.yml and build-solc/action.yml,
with a comment noting why we can't dedupe.
…ontents
First real run confirmed the action freed ~20 GB (85 → 105 GB avail),
but surfaced cosmetic "Device or resource busy" errors on each mount:
rm: cannot remove '/mnt/free-disk-space/android': Device or resource busy
`rm -rf "$p"` on a bind mount deletes everything underneath just fine,
but the final syscall to unlink the mount-point directory itself fails
with EBUSY because the mount is live. The disk reclaim already happened
by that point, so it's noise — but the non-zero exit also masks any
*real* rm failure under the same error.
Switch to `find "$p" -mindepth 1 -delete`, which only touches contents
under the mount point. Same disk reclaim, clean logs, and the exit code
now actually signals problems worth looking at.
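The behavioural difference can be seen on any directory standing in for a bind mount (throwaway temp paths here, not the real mounts):

```shell
# `find -mindepth 1 -delete` clears a directory's contents without ever
# issuing the final rmdir on the directory itself, which is the syscall
# that fails with EBUSY on a live bind mount.
p="$(mktemp -d)"
mkdir -p "$p/sdk/platform-tools"
touch "$p/sdk/platform-tools/adb" "$p/stray-file"

find "$p" -mindepth 1 -delete

# Directory survives, contents are gone:
[ -d "$p" ] && [ -z "$(ls -A "$p")" ] && echo "reclaimed"
```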
GitHub's macOS x86 runner ships ~16 Xcode versions. Removing them sequentially took ~45s each (~8 min total on the free-disk-space step) because `rm -rf` is I/O-bound per inode on APFS. Fan them out concurrently with one worker per bundle — should cut this to under 2 min. Soft-fails by design: an opportunistic cleanup shouldn't fail the job if one rm hits a stray file handle; rm's stderr still pinpoints the path.
With ccache compile-cache hitting at 99%+ (measured on the macOS LLVM leg of run 24745368771), link time dominates the remaining wall-clock. Each RelWithDebInfo LLVM tool link peaks at 2-4 GB RSS and ccache doesn't cache links. The hosted macos-15 ARM runner has 3 vCPU / 7 GB RAM — two parallel links exceed RAM and push into swap, which is net slower than serialized links (paging during mmap-heavy linker work is devastating, and shows up as elapsed time without CPU utilization). Other hosted runners have headroom: macos-15-intel 4 vCPU / 14 GB RAM ← 2 parallel links fit Linux / Windows 16 GB RAM ← 2 parallel links fit macos-15 (ARM) 3 vCPU / 7 GB RAM ← 2 parallel links = swap Drive LLVM_PARALLEL_LINK_JOBS from a shell conditional on runner.os and runner.arch: 1 only on macOS ARM64, 2 elsewhere. Confined to the one cmake flag in the --extra-args string.
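The conditional boils down to something like this (context values are hard-coded for the demo; in the workflow they come from `runner.os` / `runner.arch`):

```shell
RUNNER_OS="macOS"
RUNNER_ARCH="ARM64"

# 1 link job only where two parallel 2-4 GB RSS links would swap
# (hosted macos-15 ARM: 3 vCPU / 7 GB RAM); 2 everywhere else.
if [ "$RUNNER_OS" = "macOS" ] && [ "$RUNNER_ARCH" = "ARM64" ]; then
  LINK_JOBS=1
else
  LINK_JOBS=2
fi

echo "-DLLVM_PARALLEL_LINK_JOBS=${LINK_JOBS}"
```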
Run 24751389733 confirmed the hosted `solx-ci-runner` container image doesn't ship sudo at all: /__w/_temp/…sh: line 1: sudo: command not found Process completed with exit code 127 Steps already run as root inside the container, so `sudo` was both unnecessary and broken. Invoke `apt-get update -qq` directly in both composite actions; update the surrounding comment to reflect the actual container behavior rather than the generic root/non-root portability framing. Reverts the sudo-adding hunk from 35949d3 (where the original review suggestion for portability was accepted).
On Windows with 100% ccache hits, Build LLVM still takes ~42 min. The
bottleneck is `lld-link` processing the ~200 LLVM tool executables
(opt, llc, llvm-objdump, llvm-pdbutil, ...) — each link is ~10-25 s on
Windows RelWithDebInfo with PDBs, and ccache doesn't cache link.
solx doesn't use any of these tool binaries at runtime:
- solx consumes LLVM as a library via inkwell FFI on the static libs
under target-llvm/target-final/ (LLVM_SYS_211_PREFIX). No reference
to target-llvm/target-final/bin/ anywhere in solx or solx-dev.
- Runtime tool deps (llvm-cov, llvm-profdata, llvm-symbolizer,
llvm-lipo) come from distro/Xcode packages via the ci-runner
Dockerfile / macOS runner tooling, not from the built LLVM.
- No CI path passes enable-tests: true to build-llvm (the
enable-tests in deploy-mdbook is for mdbook test, not LLVM tests).
- solx-llvm's regression-tests.yml runs only inside solx-llvm's own
CI, not from solx.
Gate the new flags on runner.os == 'Windows' so non-Windows runs stay
byte-identical pending independent validation. Restructures the
--extra-args positional list into a bash array that grows conditionally;
solx-dev's --extra-args is already declared `num_args = 1..` so the
`"${EXTRA_ARGS[@]}"` expansion works as before.
Expected Windows Build LLVM: ~42 min → ~5-10 min (link count drops by
roughly an order of magnitude, scheduling unchanged at link-jobs=2).
Link-jobs cap from #245 is deliberately left at 2 on Windows — #245's
rationale (OOM on 16 GB runners) still applies; we're reducing the
number of links, not running more in parallel.
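The conditional array growth described above can be sketched as follows (runner context and the baseline flag are stand-ins for the demo; the solx-dev invocation itself is elided):

```shell
RUNNER_OS="Windows"

EXTRA_ARGS=("-DCMAKE_BUILD_TYPE=RelWithDebInfo")  # stand-in baseline flag
if [ "$RUNNER_OS" = "Windows" ]; then
  # Tool-disable flags only on Windows; other platforms stay byte-identical.
  EXTRA_ARGS+=("-DLLVM_BUILD_TOOLS=Off")
fi

# "${EXTRA_ARGS[@]}" expands to one word per flag, which a clap-style
# `--extra-args` declared `num_args = 1..` accepts as before.
printf '%s\n' "${EXTRA_ARGS[@]}"
```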
…ORE MERGE
This reverts commit fb9f415.
- build-{llvm,solc}/action.yml: collapse the four-line YAML-anchor
history block into a one-sentence KEEP-IN-SYNC marker that points at
the downstream steps and notes the action-manifest parser limitation.
- free-disk-space/action.yml: drop the sudo/Boost paragraph from the
action description. The sudo half was stale once we confirmed the
container runs as root, and the Boost half is tangential to
free-disk-space itself.
Net -8 lines of comments, no behaviour change.
…ll-clock
This reverts commit 3106d1e.
nebasuke added a commit that referenced this pull request Apr 22, 2026
Retry of the Windows LLVM tool-disable that landed+reverted in #360 (commits 3106d1e / 90df14b).

The first attempt broke `Run tests` because `llvm-sys` (pulled in via inkwell) runs `${LLVM_SYS_211_PREFIX}/bin/llvm-config` at Rust crate-build time to discover include/lib paths, and `llvm-config` is itself an LLVM tool that got disabled along with the rest. This version keeps `llvm-config` alive while disabling the other ~200 tool binaries:

    -DLLVM_BUILD_TOOLS=Off            # tools are no longer in the ALL target
    -DLLVM_INCLUDE_TOOLS=On           # tools/ subdirectory still configured,
                                      # so individual tool targets exist
    -DLLVM_TOOL_LLVM_CONFIG_BUILD=On  # per-tool override forces llvm-config
                                      # specifically into the ALL target

LLVM's cmake uses the `LLVM_TOOL_<name>_BUILD` pattern to let individual tools opt back in when LLVM_BUILD_TOOLS is off. Expected effect: only `llvm-config.exe` builds (small link); the ~200 heavy tool links are skipped. Run 24771100328 confirmed the validation target: with all tools disabled, the Windows Build LLVM step went from 42 min to 9:44 (4.3×). This PR aims to preserve that win while keeping `llvm-sys` happy.

Cache-key hardening (required for correctness): extracts the cmake `--extra-args` construction into a new `Compute LLVM build config` step and hashes the flag list into the artifact cache key (`...-args<sha8>-<solx-llvm-sha>`). Without this, the existing key only reflects the solx-dev action inputs + the solx-llvm submodule SHA, not the cmake flags, so entries built with `LLVM_BUILD_TOOLS=Off` on Windows would share a key with entries built with tools `On` and silently serve the wrong install tree. Side benefit: the hash catches any future output-affecting flag added to `--extra-args` without requiring reviewer discipline to update the key. Non-output-affecting flag tweaks (e.g. LLVM_PARALLEL_LINK_JOBS scheduling) also rotate the key, costing one cold build per tweak; that is acceptable with ccache as fallback and much more robust than manual key maintenance.
The Build LLVM step now reads `${RUNNER_TEMP}/llvm-extra-args` produced by the new step, so EXTRA_ARGS is constructed once and consumed twice (once for the hash, once for the build).

Acceptance:
- Windows Build LLVM step < 15 min on a warm-ccache run.
- Windows `Run tests` succeeds (llvm-sys finds llvm-config).
- No regression on Linux/macOS (no flag change; just the new args-hash key component, which triggers a one-time cold build on first run).
- `Show ccache stats` still reports near-100% hit rate.

See #364 for the full design rationale and alternatives considered.
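The args-hash component can be sketched like this (the file path is from the commit message; the key prefix components and the trailing SHA placeholder are illustrative):

```shell
: "${RUNNER_TEMP:=/tmp}"

# Write the flag list once; it is consumed twice (hash + build).
printf '%s\n' \
  "-DLLVM_BUILD_TOOLS=Off" \
  "-DLLVM_INCLUDE_TOOLS=On" \
  "-DLLVM_TOOL_LLVM_CONFIG_BUILD=On" \
  > "${RUNNER_TEMP}/llvm-extra-args"

# First 8 hex chars of the flag-list hash become the "...-args<sha8>-..."
# key component, so any output-affecting flag change rotates the key.
ARGS_HASH="$(sha256sum "${RUNNER_TEMP}/llvm-extra-args" | cut -c1-8)"
echo "llvm-Windows-X64-args${ARGS_HASH}-<solx-llvm-sha>"
```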
nebasuke added a commit that referenced this pull request Apr 22, 2026
…ERGE
Mirrors the TEMP pattern from #360 but scoped to `build-llvm/action.yml` only (this branch doesn't touch solc):

1. `if: false` on the LLVM artifact cache restore → forces Build LLVM to always run, so the Windows tool-disable change is actually exercised. Without this, the args-hash in the cache key only triggers a miss on the first push (when no entry exists); subsequent pushes to this branch would hit the entry we just saved and skip the build, masking any regression.
2. `save: true` on the ccache-action → populates ccache in this branch's cache scope even on PR events, so a follow-up push can observe warm-ccache + tool-disable together (the actual target configuration).

Revert both before merge. The end-state behaviour (artifact cache restore works, ccache saves only on non-PR events) is what ships.
Force-pushed a444822 to bf8cf2c
hedgar2017 reviewed Apr 23, 2026 (four review rounds)
hedgar2017 requested changes Apr 23, 2026
hedgar2017 (Contributor) left a comment:
Ran another round! Everything is mostly good, but a couple of things are worth tightening.
Bare `wait` returned 0 regardless of child exit codes, so a failed `find -delete` on any bind mount was silently swallowed. Capture each PID and wait per-child; `rc` keeps the last non-zero exit, which is all we need to trigger the warning path (per-cleanup stderr identifies which path failed). On non-zero rc, emit a `::warning::` annotation instead of exiting non-zero. Partial cleanup is usually still enough headroom for the downstream build; when it isn't, ENOSPC will surface at a more specific call site than this action can name. Also reworded the inline comment at the `find` call — the previous note about "a real rm failure can still surface as a non-zero exit" no longer matches the deliberate warn-and-continue semantics. Reported by @hedgar2017 in PR #360.
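The per-child wait pattern, in isolation (demo children stand in for the per-mount cleanups):

```shell
# Bare `wait` returns 0 regardless of child exit codes; waiting on each
# captured PID propagates failures. rc keeps the last non-zero status.
rc=0
pids=()
true  & pids+=($!)
false & pids+=($!)   # one child fails
true  & pids+=($!)

for pid in "${pids[@]}"; do
  wait "$pid" || rc=$?
done

# rc is now non-zero; the real action emits a ::warning:: instead of failing.
[ "$rc" -ne 0 ] && echo "partial cleanup, rc=$rc"
```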
The case guard is a shell glob, not a realpath check — so the doc claim that it "caps the blast radius to /mnt/free-disk-space/*" is false for `..` traversal (e.g. `/mnt/free-disk-space/../host-dir` passes the glob). It's unexploitable today since all callers pass hard-coded literals, but the wording should match what the code actually does. Reword both the action description and the inline comment to describe the guard as best-effort typo-catching, explicitly not a security boundary. No code change. Reported by @hedgar2017 in PR #360.
warm-solc was the only Linux-container job in cache-warmup.yaml without volumes + Free disk space, making it an implicit special case that the next reader has to reason about. solc's build + boost footprint (~3-5 GB) fits in container headroom today, but the ~30 s the cleanup costs is cheap insurance against future solc/boost growth hitting ENOSPC mid-warmup. Reported by @hedgar2017 in PR #360.
The old code gated `removed=${#to_remove[@]}` on xargs returning zero,
but xargs exits non-zero if *any* child `rm -rf` fails — so the summary
would print "Removed 0 inactive Xcode version(s)" even after 3/4
bundles (~45 GB) were successfully deleted. Misleading on a cleanup
step whose value is precisely the freed disk.
Set `removed` to the attempt count unconditionally before xargs and
keep the warning path for partial failure. Off-by-at-most-one in the
rare failure case is a much better signal than zero.
Reported by @hedgar2017 in PR #360.
nebasuke added a commit that referenced this pull request Apr 23, 2026
nebasuke added a commit that referenced this pull request Apr 23, 2026
nebasuke added a commit that referenced this pull request Apr 27, 2026
nebasuke added a commit that referenced this pull request Apr 27, 2026
## Summary
- Re-add ccache to the `build-llvm` and `build-solc` composite actions as a fallback behind the SHA-keyed artifact cache (removed in #246, "ci: replace ccache with artifact caching + standardize coverage config"). ccache only runs on artifact-cache miss; the warm-cache fast path (~30 s restore) is unchanged.
- Touch ccache entries in each `cache-warmup.yaml` job so the daily cron extends the ccache LRU even when the artifact cache hits, preventing GHA's 7-day eviction from emptying ccache during quiet periods.
- Add a `free-disk-space` composite action that reclaims ~24 GB of pre-installed host tooling (dotnet, Android SDK, Haskell, CodeQL) via `container.volumes:` bind-mounts, so hosted `ubuntu-24.04` runners have enough room for cold LLVM builds + ccache on the same disk pool.
- Tune `LLVM_PARALLEL_LINK_JOBS` per platform: 2 on Linux/Windows and macOS x86 (14–16 GB RAM), 1 on macOS ARM64. Link memory pressure becomes the bottleneck once the compile phase is served from ccache at 99%+ hit rates; the hosted `macos-15` ARM runner has 3 vCPU / 7 GB RAM, and two parallel RelWithDebInfo links at 2–4 GB RSS each push it into swap.
- Cap ccache at `max-size: "10G"` per scope (matches the pre-#246 value).

## Observed runtimes
Three reference runs on this PR branch show the progression:
- `fb9f415`: first run, ccache empty, pre-free-disk-space; cargo-checks fails on disk exhaustion during the `lld` link of `llvm-opt-fuzzer`.
- `13d56d5`: LLVM ccache populated by the cold run, free-disk-space wired, but parallel links still causing swap on macOS ARM64.
- `6f85d0a`: `LLVM_PARALLEL_LINK_JOBS=1` on macOS ARM64, 100% ccache hit rate across the board.

### Whole-job wall-clock
### Build LLVM step only
This isolates the LLVM-building cost from checkout / solc build / tests / etc., and shows that the dramatic macOS ARM64 win is mechanism-specific:
Two effects stack here:
- `LLVM_PARALLEL_LINK_JOBS=1` eliminated swap thrashing on macOS ARM64, which is the 20× speedup. 25 additional cache hits don't explain ~1 h 50 min of saved wall time; that was the linker stalled on page faults in a 7 GB runner trying to run two ~3 GB RSS link jobs concurrently. Windows' 1.1× result is the control group: 16 GB RAM, no swap pressure, no link-jobs change, speedup limited to cache warming.
- ccache itself handles compile, not link. The ~200 link steps per build still run every time; that's the floor. On most platforms link time is tolerable; on macOS ARM64 specifically the interaction between link count, link RSS, and runner RAM was catastrophic before this PR and is now fine.
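The swap arithmetic above can be sketched numerically. The RSS and OS-overhead figures are rough assumptions drawn from the run observations, not measurements:

```python
# Back-of-envelope sketch of why two parallel links swap on the hosted
# macOS ARM64 runner (7 GB RAM, ~3 GB RSS per RelWithDebInfo link).
RUNNER_RAM_GB = 7      # hosted macos-15 ARM runner
LINK_RSS_GB = 3        # approx. peak RSS of one link job (assumption)
OS_OVERHEAD_GB = 1.5   # assumed headroom for OS + other build processes

def fits(link_jobs: int) -> bool:
    """True if the link jobs fit in RAM without swapping."""
    return link_jobs * LINK_RSS_GB + OS_OVERHEAD_GB <= RUNNER_RAM_GB

print(fits(2))  # False: 2*3 + 1.5 = 7.5 GB > 7 GB -> swap thrashing
print(fits(1))  # True:  1*3 + 1.5 = 4.5 GB fits comfortably
```

Under these assumptions, link-jobs=2 overcommits RAM by only half a gigabyte, which is enough to put the linker's working set into swap on every page.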
### Building solc step
solc's ccache key changed in `35949d3` (adding `-{cmake-build-type}-end` per review feedback), which invalidated run 1's saves; run 2 re-populated with the new key format, and run 3 is the first run that actually restores a warm solc ccache at 100% hit rate.

Observations:
- solc links only a handful of binaries (essentially the `solc` and `solc-tests` binaries, vs LLVM's ~200 tool executables). Almost all of the wall-clock is the link floor we can't eliminate.
- `lld-link` is inherently slow per-executable on Windows, and solc + boost together produce enough binaries for that to dominate. No swap here (16 GB RAM, not memory-pressured), just slow links.

## Fixes and tuning during review
- Restore-keys prefix collision (fixed with a key terminator, `-end`): `llvm-…-mlir` was a prefix of `llvm-…-mlir-coverage-no-assertions`, so ccache-action's `restore-keys` prefix match was cross-restoring between dev and coverage configs on Linux x64. All LLVM and solc keys now end with an `-end` marker.
- Touch-step path fix: `cache-warmup.yaml`'s Touch steps used `${{ runner.temp }}/ccache-touch-{llvm,solc}`, but ccache-action saves with `path = CCACHE_DIR = ${{ runner.temp }}/ccache-{llvm,solc}`. GHA includes `path` in the cache version hash, so the Touch was a silent no-op. Fixed to the real paths.
- solc key was missing `cmake-build-type`: `warm-solc` (RelWithDebInfo) and `warm-llvm-integration` (Release) were colliding on one key. The key now includes the build type.
- `Show ccache stats` → `continue-on-error: true`, so a missing-`ccache`-binary failure can't mask the real build failure.
- `apt update` handling went through a couple of iterations: first bare `apt update`, then `sudo apt-get update -qq` (per review suggestion, for portability), then back to plain `apt-get update -qq` once a test run (#24751389733) confirmed the `solx-ci-runner` container image doesn't ship `sudo` at all. Kept the add-then-remove as separate commits so the iteration is visible in history.
- YAML anchors are rejected in composite action files (`ActionManifestManagerLegacy` refuses them explicitly), so the four duplicated `CCACHE_*` env vars across steps in `build-llvm/action.yml` and `build-solc/action.yml` stay inlined with a sync comment. Workflow files do accept anchors, which is used to dedupe the five-entry `container.volumes:` list across the six affected jobs within each of `test.yaml` and `cache-warmup.yaml`.
- `find -mindepth 1 -delete` instead of `rm -rf` on the bind-mounted host paths: `rm -rf` on a bind-mount directory fails with EBUSY on the mount point itself (disk reclaim still happens, but with log noise and a masked non-zero exit). `find -delete` clears the contents, leaves the mount intact, and preserves a meaningful exit code.
- `xargs -P` for macOS Xcode removal: the pre-installed runner has ~16 Xcode versions. Sequential `rm -rf` was ~8 min; parallel fan-out cuts it to well under two.
- `3106d1e` added `LLVM_BUILD_TOOLS=Off` + `LLVM_INCLUDE_TOOLS=Off` on Windows to skip ~200 unused tool-binary links; the Build LLVM step dropped from 42 min → 9:44 (4.3×) on the validation run. Reverted in `90df14b` after `Run tests` failed with `llvm-sys` unable to find `llvm-config` at Rust crate-build time: `llvm-config` is itself a tool binary and got disabled along with the rest. A follow-up to retry with a surgical "build only `llvm-config`" approach is tracked in #364 ("ci(llvm): reduce Windows LLVM tool count while keeping llvm-config").

## Prior art (why this shape)
## Cache calculations
Current observed active usage (`gh api repos/NomicFoundation/solx/actions/cache/usage`):

- `v1-llvm-Windows-X64-RelWithDebInfo-mlir-…`
- `v1-llvm-macOS-X64-RelWithDebInfo-mlir-…`
- `v1-llvm-macOS-ARM64-RelWithDebInfo-mlir-…`
- `v1-llvm-Linux-X64-RelWithDebInfo-mlir-…`
- `v1-llvm-Linux-ARM64-RelWithDebInfo-mlir-…`
- `build-and-test-v2-*` (rust-cache, 3 OSes)
- `v1-solc-*` (5 platforms)

¹ GHA scopes caches per-ref: each merge queue branch (`gh-readonly-queue/main/pr-###-…`) creates its own cache copy.

## Projected ccache additions
With `max-size: 10G` per scope, actual on-disk ccache per variant tends to sit at ~2–3 GB for LLVM and ~1 GB for solc. GHA stores the zstd-compressed tarball, typically 40–60% of on-disk size.

- LLVM ccache (`llvm-<OS>-<arch>-RelWithDebInfo-mlir`), 5 platforms
- solc ccache (`solc-<OS>-<arch>-<build-type>`), 5 platforms

## Worst-case upper bound
If every ccache scope fills its 10G cap (unlikely, but the theoretical max):
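A quick back-of-envelope for that theoretical max, using the scope counts and compression ratios quoted above (a sketch, not an audit of actual cache usage):

```python
# Worst-case sketch: every ccache scope at its 10G max-size cap.
# Compression ratio 40-60% is the observed zstd tarball range;
# scope count is 5 LLVM + 5 solc platform variants.
CAP_GB = 10
SCOPES = 5 + 5  # llvm + solc platform variants

for ratio in (0.4, 0.6):
    stored = SCOPES * CAP_GB * ratio
    print(f"compression {ratio:.0%}: {stored:.0f} GB stored")
# compression 40%: 40 GB stored
# compression 60%: 60 GB stored -- still under the 150 GB watch threshold
```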
## Projected totals
Headroom under quota stays comfortable in the typical case and survives the worst case. If observed usage approaches 150 GB after a few weeks, the `max-size` cap is a single-line dial-back.

## Why ccache-touch in cache-warmup?
The composite action gates the ccache steps on `cache-hit != 'true'`. When the daily cron fires on an unchanged submodule SHA, the artifact cache hits → the composite skips ccache → the ccache entry's LRU timer isn't refreshed. After 7 quiet days, GHA evicts it. The `Touch ccache` steps use `actions/cache/restore` with `lookup-only: true`: the lookup hits the cache service endpoint, resetting the 7-day access timer, without downloading the ~2–4 GB entry.

## Test plan
- `LLVM_PARALLEL_LINK_JOBS=1` on macOS ARM64 validated: Build LLVM step 1 h 57 min → 5:54 (20×) compared to the prior warm-ccache run at link-jobs=2.
- Let a `cache-warmup` run populate ccache entries on main.
- Watch `gh api repos/NomicFoundation/solx/actions/cache/usage` over 1–2 weeks; confirm the total stays under 150 GB.

## Out of scope / follow-ups
- Prebuilt LLVM artifacts from the `solx-llvm` repo (build once there, download by SHA here): the biggest long-term win, but requires cross-repo infra.
- Windows `.lib` files in `RelWithDebInfo`: not in scope per prior discussion.
- Compile-jobs tuning (`LLVM_PARALLEL_COMPILE_JOBS`) if link-jobs tuning isn't enough on some future platform.
- #364: retry the Windows tool-count reduction with `llvm-config` retained. Expected ~4× Windows Build LLVM speedup.