
ci(cache): re-add ccache as artifact-cache fallback#360

Merged
hedgar2017 merged 21 commits into main from ci/ccache-fallback-llvm-builds on Apr 23, 2026

Conversation

@nebasuke
Member

@nebasuke nebasuke commented Apr 21, 2026

Summary

  • I've upped the solx cache size to 150 GB.
  • Re-adds ccache to the build-llvm and build-solc composite actions as a fallback behind the SHA-keyed artifact cache (removed in ci: replace ccache with artifact caching + standardize coverage config #246). ccache only runs on artifact-cache miss; the warm-cache fast path (~30 s restore) is unchanged.
  • Adds ccache-touch steps to each cache-warmup.yaml job so the daily cron extends the ccache LRU even when the artifact cache hits, preventing GHA's 7-day eviction from emptying ccache on quiet periods.
  • Adds a free-disk-space composite action that reclaims ~24 GB of pre-installed host tooling (dotnet, Android SDK, Haskell, CodeQL) via container.volumes: bind-mounts, so hosted ubuntu-24.04 runners have enough room for cold LLVM builds + ccache on the same disk pool.
  • Tunes LLVM_PARALLEL_LINK_JOBS per platform — 2 on Linux/Windows and macOS x86 (14–16 GB RAM), 1 on macOS ARM64 — because link memory pressure becomes the bottleneck once the compile phase is served from ccache at 99%+ hit rates. The hosted macos-15 ARM runner has 3 vCPU / 7 GB RAM; two parallel RelWithDebInfo links at 2–4 GB RSS each push it into swap.
  • max-size: "10G" per ccache scope (matches pre-ci: replace ccache with artifact caching + standardize coverage config #246 value).
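The per-platform link-jobs choice above can be sketched as a small shell helper (the function name and call sites are illustrative, not from the PR; the real logic lives inline in the composite action's `--extra-args` construction, keyed on `runner.os`/`runner.arch`):

```shell
# Pick LLVM_PARALLEL_LINK_JOBS from the runner's OS/arch. Only the
# 7 GB macOS ARM64 runner needs fully serialized links; everything
# else has 14-16 GB RAM and tolerates two parallel links.
link_jobs() {
  local os="$1" arch="$2"
  if [ "$os" = "macOS" ] && [ "$arch" = "ARM64" ]; then
    echo 1
  else
    echo 2
  fi
}

link_jobs macOS ARM64   # prints 1
link_jobs Linux X64     # prints 2
```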

Observed runtimes

Three reference runs on this PR branch show the progression:

  • Cold: run #24731422129 at fb9f415 — first run, ccache empty, pre-free-disk-space; cargo-checks fails on disk exhaustion during lld link of llvm-opt-fuzzer.
  • Warm ccache, link-jobs=2: run #24745368771 at 13d56d5 — LLVM ccache populated by the cold run, free-disk-space wired, but parallel links still causing swap on macOS ARM64.
  • Warm ccache + link-jobs tuning: run #24751701680 at 6f85d0a — LLVM_PARALLEL_LINK_JOBS=1 on macOS ARM64, 100% ccache hit rate across the board.

Whole-job wall-clock

| Platform | Cold | Warm ccache, link-jobs=2 | Warm ccache + link-jobs tuned |
|---|---|---|---|
| cargo-checks | 2h 17min, failed on disk | 18.5 min | 16.5 min |
| Linux x86 gnu | 3h 00min | 32.5 min | 15.5 min |
| Linux ARM64 gnu | 1h 44min | 25 min | 14.5 min |
| macOS x86 | 3h 28min | 1h 53min | 45 min |
| macOS ARM64 | 4h 44min | 2h 23min | 21 min |
| Windows | 4h 31min | 1h 46min | 1h 19min |

Build LLVM step only

Isolates the LLVM-building cost from checkout / solc build / tests / etc. Shows that the dramatic macOS ARM64 win is mechanism-specific:

| Platform | Warm ccache, link-jobs=2 | Warm ccache + link-jobs tuned | Speedup |
|---|---|---|---|
| cargo-checks | 7:07 | 4:20 | 1.6× |
| Linux x86 gnu | 7:00 | 3:59 | 1.8× |
| Linux ARM64 gnu | 6:11 | 4:22 | 1.4× |
| macOS x86 | 17:40 | 10:23 | 1.7× |
| macOS ARM64 | 1h 57min | 5:54 | 20× |
| Windows | 46:35 | 42:16 | 1.1× |

Two effects stacking:

  1. ccache hit rate crept from 99.35% → 100% (confirmed on both macOS legs this run). The previous run had 25 misses out of 3839 cacheable calls; those got saved, so this run hits them. Accounts for the modest 1.4–1.8× speedup on every non-ARM64 leg — roughly 25 cold compiles × ~10–15 s each.
  2. LLVM_PARALLEL_LINK_JOBS=1 eliminated swap thrashing on macOS ARM64, which is the 20× speedup. 25 additional cache hits don't explain ~1h 50min of saved wall time; that was the linker stalled on page faults in a 7 GB runner trying to run two ~3 GB RSS link jobs concurrently. Windows' 1.1× result is the control group: 16 GB RAM, no swap pressure, no link-jobs change, speedup limited to cache warming.

ccache itself handles compile, not link. The ~200 link steps per build still run every time; that's the floor. On most platforms link time is tolerable; on macOS ARM64 specifically the interaction between link count, link RSS, and runner RAM was catastrophic before this PR and is now fine.

Building solc step

solc's ccache key changed in 35949d3 (adding -{cmake-build-type}-end per review feedback), which invalidated run 1's saves — run 2 re-populated with the new key format, and run 3 is the first run that actually restores a warm solc ccache at 100% hit rate.

| Platform | Cold (run 1, fb9f415) | Warm ccache (run 3, 6f85d0a) | Speedup |
|---|---|---|---|
| Linux x86 gnu | 19:11 | 1:32 | 12.5× |
| Linux ARM64 gnu | 10:26 | 1:25 | 7.4× |
| macOS x86 | 25:32 | 7:12 | 3.5× |
| macOS ARM64 | 14:07 | 2:31 | 5.6× |
| Windows | 35:50 | 16:03 | 2.2× |

Observations:

  • Linux warm solc is the clean case — under 2 min thanks to a small link graph (just solc and solc-tests binaries, vs LLVM's ~200 tool-executables). Almost all of the wall-clock is the link-floor we can't eliminate.
  • Windows warm solc is still 16 min. Same link-bound story as LLVM: lld-link is inherently slow per-executable on Windows and solc + boost together produce enough binaries for that to dominate. No swap here (16 GB RAM, not memory-pressured), just slow links.
  • macOS ARM64 solc doesn't need the link-jobs treatment. Solc produces few enough binaries that even link-jobs=2 doesn't exceed the 7 GB RAM.

Fixes and tuning during review

  • Cache-key terminator (-end): llvm-…-mlir was a prefix of llvm-…-mlir-coverage-no-assertions, so ccache-action's restore-keys prefix match was cross-restoring between dev and coverage configs on Linux x64. All LLVM and solc keys now end with an -end marker.
  • Touch-step path mismatch: cache-warmup.yaml's Touch steps used ${{ runner.temp }}/ccache-touch-{llvm,solc} but ccache-action saves with path = CCACHE_DIR = ${{ runner.temp }}/ccache-{llvm,solc}. GHA includes path in the cache version hash, so the Touch was a silent no-op. Fixed to real paths.
  • solc key missing cmake-build-type: warm-solc (RelWithDebInfo) and warm-llvm-integration (Release) were colliding on one key. Key now includes build-type.
  • Show ccache stats → continue-on-error: true, so a missing-ccache-binary failure can't mask the real build failure.
  • apt update handling — went through a couple of iterations: first bare apt update, then sudo apt-get update -qq (per review suggestion for portability), then back to plain apt-get update -qq once a test run (#24751389733) confirmed the solx-ci-runner container image doesn't ship sudo at all. Kept the add-then-remove as separate commits so the iteration is visible in history.
  • YAML anchor reality check: GHA composite-action manifests reject YAML anchors (ActionManifestManagerLegacy explicit refusal), so the four duplicated CCACHE_* env vars across steps in build-llvm/action.yml and build-solc/action.yml stay inlined with a sync comment. Workflow files do accept anchors, which is used to dedupe the five-entry container.volumes: list across the six affected jobs within each of test.yaml and cache-warmup.yaml.
  • find -mindepth 1 -delete instead of rm -rf on the bind-mounted host paths: rm -rf on a bind-mount directory fails with EBUSY on the mount point itself (disk reclaim still happens, but log noise and a masked non-zero exit). find -delete clears contents, leaves the mount intact, and preserves a meaningful exit code.
  • xargs -P for macOS Xcode removal: the pre-installed runner has ~16 Xcode versions. Sequential rm -rf was ~8 min; parallel fan-out cuts it well under two.
  • Windows LLVM tool-disable experiment, landed and reverted: commit 3106d1e added LLVM_BUILD_TOOLS=Off + LLVM_INCLUDE_TOOLS=Off on Windows to skip ~200 unused tool-binary links. Build LLVM step dropped from 42 min → 9:44 (4.3×) on the validation run. Reverted in 90df14b after Run tests failed with llvm-sys unable to find llvm-config at Rust crate-build time — llvm-config is itself a tool binary and got disabled along with the rest. Follow-up to retry with a surgical "build only llvm-config" approach tracked in ci(llvm): reduce Windows LLVM tool count while keeping llvm-config (retry of reverted tool-disable) #364.
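The find -mindepth 1 -delete pattern from the bullet list above can be demonstrated on a plain directory standing in for the bind mount (the helper name and throwaway temp paths are illustrative, not the real /mnt/free-disk-space mounts):

```shell
# Clear a directory's contents without unlinking the directory itself;
# on a live bind mount, the final unlink of the mount point is what
# fails with EBUSY under rm -rf.
reclaim() {
  find "$1" -mindepth 1 -delete
}

tmp=$(mktemp -d)
mkdir -p "$tmp/nested"
touch "$tmp/a" "$tmp/nested/b"
reclaim "$tmp"
# "$tmp" still exists, but is now empty, and the exit code is clean.
```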

Prior art (why this shape)

Cache calculations

Current observed active usage (gh api repos/NomicFoundation/solx/actions/cache/usage):

| Entry | Per entry | Copies¹ | Subtotal |
|---|---|---|---|
| v1-llvm-Windows-X64-RelWithDebInfo-mlir-… | 9.09 GB | 4 | 36.4 GB |
| v1-llvm-macOS-X64-RelWithDebInfo-mlir-… | 2.39 GB | 2 | 4.8 GB |
| v1-llvm-macOS-ARM64-RelWithDebInfo-mlir-… | 2.28 GB | 1 | 2.3 GB |
| v1-llvm-Linux-X64-RelWithDebInfo-mlir-… | 1.86 GB | 2 | 3.7 GB |
| v1-llvm-Linux-ARM64-RelWithDebInfo-mlir-… | 1.82 GB | 2 | 3.6 GB |
| build-and-test-v2-* (rust-cache, 3 OSes) | ~1.5 GB | 3 | ~4.7 GB |
| v1-solc-* (5 platforms) | ~0.35 GB | 5 | 1.8 GB |
| misc small | | | ~0.05 GB |
| Current total (28 entries) | | | ~59.7 GB |

¹ GHA scopes caches per-ref: each merge queue branch (gh-readonly-queue/main/pr-###-…) creates its own cache copy.

Projected ccache additions

With max-size: 10G per scope, actual on-disk ccache per variant tends to sit at ~2–3 GB for LLVM, ~1 GB for solc. GHA stores the zstd-compressed tarball, typically 40–60 % of on-disk size.

| New ccache entry | On-disk (typical) | Count | Compressed subtotal |
|---|---|---|---|
| LLVM dev (`llvm-<OS>-<arch>-RelWithDebInfo-mlir`), 5 platforms | ~3 GB | 5 | ~8 GB |
| LLVM sanitizer (Linux x86 only) | ~3 GB | 1 | ~1.5 GB |
| LLVM coverage (Linux x86 only) | ~3 GB | 1 | ~1.5 GB |
| LLVM integration / Release (Linux x86 only) | ~3 GB | 1 | ~1.5 GB |
| solc (`solc-<OS>-<arch>-<build-type>`), 5 platforms | ~1 GB | 5 | ~3 GB |
| New subtotal | | 13 | ~15–20 GB |

Worst-case upper bound

If every ccache scope fills its 10G cap (unlikely, but the theoretical max):

| Scopes | On-disk | Compressed in GHA |
|---|---|---|
| 13 scopes × 10G | 130 GB | ~50–80 GB |

Projected totals

| Scenario | Active cache |
|---|---|
| Current (artifact cache only) | ~60 GB |
| Current + typical ccache | ~75–80 GB |
| Current + worst-case ccache | ~110–140 GB |
| Quota (newly raised) | 150 GB |

Headroom under quota stays comfortable in the typical case and survives the worst case. If observed usage approaches 150 GB after a few weeks, the max-size cap is a single-line dial-back.

Why ccache-touch in cache-warmup?

The composite action gates the ccache steps on cache-hit != 'true'. When the daily cron fires on an unchanged submodule SHA, the artifact cache hits → composite skips ccache → the ccache entry's LRU timer isn't refreshed. After 7 quiet days, GHA evicts it. The Touch ccache steps use actions/cache/restore with lookup-only: true — the lookup hits the cache service endpoint, resetting the 7-day access timer, without downloading the ~2–4 GB entry.
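Concretely, each Touch step looks roughly like the sketch below (the step name, key shape, and sentinel key are illustrative; the real keys and paths live in cache-warmup.yaml):

```yaml
- name: Touch LLVM ccache
  uses: actions/cache/restore@v4
  with:
    path: ${{ runner.temp }}/ccache-llvm
    # An exact key that never matches, so resolution always falls
    # through to the restore-keys prefix match.
    key: touch-only-never-matches
    restore-keys: |
      llvm-${{ runner.os }}-${{ runner.arch }}-
    # Hit the cache service (refreshing the 7-day access timer)
    # without downloading the multi-GB entry.
    lookup-only: true
```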

Test plan

  • CI passes cold build — confirmed in run #24745368771 once free-disk-space was wired. cargo-checks passes in 16.5 min (was failing on disk with SIGBUS during link).
  • LLVM ccache active, populated, and restored across runs. Stats confirm 100% hit rate once warmed.
  • solc ccache validated end-to-end on run #24751701680 — 100% hit rate on all five platforms after the key change propagated.
  • LLVM wall-clock drops substantially vs cold baseline on every leg. Linux x86: 3h → 15.5 min (12×). Linux ARM64: 1h 44min → 14.5 min (7×). macOS x86: 3h 28min → 45 min (4.6×). macOS ARM64: 4h 44min → 21 min (13×). Windows: 4h 31min → 1h 19min (3.4×).
  • LLVM_PARALLEL_LINK_JOBS=1 on macOS ARM64 validated — Build LLVM step 1h 57min → 5:54 (20×) compared to prior warm-ccache run at link-jobs=2.
  • After merge, watch push-to-main cache-warmup run populate ccache entries on main.
  • Monitor gh api repos/NomicFoundation/solx/actions/cache/usage over 1–2 weeks — confirm total stays under 150 GB.
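The monitoring bullet above can be scripted; the endpoint and its active_caches_size_in_bytes field follow the GitHub REST cache-usage API as we understand it, and the byte count below is a stand-in value for illustration, not a measurement:

```shell
# Real invocation (needs gh + network access):
#   bytes="$(gh api repos/NomicFoundation/solx/actions/cache/usage \
#              --jq '.active_caches_size_in_bytes')"
bytes=64171876352                      # stand-in, roughly the ~60 GB current total
gb=$(( bytes / 1024 / 1024 / 1024 ))   # integer GiB
quota=150
if [ "$gb" -lt "$quota" ]; then
  echo "OK: ${gb} GB active, quota ${quota} GB"
else
  echo "WARN: ${gb} GB active exceeds ${quota} GB quota"
fi
```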

Out of scope / follow-ups

  • Publishing prebuilt LLVM binary tarballs from the solx-llvm repo (build once there, download by SHA here) — biggest long-term win but requires cross-repo infra.
  • Windows LLVM artifact is 9 GB (4× macOS, 5× Linux) — likely PDBs / .lib files in RelWithDebInfo. Not in scope per prior discussion.
  • Capping ninja compile parallelism (LLVM_PARALLEL_COMPILE_JOBS) if link-jobs tuning isn't enough on some future platform.
  • Reduce Windows LLVM tool count (ci(llvm): reduce Windows LLVM tool count while keeping llvm-config (retry of reverted tool-disable) #364) — attempted in this PR and reverted; worth retrying surgically with llvm-config retained. Expected ~4× Windows Build LLVM speedup.


Copilot AI left a comment


Pull request overview

This PR reintroduces ccache as a fallback layer behind the existing SHA-keyed GitHub Actions artifact cache for LLVM/solc builds, and updates the cache warmup workflow to “touch” ccache entries so they don’t get evicted during quiet periods.

Changes:

  • Add hendrikmuhs/ccache-action setup + --ccache-variant=ccache for LLVM and solc builds on artifact-cache miss.
  • Add “Touch LLVM/solc ccache” steps to cache-warmup.yaml jobs using actions/cache/restore with lookup-only: true.
  • Configure per-scope ccache size cap (max-size: "10G").

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| .github/workflows/cache-warmup.yaml | Adds lookup-only cache restores to refresh ccache LRU for LLVM/solc warmup jobs. |
| .github/actions/build-solc/action.yml | Re-adds ccache setup/stats and passes --ccache-variant=ccache when artifact cache misses. |
| .github/actions/build-llvm/action.yml | Re-adds ccache setup/stats, defines per-config ccache keys, and passes --ccache-variant=ccache when artifact cache misses. |

nebasuke added a commit that referenced this pull request Apr 21, 2026
Five fixes from hedgar2017's review on #360:

1. Touch-step path fix (was a silent no-op). The `Touch {LLVM,solc}
   ccache` steps used `${{ runner.temp }}/ccache-{touch-llvm,touch-solc}`,
   but ccache-action saves with `path = CCACHE_DIR = ${{ runner.temp
   }}/ccache-{llvm,solc}`. GHA includes `path` in the cache version hash,
   so the Touch restore-keys prefix match never saw the real ccache
   entries. After 7 quiet days the cron would silently fail to refresh
   the LRU and the entries would age out. Fixed to the real paths across
   all six Touch steps.

2. solc ccache key missing `cmake-build-type`. `warm-solc` builds
   `RelWithDebInfo`; `warm-llvm-integration` builds `Release`. Same key
   → the two configs evict each other. Added the build-type component
   and the `-end` terminator (same rationale as a239dea) to the solc key
   and the two solc Touch restore-keys.

3. `Show ccache stats` → `continue-on-error: true` so a missing `ccache`
   binary (e.g. ccache-action install failed upstream) can't mask the
   real build failure in the step summary.

4. `CCACHE_*` env deduplication via YAML anchor (`&ccache-{llvm,solc}-env`)
   and `<<:` merge keys. Composite-action sibling steps don't share env,
   so four variables were duplicated three times per action, with no
   guardrail against drift. PyYAML verified the merge resolves to the
   same env sets previously written by hand.

5. `apt update` → `sudo apt-get update -qq`. Works in both root (current
   Docker container) and non-root (hosted runner) environments; `-qq`
   silences the default chatter.
@nebasuke nebasuke force-pushed the ci/ccache-fallback-llvm-builds branch from 13d56d5 to 08feef9 Compare April 21, 2026 23:15
@nebasuke nebasuke force-pushed the ci/ccache-fallback-llvm-builds branch from 90df14b to 9659de6 Compare April 22, 2026 10:29
@nebasuke nebasuke marked this pull request as ready for review April 22, 2026 10:48
@nebasuke nebasuke requested a review from hedgar2017 April 22, 2026 10:48
@nebasuke nebasuke force-pushed the ci/ccache-fallback-llvm-builds branch from 9659de6 to a444822 Compare April 22, 2026 11:16
nebasuke added 17 commits April 22, 2026 12:44
When the solx-llvm submodule is bumped the SHA-keyed artifact cache
misses and every platform does a ~3.5h cold LLVM build. PR #246 removed
ccache on the assumption that artifact cache hits made it redundant;
that doesn't hold for the bump case we're now in.

Restore the pre-removal ccache steps (commit 1320d1b), layered behind
the existing artifact cache so ccache only runs on artifact-cache miss:

- build-llvm/action.yml: define ccache key, install ccache, pass
  --ccache-variant=ccache to solx-dev, report --show-stats.
- build-solc/action.yml: same pattern, separate ccache dir.
- Cap max-size at 4G (was 10G) for a tighter cache budget.
- Align ccache key schema with the current artifact-cache key
  (includes -no-assertions; matches the 4-config matrix introduced
  by #246).

Expected: 3.5h cold builds drop to ~45-90min on submodule bumps;
warm-cache fast path unchanged.
build-llvm and build-solc now wire ccache behind the artifact cache,
but the gate (steps.<artifact>-cache.outputs.cache-hit != 'true') means
that when the daily cache-warmup cron fires on an unchanged submodule
SHA, the artifact cache hits, the composite skips its ccache steps,
and the ccache entry's LRU timer is never refreshed. After 7 days of
hits-only it gets evicted, and the next submodule bump finds ccache
also cold.

Add a Touch step after each build in cache-warmup.yaml that resolves
the ccache entry by prefix via actions/cache/restore with lookup-only:
true. The lookup alone resets the 7-day access timer without
downloading the 2-4 GB entry.
4G is tight enough that a single full LLVM EVM-target build (~4000 TUs,
~2.4 GB of cached objects on average) plus one re-build with slightly
different inputs can push ccache over the cap and trigger LRU eviction
of still-useful entries, hurting hit rate on the next submodule bump.

10G (matching the pre-PR-#246 value) leaves plenty of headroom.
Worst-case 13 scopes × 10G = ~130 GB on-disk (compressed: ~50-80 GB),
well under the 150 GB repo quota.
ccache-action's restore-keys does prefix matching, and the previous key
shape let shorter variant keys (e.g. `llvm-Linux-X64-RelWithDebInfo-mlir`)
prefix-match longer ones (`...-mlir-coverage-no-assertions`). On Linux x64
where both dev and coverage warm-ups run, the newer entry would win and
the ccache dir would be cross-restored with differently-compiled objects
— zero hit rate plus cache churn.

Append a literal `-end` terminator to every llvm ccache key and to the
corresponding Touch restore-keys in cache-warmup.yaml. `build-solc`'s
single-variant key is unaffected.
Two changes to validate ccache end-to-end on this branch without waiting
for a post-merge submodule bump:

1. Disable the SHA-keyed artifact cache restore in build-llvm and
   build-solc (`if: false`). Forces the ccache path to run on every CI
   run. The matching Save steps are already guarded by
   `github.event_name != 'pull_request'`, so main's artifact cache is
   not polluted.

2. Flip ccache-action `save` from `github.event_name != 'pull_request'`
   to `true`. Lets this PR's runs populate ccache so a second run can
   restore from the first and `Show ccache stats` reports hit rate.

Expected: first run cold (ccache miss, saves); second run warm (prefix
match on `-end`-terminated key, high hit rate on LLVM/solc builds).
The hosted `ubuntu-24.04` runner has ~14 GB free at start. A cold
LLVM+MLIR RelWithDebInfo build fills ~12 GB, and the new ccache adds
another ~1.1 GB — on this run cargo-checks tipped over with a SIGBUS
while linking llvm-opt-fuzzer ("no space left on device" moments later
during the ccache save). Linux x86 gnu happened to squeak by on the
same commit.

Reclaim ~24 GB of preinstalled host tooling solx doesn't use: .NET SDK,
Android SDK+NDK, Haskell (ghc + ghcup), CodeQL bundles. These live on
the runner VM's disk outside the container's view, so callers bind-mount
each host path into `/mnt/free-disk-space/<name>` via `container.volumes:`
and the composite action rms them from inside. The action refuses to
operate on anything outside that prefix, and skips missing or empty
mounts so a forgotten volume can't cause harm.

Wired into the six hosted-ubuntu-x64 jobs that cold-build LLVM:
  - test.yaml::cargo-checks
  - test.yaml::build-and-test (gated to containerized Linux legs)
  - cache-warmup.yaml::warm-llvm (gated to containerized Linux legs)
  - cache-warmup.yaml::warm-llvm-sanitizer
  - cache-warmup.yaml::warm-llvm-coverage
  - cache-warmup.yaml::warm-llvm-integration

Boost intentionally not removed — solc's `--build-boost` builds its own,
but the preinstalled `/usr/local/share/boost` headers may be pulled in
transitively by other tooling. CodeQL confirmed unused in this repo.

With ~38 GB free during build (vs ~14 GB before), the cold-build path
has a comfortable margin for the LLVM build, ccache, and future growth.
Deduplicates the five-entry `container.volumes:` list that was
repeated on six jobs. YAML anchor-alias within each file collapses
the four single-job cache-warmup definitions to one-liners and the
two test.yaml jobs' definitions to the anchor + one alias.

Cross-file sharing isn't possible (YAML anchors are per-document), so
test.yaml and cache-warmup.yaml each carry their own anchor definition
with a sync comment pointing at the other.
Run 24745149414 failed at action-load time with:

    /home/runner/.../.github/actions/build-llvm/action.yml:
      Anchors are not currently supported. Remove the anchor
      'ccache-llvm-env'

GHA's composite-action manifest loader (ActionManifestManagerLegacy)
explicitly rejects YAML anchors, independent of whether the underlying
YAML library supports them (PyYAML parses these fine; the workflow
parser apparently also does — only action.yml is restricted).

Restore the explicit duplicated CCACHE_* env blocks on the Build and
Show-ccache-stats steps in build-llvm/action.yml and build-solc/action.yml,
with a comment noting why we can't dedupe.
…ontents

First real run confirmed the action freed ~20 GB (85 → 105 GB avail),
but surfaced cosmetic "Device or resource busy" errors on each mount:

    rm: cannot remove '/mnt/free-disk-space/android': Device or resource busy

`rm -rf "$p"` on a bind mount deletes everything underneath just fine,
but the final syscall to unlink the mount-point directory itself fails
with EBUSY because the mount is live. The disk reclaim already happened
by that point, so it's noise — but the non-zero exit also masks any
*real* rm failure under the same error.

Switch to `find "$p" -mindepth 1 -delete`, which only touches contents
under the mount point. Same disk reclaim, clean logs, and the exit code
now actually signals problems worth looking at.
GitHub's macOS x86 runner ships ~16 Xcode versions. Removing them
sequentially took ~45s each (~8 min total on the free-disk-space step)
because `rm -rf` is I/O-bound per inode on APFS. Fan them out concurrently
with one worker per bundle — should cut this to under 2 min.

Soft-fails by design: an opportunistic cleanup shouldn't fail the job if
one rm hits a stray file handle; rm's stderr still pinpoints the path.
With ccache compile-cache hitting at 99%+ (measured on the macOS LLVM
leg of run 24745368771), link time dominates the remaining wall-clock.
Each RelWithDebInfo LLVM tool link peaks at 2-4 GB RSS and ccache
doesn't cache links.

The hosted macos-15 ARM runner has 3 vCPU / 7 GB RAM — two parallel
links exceed RAM and push into swap, which is net slower than
serialized links (paging during mmap-heavy linker work is devastating,
and shows up as elapsed time without CPU utilization). Other hosted
runners have headroom:

  macos-15-intel   4 vCPU / 14 GB RAM   ← 2 parallel links fit
  Linux / Windows           16 GB RAM   ← 2 parallel links fit
  macos-15 (ARM)   3 vCPU /  7 GB RAM   ← 2 parallel links = swap

Drive LLVM_PARALLEL_LINK_JOBS from a shell conditional on runner.os
and runner.arch: 1 only on macOS ARM64, 2 elsewhere. Confined to the
one cmake flag in the --extra-args string.
Run 24751389733 confirmed the hosted `solx-ci-runner` container image
doesn't ship sudo at all:

  /__w/_temp/…sh: line 1: sudo: command not found
  Process completed with exit code 127

Steps already run as root inside the container, so `sudo` was both
unnecessary and broken. Invoke `apt-get update -qq` directly in both
composite actions; update the surrounding comment to reflect the
actual container behavior rather than the generic root/non-root
portability framing.

Reverts the sudo-adding hunk from 35949d3 (where the original review
suggestion for portability was accepted).
On Windows with 100% ccache hits, Build LLVM still takes ~42 min. The
bottleneck is `lld-link` processing the ~200 LLVM tool executables
(opt, llc, llvm-objdump, llvm-pdbutil, ...) — each link is ~10-25 s on
Windows RelWithDebInfo with PDBs, and ccache doesn't cache link.

solx doesn't use any of these tool binaries at runtime:

  - solx consumes LLVM as a library via inkwell FFI on the static libs
    under target-llvm/target-final/ (LLVM_SYS_211_PREFIX). No reference
    to target-llvm/target-final/bin/ anywhere in solx or solx-dev.
  - Runtime tool deps (llvm-cov, llvm-profdata, llvm-symbolizer,
    llvm-lipo) come from distro/Xcode packages via the ci-runner
    Dockerfile / macOS runner tooling, not from the built LLVM.
  - No CI path passes enable-tests: true to build-llvm (the
    enable-tests in deploy-mdbook is for mdbook test, not LLVM tests).
  - solx-llvm's regression-tests.yml runs only inside solx-llvm's own
    CI, not from solx.

Gate the new flags on runner.os == 'Windows' so non-Windows runs stay
byte-identical pending independent validation. Restructures the
--extra-args positional list into a bash array that grows conditionally;
solx-dev's --extra-args is already declared `num_args = 1..` so the
`"${EXTRA_ARGS[@]}"` expansion works as before.

Expected Windows Build LLVM: ~42 min → ~5-10 min (link count drops by
roughly an order of magnitude, scheduling unchanged at link-jobs=2).
Link-jobs cap from #245 is deliberately left at 2 on Windows — #245's
rationale (OOM on 16 GB runners) still applies; we're reducing the
number of links, not running more in parallel.
- build-{llvm,solc}/action.yml: collapse the four-line YAML-anchor
  history block into a one-sentence KEEP-IN-SYNC marker that points at
  the downstream steps and notes the action-manifest parser limitation.
- free-disk-space/action.yml: drop the sudo/Boost paragraph from the
  action description. The sudo half was stale once we confirmed the
  container runs as root, and the Boost half is tangential to
  free-disk-space itself.

Net -8 lines of comments, no behaviour change.
nebasuke added a commit that referenced this pull request Apr 22, 2026
Retry of the Windows LLVM tool-disable that landed+reverted in #360
(commits 3106d1e / 90df14b). The first attempt broke `Run tests` because
`llvm-sys` (pulled in via inkwell) runs `${LLVM_SYS_211_PREFIX}/bin/llvm-config`
at Rust crate-build time to discover include/lib paths, and `llvm-config` is
itself an LLVM tool that got disabled along with the rest.

This version keeps `llvm-config` alive while disabling the other ~200 tool
binaries:

  -DLLVM_BUILD_TOOLS=Off           # tools are no longer in the ALL target
  -DLLVM_INCLUDE_TOOLS=On          # tools/ subdirectory still configured,
                                    # so individual tool targets exist
  -DLLVM_TOOL_LLVM_CONFIG_BUILD=On # per-tool override forces llvm-config
                                    # specifically into the ALL target

LLVM's cmake uses the `LLVM_TOOL_<name>_BUILD` pattern to let individual
tools opt back in when LLVM_BUILD_TOOLS is off. Expected effect: only
`llvm-config.exe` builds (small link), the ~200 heavy tool links are skipped.

Run 24771100328 confirmed the validation target: with all tools disabled
the Windows Build LLVM step went from 42 min → 9:44 (4.3×). This PR aims
to preserve that win while keeping `llvm-sys` happy.

Cache-key hardening (required for correctness)

Extracts the cmake `--extra-args` construction into a new `Compute LLVM
build config` step and hashes the flag list into the artifact cache key
(`...-args<sha8>-<solx-llvm-sha>`). Without this, the existing key only
reflects the solx-dev action inputs + the solx-llvm submodule SHA — not
the cmake flags — so entries built with `LLVM_BUILD_TOOLS=Off` on Windows
would share a key with entries built with tools `On` and silently serve
the wrong install tree.

Side-benefit: the hash catches any future output-affecting flag added to
`--extra-args` without requiring reviewer discipline to update the key.
Non-output-affecting flag tweaks (e.g. LLVM_PARALLEL_LINK_JOBS scheduling)
also rotate the key, costing one cold build per tweak — acceptable with
ccache as fallback and much more robust than manual key maintenance.

The Build LLVM step now reads `${RUNNER_TEMP}/llvm-extra-args` produced by
the new step, so EXTRA_ARGS is constructed once and consumed twice (once
for the hash, once for the build).
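The hash-and-handoff could look roughly like this. The file name comes from the commit message; the flag list, key layout, and digest length are illustrative assumptions, not the workflow's literal key format.

```shell
#!/usr/bin/env bash
# Sketch of the "Compute LLVM build config" step: construct the flag list
# once, persist it to ${RUNNER_TEMP}/llvm-extra-args, hash it for the
# artifact cache key, and let the Build LLVM step re-read the same file.
set -euo pipefail
RUNNER_TEMP="${RUNNER_TEMP:-$(mktemp -d)}"

EXTRA_ARGS=("-DLLVM_BUILD_TOOLS=Off" "-DLLVM_TOOL_LLVM_CONFIG_BUILD=On")
printf '%s\n' "${EXTRA_ARGS[@]}" > "${RUNNER_TEMP}/llvm-extra-args"

# First consumer: fold an 8-char digest of the flags into the cache key,
# so output-affecting flag changes rotate the key automatically.
ARGS_SHA8=$(sha256sum "${RUNNER_TEMP}/llvm-extra-args" | cut -c1-8)
echo "llvm-args${ARGS_SHA8}-<solx-llvm-sha>"

# Second consumer: the Build LLVM step reconstructs the array verbatim.
mapfile -t EXTRA_ARGS < "${RUNNER_TEMP}/llvm-extra-args"
printf 'cmake %s\n' "${EXTRA_ARGS[*]}"
```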

Acceptance:
- Windows Build LLVM step < 15 min on a warm-ccache run.
- Windows `Run tests` succeeds (llvm-sys finds llvm-config).
- No regression on Linux/macOS (no flag change; just the new args-hash
  key component, which triggers a one-time cold build on first run).
- `Show ccache stats` still reports near-100% hit rate.

See #364 for the full design rationale and alternatives considered.
nebasuke added a commit that referenced this pull request Apr 22, 2026
…ERGE

Mirrors the TEMP pattern from #360 but scoped to `build-llvm/action.yml`
only (this branch doesn't touch solc):

1. `if: false` on the LLVM artifact cache restore → forces Build LLVM to
   always run, so the Windows tool-disable change is actually exercised.
   Without this, the args-hash in the cache key only triggers miss on the
   first push (when no entry exists); subsequent pushes to this branch
   would hit the entry we just saved and skip the build, masking any
   regression.

2. `save: true` on the ccache-action → populates ccache in this branch's
   cache scope even on PR events, so a follow-up push can observe
   warm-ccache + tool-disable together (the actual target configuration).

Revert both before merge. The end-state behaviour (artifact cache restore
works, ccache saves only on non-PR events) is what ships.
@nebasuke nebasuke force-pushed the ci/ccache-fallback-llvm-builds branch from a444822 to bf8cf2c Compare April 22, 2026 11:55
Contributor

@hedgar2017 hedgar2017 left a comment
Ran another round! Everything is mostly good, but a couple of things are worth tightening.

Bare `wait` returned 0 regardless of child exit codes, so a failed
`find -delete` on any bind mount was silently swallowed. Capture each
PID and wait per-child; `rc` keeps the last non-zero exit, which is
all we need to trigger the warning path (per-cleanup stderr identifies
which path failed).

On non-zero rc, emit a `::warning::` annotation instead of exiting
non-zero. Partial cleanup is usually still enough headroom for the
downstream build; when it isn't, ENOSPC will surface at a more
specific call site than this action can name.
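The per-child wait plus warn-and-continue shape described above can be sketched like this; the cleanup paths come from the command line so the sketch stays self-contained, and the real action's paths and step wiring are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch: each cleanup runs in the background, we wait on each PID
# individually, and rc keeps the last non-zero exit so a failed
# `find -delete` is no longer silently swallowed by a bare `wait`.
rc=0
pids=()
for path in "$@"; do
  find "$path" -mindepth 1 -delete 2>/dev/null &
  pids+=("$!")
done
for pid in "${pids[@]}"; do
  wait "$pid" || rc=$?
done
if [ "$rc" -ne 0 ]; then
  # Warn-and-continue: partial cleanup usually still leaves enough
  # headroom; if not, ENOSPC surfaces at a more specific call site.
  echo "::warning::disk cleanup failed for at least one path (rc=$rc)"
fi
exit 0
```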

Also reworded the inline comment at the `find` call — the previous
note about "a real rm failure can still surface as a non-zero exit"
no longer matches the deliberate warn-and-continue semantics.

Reported by @hedgar2017 in PR #360.
The case guard is a shell glob, not a realpath check — so the doc claim
that it "caps the blast radius to /mnt/free-disk-space/*" is false for
`..` traversal (e.g. `/mnt/free-disk-space/../host-dir` passes the
glob). It's unexploitable today since all callers pass hard-coded
literals, but the wording should match what the code actually does.

Reword both the action description and the inline comment to describe
the guard as best-effort typo-catching, explicitly not a security
boundary. No code change.
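Why the glob is not a realpath check can be shown in a few lines; the function name and paths below are illustrative, not the action's code.

```shell
#!/usr/bin/env bash
# Demonstrates why the case guard is best-effort typo-catching, not a
# security boundary: the glob matches the raw string, never resolving
# `..`, so traversal through the prefix still passes.
guard() {
  case "$1" in
    /mnt/free-disk-space/*) echo "allowed: $1" ;;
    *) echo "rejected: $1" ;;
  esac
}
guard /mnt/free-disk-space/dotnet    # intended use: allowed
guard /mnt/free-disk-space/../etc    # traversal: still allowed by the glob
guard /usr/local                     # plain typo: rejected
```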

Reported by @hedgar2017 in PR #360.
warm-solc was the only Linux-container job in cache-warmup.yaml without
volumes + Free disk space, making it an implicit special case that the
next reader has to reason about. solc's build + boost footprint
(~3-5 GB) fits in container headroom today, but the ~30 s the cleanup
costs is cheap insurance against future solc/boost growth hitting
ENOSPC mid-warmup.

Reported by @hedgar2017 in PR #360.
The old code gated `removed=${#to_remove[@]}` on xargs returning zero,
but xargs exits non-zero if *any* child `rm -rf` fails — so the summary
would print "Removed 0 inactive Xcode version(s)" even after 3/4
bundles (~45 GB) were successfully deleted. Misleading on a cleanup
step whose value is precisely the freed disk.

Set `removed` to the attempt count unconditionally before xargs and
keep the warning path for partial failure. Off-by-at-most-one in the
rare failure case is a much better signal than zero.
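The count-before-xargs fix can be sketched as follows, using throwaway temp directories in place of real Xcode bundles:

```shell
#!/usr/bin/env bash
# Sketch: set `removed` to the attempt count unconditionally, since xargs
# exits non-zero if *any* child rm fails even when most succeeded.
set -u
tmp=$(mktemp -d)
mkdir -p "$tmp/Xcode_15.app" "$tmp/Xcode_16.app"
to_remove=("$tmp/Xcode_15.app" "$tmp/Xcode_16.app")

removed=${#to_remove[@]}  # attempt count, set before xargs runs
if ! printf '%s\0' "${to_remove[@]}" | xargs -0 rm -rf; then
  # Keep the warning path for partial failure.
  echo "::warning::some bundles could not be deleted"
fi
echo "Removed ${removed} inactive Xcode version(s)"
```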

Reported by @hedgar2017 in PR #360.
@nebasuke nebasuke requested a review from hedgar2017 April 23, 2026 08:44
Contributor

@hedgar2017 hedgar2017 left a comment
Thank you sir!

@hedgar2017 hedgar2017 added this pull request to the merge queue Apr 23, 2026
Merged via the queue into main with commit 6fc0aaf Apr 23, 2026
41 checks passed
@hedgar2017 hedgar2017 deleted the ci/ccache-fallback-llvm-builds branch April 23, 2026 10:10