
perf(pluginfs): cross-run glob walk cache#68

Merged
raphaelvigee merged 6 commits into master from perf/cache-hit-levers
Jun 10, 2026

@raphaelvigee

Member

Summary

fs-glob targets use CacheConfig::off(), so the engine re-walks the tree on every run (walkdir + per-entry stat + per-file open/read/hash). On a warm cache-hit run of a 500-package Go workspace, this glob walk accounted for ~19% of CPU.

This adds a single-file cross-run cache so that an unchanged tree is reconstructed with stat calls only: no readdir, no file opens, no reads, no hashing.

How it works

  • One sidecar per workspace: <root>/.heph3/cache/fsglob.bin, loaded once per process and flushed on Driver drop (a pure cache-hit run leaves it clean → no write).
  • Each (root, pattern, exclude) walk is memoized, validated on reuse by:
    • directory mtimes — the matched file set (an add/remove/rename bumps the parent dir's mtime), and
    • per-file (size, mtime) — file content.
  • mtime+size is a fast-path proxy for content identity, whereas heph otherwise hashes content exactly. A same-size in-place rewrite that lands within the filesystem's mtime granularity can be missed; this is an accepted tradeoff. Disable with HEPH_FS_GLOB_CACHE=0.
  • Correct-by-fallback: any IO/decode/validation mismatch (or a freshly-stamped codegen xattr) falls through to a full walk. The cache lives under .heph3, which the engine always prunes from walks, so writing it never self-invalidates a recorded directory.
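The validation rule above can be sketched with the standard library alone. This is a minimal illustration, not the code in src/pluginfs/mod.rs: the `WalkSignature` layout, `mtime_of`, `record`, and `is_fresh` are hypothetical names chosen for the example, and the codegen xattr check is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical record of one memoized walk (not heph's on-disk format).
#[derive(Debug, Clone, PartialEq)]
pub struct WalkSignature {
    /// Each visited directory with the mtime seen when the walk was recorded.
    pub dirs: Vec<(PathBuf, SystemTime)>,
    /// Each matched file with the (size, mtime) seen when the walk was recorded.
    pub files: Vec<(PathBuf, u64, SystemTime)>,
}

fn mtime_of(path: &Path) -> io::Result<SystemTime> {
    fs::metadata(path)?.modified()
}

impl WalkSignature {
    /// Capture the current state of the given directories and files.
    pub fn record(dirs: &[PathBuf], files: &[PathBuf]) -> io::Result<Self> {
        let mut sig = WalkSignature { dirs: Vec::new(), files: Vec::new() };
        for d in dirs {
            sig.dirs.push((d.clone(), mtime_of(d)?));
        }
        for f in files {
            let meta = fs::metadata(f)?;
            sig.files.push((f.clone(), meta.len(), meta.modified()?));
        }
        Ok(sig)
    }

    /// Stat-only check: no readdir, no opens, no hashing.
    /// Any IO error is treated as a mismatch, so the caller falls back to a full walk.
    pub fn is_fresh(&self) -> bool {
        let dirs_ok = self
            .dirs
            .iter()
            .all(|(p, recorded)| mtime_of(p).ok() == Some(*recorded));
        let files_ok = self.files.iter().all(|(p, size, recorded)| match fs::metadata(p) {
            Ok(meta) => meta.len() == *size && meta.modified().ok() == Some(*recorded),
            Err(_) => false,
        });
        dirs_ok && files_ok
    }
}
```

A caller would reuse the memoized artifacts only when `is_fresh()` returns true and run the full walk otherwise, which is the correct-by-fallback behavior described above.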

Measurement

gen-go-large (500 pkgs, depth 7) → heph r test //go/large/..., 517 targets / 5401 cache hits, profiling binary, same binary toggled via HEPH_FS_GLOB_CACHE:

| | warm wall (median) | cached_glob_walk CPU |
|---|---|---|
| pre-change baseline | ~2.50s | 19.3% |
| with cache | ~2.27s (~9% faster) | 2.5% |

Note: the first implementation used one cache file per glob target (~3342 files); the per-file open cost cancelled the walk savings (net 0). Consolidating to a single sidecar fixed it.

Tests

src/pluginfs/mod.rs: reconstruct round-trip + invalidation (content size/mtime, directory-set mtime, codegen xattr), and an end-to-end cross-run test (fresh Driver loads the sidecar from disk, reuses it, then re-walks after an add). Full lib suite (997) passes; clippy -D warnings clean.

🤖 Generated with Claude Code

raphaelvigee and others added 3 commits June 10, 2026 11:02
fs-glob targets are cache=off, so the engine re-walks the tree every run
(walkdir + per-entry stat + per-file open/read/hash). On a warm go/large
run this glob walk was ~19% of CPU.

Add a single-file sidecar (<root>/.heph3/cache/fsglob.bin) memoizing each
(root,pattern,exclude) walk, validated by directory mtimes (the matched
file set) and per-file (size,mtime) (content). A full match reconstructs
the artifacts with stat only — no readdir, no opens, no reads, no hashing.
Loaded once per process, flushed on Driver drop. mtime+size is a fast-path
proxy for content identity (heph otherwise hashes content); disable with
HEPH_FS_GLOB_CACHE=0. Correct-by-fallback: any mismatch re-walks.

Measured on example/go/large warm cache-hit run (profiling binary):
  cached_glob_walk CPU: 19.3% -> 2.5%
  warm wall median:     ~2.50s -> ~2.27s (~9%)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract the fs glob cache into a reusable module and move its storage from
a per-workspace borsh sidecar to the durable cache's SQLite db, then reuse
it for the buildfile provider's package discovery.

What changed:
- engine::walk_cache — generic `WalkCache<T>` keyed by an arbitrary string,
  validated by a `WalkSignature` (directory mtimes for the file *set* + optional
  per-file size/mtime for *content*). Loaded once from the KV namespace, served
  from memory, write-through on insert (a pure cache-hit run writes nothing).
- LocalCache gains a namespaced key→blob KV store (`kv_get`/`kv_list`/`kv_put`),
  implemented on the SQLite backend (new `kv` table + a fire-and-forget
  WriterCmd, flushed when the writer thread joins on drop) and delegated through
  LocalCacheMem; default no-ops elsewhere.
- PluginInit now carries the engine's `Arc<dyn LocalCache>`, so plugins reach
  the KV. The fs Driver and buildfile Provider take it (Driver::new gains a cache
  arg; Provider::with_cache builder).
- pluginfs: the inline GlobStore/sidecar is replaced by `WalkCache<GlobValue>`.
  No flush-on-drop — inserts write through to the KV.
- pluginbuildfile: `find_packages_sync` now records directory mtimes, and
  `list_packages` memoizes the discovery walk across runs via
  `WalkCache<Vec<String>>` (dir-set validation only — BUILD *contents* don't
  change the package set). `HEPH_FS_GLOB_CACHE=0` still disables the glob cache.

Behavior/perf: glob cache unchanged on example/go/large (cached_glob_walk
19.3% -> 2.3% CPU; warm wall ~2.88s -> ~2.33s, ~19%); the KV load is ~60ms
one-time. Package discovery is now cross-run cached too.

Tests: walk_cache (signature validation, KV roundtrip, disabled passthrough),
sqlite kv_put/get/list, pluginfs glob signature+reconstruct+xattr+cross-run,
pluginbuildfile cross-run discovery. Full lib suite (1003) passes; clippy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the per-consumer WalkCache (and the cache.db KV it rode on) with a
shared, path-keyed, on-demand cached walker in htwalk, used by every
tree-walking plugin irrespective of who asks.

htwalk::CachedWalker exposes two consumer-agnostic primitives:
  - read_dir(dir)  → cached directory listing, validated by dir mtime
  - file_hash(file)→ cached content hash + exec bit, validated by (size, mtime)
Filtering (globs, excludes, skip dirs, codegen xattr) and the decision to
recurse belong to the consumer, so a requester that stops shallow and one that
recurses deep reuse the dirs they share and independently cache the ones they
don't. Each explored directory is cached on its own.
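As a rough illustration of the per-directory design described above, here is an in-memory sketch. It is not htwalk's implementation: it has no durable store and no `file_hash`, the `disk_reads` counter exists only to make cache hits observable, and `list_files` is a hypothetical consumer that decides its own recursion depth.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Illustrative in-memory stand-in for a path-keyed cached walker.
#[derive(Default)]
pub struct CachedWalker {
    /// Directory path -> (mtime at listing time, sorted entry names).
    dirs: HashMap<PathBuf, (SystemTime, Vec<String>)>,
    /// Number of real readdir calls performed (demo only).
    pub disk_reads: usize,
}

impl CachedWalker {
    /// Return the directory listing, reusing the cached one while the dir mtime matches.
    pub fn read_dir(&mut self, dir: &Path) -> io::Result<Vec<String>> {
        let mtime = fs::metadata(dir)?.modified()?;
        if let Some((cached_mtime, names)) = self.dirs.get(dir) {
            if *cached_mtime == mtime {
                return Ok(names.clone());
            }
        }
        let mut names = Vec::new();
        for entry in fs::read_dir(dir)? {
            names.push(entry?.file_name().to_string_lossy().into_owned());
        }
        names.sort();
        self.disk_reads += 1;
        self.dirs.insert(dir.to_path_buf(), (mtime, names.clone()));
        Ok(names)
    }
}

/// Hypothetical consumer: filtering and recursion depth are its decision, not the walker's.
pub fn list_files(
    walker: &mut CachedWalker,
    dir: &Path,
    max_depth: usize,
    out: &mut Vec<PathBuf>,
) -> io::Result<()> {
    for name in walker.read_dir(dir)? {
        let path = dir.join(&name);
        if path.is_dir() {
            if max_depth > 0 {
                list_files(walker, &path, max_depth - 1, out)?;
            }
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

A shallow caller followed by a deep caller shares the cached listing of the root directory, and only the directories the shallow caller never visited trigger a new readdir.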

Backed by a dedicated fswalk.db (separate from the artifact cache.db) so it can
be GC'd independently: a read pool + single write connection; rows carry a
last-access stamp; `heph tool gc` prunes rows past a 14-day TTL and orphaned
rows (path no longer exists). In-process front + write-through; a pure cache-hit
run performs no writes. Correct-by-fallback: any db/decode/validation failure
re-reads from disk.
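The two pruning rules (rows past the 14-day TTL, and rows whose path no longer exists) might look roughly like this when modeled over an in-memory map. The `Row` shape and the `prune` function are assumptions made for the example; the real rows live in fswalk.db and are pruned by `heph tool gc`.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, SystemTime};

/// 14-day TTL, as stated in the commit message.
pub const TTL: Duration = Duration::from_secs(14 * 24 * 60 * 60);

/// Hypothetical row shape: only the last-access stamp matters for GC.
pub struct Row {
    pub last_access: SystemTime,
}

/// Drop rows that are expired or orphaned; return how many were removed.
pub fn prune(rows: &mut HashMap<PathBuf, Row>, now: SystemTime) -> usize {
    let before = rows.len();
    rows.retain(|path, row| {
        // A last-access stamp in the future (clock skew) is treated as not expired.
        let expired = now
            .duration_since(row.last_access)
            .map(|age| age > TTL)
            .unwrap_or(false);
        let orphaned = !path.exists();
        !expired && !orphaned
    });
    before - rows.len()
}
```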

Consumers rewired:
  - pluginfs glob (`walk_glob`) + `file()` targets + the `heph.fs.glob` BUILD
    function now recurse via the walker; `file_hashout` moved into htwalk.
  - pluginbuildfile package discovery (`find_packages_sync`) reads dirs via the
    walker (dir-set only — BUILD contents don't change the package set).
The walker is handed to plugins through PluginInit; the engine owns it.

Removed: engine::walk_cache, the LocalCache kv_get/kv_list/kv_put + the sqlite
`kv` table, and PluginInit.cache.

Measured on example/go/large warm cache-hit run (profiling binary):
  warm (walker cache) ~2.44s vs cold (db wiped) ~2.74s — ~11%.
Full lib suite (1003) passes; clippy clean; `heph tool gc` prunes fswalk rows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@raphaelvigee raphaelvigee force-pushed the perf/cache-hit-levers branch from c82fe64 to 3f2e29d Compare June 10, 2026 09:06
Add a transparent escape hatch: set HEPH_DEBUG_CACHED_WALKER=0 and
CachedWalker::open returns a fully bypassing walker — no in-process
front, no durable store, every read_dir/file_hash goes straight to
disk. The consumer-facing API is unchanged, so plugins are unaware.

Distinct from disabled(), which still keeps the in-process map; the new
bypassing() mode does no caching at all, for isolating cache bugs from
correctness bugs.
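One way such an environment toggle could be read is sketched below. Only the variable name and the `0` value come from the commit message; the `WalkerMode` enum, both function names, and the treatment of every other value as "cached" are assumptions for illustration.

```rust
use std::env;

/// Hypothetical mode selector; heph's actual types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WalkerMode {
    /// Normal operation: in-process front plus durable store.
    Cached,
    /// No caching at all: every call goes straight to disk.
    Bypassing,
}

/// Pure helper so the decision can be tested without touching the process environment.
pub fn mode_from_value(value: Option<&str>) -> WalkerMode {
    match value {
        Some("0") => WalkerMode::Bypassing,
        // Assumption: unset or any other value keeps the cache on.
        _ => WalkerMode::Cached,
    }
}

/// Read HEPH_DEBUG_CACHED_WALKER from the environment.
pub fn mode_from_env() -> WalkerMode {
    mode_from_value(env::var("HEPH_DEBUG_CACHED_WALKER").ok().as_deref())
}
```

Keeping the decision in a pure function leaves the consumer-facing walker API untouched, which matches the point that plugins are unaware of the switch.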
The fswalk db is a pure optimization cache — every row is rebuildable
from disk — so crash durability buys us nothing. Drop synchronous from
NORMAL to OFF to skip the remaining checkpoint fsyncs.

Trade-off: an OS crash or power loss can now corrupt the db (an app
crash is still safe — the OS flushes the pages). That's acceptable: a
corrupt cache just re-reads from disk and rebuilds.

WAL and busy_timeout are kept — they govern concurrent multi-process
access, not durability, so cross-process behavior is unchanged.
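For reference, the connection settings described above correspond to three standard SQLite pragmas. The helper below is a hypothetical sketch that only builds the statements; the `busy_timeout` value is not given in the commit message, so it is left as a parameter.

```rust
/// Hypothetical helper: statements a connection to the walk db might run on open.
pub fn connection_pragmas(busy_timeout_ms: u32) -> Vec<String> {
    vec![
        // WAL lets readers in other processes proceed while one writer appends.
        "PRAGMA journal_mode = WAL;".to_string(),
        // Wait up to this long on a locked db instead of failing immediately.
        format!("PRAGMA busy_timeout = {};", busy_timeout_ms),
        // Skip fsyncs entirely; acceptable only because every row is rebuildable from disk.
        "PRAGMA synchronous = OFF;".to_string(),
    ]
}
```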
@raphaelvigee raphaelvigee enabled auto-merge (rebase) June 10, 2026 10:30
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@raphaelvigee raphaelvigee merged commit 60ab214 into master Jun 10, 2026
8 checks passed
@raphaelvigee raphaelvigee deleted the perf/cache-hit-levers branch June 10, 2026 10:40