feat(cache): filesystem hardening with hex sharding and size limits#1125
feat(cache): filesystem hardening with hex sharding and size limits#1125vibegui wants to merge 2 commits into
Conversation
… warnings
- Add hex directory sharding ({key[0:2]}/{key[2:4]}/{key}) to avoid flat dirs with 10k+ files
- Enforce CACHE_MAX_ENTRY_SIZE (2MB default) at filesystem and worker layers
- Evict existing stale entries when oversized writes are rejected
- Add write rate warning (CACHE_WRITE_RATE_WARN, default 500/min)
- Add disk fill warning when LRU tracked bytes exceed LRU_DISK_WARN_RATIO (90%)
- Add LRU eviction counter with reason dimension
- Add observable gauges for LRU keys and bytes per cache
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tagging OptionsShould a new tag be published when this PR is merged?
|
📝 WalkthroughWalkthroughThe PR enhances the caching system across three core modules by introducing cache entry size enforcement, filesystem sharding for improved distribution, write-rate monitoring, metrics instrumentation for LRU eviction and storage usage, and optional worker-based write delegation. Changes
Sequence DiagramsequenceDiagram
participant Client as Application
participant FileSystem as FileSystem Cache
participant Worker as CacheWriteWorker
participant Metrics as Metrics
Client->>FileSystem: putFile(key, data)
FileSystem->>FileSystem: Check size limit
alt Size exceeds CACHE_MAX_ENTRY_SIZE
FileSystem->>FileSystem: Delete existing entry
FileSystem->>Client: Return (skip write)
end
FileSystem->>FileSystem: Compute shardedPath(key)
FileSystem->>FileSystem: Ensure shard directory exists
alt CACHE_WRITE_WORKER_ENABLED
FileSystem->>Worker: Delegate write task
Worker->>FileSystem: Write to sharded path
Worker->>Metrics: Increment write count
else
FileSystem->>FileSystem: Write to sharded path
FileSystem->>Metrics: Track write rate
end
Metrics->>Metrics: Check write rate vs threshold
alt Rate exceeds CACHE_WRITE_RATE_WARN
Metrics->>Client: Log warning
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
1 issue found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtime/caches/fileSystem.ts">
<violation number="1" location="runtime/caches/fileSystem.ts:169">
P2: Oversized-entry eviction only targets the new sharded path, leaving legacy flat-path cache files undeleted.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| const filePath = `${FILE_SYSTEM_CACHE_DIRECTORY}/${key}`; | ||
|
|
||
| trackWriteRate(key); | ||
| const filePath = shardedPath(FILE_SYSTEM_CACHE_DIRECTORY, key); |
There was a problem hiding this comment.
P2: Oversized-entry eviction only targets the new sharded path, leaving legacy flat-path cache files undeleted.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At runtime/caches/fileSystem.ts, line 169:
<comment>Oversized-entry eviction only targets the new sharded path, leaving legacy flat-path cache files undeleted.</comment>
<file context>
@@ -118,11 +157,25 @@ function createFileSystemCache(): CacheStorage {
- const filePath = `${FILE_SYSTEM_CACHE_DIRECTORY}/${key}`;
-
+ trackWriteRate(key);
+ const filePath = shardedPath(FILE_SYSTEM_CACHE_DIRECTORY, key);
+ const dir = filePath.substring(0, filePath.lastIndexOf("/"));
+ if (!initializedShardDirs.has(dir)) {
</file context>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (6)
runtime/caches/fileSystem.ts (4)
171-178: Silent catch in mkdir suppresses transient failures.The empty catch block swallows all errors during shard directory creation. While the comment explains the intent (retry on next write), logging transient failures at debug level would help diagnose intermittent issues.
♻️ Proposed logging for transient failures
} catch (err) { + logger.debug?.(`fs_cache: mkdir failed for shard dir ${dir}, will retry: ${err}`); // transient failure — don't mark initialized so next write retries mkdir }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runtime/caches/fileSystem.ts` around lines 171 - 178, The empty catch swallowing errors from the Deno.mkdir call should instead log the caught error at debug (or trace) level so transient failures are visible while preserving retry behavior; inside the catch for the Deno.mkdir(...) call that currently leaves initializedShardDirs unchanged, capture the exception and call the module/logger debug method with a clear message and the error details (e.g., referencing initializedShardDirs and the target dir) before returning so the directory is not marked initialized and subsequent writes will retry.
188-193: Oversized cached entries are removed without logging.When
getFileencounters an entry exceedingCACHE_MAX_ENTRY_SIZE, it silently removes the file and returnsnull. This is correct behavior for enforcing limits retroactively, but logging would help diagnose unexpected cache misses.♻️ Proposed logging addition
if (fileContent.length > CACHE_MAX_ENTRY_SIZE) { + logger.debug?.(`fs_cache: evicting oversized cached entry (${fileContent.length} bytes): ${key}`); Deno.remove(filePath).catch(() => {}); return null; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runtime/caches/fileSystem.ts` around lines 188 - 193, When getFile finds a cached entry larger than CACHE_MAX_ENTRY_SIZE it currently deletes filePath and returns null silently; add a warning-level log before removal that includes the cache key, filePath and the fileContent.length to aid debugging. Locate the logic in getFile (uses FILE_SYSTEM_CACHE_DIRECTORY, key, filePath, fileContent) and call the module's existing logger (e.g., logger or processLogger) to emit a concise message about the oversized entry, then proceed to remove the file and return null.
28-43: Write rate warning only fires once per window, missing sustained high rates.The condition
writeCount === CACHE_WRITE_RATE_WARNtriggers only when the count exactly equals the threshold. If writes continue at a high rate (e.g., 1000/min), no further warnings appear until the window resets.Consider using a modulo or a flag to periodically warn during sustained high write rates.
♻️ Proposed fix for sustained high-rate warnings
writeCount++; - if (writeCount === CACHE_WRITE_RATE_WARN) { + if ( + writeCount === CACHE_WRITE_RATE_WARN || + (writeCount > CACHE_WRITE_RATE_WARN && writeCount % CACHE_WRITE_RATE_WARN === 0) + ) { logger.warn( `fs_cache: high write rate — ${writeCount} writes in the last minute. ` +🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runtime/caches/fileSystem.ts` around lines 28 - 43, The trackWriteRate function currently only logs when writeCount === CACHE_WRITE_RATE_WARN so sustained high rates produce a single warning; change the condition to emit repeated warnings during a high-rate window (for example: if (writeCount % CACHE_WRITE_RATE_WARN === 0) or use a boolean/lastWarnCount flag) so every N additional writes in the same window triggers logger.warn; update the logic that resets writeWindowStart and writeCount to keep behavior per minute and ensure the warning message still references key, CACHE_WRITE_RATE_WARN, writeCount, and the same guidance text.
160-164: Silent eviction of oversized entries may hinder debugging.When an entry exceeds the size limit, it's silently rejected and any existing stale entry is evicted. Consider adding a debug log to help operators understand why certain entries are not being cached.
♻️ Proposed logging addition
if (responseArray.length > CACHE_MAX_ENTRY_SIZE) { + logger.debug?.( + `fs_cache: rejecting oversized entry (${responseArray.length} bytes > ${CACHE_MAX_ENTRY_SIZE} limit), evicting stale key: ${key}`, + ); // Evict any existing entry so stale data doesn't stay pinned on disk. deleteFile(key).catch(() => {}); return; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runtime/caches/fileSystem.ts` around lines 160 - 164, Add a debug/log statement when an entry exceeds CACHE_MAX_ENTRY_SIZE so operators can see evictions: inside the branch that checks if (responseArray.length > CACHE_MAX_ENTRY_SIZE) (near deleteFile(key).catch(() => {})), log the key, the entry size (responseArray.length) and the limit (CACHE_MAX_ENTRY_SIZE) before calling deleteFile; use the module's existing logger (e.g., logger.debug or processLogger.debug) or console.debug if none, and keep the deleteFile(...).catch(() => {}) behavior unchanged.runtime/caches/cacheWriteWorker.ts (1)
98-98: Silent rejection of oversized entries hinders debugging.When a buffer exceeds
CACHE_MAX_ENTRY_SIZE, the worker silently returns without any logging. This makes it difficult to diagnose why certain entries are not being cached.Consider adding a debug/info log to help operators understand cache behavior.
♻️ Proposed logging addition
- if (buffer.length > CACHE_MAX_ENTRY_SIZE) return; + if (buffer.length > CACHE_MAX_ENTRY_SIZE) { + console.debug( + `[cache-write-worker] skipping oversized entry: ${buffer.length} bytes > ${CACHE_MAX_ENTRY_SIZE} limit`, + ); + return; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runtime/caches/cacheWriteWorker.ts` at line 98, The current early-return in cacheWriteWorker (the check using buffer.length > CACHE_MAX_ENTRY_SIZE) silently drops oversized buffers; update that branch to log an informative message (using the existing logger/processLogger) before returning, including the buffer length and CACHE_MAX_ENTRY_SIZE and any available cache key or id (e.g., itemKey or requestId in the surrounding scope) so operators can see why entries were skipped; keep the same early-return behavior but add the debug/info log call immediately before it.runtime/caches/lrucache.ts (1)
75-88: Disk usage warning fires on every observation cycle when threshold is exceeded.The warning inside
lruBytesGauge.addCallbackwill log repeatedly (typically every metrics scrape interval) as long asratio >= LRU_DISK_WARN_RATIO. This could flood logs.Consider debouncing or tracking the last warning time per cache to avoid excessive logging.
♻️ Proposed debounce approach
+const lastWarnTime = new Map<string, number>(); +const WARN_INTERVAL_MS = 60_000; // 1 minute debounce + lruBytesGauge.addCallback((observer) => { for (const [name, lru] of activeCaches) { observer.observe(lru.calculatedSize, { cache: name }); const ratio = lru.calculatedSize / CACHE_MAX_SIZE; if (ratio >= LRU_DISK_WARN_RATIO) { + const now = Date.now(); + const last = lastWarnTime.get(name) ?? 0; + if (now - last < WARN_INTERVAL_MS) continue; + lastWarnTime.set(name, now); logger.warn( `lru_cache: disk usage for cache "${name}" is at ` +🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@runtime/caches/lrucache.ts` around lines 75 - 88, The current callback in lruBytesGauge.addCallback iterates activeCaches and calls logger.warn whenever lru.calculatedSize/CACHE_MAX_SIZE >= LRU_DISK_WARN_RATIO, which floods logs; fix by adding a debounce map (e.g. Map<string, number> lastWarnTs) defined outside the callback and a constant debounce interval (e.g. LRU_WARN_DEBOUNCE_MS), then inside the loop check Date.now() - (lastWarnTs.get(name) || 0) >= LRU_WARN_DEBOUNCE_MS before calling logger.warn and update lastWarnTs.set(name, Date.now()) when you log; keep all other logic (ratio calc, message) the same and reference lruBytesGauge.addCallback, activeCaches, lru.calculatedSize, CACHE_MAX_SIZE, LRU_DISK_WARN_RATIO, and logger.warn.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@runtime/caches/lrucache.ts`:
- Around line 1-4: The file fails formatting checks; run the Deno formatter and
commit the changes (or apply equivalent formatting) so `deno fmt --check`
passes; specifically reformat the import block and surrounding code in the
module that imports LRUCache, ValueType, logger, and meter (the
exports/definitions around LRUCache and any functions or classes in lrucache.ts)
to match Deno's style guidelines and ensure the file is syntactically unchanged
except for formatting.
- Around line 60-67: activeCaches currently holds strong references to every
LRUCache and is never pruned, causing unbounded growth; update the
implementation to remove caches when they are no longer needed by either (a)
providing explicit lifecycle APIs (e.g., registerCache(name, cache: LRUCache)
and unregisterCache(name) and call unregisterCache from where caches are
disposed) or (b) replacing the Map<string, LRUCache> with Map<string,
WeakRef<LRUCache>> plus a FinalizationRegistry to remove entries when an
LRUCache is GC'd, and then change lruSizeGauge.addCallback to dereference
WeakRefs and skip or remove cleared entries before calling observer.observe;
reference activeCaches, LRUCache, lruSizeGauge.addCallback, observer.observe,
and add/register/unregister (or FinalizationRegistry) logic accordingly.
- Around line 41-44: The dispose callback for the LRU must be synchronous:
remove the async keyword from the dispose function on the LRU options (the
dispose handler currently declared as dispose: async (_value: boolean, key:
string, reason: string) => { ... }) and remove the await before
cache.delete(key) so the function returns void; keep the
lruEvictionCounter.add(1, { reason }) call but perform synchronous
cache.delete(key) (or defer any asynchronous cleanup to disposeAfter) to match
the required signature and allowed DisposeReason values.
---
Nitpick comments:
In `@runtime/caches/cacheWriteWorker.ts`:
- Line 98: The current early-return in cacheWriteWorker (the check using
buffer.length > CACHE_MAX_ENTRY_SIZE) silently drops oversized buffers; update
that branch to log an informative message (using the existing
logger/processLogger) before returning, including the buffer length and
CACHE_MAX_ENTRY_SIZE and any available cache key or id (e.g., itemKey or
requestId in the surrounding scope) so operators can see why entries were
skipped; keep the same early-return behavior but add the debug/info log call
immediately before it.
In `@runtime/caches/fileSystem.ts`:
- Around line 171-178: The empty catch swallowing errors from the Deno.mkdir
call should instead log the caught error at debug (or trace) level so transient
failures are visible while preserving retry behavior; inside the catch for the
Deno.mkdir(...) call that currently leaves initializedShardDirs unchanged,
capture the exception and call the module/logger debug method with a clear
message and the error details (e.g., referencing initializedShardDirs and the
target dir) before returning so the directory is not marked initialized and
subsequent writes will retry.
- Around line 188-193: When getFile finds a cached entry larger than
CACHE_MAX_ENTRY_SIZE it currently deletes filePath and returns null silently;
add a warning-level log before removal that includes the cache key, filePath and
the fileContent.length to aid debugging. Locate the logic in getFile (uses
FILE_SYSTEM_CACHE_DIRECTORY, key, filePath, fileContent) and call the module's
existing logger (e.g., logger or processLogger) to emit a concise message about
the oversized entry, then proceed to remove the file and return null.
- Around line 28-43: The trackWriteRate function currently only logs when
writeCount === CACHE_WRITE_RATE_WARN so sustained high rates produce a single
warning; change the condition to emit repeated warnings during a high-rate
window (for example: if (writeCount % CACHE_WRITE_RATE_WARN === 0) or use a
boolean/lastWarnCount flag) so every N additional writes in the same window
triggers logger.warn; update the logic that resets writeWindowStart and
writeCount to keep behavior per minute and ensure the warning message still
references key, CACHE_WRITE_RATE_WARN, writeCount, and the same guidance text.
- Around line 160-164: Add a debug/log statement when an entry exceeds
CACHE_MAX_ENTRY_SIZE so operators can see evictions: inside the branch that
checks if (responseArray.length > CACHE_MAX_ENTRY_SIZE) (near
deleteFile(key).catch(() => {})), log the key, the entry size
(responseArray.length) and the limit (CACHE_MAX_ENTRY_SIZE) before calling
deleteFile; use the module's existing logger (e.g., logger.debug or
processLogger.debug) or console.debug if none, and keep the
deleteFile(...).catch(() => {}) behavior unchanged.
In `@runtime/caches/lrucache.ts`:
- Around line 75-88: The current callback in lruBytesGauge.addCallback iterates
activeCaches and calls logger.warn whenever lru.calculatedSize/CACHE_MAX_SIZE >=
LRU_DISK_WARN_RATIO, which floods logs; fix by adding a debounce map (e.g.
Map<string, number> lastWarnTs) defined outside the callback and a constant
debounce interval (e.g. LRU_WARN_DEBOUNCE_MS), then inside the loop check
Date.now() - (lastWarnTs.get(name) || 0) >= LRU_WARN_DEBOUNCE_MS before calling
logger.warn and update lastWarnTs.set(name, Date.now()) when you log; keep all
other logic (ratio calc, message) the same and reference
lruBytesGauge.addCallback, activeCaches, lru.calculatedSize, CACHE_MAX_SIZE,
LRU_DISK_WARN_RATIO, and logger.warn.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2c891ec1-4da0-46e8-9668-d1730314c73e
📒 Files selected for processing (3)
runtime/caches/cacheWriteWorker.tsruntime/caches/fileSystem.tsruntime/caches/lrucache.ts
| import { LRUCache } from "npm:lru-cache@10.2.0"; | ||
| import { ValueType } from "../../deps.ts"; | ||
| import { logger } from "../../observability/otel/config.ts"; | ||
| import { meter } from "../../observability/otel/metrics.ts"; |
There was a problem hiding this comment.
Address pipeline failure: file not formatted.
The CI pipeline reports that this file fails deno fmt --check. Please run deno fmt to fix formatting.
🧰 Tools
🪛 GitHub Actions: ci
[error] 1-1: deno fmt --check found not formatted file: runtime/caches/lrucache.ts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@runtime/caches/lrucache.ts` around lines 1 - 4, The file fails formatting
checks; run the Deno formatter and commit the changes (or apply equivalent
formatting) so `deno fmt --check` passes; specifically reformat the import block
and surrounding code in the module that imports LRUCache, ValueType, logger, and
meter (the exports/definitions around LRUCache and any functions or classes in
lrucache.ts) to match Deno's style guidelines and ensure the file is
syntactically unchanged except for formatting.
| dispose: async (_value: boolean, key: string, reason: string) => { | ||
| lruEvictionCounter.add(1, { reason }); | ||
| await cache.delete(key); | ||
| }, |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
lru-cache npm dispose callback signature reason parameter
💡 Result:
The dispose callback in the lru-cache npm package (current version 11.2.7 as of 2026-03-13) has the signature (value: V, key: K, reason: DisposeReason) => void. It is provided via the dispose or disposeAfter options when creating an LRUCache instance. The reason parameter is a string literal union type DisposeReason with the following possible values: - 'evict': Item evicted because it is the least recently used and the cache is full. - 'set': A new value was set, overwriting the old value. - 'delete': Item explicitly deleted via delete, clear, or set with undefined. - 'expire': Item removed due to exceeding its TTL. - 'fetch': A fetchMethod operation returned undefined or was aborted. Example usage: const cache = new LRUCache({ max: 500, dispose: (value, key, reason) => { console.log(Disposing ${key} (${reason}):, value); // cleanup logic here } }); Note: The dispose callback is called before the item is fully removed from the cache. Use disposeAfter for post-removal cleanup. In older versions (< v7), the signature was (key, value) without reason.
Citations:
- 1: https://isaacs.github.io/node-lru-cache/types/LRUCache.Disposer.html
- 2: https://www.npmjs.com/package/lru-cache
- 3: https://isaacs.github.io/node-lru-cache/interfaces/LRUCache.OptionsBase.html
- 4: https://github.com/isaacs/node-lru-cache/blob/main/src/index.ts
Fix the dispose callback signature: it must be synchronous, not async.
The lru-cache library (v10.2.0 and later) expects the dispose callback to have the signature (value: V, key: K, reason: DisposeReason) => void where reason is one of 'evict', 'set', 'delete', or 'expire'. The current implementation incorrectly marks the callback as async. Remove the async keyword and the await on cache.delete(key) to match the expected synchronous signature. If post-removal cleanup is needed, use the disposeAfter option instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@runtime/caches/lrucache.ts` around lines 41 - 44, The dispose callback for
the LRU must be synchronous: remove the async keyword from the dispose function
on the LRU options (the dispose handler currently declared as dispose: async
(_value: boolean, key: string, reason: string) => { ... }) and remove the await
before cache.delete(key) so the function returns void; keep the
lruEvictionCounter.add(1, { reason }) call but perform synchronous
cache.delete(key) (or defer any asynchronous cleanup to disposeAfter) to match
the required signature and allowed DisposeReason values.
LRU eviction counter, size/key gauges, and disk fill warning now ship in the correctness+observability PR for earlier visibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="runtime/caches/lrucache.ts">
<violation number="1">
P2: This drops the eviction counter entirely, so LRU evictions and their reasons are no longer observable.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Summary
{key[0:2]}/{key[2:4]}/{key}) to avoid flat dirs with 10k+ filesCACHE_MAX_ENTRY_SIZE(2MB default) at filesystem and worker layersCACHE_WRITE_RATE_WARN, default 500/min)LRU_DISK_WARN_RATIO(90%)Test plan
/tmp/deco_cache/ab/cd/abcd...)lru_cache_keys,lru_cache_bytes,lru_cache_evictionmetrics🤖 Generated with Claude Code
PR 2 of 5 — split from #1122. Merge order: 1 → 2 → 3 → 4 → 5
Summary by cubic
Hardens the filesystem cache with hex directory sharding, entry size limits, and write-rate warnings. This reduces directory hot spots and blocks oversized entries from hitting disk.
New Features
{key[0:2]}/{key[2:4]}/{key}to avoid huge flat dirs.CACHE_MAX_ENTRY_SIZE(2MB default) in worker and FS; rejects oversize writes and evicts any stale oversize file on read.CACHE_WRITE_RATE_WARN(default 500/min).Migration
CACHE_MAX_ENTRY_SIZE,CACHE_WRITE_RATE_WARN. Defaults are safe.Written for commit 947e388. Summary will update on new commits.
Summary by CodeRabbit
New Features
Performance