feat(daemon): opt-in S3 cache restore + snapshot for sandbox runner by nicacioliveira · Pull Request #1182 · deco-cx/deco

nicacioliveira · 2026-05-05T01:43:25Z

Summary

Sandbox cold start today re-downloads ~200MB of Deno modules every wake-up because /deno-dir is on the pod's writable layer (lost between claims). On a typical Storefront site this is the largest single chunk in the 12-15s wake-up budget.

This patch caches /deno-dir in S3 per site (one cache per repo, shared across envs of the same site since deps are identical):

s3://<bucket>/<site>/cache.tgz

Relationship to the existing `daemon/cache.ts` (`downloadCache`)

There is already an S3 cache flow for sandbox mode:

// daemon/main.ts:312
if (SANDBOX_MODE) {
  await downloadCache(siteName).catch(...)
}

It downloads cache.tar.zst from s3://deco-assets-storage/deco-sites/<site>/ (sa-east-1, static ADMIN_S3_* keys) into DENO_DIR_RUN. The cache is produced by an external build pipeline (decocms/platform-infra/site-builder), not by the runner — so it only updates when the build pipeline runs, not after a runtime session adds new modules.

This PR is complementary, not a replacement:

	existing `downloadCache`	this PR `restoreCache`/`snapshotCache`
Producer	external build pipeline	the runner itself, post-boot
Direction	download only	download + upload
Bucket	`deco-assets-storage` (sa-east-1)	configurable via deploy body (us-west-2 in prod)
Auth	static `ADMIN_S3_*` keys	Pod Identity (default credential chain)
Target	`DENO_DIR_RUN` (worker)	`/deno-dir` (daemon)
Update cadence	per build	per wake-up

A future PR can consolidate (move the build-pipeline-produced cache to the same us-west-2 bucket and Pod Identity, retire the sa-east-1 + static keys path). Out of scope here.

How activation works (this PR)

Config arrives via the /sandbox/deploy POST body (envs field) — not via container env vars. The agent-sandbox operator (v0.4.x) rejects per-claim env when a warm pool is bound to the claim:

// extensions/controllers/sandboxclaim_controller.go
if policy != WarmPoolPolicyNone && len(claim.Spec.Env) > 0 {
    err := fmt.Errorf("custom environment variables are not supported when using a warm pool")

So the admin forwards cache hints through the deploy body, and this PR consumes them.

Hint env vars (all optional, all forwarded by admin via the deploy body):

DECO_CACHE_S3_BUCKET (presence enables the feature)
DECO_CACHE_S3_REGION (default us-west-2)
DECO_CACHE_S3_KEY (admin sets to <site>/cache.tgz)

Lifecycle in the runner

onDeploy:
  setSiteName(site)
  await restoreCache(cfg)            ← BLOCKS deploy: fetch tarball, untar to /deno-dir
  // existing downloadCache(siteName) still runs unchanged after this
  createSiteApp({...})
  currentSite.gitReady
    .then(() => snapshotCache(cfg))   ← DETACHED: tar /deno-dir + upload to S3

Rollout-safe

Off by default. When the deploy body has no cache hints (DECO_CACHE_S3_BUCKET unset on the admin pod), isEnabled(cfg) is false and both restoreCache and snapshotCache return { ok: false, reason: "disabled" } immediately. Zero behaviour change vs. main.
Cache is advisory. Every error path logs and returns a no-op result — missing key on first wake-up of a site (NoSuchKey is normal, logged at info level), S3 outage, malformed tarball, tar process failure. The wake-up may be slower, never broken.
Auth via standard AWS SDK credential chain → Pod Identity in EKS supplies credentials transparently. No static keys baked in for this path.

Files

daemon/sandboxCache.ts (new): cacheConfigFromEnvs, restoreCache, snapshotCache. All wrapped in try/catch.
daemon/main.ts: two hooks in the SANDBOX_MODE onDeploy — restore before createSiteApp, snapshot detached after gitReady. The existing downloadCache(siteName) invocation a few lines below is left intact.

Companion PRs

decocms/terraform-eks-cluster#5 — bucket IAM + Pod Identity associations
deco-sites/admin#3098 — admin forwards DECO_CACHE_S3_* in deploy body + S3 cleanup on site delete
deco-sites/admin#3096 — wake-up splash (independent of this work)

Future work (out of scope)

Consolidate the legacy daemon/cache.ts downloadCache into the same path (us-west-2 bucket, Pod Identity, switchable bucket via env). The legacy flow is still useful as a "build-pipeline-produced" cache, but the auth + region should match this PR for consistency.
Decide whether to spin up a separate sandbox-runner image (Dockerfile + workflow) tailored just for sandbox mode — currently the runner image (ghcr.io/deco-cx/deco/ai) carries non-sandbox modes too.

Test plan

First wake-up of a site (no cache exists): runner logs [sandbox-cache] no cache yet for s3://... (cold start), deploy completes normally, then logs [sandbox-cache] snapshotted YMB → s3://....
Second wake-up after snapshot: runner logs [sandbox-cache] restored cache from s3://..., deno cache resolution noticeably faster.
No DECO_CACHE_S3_BUCKET in deploy body: disabled returned, no S3 calls. Existing behaviour preserved.
S3 outage during restore: restore failed warning, deploy still completes via cold-start path.
S3 outage during snapshot: snapshot failed warning, deploy result already returned to admin (detached).

🤖 Generated with Claude Code

Summary by cubic

Adds opt-in S3-backed restore and snapshot of the Deno cache to cut sandbox cold starts by reusing /deno-dir (saves ~5s on typical sites). Restores on deploy, snapshots after startup; off by default and never blocks or breaks deploys.

New Features
- Restore: download s3://<bucket>/<site>/cache.tgz into /deno-dir during /sandbox/deploy; awaited before worker starts.
- Snapshot: tar and upload /deno-dir after gitReady; detached from the deploy response.
- Enable via deploy body envs: DECO_CACHE_S3_BUCKET (enables), DECO_CACHE_S3_REGION (default us-west-2), DECO_CACHE_S3_KEY (e.g., <site>/cache.tgz).
- Auth via AWS default credential chain (Pod Identity). No static keys.
- Works alongside existing downloadCache; that flow is unchanged.
- Advisory behavior: errors are logged and ignored; cold start path remains.

^{Written for commit c6e44db. Summary will update on new commits.}

Summary by CodeRabbit

New Features
- Added Deno module caching integration for sandbox deployments, automatically restoring cached modules during initialization and capturing snapshots after deployment completes for improved build efficiency.

Sandbox cold start today re-downloads ~200MB of Deno modules every wake-up because /deno-dir is on the pod's writable layer (lost between claims). On a typical Storefront site this is the largest single chunk in the 12-15s wake-up budget (~5s). This patch caches /deno-dir in S3 per site (one cache per repo, shared across envs of the same site since deps are identical): s3://<bucket>/<site>/cache.tgz Config arrives via the /sandbox/deploy POST body (envs field). The admin forwards the cache hints there because the agent-sandbox operator (v0.4.x) rejects per-claim env when a warm pool is bound to the claim: Error seen: custom environment variables are not supported when using a warm pool — kubernetes-sigs/agent-sandbox extensions/controllers Hint env vars (all optional, all forwarded by admin via deploy body): - DECO_CACHE_S3_BUCKET (presence enables the feature) - DECO_CACHE_S3_REGION (default: us-west-2) - DECO_CACHE_S3_KEY (admin sets to "<site>/cache.tgz") Lifecycle: - restoreCache(cfg) runs at the top of onDeploy, awaited so /deno-dir is populated before the worker subprocess starts requiring modules. - snapshotCache(cfg) fires after gitReady, detached from the deploy response — the user-visible deploy never waits on the upload. Reliability invariant: the cache is *advisory*. Every error path logs and returns a no-op result. A missing key (first wake-up of a site), S3 outage, malformed tarball, tar process failure — all fall back to the cold-start path. The wake-up may be slower, never broken. Auth via standard AWS SDK credential chain → Pod Identity in EKS provides credentials transparently (no static keys baked in). Companion changes (separate PRs): - decocms/terraform-eks-cluster: bucket + IAM + Pod Identity - deco-sites/admin: forward DECO_CACHE_S3_* in /sandbox/deploy body Co-Authored-By: Nicacio Oliveira <nicacio@deco.cx>

github-actions · 2026-05-05T01:43:36Z

Tagging Options

Should a new tag be published when this PR is merged?

👍 for Patch 1.197.1 update
🎉 for Minor 1.198.0 update
🚀 for Major 2.0.0 update

coderabbitai · 2026-05-05T01:43:37Z

📝 Walkthrough

Walkthrough

A new S3-backed cache module is added to store and restore Deno module dependencies for sandbox deploys. The daemon integrates this cache by restoring modules at deploy start and snapshotting them afterward, with graceful error handling throughout.

Changes

Deno Module Cache Integration

Layer / File(s)	Summary
Cache Infrastructure `daemon/sandboxCache.ts`	Defines `CacheConfig` and `CacheResult` types. Implements `cacheConfigFromEnvs()` to extract S3 bucket, region, and key from environment variables. Adds region-scoped S3 client caching to avoid re-instantiation.
Cache Operations `daemon/sandboxCache.ts`	`restoreCache()` downloads `cache.tgz` from S3 and streams it into `tar -xzf -` to unpack into `/`. `snapshotCache()` creates a gzipped tarball of `/deno-dir`, buffers it in memory, and uploads to S3 via `PutObjectCommand`. Both operations return structured results with timing and reason; failures are logged but not thrown.
Deploy Lifecycle Integration `daemon/main.ts`	Imports cache helpers and computes `CacheConfig` from deploy envs. Calls `restoreCache()` synchronously before deploy proceeds. After site's git-ready phase, triggers `snapshotCache()` asynchronously without blocking the deploy response.

Sequence Diagram

sequenceDiagram
    participant Deploy as Deploy Lifecycle
    participant Cache as Cache Module
    participant S3 as S3 Storage
    participant Tar as Tar Process
    participant FS as Filesystem<br>(/deno-dir)

    Deploy->>Cache: restoreCache(config)
    Cache->>S3: GetObjectCommand(bucket, key)
    S3-->>Cache: cache.tgz stream
    Cache->>Tar: pipe stream → tar -xzf -
    Tar->>FS: extract to /
    Tar-->>Cache: exit status
    Cache-->>Deploy: CacheResult {ok, ms}

    Deploy->>Deploy: Deploy proceeds

    Deploy->>Deploy: git-ready resolves

    Deploy->>Cache: snapshotCache(config) [async]
    Cache->>FS: Deno.stat(/deno-dir)
    FS-->>Cache: directory exists
    Cache->>Tar: tar -czf - /deno-dir
    Tar-->>Cache: gzipped archive (buffered)
    Cache->>S3: PutObjectCommand(bucket, key, archive)
    S3-->>Cache: upload complete
    Cache-->>Deploy: CacheResult {ok, ms} [fire & forget]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

deco-cx/deco#1061: Implements the same sandbox Deno-dir S3 cache restore functionality with a non-fatal cache restore step in the deploy lifecycle.

Poem

🐰 A warren of modules, cached in the cloud,
Tar-zipped and shipped with no fuss, no crowd,
S3 keeps the bundles, we fetch them with speed,
No errors will break us—we plant the seed.
Swift deploys, swift snapshots, the rabbit's delight! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(daemon): opt-in S3 cache restore + snapshot for sandbox runner' accurately summarizes the main change: adding optional S3-backed caching for Deno module restoration and snapshots in the sandbox runner.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/sandbox-cache-restore

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@daemon/main.ts`:
- Around line 623-629: The snapshot is being triggered too early from
currentSite.gitReady which can cause uploading a cold/incomplete /deno-dir; move
the call to snapshotCache(cacheCfg) off the gitReady promise and instead invoke
it only after runtime dependencies are fully initialized (e.g., after the first
successful worker/module resolution or a deps-ready signal). Concretely, remove
or stop using currentSite.gitReady.then(...snapshotCache...) and wire
snapshotCache(cacheCfg) to the lifecycle event that represents "deps fully
initialized" (add or use an existing currentSite.depsReady, worker.initComplete,
or the first successful module resolution hook) so the snapshot runs only once
dependencies and manifest/block generation have completed; keep snapshotCache
detached and swallow errors as before.

In `@daemon/sandboxCache.ts`:
- Around line 95-100: The spawned Deno.Command instances for "tar" (the variable
tar used in restore and snapshot) set stderr to "piped" but only read it after
await tar.status(), which can deadlock if stderr fills; fix by draining
tar.stderr concurrently (e.g., spawn an async task that reads/consumes
tar.stderr.readable or pipes it to /dev/null or processLogger) immediately after
.spawn() so stderr cannot block the child, and apply the same change to the
other tar invocations noted (the Deno.Command calls around the regions
referenced).
- Around line 80-87: The restoreCache function can hang because S3 download and
untar have no timeout; add a hard timeout (e.g. using AbortController with
setTimeout or Promise.race) around the S3 client(region).send(new
GetObjectCommand(...)) and the subsequent untar/stream processing so the whole
restore path aborts after a configured max (e.g. 30s); when timeout triggers
abort the S3 request, kill/cleanup any tar streams, and return { ok: false, ms:
elapsed, reason: "timeout" } from restoreCache (update the code paths around
restoreCache, GetObjectCommand handling and the untar/pipeline section
referenced at lines ~102-107 to respect the abort signal and perform cleanup).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1739163f-da8f-417b-8738-e21ebfe034f4

📥 Commits

Reviewing files that changed from the base of the PR and between 2666148 and c6e44db.

📒 Files selected for processing (2)

daemon/main.ts
daemon/sandboxCache.ts

coderabbitai · 2026-05-05T01:46:51Z

+      // Best-effort cache snapshot — fires after the worker is up so
+      // /deno-dir reflects every module the site actually needed.
+      // Detached: never blocks the deploy response. Failures are
+      // logged and swallowed inside snapshotCache().
+      currentSite.gitReady
+        .then(() => snapshotCache(cacheCfg))
+        .catch(() => {/* never throws */});


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Snapshot trigger is too early; it can upload a cold/incomplete /deno-dir.

This hook runs after gitReady, which resolves before manifest/block generation and before runtime dependency warmup. You can end up overwriting S3 with a mostly empty cache on cold deploys.

Please trigger snapshot from a “deps fully initialized” signal (or after first successful worker/module resolution), not from gitReady.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@daemon/main.ts` around lines 623 - 629, The snapshot is being triggered too early from currentSite.gitReady which can cause uploading a cold/incomplete /deno-dir; move the call to snapshotCache(cacheCfg) off the gitReady promise and instead invoke it only after runtime dependencies are fully initialized (e.g., after the first successful worker/module resolution or a deps-ready signal). Concretely, remove or stop using currentSite.gitReady.then(...snapshotCache...) and wire snapshotCache(cacheCfg) to the lifecycle event that represents "deps fully initialized" (add or use an existing currentSite.depsReady, worker.initComplete, or the first successful module resolution hook) so the snapshot runs only once dependencies and manifest/block generation have completed; keep snapshotCache detached and swallow errors as before.

coderabbitai · 2026-05-05T01:46:51Z

+export async function restoreCache(cfg: CacheConfig): Promise<CacheResult> {
+  if (!isEnabled(cfg)) return { ok: false, ms: 0, reason: "disabled" };
+  const { bucket, key, region = "us-west-2" } = cfg;
+  const start = performance.now();
+  try {
+    const resp = await client(region).send(
+      new GetObjectCommand({ Bucket: bucket, Key: key }),
+    );


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a hard timeout to restore to avoid blocking deploy startup.

restoreCache is awaited in deploy startup, but the S3 fetch/untar path has no timeout. A stalled S3 stream or tar process can hang deploy indefinitely instead of falling back to cold start.

Proposed change

export async function restoreCache(cfg: CacheConfig): Promise<CacheResult> { if (!isEnabled(cfg)) return { ok: false, ms: 0, reason: "disabled" }; const { bucket, key, region = "us-west-2" } = cfg; const start = performance.now(); + const timeoutMs = 15_000; try { + const abortSignal = AbortSignal.timeout(timeoutMs); const resp = await client(region).send( new GetObjectCommand({ Bucket: bucket, Key: key }), + { abortSignal }, ); @@ - await stream.pipeTo(tar.stdin); - const status = await tar.status; + await Promise.race([ + stream.pipeTo(tar.stdin), + new Promise((_, reject) => + setTimeout(() => { + tar.kill("SIGKILL"); + reject(new Error(`restore timeout after ${timeoutMs}ms`)); + }, timeoutMs) + ), + ]); + const status = await tar.status;

Also applies to: 102-107

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@daemon/sandboxCache.ts` around lines 80 - 87, The restoreCache function can hang because S3 download and untar have no timeout; add a hard timeout (e.g. using AbortController with setTimeout or Promise.race) around the S3 client(region).send(new GetObjectCommand(...)) and the subsequent untar/stream processing so the whole restore path aborts after a configured max (e.g. 30s); when timeout triggers abort the S3 request, kill/cleanup any tar streams, and return { ok: false, ms: elapsed, reason: "timeout" } from restoreCache (update the code paths around restoreCache, GetObjectCommand handling and the untar/pipeline section referenced at lines ~102-107 to respect the abort signal and perform cleanup).

coderabbitai · 2026-05-05T01:46:51Z

+    const tar = new Deno.Command("tar", {
+      args: ["-xzf", "-", "-C", "/"],
+      stdin: "piped",
+      stdout: "null",
+      stderr: "piped",
+    }).spawn();


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drain tar.stderr concurrently to prevent child-process deadlock.

In both restore and snapshot, stderr is piped but only read after waiting for process completion. If stderr output exceeds pipe capacity, the process can block before exit.

Proposed change

const tar = new Deno.Command("tar", { @@ stderr: "piped", }).spawn(); + const stderrText = new Response(tar.stderr).text().catch(() => ""); @@ - const status = await tar.status; + const status = await tar.status; @@ - const stderr = await new Response(tar.stderr).text().catch(() => ""); + const stderr = await stderrText;

const tar = new Deno.Command("tar", { @@ stderr: "piped", }).spawn(); + const stderrText = new Response(tar.stderr).text().catch(() => ""); const body = new Uint8Array(await new Response(tar.stdout).arrayBuffer()); const status = await tar.status; if (!status.success) { - const stderr = await new Response(tar.stderr).text().catch(() => ""); + const stderr = await stderrText;

Also applies to: 106-110, 158-162, 165-168

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@daemon/sandboxCache.ts` around lines 95 - 100, The spawned Deno.Command instances for "tar" (the variable tar used in restore and snapshot) set stderr to "piped" but only read it after await tar.status(), which can deadlock if stderr fills; fix by draining tar.stderr concurrently (e.g., spawn an async task that reads/consumes tar.stderr.readable or pipes it to /dev/null or processLogger) immediately after .spawn() so stderr cannot block the child, and apply the same change to the other tar invocations noted (the Deno.Command calls around the regions referenced).

cubic-dev-ai

5 issues found across 2 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="daemon/sandboxCache.ts">

<violation number="1" location="daemon/sandboxCache.ts:105">
P1: Potential child-process deadlock: `stderr` is set to `"piped"` but is only consumed after `await tar.status`. If tar writes more than the OS pipe buffer capacity (~64KB) to stderr, it will block waiting for a reader, and `tar.status` will never resolve. Start draining stderr concurrently before awaiting status:
```ts
const stderrText = new Response(tar.stderr).text().catch(() => "");
await stream.pipeTo(tar.stdin);
const status = await tar.status;
// then: const stderr = await stderrText;
```</violation>

<violation number="2" location="daemon/sandboxCache.ts:154">
P2: The empty-cache guard is incomplete: this only checks that `DENO_DIR` exists, so an empty directory can still be snapshotted and overwrite a valid cache in S3.</violation>

<violation number="3" location="daemon/sandboxCache.ts:164">
P1: `stderr` is piped but not drained until after `await tar.status`; this can block the tar process on a full stderr pipe and hang snapshotting.</violation>
</file>

<file name="daemon/main.ts">

<violation number="1" location="daemon/main.ts:587">
P1: `restoreCache` is awaited on the deploy critical path but has no timeout. A stalled S3 stream or hung `tar` process will block deploy startup indefinitely instead of falling back to the cold-start path. Add an `AbortSignal.timeout()` to the S3 request and a timeout/race around `stream.pipeTo(tar.stdin)` to ensure the deploy proceeds within a bounded time.</violation>

<violation number="2" location="daemon/main.ts:628">
P2: `snapshotCache` is scheduled on `gitReady`, which resolves right after git clone (before startup/cache resolution). This can upload a premature cache snapshot that misses dependencies needed by the running site.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-05-05T01:51:45Z

+    const body = new Uint8Array(await new Response(tar.stdout).arrayBuffer());
+    const status = await tar.status;
+    if (!status.success) {
+      const stderr = await new Response(tar.stderr).text().catch(() => "");


P1: stderr is piped but not drained until after await tar.status; this can block the tar process on a full stderr pipe and hang snapshotting.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At daemon/sandboxCache.ts, line 164: <comment>`stderr` is piped but not drained until after `await tar.status`; this can block the tar process on a full stderr pipe and hang snapshotting.</comment> <file context> @@ -0,0 +1,198 @@ + stderr: "piped", + }).spawn(); + + const body = new Uint8Array(await new Response(tar.stdout).arrayBuffer()); + const status = await tar.status; + if (!status.success) { </file context>

Suggested change

const body = new Uint8Array(await new Response(tar.stdout).arrayBuffer());

const status = await tar.status;

if (!status.success) {

const stderr = await new Response(tar.stderr).text().catch(() => "");

const stderrPromise = new Response(tar.stderr).text().catch(() => "");

const body = new Uint8Array(await new Response(tar.stdout).arrayBuffer());

const status = await tar.status;

const stderr = await stderrPromise;

if (!status.success) {

cubic-dev-ai · 2026-05-05T01:51:45Z

+    const stream =
+      (body as { transformToWebStream: () => ReadableStream<Uint8Array> })
+        .transformToWebStream();
+    await stream.pipeTo(tar.stdin);


P1: Potential child-process deadlock: stderr is set to "piped" but is only consumed after await tar.status. If tar writes more than the OS pipe buffer capacity (~64KB) to stderr, it will block waiting for a reader, and tar.status will never resolve. Start draining stderr concurrently before awaiting status:

const stderrText = new Response(tar.stderr).text().catch(() => ""); await stream.pipeTo(tar.stdin); const status = await tar.status; // then: const stderr = await stderrText;

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At daemon/sandboxCache.ts, line 105: <comment>Potential child-process deadlock: `stderr` is set to `"piped"` but is only consumed after `await tar.status`. If tar writes more than the OS pipe buffer capacity (~64KB) to stderr, it will block waiting for a reader, and `tar.status` will never resolve. Start draining stderr concurrently before awaiting status: ```ts const stderrText = new Response(tar.stderr).text().catch(() => ""); await stream.pipeTo(tar.stdin); const status = await tar.status; // then: const stderr = await stderrText; ```</comment> <file context> @@ -0,0 +1,198 @@ + const stream = + (body as { transformToWebStream: () => ReadableStream<Uint8Array> }) + .transformToWebStream(); + await stream.pipeTo(tar.stdin); + const status = await tar.status; + </file context>

cubic-dev-ai · 2026-05-05T01:51:45Z

+      // because we want it done before the worker subprocess starts
+      // requesting modules.
+      const cacheCfg = cacheConfigFromEnvs(envs);
+      await restoreCache(cacheCfg).catch(() => {/* never throws */});


P1: restoreCache is awaited on the deploy critical path but has no timeout. A stalled S3 stream or hung tar process will block deploy startup indefinitely instead of falling back to the cold-start path. Add an AbortSignal.timeout() to the S3 request and a timeout/race around stream.pipeTo(tar.stdin) to ensure the deploy proceeds within a bounded time.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At daemon/main.ts, line 587: <comment>`restoreCache` is awaited on the deploy critical path but has no timeout. A stalled S3 stream or hung `tar` process will block deploy startup indefinitely instead of falling back to the cold-start path. Add an `AbortSignal.timeout()` to the S3 request and a timeout/race around `stream.pipeTo(tar.stdin)` to ensure the deploy proceeds within a bounded time.</comment> <file context> @@ -570,6 +575,17 @@ if (SANDBOX_MODE) { + // because we want it done before the worker subprocess starts + // requesting modules. + const cacheCfg = cacheConfigFromEnvs(envs); + await restoreCache(cacheCfg).catch(() => {/* never throws */}); + // Use run command from deploy request. </file context>

cubic-dev-ai · 2026-05-05T01:51:45Z

+    if (!stat?.isDirectory) {
+      return { ok: false, ms: 0, reason: "no deno-dir" };
+    }


P2: The empty-cache guard is incomplete: this only checks that DENO_DIR exists, so an empty directory can still be snapshotted and overwrite a valid cache in S3.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At daemon/sandboxCache.ts, line 154: <comment>The empty-cache guard is incomplete: this only checks that `DENO_DIR` exists, so an empty directory can still be snapshotted and overwrite a valid cache in S3.</comment> <file context> @@ -0,0 +1,198 @@ + try { + // Skip if /deno-dir doesn't exist or is empty + const stat = await Deno.stat(DENO_DIR).catch(() => null); + if (!stat?.isDirectory) { + return { ok: false, ms: 0, reason: "no deno-dir" }; + } </file context>

Suggested change

if (!stat?.isDirectory) {

return { ok: false, ms: 0, reason: "no deno-dir" };

}

if (!stat?.isDirectory) {

return { ok: false, ms: 0, reason: "no deno-dir" };

}

let hasEntries = false;

for await (const _entry of Deno.readDir(DENO_DIR)) {

hasEntries = true;

break;

}

if (!hasEntries) {

return { ok: false, ms: 0, reason: "empty deno-dir" };

}

cubic-dev-ai · 2026-05-05T01:51:45Z

+      // Detached: never blocks the deploy response. Failures are
+      // logged and swallowed inside snapshotCache().
+      currentSite.gitReady
+        .then(() => snapshotCache(cacheCfg))


P2: snapshotCache is scheduled on gitReady, which resolves right after git clone (before startup/cache resolution). This can upload a premature cache snapshot that misses dependencies needed by the running site.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At daemon/main.ts, line 628: <comment>`snapshotCache` is scheduled on `gitReady`, which resolves right after git clone (before startup/cache resolution). This can upload a premature cache snapshot that misses dependencies needed by the running site.</comment> <file context> @@ -604,6 +620,14 @@ if (SANDBOX_MODE) { + // Detached: never blocks the deploy response. Failures are + // logged and swallowed inside snapshotCache(). + currentSite.gitReady + .then(() => snapshotCache(cacheCfg)) + .catch(() => {/* never throws */}); + </file context>

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

cubic-dev-ai Bot reviewed May 5, 2026

View reviewed changes

-    if (!stat?.isDirectory) {
-      return { ok: false, ms: 0, reason: "no deno-dir" };
-    }
+    if (!stat?.isDirectory) {
+      return { ok: false, ms: 0, reason: "no deno-dir" };
+    }
+    let hasEntries = false;
+    for await (const _entry of Deno.readDir(DENO_DIR)) {
+      hasEntries = true;
+      break;
+    }
+    if (!hasEntries) {
+      return { ok: false, ms: 0, reason: "empty deno-dir" };
+    }

Conversation

nicacioliveira commented May 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relationship to the existing daemon/cache.ts (downloadCache)

How activation works (this PR)

Lifecycle in the runner

Rollout-safe

Files

Companion PRs

Future work (out of scope)

Test plan

Summary by cubic

Summary by CodeRabbit

Uh oh!

github-actions Bot commented May 5, 2026

Tagging Options

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nicacioliveira commented May 5, 2026 •

edited by coderabbitai Bot

Loading

Relationship to the existing `daemon/cache.ts` (`downloadCache`)

coderabbitai Bot commented May 5, 2026 •

edited

Loading