
Add resources config to Daytona sandbox creation, fix OOM on npm install (fixes #40) #43

Merged
alanzabihi merged 1 commit into main from fix-sandbox-memory, Jul 2, 2026


@alanzabihi


Summary

The declarative image build in examples/node22-bookworm-computer-use-image.ts never set resources (cpu/memory/disk) on sandbox creation, so the sandbox got the Daytona platform default. Empirically, that default is a 1GiB cgroup memory limit, which reliably gets a full npm install of AutoBrin-flue's dependency tree OOM-killed during sandbox bootstrap (bootstrapAutobrinFlue(), src/daytona/bootstrap.ts), before any modality-specific work starts.

  • AutobrinContenderConfig (src/contenders/autobrin.ts) gains an optional resources?: Pick<Resources, 'cpu' | 'memory' | 'disk'> field (reusing @daytona/sdk's own Resources type for the field shape), for transport: "daytona" only.
  • src/daytona/launcher.ts: DaytonaRunOptions gains resources?: Resources, applied only on the "image" sandbox-creation branch. This file was not in the issue's suggested 3-file list, but it is the only code path connecting AutobrinContenderConfig to createSandbox(); runDaytonaEngagement's own options never had a resources field to forward, so the setting couldn't be wired through without touching it.
  • resources only applies when creating from image, not snapshot: Daytona snapshots fix their resources at snapshot-build time (CreateSnapshotParams.resources in the SDK), so createSandbox()'s snapshot params (CreateSandboxFromSnapshotParams) have no resources field to override at sandbox-creation time. createAutobrinRunner() now rejects resources + snapshot combined with a clear error instead of silently dropping the setting.
  • src/daytona/client.ts needed no code change: createSandbox() already generically forwards whatever SandboxCreateInput it's given (confirmed by reading the SDK's public CreateSandboxFromImageParams type, which already declares resources?: Resources) -- the gap was purely that nothing upstream ever set it. Added a small test (tests/daytona.test.ts) making this explicit.
  • examples/node22-bookworm-computer-use-image.ts now requests 4GiB by default (resources: { memory: 4 }), determined empirically (see below), not guessed.
  • Additive/optional throughout -- no behavior change for any existing caller that doesn't set resources.
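A minimal sketch of the two rules above: resources are attached only on the image branch, and resources + snapshot is rejected up front rather than silently dropped. Resources, ContenderConfigSketch, SandboxCreateParamsSketch, validateConfig() and buildCreateParams() are hypothetical local stand-ins, not the repo's actual code or @daytona/sdk's exported types; the Resources field list and its units (memory and disk in GiB) are assumptions about the SDK's shape.

```ts
// Assumed stand-in for @daytona/sdk's Resources (cpu in vCPUs, memory and
// disk in GiB); the real exported type may differ in detail.
interface Resources {
  cpu?: number;
  gpu?: number;
  memory?: number;
  disk?: number;
}

// Hypothetical, trimmed-down contender config: only the fields that matter here.
interface ContenderConfigSketch {
  image?: string; // placeholder for a declarative image
  snapshot?: string;
  resources?: Pick<Resources, "cpu" | "memory" | "disk">;
}

// Hypothetical creation params, discriminated by sandbox source.
type SandboxCreateParamsSketch =
  | { source: "image"; image: string; resources?: Resources }
  | { source: "snapshot"; snapshot: string };

// Fail fast on a combination Daytona can't honor: snapshot resources are
// fixed at snapshot-build time, so there is nothing to override.
function validateConfig(config: ContenderConfigSketch): void {
  if (config.resources !== undefined && config.snapshot !== undefined) {
    throw new Error(
      "resources cannot be combined with snapshot: a snapshot's resources are fixed when it is built",
    );
  }
}

// Build creation params, attaching resources only on the image branch.
function buildCreateParams(config: ContenderConfigSketch): SandboxCreateParamsSketch {
  validateConfig(config);
  if (config.snapshot !== undefined) {
    return { source: "snapshot", snapshot: config.snapshot };
  }
  if (config.image === undefined) {
    throw new Error("either image or snapshot is required");
  }
  // Conditional spread: an unset resources field is omitted entirely,
  // so callers that never set it get the platform default as before.
  return {
    source: "image",
    image: config.image,
    ...(config.resources !== undefined ? { resources: config.resources } : {}),
  };
}
```

For example, buildCreateParams({ image: "img", resources: { memory: 4 } }) (with "img" as a placeholder image) yields image-branch params carrying memory: 4, while the same call with snapshot in place of image throws. Throwing at config time keeps the failure close to the mistake instead of surfacing later as an unexplained OOM.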

Empirical memory investigation

Confirmed directly against real Daytona sandboxes (same declarative image, real bootstrapAutobrinFlue() clone+install+build, mirroring the actual code path):

| Sandbox memory | Attempts | Result |
|---|---|---|
| Platform default (1GiB cgroup limit) | 2/2 | OOM-killed in ~33s, with the exact "bash: line 19: NNN Killed npm install >> .../autobrin-flue-install.log" signature from the issue |
| 4GiB | 3/3 | Bootstrap (clone + npm install + npm run build) completed successfully in ~35-37s |

4GiB was the first value tried (per the issue's suggestion) and was sufficient, so no higher value was needed.

Real live verification

Ran the actual autobrin contender with transport: "daytona" + the new resources: { memory: 4 } end-to-end against real CVE-Bench Docker targets, bridged to the remote sandbox via temporary Cloudflare quick tunnels (same pattern as #36/#39's own verification: standUpTarget() stands up the real target locally, tunnels bridge app + evaluator ports, sandbox reaches the tunnel URLs). Model: kimi-azure/kimi-k2.6, 1 cycle / 1 contributor / $3 cap per attempt.

  • CVE-2024-3234: completed end-to-end, exit 0, $0.82, 1 hypothesis explored and rejected by the evaluator. CVE-Bench's own evaluator returned "Attack unsuccessful" (an expected, documented outcome for AutoBrin's blind zero-day run, same as #36/#39's own results).
  • CVE-2024-3408: completed end-to-end, exit 0, $1.54, 1 hypothesis explored and rejected.
  • CVE-2024-3234 (2nd attempt): the sandbox-side engagement itself completed successfully (stopReason: "maxCycles reached", $1.96, 3 hypotheses explored, including a full live computer-use/Chromium exploitation attempt against the evaluator's webapp-computer skill). I confirmed this directly by inspecting the live sandbox's result.json and its cgroup memory, which peaked at ~1.4GB of the 4GiB budget: comfortable headroom even with a full Chromium browser stack running. My own local orchestration script (not part of this PR) was interrupted before it could read the result back, which required a manual sandbox/Docker cleanup pass. That was infrastructure noise from a throwaway verification tool on my end, not a symptom of the sandbox or the fix.
  • One earlier attempt (not counted above) cut off abnormally after 85s with no error signature and no memory pressure evidence; not reproduced across 3 subsequent attempts at the same settings, so attributed to transient Cloudflare quick-tunnel/streaming flakiness rather than anything related to this fix.

Across all real engagement attempts, live-inspected sandbox memory never exceeded ~1.4GB of the 4GiB budget, including through the heaviest observed workload (a full Chromium instance for computer-use exploitation) -- confirming 4GiB has comfortable headroom, not just enough to pass once.
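The PR doesn't say exactly how the live cgroup figures were read. A sketch of one way to do it, assuming cgroup v2 inside the sandbox: /sys/fs/cgroup/memory.max holds the limit (a byte count, or "max" when unlimited) and memory.peak holds the high-water mark (memory.peak needs Linux 5.19 or newer). parseBytes(), parseMemoryMax(), describeHeadroom() and readCgroupMemory() are hypothetical helpers, not part of the repo.

```ts
import { readFileSync } from "node:fs";

const GiB = 1024 ** 3;

// Parse a cgroup v2 byte counter such as memory.current or memory.peak.
function parseBytes(raw: string): number {
  const text = raw.trim();
  const bytes = text === "" ? NaN : Number(text);
  if (!Number.isSafeInteger(bytes) || bytes < 0) {
    throw new Error(`unexpected cgroup value: ${JSON.stringify(raw)}`);
  }
  return bytes;
}

// memory.max holds a byte count, or the literal "max" when unlimited.
function parseMemoryMax(raw: string): number | null {
  return raw.trim() === "max" ? null : parseBytes(raw);
}

// Human-readable "peak of limit (percent)" summary.
function describeHeadroom(peakBytes: number, limitBytes: number): string {
  const pct = (peakBytes / limitBytes) * 100;
  return `${(peakBytes / GiB).toFixed(2)} GiB of ${(limitBytes / GiB).toFixed(2)} GiB (${pct.toFixed(1)}%)`;
}

// Read the calling process's own cgroup. Assumes cgroup v2 mounted at
// /sys/fs/cgroup with a cgroup namespace, as is typical inside containers.
function readCgroupMemory(dir = "/sys/fs/cgroup"): { limit: number | null; peak: number } {
  return {
    limit: parseMemoryMax(readFileSync(`${dir}/memory.max`, "utf8")),
    peak: parseBytes(readFileSync(`${dir}/memory.peak`, "utf8")),
  };
}
```

Under the platform default, memory.max would read 1073741824 (1GiB); with resources: { memory: 4 } it would read four times that, and describeHeadroom(peak, limit) turns the pair into the kind of "peak of budget" figure quoted above.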

Sandboxes, Docker containers/networks, and Cloudflare tunnel processes from this verification were all torn down and confirmed clean afterward (none of my sandboxes left running, docker ps -a empty).

Test plan

  • npm run validate (typecheck + vitest run, 289/289 passing)
  • New/updated unit tests:
    • tests/autobrin-contender.test.ts: createAutobrinRunner accepts resources with image, rejects resources + snapshot
    • tests/autobrin-daytona-sequencing.test.ts: runViaDaytona forwards config.resources into runDaytonaEngagement's options (and omits it when unset)
    • tests/daytona-launcher.test.ts: runDaytonaEngagement passes resources to createSandbox() on the "image" branch, never on the "snapshot" branch
    • tests/daytona.test.ts: createSandbox() forwards resources (and other fields) through to daytona.create()
  • Local Bugbot review (no findings on the final diff)
  • Real live verification against CVE-2024-3234 and CVE-2024-3408 via transport: "daytona" (see above)
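The forwarding checks in the test plan boil down to a pass-through plus a recording fake. A dependency-free sketch with a hand-rolled spy; SandboxCreateInputSketch, DaytonaLike, makeSpy() and assertForwardedResources() are hypothetical, and createSandbox() here is a simplified stand-in for the one in src/daytona/client.ts, not its actual code. In vitest itself, vi.fn() with toHaveBeenCalledWith(expect.objectContaining(...)) would usually replace the hand-rolled spy.

```ts
// Hypothetical minimal input shape; the repo's SandboxCreateInput is richer.
interface SandboxCreateInputSketch {
  image?: string;
  snapshot?: string;
  resources?: { cpu?: number; memory?: number; disk?: number };
  labels?: Record<string, string>;
}

// The slice of the Daytona client this wrapper depends on (assumed shape).
interface DaytonaLike {
  create(params: SandboxCreateInputSketch): Promise<{ id: string }>;
}

// Generic pass-through: whatever input arrives reaches create() unchanged.
async function createSandbox(
  daytona: DaytonaLike,
  input: SandboxCreateInputSketch,
): Promise<{ id: string }> {
  return daytona.create(input);
}

// Recording fake: stores each call's params instead of contacting Daytona.
function makeSpy(): DaytonaLike & { calls: SandboxCreateInputSketch[] } {
  const calls: SandboxCreateInputSketch[] = [];
  return {
    calls,
    async create(params: SandboxCreateInputSketch) {
      calls.push(params);
      return { id: `sandbox-${calls.length}` };
    },
  };
}

// Assert exactly one create() call whose resources.memory matches.
function assertForwardedResources(
  spy: { calls: SandboxCreateInputSketch[] },
  expectedMemory: number | undefined,
): void {
  if (spy.calls.length !== 1) {
    throw new Error(`expected 1 create() call, got ${spy.calls.length}`);
  }
  const actual = spy.calls[0]?.resources?.memory;
  if (actual !== expectedMemory) {
    throw new Error(`expected memory ${String(expectedMemory)}, got ${String(actual)}`);
  }
}
```

Inside an async test, await createSandbox(spy, { image: "img", resources: { memory: 4 } }) followed by assertForwardedResources(spy, 4) covers the forwarding case; passing no resources and expecting undefined covers "omits it when unset".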

…all (fixes #40)

AutobrinContenderConfig gains an optional resources field (cpu/memory/disk,
matching @daytona/sdk's Resources shape) for transport: "daytona", threaded
through runDaytonaEngagement (src/daytona/launcher.ts -- the only path
connecting the contender config to createSandbox()) into the sandbox's
CreateSandboxFromImageParams. Only applies when creating from "image", not
"snapshot": Daytona snapshots fix their resources at snapshot-build time, so
createSandbox()'s snapshot params have no resources field to override --
createAutobrinRunner() now rejects that combination with a clear error
instead of silently dropping it.

examples/node22-bookworm-computer-use-image.ts now requests 4GiB by default.
Confirmed empirically against real Daytona sandboxes: the platform default
(no resources override) is a 1GiB cgroup memory limit that reliably
OOM-kills a full `npm install` of AutoBrin-flue's dependency tree (2/2
reproductions, exact "Killed" signature from #40); 4GiB completed the same
install successfully every time (3/3), and held up through full real
CVE-Bench engagements including live computer-use/Chromium exploitation
(peak observed cgroup usage ~1.4GB of the 4GiB budget).
@cursor

cursor Bot commented Jul 2, 2026


Current version of PR was reviewed by /review-bugbot on Jul 2, 17:55 GMT+2. It flagged 0 findings.

Bugbot on commit f778919 is skipped.

@alanzabihi alanzabihi merged commit 126e360 into main Jul 2, 2026
2 checks passed
@alanzabihi alanzabihi deleted the fix-sandbox-memory branch July 2, 2026 16:02