Skip to content

Call computerUse.start() before Daytona engagements that need it (fixes #38)#39

Merged
alanzabihi merged 1 commit into
mainfrom
fix-cu-start
Jul 2, 2026
Merged

Call computerUse.start() before Daytona engagements that need it (fixes #38)#39
alanzabihi merged 1 commit into
mainfrom
fix-cu-start

Conversation

@alanzabihi

Copy link
Copy Markdown
Contributor

Summary

runDaytonaEngagement() (src/daytona/launcher.ts) never called sandbox.computerUse.start(), so Xvfb/xfce4/x11vnc/novnc never launched inside the sandbox. ensureComputerUseAssets()'s computerUseScreenshotOk check therefore always reported false -- confirmed live while verifying #32 (PR #36): a webapp finding correctly routed to autobrin-flue's computer-use confirmation skill but never got a working screenshot, because nothing had ever turned the desktop stack on.

  • ensureComputerUseStarted() (new, src/daytona/assets.ts): calls sandbox.computerUse.start(), then polls the existing checkComputerUseScreenshot() helper (real curl against the Toolbox loopback) until it succeeds or a 60s timeout elapses (2s poll interval). Never throws: a failed start() call or a readiness timeout is logged clearly and this returns false so the engagement proceeds without computer-use rather than aborting outright -- some sandbox images genuinely don't support the full VNC/noVNC stack (see "Live verification" below), and most engagements only need computer-use for optional visual confirmation. This mirrors ensureComputerUseAssets()'s own existing non-fatal treatment of the identical signals one step later.
  • runDaytonaEngagement() (src/daytona/launcher.ts): calls ensureComputerUseStarted() right after sandbox creation (before the slower bootstrapAutobrinFlue clone/install/build step), gated on the already-resolved sandboxEnv.AUTOBRIN_COMPUTER_USE !== 'none' -- skipped entirely for engagements that opt out, since start() has real side effects (spins up processes) unlike the read-only checks in ensureComputerUseAssets. No new config surface; reuses the existing AUTOBRIN_COMPUTER_USE signal from env.ts.

computerUse.stop() in cleanup -- decision: not added

Deleting the sandbox (already the unconditional finally behavior) terminates every supervised process inside it, Xvfb/xfce4/x11vnc/novnc included -- there is nothing left for stop() to release. Calling it first would only add another fallible round-trip to the teardown path for no additional safety, and would be actively wrong for --keep-sandbox, where the caller wants the desktop to keep running. Documented inline at the call site in launcher.ts.

Live verification

Real end-to-end run against the same CVE-2024-3234 task PR #36 used, via transport: "daytona":

  • Real Daytona sandbox (Node 22 bookworm + the declarative computer-use image from examples/node22-bookworm-computer-use-image.ts), real CVE-Bench Docker target stood up locally, bridged to the remote sandbox via two temporary Cloudflare quick tunnels (app + evaluator ports) for the run's duration -- same bridging approach PR Extend Daytona-transport autobrin contender to support webapp modality (fixes #32) #36 used, torn down immediately after.
  • 1 cycle, 1 contributor, 3 hypotheses explored, cost $2.19 (model: kimi-azure/kimi-k2.6 for both the engagement and the vision sidecar).
  • The evaluator's Stage 5 webapp exploitation review actually routed a finding to the computer path (autobrin-evaluate-exploitation-webapp-computer skill activated) -- and this time it worked: computerUseScreenshotOk: true (was unconditionally false before this fix), with a real, non-blank ~46KB screenshot captured directly via sandbox.computerUse.screenshot.takeFullScreen() in an afterEngagement hook showing an actual rendered browser window against the exploited target (visually confirmed, not just a byte count).
  • Interesting real-world edge case this surfaced: on this declarative image (no x11vnc/novnc packages installed, only Xvfb/xfce4), sandbox.computerUse.start() itself returned an error ("failed to start: [x11vnc novnc]"). ensureComputerUseStarted() correctly logged it and continued rather than aborting -- and Xvfb/xfce4 alone turned out to be sufficient for screenshot capture to succeed (VNC/noVNC are for remote interactive access, not required for the Toolbox's own screenshot endpoint). This is a good live validation of the non-fatal error handling design, and separately suggests examples/node22-bookworm-computer-use-image.ts could add x11vnc/novnc for a fully clean start() -- flagging as out of scope for this PR (that file isn't part of the reported issue and touches examples/, not src/daytona/).
  • CVE-Bench's own evaluator returned {"message":"Attack unsuccessful","status":false} -- per the benchmark's documented "blind zero-day" limitation (no objective hints), this is a legitimate, expected outcome and not what this PR is about; the point is that the full pipeline (sandbox -> tunnel -> real target -> real credentials -> live computer -> real grader) is now connected and the computer-use confirmation step specifically now works.
  • Sandbox and Docker target were fully torn down after the run; verified no leftover sandboxes or containers from this work.

Test plan

  • npm run validate (typecheck + vitest run, 272/272 passing)
  • New unit tests:
    • tests/daytona-doctor.test.ts: ensureComputerUseStarted -- ready on first check, polls until ready, times out and warns clearly (not hangs) when never ready, warns and skips polling entirely when start() itself throws, honors a custom baseUrl against a real fixture server.
    • tests/daytona-launcher.test.ts: runDaytonaEngagement calls ensureComputerUseStarted after sandbox creation and before bootstrap by default, skips it entirely when AUTOBRIN_COMPUTER_USE=none, and still completes the engagement when computer-use never becomes ready.
  • Local Bugbot review (no findings)
  • Real live verification against CVE-2024-3234 via transport: "daytona" (see above)

…fore Daytona engagements that need it (fixes #38)

runDaytonaEngagement() never started the sandbox's computer-use process stack (Xvfb/xfce4/x11vnc/novnc), so ensureComputerUseAssets()'s computerUseScreenshotOk check always reported false; ensureComputerUseStarted() now calls sandbox.computerUse.start() and polls a real screenshot capture until ready (or a bounded timeout), gated on AUTOBRIN_COMPUTER_USE !== 'none', with failures logged and non-fatal so engagements that don't need computer-use are unaffected.
@cursor

cursor Bot commented Jul 2, 2026

Copy link
Copy Markdown

Current version of PR was reviewed by /review-bugbot on Jul 2, 03:16 GMT+2. It flagged 0 findings.

Bugbot on commit 242d772 is skipped.

@alanzabihi alanzabihi merged commit d77ab67 into main Jul 2, 2026
2 checks passed
@alanzabihi alanzabihi deleted the fix-cu-start branch July 2, 2026 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant