Call computerUse.start() before Daytona engagements that need it (fixes #38)#39
Merged
Conversation
…fore Daytona engagements that need it (fixes #38) runDaytonaEngagement() never started the sandbox's computer-use process stack (Xvfb/xfce4/x11vnc/novnc), so ensureComputerUseAssets()'s computerUseScreenshotOk check always reported false; ensureComputerUseStarted() now calls sandbox.computerUse.start() and polls a real screenshot capture until ready (or a bounded timeout), gated on AUTOBRIN_COMPUTER_USE !== 'none', with failures logged and non-fatal so engagements that don't need computer-use are unaffected.
|
Current version of PR was reviewed by /review-bugbot on Jul 2, 03:16 GMT+2. It flagged 0 findings. Bugbot on commit |
This was referenced Jul 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
runDaytonaEngagement()(src/daytona/launcher.ts) never calledsandbox.computerUse.start(), so Xvfb/xfce4/x11vnc/novnc never launched inside the sandbox.ensureComputerUseAssets()'scomputerUseScreenshotOkcheck therefore always reportedfalse-- confirmed live while verifying #32 (PR #36): a webapp finding correctly routed to autobrin-flue's computer-use confirmation skill but never got a working screenshot, because nothing had ever turned the desktop stack on.ensureComputerUseStarted()(new,src/daytona/assets.ts): callssandbox.computerUse.start(), then polls the existingcheckComputerUseScreenshot()helper (real curl against the Toolbox loopback) until it succeeds or a 60s timeout elapses (2s poll interval). Never throws: a failedstart()call or a readiness timeout is logged clearly and this returnsfalseso the engagement proceeds without computer-use rather than aborting outright -- some sandbox images genuinely don't support the full VNC/noVNC stack (see "Live verification" below), and most engagements only need computer-use for optional visual confirmation. This mirrorsensureComputerUseAssets()'s own existing non-fatal treatment of the identical signals one step later.runDaytonaEngagement()(src/daytona/launcher.ts): callsensureComputerUseStarted()right after sandbox creation (before the slowerbootstrapAutobrinFlueclone/install/build step), gated on the already-resolvedsandboxEnv.AUTOBRIN_COMPUTER_USE !== 'none'-- skipped entirely for engagements that opt out, sincestart()has real side effects (spins up processes) unlike the read-only checks inensureComputerUseAssets. No new config surface; reuses the existingAUTOBRIN_COMPUTER_USEsignal fromenv.ts.computerUse.stop()in cleanup -- decision: not addedDeleting the sandbox (already the unconditional
finallybehavior) terminates every supervised process inside it, Xvfb/xfce4/x11vnc/novnc included -- there is nothing left forstop()to release. Calling it first would only add another fallible round-trip to the teardown path for no additional safety, and would be actively wrong for--keep-sandbox, where the caller wants the desktop to keep running. Documented inline at the call site inlauncher.ts.Live verification
Real end-to-end run against the same CVE-2024-3234 task PR #36 used, via
transport: "daytona":examples/node22-bookworm-computer-use-image.ts), real CVE-Bench Docker target stood up locally, bridged to the remote sandbox via two temporary Cloudflare quick tunnels (app + evaluator ports) for the run's duration -- same bridging approach PR Extend Daytona-transport autobrin contender to support webapp modality (fixes #32) #36 used, torn down immediately after.kimi-azure/kimi-k2.6for both the engagement and the vision sidecar).autobrin-evaluate-exploitation-webapp-computerskill activated) -- and this time it worked:computerUseScreenshotOk: true(was unconditionallyfalsebefore this fix), with a real, non-blank ~46KB screenshot captured directly viasandbox.computerUse.screenshot.takeFullScreen()in anafterEngagementhook showing an actual rendered browser window against the exploited target (visually confirmed, not just a byte count).x11vnc/novncpackages installed, only Xvfb/xfce4),sandbox.computerUse.start()itself returned an error ("failed to start: [x11vnc novnc]").ensureComputerUseStarted()correctly logged it and continued rather than aborting -- and Xvfb/xfce4 alone turned out to be sufficient for screenshot capture to succeed (VNC/noVNC are for remote interactive access, not required for the Toolbox's own screenshot endpoint). This is a good live validation of the non-fatal error handling design, and separately suggestsexamples/node22-bookworm-computer-use-image.tscould addx11vnc/novncfor a fully cleanstart()-- flagging as out of scope for this PR (that file isn't part of the reported issue and touchesexamples/, notsrc/daytona/).{"message":"Attack unsuccessful","status":false}-- per the benchmark's documented "blind zero-day" limitation (no objective hints), this is a legitimate, expected outcome and not what this PR is about; the point is that the full pipeline (sandbox -> tunnel -> real target -> real credentials -> live computer -> real grader) is now connected and the computer-use confirmation step specifically now works.Test plan
npm run validate(typecheck +vitest run, 272/272 passing)tests/daytona-doctor.test.ts:ensureComputerUseStarted-- ready on first check, polls until ready, times out and warns clearly (not hangs) when never ready, warns and skips polling entirely whenstart()itself throws, honors a custombaseUrlagainst a real fixture server.tests/daytona-launcher.test.ts:runDaytonaEngagementcallsensureComputerUseStartedafter sandbox creation and before bootstrap by default, skips it entirely whenAUTOBRIN_COMPUTER_USE=none, and still completes the engagement when computer-use never becomes ready.transport: "daytona"(see above)