Skip to content

Correlate scratchpad completion with run_id#9350

Open
manzt wants to merge 5 commits into
mainfrom
fix/scratchpad-run-id-correlation
Open

Correlate scratchpad completion with run_id#9350
manzt wants to merge 5 commits into
mainfrom
fix/scratchpad-run-id-correlation

Conversation

@manzt
Copy link
Copy Markdown
Collaborator

@manzt manzt commented Apr 23, 2026

Closes #9302
Fixes #9255, and flips the 4 xfail integration tests from #9342 flip to passing.

The ScratchCellListener used to fire its done sentinel on the scratch cell's idle status (+ 50ms grace for flushing stderr/stdout). Anything broadcast after that grace was silently dropped from the SSE stream.

These changes add an optional run_id to ExecuteScratchpadCommand and CompletedRunNotification. The api endpoint and MCP mint a UUID and pass it to both the command and the ScratchCellListener; the listener now fires its sentinel only on the matching CompletedRunNotification.

Note: The reason we couldn't just observe CompletedRunNotification directly (#9302) is because the CompletedRunNotificationn from session.instantiate trips up the listener early (separate run).


Summary by cubic

Correlate each scratchpad run with a run_id and wait for the matching CompletedRunNotification before finishing. This prevents early success in /api/kernel/execute, streams compile-time errors, and simplifies the done SSE.

  • Bug Fixes

    • Add optional run_id to ExecuteScratchpadCommand and CompletedRunNotification; /api/kernel/execute and the MCP code server mint a UUID and ScratchCellListener waits for the matching completion, ignoring others.
    • Always emit CompletedRunNotification in a finally so listeners don’t hang if run_scratchpad raises.
    • Stream compile-time errors to stderr and keep console output, so SyntaxError diagnostics are visible before done.
    • Add integration tests for ctx.create_cell validation and the skip_validation=True path to cover early RuntimeError reporting and graph-state failures.
  • Migration

    • done SSE now returns { success, output } only; the error field was removed. On failure output is { mimetype: "text/plain", data: "" }.
    • OpenAPI and generated TS types updated: CompletedRunNotification.run_id?, ExecuteScratchpadCommand.runId?.

Written for commit a01e152. Summary will update on new commits.

Copilot AI review requested due to automatic review settings April 23, 2026 21:55
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 11, 2026 9:47pm

Request Review

@github-actions github-actions Bot added the bash-focus Area to focus on during release bug bash label Apr 23, 2026
@manzt manzt added the enhancement New feature or request label Apr 23, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 11 files

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves scratchpad execution streaming by correlating “completion” with a specific scratchpad run via a run_id, so the SSE listener doesn’t terminate early and drop downstream reactive errors (fixing #9255 and enabling previously-xfail integration scenarios to pass).

Changes:

  • Add optional run_id correlation to ExecuteScratchpadCommand and CompletedRunNotification, and plumb it through the HTTP execute endpoint + MCP execute_code.
  • Update ScratchCellListener to treat a matching CompletedRunNotification(run_id=...) as the completion sentinel (instead of scratch-cell idle), and ensure completion is always broadcast via finally.
  • Standardize the terminal SSE done payload to {success, output} and update unit/integration tests + OpenAPI schema accordingly.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
marimo/_server/scratchpad.py Switch listener completion sentinel to CompletedRunNotification filtered by run_id; change done event shape to {success, output}.
marimo/_runtime/runtime.py Always broadcast CompletedRunNotification(run_id=...) for scratchpad via try/finally to prevent listeners from blocking forever.
marimo/_runtime/commands.py Add `run_id: str
marimo/_messaging/notification.py Add optional run_id field to CompletedRunNotification.
marimo/_server/api/endpoints/execution.py Mint a UUID run_id per /api/kernel/execute call and pass it to both the command and listener.
marimo/_mcp/code_server/main.py Mint/pass run_id for MCP execute_code so it waits for the correct completion event.
tests/_server/test_scratchpad.py Update listener construction (requires run_id) and update expectations for the new done payload shape + completion sentinel.
tests/_server/test_scratchpad_integration.py Flip previously-xfail scenarios to passing and update SSE snapshots to the new behavior.
packages/openapi/api.yaml Document/add run_id on CompletedRunNotification and runId on ExecuteScratchpad* schemas.
packages/openapi/src/api.ts Regenerated TS types to include the new run_id / runId fields.
marimo/_schemas/generated/notifications.yaml Regenerated notification schema reflecting CompletedRunNotification.run_id (and related schema updates).

Comment on lines +214 to +218
``success`` is false when the scratch cell itself errored OR any
downstream cell captured by the listener errored. The actual error
detail was already streamed via ``stderr`` events earlier in the
response — ``done`` carries only the success bit plus the scratch
cell's rendered output on success (empty on failure).
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_done_event no longer includes any error payload for failures and assumes the traceback/detail was already streamed via preceding stderr events. That assumption doesn't hold for scratchpad compile errors (e.g. MarimoSyntaxError from _try_compiling_cell), which are broadcast via CellNotificationUtils.broadcast_error as output=MARIMO_ERROR without emitting any stderr console output. With the current SSE shape, /api/kernel/execute callers may only see {success: false, output: {…}} with no error detail. Consider emitting a synthetic stderr SSE event when the scratch cell output channel is MARIMO_ERROR (or reintroducing a minimal error/errors field in the done payload) so failures always include actionable diagnostics.

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +28
``{success: false, output: {mimetype: "text/plain", data: ""}}``;
error detail arrives earlier in the stream as ``stderr`` SSE events.
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated module docstring states that failure cases always deliver error detail earlier via stderr SSE events and that the terminal done event is uniformly {success, output}. There isn't an integration test covering a scratchpad compile-time failure (e.g. a SyntaxError / MarimoSyntaxError from _try_compiling_cell), which historically may not emit console stderr. Adding a scenario like session.execute("x =") would validate the new contract and guard against silent failures.

Suggested change
``{success: false, output: {mimetype: "text/plain", data: ""}}``;
error detail arrives earlier in the stream as ``stderr`` SSE events.
``{success: false, output: {mimetype: "text/plain", data: ""}}``.
When the kernel emits error detail before ``done``, these snapshots
assert it through earlier SSE events such as ``stderr``.

Copilot uses AI. Check for mistakes.
Comment thread tests/_server/test_scratchpad_integration.py
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Bundle Report

Changes will increase total bundle size by 129 bytes (0.0%) ⬆️. This is within the configured threshold ✅

Detailed changes
Bundle name Size Change
marimo-esm 25.14MB 129 bytes (0.0%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

Asset Name Size Change Total Size Change (%)
assets/terminal-*.js 105 bytes 458.83kB 0.02%
assets/config-*.js 24 bytes 6.09kB 0.4%

mscolnick
mscolnick previously approved these changes Apr 27, 2026
manzt added 3 commits May 11, 2026 17:16
Fixes #9255. The ``ScratchCellListener`` used to fire its done
sentinel on the scratch cell's ``idle`` status, relying on a 50ms
grace for trailing output. Anything broadcast after that grace (most
commonly an ``mo.state`` setter whose reactive descendants are
flushed *after* the scratch runner returns, per
``Kernel.run_scratchpad``) was silently dropped from the SSE stream —
``/api/kernel/execute`` would return ``success: true`` while a
downstream cell was in an exception state.

``ExecuteScratchpadCommand`` and ``CompletedRunNotification`` gain an
optional ``run_id``. ``/api/kernel/execute`` and the MCP code server
mint a UUID and pass it to both the command and the
``ScratchCellListener``; the listener now fires its sentinel only on
the matching ``CompletedRunNotification``. Unrelated completions
(from the ``session.instantiate`` call the endpoint makes first, or
from concurrent browser activity) are ignored instead of tripping
the listener early.

``handle_execute_scratchpad`` broadcasts its completion in a
``try/finally`` so a raising ``run_scratchpad`` can't leave the
listener blocked indefinitely.

The ``done`` SSE event is reshaped to a single ``{success, output}``
form. The ``error`` field is removed — the traceback is already in
preceding ``stderr`` events, so duplicating it on ``done`` was
redundant. On failure, ``output`` is ``{mimetype: "text/plain",
data: ""}``. ``execute-code.sh`` is unaffected: it reads
``.output.data // empty`` for the success path and ``.success`` for
the exit code.

The four xfail integration tests from #9342 flip to passing.
Adds two integration tests that cover the `ctx.create_cell` validation
surface: the default dry-run compile (raises early on multiply-defined
names with the `skip_validation` hint) and the `skip_validation=True`
bypass.
Two traceback-formatting differences slipped past `_normalize` and broke
`test_ctx_create_cell_multiply_defined` on 3.10-3.12. First, the
existing pointer regex required at least one `~`, so PEP 657 pure-caret
underlines (`^^^^^^^^^^^^^^^^`, emitted by 3.11+ for expression spans)
were never stripped; the new alternation matches those while keeping
single-caret SyntaxError pointers intact (the classic `        ^`-style
marker is present on every version, including 3.10, and the
compile-error test asserts on it). Second, 3.13's collapsed-frames view
of a multi-line `raise Foo(...)` keeps the trailing `)` on its own line
after `_COLLAPSED_FRAMES_RE` elides the middle, whereas 3.10-3.12 don't
show that line at all; the new `_RAISE_CLOSING_PAREN_RE` drops the lone
closer so both worlds normalize to the same shape. The one affected
snapshot is rewritten to match.
`_MARIMO_SRC_RE` only matched Unix prefixes, and ran AFTER `_PATH_RE`. On
Windows, `_PATH_RE` claimed the library path first and replaced it with
`<tmp>`; `_INTERNAL_FRAME_RE` then stripped the now-anonymous frames,
dropping the library traceback entirely from the snapshot.

Match both separators, slash-normalize the captured tail, and run before
`_PATH_RE` so Windows and Unix snapshots compare equal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bash-focus Area to focus on during release bug bash enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

marimo pair: Claude Code doesn't see cell errors automatically

3 participants