Skip to content

refactor: replace MarimoFileKey alias with FileKey ADT#9483

Open
mscolnick wants to merge 3 commits into
mainfrom
ms/feature/phase-3
Open

refactor: replace MarimoFileKey alias with FileKey ADT#9483
mscolnick wants to merge 3 commits into
mainfrom
ms/feature/phase-3

Conversation

@mscolnick
Copy link
Copy Markdown
Contributor

Introduces a tagged union (NewFileKey | PathFileKey) for notebook keys,
replacing the MarimoFileKey = str alias and __new__ magic-string
sentinel. Boundary code parses raw query-param strings via
parse_file_key once, internal code pattern-matches by type, and
serialize_file_key round-trips back to the wire format where needed
(session initialization_id, service-worker injection, log messages).

Also tightens the new-file sentinel from a startswith prefix check to
an exact match, so non-sentinel keys consistently flow through normal
path validation instead of returning the blank notebook.

Introduces a tagged union (NewFileKey | PathFileKey) for notebook keys,
replacing the `MarimoFileKey = str` alias and `__new__` magic-string
sentinel. Boundary code parses raw query-param strings via
`parse_file_key` once, internal code pattern-matches by type, and
`serialize_file_key` round-trips back to the wire format where needed
(session initialization_id, service-worker injection, log messages).

Also tightens the new-file sentinel from a `startswith` prefix check to
an exact match, so non-sentinel keys consistently flow through normal
path validation instead of returning the blank notebook.
Copilot AI review requested due to automatic review settings May 8, 2026 16:12
@vercel
Copy link
Copy Markdown

vercel Bot commented May 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 8, 2026 9:07pm

Request Review

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 31 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="marimo/_server/api/endpoints/ws/ws_connection_validator.py">

<violation number="1" location="marimo/_server/api/endpoints/ws/ws_connection_validator.py:124">
P2: `?file=` is now parsed as an empty `PathFileKey` instead of being treated as missing, which changes prior fallback behavior and can drive invalid file resolution/config lookup.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant Client as HTTP Client
    participant API as API Gateway
    participant WS as WebSocket Handler
    participant AppState as AppState
    participant SessionMgr as SessionManager
    participant Workspace as NotebookWorkspace
    participant FileKeyMod as FileKey Module
    participant SessionRep as SessionRepository
    participant RTC as LoroDocManager
    participant ConfigMgr as ConfigManager

    Note over Client,ConfigMgr: FileKey ADT Flow - Boundary Parsing and Internal Dispatch

    Client->>API: GET /?file=__new__ or /?file=path.py
    API->>AppState: extract FILE_QUERY_PARAM_KEY
    AppState-->>API: raw_file_key string

    alt query param provided
        API->>FileKeyMod: parse_file_key(raw_file_key)
        Note over FileKeyMod: NEW: exact match "=__new__" else PathFileKey
        FileKeyMod-->>API: NewFileKey() or PathFileKey(path)
    else no query param
        API->>SessionMgr: get_unique_file_key()
        SessionMgr->>Workspace: get_unique_file_key()
        Workspace-->>SessionMgr: FileKey (NewFileKey or PathFileKey)
        SessionMgr-->>API: FileKey
    end

    API->>FileKeyMod: serialize_file_key(file_key)
    Note over FileKeyMod: convert back to wire string for config/init
    FileKeyMod-->>API: wire_string

    API->>ConfigMgr: config_manager_at_file(wire_string)
    ConfigMgr-->>API: config

    API->>SessionMgr: app_manager(file_key)
    alt PathFileKey
        SessionMgr->>Workspace: register_allowed_path(file_key.path)
        SessionMgr->>Workspace: load(file_key, defaults)
    else NewFileKey
        SessionMgr->>Workspace: load(file_key, defaults) - no path register
    end
    Workspace-->>SessionMgr: AppFileManager
    SessionMgr-->>API: AppFileManager

    API-->>Client: HTML response with wire_string embedded

    Note over Client,RTC: WebSocket Connection and Session Creation

    Client->>WS: /ws?session_id=xyz&file=path.py
    WS->>WS: extract_connection_params()
    WS->>FileKeyMod: parse_file_key(raw_file_key)
    WS->>SessionMgr: get_unique_file_key()
    FileKeyMod-->>WS: FileKey
    WS->>ConfigMgr: serialize_file_key(file_key) -> get_config()
    ConfigMgr-->>WS: config (rtc, auto_instantiate)

    WS->>SessionMgr: create_session(session_id, file_key)
    alt PathFileKey
        SessionMgr->>Workspace: register_allowed_path(file_key.path)
    end
    SessionMgr->>Workspace: load(file_key, defaults)

    alt NewFileKey
        SessionMgr->>SessionRep: store with initialization_id="__new__"
    else PathFileKey
        SessionMgr->>SessionRep: store with initialization_id=key.path
    end
    SessionMgr->>SessionRep: store with app_file_manager.path=absolute_path

    WS->>SessionMgr: any_clients_connected(file_key)
    SessionMgr->>WS: bool

    alt NewFileKey
        SessionMgr->>SessionMgr: return False
    else PathFileKey
        SessionMgr->>SessionRep: get_by_file_path(file_key.path)
        SessionRep-->>SessionMgr: sessions list
    end

    Note over WS,RTC: RTC Doc Lifecycle with FileKey

    alt rtc_enabled
        WS->>RTC: create_doc(file_key, cell_ids, codes)
        alt NewFileKey
            RTC->>RTC: use NewFileKey() as dict key
        else PathFileKey
            RTC->>RTC: use PathFileKey(path) as dict key
        end
        RTC-->>WS: LoroDoc
    end

    Note over SessionRep,SessionMgr: Session Lookup by FileKey

    alt get_session_by_file_key
        SessionMgr->>SessionRep: get_by_file_key(file_key)
        SessionRep->>FileKeyMod: serialize_file_key(file_key)
        SessionRep->>SessionRep: compare against stored initialization_id
        alt PathFileKey
            SessionRep->>SessionRep: also compare against os.path.abspath(file_key.path)
        end
        SessionRep-->>SessionMgr: Session or None
    end

    alt maybe_resume_session
        SessionMgr->>SessionRep: find orphaned sessions
        alt NewFileKey
            SessionRep->>SessionRep: return None (no resume for new files)
        else PathFileKey
            SessionRep->>SessionRep: match by abs path
        end
        SessionRep-->>SessionMgr: Session or None
    end

    Note over API,RTC: Session Restart Flow
    API->>API: restart_session()
    API->>AppState: extract raw_file_key
    alt query param exists
        API->>FileKeyMod: parse_file_key(raw)
        FileKeyMod-->>API: FileKey
    else fallback
        API->>SessionMgr: get_unique_file_key()
        alt is None and app_file_manager.path exists
            API->>FileKeyMod: create PathFileKey(path)
        end
    end
    API->>RTC: remove_doc(file_key)
    API-->>Client: session restarted

    Note over Client,ConfigMgr: Key Change - Exact sentinel matching not prefix
    Client->>API: GET /?file=__new__file.py
    API->>FileKeyMod: parse_file_key("__new__file.py")
    Note over FileKeyMod: exact match fails, becomes PathFileKey
    FileKeyMod-->>API: PathFileKey("__new__file.py")
    API->>Workspace: resolve(PathFileKey("__new__file.py"))
    Workspace->>Workspace: isinstance check fails, go through path validation
    Workspace-->>API: HTTPException NOT_FOUND/FORBIDDEN
    API-->>Client: 404/403 (not blank notebook)
Loading

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread marimo/_server/api/endpoints/ws/ws_connection_validator.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the legacy MarimoFileKey = str + "__new__" sentinel approach with an explicit FileKey tagged union (NewFileKey | PathFileKey). Boundary code parses raw wire strings once via parse_file_key, internal code uses type checks, and serialize_file_key is used when a string is required (URLs, session initialization IDs, logs). It also tightens the “new file” detection from a startswith check to an exact sentinel match.

Changes:

  • Introduce FileKey ADT (NewFileKey, PathFileKey) with parse/serialize helpers and wire sentinel NEW_FILE_WIRE.
  • Refactor workspaces, session management, RTC, and server endpoints to accept FileKey internally and serialize only at I/O boundaries.
  • Update and expand tests to reflect the ADT and the exact-match sentinel behavior.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/_server/test_workspace.py Updates workspace tests to use FileKey variants; adds parse/serialize round-trip coverage and exact-sentinel security assertions.
tests/_server/test_sessions.py Updates session tests to pass PathFileKey instead of raw strings.
tests/_server/test_session_manager.py Updates session manager tests to use NewFileKey/PathFileKey and new session matching behavior.
tests/_server/test_file_manager_absolute_path.py Updates file-manager tests to load via PathFileKey.
tests/_server/rtc/test_rtc_doc.py Updates RTC tests to use FileKey types instead of the removed MarimoFileKey alias.
tests/_server/api/endpoints/test_ws.py Serializes FileKey for filesystem operations in websocket tests.
tests/_server/api/endpoints/test_ws_rtc.py Serializes FileKey when constructing ws_sync URLs.
tests/_server/api/endpoints/test_resume_session.py Serializes FileKey before reading/writing notebook files in resume-session tests.
tests/_server/api/endpoints/test_home.py Serializes FileKey for API expectations; uses PathFileKey for app manager access.
tests/_server/api/endpoints/test_files.py Serializes FileKey before filesystem interactions in file endpoint tests.
tests/_server/api/endpoints/test_file_explorer.py Serializes FileKey before wrapping into Path(...) in file-explorer tests.
tests/_server/api/endpoints/test_assets.py Updates index tests for exact __new__ sentinel; serializes FileKey for headers and paths.
marimo/_session/session_repository.py Changes session lookup to accept FileKey and match by serialized initialization ID and/or absolute path.
marimo/_server/workspace/_single.py Migrates single-file workspace APIs to FileKey; resolves PathFileKey.path.
marimo/_server/workspace/_keys.py Adds FileKey ADT, NEW_FILE_WIRE, parse_file_key, and serialize_file_key.
marimo/_server/workspace/_fixed.py Migrates fixed-files workspace APIs to FileKey; rejects NewFileKey.
marimo/_server/workspace/_empty.py Migrates empty workspace APIs to FileKey; uses NewFileKey as unique key.
marimo/_server/workspace/_directory.py Migrates directory workspace APIs to FileKey; exact-match new-file behavior via NewFileKey.
marimo/_server/workspace/_base.py Updates NotebookWorkspace contract to FileKey; updates 404 message formatting via serialize_file_key.
marimo/_server/workspace/init.py Re-exports FileKey/parse/serialize utilities and NEW_FILE_WIRE.
marimo/_server/session_manager.py Refactors session creation/app-manager access to operate on FileKey and serialize initialization IDs.
marimo/_server/rtc/doc.py Changes RTC document maps and APIs to key by FileKey.
marimo/_server/resume_strategies.py Updates resume strategy protocol/logic to accept FileKey and treat NewFileKey as non-resumable.
marimo/_server/api/lifespans.py Updates startup logging “new notebook” detection to use NewFileKey.
marimo/_server/api/endpoints/ws/ws_rtc_handler.py Updates RTC websocket handler typing to accept FileKey.
marimo/_server/api/endpoints/ws/ws_kernel_ready.py Updates kernel-ready helpers typing to accept FileKey for RTC initialization.
marimo/_server/api/endpoints/ws/ws_connection_validator.py Parses query-param file into a FileKey and serializes for config lookup.
marimo/_server/api/endpoints/home.py Updates home endpoint metadata resolution to call workspace.resolve(PathFileKey(...)).
marimo/_server/api/endpoints/health.py Uses NEW_FILE_WIRE for sessions without filenames.
marimo/_server/api/endpoints/execution.py Parses file query param into FileKey for RTC cleanup/takeover behavior.
marimo/_server/api/endpoints/assets.py Parses file query param into FileKey, serializes for templating/service-worker injection/OpenGraph context.

Comment thread marimo/_server/api/endpoints/ws/ws_connection_validator.py Outdated
Comment thread marimo/_server/api/endpoints/assets.py Outdated
Comment thread marimo/_server/api/endpoints/execution.py
Comment thread marimo/_server/workspace/_single.py Outdated
Comment thread marimo/_server/workspace/_directory.py
Comment thread marimo/_server/workspace/_fixed.py Outdated
Comment thread marimo/_server/workspace/_empty.py
- Treat empty ``?file=`` as missing in HTTP/WS boundary parsers, falling
  back to the workspace key. Without this, ``?file=`` parsed as
  ``PathFileKey("")`` and tried to resolve an empty path.
- Drop ``assert isinstance(key, PathFileKey)`` from workspace
  ``resolve`` overrides. The closed ``FileKey`` union narrows to
  ``PathFileKey`` after the ``NewFileKey`` early-return, and asserts
  are stripped under ``python -O``.
@mscolnick mscolnick added bug Something isn't working enhancement New feature or request and removed bug Something isn't working labels May 8, 2026
Wrap path strings in PathFileKey when calling try_resume — these tests
live under tests/_session/ and were missed in the initial migration,
which caused AttributeError on file_key.path in CI.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

Comment thread marimo/_server/workspace/_base.py
@kirangadhave
Copy link
Copy Markdown
Member

@cubic-dev-ai

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented May 11, 2026

@cubic-dev-ai

@kirangadhave I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 32 files

Architecture diagram
sequenceDiagram
    participant Client as Browser Client
    participant HTTP as HTTP Endpoints
    participant WS as WebSocket Handler
    participant Workspace as NotebookWorkspace
    participant FileKey as FileKey ADT
    participant SessionMgr as SessionManager
    participant SessionRepo as SessionRepository
    participant RTC as LoroDocManager
    participant ConfigMgr as ConfigManager

    Note over Client,ConfigMgr: PRIMARY FLOW: File key parsing at HTTP boundary

    Client->>HTTP: GET /, /file_thumbnail, /public/*<br>?file={raw_string}
    HTTP->>FileKey: parse_file_key(raw_string)
    alt raw == "__new__"
        FileKey-->>HTTP: NewFileKey()
    else other string
        FileKey-->>HTTP: PathFileKey(raw_string)
    end
    HTTP->>Workspace: resolve(file_key) / get_unique_file_key()
    alt NewFileKey
        Workspace-->>HTTP: None (blank notebook)
    else PathFileKey
        Workspace->>Workspace: normalize_and_validate(key.path)
        Workspace-->>HTTP: absolute path or 404
    end
    HTTP->>FileKey: serialize_file_key(file_key)
    FileKey-->>HTTP: "__new__" or path string
    HTTP-->>Client: HTML with serialized key in service worker

    Note over Client,RTC: SESSION CREATION / RESUMPTION FLOW

    Client->>WS: websocket /ws?file={raw_string}&session_id=...
    WS->>FileKey: parse_file_key(raw_string)
    FileKey-->>WS: FileKey ADT
    WS->>SessionMgr: create_session(file_key, ...)
    SessionMgr->>Workspace: load(file_key, defaults)
    alt PathFileKey
        Workspace->>Workspace: register_allowed_path(key.path)
    end
    Workspace-->>SessionMgr: AppFileManager
    SessionMgr->>FileKey: serialize_file_key(file_key)
    FileKey-->>SessionMgr: wire string
    SessionMgr->>SessionMgr: initialization_id = wire string
    SessionMgr-->>WS: Session created

    alt resume session
        SessionMgr->>SessionRepo: get_by_file_key(file_key)
        Note over SessionRepo: Compares by initialization_id<br>and app_file_manager.path
        SessionRepo-->>SessionMgr: existing Session or None
        alt NewFileKey
            SessionMgr->>SessionMgr: skip path-based lookup
        end
    end

    Note over WS,RTC: RTC DOCUMENT MANAGEMENT FLOW

    WS->>ConfigMgr: get_config(serialize_file_key(file_key))
    ConfigMgr-->>WS: config (rtc_enabled, auto_instantiate)
    opt rtc_enabled
        WS->>RTC: create_doc(file_key, cell_ids, codes)
        RTC->>RTC: keyed by FileKey (ADT, not string)
        RTC-->>WS: LoroDoc
    end

    Note over HTTP,SessionMgr: SESSION RESTART FLOW

    Client->>HTTP: POST /api/kernel/restart_session
    HTTP->>FileKey: parse_file_key(raw query param)
    FileKey-->>HTTP: FileKey ADT
    HTTP->>SessionMgr: close_session(session_id)
    SessionMgr->>RTC: remove_doc(file_key)
    alt failover when query param missing
        HTTP->>SessionMgr: get_unique_file_key()
        alt still None
            HTTP->>SessionMgr: session.app_file_manager.path
            SessionMgr->>FileKey: PathFileKey(path)
        end
    end

    Note over HTTP,RTC: STATUS / HEALTH CHECK FLOW

    Client->>HTTP: GET /api/status
    HTTP->>FileKey: NEW_FILE_WIRE for new-file sessions
    Note over HTTP: Uses exported constant, not startswith check
    HTTP-->>Client: JSON with "__new__" for untitled notebooks

    Note over Workspace,RTC: KEY DIFFERENCE: startswith → exact match
    Note over Workspace: Old: key.startswith("__new__") matched<br>"__new__../../etc/passwd"
    Note over Workspace: New: isinstance(key, NewFileKey) matches only<br>exact sentinel from parse_file_key()
Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants