refactor: replace MarimoFileKey alias with FileKey ADT#9483
Conversation
Introduces a tagged union (NewFileKey | PathFileKey) for notebook keys, replacing the `MarimoFileKey = str` alias and `__new__` magic-string sentinel. Boundary code parses raw query-param strings via `parse_file_key` once, internal code pattern-matches by type, and `serialize_file_key` round-trips back to the wire format where needed (session initialization_id, service-worker injection, log messages). Also tightens the new-file sentinel from a `startswith` prefix check to an exact match, so non-sentinel keys consistently flow through normal path validation instead of returning the blank notebook.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
1 issue found across 31 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="marimo/_server/api/endpoints/ws/ws_connection_validator.py">
<violation number="1" location="marimo/_server/api/endpoints/ws/ws_connection_validator.py:124">
P2: `?file=` is now parsed as an empty `PathFileKey` instead of being treated as missing, which changes prior fallback behavior and can drive invalid file resolution/config lookup.</violation>
</file>
Architecture diagram
sequenceDiagram
participant Client as HTTP Client
participant API as API Gateway
participant WS as WebSocket Handler
participant AppState as AppState
participant SessionMgr as SessionManager
participant Workspace as NotebookWorkspace
participant FileKeyMod as FileKey Module
participant SessionRep as SessionRepository
participant RTC as LoroDocManager
participant ConfigMgr as ConfigManager
Note over Client,ConfigMgr: FileKey ADT Flow - Boundary Parsing and Internal Dispatch
Client->>API: GET /?file=__new__ or /?file=path.py
API->>AppState: extract FILE_QUERY_PARAM_KEY
AppState-->>API: raw_file_key string
alt query param provided
API->>FileKeyMod: parse_file_key(raw_file_key)
Note over FileKeyMod: NEW: exact match "=__new__" else PathFileKey
FileKeyMod-->>API: NewFileKey() or PathFileKey(path)
else no query param
API->>SessionMgr: get_unique_file_key()
SessionMgr->>Workspace: get_unique_file_key()
Workspace-->>SessionMgr: FileKey (NewFileKey or PathFileKey)
SessionMgr-->>API: FileKey
end
API->>FileKeyMod: serialize_file_key(file_key)
Note over FileKeyMod: convert back to wire string for config/init
FileKeyMod-->>API: wire_string
API->>ConfigMgr: config_manager_at_file(wire_string)
ConfigMgr-->>API: config
API->>SessionMgr: app_manager(file_key)
alt PathFileKey
SessionMgr->>Workspace: register_allowed_path(file_key.path)
SessionMgr->>Workspace: load(file_key, defaults)
else NewFileKey
SessionMgr->>Workspace: load(file_key, defaults) - no path register
end
Workspace-->>SessionMgr: AppFileManager
SessionMgr-->>API: AppFileManager
API-->>Client: HTML response with wire_string embedded
Note over Client,RTC: WebSocket Connection and Session Creation
Client->>WS: /ws?session_id=xyz&file=path.py
WS->>WS: extract_connection_params()
WS->>FileKeyMod: parse_file_key(raw_file_key)
WS->>SessionMgr: get_unique_file_key()
FileKeyMod-->>WS: FileKey
WS->>ConfigMgr: serialize_file_key(file_key) -> get_config()
ConfigMgr-->>WS: config (rtc, auto_instantiate)
WS->>SessionMgr: create_session(session_id, file_key)
alt PathFileKey
SessionMgr->>Workspace: register_allowed_path(file_key.path)
end
SessionMgr->>Workspace: load(file_key, defaults)
alt NewFileKey
SessionMgr->>SessionRep: store with initialization_id="__new__"
else PathFileKey
SessionMgr->>SessionRep: store with initialization_id=key.path
end
SessionMgr->>SessionRep: store with app_file_manager.path=absolute_path
WS->>SessionMgr: any_clients_connected(file_key)
SessionMgr->>WS: bool
alt NewFileKey
SessionMgr->>SessionMgr: return False
else PathFileKey
SessionMgr->>SessionRep: get_by_file_path(file_key.path)
SessionRep-->>SessionMgr: sessions list
end
Note over WS,RTC: RTC Doc Lifecycle with FileKey
alt rtc_enabled
WS->>RTC: create_doc(file_key, cell_ids, codes)
alt NewFileKey
RTC->>RTC: use NewFileKey() as dict key
else PathFileKey
RTC->>RTC: use PathFileKey(path) as dict key
end
RTC-->>WS: LoroDoc
end
Note over SessionRep,SessionMgr: Session Lookup by FileKey
alt get_session_by_file_key
SessionMgr->>SessionRep: get_by_file_key(file_key)
SessionRep->>FileKeyMod: serialize_file_key(file_key)
SessionRep->>SessionRep: compare against stored initialization_id
alt PathFileKey
SessionRep->>SessionRep: also compare against os.path.abspath(file_key.path)
end
SessionRep-->>SessionMgr: Session or None
end
alt maybe_resume_session
SessionMgr->>SessionRep: find orphaned sessions
alt NewFileKey
SessionRep->>SessionRep: return None (no resume for new files)
else PathFileKey
SessionRep->>SessionRep: match by abs path
end
SessionRep-->>SessionMgr: Session or None
end
Note over API,RTC: Session Restart Flow
API->>API: restart_session()
API->>AppState: extract raw_file_key
alt query param exists
API->>FileKeyMod: parse_file_key(raw)
FileKeyMod-->>API: FileKey
else fallback
API->>SessionMgr: get_unique_file_key()
alt is None and app_file_manager.path exists
API->>FileKeyMod: create PathFileKey(path)
end
end
API->>RTC: remove_doc(file_key)
API-->>Client: session restarted
Note over Client,ConfigMgr: Key Change - Exact sentinel matching not prefix
Client->>API: GET /?file=__new__file.py
API->>FileKeyMod: parse_file_key("__new__file.py")
Note over FileKeyMod: exact match fails, becomes PathFileKey
FileKeyMod-->>API: PathFileKey("__new__file.py")
API->>Workspace: resolve(PathFileKey("__new__file.py"))
Workspace->>Workspace: isinstance check fails, go through path validation
Workspace-->>API: HTTPException NOT_FOUND/FORBIDDEN
API-->>Client: 404/403 (not blank notebook)
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Pull request overview
This PR replaces the legacy MarimoFileKey = str + "__new__" sentinel approach with an explicit FileKey tagged union (NewFileKey | PathFileKey). Boundary code parses raw wire strings once via parse_file_key, internal code uses type checks, and serialize_file_key is used when a string is required (URLs, session initialization IDs, logs). It also tightens the “new file” detection from a startswith check to an exact sentinel match.
Changes:
- Introduce
FileKeyADT (NewFileKey,PathFileKey) with parse/serialize helpers and wire sentinelNEW_FILE_WIRE. - Refactor workspaces, session management, RTC, and server endpoints to accept
FileKeyinternally and serialize only at I/O boundaries. - Update and expand tests to reflect the ADT and the exact-match sentinel behavior.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/_server/test_workspace.py | Updates workspace tests to use FileKey variants; adds parse/serialize round-trip coverage and exact-sentinel security assertions. |
| tests/_server/test_sessions.py | Updates session tests to pass PathFileKey instead of raw strings. |
| tests/_server/test_session_manager.py | Updates session manager tests to use NewFileKey/PathFileKey and new session matching behavior. |
| tests/_server/test_file_manager_absolute_path.py | Updates file-manager tests to load via PathFileKey. |
| tests/_server/rtc/test_rtc_doc.py | Updates RTC tests to use FileKey types instead of the removed MarimoFileKey alias. |
| tests/_server/api/endpoints/test_ws.py | Serializes FileKey for filesystem operations in websocket tests. |
| tests/_server/api/endpoints/test_ws_rtc.py | Serializes FileKey when constructing ws_sync URLs. |
| tests/_server/api/endpoints/test_resume_session.py | Serializes FileKey before reading/writing notebook files in resume-session tests. |
| tests/_server/api/endpoints/test_home.py | Serializes FileKey for API expectations; uses PathFileKey for app manager access. |
| tests/_server/api/endpoints/test_files.py | Serializes FileKey before filesystem interactions in file endpoint tests. |
| tests/_server/api/endpoints/test_file_explorer.py | Serializes FileKey before wrapping into Path(...) in file-explorer tests. |
| tests/_server/api/endpoints/test_assets.py | Updates index tests for exact __new__ sentinel; serializes FileKey for headers and paths. |
| marimo/_session/session_repository.py | Changes session lookup to accept FileKey and match by serialized initialization ID and/or absolute path. |
| marimo/_server/workspace/_single.py | Migrates single-file workspace APIs to FileKey; resolves PathFileKey.path. |
| marimo/_server/workspace/_keys.py | Adds FileKey ADT, NEW_FILE_WIRE, parse_file_key, and serialize_file_key. |
| marimo/_server/workspace/_fixed.py | Migrates fixed-files workspace APIs to FileKey; rejects NewFileKey. |
| marimo/_server/workspace/_empty.py | Migrates empty workspace APIs to FileKey; uses NewFileKey as unique key. |
| marimo/_server/workspace/_directory.py | Migrates directory workspace APIs to FileKey; exact-match new-file behavior via NewFileKey. |
| marimo/_server/workspace/_base.py | Updates NotebookWorkspace contract to FileKey; updates 404 message formatting via serialize_file_key. |
| marimo/_server/workspace/init.py | Re-exports FileKey/parse/serialize utilities and NEW_FILE_WIRE. |
| marimo/_server/session_manager.py | Refactors session creation/app-manager access to operate on FileKey and serialize initialization IDs. |
| marimo/_server/rtc/doc.py | Changes RTC document maps and APIs to key by FileKey. |
| marimo/_server/resume_strategies.py | Updates resume strategy protocol/logic to accept FileKey and treat NewFileKey as non-resumable. |
| marimo/_server/api/lifespans.py | Updates startup logging “new notebook” detection to use NewFileKey. |
| marimo/_server/api/endpoints/ws/ws_rtc_handler.py | Updates RTC websocket handler typing to accept FileKey. |
| marimo/_server/api/endpoints/ws/ws_kernel_ready.py | Updates kernel-ready helpers typing to accept FileKey for RTC initialization. |
| marimo/_server/api/endpoints/ws/ws_connection_validator.py | Parses query-param file into a FileKey and serializes for config lookup. |
| marimo/_server/api/endpoints/home.py | Updates home endpoint metadata resolution to call workspace.resolve(PathFileKey(...)). |
| marimo/_server/api/endpoints/health.py | Uses NEW_FILE_WIRE for sessions without filenames. |
| marimo/_server/api/endpoints/execution.py | Parses file query param into FileKey for RTC cleanup/takeover behavior. |
| marimo/_server/api/endpoints/assets.py | Parses file query param into FileKey, serializes for templating/service-worker injection/OpenGraph context. |
- Treat empty ``?file=`` as missing in HTTP/WS boundary parsers, falling
back to the workspace key. Without this, ``?file=`` parsed as
``PathFileKey("")`` and tried to resolve an empty path.
- Drop ``assert isinstance(key, PathFileKey)`` from workspace
``resolve`` overrides. The closed ``FileKey`` union narrows to
``PathFileKey`` after the ``NewFileKey`` early-return, and asserts
are stripped under ``python -O``.
Wrap path strings in PathFileKey when calling try_resume — these tests live under tests/_session/ and were missed in the initial migration, which caused AttributeError on file_key.path in CI.
|
@kirangadhave I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
No issues found across 32 files
Architecture diagram
sequenceDiagram
participant Client as Browser Client
participant HTTP as HTTP Endpoints
participant WS as WebSocket Handler
participant Workspace as NotebookWorkspace
participant FileKey as FileKey ADT
participant SessionMgr as SessionManager
participant SessionRepo as SessionRepository
participant RTC as LoroDocManager
participant ConfigMgr as ConfigManager
Note over Client,ConfigMgr: PRIMARY FLOW: File key parsing at HTTP boundary
Client->>HTTP: GET /, /file_thumbnail, /public/*<br>?file={raw_string}
HTTP->>FileKey: parse_file_key(raw_string)
alt raw == "__new__"
FileKey-->>HTTP: NewFileKey()
else other string
FileKey-->>HTTP: PathFileKey(raw_string)
end
HTTP->>Workspace: resolve(file_key) / get_unique_file_key()
alt NewFileKey
Workspace-->>HTTP: None (blank notebook)
else PathFileKey
Workspace->>Workspace: normalize_and_validate(key.path)
Workspace-->>HTTP: absolute path or 404
end
HTTP->>FileKey: serialize_file_key(file_key)
FileKey-->>HTTP: "__new__" or path string
HTTP-->>Client: HTML with serialized key in service worker
Note over Client,RTC: SESSION CREATION / RESUMPTION FLOW
Client->>WS: websocket /ws?file={raw_string}&session_id=...
WS->>FileKey: parse_file_key(raw_string)
FileKey-->>WS: FileKey ADT
WS->>SessionMgr: create_session(file_key, ...)
SessionMgr->>Workspace: load(file_key, defaults)
alt PathFileKey
Workspace->>Workspace: register_allowed_path(key.path)
end
Workspace-->>SessionMgr: AppFileManager
SessionMgr->>FileKey: serialize_file_key(file_key)
FileKey-->>SessionMgr: wire string
SessionMgr->>SessionMgr: initialization_id = wire string
SessionMgr-->>WS: Session created
alt resume session
SessionMgr->>SessionRepo: get_by_file_key(file_key)
Note over SessionRepo: Compares by initialization_id<br>and app_file_manager.path
SessionRepo-->>SessionMgr: existing Session or None
alt NewFileKey
SessionMgr->>SessionMgr: skip path-based lookup
end
end
Note over WS,RTC: RTC DOCUMENT MANAGEMENT FLOW
WS->>ConfigMgr: get_config(serialize_file_key(file_key))
ConfigMgr-->>WS: config (rtc_enabled, auto_instantiate)
opt rtc_enabled
WS->>RTC: create_doc(file_key, cell_ids, codes)
RTC->>RTC: keyed by FileKey (ADT, not string)
RTC-->>WS: LoroDoc
end
Note over HTTP,SessionMgr: SESSION RESTART FLOW
Client->>HTTP: POST /api/kernel/restart_session
HTTP->>FileKey: parse_file_key(raw query param)
FileKey-->>HTTP: FileKey ADT
HTTP->>SessionMgr: close_session(session_id)
SessionMgr->>RTC: remove_doc(file_key)
alt failover when query param missing
HTTP->>SessionMgr: get_unique_file_key()
alt still None
HTTP->>SessionMgr: session.app_file_manager.path
SessionMgr->>FileKey: PathFileKey(path)
end
end
Note over HTTP,RTC: STATUS / HEALTH CHECK FLOW
Client->>HTTP: GET /api/status
HTTP->>FileKey: NEW_FILE_WIRE for new-file sessions
Note over HTTP: Uses exported constant, not startswith check
HTTP-->>Client: JSON with "__new__" for untitled notebooks
Note over Workspace,RTC: KEY DIFFERENCE: startswith → exact match
Note over Workspace: Old: key.startswith("__new__") matched<br>"__new__../../etc/passwd"
Note over Workspace: New: isinstance(key, NewFileKey) matches only<br>exact sentinel from parse_file_key()
Introduces a tagged union (NewFileKey | PathFileKey) for notebook keys,
replacing the
MarimoFileKey = stralias and__new__magic-stringsentinel. Boundary code parses raw query-param strings via
parse_file_keyonce, internal code pattern-matches by type, andserialize_file_keyround-trips back to the wire format where needed(session initialization_id, service-worker injection, log messages).
Also tightens the new-file sentinel from a
startswithprefix check toan exact match, so non-sentinel keys consistently flow through normal
path validation instead of returning the blank notebook.