feat(BA-6155): add metadata-driven chunk-store upload session engine#11766
Draft
jopemachine wants to merge 1 commit into
Draft
feat(BA-6155): add metadata-driven chunk-store upload session engine#11766jopemachine wants to merge 1 commit into
jopemachine wants to merge 1 commit into
Conversation
This was referenced May 22, 2026
2a9d0eb to
193aece
Compare
764ffad to
89e52ef
Compare
jopemachine
commented
May 26, 2026
89e52ef to
4499f92
Compare
jopemachine
commented
May 26, 2026
jopemachine
commented
May 26, 2026
4499f92 to
b84c12f
Compare
jopemachine
commented
May 28, 2026
jopemachine
commented
May 28, 2026
jopemachine
commented
May 28, 2026
jopemachine
commented
May 28, 2026
c0c0db8 to
4d1cb89
Compare
4d1cb89 to
5c675ef
Compare
9112c26 to
f1a899d
Compare
jopemachine
commented
Jun 1, 2026
jopemachine
commented
Jun 1, 2026
f1a899d to
a4fb473
Compare
jopemachine
commented
Jun 1, 2026
49b4a42 to
940cceb
Compare
jopemachine
commented
Jun 1, 2026
jopemachine
commented
Jun 1, 2026
9a8919e to
9225be5
Compare
jopemachine
commented
Jun 1, 2026
88d3e00 to
7bbd8c4
Compare
Introduce the concurrency-safe upload session engine that replaces the
single-file-append TUS model. Session metadata is the source of truth in
Valkey, guarded by a per-session lock (SET NX + token-compare Lua release);
only the chunk payload bytes live on the shared filesystem as per-offset
chunk_<offset>.dat files. Coordination across multiple Storage Proxy replicas
happens entirely through Valkey, so there is no dependency on filesystem lock
semantics (fcntl.flock) or NFS attribute-cache coherence; chunk payloads are
content-addressed by (offset, sha256) and idempotent, needing no coordination.
Contents:
- common/clients/valkey_client/valkey_tus: ValkeyTusClient — per-session
state get/set (with TTL) + a per-session lock; reuses the Glide-based
AbstractValkeyClient like the other valkey clients.
- common/defs: REDIS_TUS_DB; common/metrics: VALKEY_TUS layer.
- errors/upload.py: ChunkConflictError (409), UploadSessionCorruptedError (500)
- services/upload/tus_session.py:
- ChunkRecord / SessionState (BackendAISchema; committed_offset as the
largest contiguous prefix, missing_ranges, progress_percent) / ChunkAcceptance
- UploadStatus StrEnum
- TusUploadSession (ensure_initialized, read_state, write_temp_chunk,
commit_chunk — idempotent dup / 409 conflict / no-op when completed,
assemble, cleanup) taking a TusUploadSessionArgs that carries the
ValkeyTusClient.
- Stale sessions auto-expire via the Valkey TTL (no separate GC sweep).
- Storage RootContext now provisions a ValkeyTusClient (server.py bootstrap).
- Integration tests drive a real Valkey (redis container) + tmp_path chunks.
Resolves BA-6154. Resolves BA-6155.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
7bbd8c4 to
f337ca3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📚 Stacked PRs
This PR is part of a 5-PR stack implementing BA-3974 (epic: BA-6153). Merge in order:
feat(BA-6155): add Valkey-backed chunk-store upload session engine← you are herefix(BA-6156): rewire TUS PATCH/HEAD to chunk-based store(actual user-visible fix)test(BA-6157): add multi-proxy NFS race regression testfeat(BA-6158): support TUS Checksum extensionfeat(BA-6159): add /upload/status endpoint + progress headersSummary
Introduces the concurrency-safe TUS upload session engine. Per-session metadata
is the source of truth in Valkey (keyed by session id) and is guarded by a
per-session distributed lock produced by an injected
DistributedLockFactory.Chunk payload bytes live as individual files under
chunks/chunk_<offset>.daton the shared filesystem; they are content-addressed by
(offset, sha256)andidempotent, so the heavy write path needs no coordination at all. Only the small
metadata read-modify-write window is serialized — and across Storage Proxy
replicas, since the lock is distributed.
Architecture
The handler (PR #11767) only adapts the request body; all chunk file I/O and
metadata management lives in the engine (this PR).
flowchart TB classDef pr66 fill:#e6f7ff,stroke:#1890ff,color:#000 classDef pr67 fill:#fff7e6,stroke:#fa8c16,color:#000 classDef redis fill:#fff1f0,stroke:#cf1322,color:#000 classDef disk fill:#f6ffed,stroke:#52c41a,color:#000 subgraph L1["PATCH handler (PR #11767)"] REQ["request.content"]:::pr67 ADP["TusChunkUploadStreamReader"]:::pr67 REQ --> ADP end subgraph L2["Upload engine (PR #11766)"] WTC["write_temp_chunk"]:::pr66 CC["commit_chunk"]:::pr66 FIN["assemble + cleanup"]:::pr66 WTC --> CC --> FIN end subgraph L3R["Valkey · source of truth"] STATE["tus.upload.session:id"]:::redis LOCK["tus.upload.lock:id"]:::redis end subgraph L3F["shared FS · chunk payloads"] TMP["chunk_off.tok.tmp"]:::disk DAT["chunk_off.dat"]:::disk OUT["assembled file"]:::disk end ADP --> WTC WTC -->|"lock-free write"| TMP CC -->|"acquire"| LOCK CC -->|"get / set"| STATE CC -->|"atomic rename"| DAT FIN -->|"concat in order"| OUT FIN -->|"mark COMPLETED"| STATE FIN -->|"unlink chunks"| DATData model
classDiagram class TusUploadSession { +ensure_initialized() +read_state() SessionState +open_temp_chunk(offset) Path +write_temp_chunk(offset, reader) WrittenChunk +commit_chunk(offset, path, length, sha256) ChunkCommitResult +assemble(target) +cleanup() } class SessionState { +session_id +total_size +committed_chunks : ChunkMetadata[] +status : UploadStatus +committed_offset +find_at_offset(offset) +append_chunk(chunk) } class ChunkMetadata { +offset +length +sha256 } class WrittenChunk { +path +length +sha256 } class ChunkCommitResult { +state : SessionState +committed : bool +is_final_commit : bool } class TusUploadSessionArgs { +session_dir +session_id +total_size +valkey_client : ValkeyTusClient +lock_factory : DistributedLockFactory } SessionState o-- ChunkMetadata TusUploadSession ..> SessionState TusUploadSession ..> ChunkCommitResult TusUploadSession ..> WrittenChunk TusUploadSession ..> TusUploadSessionArgsOne PATCH lifecycle
The expensive body write is lock-free; only the short metadata read-modify-write
is serialized via the distributed lock, so concurrent replicas never corrupt
each other.
sequenceDiagram autonumber participant H as tus_upload_part participant S as TusUploadSession participant V as Valkey participant D as shared FS H->>S: ensure_initialized() S->>V: acquire lock + SET state if missing H->>S: write_temp_chunk(offset, reader) loop async for chunk in reader.read() S->>D: write chunk_off.tok.tmp (lock-free) + sha256 end S-->>H: WrittenChunk(path, length, sha256) H->>S: commit_chunk(offset, path, length, sha256) Note over S,V: distributed lock window (short) alt status == COMPLETED S-->>H: no-op (late duplicate from another replica) else duplicate (same offset, length, sha256) S-->>H: committed=False (idempotent) else conflict (same offset, different content) S-->>H: ChunkConflictError 409 else new chunk S->>D: atomic rename tmp to chunk_off.dat S->>V: update SessionState with new record S-->>H: ChunkCommitResult(is_final_commit?) end opt is_final_commit H->>S: assemble(target) S->>D: concat chunks in offset order S->>V: mark status=COMPLETED H->>S: cleanup() S->>D: unlink chunk files (state kept for TTL) end H-->>H: 204 No ContentLocking & HA semantics
Session state is NOT held in memory and NOT on the shared filesystem.
TusUploadSessionis stateless in-process — every operation reads/writes theSessionState in Valkey, and chunk payload files are content-addressed.
This makes it safe for the production topology of multiple storage-proxy
processes/replicas sharing an NFS mount: no in-process state to synchronize,
no dependency on filesystem lock semantics, no dependency on NFS attribute-cache
coherence.
How replicas coordinate:
DistributedLockFactory)serializes the metadata read-modify-write window. The factory pattern mirrors
the manager's
DistributedLockFactory— the lock backend lives inRootContext.tus_lock_factoryand is injected into the engine.Path.replace) promotes a temp chunk file(
chunk_<offset>.<token>.tmp) to its canonical name (chunk_<offset>.dat).Readers see old-or-new, never a partial file.
no-ops, mismatched content surfaces as a 409
ChunkConflictError.session affinity, and still resolves to the same Valkey key and same chunk
file path.
Stale or abandoned sessions are reclaimed by the Valkey state's TTL (24h), so
there is no separate GC sweep.
Resolves BA-6154, BA-6155.