fix(BA-3974): make multi-proxy TUS uploads safe via per-offset chunks #11757
Closed
jopemachine wants to merge 2 commits into
The TUS PATCH handler appended every chunk to a single temp file under
`vfpath/.upload/<session>`. With a load-balanced storage-proxy fleet
sharing one NFS mount, concurrent PATCH requests from different replicas
race on that file and silently corrupt the upload.
Replace the append model with a metadata-driven chunk store:
    <session>/
      info.json                   # atomic-rename-only source of truth
      .lock                       # fcntl.flock target (short critical section)
      chunks/chunk_<offset>.dat   # one file per chunk, keyed by start offset
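The layout above relies on `info.json` only ever being replaced by an atomic rename. A minimal Python sketch of that write path follows, assuming a JSON record such as `{"total_size": ..., "chunks": {...}}`; the helper names `session_paths`, `chunk_path`, `write_info_atomically`, and `read_info` are hypothetical and do not come from the PR.

```python
import json
import os
import tempfile
from pathlib import Path


def session_paths(upload_root: Path, session_id: str) -> dict:
    # Hypothetical helper mirroring the <session>/ layout.
    session_dir = upload_root / session_id
    return {
        "dir": session_dir,
        "info": session_dir / "info.json",
        "lock": session_dir / ".lock",
        "chunks": session_dir / "chunks",
    }


def chunk_path(chunks_dir: Path, offset: int) -> Path:
    # One data file per chunk, named after its starting byte offset.
    return chunks_dir / f"chunk_{offset}.dat"


def write_info_atomically(info_path: Path, info: dict) -> None:
    # Write to a temp file in the same directory, then rename it over info.json,
    # so a reader opening info.json by name gets either the old or the new JSON.
    fd, tmp_name = tempfile.mkstemp(dir=info_path.parent, prefix=".info-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(info, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, info_path)
    except BaseException:
        # Remove the temp file if serialization or the rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_info(info_path: Path) -> dict:
    with open(info_path) as f:
        return json.load(f)
```

Creating the temp file next to `info.json` keeps `os.replace` on a single filesystem; across filesystems it would raise instead of renaming.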
- Each PATCH streams into a unique temp file and computes its sha256; then,
under a brief metadata lock, it either commits a new chunk record, treats a
duplicate replay as an idempotent no-op, or fails with 409 on a hash conflict.
- The replica whose commit first closes the contiguous prefix to the
declared total size is the only one that assembles and cleans up.
- Add TUS Checksum extension support (`Upload-Checksum: sha256 <b64>`,
HTTP 460 on mismatch) and advertise it in OPTIONS.
- Add `GET /upload/status` returning `chunks_received`, `missing_ranges`,
`committed_offset`, `progress_percent` so clients can resume after a
proxy crash without retransmitting completed ranges.
- Enrich PATCH responses with `X-Backend-Ai-Chunks-Received` and
`X-Backend-Ai-Progress-Percent` headers (additive, optional).
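The commit step and the single-assembler rule from the bullets above can be sketched as follows. This is a hypothetical illustration, not the code in `tus_upload_part`: it assumes `info.json` holds `total_size` plus a `chunks` map from offset (as a string key) to `{"sha256", "length"}`, that the handler has already streamed the body into a unique temp file inside the session directory, and that the host is Unix (the `fcntl` module). `commit_chunk`, `CommitResult`, and `contiguous_end` are invented names.

```python
import fcntl
import hashlib
import json
import os
from contextlib import contextmanager
from enum import Enum
from pathlib import Path


class CommitResult(Enum):
    COMMITTED = "committed"  # new chunk recorded
    DUPLICATE = "duplicate"  # same offset, same hash: idempotent replay
    CONFLICT = "conflict"    # same offset, different hash: caller replies 409


@contextmanager
def metadata_lock(lock_path: Path):
    # Exclusive advisory lock on the session's .lock file.
    with open(lock_path, "a") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def contiguous_end(chunks: dict) -> int:
    # End of the gap-free prefix that starts at offset 0.
    end = 0
    for offset in sorted(int(o) for o in chunks):
        if offset > end:
            break
        end = max(end, offset + chunks[str(offset)]["length"])
    return end


def commit_chunk(session_dir: Path, offset: int, tmp_path: Path):
    # Hashing happens outside the lock so the critical section stays short.
    digest = sha256_file(tmp_path)
    length = tmp_path.stat().st_size
    info_path = session_dir / "info.json"
    with metadata_lock(session_dir / ".lock"):
        info = json.loads(info_path.read_text())
        existing = info["chunks"].get(str(offset))
        if existing is not None:
            tmp_path.unlink()
            if existing["sha256"] == digest:
                return CommitResult.DUPLICATE, False
            return CommitResult.CONFLICT, False
        was_complete = contiguous_end(info["chunks"]) >= info["total_size"]
        # Move the data into place first; a crash before the info.json rename
        # leaves an unreferenced file, since info.json is the source of truth.
        os.replace(tmp_path, session_dir / "chunks" / f"chunk_{offset}.dat")
        info["chunks"][str(offset)] = {"sha256": digest, "length": length}
        staged = session_dir / "info.json.tmp"
        staged.write_text(json.dumps(info))
        os.replace(staged, info_path)
        now_complete = contiguous_end(info["chunks"]) >= info["total_size"]
    # Only the commit that closes the prefix should assemble and clean up.
    return CommitResult.COMMITTED, now_complete and not was_complete
```

Deciding "was incomplete, now complete" inside the same lock is what makes exactly one replica see `True`.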
Wire protocol (PATCH/HEAD/OPTIONS on /upload) stays TUS 1.0.0-compatible,
so existing tus-js-client-based clients work unchanged.
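For the Checksum extension, the `Upload-Checksum` value is an algorithm name, a space, and the base64-encoded digest. A sketch of server-side validation, assuming only `sha256` is advertised and that malformed or unsupported headers are rejected with 400; `verify_chunk_checksum` is a hypothetical name, and it hashes an in-memory body for brevity where the real handler would reuse the digest computed while streaming.

```python
import base64
import binascii
import hashlib
from http import HTTPStatus
from typing import Optional, Tuple

CHECKSUM_MISMATCH = 460  # status code defined by the TUS Checksum extension


def parse_upload_checksum(header: str) -> Tuple[str, bytes]:
    algorithm, _, encoded = header.strip().partition(" ")
    if not algorithm or not encoded:
        raise ValueError("malformed Upload-Checksum header")
    try:
        return algorithm.lower(), base64.b64decode(encoded.strip(), validate=True)
    except binascii.Error as e:
        raise ValueError("checksum is not valid base64") from e


def verify_chunk_checksum(header: Optional[str], body: bytes) -> Optional[int]:
    # Returns an HTTP status to reject with, or None if the chunk is acceptable.
    if header is None:
        return None  # the header is optional for clients
    try:
        algorithm, expected = parse_upload_checksum(header)
    except ValueError:
        return HTTPStatus.BAD_REQUEST
    if algorithm != "sha256":
        return HTTPStatus.BAD_REQUEST  # only sha256 is advertised in OPTIONS
    if hashlib.sha256(body).digest() != expected:
        return CHECKSUM_MISMATCH
    return None
```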
Resolves BA-3974.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Superseded by the BA-3974 PR stack, which decomposes this single monolithic PR into reviewable, independently mergeable pieces (and combines the state model and storage class after review feedback):
Same wire contract and behavior; the stack is what will be merged. Closing this one.
Summary
- Replace the append model in `tus_upload_part` with a metadata-driven per-offset chunk store (`info.json` + `chunks/chunk_<offset>.dat`) so concurrent PATCH requests from multiple Storage Proxy replicas can no longer corrupt an upload on a shared NFS mount.
- Add TUS Checksum extension support (`Upload-Checksum: sha256 <b64>`, HTTP 460 on mismatch) and advertise it in OPTIONS.
- Add `GET /upload/status` returning `chunks_received`, `missing_ranges`, `committed_offset`, and `progress_percent` so clients can resume after a proxy crash without retransmitting completed ranges.
- Enrich PATCH responses with `X-Backend-Ai-Chunks-Received` / `X-Backend-Ai-Progress-Percent` headers (additive, optional).

The wire contract on `/upload` (PATCH/HEAD/OPTIONS) remains TUS 1.0.0-compatible, so existing tus-js-client-based callers work unchanged.

Resolves BA-3974.
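The `GET /upload/status` fields can be derived from the committed chunk records alone. A sketch under stated assumptions: chunks are given as an offset-to-length map, `missing_ranges` uses half-open `[start, end)` pairs, and `committed_offset` is taken to mean the end of the contiguous prefix from offset 0. The PR does not spell out these exact semantics, and `upload_status` is a hypothetical name.

```python
from typing import Dict, List


def upload_status(total_size: int, chunks: Dict[int, int]) -> dict:
    # chunks maps starting offset -> length for every committed chunk.
    missing: List[List[int]] = []
    cursor = 0  # end of everything covered so far, scanning left to right
    for offset, length in sorted(chunks.items()):
        if offset > cursor:
            missing.append([cursor, offset])  # gap before this chunk
        cursor = max(cursor, offset + length)
    if cursor < total_size:
        missing.append([cursor, total_size])  # tail not yet received
    missing_bytes = sum(end - start for start, end in missing)
    # The contiguous prefix ends where the first gap starts.
    committed_offset = missing[0][0] if missing else total_size
    if total_size == 0:
        progress = 100.0
    else:
        progress = 100.0 * (total_size - missing_bytes) / total_size
    return {
        "chunks_received": len(chunks),
        "missing_ranges": missing,
        "committed_offset": committed_offset,
        "progress_percent": round(progress, 2),
    }
```

For example, `upload_status(10, {0: 3, 5: 2})` reports `missing_ranges` of `[[3, 5], [7, 10]]`, `committed_offset` 3, and `progress_percent` 50.0, so a resuming client would re-send only bytes 3 to 5 and 7 to 10.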