fix(BA-3974): make multi-proxy TUS uploads safe via per-offset chunks #11757

Closed
jopemachine wants to merge 2 commits into main from fix/BA-3974-safe-chunk-upload

@jopemachine
Member

Summary

  • Replace single-file append in tus_upload_part with a metadata-driven per-offset chunk store (info.json + chunks/chunk_<offset>.dat) so concurrent PATCH requests from multiple Storage Proxy replicas can no longer corrupt an upload on a shared NFS mount.
  • Add TUS Checksum extension (Upload-Checksum: sha256 <b64>, HTTP 460 on mismatch) and advertise it in OPTIONS.
  • Add GET /upload/status returning chunks_received, missing_ranges, committed_offset, and progress_percent so clients can resume after a proxy crash without retransmitting completed ranges.
  • Enrich PATCH responses with X-Backend-Ai-Chunks-Received / X-Backend-Ai-Progress-Percent headers (additive, optional).
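As a rough illustration of what the status endpoint could report, the sketch below derives the four fields from a list of committed `(offset, length)` chunk records. The function name `upload_status`, the record shape, and the exact meaning of each field (for example, `committed_offset` as the end of the contiguous prefix) are assumptions made for illustration, not taken from the PR's code.

```python
def upload_status(chunks: list[tuple[int, int]], total_size: int) -> dict:
    """Summarize upload progress from committed (offset, length) records.

    Assumes every chunk lies within [0, total_size); overlapping chunks
    are tolerated and counted once.
    """
    missing: list[tuple[int, int]] = []  # half-open [start, end) byte ranges
    cursor = 0  # end of the byte region covered so far
    for offset, length in sorted(chunks):
        if offset > cursor:
            # Nothing covers [cursor, offset): record it as a gap.
            missing.append((cursor, offset))
        cursor = max(cursor, offset + length)
    if cursor < total_size:
        missing.append((cursor, total_size))

    received = total_size - sum(end - start for start, end in missing)
    return {
        "chunks_received": len(chunks),
        "missing_ranges": missing,
        # End of the contiguous prefix: the first gap, or the full size.
        "committed_offset": missing[0][0] if missing else total_size,
        "progress_percent": 100.0 * received / total_size if total_size else 100.0,
    }
```

Under these assumptions, a client resuming after a proxy crash would only need to re-send the byte ranges listed in `missing_ranges`.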

The wire contract on /upload (PATCH/HEAD/OPTIONS) remains TUS 1.0.0-compatible, so existing tus-js-client-based callers work unchanged.

Resolves BA-3974.

Before this change, the TUS PATCH handler appended every chunk to a single
temp file under `vfpath/.upload/<session>`. With a load-balanced
storage-proxy fleet sharing one NFS mount, concurrent PATCH requests from
different replicas raced on that file and silently corrupted the upload.

Replace the append model with a metadata-driven chunk store:

  <session>/
    info.json              # atomic-rename-only source of truth
    .lock                  # fcntl.flock target (short critical section)
    chunks/chunk_<offset>.dat
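The two metadata files in this layout can be pictured with a minimal sketch: an exclusive `fcntl.flock` on `.lock` guards a short critical section, and `info.json` is only ever replaced through a temp-file write followed by a rename. The helper names (`metadata_lock`, `write_info`, `read_info`) and the JSON shape are hypothetical, and whether `flock` is honored across replicas depends on the NFS client and server configuration.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def metadata_lock(session_dir: Path):
    """Hold an exclusive flock on <session>/.lock for a short critical section."""
    fd = os.open(session_dir / ".lock", os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def write_info(session_dir: Path, info: dict) -> None:
    """Replace info.json atomically: write a temp file, fsync, then rename."""
    fd, tmp = tempfile.mkstemp(dir=session_dir, prefix="info.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(info, f)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp, session_dir / "info.json")
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_info(session_dir: Path) -> dict:
    """Load the current metadata (hypothetical JSON shape)."""
    return json.loads((session_dir / "info.json").read_text())
```

Keeping the temp file in the same directory as `info.json` matters here, because a rename is only atomic within one filesystem.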

- Each PATCH streams into a unique temp file, computes sha256, then under
  a brief metadata lock either commits a new chunk record, idempotently
  no-ops on a duplicate replay, or fails with 409 on a hash conflict.
- The replica whose commit first closes the contiguous prefix to the
  declared total size is the only one that assembles and cleans up.
- Add TUS Checksum extension support (`Upload-Checksum: sha256 <b64>`,
  HTTP 460 on mismatch) and advertise it in OPTIONS.
- Add `GET /upload/status` returning `chunks_received`, `missing_ranges`,
  `committed_offset`, `progress_percent` so clients can resume after a
  proxy crash without retransmitting completed ranges.
- Enrich PATCH responses with `X-Backend-Ai-Chunks-Received` and
  `X-Backend-Ai-Progress-Percent` headers (additive, optional).
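The per-PATCH decision described in the bullets above can be sketched as a small pure function. It assumes chunk metadata is a dict mapping offset to a hex sha256 digest and takes the chunk body as bytes for brevity, whereas the PR describes streaming into a temp file and deciding under the metadata lock. `checksum_ok` and `commit_chunk` are hypothetical names. The 460 and 409 codes come from the description; 204 is the standard TUS success status for PATCH.

```python
import base64
import hashlib
import hmac
from typing import Optional


def checksum_ok(header: str, digest: bytes) -> bool:
    """Compare an `Upload-Checksum: sha256 <b64>` value with a computed digest."""
    algo, _, encoded = header.strip().partition(" ")
    if algo != "sha256":
        raise ValueError(f"unsupported checksum algorithm: {algo!r}")
    return hmac.compare_digest(base64.b64decode(encoded), digest)


def commit_chunk(
    chunks: dict[int, str],
    offset: int,
    body: bytes,
    checksum_header: Optional[str] = None,
) -> int:
    """Return the HTTP status for one PATCH, updating `chunks` on a new commit."""
    digest = hashlib.sha256(body).digest()
    if checksum_header is not None and not checksum_ok(checksum_header, digest):
        return 460  # checksum mismatch: nothing is recorded
    existing = chunks.get(offset)
    if existing is None:
        chunks[offset] = digest.hex()  # first arrival: commit a new record
        return 204
    if existing == digest.hex():
        return 204  # duplicate replay of the same bytes: idempotent no-op
    return 409  # same offset, different content: hash conflict
```

Because a replay of identical bytes returns the same status as the original commit, a client that retries after a timeout does not need to know which replica handled the first attempt.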

The wire protocol (PATCH/HEAD/OPTIONS on /upload) stays TUS 1.0.0-compatible,
so existing tus-js-client-based clients work unchanged.

Resolves BA-3974.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions (bot) added the size:XL (500~ LoC) and comp:storage-proxy (Related to Storage proxy component) labels on May 22, 2026
@jopemachine
Member Author

Superseded by sub-issue PRs under epic BA-6153 (BA-6154 through BA-6159). This PR remains open as the integration view; the actual review-ready PRs are the 6 smaller stacked ones to follow.

@jopemachine added the skip:changelog label (makes the action workflow skip the towncrier check) on May 22, 2026
@jopemachine
Member Author

Superseded by the BA-3974 PR stack, which decomposes this single monolithic PR into reviewable, independently-mergeable pieces (and combines the state model + storage class after review feedback):

Same wire contract and behavior; the stack is what will be merged. Closing this one.

@jopemachine deleted the fix/BA-3974-safe-chunk-upload branch on May 28, 2026, 05:02