
test(BA-6157): add multi-proxy NFS race regression test #11768

Draft
jopemachine wants to merge 1 commit into fix/BA-6156-rewire-tus-handlers from fix/BA-6157-nfs-race-regression-test

jopemachine commented May 22, 2026

📚 Stacked PRs

This PR is part of a 5-PR stack implementing BA-3974 (epic: BA-6153). Merge in order:

  1. ⬇️ feat(BA-6155): add metadata-driven chunk-store upload session engine #11766
  2. ⬇️ fix(BA-6156): rewire TUS PATCH/HEAD to chunk-based store #11767 (actual user-visible fix)
  3. 👉 test(BA-6157): add multi-proxy NFS race regression test #11768 ← you are here
  4. ⬇️ feat(BA-6158): support TUS Checksum extension #11769
  5. ⬇️ feat(BA-6159): add /upload/status endpoint and progress headers #11770

Summary

  • Reproduce the multi-proxy upload corruption on a local filesystem by mocking `Path.stat` (simulating NFS attribute cache staleness) and using `lseek` + `write` (simulating the lack of cross-client `O_APPEND` atomicity).
  • `TestLegacyModelCorruptsUnderStaleStatCache` proves the bug existed in the pre-PR append-only model.
  • `TestNewModelSurvivesSameChaos` drives the same workload through `TusUploadSession` and confirms byte-perfect output, including under 3× duplicate retry storms.
  • Test-only; no production code change.
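
As an illustration of the first bullet, here is a minimal sketch of freezing `Path.stat` for a single file with `unittest.mock`. The helper names `frozen_stat_patch` and `demo_frozen_stat` are hypothetical and are not taken from the test module in this PR, whose fixture may be structured differently.

```python
import os
import tempfile
from pathlib import Path
from unittest import mock


def frozen_stat_patch(target: Path):
    """Return a patcher that makes Path.stat() report a stale snapshot for `target`.

    Hypothetical helper: the snapshot is captured once, which loosely mimics
    an NFS client serving st_size from its attribute cache.
    """
    real_stat = Path.stat
    snapshot = real_stat(target)

    def fake_stat(self, *args, **kwargs):
        # Only the upload temp file is frozen; every other path is untouched.
        if self == target:
            return snapshot
        return real_stat(self, *args, **kwargs)

    return mock.patch.object(Path, "stat", fake_stat)


def demo_frozen_stat():
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "upload.tmp"
        target.touch()
        with frozen_stat_patch(target):
            target.write_bytes(b"hello")
            seen = target.stat().st_size   # stale view through pathlib
            real = os.stat(target).st_size  # os.stat is not patched
        return real, seen
```

Inside the patch, code that consults `Path.stat` keeps seeing the size from before the write, while the file on disk has already grown.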

Resolves BA-6157.

@github-actions github-actions Bot added the size:L 100~500 LoC label May 22, 2026
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from 8d131c2 to 3ab9dd3 Compare May 22, 2026 07:04
@jopemachine jopemachine added the skip:changelog Make the action workflow to skip towncrier check label May 22, 2026
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from dada14e to 0d1bf13 Compare May 26, 2026 05:55
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch 2 times, most recently from a1a01a2 to b9640d3 Compare May 26, 2026 08:29
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from 0d1bf13 to 7dcbab8 Compare May 26, 2026 08:29
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from b9640d3 to da3310e Compare May 26, 2026 08:50
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch 2 times, most recently from 84e303d to fc4dc36 Compare May 26, 2026 09:04
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from da3310e to 5407549 Compare May 26, 2026 09:04
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from fc4dc36 to 968d6c3 Compare May 26, 2026 09:11
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from 5407549 to 74d9524 Compare May 26, 2026 09:11
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from 968d6c3 to b40ebc8 Compare May 26, 2026 09:13
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch 2 times, most recently from 4c2e6ac to 5460ca4 Compare May 26, 2026 09:19
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from b40ebc8 to a94454f Compare May 26, 2026 09:19
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from 5460ca4 to 7278de5 Compare May 26, 2026 09:31
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from a94454f to 583cb92 Compare May 26, 2026 09:31
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from 7278de5 to d5caa79 Compare May 28, 2026 05:02
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from 583cb92 to 1f2fc01 Compare May 28, 2026 05:02
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from d5caa79 to 6b2f68a Compare May 28, 2026 05:10
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from 1f2fc01 to 4c07f47 Compare May 28, 2026 05:10
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from 9ecba5c to 0d1a512 Compare June 1, 2026 05:43
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from 9bdda29 to fa1ca45 Compare June 1, 2026 05:48
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch 2 times, most recently from e37ca75 to cf51fc2 Compare June 1, 2026 05:51
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from fa1ca45 to 1665bd6 Compare June 1, 2026 05:51
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from cf51fc2 to c171fa4 Compare June 1, 2026 06:15
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch 2 times, most recently from 56d3465 to ef41a44 Compare June 1, 2026 06:19
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch 2 times, most recently from e517c70 to 6acff68 Compare June 1, 2026 06:21
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch 2 times, most recently from 6916fd1 to 88c67f3 Compare June 1, 2026 06:30
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from 6acff68 to e14f59f Compare June 1, 2026 06:30
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from 88c67f3 to d627067 Compare June 1, 2026 06:35
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch 2 times, most recently from 51181ab to fccee9f Compare June 1, 2026 06:44
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from d627067 to eab1435 Compare June 1, 2026 06:44
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from fccee9f to ee1e510 Compare June 1, 2026 06:52
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch 2 times, most recently from 3187f15 to d8ffda6 Compare June 1, 2026 07:29
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from ee1e510 to 087b43d Compare June 1, 2026 07:29
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from d8ffda6 to 7c3d4ea Compare June 1, 2026 07:33
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch 2 times, most recently from 5dd870e to c9c67a8 Compare June 1, 2026 07:45
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch 2 times, most recently from 99ed0cb to dcf7dc6 Compare June 1, 2026 07:55
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from c9c67a8 to e35aee1 Compare June 1, 2026 07:55
@jopemachine jopemachine force-pushed the fix/BA-6157-nfs-race-regression-test branch from dcf7dc6 to 363302f Compare June 1, 2026 08:00
@jopemachine jopemachine force-pushed the fix/BA-6156-rewire-tus-handlers branch from e35aee1 to 8482a07 Compare June 1, 2026 08:00
Lock in the BA-3974 fix with a unit test that reproduces the cross-replica
upload corruption on a local filesystem — no real NFS required.

The trick is to fake the two NFS-only behaviors that conspire to break the
legacy append model:

  - NFS attribute cache staleness: multiple clients can observe the same
    cached `st_size`. We simulate this by patching `pathlib.Path.stat` to
    return a frozen snapshot for the upload temp file.
  - Lack of cross-client `O_APPEND` atomicity: each NFS client computes
    its own write position from its cached size. We simulate this with
    `os.lseek` + `os.write` instead of an `O_APPEND` writer.
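
The second behavior can be sketched in isolation as follows. This is a simplified, sequential illustration rather than the test code from this PR; `positioned_write`, `append_write`, and `demo_offsets` are hypothetical names. Each "client" opens its own file descriptor and seeks to the size it believes the file has.

```python
import os
import tempfile


def positioned_write(path: str, cached_size: int, data: bytes) -> None:
    """Write at an offset derived from a (possibly stale) cached size."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, cached_size, os.SEEK_SET)
        os.write(fd, data)
    finally:
        os.close(fd)


def append_write(path: str, data: bytes) -> None:
    """Write with O_APPEND, where the local kernel picks the end of file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def demo_offsets():
    with tempfile.TemporaryDirectory() as d:
        stale = os.path.join(d, "stale.bin")
        appended = os.path.join(d, "appended.bin")
        for p in (stale, appended):
            open(p, "wb").close()

        # Both clients believe the file is empty, so both write at offset 0.
        positioned_write(stale, 0, b"AAAA")
        positioned_write(stale, 0, b"BB")

        # On a local filesystem O_APPEND places each write at the true end.
        append_write(appended, b"AAAA")
        append_write(appended, b"BB")

        with open(stale, "rb") as f:
            stale_result = f.read()
        with open(appended, "rb") as f:
            appended_result = f.read()
        return stale_result, appended_result
```

The positioned writes leave `b"BBAA"` (four bytes instead of six, with the first chunk partly overwritten), while the local `O_APPEND` writes leave `b"AAAABB"`.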

Two test classes:
  - `TestLegacyModelCorruptsUnderStaleStatCache`: proves the bug exists.
    All concurrent workers pass the BA-3678 offset guard (because they
    all observe size = 0) and clobber each other at position 0. The
    on-disk file is shorter than the expected concatenation and does not
    match the source payload.
  - `TestNewModelSurvivesSameChaos`: same workload through
    `TusUploadSession` produces a byte-perfect result, including under
    3x duplicate retries per chunk fired concurrently.
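
To illustrate why a chunk-based store can tolerate duplicate retries, here is a minimal sketch under stated assumptions. It is not `TusUploadSession`'s actual implementation, whose API is not shown here. The functions `store_chunk`, `assemble`, and `run_retry_storm` are hypothetical, and the layout (one file per chunk, named by its offset) is an assumption made for the example.

```python
import os
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def store_chunk(chunk_dir: Path, offset: int, data: bytes) -> None:
    """Store one chunk under a name derived from its offset.

    Repeating the call with the same (offset, data) leaves the same result,
    so a duplicate retry cannot shift or overwrite other chunks.
    """
    final = chunk_dir / f"{offset:020d}"
    tmp = chunk_dir / f".{offset:020d}.{uuid.uuid4().hex}.tmp"
    tmp.write_bytes(data)
    os.replace(tmp, final)  # rename into place; no shared write position


def assemble(chunk_dir: Path) -> bytes:
    """Concatenate the stored chunks in offset order."""
    parts = sorted(p for p in chunk_dir.iterdir() if not p.name.startswith("."))
    return b"".join(p.read_bytes() for p in parts)


def run_retry_storm(payload: bytes, chunk_size: int = 4, retries: int = 3) -> bytes:
    """Upload every chunk `retries` times concurrently, then assemble."""
    with tempfile.TemporaryDirectory() as d:
        chunk_dir = Path(d)
        jobs = [
            (off, payload[off:off + chunk_size])
            for off in range(0, len(payload), chunk_size)
        ] * retries
        with ThreadPoolExecutor(max_workers=8) as pool:
            list(pool.map(lambda job: store_chunk(chunk_dir, *job), jobs))
        return assemble(chunk_dir)
```

Because no writer depends on the observed file size, a stale `st_size` has nothing to corrupt in this sketch: `run_retry_storm(payload)` returns `payload` unchanged even when each chunk is written three times.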

This serves as objective regression evidence: a future reader can run this
test to confirm both that the legacy model was broken and that the new
model fixes it.

Resolves BA-6157. Part of epic BA-6153 (implements BA-3974).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>