feat(auth): remove navigator.locks-based mutex; introduce commit guard + dispose()#2392
Open
Bewinxed wants to merge 7 commits into
Open
feat(auth): remove navigator.locks-based mutex; introduce commit guard + dispose()#2392Bewinxed wants to merge 7 commits into
Bewinxed wants to merge 7 commits into
Conversation
6 tasks
0b30f2d to
36261d5
Compare
36261d5 to
026e1ee
Compare
@supabase/auth-js
@supabase/functions-js
@supabase/postgrest-js
@supabase/realtime-js
@supabase/storage-js
@supabase/supabase-js
commit: |
…guard + dispose() Removes the `_acquireLock` mutex that wrapped every auth operation and the underlying `navigator.locks` / `processLock` machinery. Replaces it with two lighter primitives that target the specific synchronization needs each operation actually has: - `refreshingDeferred` (already existed) continues to single-flight the refresh path within an instance. - A storage-level commit guard in `_callRefreshToken` re-reads the storage refresh_token between the rotated-tokens response and `_saveSession`. If storage changed under us (e.g. a concurrent `signOut` ran `_removeSession`), the rotated tokens are discarded rather than written back. Returns `AuthRefreshDiscardedError` on the result. Cross-tab refresh races are handled server-side by GoTrue's v1 parent-of-active mechanism at `internal/tokens/service.go:376-385`, so no client-side coordination is needed. New `client.dispose()` tears down the auto-refresh interval, the `visibilitychange` listener, and the BroadcastChannel; clears registered `onAuthStateChange` subscribers. Idempotent. Call from cleanup hooks in React Strict Mode / HMR contexts to prevent stale tickers from outliving the client. The `lock` and `lockAcquireTimeout` constructor options are silently ignored for backwards compatibility; both are marked `@deprecated`. `navigatorLock`, `processLock`, and the `LockAcquireTimeoutError` family remain exported from `./lib/locks` for one major version. Stale lock references in JSDoc on `getSession`, `onAuthStateChange`, `_challengeAndVerify`, and `_listFactors` updated to match the new model. Test branch only, not for merge yet. See RFC `lockless_auth_coordination`. Resolves (test target): supabase#2013, supabase#936, supabase#2111
…ispose() _useSession: explain that concurrent callers can both reach `__loadSession` because storage reads are idempotent and the only write path (refresh) is single-flighted downstream by `refreshingDeferred` in `_callRefreshToken`. No serialization is needed at this layer. dispose(): add a lifecycle caveat clarifying that in-flight refreshes are not aborted, so a disposed instance can still persist a rotated session to storage after `dispose()` returns. A subsequent `createClient` against the same `storageKey` will pick that session up. Notes the mitigation (await pending ops before dispose, or use a fresh `storageKey`). Doc-only changes; no runtime behaviour change.
The two tests at `SupabaseAuthClient.test.ts:61-77` were asserting that `lockAcquireTimeout` is stored as a runtime field on the auth client (`expect((authClient as any).lockAcquireTimeout).toBe(30_000)`). That field no longer exists after the lockless refactor — the option is accepted by the type for backwards compatibility but is silently ignored at runtime because the client doesn't acquire a lock around auth operations. Rewrite both tests to verify the new contract: - `_initSupabaseAuthClient` accepts the option without throwing. - `createClient` accepts it through `auth.lockAcquireTimeout`, the auth client is still constructed, but the value is not stored as a runtime field (`toBeUndefined()`). Fixes the CI failure in `Supabase-JS Integration CI / Unit + Type Check` (2 failed, 102 passed → now 104 passed) on all three platforms.
265c2a9 to
3ca0099
Compare
Contributor
|
Do not merge until we ask users to test, and dogfood. |
This was referenced May 26, 2026
Open
This was referenced May 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
What changed?
GoTrueClientnow defaults to lockless coordination. Thenavigator.locks-based mutex no longer runs by default — most users get the lockless path. Callers who explicitly pass a customlock(typically React NativeprocessLockor Node multi-process setups) keep the old behavior on an opt-in legacy path so the change is backwards-compatible.The lockless default uses:
refreshingDeferred(already existed) to single-flight in-instance concurrent refreshes._callRefreshToken: (1) snapshots storage before the rotated-token fetch and re-reads after, discarding rotated tokens if a non-null snapshot was cleared between the two reads (typical case: a concurrentsignOutran_removeSession); (2) captures a session-removal epoch counter before_saveSessionand re-checks after, so asignOutthat interleaves inside_saveSession's storage-write awaits is also caught. Either leg returns{ data: null, error: new AuthRefreshDiscardedError() }._recoverAndRefreshrecognises this error and skips its own_removeSession()call so no duplicateSIGNED_OUTevent fires.AuthRefreshDiscardedError(with anisAuthRefreshDiscardedErrortype guard), is what the commit guard returns when it throws away a successfully-rotated token. Distinct fromAuthRetryableFetchError(transient network) andAuthApiError(server rejection). Surfaces throughrefreshSession()andgetSession()results.client.auth.dispose()tears down the auto-refresh interval, thevisibilitychangelistener, theBroadcastChannel, and registeredonAuthStateChangesubscribers. Idempotent. Designed for React Strict Mode and HMR cleanup hooks. In-flightfetchcalls are not aborted — they run to completion.internal/tokens/service.go:376-385, so the client doesn't need to coordinate. The current default is v1 (RefreshTokenAlgorithmVersion = 0); v2 (counter-based) is opt-in viaRefreshTokenUpgradePercentageand gated rollout. This PR's reasoning relies on v1 behaviour, which is what customers actually run today.The legacy opt-in path (
settings.lock != null):_acquireLock,pendingInLock,lockAcquired,lockAcquireTimeout, and all 14 call-site wrappers are preserved verbatim frommasterand gated onthis.lock != null. The default path is untouched by them.// TODO(v3): remove legacy lock pathso the eventual v3 cleanup is a mechanical search-and-delete. A separate Linear ticket on the supabase-js v3 project tracks that work.lockandlockAcquireTimeoutconstructor options are accepted and honored when supplied; both are@deprecatedin JSDoc with "Custom locks still work in v2.x for backwards compatibility. Will be removed in v3."navigatorLock,processLock,LockAcquireTimeoutError,NavigatorLockAcquireTimeoutError,ProcessLockAcquireTimeoutError, andinternalsinlib/locks.tsremain exported. Still@deprecated.Stale JSDoc referencing the lock on
getSession,onAuthStateChange,_useSession,_challengeAndVerify, and_listFactorsnow matches the new default behaviour. The@deprecatedtag on the asynconAuthStateChangeoverload is kept, with its reason updated to point at the one residual reentry hazard (refreshSessioncalled from inside aTOKEN_REFRESHEDhandler —refreshingDeferred.resolvehappens after_notifyAllSubscribersreturns).Why was this change needed?
The shared mutex has caused seven failure classes on the issue tracker for 18+ months. Each earlier patch fixed one symptom and created another:
#2235 (primitive swap to
processLock) was an earlier attempt at moving offnavigator.locksand is no longer being actively pursued. Community PRs #2016 and #2019 fixed individual symptoms and were closed waiting on a structural fix.Failure classes resolved by this PR's default (lockless) path:
INITIAL_SESSION/getSession()hangs with a persisted session (#936 @bugprone). The default path no longer touchesnavigator.locks.AbortError: Lock broken by another request with the 'steal' option(#2013 @cpannwitz). The{ steal: true }fallback that produced these errors is bypassed on the default path.GoTrueClient.ts:3824). The cycle needed the lock; a callback that awaitsgetUser()now just runs on the default path.dispose()cleans them up.signOutblocked behind in-flight refresh.signOutnow runs concurrently with the refresh on the default path, and the two-leg commit guard prevents the refresh from writing rotated tokens after_removeSession()cleared storage — whether the clear happened before the post-fetch snapshot read or inside_saveSession's storage writes.Callers on the legacy opt-in path (explicit
settings.lock) keep the old serialization semantics and the failure modes that come with them. They accepted those when they opted in; they can drop the option to migrate to the lockless default at any time.Why default lockless: cross-tab races are handled on the server (GoTrue's parent-of-active), in-instance refresh dedup is already handled by
refreshingDeferred, and the only job the lock did beyond that was serializing subscriber callbacks — which is the deadlock we're fixing.Closes (test target): #2013, #936, #2111
Examples
dispose()in a React appSubscriber callbacks can call other auth methods now (default path)
One residual hazard remains: calling
refreshSessionfrom inside aTOKEN_REFRESHEDhandler still deadlocks viarefreshingDeferred. The async overload ofonAuthStateChangekeeps its@deprecatedmarker for that reason.AuthRefreshDiscardedErrorfor the signOut-during-refresh raceLegacy lock opt-in (existing custom-lock users)
Breaking changes
The earlier draft of this PR silently dropped custom
lockfunctions, which was behaviorally breaking for opt-in users. The current design preserves that behavior on a gated legacy path so the change ships as a v2 minor. Custom-lock users have time to migrate before v3 removes the option entirely.Behaviour changes worth flagging for review
lockoption no longer acquiresnavigator.locksor any in-process lock. For most users this is invisible (the lock was internal coordination). For users who depended on observing the lock indirectly (rare), the change is observable._autoRefreshTokenTickmay now run concurrently withsignOut/setSession/getUseron the default path. Previously_acquireLock(0, ...)made the tick skip whenever any auth op held the lock. The lockless equivalent only skips whenrefreshingDeferredis set. The commit guard keeps storage consistent under the new concurrency. The legacy lock opt-in path retains the old skip-on-any-lock behavior.signOutno longer waits for an in-flight refresh's HTTP and continuation to finish before its own fetch goes out. Both fetches now run concurrently, and the commit guard keeps storage consistent.Checklist
<type>(<scope>): <description>npx nx formatto ensure consistent code formattingdispose(), subscriber re-entry no-deadlock, default vs. legacy path)Additional notes
Alternatives considered
Continuing to patch
navigator.locks(better steal recovery, lower timeouts, smarter error filtering). Each prior patch in this direction (#1962, #2106, #2178) fixed one symptom and produced another. The Web Locks API has no recovery for orphaned holders other than{ steal: true }, which leaves the previousfn()running concurrently with the new holder and creates the steal-cascade error storm. Each fix here swaps one failure for another instead of removing the contention that causes them.Swapping
navigatorLockforprocessLockas the default browser lock (the direction #2235 explored). Removes the cross-process orphan failure but keeps the same shape of the bug: one primitive serializing operations that don't all need the same synchronization.Non-blocking subscriber notifications alone (#2016). Fixes subscriber re-entry. The other six failure classes still bite.
Removing the lock entirely (no legacy opt-in). The earlier draft of this PR did this. It silently dropped custom locks supplied via
settings.lock, which broke React NativeprocessLockand Node multi-process setups. The current design preserves those callers on a gated legacy path so the change is non-breaking; the legacy path is slated for removal in v3.Adding an
AbortControllerlayer for in-flight operation cancellation. Deferred to a follow-up, not included here. The commit guard catches the races that affect correctness (signOut overwriting cleared storage with rotated tokens, in both the fetch and storage-write phases). AbortController would be a UX improvement on top: cancel the in-flight refresh as soon as signOut runs, instead of letting it finish and discarding the result. Out of scope for this PR.Deferring the commit guard and accepting eventual consistency. Considered and rejected. Without a guard, a refresh that completes after
_removeSession()cleared storage will write the rotated tokens back. Subscribers then seeSIGNED_OUTfollowed byTOKEN_REFRESHED, with stale tokens in storage until the next refresh tick (~30s) fails against the server and clears them. The guard is about 30 lines (both legs combined). Better to land it with this PR than to ship the bug and patch it later.Server-side context
Why cross-tab is safe on the default path: GoTrue's parent-of-active path at
internal/tokens/service.go:376-385(the v1 branch,*models.RefreshToken). When a request arrives with a revoked refresh token whose child is the currently-active token, the server returns the active token instead of rejecting. Two tabs that POST the same refresh token concurrently both receive the same rotated token under DB row locking. This is the production-default behaviour (RefreshTokenAlgorithmVersion = 0, v1). v2 (counter-based, gated onRefreshTokenUpgradePercentage) is safe under the same N-tab concurrency, covered byTestConcurrentReuse.Test coverage
Updated tests in
packages/core/auth-js/test/GoTrueClient.test.ts:'Lockless coordination (default) and legacy lock opt-in'describe:lockoption, assert_acquireLockis never invokedlock, assertlockIS invoked (.toHaveBeenCalled())lockAcquireTimeoutaccepted on both paths (lockless ignores it at runtime; legacy uses it)onAuthStateChangewithout deadlock'dispose() lifecycle'describe: idempotency, subscriber clearing, ticker stopping.'Refresh commit guard (signOut-during-refresh race)'describe (4 tests): cleared-mid-flight discard, empty-storage acceptance, different-session acceptance, different-session-written-mid-flight discard.Updated tests in
packages/core/auth-js/test/GoTrueClient.browser.test.ts:'Lockless coordination: navigator.locks should NOT be invoked': default path no longer touches the browser API.'Legacy lock opt-in: customlockfunction is invoked when supplied': custom lock IS called on the legacy path.Updated tests in
packages/core/supabase-js/test/unit/SupabaseAuthClient.test.ts:'should pass through lockAcquireTimeout option'and'should accept auth.lockAcquireTimeout and wire it to auth client': assert the option flows fromcreateClientthrough to the auth client instance (via a targetedas unknown as { lockAcquireTimeout: number }cast). Comment in-source explains the cast is greppable for the v3 cleanup.AI review responses
Addresses inline AI review feedback on this PR:
_saveSession(MEDIUM): closed via a new_sessionRemovalEpochcounter, incremented synchronously at the top of_removeSessionbefore anyawait, captured in_callRefreshTokenimmediately before_saveSession, and re-checked after. If the epoch advances during the save, the rotated tokens are undone directly withremoveItemAsync(not via_removeSession, which would emit a duplicateSIGNED_OUT).substring(0, 5)ofrefresh_tokenfrom all four_debug()paths (_callRefreshToken/_refreshAccessTokendebugNameconstruction, plus the two fields in the commit-guard discard log). Discard logs now use presence indicators ('present'/'replaced'/'cleared') instead of credential fragments.Open questions for review
AbortSignal.anyplumbing inlib/fetch.tsisn't done here (no AbortController layer to plumb yet). If we add the AbortController follow-up later, this becomes a prerequisite. Just so it's on the radar.@supabase/supabase-jstypes deprecation:lockandlockAcquireTimeoutare@deprecatedin the supabase-js types file too, since they delegate to the auth client. Confirm that's the right surface to deprecate.v3 cleanup
A separate Linear ticket on the SDK team /
supabase-js v3project tracks the mechanical removal of the legacy lock path in v3. Every gated site inGoTrueClient.tshas a// TODO(v3): remove legacy lock pathmarker; the ticket lists the full file-by-file scope.Status
Ready for review. Replaces #2387 (which was opened against
developbefore the v3 branching update).