Skip to content

Fix "draft not found" error on send: retry DB query and clean up sending state#2727

Open
bengotow wants to merge 2 commits into
masterfrom
claude/awesome-ritchie-AwTQG
Open

Fix "draft not found" error on send: retry DB query and clean up sending state#2727
bengotow wants to merge 2 commits into
masterfrom
claude/awesome-ritchie-AwTQG

Conversation

@bengotow
Copy link
Copy Markdown
Collaborator

What I observed in Sentry

Investigating MAILSPRING-CLIENT-6 and MAILSPRING-CLIENT-T: 155 + 53 = 208 users are hitting "Could not find draft after finalizing session for sending." consistently across Windows and macOS, with some users hitting it repeatedly (one user triggered it 7+ times in a single day).

The Sentry stack trace confirms the error always fires from line 560 of draft-store.ts — the second DatabaseStore.findBy null-check, after session.changes.commit() and session.teardown() have both completed. The first null-check (line 535, session.draft()) never fires, which means the draft IS in memory — it just isn't found in SQLite at the time of the post-commit query.

Root cause

session.changes.commit() calls changeSetCommit() in DraftEditingSession, which queues a SyncbackDraftTask and then awaits TaskQueue.waitForPerformLocal(task). That promise resolves as soon as the task's status changes from 'local' to anything else — including 'cancelled' — because hasRunLocally() returns true for any non-'local' status.

This creates two failure paths:

  1. Cancelled task: If the SyncbackDraftTask is cancelled before its local phase runs (e.g. a concurrent destroyDraft call, a window close, or the sync engine rejecting the task), waitForPerformLocal resolves immediately with no draft written to SQLite.

  2. Sync engine transaction ordering race: The C++ sync engine may commit the task-status update (local → remote) in a separate SQLite transaction from the draft row write. If Electron's waitForPerformLocal resolves based on the task-status transaction, the subsequent DatabaseStore.findBy can run before the draft transaction commits — returning null even though the write is imminent.

Either way DatabaseStore.findBy({ headerMessageId, draft: true }) returns null, the error dialog fires, and — as a secondary bug — _draftsSending[headerMessageId] is never cleared, leaving the composer stuck in a permanent "sending…" state.

There is also a third bug I spotted while reading: _waitingForLocal.filter(...) and _waitingForRemote.filter(...) in TaskQueue call .filter() purely for its side-effects but discard the return value, so resolved entries are never removed and both arrays grow without bound (memory leak).

Fix

Three targeted changes:

  1. Retry the post-commit database query once with a 150ms delay (draft-store.ts). This is long enough to absorb the transaction-ordering race (the _onQueueChangedDebounced throttle is itself 150 ms) while being imperceptible to the user on the happy path.

  2. Reset _draftsSending and trigger on both error paths (draft-store.ts) so the UI is never left in a stuck "sending…" state after this error.

  3. Assign the .filter() result back in TaskQueue._onQueueChangedDebounced (task-queue.ts) so _waitingForLocal / _waitingForRemote shrink as entries are resolved.

Test plan

  • Compose a new draft and send it normally — verify send succeeds without delay
  • Compose with Undo Send delay enabled — verify the undo-send flow still works
  • Quick-reply send — verify no regression
  • If possible, simulate a cancelled SyncbackDraftTask (e.g. rapid send + delete) and verify the error dialog appears once and the composer is no longer stuck in "sending…" afterwards

Fixes MAILSPRING-CLIENT-6, MAILSPRING-CLIENT-T

https://claude.ai/code/session_01M2WtCF7rb3ijmYbMYDaYWR


Generated by Claude Code

After session.changes.commit() calls waitForPerformLocal(), the resolved
promise indicates the SyncbackDraftTask status changed to 'remote' or
'cancelled'. However, 'cancelled' status also satisfies hasRunLocally(),
meaning waitForPerformLocal can resolve even when the draft was never
written to SQLite (e.g. if the task was cancelled by a concurrent destroy,
or due to a race where the C++ sync engine commits the task status in a
separate transaction before writing the draft row).

This caused DatabaseStore.findBy({ headerMessageId, draft: true }) to return
null for 155+ users, showing an error dialog and leaving the composer stuck
in a "sending..." state indefinitely.

Three fixes:
1. Retry the post-commit database query once with a 150ms delay, which
   handles the race condition where the task status commits slightly before
   the draft row is written to SQLite.
2. Reset _draftsSending[headerMessageId] and trigger() at both error call
   sites so the UI does not remain stuck in "sending" state after the error.
3. Fix _waitingForLocal / _waitingForRemote memory leak in TaskQueue:
   Array.filter() was called for its side-effects (calling resolve()) but
   the result was discarded, so the arrays grew without bound. Assign the
   filtered result back so resolved entries are removed.

Fixes MAILSPRING-CLIENT-6, MAILSPRING-CLIENT-T

https://claude.ai/code/session_01M2WtCF7rb3ijmYbMYDaYWR
@indent-staging
Copy link
Copy Markdown
Contributor

indent-staging Bot commented May 22, 2026

PR Summary

Investigates and partially fixes a production bug where ~155 users got stuck with a "Sending..." composer state after the post-commit draft lookup returned null. The root cause is that Task.hasRunLocally() returns true for cancelled tasks too, so waitForPerformLocal can resolve before the C++ sync engine writes the draft row. After feedback, the speculative 150ms retry was dropped in favor of richer Sentry instrumentation so the next occurrence can be diagnosed.

  • draft-store.ts: reset _draftsSending[headerMessageId] = false and trigger() on both error branches so the composer no longer remains stuck in "sending..."
  • draft-store.ts: collect a diagnostics object throughout _onSendDraft (session existence, draft id/account before+after ensureCorrectAccount, dirty fields before+after commit, commitPromiseInFlight, failedAt) and attach it to the Error reported via AppEnv.reportError
  • draft-store.ts: _onUnexpectedNotFoundDuringSend(headerMessageId?, diagnostics?) signature change to pass that context through
  • task-queue.ts: assign the .filter() result back to _waitingForLocal / _waitingForRemote to actually drop resolved waiters (was a memory leak — .filter() was being used purely for the side effect of resolve())

Issues

3 potential issues found:

  • Diagnostics are attached to the Error via Object.assign(err as any, diagnostics), but AppEnv.reportError(error, extra) already accepts a dedicated extra parameter for Sentry context (used at app-env.ts:197 with { url, line, column }). Passing diagnostics through extra would drop both as any casts and match the existing pattern. → Autofix
  • Error path resets _draftsSending but never calls _doneWithSession, so the torn-down session entry remains in _draftSessions; if the user retries Send on the same composer it reuses the dead session and gets stuck again. Pre-existing, but the PR's stated goal is to unstick the composer. → Autofix
  • commitPromiseInFlight = !!(session.changes as any)._commitPromise reaches into the underscore-prefixed private field of DraftChangeSet. If _commitPromise is ever renamed/refactored, this diagnostic silently becomes a permanent false with no compiler warning. → Autofix

CI Checks

All CI checks pass on commit 0568f7d7.

Custom Rules 3 rules evaluated, 3 passed, 0 failed

Passing This is a longer title to see what happens when they are too long to fit
Passing B
Passing Ben Rule

View all rules


⚡ Autofix All Issues

Comment thread app/src/flux/stores/draft-store.ts Outdated
if (!draft) {
this._draftsSending[headerMessageId] = false;
this.trigger({ headerMessageId });
return this._onUnexpectedNotFoundDuringSend(headerMessageId);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After session.teardown() at line 554, this error path resets _draftsSending[headerMessageId] but never removes the entry from _draftSessions. The success path at line 584 calls this._doneWithSession(session) which both tears down and deletes the map entry. Here the session is torn down but the map entry survives, so a follow-up Actions.sendDraft(headerMessageId) or any other call to DraftStore.sessionForClientId(headerMessageId) will resolve to this dead session and await prepare() on it. The PR's headline fix (un-stuck composer) is only partial unless _doneWithSession(session) is called here too — or you swap session.teardown() on line 554 for this._doneWithSession(session) so it's handled centrally for both success and failure paths. (Trigger: user retries Send from the same composer after seeing the not-found dialog.)

@bengotow
Copy link
Copy Markdown
Collaborator Author

I like the ancillary fixes that you've put in place for the task queue logic not filtering correctly and the need to reset draft sending. However, I'm pretty skeptical of the 150 ms delay having any impact. I've confirmed the task status transaction happens after the draft is written, and there is no condition in which this moves into a cancelled state in performLocal.

One scenario I want to explore is whether there is an edge case in ensureCorrectAccount that could potentially cause the draft header message ID to change or for the DestroyDraftTask to destroy the new draft, not the old draft. If it's helpful, you should reference the Mailspring sync code here to make sure you have a complete picture: https://github.com/Foundry376/Mailspring-Sync/blob/master/MailSync/TaskProcessor.cpp#L862

If nothing else, maybe we leave the core code alone in this release, and instead add a bit more metadata to the Sentry traces to let us know if the ensureCorrectAccount code (or other branches) were run to narrow the problem.

…d error

Instead of a speculative retry delay (which has no proven impact), collect
contextual diagnostics throughout _onSendDraft and attach them to the error
reported to Sentry when a draft cannot be found:

- sessionExisted: was a DraftEditingSession already open before send?
- draftId/AccountId before and after ensureCorrectAccount
- ensureCorrectAccountChangedAccount: did account-switching run?
- dirtyFields before/after commit, commitPromiseInFlight
- failedAt: which null-check triggered (post-ensureCorrectAccount vs post-commit)

This will let us pinpoint the failing code path in the next occurrence without
adding overhead to the happy path.

Also updates _onUnexpectedNotFoundDuringSend signature to accept diagnostics
and attaches them to the Error object before AppEnv.reportError().

https://claude.ai/code/session_01M2WtCF7rb3ijmYbMYDaYWR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants