Fix "draft not found" error on send: retry DB query and clean up sending state #2727
bengotow wants to merge 2 commits into
After session.changes.commit() calls waitForPerformLocal(), the resolved
promise indicates the SyncbackDraftTask status changed to 'remote' or
'cancelled'. However, 'cancelled' status also satisfies hasRunLocally(),
meaning waitForPerformLocal can resolve even when the draft was never
written to SQLite (e.g. if the task was cancelled by a concurrent destroy,
or due to a race where the C++ sync engine commits the task status in a
separate transaction before writing the draft row).
This caused DatabaseStore.findBy({ headerMessageId, draft: true }) to return
null for 155+ users, showing an error dialog and leaving the composer stuck
in a "sending..." state indefinitely.
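A minimal TypeScript sketch of the status check described above, assuming a simplified task model. The TaskStatus union, the Task interface, and the stricter ranLocallyWithoutCancel helper are hypothetical illustrations, not Mailspring's actual Task API. It shows why a cancelled task passes the same hasRunLocally() check as one whose local phase completed.

```typescript
// Hypothetical, simplified status model; the real Task class carries more state.
type TaskStatus = 'local' | 'remote' | 'complete' | 'cancelled';

interface Task {
  id: string;
  status: TaskStatus;
}

// As described in this PR: any status other than 'local' counts as
// "has run locally", so 'cancelled' passes too.
function hasRunLocally(task: Task): boolean {
  return task.status !== 'local';
}

// Hypothetical stricter check: the task left 'local' without being cancelled.
function ranLocallyWithoutCancel(task: Task): boolean {
  return hasRunLocally(task) && task.status !== 'cancelled';
}

const cancelledTask: Task = { id: 'syncback-1', status: 'cancelled' };
console.log(hasRunLocally(cancelledTask)); // true
console.log(ranLocallyWithoutCancel(cancelledTask)); // false
```

Even the stricter check only says the task was not cancelled; on its own it does not prove the draft row is already visible in SQLite.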
Three fixes:
1. Retry the post-commit database query once with a 150ms delay, which
handles the race condition where the task status commits slightly before
the draft row is written to SQLite.
2. Reset _draftsSending[headerMessageId] and trigger() at both error call
sites so the UI does not remain stuck in "sending" state after the error.
3. Fix _waitingForLocal / _waitingForRemote memory leak in TaskQueue:
Array.filter() was called for its side-effects (calling resolve()) but
the result was discarded, so the arrays grew without bound. Assign the
filtered result back so resolved entries are removed.
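The TaskQueue leak in item 3 can be reproduced with a small, self-contained model. The Waiter shape, the MiniTaskQueue class, and its onQueueChanged methods are hypothetical stand-ins for _waitingForLocal and _onQueueChangedDebounced; only the assign-back pattern mirrors the change described here.

```typescript
type TaskStatus = 'local' | 'remote' | 'complete' | 'cancelled';

// Hypothetical waiter entry; real TaskQueue entries may store more fields.
interface Waiter {
  taskId: string;
  resolve: () => void;
}

class MiniTaskQueue {
  waitingForLocal: Waiter[] = [];

  waitForPerformLocal(taskId: string): Promise<void> {
    return new Promise<void>(resolve => {
      this.waitingForLocal.push({ taskId, resolve: () => resolve() });
    });
  }

  // Resolves a waiter whose task has left 'local'; returns true if it is still pending.
  private stillPending(w: Waiter, statuses: Map<string, TaskStatus>): boolean {
    const status = statuses.get(w.taskId);
    if (status !== undefined && status !== 'local') {
      w.resolve();
      return false;
    }
    return true;
  }

  // Buggy form: filter() runs only for its side effect (resolve()), and the
  // filtered array is discarded, so waitingForLocal never shrinks.
  onQueueChangedBuggy(statuses: Map<string, TaskStatus>): void {
    this.waitingForLocal.filter(w => this.stillPending(w, statuses));
  }

  // Fixed form: assign the filtered array back so resolved waiters are dropped.
  onQueueChanged(statuses: Map<string, TaskStatus>): void {
    this.waitingForLocal = this.waitingForLocal.filter(w => this.stillPending(w, statuses));
  }
}
```

With two waiters ('a' now 'remote', 'b' still 'local'), the buggy form resolves 'a' but leaves both entries in the array; the fixed form leaves only the waiter for 'b'. Calling resolve() again on an already-resolved promise is a no-op, so the leak costs memory rather than correctness.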
Fixes MAILSPRING-CLIENT-6, MAILSPRING-CLIENT-T
https://claude.ai/code/session_01M2WtCF7rb3ijmYbMYDaYWR
    if (!draft) {
      this._draftsSending[headerMessageId] = false;
      this.trigger({ headerMessageId });
      return this._onUnexpectedNotFoundDuringSend(headerMessageId);
    }
After session.teardown() at line 554, this error path resets _draftsSending[headerMessageId] but never removes the entry from _draftSessions. The success path at line 584 calls this._doneWithSession(session), which both tears down the session and deletes the map entry. Here the session is torn down but the map entry survives, so a follow-up Actions.sendDraft(headerMessageId), or any other call to DraftStore.sessionForClientId(headerMessageId), will resolve to this dead session and await prepare() on it. The PR's headline fix (unsticking the composer) is only partial unless _doneWithSession(session) is called here too, or you swap session.teardown() on line 554 for this._doneWithSession(session) so both the success and failure paths are handled centrally. (Trigger: user retries Send from the same composer after seeing the not-found dialog.)
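A minimal sketch of the reviewer's suggestion, assuming heavily simplified stand-ins: Session, MiniDraftStore, and onDraftNotFound are hypothetical, and only _draftSessions, _draftsSending, sessionForClientId, and _doneWithSession echo names from the comment. Routing the failure path through _doneWithSession both tears down the session and evicts it, so a retried Send gets a fresh session rather than the dead one.

```typescript
// Hypothetical stand-in for DraftEditingSession.
class Session {
  readonly headerMessageId: string;
  tornDown = false;

  constructor(headerMessageId: string) {
    this.headerMessageId = headerMessageId;
  }

  teardown(): void {
    this.tornDown = true;
  }
}

// Hypothetical, heavily reduced DraftStore.
class MiniDraftStore {
  _draftSessions: Record<string, Session> = {};
  _draftsSending: Record<string, boolean> = {};

  // Returns the cached session, creating one only if none is cached.
  sessionForClientId(headerMessageId: string): Session {
    if (!this._draftSessions[headerMessageId]) {
      this._draftSessions[headerMessageId] = new Session(headerMessageId);
    }
    return this._draftSessions[headerMessageId];
  }

  // Tears the session down AND removes its map entry.
  _doneWithSession(session: Session): void {
    session.teardown();
    delete this._draftSessions[session.headerMessageId];
  }

  // Failure path per the review: clear the sending flag, then use
  // _doneWithSession instead of a bare session.teardown().
  onDraftNotFound(session: Session): void {
    this._draftsSending[session.headerMessageId] = false;
    this._doneWithSession(session);
  }
}
```

With a bare session.teardown() in onDraftNotFound instead, the next sessionForClientId call would hand back the same torn-down object.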
I like the ancillary fixes you've put in place for the task queue logic not filtering correctly and the need to reset draft sending. However, I'm pretty skeptical that the 150 ms delay has any impact. I've confirmed the task status transaction happens after the draft is written, and there is no condition in which this moves into a cancelled state in performLocal. One scenario I want to explore is whether there is an edge case in … If nothing else, maybe we leave the core code alone in this release and instead add a bit more metadata to the Sentry traces, so we know whether the ensureCorrectAccount code (or other branches) ran and can narrow down the problem.
…d error
Instead of a speculative retry delay (which has no proven impact), collect contextual diagnostics throughout _onSendDraft and attach them to the error reported to Sentry when a draft cannot be found:
- sessionExisted: was a DraftEditingSession already open before send?
- draftId/AccountId before and after ensureCorrectAccount
- ensureCorrectAccountChangedAccount: did account-switching run?
- dirtyFields before/after commit, commitPromiseInFlight
- failedAt: which null-check triggered (post-ensureCorrectAccount vs post-commit)
This will let us pinpoint the failing code path in the next occurrence without adding overhead to the happy path.
Also updates the _onUnexpectedNotFoundDuringSend signature to accept diagnostics and attaches them to the Error object before AppEnv.reportError().
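A rough sketch of how such diagnostics might be attached, assuming the fields listed above map to a flat object. The SendDiagnostics interface and its exact property names, the reportError stand-in for AppEnv.reportError(), and this onUnexpectedNotFoundDuringSend function are hypothetical illustrations, not the code in this commit. Whether the extra Error properties actually reach Sentry depends on how the real reporter serializes them.

```typescript
// Hypothetical shape based on the commit message's field list.
interface SendDiagnostics {
  sessionExisted: boolean;
  draftIdBefore?: string;
  accountIdBefore?: string;
  draftIdAfter?: string;
  accountIdAfter?: string;
  ensureCorrectAccountChangedAccount?: boolean;
  dirtyFieldsBeforeCommit?: string[];
  dirtyFieldsAfterCommit?: string[];
  commitPromiseInFlight?: boolean;
  failedAt?: 'post-ensureCorrectAccount' | 'post-commit';
}

// Stand-in for AppEnv.reportError(): records errors instead of sending them.
const reportedErrors: Error[] = [];
function reportError(error: Error): void {
  reportedErrors.push(error);
}

// Builds the not-found error, attaches the diagnostics as extra properties,
// and reports it. Returns the error so callers (and tests) can inspect it.
function onUnexpectedNotFoundDuringSend(headerMessageId: string, diagnostics: SendDiagnostics) {
  const error = Object.assign(
    new Error('Could not find draft after finalizing session for sending.'),
    { headerMessageId, diagnostics }
  );
  reportError(error);
  return error;
}

// Example: diagnostics filled in step by step during a send attempt.
const sendDiagnostics: SendDiagnostics = { sessionExisted: true };
sendDiagnostics.ensureCorrectAccountChangedAccount = false;
sendDiagnostics.failedAt = 'post-commit';
const err = onUnexpectedNotFoundDuringSend('msg-1', sendDiagnostics);
```

Collecting into a single mutable object keeps the happy path cheap: the fields are plain assignments, and nothing is serialized unless the not-found branch actually runs.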
What I observed in Sentry
Investigating MAILSPRING-CLIENT-6 and MAILSPRING-CLIENT-T: 155 + 53 = 208 users are hitting "Could not find draft after finalizing session for sending." consistently across Windows and macOS, with some users hitting it repeatedly (one user triggered it 7+ times in a single day).
The Sentry stack trace confirms the error always fires from line 560 of draft-store.ts: the second DatabaseStore.findBy null-check, after session.changes.commit() and session.teardown() have both completed. The first null-check (line 535, session.draft()) never fires, which means the draft IS in memory; it just isn't found in SQLite at the time of the post-commit query.
Root cause
session.changes.commit() calls changeSetCommit() in DraftEditingSession, which queues a SyncbackDraftTask and then awaits TaskQueue.waitForPerformLocal(task). That promise resolves as soon as the task's status changes from 'local' to anything else, including 'cancelled', because hasRunLocally() returns true for any non-'local' status.
This creates two failure paths:
1. Cancelled task: If the SyncbackDraftTask is cancelled before its local phase runs (e.g. a concurrent destroyDraft call, a window close, or the sync engine rejecting the task), waitForPerformLocal resolves immediately with no draft written to SQLite.
2. Sync engine transaction ordering race: The C++ sync engine may commit the task-status update (local → remote) in a separate SQLite transaction from the draft row write. If Electron's waitForPerformLocal resolves based on the task-status transaction, the subsequent DatabaseStore.findBy can run before the draft transaction commits, returning null even though the write is imminent.
Either way, DatabaseStore.findBy({ headerMessageId, draft: true }) returns null, the error dialog fires, and, as a secondary bug, _draftsSending[headerMessageId] is never cleared, leaving the composer stuck in a permanent "sending…" state.
There is also a third bug I spotted while reading: _waitingForLocal.filter(...) and _waitingForRemote.filter(...) in TaskQueue call .filter() purely for its side effects but discard the return value, so resolved entries are never removed and both arrays grow without bound (a memory leak).
Fix
Three targeted changes:
1. Retry the post-commit database query once with a 150ms delay (draft-store.ts). This is long enough to absorb the transaction-ordering race (the _onQueueChangedDebounced throttle is itself 150 ms) while being imperceptible to the user on the happy path.
2. Reset _draftsSending and trigger on both error paths (draft-store.ts) so the UI is never left in a stuck "sending…" state after this error.
3. Assign the .filter() result back in TaskQueue._onQueueChangedDebounced (task-queue.ts) so _waitingForLocal / _waitingForRemote shrink as entries are resolved.
Test plan
- … SyncbackDraftTask (e.g. rapid send + delete) and verify the error dialog appears once and the composer is no longer stuck in "sending…" afterwards
Fixes MAILSPRING-CLIENT-6, MAILSPRING-CLIENT-T
Generated by Claude Code