fix: defer update instead of erroring when state storage is temporarily unwritable#336

Draft
eseidel wants to merge 4 commits into main from fix/defer-state-write-on-permission-denied
eseidel commented Apr 8, 2026

Summary

Defensive fallback for the UpdateException: File::create for ".../shorebird_updater/patches_state.json" failures reported on iOS.

  • Detects ErrorKind::PermissionDenied at the state-write boundary (disk_io::write — the only writer used for state.json and patches_state.json) and translates it into a dedicated UpdateError::StateStorageUnavailable.
  • updater::update() catches that error alongside UpdateAlreadyInProgress and maps it to a new UpdateStatus::UpdateDeferred variant.
  • Adds the C status code SHOREBIRD_UPDATE_DEFERRED = 5 and extends the Dart wrapper to treat it as a successful return — ShorebirdUpdater.update() no longer throws for this case.
  • Other IO error kinds (StorageFull, NotFound, generic IO) continue to propagate as errors with the standard enhanced file-operation context.
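
The mapping described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the enum shapes, the `Io` and `UpdateHadError` variants, and both function signatures are assumptions made for the example. Only the names `StateStorageUnavailable`, `UpdateDeferred`, and `map_state_io_error` come from the PR text.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical stand-in for the updater's error type.
#[derive(Debug)]
enum UpdateError {
    /// State storage refused the write; treated as transient.
    StateStorageUnavailable,
    /// Any other IO failure, kept with its file-operation context (assumed shape).
    Io { context: String, source: io::Error },
}

/// Hypothetical stand-in for the status handed back to callers.
#[derive(Debug, PartialEq)]
enum UpdateStatus {
    UpdateDeferred,
    /// Assumed name for the ordinary error status.
    UpdateHadError,
}

/// Translate an IO error seen at the state-write boundary.
/// Only `PermissionDenied` becomes `StateStorageUnavailable`.
fn map_state_io_error(err: io::Error, context: &str) -> UpdateError {
    if err.kind() == ErrorKind::PermissionDenied {
        UpdateError::StateStorageUnavailable
    } else {
        UpdateError::Io {
            context: context.to_string(),
            source: err,
        }
    }
}

/// Decide which status an update attempt reports for a given error.
fn status_for(err: &UpdateError) -> UpdateStatus {
    match err {
        UpdateError::StateStorageUnavailable => UpdateStatus::UpdateDeferred,
        UpdateError::Io { .. } => UpdateStatus::UpdateHadError,
    }
}
```

Under these assumptions, `io::Error::from(ErrorKind::PermissionDenied)` ends up as `UpdateDeferred`, while `ErrorKind::NotFound` keeps its context and still reports an error.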

Fixes shorebirdtech/shorebird#3683.

Why

iOS telemetry shows a long tail of UpdateException: File::create for ".../shorebird_updater/patches_state.json" failures spread across many unique installs. The app sandbox always allows the app to write inside its own Application Support, so a genuine PermissionDenied from std::fs::File::create on this path is almost always iOS Data Protection: files under Library/Application Support/ inherit NSFileProtectionCompleteUntilFirstUserAuthentication, and before the user has unlocked the device for the first time since boot (and in some edge cases while the device is locked), the OS refuses writes with EPERM/EACCES.

This is transient — the next update attempt after the device is unlocked typically succeeds — so surfacing it as a thrown UpdateException in the app's crash telemetry is misleading.

Scope: defensive fallback only

This PR is deliberately scoped to the Rust-side fallback. The proper root-cause fix lives one layer up in the Flutter engine, where we can gate the auto-updater kickoff on UIApplication.protectedDataAvailable (and the UIApplicationProtectedDataDidBecomeAvailable notification) from the main thread without any threading hacks. That work is tracked separately — see the engine-layer gating follow-up issue.

This PR remains valuable regardless of the engine-layer work because:

  1. It catches the residual case where an update was started while data was available but the device becomes locked mid-update (data becomes unavailable before the final state write).
  2. It catches manual ShorebirdUpdater.update() calls from Dart that may happen before the engine's gating logic takes effect.
  3. It is strictly additive and platform-agnostic — no iOS-specific code in the updater.

What this does NOT do

  • This does not proactively check whether state storage is writable before attempting work. That check belongs at the engine layer where UIKit access is available on the main thread.
  • This does not lower the file's Data Protection class. We considered NSFileProtectionNone but rejected it — the attack-surface analysis was fine for this specific file, but we did not want to impose a protection-class opinion on customer apps.

Version skew

Non-breaking and additive. New Dart on an old engine still sees the legacy SHOREBIRD_UPDATE_ERROR with a File::create/Permission denied message and will still throw — this fix only kicks in once both sides ship.
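
A rough model of the skew case, for illustration only. The real boundary is a C status code plus an error message consumed by the Dart wrapper; the `EngineReport` type and both functions here are hypothetical and exist only to show why the wrapper change alone is not enough.

```rust
/// What the native side reports after a permission-denied state write.
/// Hypothetical model of the C status code plus message.
#[derive(Debug, PartialEq)]
enum EngineReport {
    /// Legacy SHOREBIRD_UPDATE_ERROR path, carrying the error message.
    Error(String),
    /// New SHOREBIRD_UPDATE_DEFERRED path (status code 5 in this PR).
    Deferred,
}

/// An engine without this fix has no deferred status to return,
/// so it can only report the failure as an error.
fn report_permission_denied(engine_has_fix: bool) -> EngineReport {
    if engine_has_fix {
        EngineReport::Deferred
    } else {
        EngineReport::Error("File::create: Permission denied".to_string())
    }
}

/// Models the updated wrapper: a deferred update is a normal return,
/// while errors still surface to the caller.
fn wrapper_update(report: EngineReport) -> Result<(), String> {
    match report {
        EngineReport::Deferred => Ok(()),
        EngineReport::Error(message) => Err(message),
    }
}
```

With `engine_has_fix` set to `false`, the updated wrapper still returns `Err`, which matches the "will still throw" behavior described above.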

Test plan

  • cargo build -p updater (regenerates library/include/updater.h via cbindgen; new SHOREBIRD_UPDATE_DEFERRED constant confirmed)
  • cargo test -p updater --lib — 222/222 pass, including two new unit tests in library/src/cache/disk_io.rs:
    • permission_denied_maps_to_state_storage_unavailable
    • non_permission_denied_errors_retain_file_context
  • dart format --set-exit-if-changed lib test
  • dart analyze --fatal-warnings lib test (one pre-existing unrelated line-length warning at test/src/shorebird_updater_io_test.dart:307, not touched by this PR)
  • flutter test test/src/shorebird_updater_io_test.dart — 28/28 pass, including the new test "when the update is deferred returns normally and does not throw"
  • Full flutter test suite (not run locally; CI will cover)

Notes for reviewers

  • shorebird_code_push/lib/src/generated/updater_bindings.g.dart received a surgical single-constant edit rather than a full dart run ffigen regen. Running ffigen locally produces ~500 lines of unrelated platform-header drift.

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 98.09524% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| library/src/cache/disk_io.rs | 98.87% | 1 Missing ⚠️ |
| library/src/updater.rs | 92.85% | 1 Missing ⚠️ |


eseidel force-pushed the fix/defer-state-write-on-permission-denied branch from 39a0705 to 79f3af8 April 8, 2026 03:37
eseidel added a commit to shorebirdtech/flutter that referenced this pull request Apr 8, 2026
On iOS, files under `Library/Application Support/` inherit the default
`NSFileProtectionCompleteUntilFirstUserAuthentication` class. Before the
user has unlocked the device for the first time since boot, the OS
refuses writes under that directory with EPERM/EACCES. Our updater's
state files (`state.json`, `patches_state.json`) live there.

When the engine kicked off the auto-updater thread during early app
boot on a locked or freshly-booted device, the updater would try to
download and persist a patch, then fail at the state-write step and
throw `UpdateException: File::create for ".../patches_state.json"`.
Customer telemetry shows this as a long tail of failures spread
across many unique installs.

Gate the call to `StartUpdateThread()` on
`UIApplication.protectedDataAvailable`. Introduce a new
`StartUpdateWhenProtectedDataAvailable` abstraction in
`shell/common/shorebird/`:

- `protected_data.h` declares the API.
- `protected_data.cc` provides the default implementation used on
  every platform except iOS — calls the start_fn immediately since
  there is nothing to wait for.
- `protected_data_ios.mm` provides the iOS implementation. It
  dispatches to the main queue (required for `UIApplication` access),
  checks `protectedDataAvailable`, and either calls start_fn
  immediately or registers a one-shot observer for
  `UIApplicationProtectedDataDidBecomeAvailableNotification`. On app
  extensions where `sharedApplication` is nil, it falls back to
  starting immediately.

`BUILD.gn` conditionally compiles the iOS impl and links
`Foundation.framework` + `UIKit.framework` on iOS.

`shorebird.cc` is updated to route its `StartUpdateThread()` call
through the new function. Non-iOS behavior is unchanged — the default
impl invokes start_fn synchronously.

This is the proper root-cause fix for the locked-device failures.
A complementary defensive fallback in the updater Rust library
(shorebirdtech/updater#336) catches the residual case where the
device becomes locked between the availability check and the
actual write.

Fixes shorebirdtech/shorebird#3685.
eseidel added a commit to shorebirdtech/flutter that referenced this pull request Apr 8, 2026
On iOS, files under `Library/Application Support/` inherit the default
`NSFileProtectionCompleteUntilFirstUserAuthentication` class. Before the
user has unlocked the device for the first time since boot, the OS
refuses writes under that directory with EPERM/EACCES. Our updater's
state files (`state.json`, `patches_state.json`) live there.

When the engine kicks off the auto-updater thread during early app boot
on a locked or freshly-booted device, the updater tries to download and
persist a patch, then fails at the state-write step and throws
`UpdateException: File::create for ".../patches_state.json"`. Customer
telemetry shows this as a long tail of failures spread across many
unique installs.

This change introduces a cross-platform protected-data gate in
`shell/common/shorebird/` whose iOS implementation lives in the iOS
platform layer (`shell/platform/darwin/ios/framework/Source/`), where
UIKit linkage and main-thread access already belong. The common layer
declares the abstraction and holds a process-global function pointer
that the platform embedder installs during engine init:

  shell/common/shorebird/protected_data.h
    - StartUpdateWhenProtectedDataAvailable(start_fn): cross-platform
      entry point called by shorebird.cc.
    - SetProtectedDataGate(gate): installs a platform implementation.
      Default gate invokes start_fn immediately.

  shell/common/shorebird/protected_data.cc
    - Owns the process-global gate and dispatches to it.

  shell/platform/darwin/ios/framework/Source/
    FlutterShorebirdProtectedDataGate.{h,mm}
    - iOS gate: dispatches to the main queue, checks
      UIApplication.protectedDataAvailable, and either calls start_fn
      immediately or registers a one-shot observer for
      UIApplicationProtectedDataDidBecomeAvailableNotification. Has a
      fallback for app extensions (where sharedApplication is nil).

  FlutterDartProject.mm
    - Calls flutter::SetProtectedDataGate(&ShorebirdIOSProtectedDataGate)
      just before ConfigureShorebird(...).

  shell/common/shorebird/shorebird.cc
    - Both ConfigureShorebird() overloads route their StartUpdateThread()
      call through StartUpdateWhenProtectedDataAvailable. Non-iOS
      behavior is unchanged; the default gate calls start_fn
      synchronously.

Layering rationale: keeping the iOS .mm out of shell/common means the
common layer has no platform-specific code, no UIKit dependency, and no
conditional source selection in its BUILD.gn. The iOS framework target
already links UIKit and owns main-thread access, so the .mm naturally
belongs there.

Unit tests cover the default gate, installation of a custom gate, and
restoration of the default via SetProtectedDataGate(nullptr). The iOS
gate itself requires a UIApplication to exercise meaningfully and is
covered by on-device integration testing.

A complementary defensive fallback in the updater Rust library
(shorebirdtech/updater#336) catches the residual case where the device
becomes locked between the availability check and the actual write, or
where a manual Dart `update()` call happens before first unlock.

Fixes shorebirdtech/shorebird#3685.
eseidel added a commit to shorebirdtech/flutter that referenced this pull request Apr 8, 2026
On iOS, files under `Library/Application Support/` inherit the default
`NSFileProtectionCompleteUntilFirstUserAuthentication` class. Before the
user has unlocked the device for the first time since boot, the OS
refuses writes under that directory with EPERM/EACCES. Our updater's
state files (`state.json`, `patches_state.json`) live there.

When the engine kicks off the auto-updater thread during early app boot
on a locked or freshly-booted device, the updater tries to download and
persist a patch, then fails at the state-write step and throws
`UpdateException: File::create for ".../patches_state.json"`. Customer
telemetry shows this as a long tail of failures spread across many
unique installs.

This change introduces a `ProtectedDataGate` abstraction owned by the
`Updater` singleton. The gate decides when
`StartUpdateThreadWhenReady()` may actually run the updater thread.
Installing a new gate cancels any pending start on the previous one,
so the Updater is always the owner of at most one pending observer.

  shell/common/shorebird/protected_data.{h,cc} (moved into :updater
  target; the :shorebird target no longer references them)
    - `ProtectedDataGate` abstract interface with
      `StartWhenAvailable(start_fn)` and `CancelPending()`.
    - `MakeImmediateProtectedDataGate()` factory returning the default
      gate used on every platform that does not install its own. The
      immediate gate invokes `start_fn` synchronously and `CancelPending`
      is a no-op.

  shell/common/shorebird/updater.{h,cc}
    - `Updater` gains `SetProtectedDataGate(std::unique_ptr<...>)`,
      `StartUpdateThreadWhenReady()`, and `CancelPendingUpdateStart()`.
    - The gate member is initialized lazily to the immediate gate on
      first access; installing a new gate cancels the previous one's
      pending start first.
    - `StartUpdateThreadWhenReady` routes through the gate to
      `Updater::Instance().StartUpdateThread()`.

  shell/platform/darwin/ios/framework/Source/
    FlutterShorebirdProtectedDataGate.{h,mm}
    - Exposes `MakeIOSProtectedDataGate()` factory. The concrete
      `IOSProtectedDataGate` is private (anonymous namespace) and
      holds a single `__strong id observer_` NSNotificationCenter
      handle.
    - `StartWhenAvailable` dispatches to the main queue, cancels any
      prior pending observer, checks
      `UIApplication.protectedDataAvailable`, and either invokes
      `start_fn` immediately or registers a one-shot observer for
      `UIApplicationProtectedDataDidBecomeAvailableNotification`.
    - `CancelPending` removes the observer. The destructor also calls
      the cancel path (dispatching sync to main if needed) so the
      gate never leaks an observer past its own lifetime.
    - Falls back to immediate start inside app extensions where
      `UIApplication.sharedApplication` is nil.

  FlutterDartProject.mm
    - Calls
      `Updater::Instance().SetProtectedDataGate(MakeIOSProtectedDataGate())`
      just before `ConfigureShorebird(...)`. The Updater now owns the
      gate for the life of the process.

  shell/common/shorebird/shorebird.cc
    - Both `ConfigureShorebird()` overloads route their
      `StartUpdateThread()` call through
      `Updater::Instance().StartUpdateThreadWhenReady()`.

Why ownership on the Updater rather than a process-global function
pointer (an earlier draft of this change): putting the gate on the
Updater gives cancellation a real home
(`CancelPendingUpdateStart()`), removes a data-race class on the
previous global, guarantees at most one pending observer across
multiple ConfigureShorebird calls, and makes destructor-driven
cleanup work.
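
The ownership rule above (the Updater owns one gate, and installing a new gate cancels the old gate's pending start) can be modeled in a few lines. The engine change itself is C++ and Objective-C++; this is only a Rust analogue written for illustration. `HoldingGate`, `PendingSlot`, and `fire` are hypothetical stand-ins for the iOS observer, and the gate is initialized eagerly here rather than lazily.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Callback that starts the updater thread (analogue of `start_fn`).
type StartFn = Box<dyn FnOnce()>;
/// Shared slot holding at most one pending start (stands in for the observer).
type PendingSlot = Rc<RefCell<Option<StartFn>>>;

trait ProtectedDataGate {
    fn start_when_available(&mut self, start_fn: StartFn);
    fn cancel_pending(&mut self);
}

/// Default gate: nothing to wait for, so start synchronously.
struct ImmediateGate;

impl ProtectedDataGate for ImmediateGate {
    fn start_when_available(&mut self, start_fn: StartFn) {
        start_fn();
    }
    fn cancel_pending(&mut self) {}
}

/// Hypothetical gate that parks the start until `fire` is called on its slot.
struct HoldingGate {
    slot: PendingSlot,
}

impl ProtectedDataGate for HoldingGate {
    fn start_when_available(&mut self, start_fn: StartFn) {
        // Overwriting the slot drops any earlier pending start.
        *self.slot.borrow_mut() = Some(start_fn);
    }
    fn cancel_pending(&mut self) {
        self.slot.borrow_mut().take();
    }
}

/// Simulates the "protected data became available" notification.
fn fire(slot: &PendingSlot) {
    // Take the callback first so the borrow ends before it runs.
    let pending = slot.borrow_mut().take();
    if let Some(start_fn) = pending {
        start_fn();
    }
}

struct Updater {
    gate: Box<dyn ProtectedDataGate>,
}

impl Updater {
    fn new() -> Self {
        Updater { gate: Box::new(ImmediateGate) }
    }

    /// Installing a gate cancels the old gate's pending start first.
    /// Passing `None` restores the immediate gate.
    fn set_protected_data_gate(&mut self, gate: Option<Box<dyn ProtectedDataGate>>) {
        self.gate.cancel_pending();
        self.gate = match gate {
            Some(g) => g,
            None => Box::new(ImmediateGate),
        };
    }

    fn start_update_thread_when_ready(&mut self, start_fn: StartFn) {
        self.gate.start_when_available(start_fn);
    }
}
```

In this model, a start parked on a `HoldingGate` never runs if the gate is replaced before `fire` is called, which is the "at most one pending observer" property the paragraph above argues for.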

Unit tests exercise the default gate, an installed capturing gate,
cancellation, gate replacement, and restoration of the immediate gate
via `SetProtectedDataGate(nullptr)`. The iOS gate itself requires a
`UIApplication` to exercise meaningfully and is covered by on-device
integration testing.

A complementary defensive fallback in the updater Rust library
(shorebirdtech/updater#336) catches the residual case where the device
becomes locked between the availability check and the actual write, or
where a manual Dart `update()` call happens before first unlock.

Fixes shorebirdtech/shorebird#3685.
eseidel added 3 commits April 27, 2026 17:38
…ly unwritable

iOS telemetry shows a long tail of `UpdateException: File::create for
"/var/mobile/Containers/.../shorebird_updater/patches_state.json"`
failures spread across many unique installs. The sandbox always allows
the app to write inside its own Application Support, so a genuine
`PermissionDenied` from `std::fs::File::create` on this path is almost
always iOS Data Protection: files under `Library/Application Support/`
inherit `NSFileProtectionCompleteUntilFirstUserAuthentication`, and
before the user has unlocked the device for the first time since boot
(and in some edge cases while the device is locked), the OS refuses
writes with `EPERM` / `EACCES`.

This is transient — the next update attempt after the device is
unlocked typically succeeds — so surfacing it as a thrown
`UpdateException` in the app's crash telemetry is misleading.

Detect `ErrorKind::PermissionDenied` at the state-write boundary
(`disk_io::write`, the only writer used for `state.json` and
`patches_state.json`) and translate it into a dedicated
`UpdateError::StateStorageUnavailable`. `updater::update()` catches
that error alongside `UpdateAlreadyInProgress` and maps it to a new
`UpdateStatus::UpdateDeferred` variant (C status code
`SHOREBIRD_UPDATE_DEFERRED = 5`). The Dart wrapper treats the deferred
status as a successful return, so `ShorebirdUpdater.update()` no
longer throws for this case.

Other IO error kinds (StorageFull, NotFound, generic IO) continue to
propagate as errors with the standard enhanced file-operation context.

Fixes shorebirdtech/shorebird#3683.

Notes for reviewers:
- The hypothesis that the root cause is iOS Data Protection is
  well-supported but not yet confirmed with device logs. This fix is
  safe regardless: any transient `PermissionDenied` on the state file
  is benign and deserves a defer rather than an exception.
- A follow-up that gates state writes on
  `UIApplication.isProtectedDataAvailable` would be a cleaner
  root-cause fix (avoid the write entirely rather than catch its
  failure), but requires iOS-specific platform code and can land
  after this PR.
- We considered lowering the file's Data Protection class to
  `NSFileProtectionNone`, but the attack-surface analysis and the
  desire to avoid imposing a protection-class opinion on customer
  apps pointed us at the deferral approach instead.
Words used in the new disk_io PermissionDenied / state storage
deferral commentary and APIs.
eseidel force-pushed the fix/defer-state-write-on-permission-denied branch from 79f3af8 to 4a595bb April 28, 2026 00:40
Adds three tests to push patch coverage above the codecov target:
- write_returns_state_storage_unavailable_when_create_dir_all_denied
- write_returns_state_storage_unavailable_when_file_create_denied
- Display impls for UpdateStatus::UpdateDeferred and UpdateError::StateStorageUnavailable

The existing helper-only tests checked map_state_io_error in isolation;
these new ones drive the actual call sites in write() with a real
PermissionDenied from a chmod'd read-only TempDir.
Development

Successfully merging this pull request may close these issues.

updater: iOS patches_state.json write failures
