Make Core.shutdown idempotent and safe to call concurrently by agners · Pull Request #6891 · home-assistant/supervisor

agners · 2026-05-28T13:41:59Z

Proposed change

After #6887, Core.shutdown() runs from the SIGTERM signal handler during host shutdown in addition to the existing call sites (host.control.reboot()/shutdown(), backup restore). To avoid spawning a second stop_supervisor() task when a repeat SIGTERM arrives, __main__.py debounced the signal handler by stashing the in-flight task in a single-element list and bailing out on subsequent invocations. That workaround only covers the SIGTERM-vs-SIGTERM race; two API-initiated reboots, or a reboot API call racing with SIGTERM, would still re-enter the shutdown sequence.
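For readers who have not seen the workaround, the debounce described above can be sketched roughly as follows. This is a reconstruction from the description, not the code that was in __main__.py: `make_debounced_handler` and the `stop_supervisor` parameter are hypothetical names, and `loop.create_task` stands in for `coresys.create_task`.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


def make_debounced_handler(
    stop_supervisor: Callable[[], Coroutine[Any, Any, None]],
) -> Callable[[], None]:
    """Build a SIGTERM callback that starts at most one stop task (hypothetical helper)."""
    # Single-element list used as a mutable cell shared with the closure.
    in_flight: list[asyncio.Task] = []

    def handler() -> None:
        if in_flight:
            # A stop task already exists, so the repeat SIGTERM is ignored.
            return
        in_flight.append(asyncio.get_running_loop().create_task(stop_supervisor()))

    return handler
```

The guard lives in the signal handler, so any caller that reaches Core.shutdown() by another route (for example a reboot API call) bypasses it.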

This PR moves the idempotency into Core.shutdown() itself, where every caller benefits:

A second call while a shutdown is in progress awaits the in-flight shutdown via an asyncio.Event rather than re-running the sequence.
Calls while state is STOPPING/CLOSE return early — Supervisor is already going away, the work is moot.
Calls while state is in STARTING_STATES (INITIALIZE/STARTUP/SETUP) return early too. There is nothing coherent to gracefully stop before startup completes, and on the SIGTERM-during-startup path the caller cancels startup_task first, so waiting for it to complete would deadlock (the finally in Core.start() cannot reach set_state(RUNNING) once awaits raise CancelledError).
The sequence is wrapped in try/finally so the completion event is set even when an inner step raises.
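
The four rules above can be illustrated with a small, self-contained sketch. It is not the Supervisor implementation: the `Core` class here is a stand-in, `_shutdown_sequence` and `runs` are hypothetical, the enum values are placeholders, and using `state == SHUTDOWN` to detect an in-flight shutdown is an assumption made for the example.

```python
import asyncio
from enum import Enum


class CoreState(Enum):
    """States named in this PR; the string values are placeholders."""

    INITIALIZE = "initialize"
    SETUP = "setup"
    STARTUP = "startup"
    RUNNING = "running"
    SHUTDOWN = "shutdown"
    STOPPING = "stopping"
    CLOSE = "close"


STARTING_STATES = {CoreState.INITIALIZE, CoreState.SETUP, CoreState.STARTUP}


class Core:
    """Stand-in for Supervisor's Core, reduced to the shutdown guard."""

    def __init__(self) -> None:
        self.state = CoreState.INITIALIZE
        self.runs = 0  # hypothetical counter: how often the sequence really ran
        self._shutdown_event = asyncio.Event()

    async def _shutdown_sequence(self) -> None:
        # Placeholder for the real work (stopping apps and so on).
        self.runs += 1
        await asyncio.sleep(0.01)

    async def shutdown(self) -> None:
        if self.state in STARTING_STATES:
            return  # nothing coherent to stop yet; waiting could deadlock
        if self.state in (CoreState.STOPPING, CoreState.CLOSE):
            return  # Supervisor is already going away
        if self.state == CoreState.SHUTDOWN:
            # Assumption: SHUTDOWN marks an in-flight shutdown, so just wait for it.
            await self._shutdown_event.wait()
            return
        self._shutdown_event.clear()  # mirrors the clear() in the diff under review
        self.state = CoreState.SHUTDOWN
        try:
            await self._shutdown_sequence()
        finally:
            # Always release waiters, even if a step raised.
            self._shutdown_event.set()


async def demo() -> int:
    core = Core()
    core.state = CoreState.RUNNING
    # Three concurrent callers; only the first should run the sequence.
    await asyncio.gather(core.shutdown(), core.shutdown(), core.shutdown())
    return core.runs
```

Running `asyncio.run(demo())` exercises the concurrent case: the first caller runs the sequence and the other two wait on the event.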

With idempotency in Core, the closure-with-list workaround in __main__.py collapses to a plain coresys.create_task(stop_supervisor()): repeat SIGTERMs spawn extra tasks but each just observes the in-flight shutdown and waits.
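
A minimal sketch of what the simplified handler amounts to, under the same caveats: `make_sigterm_handler` and the `tasks` set are hypothetical, and `loop.create_task` stands in for `coresys.create_task`. The returned callback is the kind of function one would pass to `loop.add_signal_handler(signal.SIGTERM, ...)`.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


def make_sigterm_handler(
    stop_supervisor: Callable[[], Coroutine[Any, Any, None]],
    tasks: set[asyncio.Task],
) -> Callable[[], None]:
    """Build a SIGTERM callback with no debouncing (hypothetical helper)."""

    def handler() -> None:
        # Every SIGTERM spawns a task; the idempotent shutdown behind
        # stop_supervisor() is expected to absorb the repeats.
        task = asyncio.get_running_loop().create_task(stop_supervisor())
        tasks.add(task)  # hold a strong reference until the task finishes
        task.add_done_callback(tasks.discard)

    return handler
```

Compared with the debounced closure, the handler no longer tracks shutdown state; the set only keeps task references alive.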

Inspired by (a subset of) #6631.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New feature (which adds functionality to the supervisor)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue: Coordinate graceful shutdown with Home Assistant OS #6887
Link to documentation pull request:
Link to cli pull request:
Link to client library pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
The code has been formatted using Ruff (ruff format supervisor tests)
Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

Documentation added/updated for developers.home-assistant.io
CLI updated (if necessary)
Client library updated (if necessary)

After #6887, Core.shutdown() now runs in the SIGTERM path during host shutdown in addition to the existing host.control reboot/shutdown and backup restore paths. Multiple concurrent callers were possible (e.g. SIGTERM arriving while a reboot API call is mid-flight), so __main__.py debounced the signal handler by stashing the in-flight task in a single-element list and bailing out on the second SIGTERM.

Move the idempotency into Core.shutdown() itself, where it belongs:

- A second call while shutdown is in progress awaits the in-flight shutdown via an asyncio.Event rather than re-running the sequence.
- Calls during STOPPING/CLOSE return early (Supervisor is already going away; the work is moot).
- Calls during STARTING_STATES (INITIALIZE/STARTUP/SETUP) return early too. There is nothing coherent to gracefully stop before startup completes, and on the SIGTERM-during-startup path the caller cancels startup_task first, so waiting for it to complete would deadlock.
- The sequence is wrapped in try/finally so the completion event is set even when an inner step raises.

With that in place the closure workaround in __main__.py collapses to a plain coresys.create_task(stop_supervisor()): repeat SIGTERMs spawn extra tasks but each just observes the in-flight shutdown and waits.

Tests cover the four state branches and confirm the event is reset between repeated shutdown cycles (backup restore re-enters RUNNING).

mdegat01

The code is fine, works and definitely helpful.

My comments are just about a seemingly supported use case that confuses me. There is currently no way to return to RUNNING after SHUTDOWN in production without exiting and restarting the Python process. Is there a future use case we're looking to support here that I'm unaware of? If not, we should probably remove the perceived support for it, since it's a bit confusing.

mdegat01 · 2026-05-28T19:47:44Z

+        # Reset event for this shutdown cycle (supports repeated use, e.g. backup restore)
+        self._shutdown_event.clear()


The concept of repeated use doesn't really make sense here. Shutdown eventually leads to the Python process stopping, which obviously clears the event, and there's no way to go back to CoreState.RUNNING from those closing states. So I'm not really sure when this would have any effect. At the very least the comment should probably be adjusted, since its example isn't a real use case.

mdegat01 · 2026-05-28T19:50:07Z

+async def test_shutdown_event_reset_between_cycles(coresys: CoreSys):
+    """Repeated shutdown cycles (e.g. backup restore) work because the event is reset."""
+    await coresys.core.set_state(CoreState.RUNNING)
+
+    await coresys.core.shutdown()
+    assert coresys.core._shutdown_event.is_set()
+
+    # Simulate restore returning to RUNNING and shutting down again
+    await coresys.core.set_state(CoreState.RUNNING)
+
+    second_entered = False
+    original_shutdown = coresys.apps.shutdown
+
+    async def track_app_shutdown(startup):
+        nonlocal second_entered
+        second_entered = True
+        return await original_shutdown(startup)
+
+    with patch.object(coresys.apps, "shutdown", side_effect=track_app_shutdown):
+        await coresys.core.shutdown()
+
+    assert second_entered
+    assert coresys.core._shutdown_event.is_set()


I mean, this is very much a fabricated use case; it can't ever happen in production. CoreState cannot get back to RUNNING from SHUTDOWN. Is there a plan to support some kind of live backup and restore for Supervisor that I'm unaware of?

home-assistant · 2026-05-28T19:53:33Z

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

agners added the refactor label (a code change that neither fixes a bug nor adds a feature) May 28, 2026

home-assistant Bot added the cla-signed label May 28, 2026

agners requested a review from sairon May 28, 2026 13:42

mdegat01 requested changes May 28, 2026


home-assistant Bot marked this pull request as draft May 28, 2026 19:53

agners wants to merge 1 commit into main from core-shutdown-reentrant


2 participants
