
fix(docker): restore add-on device access after USB re-enumeration#6877

Draft
carlossg wants to merge 2 commits into
home-assistant:mainfrom
carlossg:fix/docker-hw-listener-options-devices

@carlossg carlossg commented May 25, 2026

Proposed change

Add-ons that get their serial device from the options schema (e.g. the Z-Wave JS
device: option) lose access whenever the device re-enumerates to a different minor
number (e.g. ttyACM0 → ttyACM1 after a USB drop-out or a HAOS reboot). The add-on
then crash-loops with "Operation not permitted" (EPERM) until the user restarts it.

Key prerequisite: the user must have selected the device via a
/dev/serial/by-id/… symlink (the stable path shown in the add-on UI). On
re-enumeration the kernel may assign a new device node with a different
major:minor pair (typically just a new minor), while the by-id symlink keeps its
name and is repointed at the new node. Supervisor must therefore match the
incoming hardware event via that stable by-id link and re-grant cgroup access
with the device's current major:minor.
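As a standalone illustration of this prerequisite, the helper below resolves a by-id symlink to the node it currently points at and reads that node's major:minor with `os.stat`. It is a sketch using only the Python standard library, not Supervisor code; the name `current_node` is hypothetical, and Supervisor gets these values from its own hardware manager.

```python
import os
from pathlib import Path


def current_node(by_id_link: Path) -> tuple[Path, int, int]:
    """Resolve a stable by-id symlink to its current device node and major:minor.

    Hypothetical helper for illustration only; not part of Supervisor.
    """
    node = by_id_link.resolve()  # follows the symlink, e.g. to /dev/ttyACM0 or /dev/ttyACM1
    rdev = os.stat(node).st_rdev  # device number of the character device node
    return node, os.major(rdev), os.minor(rdev)


if __name__ == "__main__":
    link = Path("/dev/serial/by-id/usb-0658_0200-if00")
    if link.exists():
        node, major, minor = current_node(link)
        print(f"{link} -> {node} ({major}:{minor})")
```

Run before and after a re-enumeration, the link path stays the same while the node and minor should change, matching the ttyACM0 (166:0) and ttyACM1 (166:1) states reported later in this thread.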

Two independent bugs in DockerApp caused this:

  1. Listener never registered for options-based devices. _hw_listener was
    only wired up when addon.static_devices (the manifest's devices: list)
    was non-empty. Add-ons that expose a device only via the options schema
    never registered the listener, so add_devices_allowed was never called
    when the device reappeared at a new minor.

  2. Options-based devices never matched in _hardware_events. The handler
    compared the incoming Device only against static_devices (and only
    against .path/.sysfs, not .links). Devices configured via the options
    schema were never matched, so access was never re-granted even if the
    listener had been registered.
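To make the two bugs concrete, here is a minimal, self-contained model of the old behaviour. `FakeDevice` is a hypothetical stand-in for Supervisor's `Device` with only the attributes named above, and the two functions paraphrase the old registration gate and match condition rather than reproduce the real `DockerApp` code.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FakeDevice:
    """Hypothetical stand-in for Supervisor's Device, reduced to the matched fields."""

    path: Path
    sysfs: Path
    links: list[Path] = field(default_factory=list)


def old_listener_registered(static_devices: list[Path]) -> bool:
    # Bug 1 (paraphrased): the listener depended only on the manifest `devices:` list.
    return bool(static_devices)


def old_event_matches(device: FakeDevice, static_devices: list[Path]) -> bool:
    # Bug 2 (paraphrased): only .path and .sysfs were compared, never .links,
    # and only against static_devices.
    return any(p in (device.path, device.sysfs) for p in static_devices)


by_id = Path("/dev/serial/by-id/usb-0658_0200-if00")
reenumerated = FakeDevice(
    path=Path("/dev/ttyACM1"),
    sysfs=Path("/sys/class/tty/ttyACM1"),  # placeholder sysfs path
    links=[by_id],
)

# Z-Wave JS style: the device comes from options, the manifest `devices:` is empty.
print(old_listener_registered([]))  # False: no listener is registered at all
# Even a manifest entry pointing at the by-id link would not match,
# because that path only appears in .links.
print(old_event_matches(reenumerated, [by_id]))  # False
```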

Fix

Introduce AppOptions.extract_device_paths and the AppModel.option_device_paths
property, a cheap lookup that walks the raw schema and options and returns the
configured device paths as set[Path], without running full options validation
(no hardware lookup, no pwned-password hashing).
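A rough sketch of what a validation-free walk over schema and options could look like, assuming the usual add-on schema conventions: device options typed `device` or `device(filter)`, an optional `?` suffix, nested dicts, and list schemas holding a single element type. `collect_device_paths` is a hypothetical name and body for illustration, not the actual `AppOptions.extract_device_paths` implementation.

```python
from pathlib import Path
from typing import Any


def collect_device_paths(schema: Any, options: Any) -> set[Path]:
    """Return every configured value whose schema type is a device type.

    Illustrative sketch: walks schema and options in parallel and performs
    no hardware lookup and no other validation.
    """
    found: set[Path] = set()

    if isinstance(schema, dict) and isinstance(options, dict):
        # Nested option group: recurse key by key, skipping unset options.
        for key, sub_schema in schema.items():
            if key in options:
                found |= collect_device_paths(sub_schema, options[key])
    elif isinstance(schema, list) and schema and isinstance(options, list):
        # List schema: a single element type applied to every list entry.
        for item in options:
            found |= collect_device_paths(schema[0], item)
    elif isinstance(schema, str) and schema.startswith("device") and isinstance(options, str):
        # Matches "device", "device?" and "device(subsystem=tty)".
        found.add(Path(options))

    return found


# Illustrative schema/options pair, not copied from any real add-on.
schema = {"device": "device(subsystem=tty)", "network_key": "str?", "extra": ["device?"]}
options = {"device": "/dev/serial/by-id/usb-0658_0200-if00", "network_key": "secret", "extra": []}
print(collect_device_paths(schema, options))
```

Because nothing here touches live hardware, the result only changes when the options change, which is what makes it cheap enough to consult on every hardware event.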

In _hardware_events, replace the old two-condition guard with a single
set-intersection:

allowed_paths = set(self.app.static_devices) | self.app.option_device_paths
if not allowed_paths & {device.path, device.sysfs, *device.links}:
    return

This is strictly better:

  • No per-event full validation.
  • Matches by-id symlinks (via device.links) for both static and
    options-based devices.
  • Uses the incoming device's current major:minor for the cgroup rule, which is
    the whole point of re-enumeration handling.
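The guard above can be exercised outside Supervisor with a stub device. `FakeDevice`, `event_matches` and `cgroup_rule` are hypothetical names; the rule string uses Docker's `--device-cgroup-rule` syntax (`type major:minor access`), and Supervisor's own `get_cgroups_rule` may build it differently.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FakeDevice:
    """Hypothetical stand-in for Supervisor's Device, reduced to the fields used here."""

    path: Path
    sysfs: Path
    major: int
    minor: int
    links: list[Path] = field(default_factory=list)


def event_matches(
    device: FakeDevice, static_devices: list[Path], option_device_paths: set[Path]
) -> bool:
    # Same shape as the guard above: configured paths vs. the event's own identifiers.
    allowed_paths = set(static_devices) | option_device_paths
    return bool(allowed_paths & {device.path, device.sysfs, *device.links})


def cgroup_rule(device: FakeDevice) -> str:
    # Character-device rule built from the event's live major:minor.
    # Assumption: Supervisor's real rule builder may differ.
    return f"c {device.major}:{device.minor} rwm"


by_id = Path("/dev/serial/by-id/usb-0658_0200-if00")
# Stick after re-enumeration; the sysfs path is a placeholder.
after = FakeDevice(Path("/dev/ttyACM1"), Path("/sys/class/tty/ttyACM1"), 166, 1, [by_id])

if event_matches(after, static_devices=[], option_device_paths={by_id}):
    print(cgroup_rule(after))  # c 166:1 rwm
```

The match is decided by the stable by-id link, while the granted rule comes from the event itself, so the new minor is picked up without consulting any cached state.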

Reproduction

  1. Install Z-Wave JS, point its device: option at a /dev/serial/by-id/…
    symlink.
  2. Replug the USB stick (or reboot HAOS) so it re-enumerates to a different
    minor.
  3. Before the fix: the add-on crash-loops with "Operation not permitted" (EPERM)
    when opening the device. After the fix: Supervisor re-grants cgroup access
    and the add-on recovers.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:
  • Link to client library pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Ruff (ruff format supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints or add-on configuration are added/changed:

@home-assistant home-assistant Bot left a comment

Hi @carlossg

It seems you haven't yet signed a CLA. Please do so here.

Once you do that we will be able to review and accept this pull request.

Thanks!

@home-assistant

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@home-assistant home-assistant Bot marked this pull request as draft May 25, 2026 12:32
agners commented May 26, 2026

Member
  • Listener never registered for options-based devices. _hw_listener was only registered when addon.static_devices was non-empty. Add-ons that expose a device via the options schema (e.g. Z-Wave JS device:
    option) never had the listener registered, so add_devices_allowed was never called when the device reappeared after a disconnect.

This is not true, the options validation adds it:

elif typ.startswith(_DEVICE):
    if not isinstance(value, str):
        raise vol.Invalid(
            f"Expected a string for option '{key}' in {self._name} ({self._slug})"
        )
    try:
        device = self.sys_hardware.get_by_path(Path(value))
    except HardwareNotFound:
        raise vol.Invalid(
            f"Device '{value}' does not exist in {self._name} ({self._slug})"
        ) from None
    # Have filter
    if match.group("filter"):
        str_filter = match.group("filter")
        device_filter = _create_device_filter(str_filter)
        if device not in self.sys_hardware.filter_devices(**device_filter):
            raise vol.Invalid(
                f"Device '{value}' don't match the filter {str_filter}! in {self._name} ({self._slug})"
            )
    # Device valid
    self.devices.add(device)
    return str(value)

So the resolved device is added to the resulting self.devices. The symlinks themselves don't need cgroup rules, since they are just symlinks.

  • by-id paths never matched in _hardware_events. The event handler compared only device.path and device.sysfs against static_devices. When static_devices (or the options path) contains a by-id symlink, the
    match always failed because by-id paths live in device.links.

That is handled by the above code, by-id paths are resolved and the underlying device is added.

Did updating the permission not work for you? You should see "Added cgroup permissions ..." in the logs.

If it's not working, what type of installation do you have? Can you share the logs?

@carlossg

Author

@agners thanks for the review. I've looked into it a bit more, and part of the PR is not needed.
The issue happens when the device is re-enumerated (which can be caused by power issues); the Z-Wave JS add-on then needs to be restarted to work again.
I've been running with a couple of patches for a while to test before submitting, so I don't have logs. I'll revert them and see if I can reproduce.

I'll post Claude's analysis:

You're right that options.py populates options_schema.devices — but that's not where the bug is.

The device is added to the container correctly at start time: cgroups_rules already iterates app.devices (line 184), so Z-Wave JS gets the right cgroup rules baked in and works fine initially.

The problem is only on USB re-enumeration. When the stick disconnects and reconnects at a different minor number (e.g. ttyACM0 → ttyACM1), the baked-in rules are stale. _hw_listener exists specifically to
call add_devices_allowed dynamically to fix this — but before this patch it was gated on:
if self.app.static_devices:
For Z-Wave JS, static_devices reads from the manifest's devices: field (model.py:357), which is empty — the device path comes entirely from the schema: options. So the listener was never registered and the
stale rules were never updated after a reconnect.
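Why the baked-in rules go stale can be shown by treating a cgroup device rule as data. `rule_allows` is a hypothetical helper that understands only the simple `type major:minor access` form (with `*` as a wildcard) used by Docker's `--device-cgroup-rule`; it assumes the start-time rule was pinned to the old minor.

```python
def rule_allows(rule: str, dev_type: str, major: int, minor: int) -> bool:
    """Return True if a device rule such as 'c 166:0 rwm' covers the given device.

    Hypothetical helper: handles only 'type major:minor access' with '*' wildcards.
    """
    rtype, numbers, _access = rule.split()
    rmajor, rminor = numbers.split(":")
    return (
        rtype in (dev_type, "a")  # 'a' means any device type
        and rmajor in ("*", str(major))
        and rminor in ("*", str(minor))
    )


baked_in = "c 166:0 rwm"  # assumed rule created at add-on start, while the stick was ttyACM0

print(rule_allows(baked_in, "c", 166, 0))  # True: ttyACM0 before the drop-out
print(rule_allows(baked_in, "c", 166, 1))  # False: ttyACM1 after re-enumeration
```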

You're correct about the by-id point though — options validation resolves symlinks to Device objects before adding them, so the device_all_paths expansion wasn't needed. I've removed that part and its test
from the PR.

@carlossg carlossg force-pushed the fix/docker-hw-listener-options-devices branch from b1189ce to ee6118b May 27, 2026 10:20
carlossg commented May 27, 2026

Author

I upgraded Core and it happened again. Attaching logs:

issue-zwave_js.log
issue-supervisor.log

Installation type: Home Assistant OS (HAOS)
Board:            Raspberry Pi 3 64-bit (rpi3-64)
HAOS version:     17.3
HA Core:          2026.5.4
Supervisor:       2026.05.1 (official, unpatched)
Z-Wave JS:        1.3.0 (core add-on)

Device configuration in Z-Wave JS options:
  "device": "/dev/serial/by-id/usb-0658_0200-if00"

Device state at time of failure:
  /dev/ttyACM1  (major 166, minor 1)
  /dev/serial/by-id/usb-0658_0200-if00 -> ../../ttyACM1

Previous device state (before 15:19 re-enumeration):
  /dev/ttyACM0  (major 166, minor 0)
  /dev/serial/by-id/usb-0658_0200-if00 -> ../../ttyACM0

Z-Wave stick: Sigma Designs USB Z-Wave controller (0658:0200)
  sysfs path: /sys/bus/usb/devices/1-1.4

Timeline:
  15:18:56  ttyACM0 removed (USB bus hang, kernel dwc_otg_hcd_urb_dequeue timeouts)
  15:19:01  USB device re-added at new address
  15:19:03  ttyACM1 appeared — minor changed from 0 to 1
  15:19:03  No "Added cgroup permissions" log from supervisor (bug)
  16:35:12  Z-Wave JS begins failing with "Operation not permitted"
  The chain of events that led to the Z-Wave failure:

  ┌──────────┬─────────────────────────────────────────────────────────────────────────┐
  │   Time   │                                  Event                                  │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 15:11:52 │ Supervisor starts HA Core update to 2026.5.4                            │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 15:11:55 │ Docker begins pulling new image                                         │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 15:12:35 │ OOM killer fires — memory exhausted                                     │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 15:12:42 │ Tailscale, nginx, SSH, ttyd killed                                      │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 15:18:56 │ Z-Wave USB stick drops off (USB subsystem destabilised by OOM pressure) │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 15:19:03 │ Stick reconnects as ttyACM1                                             │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 15:19:03 │ No cgroup update — bug triggered                                        │
  ├──────────┼─────────────────────────────────────────────────────────────────────────┤
  │ 16:35+   │ Z-Wave JS failing Operation not permitted                               │
  └──────────┴─────────────────────────────────────────────────────────────────────────┘

@carlossg carlossg marked this pull request as ready for review June 1, 2026 10:48
@carlossg carlossg force-pushed the fix/docker-hw-listener-options-devices branch from ee6118b to 942af50 June 1, 2026 10:48

@agners agners left a comment

Member

Can you update the PR description to accurately reflect what the current change is addressing?

@carlossg carlossg changed the title fix(docker): register hw listener and match by-id paths for options-based devices fix(docker): restore add-on device access after USB re-enumeration Jun 1, 2026
…ased devices

Two bugs caused a crash loop when a USB device re-enumerates to a different
minor number (e.g. ttyACM0→ttyACM1) after a HAOS reboot:

1. _hw_listener was only registered when addon.static_devices was non-empty.
   Addons that expose a device via the options schema (e.g. Z-Wave JS `device:`
   option) never had the listener registered, so add_devices_allowed was never
   called when the device reappeared at a new minor.

2. _hardware_events matched only device.path and device.sysfs against
   static_devices.  When static_devices (or the new options path) contains a
   by-id symlink, the match always failed because by-id paths live in
   device.links.

Fix: extend the listener registration condition to also cover addon.devices
(options-based), and expand the path-matching set to include device.links so
by-id paths resolve correctly.  For options-based devices, compare the incoming
Device against addon.devices (which re-evaluates options.json against the live
hardware list, picking up the new minor number automatically).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@carlossg carlossg force-pushed the fix/docker-hw-listener-options-devices branch from 942af50 to 844c770 June 1, 2026 16:53
@agners agners requested a review from Copilot June 1, 2026 20:55

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes add-ons losing device access after USB re-enumeration by ensuring the Docker app registers for hardware hotplug events even when devices are configured via options, and by matching hotplug events against the options-resolved device set.

Changes:

  • Register the hardware event listener when either manifest static_devices or options-derived devices are present.
  • Update the hotplug handler to treat an incoming Device as relevant if it matches either static_devices or app.devices.
  • Add a regression test covering options-based device hotplug handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                        Description
supervisor/docker/app.py    Extends hardware listener registration and event matching to include options-based devices.
tests/docker/test_app.py    Adds coverage to ensure options-based devices register the listener and trigger cgroup permission updates on hotplug.

Comment thread supervisor/docker/app.py Outdated
Comment on lines +891 to +895
if (
not any(
device_path in (device.path, device.sysfs)
for device_path in self.app.static_devices
)

@agners agners left a comment

Member

Ok, the PR description is much better. But I think it misses one core point: the user needs to choose a by-id device, which can symlink to a different real device on re-enumeration (this is stated in the reproduction steps, but I think it should be noted in the main description as well).

Comment thread tests/docker/test_app.py
was non-empty. Add-ons like Z-Wave JS that configure the device via the
options schema (addon.devices) never had a listener, so cgroup permissions
were never updated after a USB re-enumeration.
"""

Member

A full end-to-end test that sets a real device: option resolving to a by-id link (and ideally simulates a minor-number change on re-enumeration) would be nice.
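A rough outline of such a test, written against stubs so it runs on its own. `FakeApp`, `FakeDevice`, `handle_hardware_event` and the `grant` callback are hypothetical; a real test in tests/docker/test_app.py would drive `DockerApp._hardware_events` through Supervisor's fixtures and assert on `add_devices_allowed` instead.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from pathlib import Path
from unittest.mock import AsyncMock


@dataclass
class FakeDevice:
    path: Path
    sysfs: Path
    major: int
    minor: int
    links: list[Path] = field(default_factory=list)


@dataclass
class FakeApp:
    static_devices: list[Path]
    option_device_paths: set[Path]


async def handle_hardware_event(
    app: FakeApp, device: FakeDevice, grant: Callable[[str], Awaitable[None]]
) -> None:
    # Mirrors the set-intersection guard proposed in review.
    allowed_paths = set(app.static_devices) | app.option_device_paths
    if not allowed_paths & {device.path, device.sysfs, *device.links}:
        return
    await grant(f"c {device.major}:{device.minor} rwm")


def test_reenumeration_regrants_with_new_minor() -> None:
    by_id = Path("/dev/serial/by-id/usb-0658_0200-if00")
    app = FakeApp(static_devices=[], option_device_paths={by_id})
    grant = AsyncMock()

    # Stick comes back as ttyACM1 (166:1); the by-id link name is unchanged.
    reenumerated = FakeDevice(
        Path("/dev/ttyACM1"), Path("/sys/class/tty/ttyACM1"), 166, 1, [by_id]
    )
    asyncio.run(handle_hardware_event(app, reenumerated, grant))
    grant.assert_awaited_once_with("c 166:1 rwm")

    # An unrelated device must not trigger a grant.
    grant.reset_mock()
    other = FakeDevice(Path("/dev/ttyUSB0"), Path("/sys/class/tty/ttyUSB0"), 188, 0)
    asyncio.run(handle_hardware_event(app, other, grant))
    grant.assert_not_awaited()


if __name__ == "__main__":
    test_reenumeration_regrants_with_new_minor()
```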

Comment thread supervisor/docker/app.py Outdated
device_path in (device.path, device.sysfs)
for device_path in self.app.static_devices
)
and device not in self.app.devices

Member

Hm, this means that on every hardware event we do a full validation, including pwned-password hashing and whatnot.

The device extraction is just a side effect of validation, and it's really not ideal to rely on it here.

This is a somewhat bigger refactor, but I think it's worth doing, since otherwise this would add a lot of validation work:

The cleaner separation is two distinct concerns:

  1. What did the user configure? A set of device paths, depends only on schema + options.json. Changes only when options change.
  2. Does this event's device match one of them? Should be a cheap comparison.

And the key realization: in the event handler you don't need to resolve anything against live hardware. The incoming device already carries .path, .sysfs, and .links (which includes the by-id link). So you just intersect the configured paths against the device's own identifiers:

  async def _hardware_events(self, device: Device) -> None:
      allowed_paths = set(self.app.static_devices) | self.app.option_device_paths
      if not allowed_paths & {device.path, device.sysfs, *device.links}:
          return
      ...
      permission = self.sys_hardware.policy.get_cgroups_rule(device)  # uses event's live major:minor

This is strictly better because:

  • drops the per-event full validation entirely,
  • also fixes by-id matching for both static_devices and options-based devices (which I think your earlier iteration tried to do too).
  • uses the incoming device's current major:minor for the cgroup rule, which is the whole point of re-enumeration handling.

@home-assistant home-assistant Bot marked this pull request as draft June 1, 2026 21:09
Refactor _hardware_events to avoid per-event full options validation
(including pwned-password hashing). Introduce AppOptions.extract_device_paths and
AppModel.option_device_paths to extract raw device paths from options
without resolving against live hardware. Use set-intersection against
{device.path, device.sysfs, *device.links} so by-id symlinks match
correctly after re-enumeration for both static and options-based devices.

Update test to use real schema/options setup and simulate a minor-number
change (ttyACM0→ttyACM1) with a stable by-id symlink.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>