Skip to content

[image-host] Protect manifest with mutex#4809

Open
sharder996 wants to merge 5 commits intomainfrom
bugfix/thread-safe-image-vault
Open

[image-host] Protect manifest with mutex#4809
sharder996 wants to merge 5 commits intomainfrom
bugfix/thread-safe-image-vault

Conversation

@sharder996
Copy link
Copy Markdown
Collaborator

Thread-safe image vault?

This issue has been plaguing me for awhile. See #4613 for a bit of my detective work, but this is what I've found to be happening:

  1. There are two separate timers that hit the image hosts; the 6 hr timer of source_images_maintenance_task and the 15 min timer of update_manifests_all_task
  2. You put your PC to sleep long enough for timers to expire
  3. You wake your machine up. The update_manifests_all_task clears the manifest and repopulates it. Concurrently, the source_images_maintenance_task tries to update the images in the vault and queries the image host for info on the images. The first task invalidates the data structure causing the second task to reference freed memory.
  4. The daemon crashes and immediately restarts. The orphaned QEMU process still holds a lock on the backing disk image. The restarted daemon tries to create another QEMU process, but fails to get a lock on the disk image.
  5. At this point I'm not sure exactly what is happening, but the daemon gets into a bad state where it is busy waiting for something causing >400% CPU usage.

This PR solves the issue by adding a mutex to the BaseVMImageHost to protect the state of the manifest(s). Reader methods use a shared_lock while the writer method, update_manifests(), uses a unique_lock.

In order to keep the manipulation of the mutex to a single file, the specific implementations of the subclass image hosts are called via the Template Method pattern.

Fixes #4613


MULTI-2533

@sharder996 sharder996 requested review from a team and jimporter and removed request for a team April 10, 2026 03:52
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.56%. Comparing base (1cde698) to head (9554cfc).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4809      +/-   ##
==========================================
+ Coverage   87.54%   87.56%   +0.02%     
==========================================
  Files         258      258              
  Lines       14152    14162      +10     
==========================================
+ Hits        12388    12399      +11     
+ Misses       1764     1763       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[macOS] Daemon taking up CPU time after wakeup

1 participant