vdev_disk: avoid vd_lock lockdep cycle during reopen #18504
Open
Gality369 wants to merge 1 commit into
Pull request overview
This PR addresses a real Linux lock ordering problem in the vdev_disk backend by ensuring vd_lock is no longer held across slow block-device open/close paths, avoiding a lockdep-reported cycle involving mapping.invalidate_lock during page-fault reads.
Changes:
- Narrow `vd_lock` usage to protect only `vd_bdh` state transitions, and move block-device open/release operations outside of `vd_lock`.
- Add a reopen "in-progress" wait state (`vd_open_lock`/`vd_open_cv`/`vd_opening`) so I/O waits through transient reopen windows instead of immediately failing on `vd_bdh == NULL`.
- Tie EIO media revalidation/removal decisions to the submission-time `struct block_device *` (captured via `vbio_bdev` for RW and `io_vsd` for flush/trim) to avoid races with reopen.
behlendorf
approved these changes
May 14, 2026
Contributor
behlendorf
left a comment
Nice fix. I can't say I love the for loop which was added to vdev_disk_io_start(), but it makes sense and looks solid.
a96ca68 to 250ba07
behlendorf
reviewed
May 14, 2026
250ba07 to 7bf3a65
On Linux, fault-driven reads can reach vdev_disk_io_start() while holding mapping.invalidate_lock and then acquire vd_lock. At the same time, vdev_disk_open() held vd_lock across bdev_file_open_by_path()/bdev_open_by_path(). That pulls vd_lock into the block open path and into the mmap_lock -> mapping.invalidate_lock dependency chain.

This is a real lock ordering problem, not just a lockdep false positive. A reopen can hold vd_lock while the page fault path already holds mapping.invalidate_lock and waits for vd_lock.

Fix this by limiting vd_lock to short vd_bdh state transitions and moving the slow block-device open/release work outside the lock. Add a small reopen wait state so I/O waits for a transient reopen window instead of treating vd_bdh == NULL as a permanent failure. Also use the submission-time block device for EIO revalidation so reopen does not race with media checks.

Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
7bf3a65 to 884df7a
Motivation and Context
This fixes a Linux lockdep cycle in the vdev disk backend.
`vdev_disk_io_start()` can acquire `vd_lock` from the page-fault read path while `filemap_fault()` holds `mapping.invalidate_lock`.

Meanwhile, `vdev_disk_open()` held `vd_lock` across `bdev_file_open_by_path()`/`bdev_open_by_path()`, which ties `vd_lock` into the block-open locking chain and back into `mmap_lock`/`mapping.invalidate_lock`.

That makes this a real lock ordering issue rather than a lockdep-only false positive.
Description
This change narrows `vd_lock` so it only protects `vd_bdh` state transitions and no longer covers the slow block-device open/release operations.
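The lock-narrowing idea can be illustrated with a minimal user-space sketch. This is not the patch itself: it uses pthreads rather than kernel mutexes, and every name in it (`demo_vdev_t`, `demo_slow_open()`, `demo_vdev_open()`, `demo_vdev_detach()`) is hypothetical. It only shows the general shape of doing the slow work unlocked and taking the lock just long enough to publish or detach the handle.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-vdev state; not the real vdev_disk_t. */
typedef struct demo_vdev {
	pthread_mutex_t lock;	/* plays the role of vd_lock */
	void *handle;		/* plays the role of vd_bdh */
} demo_vdev_t;

static int demo_backing_device;

/* Stand-in for a slow open that may take unrelated locks internally. */
static void *
demo_slow_open(void)
{
	return (&demo_backing_device);
}

void
demo_vdev_init(demo_vdev_t *vd)
{
	pthread_mutex_init(&vd->lock, NULL);
	vd->handle = NULL;
}

/* Do the slow work unlocked, then publish the result under the lock. */
void
demo_vdev_open(demo_vdev_t *vd)
{
	void *h = demo_slow_open();	/* the lock is NOT held here */

	pthread_mutex_lock(&vd->lock);
	vd->handle = h;			/* short state transition only */
	pthread_mutex_unlock(&vd->lock);
}

/* Detach the handle under the lock; the caller releases it unlocked. */
void *
demo_vdev_detach(demo_vdev_t *vd)
{
	void *h;

	pthread_mutex_lock(&vd->lock);
	h = vd->handle;
	vd->handle = NULL;
	pthread_mutex_unlock(&vd->lock);
	return (h);
}
```

Because the lock never spans `demo_slow_open()`, whatever locks the slow path takes cannot become ordered after the state lock. That is the property the description relies on for `vd_lock`.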
A small reopen wait state was added so I/O waits for a transient reopen window instead of observing a temporary `vd_bdh == NULL` and failing immediately.
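A wait state of this kind is commonly built from a mutex, a condition variable, and an "in progress" flag. The sketch below is a user-space pthreads analogue under that assumption. The field names are modeled on `vd_open_lock`, `vd_open_cv`, and `vd_opening` from the review summary, but the struct, the functions, and the decision to keep the handle under the same mutex are hypothetical simplifications, not the patch's actual layout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reopen state; a simplification for illustration only. */
typedef struct demo_reopen {
	pthread_mutex_t open_lock;	/* modeled on vd_open_lock */
	pthread_cond_t open_cv;		/* modeled on vd_open_cv */
	bool opening;			/* modeled on vd_opening */
	void *handle;			/* modeled on vd_bdh */
} demo_reopen_t;

void
demo_reopen_init(demo_reopen_t *r)
{
	pthread_mutex_init(&r->open_lock, NULL);
	pthread_cond_init(&r->open_cv, NULL);
	r->opening = false;
	r->handle = NULL;
}

/* Mark the start of a reopen window; the handle is transiently NULL. */
void
demo_reopen_begin(demo_reopen_t *r)
{
	pthread_mutex_lock(&r->open_lock);
	r->opening = true;
	r->handle = NULL;
	pthread_mutex_unlock(&r->open_lock);
}

/* Publish the new handle (or NULL on failure) and wake all waiters. */
void
demo_reopen_end(demo_reopen_t *r, void *new_handle)
{
	pthread_mutex_lock(&r->open_lock);
	r->handle = new_handle;
	r->opening = false;
	pthread_cond_broadcast(&r->open_cv);
	pthread_mutex_unlock(&r->open_lock);
}

/*
 * I/O side: wait out any in-progress reopen before reading the handle.
 * A NULL result therefore means "no device", not "reopen in flight".
 */
void *
demo_reopen_get_handle(demo_reopen_t *r)
{
	void *h;

	pthread_mutex_lock(&r->open_lock);
	while (r->opening)
		pthread_cond_wait(&r->open_cv, &r->open_lock);
	h = r->handle;
	pthread_mutex_unlock(&r->open_lock);
	return (h);
}
```

The `while` loop around the wait guards against spurious wakeups and against a second reopen starting before the waiter runs.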
The patch also keeps EIO media revalidation tied to the
submission-time block device, so reopen does not race with media
checks.
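The submission-time capture can be sketched in the same hedged way. All types and functions below (`demo_dev_t`, `demo_disk_t`, `demo_io_t`, `demo_io_submit()`, `demo_io_check_media()`) are hypothetical, and the sketch ignores device lifetime, which real code must guarantee for as long as the I/O holds the pointer. It only shows why the completion path consults the device recorded at submission rather than whatever the vdev points at after a reopen.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a flag standing in for a media-change check. */
typedef struct demo_dev {
	bool media_changed;
} demo_dev_t;

/* Hypothetical disk whose current device may be swapped by a reopen. */
typedef struct demo_disk {
	demo_dev_t *current;
} demo_disk_t;

/* Hypothetical I/O that remembers the device it was submitted to. */
typedef struct demo_io {
	demo_dev_t *submit_dev;
	int error;
} demo_io_t;

/* Capture the device at submission time. */
void
demo_io_submit(const demo_disk_t *disk, demo_io_t *io)
{
	io->submit_dev = disk->current;
	io->error = 0;
}

/*
 * On EIO, decide based on the device the I/O was actually sent to,
 * not on disk->current, which a concurrent reopen may have replaced.
 */
bool
demo_io_check_media(const demo_io_t *io)
{
	if (io->error != EIO || io->submit_dev == NULL)
		return (false);
	return (io->submit_dev->media_changed);
}
```

If the check looked at `disk->current` instead, an I/O that failed on the old device could be judged against a freshly reopened one and the media change would be missed.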
How Has This Been Tested?
Tested on Linux.
Types of changes
Checklist:
Signed-off-by.