Prevent checkpointed DTLs from restarting resilvers #18592
Open
favilances wants to merge 1 commit into
Pool checkpoints intentionally keep DTL_MISSING entries after a scan because checkpoint-only blocks are not traversed. Those retained DTLs are needed if the pool is rewound, but they are not evidence of a deferred resilver.

vdev_clear_resilver_deferred() treated any remaining DTL on an available leaf vdev as a reason to start another resilver, even when the vdev had not been marked vdev_resilver_deferred. With a checkpoint present, a completed resilver therefore queued another resilver indefinitely.

Only request the follow-up scan when a deferred flag was actually cleared. Retained checkpoint DTLs stay preserved, while real deferred resilvers still run after the current scan completes.

Add a regression test for the checkpoint case.

Closes openzfs#11434
Closes openzfs#17109
Signed-off-by: Favilances <78090594+favilances@users.noreply.github.com>
Motivation and Context
Pool checkpoints intentionally preserve DTL_MISSING entries after a scan because checkpoint-only blocks are not traversed. That retained DTL is needed if the pool is rewound, but it does not mean a new resilver was deferred.
Deferred resilvers rely on vdev_resilver_deferred to mark vdevs that missed txgs outside the active scan range. Treating any remaining DTL as deferred work breaks that distinction, so a completed resilver with a checkpoint can immediately queue another resilver and repeat indefinitely.

This fixes #11434 and fixes #17109. It follows the deferred-resilver intent from #7732 and keeps the restart avoidance added by #9588: a follow-up scan is only needed when a vdev was actually deferred.
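The distinction between a retained DTL and a deferred resilver can be illustrated with a small standalone model. This is a minimal sketch, not the actual change to module/zfs/vdev.c: the model_vdev_t type, its fields, and model_clear_resilver_deferred() are hypothetical stand-ins, and the real function works on vdev_t and handles details (such as vdev availability and config updates) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vdev; this is NOT the real vdev_t. */
typedef struct model_vdev {
	bool mv_resilver_deferred;	/* models the vdev_resilver_deferred flag */
	bool mv_dtl_missing;		/* models a non-empty DTL_MISSING */
	size_t mv_children;		/* 0 for a leaf */
	struct model_vdev **mv_child;
} model_vdev_t;

/*
 * Clear the deferred flag on every leaf below vd.
 *
 * Returns true only if at least one leaf actually had the flag set,
 * which is the sole condition under which a follow-up resilver is
 * requested in this model.
 */
static bool
model_clear_resilver_deferred(model_vdev_t *vd)
{
	bool need_resilver = false;

	for (size_t c = 0; c < vd->mv_children; c++)
		need_resilver |= model_clear_resilver_deferred(vd->mv_child[c]);

	/* Interior vdevs only aggregate the result of their children. */
	if (vd->mv_children != 0)
		return (need_resilver);

	if (vd->mv_resilver_deferred) {
		vd->mv_resilver_deferred = false;
		need_resilver = true;
	}

	/*
	 * mv_dtl_missing is deliberately not consulted: a DTL retained
	 * for a checkpoint is left untouched and does not by itself
	 * request another scan.
	 */
	return (need_resilver);
}
```

In this model, a mirror whose leaves carry a retained DTL but no deferred flag returns false, so no second resilver would be queued, while a leaf that was marked deferred still causes exactly one follow-up request.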
Description
The follow-up resilver request is now gated on whether a leaf vdev had vdev_resilver_deferred set before it was cleared. Checkpoint-retained DTLs remain intact, while real deferred resilvers still run after the current scan completes.

A regression test creates a checkpoint during an attach resilver, lets the resilver finish, and verifies that no second resilver_start event is generated.

How Has This Been Tested?
scripts/cstyle.pl module/zfs/vdev.c
git diff --check
bash -n tests/zfs-tests/tests/functional/replacement/resilver_restart_003.ksh
scripts/commitcheck.sh HEAD

I did not run the ZFS Test Suite in this environment because this checkout is not configured (Makefile and config.status are absent) and ksh is not installed.

Types of changes
Checklist:
Signed-off-by.