Skip to content

implement thorough scrub support (zpool scrub -t)#18474

Open
alek-p wants to merge 1 commit into
openzfs:masterfrom
alek-p:thorough_scrub
Open

implement thorough scrub support (zpool scrub -t)#18474
alek-p wants to merge 1 commit into
openzfs:masterfrom
alek-p:thorough_scrub

Conversation

@alek-p
Copy link
Copy Markdown
Contributor

@alek-p alek-p commented Apr 28, 2026

This introduces the -t (thorough) flag to 'zpool scrub' command. A thorough scrub decrypts and decompresses blocks as they are read, allowing ZFS to catch rare corruption scenarios where the block's checksum matches the data on disk, but the block fails to decrypt or decompress.
For encrypted datasets, the keys must be loaded to perform a thorough scrub. If the keys are not loaded, or are unloaded while the scrub is in progress, the scrub will fall back to a normal scrub for those encrypted blocks with key unloaded.

Motivation and Context

See #17630 for original motivation and context. This is an enhanced version of what was presented in that PR. Specifically this version adds support for decryption during a thorough scrub, uses the existing Z/O pipeline through unsetting the ZIO_FLAG_RAW flag, and wires in the functionality to be accessed with a zpool scrub cli flag (-t) instead of a sysctl.

Description

This PR implements thorough scrub support, which can be invoked using zpool scrub -t.
Even though this patch adds RAM overhead, in practive I didn't see any meaninful impact on RAM usage as the existing zfs_scan_mem_lim_* tunables still apply to throttle and control memory usage.
For CPU overhead I've seen about a 10% increase in average CPU usage during a through scrub on a compressed pool vs a normal scrub on the same pool.

Key features and implementation details:

  • Decompression and Decryption: A thorough scrub will actively decompress and decrypt (if the key is loaded) the scrubbed blocks to verify their full integrity. This is achieved by removing the ZIO_FLAG_RAW flag from the scrub ZIOs when the DSF_SCRUB_THOROUGH flag is set.
  • Increases Per-Block RAM Requirements (All Scrubs): This patch increases the amount of RAM a scrub uses per block, even during a normal scrub. This is due to the expansion of the scan_io_t struct, which now includes sio_salt, sio_iv1, and sio_iv2 fields (an additional 24 bytes per IO) needed to decrypt blocks when required.
  • Increased Buffer Memory (Thorough Scrubs): Additionally, because thorough scrubs actively decompress data, scan_exec_io() must allocate data buffers based on the block's logical size (BP_GET_LSIZE) rather than its compressed physical size (BP_GET_PSIZE). This increases memory consumption for in-flight I/O compared to a normal scrub.
  • Graceful Fallback (Ignoring Errors): If an encrypted dataset's keys are not loaded—or if they are unloaded while a thorough scrub is running—the scrub will gracefully revert to doing a normal scrub (checksum verification only). Under the hood, this is implemented by ignoring EACCES errors returned during decryption and MAC verification (since the block's checksum was successfully verified beforehand).
  • Status Reporting: zpool status and the pool_scan_stat_t struct have been updated to track and correctly report when a "thorough scrub" is in progress, paused, completed, or canceled. This uses an extra word in the scan stats (pss_pass_scrub_flags) to persist the thorough scrub state.
  • Mode switch with scrub pause: When resuming after a paused scrub the user may select either normal or thorough scrub which means pausing can be used to switch scrub methods while the scrub is in progress:
zpool scrub -t testpool
...
zpool scrub -p testpool
zpool scrub testpool # resumes the paused thorough scrub and switches it to a normal scrub
----------------------------------------------------------------------------------------------
zpool scrub testpool
...
zpool scrub -p testpool
zpool scrub -t testpool # resumes the paused normal scrub and switches it to a thorough scrub

How Has This Been Tested?

zfs-tests including the added through-scrub-specfic test
some light performance testing

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Copilot AI review requested due to automatic review settings April 28, 2026 14:54
@alek-p alek-p added Status: Code Review Needed Ready for review and testing Status: Design Review Needed Architecture or design is under discussion labels Apr 28, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for a new thorough scrub mode (zpool scrub -t) that forces blocks to be decrypted and decompressed as they are scrubbed, enabling detection of corruption cases that standard scrub can miss.

Changes:

  • Adds -t flag plumbing from zpool(8) CLI → libzfs → ioctl/kernel scan code, including new scrub command bit(s) and scan-status reporting.
  • Updates scrub I/O behavior to drop ZIO_FLAG_RAW under thorough scrub and to tolerate missing encryption keys (fallback to normal scrub behavior for those blocks).
  • Adds a functional zfs-test for thorough scrub and updates manpages to document the new option.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/zfs-tests/tests/functional/cli_root/zpool_scrub/zpool_scrub_thorough.ksh Adds a new functional test covering thorough scrub across compression/encryption and pause/resume scenarios.
tests/zfs-tests/tests/Makefile.am Installs the new thorough scrub test script.
tests/zfs-tests/cmd/libzfs_input_check.c Adjusts ioctl input-validation test expectations for the updated scrub command encoding.
tests/runfiles/common.run Adds the new thorough scrub test to the default functional runfile.
module/zfs/zio.c Allows scrub to tolerate missing keys (EACCES) without posting I/O/data ereports, and relaxes decrypt-time assertions for scrub.
module/zfs/zfs_ioctl.c Introduces stricter scrub ioctl validation and threads scrub command bits into scan start/range paths.
module/zfs/spa_misc.c Exposes persisted scrub flags in scan stats for zpool status reporting.
module/zfs/spa.c Extends scan entry points to accept scrub flags and forwards thorough mode into dsl_scan().
module/zfs/dsl_scan.c Persists thorough flag in scan phys state, adjusts scrub zio flags/buffer sizing, and handles missing-key EACCES during scrub completion.
lib/libzfs/libzfs_pool.c Updates scrub command handling logic for new command bit semantics (incl. start detection).
lib/libzfs/libzfs.abi Records the ABI change to pool_scrub_cmd enum values and new POOL_SCRUB_THOROUGH.
include/sys/spa.h Updates scan function prototypes to accept scrub flags.
include/sys/fs/zfs.h Converts scrub commands to a bitmask enum and extends scan stats struct for thorough reporting.
include/sys/dsl_scan.h Adds DSF_SCRUB_THOROUGH scan flag and threads scan setup args through.
cmd/ztest.c Updates internal scrub invocations for the new spa_scan() signature.
cmd/zpool/zpool_main.c Adds -t CLI parsing and prints “thorough scrub” status when the kernel reports that mode.
man/man8/zpool-scrub.8 Documents zpool scrub -t and its constraints/behavior.
man/man4/zfs.4 Minor documentation wording fix (“2 hours”).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread module/zfs/dsl_scan.c Outdated

# Start scrub (async)
log_must zpool scrub -t $TESTPOOL

Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test unloads the dataset key immediately after starting an async scrub, without ensuring the scrub has actually begun reading those blocks. This can be timing-dependent and flaky on fast pools. Consider waiting until zpool status shows the (thorough) scrub is in progress (or using an existing helper) before unloading the key.

Suggested change
# Wait until the thorough scrub is reported in progress before
# unloading the key, otherwise this test can race on fast pools.
typeset scrub_started=0
typeset i=0
while (( i < 30 )); do
if zpool status $TESTPOOL | grep -q "scrub in progress" &&
zpool status $TESTPOOL | grep -q "thorough"; then
scrub_started=1
break
fi
sleep 1
(( i = i + 1 ))
done
log_must test $scrub_started -eq 1

Copilot uses AI. Check for mistakes.
Comment thread tests/zfs-tests/tests/functional/cli_root/zpool_scrub/zpool_scrub_thorough.ksh Outdated
Comment thread cmd/zpool/zpool_main.c Outdated
Comment thread tests/zfs-tests/tests/functional/cli_root/zpool_scrub/zpool_scrub_thorough.ksh Outdated
Comment thread module/zfs/dsl_scan.c Outdated
Comment thread module/zfs/dsl_scan.c Outdated
@alek-p alek-p added Status: Design Review Needed Architecture or design is under discussion and removed Status: Design Review Needed Architecture or design is under discussion labels Apr 28, 2026
Copilot AI review requested due to automatic review settings April 30, 2026 14:46
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +98 to +102
# Start scrub (async)
log_must zpool scrub -t $TESTPOOL

# Unload key while scrub is running
log_must zfs unload-key $TESTPOOL/comp_enc_unloading
Comment thread cmd/zpool/zpool_main.c
Comment thread module/zfs/zfs_ioctl.c
Comment thread module/zfs/zfs_ioctl.c
Copilot AI review requested due to automatic review settings April 30, 2026 15:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +98 to +106
# Start scrub (async)
log_must zpool scrub -t $TESTPOOL

# Unload key while scrub is running
log_must zfs unload-key $TESTPOOL/comp_enc_unloading

# Wait for scrub
log_must zpool wait -t scrub $TESTPOOL

Comment thread tests/zfs-tests/cmd/libzfs_input_check.c
Comment thread man/man8/zpool-scrub.8 Outdated
Comment thread cmd/zpool/zpool_main.c Outdated
Copilot AI review requested due to automatic review settings May 1, 2026 14:46
@alek-p alek-p force-pushed the thorough_scrub branch from cfa630a to 3381875 Compare May 1, 2026 14:46
@alek-p alek-p force-pushed the thorough_scrub branch from 3381875 to 59d1345 Compare May 1, 2026 14:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

module/zfs/dsl_scan.c:5006

  • For thorough scrub, scan_exec_io() allocates data sized by BP_GET_LSIZE(bp) when ZIO_FLAG_RAW is not set, but the in-flight throttling (spa_scrub_inflight / q_inflight_bytes) is still accounting with BP_GET_PSIZE(bp). On highly-compressed blocks this can significantly under-account memory and allow much more RAM to be pinned than scn_maxinflight_bytes intends. Consider tracking the actual allocated buffer size (e.g., size or zio->io_size) in the in-flight counters and subtracting the same value in dsl_scan_scrub_done().

	size_t size = (zio_flags & ZIO_FLAG_RAW) ?
	    BP_GET_PSIZE(bp) : BP_GET_LSIZE(bp);
	abd_t *data = abd_alloc_for_io(size, B_FALSE);
	zio_t *pio;

	if (queue == NULL) {
		ASSERT3U(scn->scn_maxinflight_bytes, >, 0);
		mutex_enter(&spa->spa_scrub_lock);
		while (spa->spa_scrub_inflight >= scn->scn_maxinflight_bytes)
			cv_wait(&spa->spa_scrub_io_cv, &spa->spa_scrub_lock);
		spa->spa_scrub_inflight += BP_GET_PSIZE(bp);
		mutex_exit(&spa->spa_scrub_lock);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/libzfs/libzfs_pool.c
Comment thread man/man8/zpool-scrub.8 Outdated
@alek-p alek-p force-pushed the thorough_scrub branch from 59d1345 to e165a50 Compare May 1, 2026 15:01
Copilot AI review requested due to automatic review settings May 1, 2026 15:53
@alek-p alek-p force-pushed the thorough_scrub branch from e165a50 to fd15b18 Compare May 1, 2026 15:53
@alek-p
Copy link
Copy Markdown
Contributor Author

alek-p commented May 1, 2026

took a while to get the tests and manpage to reflect reality but I think this is ready for review now. Looks like the manpage currently in tree is wrong as it describes cli arg combinations that are not allowed.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

module/zfs/dsl_scan.c:5006

  • In scan_exec_io(), the I/O buffer size is now BP_GET_LSIZE() for non-RAW (thorough) scrubs, but the in-flight throttling counters (spa_scrub_inflight / q_inflight_bytes) are still incremented/decremented using BP_GET_PSIZE(). For highly-compressible blocks this can allow far more memory to be allocated than the throttling intends, potentially causing large RAM spikes or OOM during thorough scrubs. Consider accounting using the allocated buffer size (or a separate memory-based counter/limit) so throttling matches actual memory consumption for non-RAW scrubs.
	size_t size = (zio_flags & ZIO_FLAG_RAW) ?
	    BP_GET_PSIZE(bp) : BP_GET_LSIZE(bp);
	abd_t *data = abd_alloc_for_io(size, B_FALSE);
	zio_t *pio;

	if (queue == NULL) {
		ASSERT3U(scn->scn_maxinflight_bytes, >, 0);
		mutex_enter(&spa->spa_scrub_lock);
		while (spa->spa_scrub_inflight >= scn->scn_maxinflight_bytes)
			cv_wait(&spa->spa_scrub_io_cv, &spa->spa_scrub_lock);
		spa->spa_scrub_inflight += BP_GET_PSIZE(bp);
		mutex_exit(&spa->spa_scrub_lock);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/zfs-tests/tests/functional/cli_root/zpool_scrub/zpool_scrub_001_neg.ksh Outdated
Comment thread module/zfs/zfs_ioctl.c
Comment on lines +98 to +106
# Start scrub (async)
log_must zpool scrub -t $TESTPOOL
log_mustnot zpool scrub $TESTPOOL

# Unload key while scrub is running
log_must zfs unload-key $TESTPOOL/comp_enc_unloading

# Wait for scrub
log_must zpool wait -t scrub $TESTPOOL
Comment on lines 699 to 702
nvlist_t *required = fnvlist_alloc();
fnvlist_add_uint64(required, "scan_type", POOL_SCAN_FUNCS + 1);
fnvlist_add_uint64(required, "scan_command", POOL_SCRUB_FLAGS_END + 1);
fnvlist_add_uint64(required, "scan_command", 0xDEADBEEF);
IOC_INPUT_TEST(ZFS_IOC_POOL_SCRUB, pool, required, NULL, EINVAL);
Comment thread cmd/zpool/zpool_main.c Outdated
Comment thread cmd/zpool/zpool_main.c Outdated
Comment thread include/sys/fs/zfs.h Outdated
Comment thread module/zfs/dsl_scan.c Outdated
Comment thread module/zfs/dsl_scan.c Outdated
@alek-p alek-p force-pushed the thorough_scrub branch from fd15b18 to 7b71a02 Compare May 7, 2026 00:54
Copilot AI review requested due to automatic review settings May 7, 2026 02:34
@alek-p alek-p force-pushed the thorough_scrub branch from 7b71a02 to ced71be Compare May 7, 2026 02:34
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Comment on lines +53 to 64
typeset -a args=("-?" "blah blah" "-%" "--?" "-*" "-=" \
"-b" "-c" "-d" "-f" "-g" "-h" "-i" "-j" "-k" "-l" \
"-m" "-n" "-o" "-q" "-r" "-u" "-v" "-x" "-y" "-z" \
"-A" "-B" "-D" "-E" "-F" "-G" "-H" "-I" "-J" "-K" "-L" \
"-M" "-N" "-O" "-P" "-Q" "-R" "-S" "-T" "-U" "-V" "-W" "-X" "-W" "-Z")


log_assert "Execute 'zpool scrub' using invalid parameters."

typeset -i i=0
while [[ $i -lt ${#args[*]} ]]; do
log_mustnot zpool scrub ${args[i]}

((i = i + 1))
for arg in "${args[@]}"; do
log_mustnot zpool scrub "$arg" "$TESTPOOL"
done
Comment thread module/zfs/dsl_scan.c Outdated
Comment thread cmd/zpool/zpool_main.c Outdated
Comment thread man/man8/zpool-scrub.8 Outdated
@alek-p alek-p force-pushed the thorough_scrub branch from ced71be to 42e893c Compare May 7, 2026 03:52
Copilot AI review requested due to automatic review settings May 7, 2026 13:35
@alek-p alek-p force-pushed the thorough_scrub branch from 42e893c to ecce90a Compare May 7, 2026 13:35
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread cmd/zpool/zpool_main.c Outdated
Comment thread include/sys/fs/zfs.h
@alek-p alek-p force-pushed the thorough_scrub branch 2 times, most recently from 980622e to 0ff3555 Compare May 7, 2026 17:00
Copilot AI review requested due to automatic review settings May 7, 2026 21:35
@alek-p alek-p force-pushed the thorough_scrub branch from 0ff3555 to 204d261 Compare May 7, 2026 21:35
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

Comment on lines 698 to 704
{
nvlist_t *required = fnvlist_alloc();
fnvlist_add_uint64(required, "scan_type", POOL_SCAN_FUNCS + 1);
fnvlist_add_uint64(required, "scan_command", POOL_SCRUB_FLAGS_END + 1);
fnvlist_add_uint64(required, "scan_command", 0xDEADBEEF);
IOC_INPUT_TEST(ZFS_IOC_POOL_SCRUB, pool, required, NULL, EINVAL);
nvlist_free(required);
}
Comment thread cmd/zpool/zpool_main.c Outdated
Comment thread cmd/zpool/zpool_main.c
Comment thread man/man8/zpool-scrub.8 Outdated
This introduces the -t (thorough) flag to 'zpool scrub' command.
A thorough scrub decrypts and decompresses blocks as they are read,
allowing ZFS to catch rare corruption scenarios where the block's
checksum matches the data on disk, but the block fails to decrypt
or decompress.

For encrypted datasets, the keys must be loaded to perform a thorough
scrub. If the keys are not loaded, or are unloaded while the scrub is
in progress, the scrub will fall back to a normal scrub
for those encrypted blocks with key unloaded.

Signed-off-by: Alek Pinchuk <Alek.Pinchuk@connectwise.com>
Comment thread man/man8/zpool-scrub.8
Comment thread man/man8/zpool-scrub.8
resumes as a normal scrub.
Specify
.Fl t
again to resume thorough scrubbing.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why require the -t to resume this as as thorough scrub? It seems like that should be the default behavior, similar to how a errlog, last_txg, or date range scrub pick up where they left off.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wanted to preserve the ability to switch back and forth between thorough and normal scrub without reseting scrub progress which with the current code is achived by choosing which scrub you want when resuming

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's what I was guessing. From a testing standing point I get it, but from a user perspective it's not what I'd expect. If you're doing a thorough scrub, then it's probably because you have some reason to suspect there's some subtle damage/corruption. In which case, I'd like some assurance that when a thorough scrub is started it will check the whole pool, or date range, even if it was paused and restarted. Operationally, accidentally converting a paused thorough scrub to a normal scrub would be an easy mistake to make. I'd prefer to change this behavior. DSF_SCRUB_THOROUGH is persisted in the scn_flags so the information is there on disk.

Comment thread cmd/zpool/zpool_main.c
if (pause == 0) {
(void) printf(gettext("scrub in progress since %s"),
(void) printf(gettext("%sscrub in progress since %s"),
is_thorough ? "thorough " : "",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping gettext() for the main phrase, but not for "thorough " is odd, unless it is an international term.

Comment thread module/zfs/dsl_scan.c
uint64_t sio_birth;
uint64_t sio_salt;
uint64_t sio_iv1;
uint64_t sio_iv2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems iv2 is 32bit in block pointers.

Comment thread module/zfs/spa.c

int
spa_scan(spa_t *spa, pool_scan_func_t func)
spa_scan(spa_t *spa, pool_scan_func_t func, uint64_t flags)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the argument should be pool_scrub_cmd_t, not abstract flags?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Code Review Needed Ready for review and testing Status: Design Review Needed Architecture or design is under discussion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants