zstream: add comprehensive test suite#18510
Closed
GarthSnyder wants to merge 1 commit into
Closed
Conversation
48b7afb to
6880535
Compare
GarthSnyder
added a commit
to GarthSnyder/zfs
that referenced
this pull request
May 8, 2026
### Motivation
In the current version of `zstream`, each subcommand is independent and
is responsible for implementing its own stream-processing pipeline. It
started as a stream dumper, but as additional subcommands were added,
contributors typically copied an existing subcommand's pipeline and
adapted it for different purposes.
This pattern has led to quite a bit of duplicated code and has also led
to some functional nonuniformities. For example, some subcommands
support opposite-endian streams and others don't.
### Overview
This PR segregates functions that most subcommands need into
free-standing modules and reimplements the existing subcommands in
terms of those modules. The current modules are:
* I/O
* Checksum validation and generation
* Byte-order detection and byteswapping
* Data compression
* Data decompression
* General validity checking
This PR also adds a generic pipeline mechanism that subcommands can use
to declare the processing they want. For example, the pipeline for
`zstream recompress` is:
```
zstream_chain_t recompress_chain = {
serial_read_stream(infile),
serial_validate_fletcher4(),
serial_byteswap(BS_INCOMING),
serial_validate_records(),
serial_decompress_writes(&spec),
serial_compress_writes(&spec),
serial_byteswap(BS_OUTGOING),
serial_add_fletcher4(),
serial_write_stream(outfile),
chain_terminator()
};
```
Or more succinctly:
```
zstream_chain_t recompress_chain = {
STANDARD_INPUT_STACK(infile),
serial_decompress_writes(&spec),
serial_compress_writes(&spec),
STANDARD_OUTPUT_STACK(outfile)
};
```
To execute the pipeline:
```
chain_attrs_t attrs = { .ca_command_opts = CA_FORBID_DEDUP };
zstream_chain_exec(recompress_chain, &attrs);
```
### Explanation of `serial_` prefixes
The `serial_` prefixes above indicate that the steps run sequentially
and that each module sees records in their original order. A future PR
will also allow multithreaded execution for individual steps and will
handle the marshaling involved in integrating serial and parallel
operations.
### What this is not
This PR is not a general cleanup. Original code that is not subsumed by
one of the standardized modules is left largely unchanged, although
some portions have been adapted as modules to work within the pipeline
context.
The PR does not change command syntax or behavior except to the extent
that standard modules may fix bugs or allow additional cases that
formerly failed.
### The test suite is separate
A comprehensive test suite for `zstream` is introduced in PR openzfs#18510. The
original intent was to verify that this PR did not modify any `zstream`
behavior. However, as I found issues in the existing code, I added
tests for them to the test suite as well. So unfortunately, the current
version of `zstream` will fail some of the tests in PR openzfs#18510. I would
suggest deferring acceptance of PR openzfs#18510 until this PR has been
resolved.
Alternatively, I can add known-issue flags to those failing tests.
### Effect on checksum generation
This PR incorporates changes equivalent to those in draft PR openzfs#18293,
which I submitted but later changed to draft status because I knew
they'd have to be fixed in a different way for this PR.
Draft PR openzfs#18293 includes a detailed explanation, but the TL;DR is:
`zstream` and `zfs send` formerly diverged somewhat in their checksum
generation patterns. `zstream` added checksums in some cases where `zfs
send` did not. It didn't matter functionally because `zfs receive` does
not attempt to validate those checksums anyway. However, the result was
that even null operations (such as `zstream redup`'ing a stream that
was not deduplicated) changed the contents of the stream.
The Fletcher4 component of this PR emulates the behavior of `zfs send`
more closely so that any no-op invocation of `zstream` leaves the
stream bit-for-bit identical to the input.
### Divisibility
This is a fairly large PR, and I'm presenting it as one big splat. But
if preferred, I can port it into smaller pieces: e.g., add a general
pipeline mechanism, add Fletcher4 pipeline modules, port zstream
decompress to the pipeline system, etc.
Signed-off-by: Garth Snyder <garth@garthsnyder.com>
14 tasks
GarthSnyder
added a commit
to GarthSnyder/zfs
that referenced
this pull request
May 8, 2026
### Motivation
In the current version of `zstream`, each subcommand is independent and
is responsible for implementing its own stream-processing pipeline. It
started as a stream dumper, but as additional subcommands were added,
contributors typically copied an existing subcommand's pipeline and
adapted it for different purposes.
This pattern has led to quite a bit of duplicated code and has also led
to some functional nonuniformities. For example, some subcommands
support opposite-endian streams and others don't.
### Overview
This PR segregates functions that most subcommands need into
free-standing modules and reimplements the existing subcommands in
terms of those modules. The current modules are:
* I/O
* Checksum validation and generation
* Byte-order detection and byteswapping
* Data compression
* Data decompression
* General validity checking
This PR also adds a generic pipeline mechanism that subcommands can use
to declare the processing they want. For example, the pipeline for
`zstream recompress` is:
```
zstream_chain_t recompress_chain = {
serial_read_stream(infile),
serial_validate_fletcher4(),
serial_byteswap(BS_INCOMING),
serial_validate_records(),
serial_decompress_writes(&spec),
serial_compress_writes(&spec),
serial_byteswap(BS_OUTGOING),
serial_add_fletcher4(),
serial_write_stream(outfile),
chain_terminator()
};
```
Or more succinctly:
```
zstream_chain_t recompress_chain = {
STANDARD_INPUT_STACK(infile),
serial_decompress_writes(&spec),
serial_compress_writes(&spec),
STANDARD_OUTPUT_STACK(outfile)
};
```
To execute the pipeline:
```
chain_attrs_t attrs = { .ca_command_opts = CA_FORBID_DEDUP };
zstream_chain_exec(recompress_chain, &attrs);
```
### Explanation of `serial_` prefixes
The `serial_` prefixes above indicate that the steps run sequentially
and that each module sees records in their original order. A future PR
will also allow multithreaded execution for individual steps and will
handle the marshaling involved in integrating serial and parallel
operations.
### What this is not
This PR is not a general cleanup. Original code that is not subsumed by
one of the standardized modules is left largely unchanged, although
some portions have been adapted as modules to work within the pipeline
context.
The PR does not change command syntax or behavior except to the extent
that standard modules may fix bugs or allow additional cases that
formerly failed.
### The test suite is separate
A comprehensive test suite for `zstream` is introduced in PR openzfs#18510. The
original intent was to verify that this PR did not modify any `zstream`
behavior. However, as I found issues in the existing code, I added
tests for them to the test suite as well. So unfortunately, the current
version of `zstream` will fail some of the tests in PR openzfs#18510. I would
suggest deferring acceptance of PR openzfs#18510 until this PR has been
resolved.
Alternatively, I can add known-issue flags to those failing tests.
### Effect on checksum generation
This PR incorporates changes equivalent to those in draft PR openzfs#18293,
which I submitted but later changed to draft status because I knew
they'd have to be fixed in a different way for this PR.
Draft PR openzfs#18293 includes a detailed explanation, but the TL;DR is:
`zstream` and `zfs send` formerly diverged somewhat in their checksum
generation patterns. `zstream` added checksums in some cases where `zfs
send` did not. It didn't matter functionally because `zfs receive` does
not attempt to validate those checksums anyway. However, the result was
that even null operations (such as `zstream redup`'ing a stream that
was not deduplicated) changed the contents of the stream.
The Fletcher4 component of this PR emulates the behavior of `zfs send`
more closely so that any no-op invocation of `zstream` leaves the
stream bit-for-bit identical to the input.
### Divisibility
This is a fairly large PR, and I'm presenting it as one big splat. But
if preferred, I can port it into smaller pieces: e.g., add a general
pipeline mechanism, add Fletcher4 pipeline modules, port zstream
decompress to the pipeline system, etc.
Signed-off-by: Garth Snyder <garth@garthsnyder.com>
GarthSnyder
added a commit
to GarthSnyder/zfs
that referenced
this pull request
May 11, 2026
### Motivation
In the current version of `zstream`, each subcommand is independent and
is responsible for implementing its own stream-processing pipeline. It
started as a stream dumper, but as additional subcommands were added,
contributors typically copied an existing subcommand's pipeline and
adapted it for different purposes.
This pattern has led to quite a bit of duplicated code and has also led
to some functional nonuniformities. For example, some subcommands
support opposite-endian streams and others don't.
### Overview
This PR segregates functions that most subcommands need into
free-standing modules and reimplements the existing subcommands in
terms of those modules. The current modules are:
* I/O
* Checksum validation and generation
* Byte-order detection and byteswapping
* Data compression
* Data decompression
* General validity checking
This PR also adds a generic pipeline mechanism that subcommands can use
to declare the processing they want. For example, the pipeline for
`zstream recompress` is:
```
zstream_chain_t recompress_chain = {
serial_read_stream(infile),
serial_validate_fletcher4(),
serial_byteswap(BS_INCOMING),
serial_validate_records(),
serial_decompress_writes(&spec),
serial_compress_writes(&spec),
serial_byteswap(BS_OUTGOING),
serial_add_fletcher4(),
serial_write_stream(outfile),
chain_terminator()
};
```
Or more succinctly:
```
zstream_chain_t recompress_chain = {
STANDARD_INPUT_STACK(infile),
serial_decompress_writes(&spec),
serial_compress_writes(&spec),
STANDARD_OUTPUT_STACK(outfile)
};
```
To execute the pipeline:
```
chain_attrs_t attrs = { .ca_command_opts = CA_FORBID_DEDUP };
zstream_chain_exec(recompress_chain, &attrs);
```
### Explanation of `serial_` prefixes
The `serial_` prefixes above indicate that the steps run sequentially
and that each module sees records in their original order. A future PR
will also allow multithreaded execution for individual steps and will
handle the marshaling involved in integrating serial and parallel
operations.
### What this is not
This PR is not a general cleanup. Original code that is not subsumed by
one of the standardized modules is left largely unchanged, although
some portions have been adapted as modules to work within the pipeline
context.
The PR does not change command syntax or behavior except to the extent
that standard modules may fix bugs or allow additional cases that
formerly failed.
### The test suite is separate
A comprehensive test suite for `zstream` is introduced in PR openzfs#18510. The
original intent was to verify that this PR did not modify any `zstream`
behavior. However, as I found issues in the existing code, I added
tests for them to the test suite as well. So unfortunately, the current
version of `zstream` will fail some of the tests in PR openzfs#18510. I would
suggest deferring acceptance of PR openzfs#18510 until this PR has been
resolved.
Alternatively, I can add known-issue flags to those failing tests.
### Effect on checksum generation
This PR incorporates changes equivalent to those in draft PR openzfs#18293,
which I submitted but later changed to draft status because I knew
they'd have to be fixed in a different way for this PR.
Draft PR openzfs#18293 includes a detailed explanation, but the TL;DR is:
`zstream` and `zfs send` formerly diverged somewhat in their checksum
generation patterns. `zstream` added checksums in some cases where `zfs
send` did not. It didn't matter functionally because `zfs receive` does
not attempt to validate those checksums anyway. However, the result was
that even null operations (such as `zstream redup`'ing a stream that
was not deduplicated) changed the contents of the stream.
The Fletcher4 component of this PR emulates the behavior of `zfs send`
more closely so that any no-op invocation of `zstream` leaves the
stream bit-for-bit identical to the input.
### Divisibility
This is a fairly large PR, and I'm presenting it as one big splat. But
if preferred, I can port it into smaller pieces: e.g., add a general
pipeline mechanism, add Fletcher4 pipeline modules, port zstream
decompress to the pipeline system, etc.
Signed-off-by: Garth Snyder <garth@garthsnyder.com>
This PR defines a `zstream` test category and adds tests to exercise all features of the `zstream` command. It was originally intended to guarantee that PR openzfs#18509 did not disrupt any user-facing behavior, but during the development of that latter PR, some bugs were fixed and tests were added for them. As a result, some of the tests in this PR will fail when run against the current version of `zstream`. I would suggest not integrating this PR in advance of PR openzfs#18509, but if it's preferred to have the tests in first, I can add known-issue flags for the failing tests. Or alternatively, this PR can be merged into PR openzfs#18509. Signed-off-by: Garth Snyder <garth@garthsnyder.com>
6880535 to
5930a19
Compare
Contributor
Author
|
Closing - I have merged this test suit into #18509. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR defines a
zstreamtest category and adds tests to exercise all features of thezstreamcommand. It was originally intended to guarantee that PR #18509 did not disrupt any user-facing behavior, but during the development of that latter PR, some bugs were fixed and tests were added for them. As a result, some of the tests in this PR will fail when run against the current version ofzstream.I would suggest not integrating this PR in advance of PR #18509, but if it's preferred to have the tests in first, I can add known-issue flags for the failing tests. Or alternatively, this PR can be merged into PR #18509.
Motivation and Context
I've made these tests a separate PR both because they're fairly extensive and because they require a large number of statically-defined streams to exercise endianness handling.
Types of changes
None of these: tests only.
Checklist:
Signed-off-by.