Skip to content

zstream: add comprehensive test suite#18510

Closed
GarthSnyder wants to merge 1 commit into
openzfs:masterfrom
GarthSnyder:pr-zstream-tests
Closed

zstream: add comprehensive test suite#18510
GarthSnyder wants to merge 1 commit into
openzfs:masterfrom
GarthSnyder:pr-zstream-tests

Conversation

@GarthSnyder
Copy link
Copy Markdown
Contributor

@GarthSnyder GarthSnyder commented May 8, 2026

This PR defines a zstream test category and adds tests to exercise all features of the zstream command. It was originally intended to guarantee that PR #18509 did not disrupt any user-facing behavior, but during the development of that latter PR, some bugs were fixed and tests were added for them. As a result, some of the tests in this PR will fail when run against the current version of zstream.

I would suggest not integrating this PR in advance of PR #18509, but if it's preferred to have the tests in first, I can add known-issue flags for the failing tests. Or alternatively, this PR can be merged into PR #18509.

Motivation and Context

I've made these tests a separate PR both because they're fairly extensive and because they require a large number of statically-defined streams to exercise endianness handling.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

None of these: tests only.

Checklist:

GarthSnyder added a commit to GarthSnyder/zfs that referenced this pull request May 8, 2026
### Motivation

In the current version of `zstream`, each subcommand is independent and
is responsible for implementing its own stream-processing pipeline. It
started as a stream dumper, but as additional subcommands were added,
contributors typically copied an existing subcommand's pipeline and
adapted it for different purposes.

This pattern has led to quite a bit of duplicated code and has also led
to some functional nonuniformities. For example, some subcommands
support opposite-endian streams and others don't.

### Overview

This PR segregates functions that most subcommands need into
free-standing modules and reimplements the existing subcommands in
terms of those modules. The current modules are:

* I/O
* Checksum validation and generation
* Byte-order detection and byteswapping
* Data compression
* Data decompression
* General validity checking

This PR also adds a generic pipeline mechanism that subcommands can use
to declare the processing they want. For example, the pipeline for
`zstream recompress` is:

```
	zstream_chain_t recompress_chain = {
		serial_read_stream(infile),
		serial_validate_fletcher4(),
		serial_byteswap(BS_INCOMING),
		serial_validate_records(),
		serial_decompress_writes(&spec),
		serial_compress_writes(&spec),
		serial_byteswap(BS_OUTGOING),
		serial_add_fletcher4(),
		serial_write_stream(outfile),
		chain_terminator()
	};
```

Or more succinctly:

```
	zstream_chain_t recompress_chain = {
		STANDARD_INPUT_STACK(infile),
		serial_decompress_writes(&spec),
		serial_compress_writes(&spec),
		STANDARD_OUTPUT_STACK(outfile)
	};
```

To execute the pipeline:

```
	chain_attrs_t attrs = { .ca_command_opts = CA_FORBID_DEDUP };
	zstream_chain_exec(recompress_chain, &attrs);
```

### Explanation of `serial_` prefixes

The `serial_` prefixes above indicate that the steps run sequentially
and that each module sees records in their original order. A future PR
will also allow multithreaded execution for individual steps and will
handle the marshaling involved in integrating serial and parallel
operations.

### What this is not

This PR is not a general cleanup. Original code that is not subsumed by
one of the standardized modules is left largely unchanged, although
some portions have been adapted as modules to work within the pipeline
context.

The PR does not change command syntax or behavior except to the extent
that standard modules may fix bugs or allow additional cases that
formerly failed.

### The test suite is separate

A comprehensive test suite for `zstream` is introduced in PR openzfs#18510. The
original intent was to verify that this PR did not modify any `zstream`
behavior. However, as I found issues in the existing code, I added
tests for them to the test suite as well. So unfortunately, the current
version of `zstream` will fail some of the tests in PR openzfs#18510. I would
suggest deferring acceptance of PR openzfs#18510 until this PR has been
resolved.

Alternatively, I can add known-issue flags to those failing tests.

### Effect on checksum generation

This PR incorporates changes equivalent to those in draft PR openzfs#18293,
which I submitted but later changed to draft status because I knew
they'd have to be fixed in a different way for this PR.

Draft PR openzfs#18293 includes a detailed explanation, but the TL;DR is:
`zstream` and `zfs send` formerly diverged somewhat in their checksum
generation patterns. `zstream` added checksums in some cases where `zfs
send` did not. It didn't matter functionally because `zfs receive` does
not attempt to validate those checksums anyway. However, the result was
that even null operations (such as `zstream redup`'ing a stream that
was not deduplicated) changed the contents of the stream.

The Fletcher4 component of this PR emulates the behavior of `zfs send`
more closely so that any no-op invocation of `zstream` leaves the
stream bit-for-bit identical to the input.

### Divisibility

This is a fairly large PR, and I'm presenting it as one big splat. But
if preferred, I can port it into smaller pieces: e.g., add a general
pipeline mechanism, add Fletcher4 pipeline modules, port zstream
decompress to the pipeline system, etc.

Signed-off-by: Garth Snyder <garth@garthsnyder.com>
@github-actions github-actions Bot added the Status: Work in Progress Not yet ready for general review label May 8, 2026
GarthSnyder added a commit to GarthSnyder/zfs that referenced this pull request May 8, 2026
### Motivation

In the current version of `zstream`, each subcommand is independent and
is responsible for implementing its own stream-processing pipeline. It
started as a stream dumper, but as additional subcommands were added,
contributors typically copied an existing subcommand's pipeline and
adapted it for different purposes.

This pattern has led to quite a bit of duplicated code and has also led
to some functional nonuniformities. For example, some subcommands
support opposite-endian streams and others don't.

### Overview

This PR segregates functions that most subcommands need into
free-standing modules and reimplements the existing subcommands in
terms of those modules. The current modules are:

* I/O
* Checksum validation and generation
* Byte-order detection and byteswapping
* Data compression
* Data decompression
* General validity checking

This PR also adds a generic pipeline mechanism that subcommands can use
to declare the processing they want. For example, the pipeline for
`zstream recompress` is:

```
	zstream_chain_t recompress_chain = {
		serial_read_stream(infile),
		serial_validate_fletcher4(),
		serial_byteswap(BS_INCOMING),
		serial_validate_records(),
		serial_decompress_writes(&spec),
		serial_compress_writes(&spec),
		serial_byteswap(BS_OUTGOING),
		serial_add_fletcher4(),
		serial_write_stream(outfile),
		chain_terminator()
	};
```

Or more succinctly:

```
	zstream_chain_t recompress_chain = {
		STANDARD_INPUT_STACK(infile),
		serial_decompress_writes(&spec),
		serial_compress_writes(&spec),
		STANDARD_OUTPUT_STACK(outfile)
	};
```

To execute the pipeline:

```
	chain_attrs_t attrs = { .ca_command_opts = CA_FORBID_DEDUP };
	zstream_chain_exec(recompress_chain, &attrs);
```

### Explanation of `serial_` prefixes

The `serial_` prefixes above indicate that the steps run sequentially
and that each module sees records in their original order. A future PR
will also allow multithreaded execution for individual steps and will
handle the marshaling involved in integrating serial and parallel
operations.

### What this is not

This PR is not a general cleanup. Original code that is not subsumed by
one of the standardized modules is left largely unchanged, although
some portions have been adapted as modules to work within the pipeline
context.

The PR does not change command syntax or behavior except to the extent
that standard modules may fix bugs or allow additional cases that
formerly failed.

### The test suite is separate

A comprehensive test suite for `zstream` is introduced in PR openzfs#18510. The
original intent was to verify that this PR did not modify any `zstream`
behavior. However, as I found issues in the existing code, I added
tests for them to the test suite as well. So unfortunately, the current
version of `zstream` will fail some of the tests in PR openzfs#18510. I would
suggest deferring acceptance of PR openzfs#18510 until this PR has been
resolved.

Alternatively, I can add known-issue flags to those failing tests.

### Effect on checksum generation

This PR incorporates changes equivalent to those in draft PR openzfs#18293,
which I submitted but later changed to draft status because I knew
they'd have to be fixed in a different way for this PR.

Draft PR openzfs#18293 includes a detailed explanation, but the TL;DR is:
`zstream` and `zfs send` formerly diverged somewhat in their checksum
generation patterns. `zstream` added checksums in some cases where `zfs
send` did not. It didn't matter functionally because `zfs receive` does
not attempt to validate those checksums anyway. However, the result was
that even null operations (such as `zstream redup`'ing a stream that
was not deduplicated) changed the contents of the stream.

The Fletcher4 component of this PR emulates the behavior of `zfs send`
more closely so that any no-op invocation of `zstream` leaves the
stream bit-for-bit identical to the input.

### Divisibility

This is a fairly large PR, and I'm presenting it as one big splat. But
if preferred, I can port it into smaller pieces: e.g., add a general
pipeline mechanism, add Fletcher4 pipeline modules, port zstream
decompress to the pipeline system, etc.

Signed-off-by: Garth Snyder <garth@garthsnyder.com>
GarthSnyder added a commit to GarthSnyder/zfs that referenced this pull request May 11, 2026
### Motivation

In the current version of `zstream`, each subcommand is independent and
is responsible for implementing its own stream-processing pipeline. It
started as a stream dumper, but as additional subcommands were added,
contributors typically copied an existing subcommand's pipeline and
adapted it for different purposes.

This pattern has led to quite a bit of duplicated code and has also led
to some functional nonuniformities. For example, some subcommands
support opposite-endian streams and others don't.

### Overview

This PR segregates functions that most subcommands need into
free-standing modules and reimplements the existing subcommands in
terms of those modules. The current modules are:

* I/O
* Checksum validation and generation
* Byte-order detection and byteswapping
* Data compression
* Data decompression
* General validity checking

This PR also adds a generic pipeline mechanism that subcommands can use
to declare the processing they want. For example, the pipeline for
`zstream recompress` is:

```
	zstream_chain_t recompress_chain = {
		serial_read_stream(infile),
		serial_validate_fletcher4(),
		serial_byteswap(BS_INCOMING),
		serial_validate_records(),
		serial_decompress_writes(&spec),
		serial_compress_writes(&spec),
		serial_byteswap(BS_OUTGOING),
		serial_add_fletcher4(),
		serial_write_stream(outfile),
		chain_terminator()
	};
```

Or more succinctly:

```
	zstream_chain_t recompress_chain = {
		STANDARD_INPUT_STACK(infile),
		serial_decompress_writes(&spec),
		serial_compress_writes(&spec),
		STANDARD_OUTPUT_STACK(outfile)
	};
```

To execute the pipeline:

```
	chain_attrs_t attrs = { .ca_command_opts = CA_FORBID_DEDUP };
	zstream_chain_exec(recompress_chain, &attrs);
```

### Explanation of `serial_` prefixes

The `serial_` prefixes above indicate that the steps run sequentially
and that each module sees records in their original order. A future PR
will also allow multithreaded execution for individual steps and will
handle the marshaling involved in integrating serial and parallel
operations.

### What this is not

This PR is not a general cleanup. Original code that is not subsumed by
one of the standardized modules is left largely unchanged, although
some portions have been adapted as modules to work within the pipeline
context.

The PR does not change command syntax or behavior except to the extent
that standard modules may fix bugs or allow additional cases that
formerly failed.

### The test suite is separate

A comprehensive test suite for `zstream` is introduced in PR openzfs#18510. The
original intent was to verify that this PR did not modify any `zstream`
behavior. However, as I found issues in the existing code, I added
tests for them to the test suite as well. So unfortunately, the current
version of `zstream` will fail some of the tests in PR openzfs#18510. I would
suggest deferring acceptance of PR openzfs#18510 until this PR has been
resolved.

Alternatively, I can add known-issue flags to those failing tests.

### Effect on checksum generation

This PR incorporates changes equivalent to those in draft PR openzfs#18293,
which I submitted but later changed to draft status because I knew
they'd have to be fixed in a different way for this PR.

Draft PR openzfs#18293 includes a detailed explanation, but the TL;DR is:
`zstream` and `zfs send` formerly diverged somewhat in their checksum
generation patterns. `zstream` added checksums in some cases where `zfs
send` did not. It didn't matter functionally because `zfs receive` does
not attempt to validate those checksums anyway. However, the result was
that even null operations (such as `zstream redup`'ing a stream that
was not deduplicated) changed the contents of the stream.

The Fletcher4 component of this PR emulates the behavior of `zfs send`
more closely so that any no-op invocation of `zstream` leaves the
stream bit-for-bit identical to the input.

### Divisibility

This is a fairly large PR, and I'm presenting it as one big splat. But
if preferred, I can port it into smaller pieces: e.g., add a general
pipeline mechanism, add Fletcher4 pipeline modules, port zstream
decompress to the pipeline system, etc.

Signed-off-by: Garth Snyder <garth@garthsnyder.com>
This PR defines a `zstream` test category and adds tests to exercise all
features of the `zstream` command. It was originally intended to
guarantee that PR openzfs#18509 did not disrupt any user-facing behavior, but
during the development of that latter PR, some bugs were fixed and
tests were added for them. As a result, some of the tests in this  PR
will fail when run against the current version of `zstream`.

I would suggest not integrating this PR in advance of PR openzfs#18509, but if
it's preferred to have the tests in first, I can add known-issue flags
for the failing tests. Or alternatively, this PR can be merged into
PR openzfs#18509.

Signed-off-by: Garth Snyder <garth@garthsnyder.com>
@GarthSnyder
Copy link
Copy Markdown
Contributor Author

Closing - I have merged this test suit into #18509.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Work in Progress Not yet ready for general review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant