zstream: refactor common functions#18509
Conversation
This PR defines a `zstream` test category and adds tests to exercise all features of the `zstream` command. It was originally intended to guarantee that PR openzfs#18509 did not disrupt any user-facing behavior, but during the development of that latter PR, some bugs were fixed and tests were added for them. As a result, some of the tests in this PR will fail when run against the current version of `zstream`. I would suggest not integrating this PR in advance of PR openzfs#18509, but if it's preferred to have the tests in first, I can add known-issue flags for the failing tests. Or alternatively, this PR can be merged into PR openzfs#18509. Signed-off-by: Garth Snyder <garth@garthsnyder.com>
371d0c9 to
fab2ca4
Compare
|
I'm a bit confused as to what checkstyle is actually complaining about here. "actual" is an input variable, so its value should never undefined. I suspect it's observing that actual is reassigned in the if (swap) case and not assigned otherwise, but both paths are valid. |
fab2ca4 to
24a9bb0
Compare
This PR defines a `zstream` test category and adds tests to exercise all features of the `zstream` command. It was originally intended to guarantee that PR openzfs#18509 did not disrupt any user-facing behavior, but during the development of that latter PR, some bugs were fixed and tests were added for them. As a result, some of the tests in this PR will fail when run against the current version of `zstream`. I would suggest not integrating this PR in advance of PR openzfs#18509, but if it's preferred to have the tests in first, I can add known-issue flags for the failing tests. Or alternatively, this PR can be merged into PR openzfs#18509. Signed-off-by: Garth Snyder <garth@garthsnyder.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors zstream subcommands to share a common, declarative stream-processing pipeline (“chain”) and extracts previously duplicated logic into reusable modules (I/O, checksum handling, byte order, record validation, and (de)compression). This aims to reduce duplication and improve behavioral uniformity across subcommands (e.g., endian handling and checksum behavior/idempotence).
Changes:
- Introduces a
zstream_chainexecution framework and shared “STANDARD_INPUT_STACK/OUTPUT_STACK” module macros. - Adds new pipeline modules for stream I/O, Fletcher4 validation/inscription, byteswapping, and record validation.
- Ports existing subcommands (dump, recompress, redup, drop_record, decompress) to the new pipeline model.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/zstream/zstream.c | Trims includes; keeps CLI dispatch/usage for refactored subcommands. |
| cmd/zstream/zstream_validate.h | Declares record-validation chain step. |
| cmd/zstream/zstream_validate.c | Adds record validation module (nesting checks, compression type validation, etc.). |
| cmd/zstream/zstream_util.h | Adds compression helpers/spec, checksum formatting/validation APIs. |
| cmd/zstream/zstream_util.c | Implements checksum helpers plus generic compress/decompress helpers. |
| cmd/zstream/zstream_token.c | Updates includes; token subcommand remains largely standalone. |
| cmd/zstream/zstream_redup.c | Ports redup to pipeline steps and shared modules. |
| cmd/zstream/zstream_recompress.h | Declares recompress chain steps for (de)compression. |
| cmd/zstream/zstream_recompress.c | Refactors recompress into pipeline steps and shared compression helpers. |
| cmd/zstream/zstream_modules.h | Aggregates module headers and defines standard input/output stack macros. |
| cmd/zstream/zstream_io.h | Defines drr_packet_t and declares I/O + checkpoint + null output steps. |
| cmd/zstream/zstream_io.c | Implements stream read/write/null-output/checkpoint steps and stream attribute detection. |
| cmd/zstream/zstream_fletcher4.h | Declares Fletcher4 validate/add steps for pipelines. |
| cmd/zstream/zstream_fletcher4.c | Implements Fletcher4 checksum validation and checksum inscription behavior. |
| cmd/zstream/zstream_dump.c | Ports dump to pipeline; adds nvlist encoding reporting and updated summary. |
| cmd/zstream/zstream_drop_record.c | Ports drop_record to pipeline with a record-dropping step. |
| cmd/zstream/zstream_decompress.c | Ports decompress to pipeline and shared decompression helper. |
| cmd/zstream/zstream_chain.h | Introduces chain types, attrs/options/stats, and execution API. |
| cmd/zstream/zstream_chain.c | Implements chain executor and centralized library init/fini. |
| cmd/zstream/zstream_byteswap.h | Declares byteswap step + byteswap_record helper. |
| cmd/zstream/zstream_byteswap.c | Implements byteswap module for incoming/outgoing stages. |
| cmd/zstream/Makefile.am | Adds new module sources/headers to the build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| static char * | ||
| stringify_encryption_fields(void *crypto_in) | ||
| { | ||
| char *buf = safe_malloc(SPA_MAXBLOCKSIZE); | ||
| uint64_t drr_record_count[DRR_NUMTYPES] = { 0 }; | ||
| uint64_t total_payload_size = 0; | ||
| uint64_t total_overhead_size = 0; | ||
| uint64_t drr_byte_count[DRR_NUMTYPES] = { 0 }; | ||
| crypto_fields_t *crypto = crypto_in; | ||
| char salt[ZIO_DATA_SALT_LEN * 2 + 1]; | ||
| char iv[ZIO_DATA_IV_LEN * 2 + 1]; | ||
| char mac[ZIO_DATA_MAC_LEN * 2 + 1]; | ||
| uint64_t total_records = 0; | ||
| uint64_t payload_size; | ||
| dmu_replay_record_t thedrr; | ||
| dmu_replay_record_t *drr = &thedrr; | ||
| struct drr_begin *drrb = &thedrr.drr_u.drr_begin; | ||
| struct drr_end *drre = &thedrr.drr_u.drr_end; | ||
| struct drr_object *drro = &thedrr.drr_u.drr_object; | ||
| struct drr_freeobjects *drrfo = &thedrr.drr_u.drr_freeobjects; | ||
| struct drr_write *drrw = &thedrr.drr_u.drr_write; | ||
| struct drr_write_byref *drrwbr = &thedrr.drr_u.drr_write_byref; | ||
| struct drr_free *drrf = &thedrr.drr_u.drr_free; | ||
| struct drr_spill *drrs = &thedrr.drr_u.drr_spill; | ||
| struct drr_write_embedded *drrwe = &thedrr.drr_u.drr_write_embedded; | ||
| struct drr_object_range *drror = &thedrr.drr_u.drr_object_range; | ||
| struct drr_redact *drrr = &thedrr.drr_u.drr_redact; | ||
| struct drr_checksum *drrc = &thedrr.drr_u.drr_checksum; | ||
| int c; | ||
| boolean_t verbose = B_FALSE; | ||
| boolean_t very_verbose = B_FALSE; | ||
| boolean_t first = B_TRUE; | ||
| /* | ||
| * dump flag controls whether the contents of any modified data blocks | ||
| * are printed to the console during processing of the stream. Warning: | ||
| * for large streams, this can obviously lead to massive prints. | ||
| */ | ||
| boolean_t dump = B_FALSE; | ||
| int err; | ||
| zio_cksum_t zc = { { 0 } }; | ||
| zio_cksum_t pcksum = { { 0 } }; | ||
| static char buff[sizeof (salt) + sizeof (iv) + sizeof (mac) + 32]; | ||
|
|
||
| sprintf_bytes(salt, crypto->drr_salt, ZIO_DATA_SALT_LEN); | ||
| sprintf_bytes(iv, crypto->drr_iv, ZIO_DATA_IV_LEN); | ||
| sprintf_bytes(mac, crypto->drr_mac, ZIO_DATA_MAC_LEN); | ||
| snprintf(buff, sizeof (buff), "salt = %s iv = %s mac = %s", | ||
| salt, iv, mac); | ||
| return (buff); | ||
| } |
There was a problem hiding this comment.
The current code is certainly a hack, although one that I would be surprised to see fail on any system or compiler supported by OpenZFS. And the diff above doesn't do it justice. We're talking about:
typedef struct {
uint8_t drr_salt[ZIO_DATA_SALT_LEN];
uint8_t drr_iv[ZIO_DATA_IV_LEN];
uint8_t drr_mac[ZIO_DATA_MAC_LEN];
} crypto_fields_t;
static char *
stringify_encryption_fields(void *crypto_in)
{
crypto_fields_t *crypto = crypto_in;
char salt[sizeof (crypto->drr_salt) * 2 + 1];
char iv[sizeof (crypto->drr_iv) * 2 + 1];
char mac[sizeof (crypto->drr_mac) * 2 + 1];
static char buff[sizeof (salt) + sizeof (iv) + sizeof (mac) + 32];
sprintf_bytes(salt, crypto->drr_salt, sizeof (crypto->drr_salt));
sprintf_bytes(iv, crypto->drr_iv, sizeof (crypto->drr_iv));
sprintf_bytes(mac, crypto->drr_mac, sizeof (crypto->drr_mac));
snprintf(buff, sizeof (buff), "salt = %s iv = %s mac = %s",
salt, iv, mac);
return (buff);
}
...
stringify_encryption_fields(&drrw->drr_salt);
The problem is not how to conveniently pass one value instead of three; it's how to pass three separate arrays, each of the same type but of a different, specific, and semi-permanently defined lengths.
The crypto_fields_t solution at least defines a single authoritative place (within zstream dump) where the lengths and ordering of the fields is defined. The broader ZFS code defines this triad separately every time it's used, and it appears in a bunch of different structs and contexts.
What's is the real alternative here? Pass pointers to the first element of each array and assume you can rely on ZIO_DATA_SALT_LEN et al for sizing? Pass complete arrays of defined sizes so that any mismatch at least generates a compiler warning? Nothing really stands out to me as a better solution.
I will defer to the maintainers on this one.
|
Incorporating the tests into this PR would be best. Adding new tests for functionality that is independent of this PR can be separate, but tests covering the changes here should be included here (that is the meaning of "I have added tests to cover my changes" I believe). I will be upstreaming a new zstream subcommand in the near future. Some consistency in the existing code was certainly lacking, so I appreciate the effort. Perhaps you will prefer to see my changes first to make sure your design can accommodate the new use case? |
It's kind of you to embrace this. I'm right on the knife edge between "this is going to simplify everything and everyone will love it!" vs. "OMG, it's just another API people will have to learn just to do basic stuff that they used to be able to bang out in half an hour." So I'm open to all comments, both positive and negative, and I'd be very interested to see your plans. I think you're right about tests. I reflexively avoided combining one large PR with an equally large PR for testing. But you're right, that doesn't actually make logical sense or facilitate anyone's review. As far as zstream API changes, I will help in whatever way I can. Is there a branch I can look at now? That would at least give me a sense of what you're shooting for. My near-term plans for zstream are:
I'd argue that that last item is a must-have for zstream just because of the fact that ZFS prides itself on its scalability. If you can generate zettabyte-scale dump streams, the tooling better be able to deal with streams of that scale correctly and efficiently. Currently, all hashing is memory-based and the failure mode is "sorry, hash table's full, can't do no more of that." The three line-items above are implemented, just not yet PR-ready. But if any of this is relevant to your plans, I will prioritize accordingly. |
24a9bb0 to
345e20c
Compare
|
I have merged the test suit from #18510 into this PR. |
345e20c to
6a3b427
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 54 out of 88 changed files in this pull request and generated 20 comments.
Comments suppressed due to low confidence (3)
cmd/zstream/zstream_io.c:246
ic_offsetis anoff_t, but this diagnostic prints it with%lu. That is not portable on ILP32 or platforms whereoff_tis signed/64-bit with a different underlying type; cast to a fixed type and use a matching format.
if (ferror(context->ic_fp)) {
err(1, "error reading record payload at "
" offset %lu", context->ic_offset);
cmd/zstream/zstream_io.c:255
ic_offsetis anoff_t, but the warning formats it with%lu. This can misprint or be undefined on platforms whereoff_tis notunsigned long; use a cast/format pair that matches the type.
warnx("input ends mid-record at offset %lu "
"- stream is likely corrupt",
context->ic_offset);
tests/zfs-tests/tests/functional/zstream/zstream_dump_004_neg.ksh:88
- The pass message still says the test expects exit code 95 only, even though the test accepts 45 on FreeBSD as well. Keep the message aligned with the actual portable expectation.
6a3b427 to
b1f0147
Compare
b1f0147 to
b8fc5cc
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 54 out of 88 changed files in this pull request and generated 34 comments.
Comments suppressed due to low confidence (12)
cmd/zstream/zstream_io.c:272
drr_typecomes directly from the input stream and is used as an index before any bounds check. A corrupt stream withdrr_type >= DRR_NUMTYPESwill write pastca_stats_in, so validate the type before updating stats (and before passing it downstream).
uint32_t drr_type = ATTR_IS_SET(chain_attrs, CA_BYTESWAPPED) ?
BSWAP_32(drr->drr_type) : drr->drr_type;
record_stats_t *stats = &chain_attrs->ca_stats_in[drr_type];
stats->rs_num_records++;
cmd/zstream/zstream_drop_record.c:77
- For
DRR_WRITE_EMBEDDED, this verbose message printsdrrw->drr_offset, but that union layout reads the embedded record's length rather than its offset. Use thedrrwefields whenrecord_typeisWRITE_EMBEDDEDso verbose output identifies the record actually being dropped.
warnx("dropping %s record for object %llu offset %llu",
record_type,
(u_longlong_t)drrw->drr_object,
(u_longlong_t)drrw->drr_offset);
cmd/zstream/scripts/add-xattrs.py:1
- The script hard-codes the interpreter to
/tmp/zstream-venv/bin/python3, so it fails unless the helper venv was created at exactly that path. Use a portable interpreter path or invoke it through the venv activation script so these generation helpers work outside one local setup.
#!/tmp/zstream-venv/bin/python3
tests/zfs-tests/tests/functional/zstream/zstream_dump_004_neg.ksh:88
- The pass message says only exit code 95 is expected, but the test intentionally accepts both 45 and 95. This should match the cross-platform expectation checked above.
cmd/zstream/zstream_io.c:255 ic_offsetis anoff_t, but this warning formats it with%lu. Use a format/cast that matchesoff_tso truncated-stream diagnostics are correct on ILP32 and other platforms whereoff_tis notunsigned long.
warnx("input ends mid-record at offset %lu "
"- stream is likely corrupt",
context->ic_offset);
cmd/zstream/zstream_io.c:247
ic_offsetis anoff_t, but this error formats it with%lu. Use a format/cast that matchesoff_tso diagnostics are portable and do not truncate offsets on 32-bit builds.
if (ferror(context->ic_fp)) {
err(1, "error reading record payload at "
" offset %lu", context->ic_offset);
} else {
cmd/zstream/zstream_redup.c:147
err()appends the errno text itself, so the trailing:in this message produces a duplicated separator in the final diagnostic. Use a message without the final colon/space.
if (fread(drr, sizeof (*drr), 1, context->rc_fp) != 1) {
err(1, "read of prior write failed: ");
}
cmd/zstream/zstream_redup.c:163
err()appends the errno text itself, so the trailing:in this message produces a duplicated separator in the final diagnostic. Use a message without the final colon/space.
size_t n_read = fread(item->dp_payload, item->dp_payload_size,
1, context->rc_fp);
if (n_read != 1)
err(1, "read of prior payload failed: ");
cmd/zstream/scripts/make-dump-files.py:73
- The help text says these are compressed streams, but
run_dump()feeds the file directly tozstream dumpand does not decompress.bz2/.gzinputs. Either update the wording to require raw send streams or add decompression based on the suffix.
parser.add_argument(
"streams", nargs="+", type=Path, help="Compressed streams to process"
)
cmd/zstream/scripts/gen-lorem-files.py:85
- The help text for
--min-sizesays the default is 2048, but the parser default is 16384. This makes generated stream contents less reproducible for anyone following the helper's CLI help.
parser.add_argument("--min-size", type=int, default=16384,
help="Minimum file size in bytes (default: 2048)")
cmd/zstream/zstream_io.c:240
- The BEGIN payload size limit is enforced later in
serial_validate_records(), but the buffer is allocated here first. A malformed BEGIN record with a very largedrr_payloadlencan force a large allocation before the validation step rejects it; apply the size cap beforesafe_malloc().
item->dp_payload_size = calc_payload_size(&item->dp_drr);
if (item->dp_payload_size > 0) {
item->dp_payload = safe_malloc(item->dp_payload_size);
cmd/zstream/zstream_io.c:241
calc_payload_size()returnssize_tfrom 64-bit record fields, but it is stored in the 32-bitdp_payload_sizewithout checking for overflow. A malformed WRITE/SPILL size aboveUINT32_MAXwill be truncated, causing the parser to read the wrong amount of data and desynchronize from the stream.
item->dp_payload_size = calc_payload_size(&item->dp_drr);
if (item->dp_payload_size > 0) {
item->dp_payload = safe_malloc(item->dp_payload_size);
size_t n_read = fread(item->dp_payload, item->dp_payload_size,
|
Here is a draft of what is in the pipeline for me. In short, it is yet another copy-paste of zstream_dump.c with some changes to error handling, some minimal input validation, and hooking in to a few parts of the stream to write out to a disk image file or device. While my work is not very different from anything we already have, we'll want to make sure that your refactoring can fulfill the basic needs if not the exact behavior. I'm mostly thinking of accommodating the input validation behavior, error handling behavior, and general state management. I haven't been free to fully read through your proposal yet, but at a high level I resonate with the motivation. We are in the client testing phase of my project right now, so not quite ready to start upstreaming but I don't expect any significant changes or delays will be required. In the meantime, feel free to have a look. The extent of your refactoring will likely require a longer review period. It was discussed briefly on yesterday's leadership call, and we proposed landing my addition first as the smoothest route forward. |
b8fc5cc to
ed81f41
Compare
ed81f41 to
6bcc5fe
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 54 out of 88 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (2)
tests/zfs-tests/tests/functional/zstream/zstream_redup_001_pos.ksh:1
- These commands aren’t wrapped with
log_must, so failures (missing fixture, bzcat error, zstream error) can silently cascade into misleadingcmpmismatches or false positives depending on leftover files. Uselog_must(orlog_must eval ...) for both steps so the test fails at the actual point of failure.
tests/zfs-tests/tests/functional/zstream/zstream_dump_002_pos.ksh:1 - This pipeline doesn’t check exit status. In
ksh, withoutset -o pipefail, a failure inbzcatmay be ignored and the test will proceed with partial/empty output. Consider either enablingpipefailfor the test (or locally around this command) and usinglog_must, or avoiding a pipeline by decompressing to a temp file first and then runningzstream dump -vunderlog_must.
|
@ryan-moeller, this looks like it would be pretty straightforward to port. There's some framework in place for forbidding or requiring stream features that will probably need expansion, and the decompression module will need to expand to cover WRITE_EMBEDDED as well as WRITE. But for the most part, it looks like the structure is pretty much IO/byteswap/dump/decompress/do-back-end-stuff. The first four of those look like they can be replaced by standard modules, and the back-end stuff looks like it could function independently as a separate module. If you are dumping more information than the standard There shouldn't be anything that would interfere with you dropping in |
6bcc5fe to
7521623
Compare
The current zstream subcommands write end-of-record checksums for all DRR_END records. However, this behavior is inconsistent with the streams generated by zfs send; any DRR_END record generated by send_conclusion_record() in libzfs_sendrecv.c:2175 has the checksum set to zero. This PR adjusts zstream to mimic zfs send's behavior. Checksums that are currently being modified include the doubled END in a replication stream as well as some stream-internal END cases. zfs receive doesn't mind the current behavior because it seems to ignore these checksums anyway. The problem posed by the current behavior is that it breaks the idempotence of zstream commands. A zstream redup that's given no writes to redup should yield back the original input stream, and so for recompress. Currently, the output is not identical. That introduces noise that developers have to filter out to debug stream processing, and it prevents end-users from detecting changes with simple tools like hashes. This PR modifies dump_record() to replicate the behavior of zfs send. If a DRR_END record has a drr_toguid of zero and its incoming checksum is zero, then the checksum is left at zero. Subcommands were formerly erasing all end-record checksums after validation, with comments remarking that this "needs to be done". The "needs to be done" part seems to stem from an assertion in dump_record() that requires outgoing records to start with zero checksums. I suspect the erasing and assertion are holdovers from libzfs, where they probably do serve some purpose. However, nothing within zstream currently seems to rely on this behavior. Removing that code allows the original checksums to flow through to dump_record(), where they can be inspected to determine which END records previously had waived checksums. Without access to the original checksums, dump_record() would either need to maintain additional state about the stream or rely on the zero toguid as a marker. If either of those approaches is preferable, I will adjust. The drr_toguid derives from ds_guid, which appears to be randomly generated and not explicitly checked against zero. Is zero a potentially valid GUID? Signed-off-by: Garth Snyder <garth@garthsnyder.com>
7521623 to
0589a86
Compare
This description and the corresponding commit message have been updated to reflect the integration of the test suite from former PR #18510.
Motivation
In the current version of
zstream, each subcommand is independent and is responsible for implementing its own stream-processing pipeline. It started as a stream dumper, but as additional subcommands were added, contributors typically copied an existing subcommand's pipeline and adapted it for different purposes.This pattern has led to quite a bit of duplicated code and has also led to some functional nonuniformities. For example, some subcommands support opposite-endian streams and others don't.
Overview
This PR segregates functions that most subcommands need into free-standing modules and reimplements the existing subcommands in terms of those modules. The current modules are:
This PR also adds a generic pipeline mechanism that subcommands can use to declare the processing they want. For example, the pipeline for
zstream recompressis:Or more succinctly:
To execute the pipeline:
Explanation of
serial_prefixesThe
serial_prefixes above indicate that the steps run sequentially and that each module sees records in their original order. A future PR will also allow multithreaded execution for individual steps and will handle the marshaling involved in integrating serial and parallel operations.What this is not
This PR is not a general cleanup. Original code that is not subsumed by one of the standardized modules is left largely unchanged, although in some cases a few unavoidable modifications have been made to adapt the prior code to the pipeline context.
This PR does not change command syntax or behavior in any way except to the extent that standard modules may fix bugs or allow additional cases that formerly failed.
Comprehensive test suite
This PR defines a
zstreamtest category and adds tests to exercise all features of thezstreamcommand.My original intent was to submit the PR for this test suite first and independently of the modular refactoring, with an eye toward verifying that the later restructuring did not change existing behavior. However, I did find occasional issues in the existing code as I worked on the refactoring and added tests for these issues to the test suite. Ultimately, the tests ended up flagging enough issues in the existing code that they are no longer very useful as an "everything behaves the same" check, so I'm just including them here. They are still a comprehensive workout for all of
zstream's functions, however.Effect on checksum generation
This PR incorporates changes equivalent to those in draft PR #18293, which I submitted but later changed to draft status because I knew they'd have to be fixed in a different way for this PR.
Draft PR #18293 includes a detailed explanation, but the TL;DR is:
zstreamandzfs sendformerly diverged somewhat in their checksum generation patterns.zstreamadded checksums in some cases wherezfs senddid not. It didn't matter functionally becausezfs receivedoes not attempt to validate those checksums anyway. However, the result was that even null operations (such aszstream redup'ing a stream that was not deduplicated) changed the contents of the stream.The Fletcher4 component of this PR emulates the behavior of
zfs sendmore closely so that any no-op invocation ofzstreamleaves the stream bit-for-bit identical to the input.Divisibility
This is a fairly large PR, and I'm presenting it as one big splat. But if preferred, I can port it into smaller pieces: e.g., add a general pipeline mechanism, add Fletcher4 pipeline modules, port zstream decompress to the pipeline system, etc.
Types of changes
Breaking changes:
zstream.zstream dumpnow prints the encoding of any nvlist attached to a BEGIN record. This change is also in Consistently encode DRR_BEGIN packed nvlist payloads with NV_ENCODE_XDR #18372, but it looks a bit different here because the context has changed.zstream dumpnow includes DRR_OBJECT_RANGE and DRR_REDACT records in the end-of-dump summary. These were formerly omitted.Checklist:
Signed-off-by.