Consistently encode DRR_BEGIN packed nvlist payloads with NV_ENCODE_XDR #18372

Merged: behlendorf merged 1 commit into openzfs:master from GarthSnyder:pr-xdr-nvlists on May 12, 2026
@GarthSnyder GarthSnyder commented Mar 25, 2026

This is a fix for #18360.

Currently, zfs send generates a mix of nvlist encodings in DRR_BEGIN records, some XDR and some in native byte order. The result is that many streams currently can't be zfs received on opposite-endian systems.

zfs send generates the outer wrappers for compound streams in userspace, and it explicitly requests NV_ENCODE_XDR format for those records. But the BEGIN records for individual datasets are generated on the kernel side, in dmu_send.c, where fnvlist_pack() is used for encoding. That routine hard-wires NV_ENCODE_NATIVE format.

This PR replaces the fnvlist_pack() call with a direct call to nvlist_pack() that specifies NV_ENCODE_XDR.

Motivation and Context

Currently, cross-endian zfs receives can fail because there is no facility within ZFS for byteswapping packed nvlists after the fact. When a DRR_BEGIN record with a cross-endian NV_ENCODE_NATIVE nvlist is received, the kernel rejects it with ENOTSUP, aborting the whole receive.
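
To illustrate why a native-order payload is not portable while an XDR payload is, here is a minimal, self-contained sketch. It is not ZFS code, and all four helper names are hypothetical. It only assumes the standard XDR convention that integers are stored most significant byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v as the host keeps it in memory; byte order depends on the CPU. */
void put_u32_native(uint8_t out[4], uint32_t v)
{
	memcpy(out, &v, sizeof (v));
}

/* Store v in XDR order: most significant byte first, on every host. */
void put_u32_xdr(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Decode XDR order with shifts, so the host's byte order never matters. */
uint32_t get_u32_xdr(const uint8_t in[4])
{
	assert(in != NULL);
	return (((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	    ((uint32_t)in[2] << 8) | (uint32_t)in[3]);
}

/* Reinterpret four raw bytes as the reading host would, with no swapping. */
uint32_t get_u32_native(const uint8_t in[4])
{
	uint32_t v;

	assert(in != NULL);
	memcpy(&v, in, sizeof (v));
	return (v);
}
```

Bytes written by put_u32_xdr() decode to the same value on any host. Bytes written by put_u32_native() on one host only round-trip through get_u32_native() on a host with the same byte order, which is the situation a cross-endian receiver is in when no byteswapping facility exists.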

This PR is a step toward making any valid send stream readable and importable on any ZFS system. There are no doubt other stream encoding issues yet to be resolved, but in my limited testing, many opposite-endian-generated streams seem to be received just fine with this patch in place.

How Has This Been Tested?

This change likely affects the majority of nontrivial send streams, so the existing test suite is already a fairly comprehensive vetting, at least as far as same-endian functionality goes.

I have also built with this change on a big-endian system and generated several send streams that formerly were unimportable on little-endian systems. They now import fine.

Of note, this change requires no receiving-end support. All-XDR streams are already supported by the existing nvlist_unpack() infrastructure. There does not appear to be any stream-related code that does anything with packed nvlists other than passing them along to nvlist_unpack().

I will include cross-endian stream testing as part of a separate testing revamp for zstream. There are some interdependencies, so it would be helpful to have this PR in master before that PR is submitted.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

@github-actions github-actions Bot added the Status: Work in Progress Not yet ready for general review label Mar 25, 2026
@GarthSnyder GarthSnyder marked this pull request as ready for review March 28, 2026 18:55
Copilot AI review requested due to automatic review settings March 28, 2026 18:55
@github-actions github-actions Bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Mar 28, 2026
Copilot AI left a comment

Pull request overview

This PR fixes cross-endian zfs send | zfs recv incompatibility by ensuring packed nvlist payloads in kernel-generated DRR_BEGIN records are encoded using XDR, consistent with userspace-generated compound stream wrapper records.

Changes:

  • Replace fnvlist_pack() (native-endian) with an explicit nvlist_pack(..., NV_ENCODE_XDR, ...) for DRR_BEGIN payload encoding in dmu_send_impl().
  • Preserve existing behavior of including optional BEGIN nvlist fields (redaction/resume/crypto metadata), but now with a consistent on-wire encoding.

Comment thread module/zfs/dmu_send.c Outdated
Comment thread module/zfs/dmu_send.c
@GarthSnyder GarthSnyder marked this pull request as draft March 29, 2026 00:13
@github-actions github-actions Bot added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Mar 29, 2026
GarthSnyder commented Mar 29, 2026

I think Copilot's comment regarding payloads not being rounded up to an 8-byte boundary is correct. Internally, NV_ENCODE_NATIVE seems to work with 8-byte rounding, but NV_ENCODE_XDR uses 4-byte rounding.

In theory, send_prelim_records() in libzfs_sendrecv.c should raise this same issue, as it does not appear to do any rounding. However, that file has its own private implementation of dump_record() that does not double-check payload sizes for 8-byte granularity. The receiving side is also special-cased and does not run through the usual assertions. I have verified that zfs send does in fact generate non-rounded DRR_BEGIN payloads in some cases.

This PR does the stupidest possible thing: if a payload needs to be rounded up, a second, slightly longer buffer is allocated and the payload is copied into it, with the extra bytes being zeroed. The alternative would be to define a customized nv_alloc_t that does rounding on allocation, thus sparing the copy. That's how I did it at first, but it's inelegant. It is, in effect, a monkey patch, and requires a separate allocator function. It's also harder to make it compile in userspace since some of the normal nv_alloc_t infrastructure isn't available there.

I suggest staying with the copy. BEGIN record payloads are typically small and infrequent, there's at most one per dataset, and the allocation is only active momentarily. But that other version exists and I'd be happy to sub it in if that's preferable.
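
For readers following along, the copy-and-pad approach described above can be sketched roughly as follows. This is a userspace illustration under stated assumptions, not the code in dmu_send.c: round_up8() and pad_payload_copy() are hypothetical names, and calloc() stands in for whatever allocator the kernel path actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round len up to the next multiple of 8 (unchanged if already aligned). */
size_t round_up8(size_t len)
{
	return ((len + 7) & ~(size_t)7);
}

/*
 * Return a newly allocated copy of buf whose length is a multiple of 8,
 * with any extra tail bytes set to zero. The padded length is stored in
 * *padded_len. The caller frees the result. Returns NULL if the
 * allocation fails.
 */
void *pad_payload_copy(const void *buf, size_t len, size_t *padded_len)
{
	assert(buf != NULL && padded_len != NULL);

	size_t plen = round_up8(len);
	/* calloc zeroes the buffer; request at least 1 byte so len 0 works. */
	unsigned char *out = calloc(plen ? plen : 1, 1);

	if (out == NULL)
		return (NULL);
	memcpy(out, buf, len);
	*padded_len = plen;
	return (out);
}
```

With 4-byte XDR rounding, a packed size such as 12 bytes is possible; this sketch would emit it as a 16-byte payload whose last four bytes are zero.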

@GarthSnyder GarthSnyder force-pushed the pr-xdr-nvlists branch 3 times, most recently from 161e7a4 to 774b697 Compare April 4, 2026 23:22
@GarthSnyder GarthSnyder marked this pull request as ready for review April 5, 2026 19:37
Copilot AI review requested due to automatic review settings April 5, 2026 19:37
@github-actions github-actions Bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Apr 5, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


Comment thread module/zfs/dmu_send.c Outdated
Comment thread module/zfs/dmu_send.c Outdated
Comment thread module/zfs/dmu_send.c Outdated
Comment thread module/zfs/dmu_send.c
Copilot AI review requested due to automatic review settings April 6, 2026 21:37
Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


Comment thread module/zfs/dmu_send.c
Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.


Comment thread cmd/zstream/zstream_dump.c
@GarthSnyder GarthSnyder marked this pull request as draft May 5, 2026 01:34
@github-actions github-actions Bot added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels May 5, 2026
@GarthSnyder GarthSnyder force-pushed the pr-xdr-nvlists branch 7 times, most recently from 3087596 to 790b251 Compare May 7, 2026 23:43
@GarthSnyder

OK, tests are in and this should be ready for review. The test plan was to:

  • Identify all paths that can attach nvlists to BEGIN records
  • Exercise each of those paths and verify that the attached nvlists are always emitted with XDR encoding
  • Verify that all test streams can be received without errors

Effect of XDR encoding on older versions of zfs receive

Packed nvlists declare their encoding, and nvlist_unpack() obeys the declared encoding. Ergo, this patch should not require any receive-side support. Successful reception of the test streams on a current build should also imply success on older versions of ZFS.
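
As a rough illustration of "declares its encoding," the sketch below inspects the first bytes of a packed buffer and decides whether the current host could decode it. It rests on assumptions that should be checked against nvpair.c: that the packed header begins with an encoding byte followed by an endianness byte, that the values 0 and 1 correspond to NV_ENCODE_NATIVE and NV_ENCODE_XDR, and that an endianness byte of 1 means little-endian. The EX_ constants and both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror NV_ENCODE_NATIVE and NV_ENCODE_XDR in sys/nvpair.h. */
enum { EX_ENCODE_NATIVE = 0, EX_ENCODE_XDR = 1 };

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return (*(const uint8_t *)&probe == 1);
}

/*
 * Returns 0 if a packed nvlist with this header looks decodable on the
 * current host, ENOTSUP if it does not, and EINVAL if the buffer is too
 * short to hold the assumed 4-byte header.
 */
int packed_nvlist_decodable(const uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	if (len < 4)
		return (EINVAL);

	uint8_t encoding = buf[0];	/* assumed: encoding byte */
	uint8_t endian = buf[1];	/* assumed: 1 = little-endian writer */

	switch (encoding) {
	case EX_ENCODE_XDR:
		/* XDR is byte-order neutral, so any host can decode it. */
		return (0);
	case EX_ENCODE_NATIVE:
		/* Native data is only usable if writer and reader match. */
		return (endian == host_is_little_endian() ? 0 : ENOTSUP);
	default:
		return (ENOTSUP);
	}
}
```

Under these assumptions, an XDR header is accepted regardless of the endianness byte, while a native header written on the opposite-endian host yields ENOTSUP, matching the failure described in the PR summary.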

BEGIN records for replication streams

BEGIN records emitted as wrappers for composite (replication) streams are generated in userspace from libzfs_sendrecv.c. Those records already use XDR encoding, so this patch isn't directly relevant to them. However, the tests do include a check for this outer wrapper just to catch any future regressions.

Scenarios tested

There are three remaining cases in which BEGIN records acquire attached nvlists:

  • In various send operations that involve redaction or redaction bookmarks
  • In raw sends of encrypted datasets
  • In sends generated from a resume token

There are four code paths within the kernel that generate redaction-related nvlist data. On top of that, the three cases above can appear in various combinations. For example, the most complex case is a token-resumed, raw-and-encrypted, incremental stream generated relative to a redaction bookmark. That case includes all three sub-nvlists and is in fact a valid type of stream.

If multiple nvlists are to be attached to a BEGIN record, they are merged within dmu_send_impl() in dmu_send.c. The composite nvlists then travel through the same nvlist_pack() call patched by this PR, so all possible combinations should be covered by this same patch. The tests demonstrate that this is in fact the case and that there are no auxiliary paths that might leak native-encoded nvlist data. They also verify that XDR encoding does not interfere with receivability.

What "verify that streams are receivable" means

Of note: the "receives OK" determinations only verify that zfs receive completes without errors. They do not do end-to-end validation such as creating hash trees of source and destination data and verifying that the trees are identical. The tests could be expanded to be more aggressive in this regard, but redaction complicates things because some received datasets exist only as references and are not mountable as filesystems.

I don't think it's worth adding such test enhancements here, although I will do so if requested. Since the tests prove that BEGIN records will always have XDR-encoded nvlists, any functional problems that this entails should be surfaced by the existing test batteries, which do perform these more detailed validations.

Ancillary change to zstream dump

I have added a small mod to zstream dump that prints out the exact encoding used by any BEGIN record with an attached nvlist. The tests use this for validation. Each test looks for at least one instance of "NV_ENCODE_XDR" and verifies that there are no instances of "NV_ENCODE_NATIVE".
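
The pass/fail rule the tests apply to the dump output can be expressed as a small predicate. This is only a sketch of the rule as stated above, written in C for consistency with the rest of the change; the function name is hypothetical, and the exact format of the zstream dump line is not assumed beyond the two quoted strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if the captured dump text mentions NV_ENCODE_XDR at least
 * once and never mentions NV_ENCODE_NATIVE.
 */
bool dump_is_all_xdr(const char *dump_text)
{
	assert(dump_text != NULL);

	bool saw_xdr = strstr(dump_text, "NV_ENCODE_XDR") != NULL;
	bool saw_native = strstr(dump_text, "NV_ENCODE_NATIVE") != NULL;

	return (saw_xdr && !saw_native);
}
```

Requiring at least one XDR match matters: a stream whose BEGIN records carry no nvlist at all would otherwise pass trivially without exercising the patched path.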

Interactions with issue #18491

Two of the original redaction-related tests revealed the existence of bug #18491. The operations that triggered the bug were not necessary for the actual XDR tests, so each of those two tests has been split in two: a base version that tests for XDR encoding but doesn't trigger the bug and a _with_write version that has a chance to demonstrate #18491 (it reproduces frequently but not 100% of the time). The latter two are marked as known issues. Once more is known about #18491, those tests can be moved to a more specific test category.

@GarthSnyder GarthSnyder marked this pull request as ready for review May 8, 2026 06:08
Copilot AI review requested due to automatic review settings May 8, 2026 06:08
Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.

Comment thread cmd/zstream/zstream_dump.c
Comment thread tests/runfiles/common.run
@github-actions github-actions Bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels May 8, 2026
This is a fix for openzfs#18360.

Currently, zfs send generates a mix of nvlist encodings in DRR_BEGIN
records, some XDR and some in native byte order. The result is that
most streams currently can't be zfs received on opposite-endian systems.

zfs send generates the outer wrappers for compound streams in userspace,
and it explicitly requests NV_ENCODE_XDR format for those records. But
the BEGIN records for individual datasets are generated on the kernel
side, in dmu_send.c, where fnvlist_pack() is used for encoding. That
routine hard-wires NV_ENCODE_NATIVE format.

This PR replaces the fnvlist_pack() call with a direct call to
nvlist_pack() that specifies NV_ENCODE_XDR.

Tests are included to verify that native-encoded nvlists are not
generated by any kernel path that attaches nvlists to BEGIN records.
There's also a check for XDR encoding in the outer wrapper of
replication streams in case there is ever a regression there.

There are also two tests that have a chance of triggering (and
detecting) bug openzfs#18491. Non-triggering versions of those tests are
already included here, so when that bug is more fully characterized,
the tests can be moved to a more directly relevant category. (They
are the two tests with _with_write suffixes.)

This PR adds to zstream dump an output line that shows the exact
encoding of any nvlists in BEGIN records. This feature is used by
the tests to validate streams.

Signed-off-by: Garth Snyder <garth@garthsnyder.com>

@behlendorf behlendorf left a comment

Looks good. It's great to see the comprehensive test coverage along with this fix. The subtle #18491 failures you hit are interesting, but yeah, they can be investigated independently. No need to hold up this PR.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 11, 2026
@behlendorf behlendorf merged commit eaaea55 into openzfs:master May 12, 2026
43 of 45 checks passed