
zed: Prefer dRAID distributed spares to regular ones#18578

Open
amotin wants to merge 1 commit into openzfs:master from amotin:spare_draid

Conversation

@amotin
Member

@amotin amotin commented May 22, 2026

One of the main dRAID features is avoiding single-drive bottlenecks by using distributed spares. Activating a regular spare takes more time, and during that time the vdev's redundancy is much lower than in the RAIDZ case. But regular spares might still be added to the pool as a second line of defense, possibly shared by several vdevs.
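The preference described above (distributed spares first, regular spares as a fallback) can be sketched as a comparator. This is a minimal, self-contained illustration, not the zed retire agent's actual code: `spare_candidate_t`, `spare_compare()` and `order_spares()` are hypothetical names, and the sketch assumes each spare has already been classified as distributed or regular.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified view of one entry in the pool's spare list. */
typedef struct spare_candidate {
	const char *name;	/* device path or dRAID spare name */
	bool is_distributed;	/* true for a dRAID distributed spare */
	size_t config_index;	/* position in the original spare list */
} spare_candidate_t;

/*
 * Order distributed spares before regular ones.  qsort() is not stable,
 * so ties are broken by the original position to keep the configured
 * order within each group.
 */
static int
spare_compare(const void *a, const void *b)
{
	const spare_candidate_t *sa = a;
	const spare_candidate_t *sb = b;

	if (sa->is_distributed != sb->is_distributed)
		return (sa->is_distributed ? -1 : 1);
	if (sa->config_index != sb->config_index)
		return (sa->config_index < sb->config_index ? -1 : 1);
	return (0);
}

/* Sort the candidate list in place so preferred spares come first. */
void
order_spares(spare_candidate_t *spares, size_t count)
{
	qsort(spares, count, sizeof (spares[0]), spare_compare);
}
```

For example, given the spare list `/dev/sdx`, `draid2-0-0`, `/dev/sdy`, `draid2-0-1`, `order_spares()` puts both distributed spares first, with each group kept in its original order. Regular spares stay in the list, so they still act as the second line of defense once the distributed spares are exhausted.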

The PR is built on top of #18597.

How Has This Been Tested?

It wasn't. CI should test it.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin amotin requested a review from behlendorf May 22, 2026 20:55
@amotin
Member Author

amotin commented May 22, 2026

@andriytk What do you think about this?

@amotin amotin added the Status: Code Review Needed Ready for review and testing label May 22, 2026
@amotin amotin force-pushed the spare_draid branch 2 times, most recently from 84e00fb to eb797ac May 23, 2026 03:19
@andriytk
Contributor

We do have some old pools with mixed spares, dRAID and regular ones, and yes, the fact that regular spares are selected before dRAID ones is a bit annoying. So the change makes perfect sense, but it would also be nice if users could set the priority of regular spare devices.

Having said that, some admins may prefer to use regular spares first because they leave them in the pool as replacement devices, so it may be wise to add some sort of configuration option for this so as not to break the existing behaviour.

@amotin amotin marked this pull request as draft May 27, 2026 18:05
@github-actions github-actions Bot added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels May 27, 2026
@amotin amotin marked this pull request as ready for review May 28, 2026 15:40
@github-actions github-actions Bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels May 28, 2026
@amotin
Member Author

amotin commented May 28, 2026

Rebased on top of new #18597. The diff is smaller, but the idea is the same.

Contributor

@behlendorf behlendorf left a comment


One tiny optimization would be to drop from the list (or sort last) any distributed spares that can't be used because they're part of dRAID 1 while the failed vdev is part of dRAID 2. A common configuration for our systems is to have two top-level dRAID vdevs in the pool. Not critical, of course, but it is a factor when selecting a preferred spare.

With #18597 merged this is ready for a rebase.

One of the main dRAID features is avoiding single-drive bottlenecks
by using distributed spares.  Activation of a regular spare takes
more time, during which the dRAID redundancy is even lower than in
the RAIDZ case.  But regular spares might still be added to the pool
as a second line of defence, possibly shared by several vdevs.

Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
@amotin
Member Author

amotin commented May 29, 2026

Rebased and added the top-level vdev GUID matching.
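The top-level vdev GUID matching can be sketched roughly as follows, assuming each spare's kind and, for distributed spares, the GUID of the owning dRAID top-level vdev are already known. `spare_info_t`, `spare_rank()` and `pick_spare()` are hypothetical names. The real retire agent may try several candidates in turn until one attaches, whereas this sketch only returns the single best index. Distributed spares belonging to a different dRAID vdev are dropped rather than sorted last, since they can't be used for the failed vdev.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified spare descriptor. */
typedef struct spare_info {
	const char *name;
	bool is_distributed;	/* dRAID distributed spare? */
	uint64_t top_guid;	/* GUID of the owning dRAID top-level vdev; */
				/* ignored for regular spares */
} spare_info_t;

/* Lower rank is tried first; SPARE_UNUSABLE means skip it entirely. */
enum { SPARE_SAME_TOP = 0, SPARE_REGULAR = 1, SPARE_UNUSABLE = 2 };

static int
spare_rank(const spare_info_t *sp, uint64_t failed_top_guid)
{
	if (!sp->is_distributed)
		return (SPARE_REGULAR);
	/* A distributed spare only serves the dRAID vdev it belongs to. */
	return (sp->top_guid == failed_top_guid ?
	    SPARE_SAME_TOP : SPARE_UNUSABLE);
}

/*
 * Return the index of the best spare for a failed device whose top-level
 * vdev has GUID failed_top_guid, or -1 if none is usable.  Among equal
 * ranks, the first one in configuration order wins.
 */
int
pick_spare(const spare_info_t *spares, size_t count, uint64_t failed_top_guid)
{
	int best = -1;
	int best_rank = SPARE_UNUSABLE;

	for (size_t i = 0; i < count; i++) {
		int rank = spare_rank(&spares[i], failed_top_guid);
		if (rank < best_rank) {
			best = (int)i;
			best_rank = rank;
		}
	}
	return (best);
}
```

Returning -1 when only foreign distributed spares remain lets the caller simply skip replacement instead of attempting an attach that cannot succeed.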

@amotin amotin linked an issue May 29, 2026 that may be closed by this pull request
Contributor

@behlendorf behlendorf left a comment


Using the top-guid is a nice way to handle this.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 30, 2026

Labels

Status: Accepted Ready to integrate (reviewed, tested)


Development

Successfully merging this pull request may close these issues.

draid should use its integrated spares before discrete ones

3 participants