zed: Prefer dRAID distributed spares to regular ones #18578
@andriytk What do you think about this?
84e00fb to eb797ac
We do have some old pools with mixed spares, both dRAID and regular, and yes, it is a bit annoying that the regular ones are selected before the dRAID ones. So the change makes perfect sense, but it would also be nice if the user could set the priority of the regular spare devices. That said, some admins may prefer to use regular spares first because they leave them in the pool as replacement devices, so it may be wise to add a configuration option for this so that the existing behaviour is not broken.
Rebased on top of new #18597. The diff is smaller, but the idea is the same.
behlendorf left a comment
One tiny optimization would be to drop (or sort after) any distributed spares from the list which can't be used because they're part of dRAID 1 while the failed vdev is part of dRAID 2. A common configuration for our systems is to have two top-level dRAID vdevs in the pool. Not critical of course but it is a factor when selecting a preferred spare.
With #18597 merged this is ready for a rebase.
One of the main dRAID features is avoiding single drive bottlenecks by using distributed spares. Activation of a regular spare will take more time, during which the dRAID redundancy is even lower than in the case of RAIDZ. But regular spares might still be added to the pool as a second line of defence, possibly shared by several vdevs.
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Rebased and added the top-level vdev GUID matching.
behlendorf left a comment
Using the top-guid is a nice way to handle this.
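For readers following the thread, the ordering discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `spare_t` struct, its fields, and `order_spares()` are hypothetical names invented here, and the real ZED agent works on nvlists rather than a flat array. The sketch assumes a distributed spare is only usable when its top-level vdev GUID matches that of the failed device, drops the ones that do not match, and places the remaining distributed spares ahead of regular ones.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified view of a spare; not the real ZED/nvlist layout. */
typedef struct spare {
	const char *name;	/* illustrative names, e.g. "draid1-0-0" or "sdx" */
	int is_draid;		/* 1 for a distributed spare, 0 for a regular one */
	uint64_t top_guid;	/* owning top-level vdev GUID (distributed only) */
	size_t order;		/* position after filtering, keeps the sort stable */
} spare_t;

/* Distributed spares sort first; ties keep their original relative order. */
static int
spare_cmp(const void *a, const void *b)
{
	const spare_t *sa = a, *sb = b;

	if (sa->is_draid != sb->is_draid)
		return (sb->is_draid - sa->is_draid);
	return ((sa->order > sb->order) - (sa->order < sb->order));
}

/*
 * Drop distributed spares that belong to a different top-level vdev than
 * the failed device, then sort what is left. Returns the candidate count.
 */
static size_t
order_spares(spare_t *spares, size_t n, uint64_t failed_top_guid)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (spares[i].is_draid &&
		    spares[i].top_guid != failed_top_guid)
			continue;	/* unusable for this failure */
		spares[kept] = spares[i];
		spares[kept].order = kept;
		kept++;
	}
	qsort(spares, kept, sizeof (spare_t), spare_cmp);
	return (kept);
}
```

Dropping the non-matching distributed spares, rather than sorting them last, is one of the two options mentioned in the review; sorting them after the regular spares would only change the `continue` branch.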
One of the main dRAID features is avoiding single drive bottlenecks by using distributed spares. Activation of a regular spare will take more time, during which the vdev redundancy is much lower than in the case of RAIDZ. But regular spares might still be added to the pool as a second line of defense, possibly shared by several vdevs.
The PR is built on top of #18597.
How Has This Been Tested?
It wasn't. CI should test it.
Types of changes
Checklist:
Signed-off-by.