[cinder-csi-plugin] Fix volume from snapshot AZ selection #3126
Opened by carterpewpew (1 commit)
When CreateVolume restores from a snapshot and the availability zone is derived from topology, clear the topology-derived AZ and let Cinder place the volume in the snapshot's AZ. This avoids cross-AZ Cinder restore failures in multi-AZ clusters using WaitForFirstConsumer.

Cinder requires that a volume created from a snapshot reside in the same AZ as the snapshot. When the scheduler picks a node in a different AZ, the topology-derived AZ conflicts with this constraint, causing HTTP 400: "Volume must be in the same availability zone as the snapshot". When the explicit AZ is omitted, Cinder automatically selects the correct zone.

Signed-off-by: Jathavedhan M <jathavedhan.m@ibm.com>
What this PR does / why we need it:
When restoring a volume from a snapshot in a multi-AZ cluster with volumeBindingMode: WaitForFirstConsumer, the topology-derived availability zone can conflict with the snapshot's AZ. Cinder requires the new volume to be in the same AZ as the snapshot, so it rejects the request with HTTP 400: "Volume must be in the same availability zone as the snapshot".

This happens because the scheduler picks a node (and therefore an AZ) before the CSI driver is called, and that AZ is passed straight to Cinder's volume create call without regard to the AZ the snapshot lives in.

The fix clears the topology-derived AZ when creating a volume from a snapshot, letting Cinder select the snapshot's AZ automatically. An explicit availability value set via StorageClass parameters is not affected.
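The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `volumeRequest` type, its field names, and the `selectAZ` helper are hypothetical and exist only to show the precedence between an explicit StorageClass value, a snapshot restore, and a topology-derived zone.

```go
package main

// volumeRequest is a hypothetical, simplified stand-in for the inputs that
// CreateVolume works with; it is not the plugin's real type.
type volumeRequest struct {
	ExplicitAZ   string // "availability" StorageClass parameter, "" if unset
	TopologyAZ   string // zone derived from the scheduler's topology, "" if none
	FromSnapshot bool   // true when the volume content source is a snapshot
}

// selectAZ returns the availability zone to send to Cinder. An empty result
// means the AZ is omitted from the request so Cinder chooses the zone itself.
func selectAZ(r volumeRequest) string {
	// An explicit StorageClass value always wins and is passed through unchanged.
	if r.ExplicitAZ != "" {
		return r.ExplicitAZ
	}
	// A topology-derived zone may differ from the snapshot's zone, so it is
	// dropped for snapshot restores.
	if r.FromSnapshot {
		return ""
	}
	// Otherwise the topology-derived zone (possibly empty) is used as before.
	return r.TopologyAZ
}
```

Under these assumptions, a snapshot restore scheduled onto a node in a zone named `az-b` would send no AZ to Cinder, while the same request with `availability: az-a` set in the StorageClass would still send `az-a`.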
Which issue this PR fixes (if applicable):
fixes #3088
Special notes for reviewers:
Only the topology-derived AZ path is changed. If the user explicitly sets availability in StorageClass params, that value is passed through unchanged.
Two tests added:
Release note: