[cinder-csi-plugin] Fix volume from snapshot AZ selection #3126

Open: carterpewpew wants to merge 1 commit into kubernetes:master from carterpewpew:fix/snapshot-topology-az

@carterpewpew

What this PR does / why we need it:

When a volume is restored from a snapshot in a multi-AZ cluster with volumeBindingMode: WaitForFirstConsumer, the topology-derived availability zone (AZ) can conflict with the snapshot's AZ. Cinder requires the new volume to be in the same AZ as the snapshot, so it rejects the request with HTTP 400: "Volume must be in the same availability zone as the snapshot". The conflict arises because the scheduler picks a node (and therefore an AZ) before the CSI driver is called, and that AZ is passed straight to Cinder's volume create call without regard to the AZ the snapshot lives in.

The fix clears the topology-derived AZ when creating a volume from a snapshot, letting Cinder select the snapshot's AZ automatically. An explicit availability value set via StorageClass parameters is not affected.
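
To make the failure mode concrete, here is a minimal Go sketch of the placement rule as this description states it. It is a toy model, not Cinder or driver code: `placeFromSnapshot`, `errAZMismatch`, and the zone names `az-1` and `az-2` are hypothetical names chosen for illustration, and the "empty AZ means use the snapshot's AZ" behavior is taken from the description above rather than verified against Cinder.

```go
package main

import (
	"errors"
	"fmt"
)

// errAZMismatch carries the HTTP 400 message quoted in the PR description.
// Hypothetical: the real error is returned by the Cinder API, not by Go code.
var errAZMismatch = errors.New("Volume must be in the same availability zone as the snapshot")

// placeFromSnapshot is a hypothetical model of the rule described above:
// an empty requested AZ lets the backend fall back to the snapshot's AZ,
// while a non-empty AZ that differs from the snapshot's AZ is rejected.
func placeFromSnapshot(requestedAZ, snapshotAZ string) (string, error) {
	if requestedAZ == "" {
		return snapshotAZ, nil
	}
	if requestedAZ != snapshotAZ {
		return "", errAZMismatch
	}
	return requestedAZ, nil
}

func main() {
	// Before the fix: the scheduler picked a node in az-2, the snapshot lives in az-1.
	_, err := placeFromSnapshot("az-2", "az-1")
	fmt.Println("topology AZ passed through:", err)

	// After the fix: the topology-derived AZ is cleared before the create call.
	az, err := placeFromSnapshot("", "az-1")
	fmt.Println("topology AZ cleared:", az, err)
}
```

In this model the first call fails with the mismatch error and the second succeeds in the snapshot's zone, which is the behavior change the PR aims for.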

Which issue this PR fixes (if applicable):
fixes #3088

Special notes for reviewers:

Only the topology-derived AZ path is changed. If the user explicitly sets availability in StorageClass params, that value is passed through unchanged.

Two tests added:

  1. TestCreateVolumeFromSnapshotWithTopology - verifies the AZ is cleared when restoring from snapshot with topology
  2. TestCreateVolumeFromSnapshotWithExplicitAvailability - verifies explicit availability param is preserved even with snapshot + topology
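
The two scenarios above can be sketched as a small table-driven check in Go. This is an illustration under stated assumptions, not the code or tests in this PR: `resolveAZ` is a hypothetical helper that condenses the precedence described in the reviewer notes (explicit availability first, then snapshot handling, then topology), and the zone names are made up. A third case for a plain volume without a snapshot is included only to show that the topology path is otherwise untouched.

```go
package main

import "fmt"

// resolveAZ is a hypothetical helper, not the driver's actual function.
// It returns the AZ that would be sent to Cinder's volume create call:
//   - an explicit "availability" StorageClass parameter always wins;
//   - otherwise, when restoring from a snapshot, the AZ is left empty
//     so that Cinder can place the volume in the snapshot's AZ;
//   - otherwise, the topology-derived AZ is used.
func resolveAZ(explicitAZ, topologyAZ string, fromSnapshot bool) string {
	if explicitAZ != "" {
		return explicitAZ
	}
	if fromSnapshot {
		return ""
	}
	return topologyAZ
}

func main() {
	cases := []struct {
		name         string
		explicitAZ   string
		topologyAZ   string
		fromSnapshot bool
		want         string
	}{
		{"snapshot with topology", "", "az-2", true, ""},
		{"snapshot with explicit availability", "az-1", "az-2", true, "az-1"},
		{"plain volume with topology", "", "az-2", false, "az-2"},
	}
	for _, c := range cases {
		got := resolveAZ(c.explicitAZ, c.topologyAZ, c.fromSnapshot)
		fmt.Printf("%-38s got=%q want=%q ok=%t\n", c.name, got, c.want, got == c.want)
	}
}
```

Returning an empty string to mean "let Cinder decide" is a design choice of this sketch; the actual tests in the PR may assert on the create request in a different way.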

Release note:

[cinder-csi-plugin] Fix snapshot restore in multi-AZ clusters with topology-aware provisioning by clearing the topology-derived availability zone and letting Cinder select the snapshot's zone.

When CreateVolume restores from a snapshot and the availability zone
is derived from topology, clear the topology-derived AZ and let Cinder
place the volume in the snapshot's AZ. This avoids cross-AZ Cinder
restore failures in multi-AZ clusters using WaitForFirstConsumer.

Cinder requires that a volume created from a snapshot resides in the
same AZ as the snapshot. When the scheduler picks a node in a
different AZ, the topology-derived AZ conflicts with this constraint,
causing HTTP 400: "Volume must be in the same availability zone as the
snapshot". By omitting the explicit AZ, Cinder automatically selects
the correct zone.

Signed-off-by: Jathavedhan M <jathavedhan.m@ibm.com>
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jun 9, 2026
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zetaab for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from stephenfin June 9, 2026 07:42
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 9, 2026
@k8s-ci-robot k8s-ci-robot requested a review from zetaab June 9, 2026 07:42
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 9, 2026
@k8s-ci-robot

Hi @carterpewpew. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 9, 2026
Development

Successfully merging this pull request may close these issues.

[cinder-csi-plugin] VolumeSnapshots not topology aware