[release-4.21] Fix NFS mount failure on IBM Cloud by fixing retry logic and opening port 2049 #15128

Open
openshift-cherrypick-robot wants to merge 4 commits into red-hat-storage:release-4.21 from openshift-cherrypick-robot:cherry-pick-15119-to-release-4.21

@openshift-cherrypick-robot
Collaborator

This is an automated cherry-pick of #15119

/assign amr1ta

amr1ta and others added 4 commits May 13, 2026 07:43
The retry() calls wrapping self.con.exec_cmd() were broken in two ways:
1. exec_cmd was called immediately and its tuple result passed to retry()
   instead of passing a callable, so no retry ever occurred
2. Connection.exec_cmd() returns (retcode, stdout, stderr) and never
   raises CommandFailed, so the retry exception type never triggered

Add _mount_nfs_with_retry() helper that wraps exec_cmd in a nested
function which raises CommandFailed on non-zero retcode, and calls it
via retry properly. Replace all 5 broken call sites.
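
The pattern described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `retry`, `CommandFailed`, and `FakeConnection` here are hypothetical stand-ins written for the example, and the project's real `retry` helper may take different arguments. The key point is that the tuple-returning `exec_cmd` is wrapped in a nested function that raises on a non-zero return code, and that nested function (not its result) is handed to `retry`.

```python
import time


class CommandFailed(Exception):
    """Stand-in for the project's CommandFailed exception (assumption)."""


def retry(exceptions, tries=3, delay=0):
    """Hypothetical minimal retry decorator; not the project's own helper."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last failure
                    time.sleep(delay)
        return wrapper
    return decorator


class FakeConnection:
    """Hypothetical connection whose exec_cmd fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def exec_cmd(self, cmd):
        # Mirrors the behaviour described above: returns a tuple, never raises.
        self.calls += 1
        if self.calls <= self.failures:
            return 32, "", "mount.nfs: Connection timed out"
        return 0, "", ""


def mount_nfs_with_retry(con, cmd, tries=3, delay=0):
    """Run cmd through con.exec_cmd until it exits 0 or attempts run out."""
    def _run():
        retcode, stdout, stderr = con.exec_cmd(cmd)
        if retcode != 0:
            # Convert the non-zero return code into the exception retry() watches for.
            raise CommandFailed(f"{cmd!r} failed with {retcode}: {stderr}")
        return retcode, stdout, stderr

    # Pass the callable itself; calling exec_cmd here would run it only once.
    return retry(CommandFailed, tries=tries, delay=delay)(_run)()
```

With `FakeConnection(failures=2)` and `tries=3`, the command is attempted three times and the third attempt's result is returned. With more failures than tries, `CommandFailed` propagates to the caller.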

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Amrita Mahapatra <49347640+amr1ta@users.noreply.github.com>
IBM Cloud VPC LoadBalancer security groups block inbound traffic by
default. The NFS LB on port 2049 needs an explicit inbound rule, same
as the ingress LB needs rules for ports 80/443 (added in red-hat-storage#15012).

Add configure_nfs_lb_security_group() that finds the VPC LB backing
the rook-ceph-nfs-my-nfs-load-balancer Service and adds an inbound
TCP 2049 rule to its security groups. Call it automatically from
create_nfs_load_balancer_service() on IBM Cloud.

Add remove_nfs_lb_security_group_rules() to clean up the rule during
teardown, called from delete_nfs_load_balancer_service() before the
Service is deleted (so the VPC LB is still present for lookup).
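
The add-and-remove bookkeeping described above could look roughly like the sketch below. It works on a plain in-memory list of rule dictionaries and makes no IBM Cloud API calls. The field names (`direction`, `protocol`, `port_min`, `port_max`, `remote`) are an assumption about the rule layout, the `0.0.0.0/0` source range is an assumption, and `nfs_inbound_rule`, `ensure_rule`, and `remove_rule` are hypothetical helper names, not functions from the PR.

```python
NFS_PORT = 2049

# Fields used to decide whether two rules describe the same opening.
_MATCH_KEYS = ("direction", "protocol", "port_min", "port_max")


def nfs_inbound_rule(port=NFS_PORT, cidr="0.0.0.0/0"):
    """Build an inbound TCP rule for a single port (assumed field layout)."""
    return {
        "direction": "inbound",
        "protocol": "tcp",
        "port_min": port,
        "port_max": port,
        "remote": {"cidr_block": cidr},
    }


def _same_rule(a, b):
    """Compare two rules on direction, protocol, and port range only."""
    return all(a.get(key) == b.get(key) for key in _MATCH_KEYS)


def ensure_rule(rules, rule):
    """Append rule unless an equivalent one exists; return True if added."""
    if any(_same_rule(existing, rule) for existing in rules):
        return False
    rules.append(dict(rule))
    return True


def remove_rule(rules, rule):
    """Drop every rule equivalent to rule; return how many were removed."""
    kept = [existing for existing in rules if not _same_rule(existing, rule)]
    removed = len(rules) - len(kept)
    rules[:] = kept
    return removed
```

Making the add step idempotent means repeated setup does not stack duplicate rules, and matching on direction, protocol, and port range lets teardown remove the 2049 rule while leaving unrelated rules (such as the 80/443 ingress rules) in place.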

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Amrita Mahapatra <49347640+amr1ta@users.noreply.github.com>
Assert that the specific subvolume created by the test is no longer
stale after deletion, instead of asserting zero stale subvolumes
cluster-wide. Pre-existing stale subvolumes from other tests caused
false failures. Also log the delete output and stale lists for
debuggability.
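
The narrowed assertion can be illustrated with a short sketch. The function name, its parameters, and the example subvolume names are hypothetical; the point is only that the check targets the test's own subvolume and tolerates stale entries left behind by other tests.

```python
import logging

log = logging.getLogger(__name__)


def assert_subvolume_not_stale(subvolume_name, stale_before, stale_after):
    """Check that the test's own subvolume is absent from the stale list."""
    # Log both lists so a failure can be diagnosed from the test output.
    log.info("Stale subvolumes before delete: %s", sorted(stale_before))
    log.info("Stale subvolumes after delete: %s", sorted(stale_after))
    assert subvolume_name not in stale_after, (
        f"Subvolume {subvolume_name} is still stale after deletion; "
        f"stale list: {sorted(stale_after)}"
    )
```

Called with `stale_after=["csi-vol-other"]` and `subvolume_name="csi-vol-mine"`, the check passes even though the cluster-wide stale list is not empty. It fails only if `csi-vol-mine` itself is still listed.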

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Amrita Mahapatra <49347640+amr1ta@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Amrita Mahapatra <49347640+amr1ta@users.noreply.github.com>
@openshift-ci openshift-ci Bot added the lgtm label May 13, 2026

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: amr1ta, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@amr1ta
Contributor

amr1ta commented May 13, 2026

@amr1ta amr1ta added the Verified label May 13, 2026
Labels

lgtm
size/L (PR that changes 100-499 lines)
Verified (Mark when PR was verified and log provided)