[manila-csi-plugin] Retry proxied probe on Unavailable #3125
carterpewpew wants to merge 1 commit
The Manila CSI node plugin fatally exits on startup when the proxied CSI driver socket (e.g. NFS) is not yet available, because ProbeForever only retries on DeadlineExceeded and immediately returns on Unavailable.

Wrap the ProbeForever call in a retry loop that retries on codes.Unavailable within the existing 15-second context timeout, so transient connection errors during concurrent DaemonSet restarts no longer cause a fatal exit.

Signed-off-by: Jathavedhan M <jathavedhan.m@ibm.com>
What this PR does / why we need it:
The Manila CSI node plugin fatally exits on startup when its proxied CSI driver socket (e.g. NFS CSI plugin) is not yet available. After a node reboot, both DaemonSets restart concurrently and the Manila driver crashes because upstream ProbeForever only retries on DeadlineExceeded, not on Unavailable (connection refused). This causes a fatal exit within ~1 second instead of retrying within the existing 15-second timeout window. This PR wraps the ProbeForever call in a retry loop that specifically handles codes.Unavailable, bounded by the existing context timeout.
Which issue this PR fixes (if applicable):
fixes #3111
Special notes for reviewers:
Release note: