
Fix mon pod selection on LSO deployments in test_mon_data_avail_warn #14707

Merged
vavuthu merged 3 commits into red-hat-storage:master from amr1ta:fix-mon-data-avail-warn-lso on Apr 22, 2026

@amr1ta (Contributor) commented Mar 17, 2026

Summary

  • The LSO code path in workloads_dir_setup used the ceph_daemon_id label to find the mon pod, but this label is shared between mon and mgr pods (e.g., both rook-ceph-mon-a and rook-ceph-mgr-a have
    ceph_daemon_id=a)
  • When the API returned the mgr pod first, the test selected it instead of the mon pod, which set mon_suffix=None and made mkdir /var/lib/ceph/mon/ceph-None/workloads fail
  • Replace the LSO-specific pod lookup with get_mon_pods(), which uses the unambiguous app=rook-ceph-mon label
  • The worker node for the LSO dd path is now derived from the selected mon pod's spec.nodeName
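A minimal sketch of why the old lookup was ambiguous, using plain dicts shaped like Kubernetes Pod objects instead of the real ocs-ci get_mon_pods() helper. The helper names (make_pod, select_by_daemon_id, select_mon_pods) and the pod names and node names are hypothetical, chosen only for illustration.

```python
from typing import Any

Pod = dict[str, Any]


def make_pod(name: str, app: str, daemon_id: str, node: str) -> Pod:
    """Build a minimal dict shaped like a Kubernetes Pod object (hypothetical)."""
    return {
        "metadata": {"name": name, "labels": {"app": app, "ceph_daemon_id": daemon_id}},
        "spec": {"nodeName": node},
    }


def select_by_daemon_id(pods: list[Pod], daemon_id: str) -> Pod:
    """Old, ambiguous lookup: first pod labelled ceph_daemon_id=<daemon_id>."""
    matches = [p for p in pods if p["metadata"]["labels"].get("ceph_daemon_id") == daemon_id]
    return matches[0]  # may be the mgr pod, depending on API ordering


def select_mon_pods(pods: list[Pod]) -> list[Pod]:
    """Unambiguous lookup: only pods labelled app=rook-ceph-mon."""
    return [p for p in pods if p["metadata"]["labels"].get("app") == "rook-ceph-mon"]


# Here the API happens to list the mgr pod first.
pods = [
    make_pod("rook-ceph-mgr-a-5d7f", "rook-ceph-mgr", "a", "worker-1"),
    make_pod("rook-ceph-mon-a-7c9b", "rook-ceph-mon", "a", "worker-0"),
]

print(select_by_daemon_id(pods, "a")["metadata"]["name"])  # rook-ceph-mgr-a-5d7f
mon = select_mon_pods(pods)[0]
print(mon["metadata"]["name"], mon["spec"]["nodeName"])  # rook-ceph-mon-a-7c9b worker-0
```

Filtering on the app label makes the result independent of API ordering, and reading spec.nodeName from the chosen pod keeps the LSO dd step on the node that actually hosts that mon.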

Root Cause

ReportPortal: https://reportportal-ocs4.apps.dno.ocp-hub.prod.psi.redhat.com/ui/#ocs/launches/all/44404/2024569/2024879/log

Platform: vSphere UPI LSO

The LSO code path used the ceph_daemon_id label to find the mon pod,
but this label is shared between mon and mgr pods (both have
ceph_daemon_id=a). When the API returned the mgr pod first, the test
selected it instead of the mon pod, which set mon_suffix=None and made
the mkdir fail.

Replace the LSO-specific pod lookup with get_mon_pods(), which uses the
unambiguous app=rook-ceph-mon label. The worker node for the LSO dd
path is now derived from the selected mon pod's spec.nodeName.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Amrita Mahapatra <49347640+amr1ta@users.noreply.github.com>
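Once a mon pod is selected, the worker node and the mon paths can all be derived from it. The sketch below assumes the mon suffix is parsed from a pod name of the form rook-ceph-mon-<suffix>-...; that naming assumption and the helpers mon_suffix and lso_targets are hypothetical, not taken from the ocs-ci code. The two paths are the ones quoted in this PR.

```python
import re
from typing import Any, Optional

# Assumed (hypothetical) naming scheme: rook-ceph-mon-<suffix>-<hash...>
MON_NAME_RE = re.compile(r"^rook-ceph-mon-([a-z]+)-")


def mon_suffix(pod_name: str) -> Optional[str]:
    """Return the mon letter (e.g. 'a'), or None if the name is not a mon pod."""
    match = MON_NAME_RE.match(pod_name)
    return match.group(1) if match else None


def lso_targets(mon_pod: dict[str, Any]) -> dict[str, str]:
    """Derive the worker node and mon paths from an already-selected mon pod."""
    name = mon_pod["metadata"]["name"]
    suffix = mon_suffix(name)
    if suffix is None:
        # Fail loudly rather than building /var/lib/ceph/mon/ceph-None/workloads.
        raise ValueError(f"{name} does not look like a mon pod")
    return {
        "node": mon_pod["spec"]["nodeName"],  # node where the LSO dd step runs
        "workloads_dir": f"/var/lib/ceph/mon/ceph-{suffix}/workloads",
        "host_data_path": f"/var/lib/rook/mon-{suffix}/data",
    }


mon_pod = {"metadata": {"name": "rook-ceph-mon-a-7c9b"}, "spec": {"nodeName": "worker-0"}}
print(lso_targets(mon_pod)["node"])  # worker-0
print(mon_suffix("rook-ceph-mgr-a-5d7f"))  # None
```

Raising on a missing suffix turns the silent ceph-None path into an explicit error at selection time.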
@amr1ta amr1ta requested a review from a team as a code owner March 17, 2026 11:33
pull-request-size bot added the size/S label (PR that changes 10-29 lines) on Mar 17, 2026
On LSO, df -Th | grep /etc/hosts returned empty output because df lists
mount points, not file paths, which caused an IndexError. Run df -Th
against the actual mon data path (/var/lib/rook/mon-{suffix}/data) via
oc debug on the worker node instead. Also switch the non-LSO path to
run df -Th {path} directly instead of piping through grep, for
reliability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Amrita Mahapatra <49347640+amr1ta@users.noreply.github.com>
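Passing the path to df makes it report the filesystem that contains that path, so there is always a header plus a data line to parse. The sketch below builds the command and parses Use%; running it through chroot /host inside oc debug is an assumption about how the host path is reached, and df_command, parse_df_use_percent and the SAMPLE output are hypothetical, not captured from a cluster.

```python
import shlex


def df_command(node: str, path: str) -> list[str]:
    """Build an oc debug invocation that runs df -Th against a host path.

    Assumption: the host filesystem is reached via chroot /host.
    """
    return ["oc", "debug", f"node/{node}", "--", "chroot", "/host", "df", "-Th", path]


def parse_df_use_percent(output: str) -> int:
    """Return the Use% column from `df -Th <path>` output.

    df prints a header line plus one line for the filesystem containing <path>.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if len(lines) < 2:
        # Empty output (what grep produced on LSO) becomes a clear error
        # instead of an IndexError.
        raise ValueError(f"unexpected df output: {output!r}")
    fields = lines[-1].split()
    # Use% is the second-to-last column (before "Mounted on"); this also
    # holds if df wraps a long filesystem name onto its own line.
    return int(fields[-2].rstrip("%"))


# Illustrative output only, not captured from a real cluster.
SAMPLE = """\
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda4      xfs   120G   85G   35G  71% /var
"""

print(shlex.join(df_command("worker-0", "/var/lib/rook/mon-a/data")))
print(parse_df_use_percent(SAMPLE))  # 71
```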
pull-request-size bot added the size/M label (PR that changes 30-99 lines) and removed size/S (PR that changes 10-29 lines) on Mar 17, 2026
vavuthu previously approved these changes on Mar 18, 2026
Writing 1GB of random data via oc debug can exceed the default 300s
timeout, especially on later iterations when the disk is nearly full.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Amrita Mahapatra <49347640+amr1ta@users.noreply.github.com>
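A minimal sketch of running a long write under an explicit timeout with Python's subprocess.run. The commit does not state the new timeout value, so DD_TIMEOUT_S below is a hypothetical placeholder, and the dd_command and run_with_timeout helpers are illustrative rather than the ocs-ci implementation.

```python
import subprocess
import sys

# Hypothetical value: the commit does not say what the new timeout is.
DD_TIMEOUT_S = 600


def dd_command(target: str, size_mb: int = 1024) -> list[str]:
    """Write size_mb MiB of random data to target (1024 MiB is roughly 1 GB)."""
    return ["dd", "if=/dev/urandom", f"of={target}", "bs=1M", f"count={size_mb}"]


def run_with_timeout(cmd: list[str], timeout: float) -> str:
    """Run cmd and return its stdout.

    Raises subprocess.TimeoutExpired (after killing the child) if cmd runs
    longer than `timeout` seconds, and CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


# Safe local demo; in the test the command would be the dd write via oc debug,
# run with timeout=DD_TIMEOUT_S.
print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=10).strip())  # ok
try:
    run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
except subprocess.TimeoutExpired:
    print("timed out")
```

Making the timeout an explicit argument keeps the slow, nearly-full-disk iterations from silently inheriting a default that was sized for faster writes.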
@amr1ta (Contributor, Author) commented Mar 20, 2026

openshift-ci bot commented Apr 22, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: amr1ta, ebenahar, suchita-g, vavuthu, yitzhak12

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vavuthu merged commit 713ad90 into red-hat-storage:master on Apr 22, 2026
6 of 7 checks passed
@amr1ta (Contributor, Author) commented May 13, 2026

/cherry-pick release-4.21

openshift-cherrypick-robot (Collaborator) commented:

@amr1ta: new pull request created: #15135


In response to this:

/cherry-pick release-4.21

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


Labels

lgtm
size/M: PR that changes 30-99 lines
Verified: Mark when PR was verified and log provided


6 participants