Add topology-aware provisioning tests for external mode (RHSTOR-5525)#15151
Add topology-aware provisioning tests for external mode (RHSTOR-5525)#15151AviadP wants to merge 3 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: AviadP The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
- Add create_topology_pools() to ExternalCluster for replica-3 pool creation - Extract build_exporter_base_params() from get_external_cluster_details() - Remove IBM Cloud platform gate on jump_host SSH configuration - Fix upload_exporter_script() to use /tmp/ for remote path - Add DEFAULT_EXTERNAL_MODE_STORAGECLASS_NON_RESILIENT_RBD constant - Add test_topology_awareness.py with 5 tests: - SC auto-creation with WaitForFirstConsumer - PVC pending without pod - Single pod topology placement verification - StatefulSet spread across topology pools - PVC deletion cleans RBD image Signed-off-by: Aviadp <apolak@redhat.com>
f23a568 to
ada7824
Compare
📝 WalkthroughWalkthroughThis PR implements topology-aware provisioning for OCS external mode: refactors exporter parameter building, adds topology pool creation, introduces a non‑resilient RBD storageclass constant, updates a secrets baseline mapping, and adds functional tests validating topology-constrained StorageClass behavior and pod-to-pool placement. ChangesTopology-Aware Provisioning for OCS External Mode
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ocs_ci/deployment/helpers/external_cluster_helpers.py`:
- Around line 123-127: The code assumes config.DEPLOYMENT["ssh_key_private"]
always exists and may raise KeyError; in the jump-host initialization (where
self.jump_host is set) change the assignment to first check
config.DEPLOYMENT.get("ssh_key_private") and only set
self.jump_host["private_key"] when that value is present (and expanduser it);
otherwise leave self.jump_host as-is so password or other auth fields can be
used. Locate the block that sets self.jump_host in external_cluster_helpers.py
and replace the direct dict access with a safe .get-based conditional.
In `@tests/functional/external_mode/test_topology_awareness.py`:
- Around line 365-374: The TimeoutSampler loop using func=sc_ocp.get to wait for
the StorageClass named sc_name can exit without confirming creation; after the
loop ends, check whether a successful sample was found and if not fail fast
(e.g., raise AssertionError or call pytest.fail) with a clear message indicating
the topology StorageClass sc_name was not reconciled within the timeout so
subsequent tests stop immediately; update the block that contains
TimeoutSampler, sc_ocp.get and sc_name to perform this explicit check and fail
when no sample was returned.
- Around line 78-93: The returned topology config may have mismatched lengths
between pool_names and failure_domain_values; before the return in the
config-generation block (where pool_names, fd_label, fd_values, pool_size,
pg_num are set in test_topology_awareness.py) add a validation that
len(pool_names) == len(fd_values) and raise a clear exception (or use an assert)
if they differ, including both lengths in the message so the test fails with a
meaningful error rather than proceeding with inconsistent mapping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6dcc0aeb-c3a5-4d9a-81a9-574efd80cd76
📒 Files selected for processing (4)
.secrets.baselineocs_ci/deployment/helpers/external_cluster_helpers.pyocs_ci/ocs/constants.pytests/functional/external_mode/test_topology_awareness.py
…t SC wait - Use safe .get() for ssh_key_private fallback in jump host init - Validate pool_names and failure_domain_values length match - Fail fast with assert if topology SC not created within timeout Signed-off-by: Aviadp <apolak@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ocs_ci/deployment/helpers/external_cluster_helpers.py`:
- Around line 243-253: The exporter arguments are built before the side-effect
that sets restricted auth and alias_rbd_name, so build_exporter_base_params()
can miss adding --cephfs-filesystem-name/related flags; move the conditional
that checks rbd_name for "." or "_" and sets
config.ENV_DATA["restricted-auth-permission"] and
config.ENV_DATA["alias_rbd_name"] to run before calling
self.build_exporter_base_params(), ensuring build_exporter_base_params() sees
the updated ENV_DATA and emits the correct flags (refer to rbd_name,
build_exporter_base_params, config.ENV_DATA["restricted-auth-permission"], and
config.ENV_DATA["alias_rbd_name"]).
In `@tests/functional/external_mode/test_topology_awareness.py`:
- Around line 70-85: The topology helper only advertises failure domains for
fd_values = topo_cfg.get("failure_domain_values", hostnames[:3]) but test
pods/StatefulSet creations later can land on unmapped nodes; fix by ensuring
workloads are scheduled only onto those advertised domains: either set fd_values
to include all candidate workers (e.g., use hostnames instead of hostnames[:3])
or, preferably, add a node selector/nodeAffinity to the pod/StatefulSet specs
used in the tests that matches the failure domain label (fd_label) with the
values in fd_values so pods are constrained to those nodes (update the code that
creates pods/StatefulSet to include the nodeAffinity/nodeSelector referencing
fd_label and fd_values).
- Around line 414-416: The test currently only checks for the presence of the
topologyFailureDomainLabel key in params; update the assertion to verify the
actual value matches the expected config by asserting
params["topologyFailureDomainLabel"] == self.topo_config["failure_domain_label"]
(use the existing params, sc_name and self.topo_config symbols) so the
StorageClass contract is fully validated and retain the log.info call afterward.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 43c95b40-d8e8-40b0-bd58-82b59171afa6
📒 Files selected for processing (2)
ocs_ci/deployment/helpers/external_cluster_helpers.pytests/functional/external_mode/test_topology_awareness.py
- Move restricted-auth side effects before build_exporter_base_params() so the builder sees alias_rbd_name and restricted-auth flags - Assert exact topologyFailureDomainLabel value, not just presence Signed-off-by: Aviadp <apolak@redhat.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ocs_ci/deployment/helpers/external_cluster_helpers.py (1)
204-209:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDuplicate
--restricted-auth-permissionand--cluster-nameflags whenrbd_namecontains.or_.When
get_external_cluster_details()processes anrbd_namewith.or_, it setsconfig.ENV_DATA["restricted-auth-permission"] = Trueat lines 248-252 before callingbuild_exporter_base_params(). This causes both blocks to execute:
- Lines 197-202: Adds
--restricted-auth-permission true --cluster-name ...- Lines 204-209: Adds another
--restricted-auth-permission true --cluster-name ... --alias-rbd-data-pool-name ...This results in duplicate command-line flags being passed to the exporter script.
Proposed fix: Only add the alias flag when other flags are already present
if "." in rbd_name or "_" in rbd_name: alias_rbd_name = rbd_name.replace(".", "-").replace("_", "-") - params = ( - f"{params} --restricted-auth-permission true --cluster-name {cluster_name} " - f"--alias-rbd-data-pool-name {alias_rbd_name}" - ) + if "--restricted-auth-permission" not in params: + params = f"{params} --restricted-auth-permission true --cluster-name {cluster_name}" + params = f"{params} --alias-rbd-data-pool-name {alias_rbd_name}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ocs_ci/deployment/helpers/external_cluster_helpers.py` around lines 204 - 209, The code is appending duplicate flags when rbd_name contains '.' or '_' in get_external_cluster_details(): after setting config.ENV_DATA["restricted-auth-permission"]=True and calling build_exporter_base_params(), the block that constructs alias_rbd_name appends "--restricted-auth-permission true --cluster-name ..." again; modify the alias branch in build_exporter_base_params()/get_external_cluster_details() (where alias_rbd_name is created) to only append "--alias-rbd-data-pool-name {alias_rbd_name}" if params already contains the restricted-auth-permission/cluster-name flags (or omit re-adding those flags entirely), e.g., detect existing flags in params or rely on the earlier addition so you only add the alias flag instead of duplicating "--restricted-auth-permission" and "--cluster-name".
🧹 Nitpick comments (1)
tests/functional/external_mode/test_topology_awareness.py (1)
113-113: 💤 Low valueConsider adding
strict=Truetozip()for defensive programming.The Ruff B905 rule checks for zip calls without an explicit strict parameter when called with two or more iterables. By default, if the iterables passed to zip are of different lengths, the resulting iterator will be silently truncated.
While lengths are validated at lines 87-90, adding
strict=Trueprovides an extra safety net if this function is ever called from a different path.Suggested improvement
- return dict(zip(pool_names, fd_values)) + return dict(zip(pool_names, fd_values, strict=True))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/functional/external_mode/test_topology_awareness.py` at line 113, The return currently uses dict(zip(pool_names, fd_values)) which can silently truncate if lengths differ; update the call to zip to use strict=True (i.e., dict(zip(pool_names, fd_values, strict=True))) so mismatched lengths raise immediately; reference the variables pool_names and fd_values in the same function where lengths are validated (lines around the existing checks) to locate and change the zip invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@ocs_ci/deployment/helpers/external_cluster_helpers.py`:
- Around line 204-209: The code is appending duplicate flags when rbd_name
contains '.' or '_' in get_external_cluster_details(): after setting
config.ENV_DATA["restricted-auth-permission"]=True and calling
build_exporter_base_params(), the block that constructs alias_rbd_name appends
"--restricted-auth-permission true --cluster-name ..." again; modify the alias
branch in build_exporter_base_params()/get_external_cluster_details() (where
alias_rbd_name is created) to only append "--alias-rbd-data-pool-name
{alias_rbd_name}" if params already contains the
restricted-auth-permission/cluster-name flags (or omit re-adding those flags
entirely), e.g., detect existing flags in params or rely on the earlier addition
so you only add the alias flag instead of duplicating
"--restricted-auth-permission" and "--cluster-name".
---
Nitpick comments:
In `@tests/functional/external_mode/test_topology_awareness.py`:
- Line 113: The return currently uses dict(zip(pool_names, fd_values)) which can
silently truncate if lengths differ; update the call to zip to use strict=True
(i.e., dict(zip(pool_names, fd_values, strict=True))) so mismatched lengths
raise immediately; reference the variables pool_names and fd_values in the same
function where lengths are validated (lines around the existing checks) to
locate and change the zip invocation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: ba744750-6ecd-48c3-90d0-b8e09794fac7
📒 Files selected for processing (2)
ocs_ci/deployment/helpers/external_cluster_helpers.pytests/functional/external_mode/test_topology_awareness.py
Summary by CodeRabbit
New Features
Tests
Chores