Skip to content

Centralize TLS configuration for NooBaa And RGW#15134

Open
paraggit wants to merge 3 commits into
red-hat-storage:masterfrom
paraggit:tls_profile
Open

Centralize TLS configuration for NooBaa And RGW#15134
paraggit wants to merge 3 commits into
red-hat-storage:masterfrom
paraggit:tls_profile

Conversation

@paraggit
Copy link
Copy Markdown
Contributor

@paraggit paraggit commented May 13, 2026

Test Automation

    Centralized TLSProfile: version toggle (1.3 / 1.2 / 1.3), scan, delete, log check.

    Steps:
    1. Detect whether ``ocs-tls-profile`` exists (required metadata name per product).
    2. If missing, create it with TLS 1.3 and DF-supported cipher/group sets.
    3. Confirm spec shows TLSv1.3 and (where applicable) operands follow; run an
        in-cluster port scan and assert ``tls1.3`` appears on at least one ``OK``
        endpoint (other ports may use TLS 1.2 only).
    4. Patch to TLS 1.2, verify operands where applicable; scan and assert ``tls1.2``
        appear on at least one ``OK`` endpoint.
    5. Patch back to TLS 1.3; scan and assert ``tls1.3`` on at least one ``OK``
        appear on at least one ``OK`` endpoint.
    6. Delete the TLSProfile.
    7. Confirm operator default (NooBaa/RGW TLSProfile propagation cleared).
    8. Scan operator/workload pod logs for TLS-related error lines (heuristic).

Summary by CodeRabbit

  • New Features

    • Centralized TLSProfile configuration management for OpenShift (DF 4.22+).
    • In-cluster TLS scanner that probes endpoints across configurable TLS versions, ciphers, and groups; emits CSV scan summaries.
    • Pod log scanning to detect TLS/handshake/certificate-related errors.
    • Local OpenSSL-based verifier to validate endpoint TLS behavior.
  • Tests

    • Functional test suite covering TLSProfile lifecycle, version transitions, in-cluster scans, and cleanup across component selectors.
  • Bug Fixes

    • Improved command execution handling to avoid invalid stdin+input combinations.

Review Change Stack

@paraggit paraggit requested a review from a team as a code owner May 13, 2026 09:44
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: paraggit

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: pakamble-ib-c09
Cluster Configuration:
PR Test Suite:
PR Test Path: tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Additional Test Params:
OCP VERSION: 4.22
OCS VERSION: 4.22
tested against branch: master

Job PASSED.

@paraggit paraggit added the Verified Mark when PR was verified and log provided label May 14, 2026
Comment thread ocs_ci/helpers/tlsprofile_helper.py Outdated
Comment thread ocs_ci/helpers/tlsprofile_helper.py
Comment thread ocs_ci/helpers/tlsprofile_helper.py Outdated
Comment thread ocs_ci/helpers/tlsprofile_helper.py Outdated
Copy link
Copy Markdown
Contributor

@AviadP AviadP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments added

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a TLSProfile helper module (log heuristics, in-cluster OpenSSL scanner, TLSProfile CR management, propagation waiters, local OpenSSL verifier) and a parametrized functional test that exercises TLSProfile TLSv1.3↔TLSv1.2 lifecycle with in-cluster scans and cleanup checks.

Changes

TLSProfile Helper Module

Layer / File(s) Summary
TLS error detection and log scanning
ocs_ci/helpers/tlsprofile_helper.py
Module constants for cipher/group sets, selector domains, and regex-based heuristics for detecting TLS/handshake errors in pod logs; pod discovery and per-pod log scanning helpers.
In-cluster scanner defaults and external script
ocs_ci/helpers/tlsprofile_helper.py, scripts/bash/tls_scan_endpoints.sh
Scanner default env, loader for the external tls_scan_endpoints.sh script which probes endpoints with openssl s_client and emits CSV per-endpoint negotiation results.
Scanner orchestration and entry
ocs_ci/helpers/tlsprofile_helper.py
Kubeconfig/oc helpers, endpoint discovery, scanner namespace/pod lifecycle, writing and executing the scanner script inside the pod, parsing CSV output, cleanup, and scan_cluster(...) entry with timeout handling.
Scan token mapping and assertions
ocs_ci/helpers/tlsprofile_helper.py
Mapping from TLSProfile API versions to scan tokens and assert_tls_scan_results_include_version(...) to verify expected negotiated TLS versions appear in results.
TLSProfile CR CRUD
ocs_ci/helpers/tlsprofile_helper.py
TLSProfile class providing create/get/delete and merge-patch replacement of single-rule TLSProfile CRs with configurable selectors, ciphers, and groups, plus CRD existence and version waiter.
State propagation waiters
ocs_ci/helpers/tlsprofile_helper.py
Polling helpers for NooBaa spec.security and CephObjectStore spec.security propagation/clearance and substring checks.
Local OpenSSL verifier
ocs_ci/helpers/tlsprofile_helper.py
TLSProfileOpensslVerifier that runs local openssl s_client probes to confirm TLS version/cipher/group negotiation behavior and logs detailed pass/fail output.
utils.exec_cmd stdin handling fix
ocs_ci/utility/utils.py
Avoids passing stdin=subprocess.PIPE to subprocess.run when an input kwarg is provided by building arguments in a run_kw dict.

Functional Test

Layer / File(s) Summary
Test module setup and parametrization
tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Imports, require_tlsprofile_crd gating fixture, test class with eligibility markers, autouse cleanup fixture, and parametrization across all, noobaa, and rgw with MCG-only skip handling.
Initialization and TLSv1.3 validation
tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Determine applicability, resolve CephObjectStore when needed, create/normalize ocs-tls-profile to TLSv1.3, wait for config/version, optional NooBaa check, and run in-cluster scan asserting TLSv1.3.
TLSv1.2 cycle and TLSv1.3 restoration
tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Patch TLSProfile to TLSv1.2, wait for propagation (and optional RGW/CephObjectStore cipher evidence), rescan asserting TLSv1.2, restore TLSv1.3, and re-verify.
Deletion, cleanup and error log verification
tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Delete ocs-tls-profile, assert absence, wait for NooBaa/CephObjectStore security clearance as applicable, and assert no TLS-related errors in operator/workload pod logs within ~45 minutes.

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 I hopped through configs, ciphers bright,

Scanned pods at day and into night,
From TLS 1.3 down to 1.2's test,
Restored, then deleted — all put to rest,
A rabbit's cheer for secure cluster light!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: adding centralized TLS configuration support for NooBaa and RGW through TLSProfile management, validation, and test automation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py (1)

107-108: ⚡ Quick win

Inconsistent config access pattern.

Line 107 uses config.ENV_DATA["cluster_namespace"] which raises KeyError if missing, while line 108 and the helper module use .get() with fallback to constants.OPENSHIFT_STORAGE_NAMESPACE. Consider using the same defensive pattern.

Proposed fix
-        namespace = config.ENV_DATA["cluster_namespace"]
+        namespace = (
+            config.ENV_DATA.get("cluster_namespace")
+            or constants.OPENSHIFT_STORAGE_NAMESPACE
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py`
around lines 107 - 108, The test currently reads namespace using
config.ENV_DATA["cluster_namespace"] which can raise KeyError; change it to use
the defensive lookup used elsewhere: call
config.ENV_DATA.get("cluster_namespace", constants.OPENSHIFT_STORAGE_NAMESPACE")
(mirroring the mcg_only retrieval pattern) so namespace falls back to
constants.OPENSHIFT_STORAGE_NAMESPACE; update the code that assigns namespace
(and any helper that expects cluster_namespace) to use this .get(...) fallback
to ensure consistent, non-raising config access.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ocs_ci/helpers/tlsprofile_helper.py`:
- Around line 364-365: The CSV header uses a multi-line echo with a backslash
and indentation which leaves literal "\" and spaces in the output; replace the
multiline/backslash form (the echo line containing
"pod_namespace,pod_name,...tls13groups,reason") with a single-line header output
or a safe formatter (e.g. echo "pod_namespace,pod_name,....,reason" or printf
'%s\n' "pod_namespace,pod_name,....,reason") or use a non-indented heredoc
(<<'CSV' on its own line) so the header is emitted without backslashes or
leading whitespace.
- Around line 356-358: The heredoc/echo that builds the CSV line in
tlsprofile_helper.py currently uses an indented multi-line string (the echo
"$pod_ns,$pod_name,..." line) which injects literal leading spaces into CSV
fields; fix it by removing the indentation and line-continuation so the echoed
CSV is produced as a single left-aligned string (or replace the echo with a
single printf '%s\n' statement that concatenates
$pod_ns,$pod_name,$pod_ip,$container_name,$port,$process,OK,$supported_versions,$supported_tls12_ciphers,$supported_tls12_groups,$supported_tls13_ciphers,$supported_tls13_groups,"Supports:
$supported_versions" with no extra whitespace), ensuring the echo/printf
invocation in tlsprofile_helper.py emits no leading spaces.

---

Nitpick comments:
In `@tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py`:
- Around line 107-108: The test currently reads namespace using
config.ENV_DATA["cluster_namespace"] which can raise KeyError; change it to use
the defensive lookup used elsewhere: call
config.ENV_DATA.get("cluster_namespace", constants.OPENSHIFT_STORAGE_NAMESPACE")
(mirroring the mcg_only retrieval pattern) so namespace falls back to
constants.OPENSHIFT_STORAGE_NAMESPACE; update the code that assigns namespace
(and any helper that expects cluster_namespace) to use this .get(...) fallback
to ensure consistent, non-raising config access.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0e1cf099-aac3-4a68-a149-97729cf3d44b

📥 Commits

Reviewing files that changed from the base of the PR and between 51b29d0 and 6b60abe.

📒 Files selected for processing (2)
  • ocs_ci/helpers/tlsprofile_helper.py
  • tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py

Comment thread ocs_ci/helpers/tlsprofile_helper.py Outdated
Comment thread ocs_ci/helpers/tlsprofile_helper.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
ocs_ci/helpers/tlsprofile_helper.py (1)

356-365: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

CSV output contains embedded whitespace from line continuation.

The backslash-newline continuation inside the bash heredoc preserves the indentation whitespace, corrupting CSV field values. This was flagged in a prior review and appears unresolved.

The echo statements on lines 356-358 and 364-365 produce values like " TLS_AES_128..." instead of clean fields.

Proposed fix
     if [[ "$supported_versions" != "NA" ]]; then
-        echo "$pod_ns,$pod_name,$pod_ip,$container_name,$port,$process,OK,$supported_versions,\
-            $supported_tls12_ciphers,$supported_tls12_groups,$supported_tls13_ciphers,\
-                $supported_tls13_groups,Supports: $supported_versions"
+        echo "$pod_ns,$pod_name,$pod_ip,$container_name,$port,$process,OK,$supported_versions,$supported_tls12_ciphers,$supported_tls12_groups,$supported_tls13_ciphers,$supported_tls13_groups,Supports: $supported_versions"

And for the header:

-echo "pod_namespace,pod_name,pod_ip,container_name,port,process,status,tlsversions,tls12ciphers,\
-    tls12groups,tls13ciphers,tls13groups,reason"
+echo "pod_namespace,pod_name,pod_ip,container_name,port,process,status,tlsversions,tls12ciphers,tls12groups,tls13ciphers,tls13groups,reason"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ocs_ci/helpers/tlsprofile_helper.py` around lines 356 - 365, The CSV echo
lines produce leading spaces because the backslash-newline continuation
preserves indentation; fix by collapsing the continued strings so there is no
indentation after the backslash (or replace the multi-line echo with a
single-line printf/echo), e.g. change the echo that emits
"$pod_ns,$pod_name,...$supported_versions" and the NO_TLS echo to either a
single uninterrupted string or use printf '%s\n' with concatenated variables,
and do the same for the header echo (the line that prints
"pod_namespace,pod_name,...reason") so no leading whitespace is present in any
CSV field (refer to the echo lines emitting supported_versions,
supported_tls12_ciphers, supported_tls12_groups, supported_tls13_ciphers,
supported_tls13_groups and the header echo).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@ocs_ci/helpers/tlsprofile_helper.py`:
- Around line 356-365: The CSV echo lines produce leading spaces because the
backslash-newline continuation preserves indentation; fix by collapsing the
continued strings so there is no indentation after the backslash (or replace the
multi-line echo with a single-line printf/echo), e.g. change the echo that emits
"$pod_ns,$pod_name,...$supported_versions" and the NO_TLS echo to either a
single uninterrupted string or use printf '%s\n' with concatenated variables,
and do the same for the header echo (the line that prints
"pod_namespace,pod_name,...reason") so no leading whitespace is present in any
CSV field (refer to the echo lines emitting supported_versions,
supported_tls12_ciphers, supported_tls12_groups, supported_tls13_ciphers,
supported_tls13_groups and the header echo).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 3d3c8b70-c21f-42da-8fd3-ee1580a55037

📥 Commits

Reviewing files that changed from the base of the PR and between 6b60abe and 7a2c86c.

📒 Files selected for processing (2)
  • ocs_ci/helpers/tlsprofile_helper.py
  • tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py

Copy link
Copy Markdown

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: pakamble-ib-c09
Cluster Configuration:
PR Test Suite:
PR Test Path: tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Additional Test Params:
OCP VERSION: 4.22
OCS VERSION: 4.22
tested against branch: master

Job UNSTABLE (some or all tests failed).

Copy link
Copy Markdown

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: pakamble-ib-c09
Cluster Configuration:
PR Test Suite:
PR Test Path: tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Additional Test Params:
OCP VERSION: 4.22
OCS VERSION: 4.22
tested against branch: master

Job UNSTABLE (some or all tests failed).

Copy link
Copy Markdown

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: pakamble-ib-c09
Cluster Configuration:
PR Test Suite:
PR Test Path: tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Additional Test Params:
OCP VERSION: 4.22
OCS VERSION: 4.22
tested against branch: master

Job PASSED.

Signed-off-by: Parag Kamble <pakamble@redhat.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ocs_ci/helpers/tlsprofile_helper.py`:
- Around line 134-143: The try/except around get_pod_logs in
tlsprofile_helper.py should not swallow exceptions and return an empty list;
instead either re-raise the caught exception or mark the pod as unreadable so
assert_no_tls_errors_in_relevant_pod_logs() can fail later. Concretely, in the
except block for get_pod_logs(pod_name=pod_name, ...), remove the "return []"
and either call "raise" to propagate the exception or append pod_name (and exc)
to a shared unreadable_pods collection that
assert_no_tls_errors_in_relevant_pod_logs() checks and fails on; update
assert_no_tls_errors_in_relevant_pod_logs() to assert unreadable_pods is empty
(or surface the errors) so log-collection failures do not silently pass tests.

In `@ocs_ci/utility/utils.py`:
- Around line 845-847: The current logic sets run_kw["stdin"]=subprocess.PIPE
when "input" is not in kwargs but doesn't check for an explicit "stdin" in
kwargs, causing a duplicate-argument TypeError on subprocess.run(cmd, **run_kw,
**kwargs); change the guard to check both keys (e.g., if "input" not in kwargs
and "stdin" not in kwargs:) before assigning to run_kw, leaving subprocess.run
usage unchanged so callers supplying stdin or input won't collide with run_kw.

In `@tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py`:
- Around line 241-244: Record the test start time at the beginning of this test
(e.g., capture a datetime/timestamp into a variable like test_start_time) and
then pass a derived "since" value based on that timestamp to
assert_no_tls_errors_in_relevant_pod_logs instead of the fixed "45m" string;
update the call that currently reads
assert_no_tls_errors_in_relevant_pod_logs(namespace, component, since="45m") to
use the computed since (for example a RFC3339 timestamp or ISO string derived
from test_start_time) so the log scan window covers only the test's lifetime and
continues to reference assert_no_tls_errors_in_relevant_pod_logs(namespace,
component, since=derived_since).
- Around line 71-76: The teardown currently swallows exceptions when attempting
to remove the TLS profile; update the finalizer so that after calling
tls.is_tls_profile_available(silent=True) and attempting
tls.delete_tls_profile(wait=True, force=True) any Exception caught is logged and
then re-raised (or explicitly raise a RuntimeError) to fail the finalizer and
prevent a poisoned cluster state; specifically modify the except block around
tls.delete_tls_profile to log the error (using log.warning or log.exception) and
then reraise exc so the test suite fails closed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e97037e7-2f00-4f09-8f3e-d55badb3cec6

📥 Commits

Reviewing files that changed from the base of the PR and between 07c8c10 and a85bcd3.

📒 Files selected for processing (4)
  • ocs_ci/helpers/tlsprofile_helper.py
  • ocs_ci/utility/utils.py
  • scripts/bash/tls_scan_endpoints.sh
  • tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/bash/tls_scan_endpoints.sh

Comment on lines +134 to +143
try:
raw = get_pod_logs(
pod_name=pod_name,
namespace=namespace,
since=since,
tail=str(tail),
)
except Exception as exc:
log.warning("Could not read logs for pod %s: %s", pod_name, exc)
return []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't convert log-collection failures into a clean result.

If get_pod_logs() fails, this returns [] and the final assertion treats that pod as having no TLS errors. A pod restart or API hiccup can therefore hide the exact regression this test is trying to catch. Bubble the failure up, or record unreadable pods and fail in assert_no_tls_errors_in_relevant_pod_logs().

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 141-141: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ocs_ci/helpers/tlsprofile_helper.py` around lines 134 - 143, The try/except
around get_pod_logs in tlsprofile_helper.py should not swallow exceptions and
return an empty list; instead either re-raise the caught exception or mark the
pod as unreadable so assert_no_tls_errors_in_relevant_pod_logs() can fail later.
Concretely, in the except block for get_pod_logs(pod_name=pod_name, ...), remove
the "return []" and either call "raise" to propagate the exception or append
pod_name (and exc) to a shared unreadable_pods collection that
assert_no_tls_errors_in_relevant_pod_logs() checks and fails on; update
assert_no_tls_errors_in_relevant_pod_logs() to assert unreadable_pods is empty
(or surface the errors) so log-collection failures do not silently pass tests.

Comment thread ocs_ci/utility/utils.py
Comment on lines +845 to +847
if "input" not in kwargs:
run_kw["stdin"] = subprocess.PIPE
completed_process = subprocess.run(cmd, **run_kw, **kwargs)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the exec_cmd function in the specified file
cd /tmp && git clone https://github.com/red-hat-storage/ocs-ci.git 2>/dev/null || true
cd ocs-ci && head -n 900 ocs_ci/utility/utils.py | tail -n 100

Repository: red-hat-storage/ocs-ci

Length of output: 3659


🏁 Script executed:

cd /tmp/ocs-ci && rg -A 30 "def exec_cmd" ocs_ci/utility/utils.py

Repository: red-hat-storage/ocs-ci

Length of output: 130


🏁 Script executed:

cd /tmp/ocs-ci && rg "exec_cmd.*stdin" --type py

Repository: red-hat-storage/ocs-ci

Length of output: 215


🏁 Script executed:

# Find the repository root and the utils.py file
find / -name "utils.py" -path "*/ocs_ci/utility/*" 2>/dev/null | head -5

Repository: red-hat-storage/ocs-ci

Length of output: 145


🏁 Script executed:

# Try to locate the ocs-ci repository
find / -type d -name "ocs-ci" 2>/dev/null | head -5

Repository: red-hat-storage/ocs-ci

Length of output: 78


🏁 Script executed:

# Search in common locations
ls -la /root/ 2>/dev/null | grep -i ocs || echo "Not in /root"
ls -la /home/ 2>/dev/null | head -20

Repository: red-hat-storage/ocs-ci

Length of output: 245


🏁 Script executed:

# Check if git is available and try direct clone with different approach
which git && git --version

Repository: red-hat-storage/ocs-ci

Length of output: 112


🏁 Script executed:

# Read the specific lines mentioned in the review (845-847)
sed -n '840,860p' /tmp/ocs-ci/ocs_ci/utility/utils.py

Repository: red-hat-storage/ocs-ci

Length of output: 883


🏁 Script executed:

# Get the full exec_cmd function to understand context
rg -A 100 "def exec_cmd" /tmp/ocs-ci/ocs_ci/utility/utils.py | head -150

Repository: red-hat-storage/ocs-ci

Length of output: 3963


🏁 Script executed:

# Search for any calls to exec_cmd that pass stdin parameter
rg "exec_cmd\(" /tmp/ocs-ci --type py -A 2 | grep -E "(exec_cmd|stdin)" | head -50

Repository: red-hat-storage/ocs-ci

Length of output: 5950


🏁 Script executed:

# Get more context around the lines in question to understand the full picture
sed -n '830,870p' /tmp/ocs-ci/ocs_ci/utility/utils.py | cat -n

Repository: red-hat-storage/ocs-ci

Length of output: 2003


🏁 Script executed:

# Search for 'run_kw' variable to see if it exists in the function
rg "run_kw" /tmp/ocs-ci/ocs_ci/utility/utils.py -B 5 -A 5

Repository: red-hat-storage/ocs-ci

Length of output: 48


🏁 Script executed:

# Check the git history to see if there's a recent change
cd /tmp/ocs-ci && git log --oneline -20 -- ocs_ci/utility/utils.py

Repository: red-hat-storage/ocs-ci

Length of output: 1421


🏁 Script executed:

# Check if there's a pending diff or PR that introduces run_kw
rg "if.*input.*not in kwargs" /tmp/ocs-ci --type py

Repository: red-hat-storage/ocs-ci

Length of output: 48


🏁 Script executed:

# Check if maybe the code is at different line numbers or the file has been modified
wc -l /tmp/ocs-ci/ocs_ci/utility/utils.py

Repository: red-hat-storage/ocs-ci

Length of output: 107


🏁 Script executed:

# Get lines around 845-847 directly
sed -n '843,850p' /tmp/ocs-ci/ocs_ci/utility/utils.py | cat -n

Repository: red-hat-storage/ocs-ci

Length of output: 407


🏁 Script executed:

# Try to find the pattern mentioned in the review anywhere in the file
rg "input.*not in kwargs" /tmp/ocs-ci/ocs_ci/utility/utils.py -C 3

Repository: red-hat-storage/ocs-ci

Length of output: 48


🏁 Script executed:

# Check if there are any branches or uncommitted changes
cd /tmp/ocs-ci && git status && git branch -a

Repository: red-hat-storage/ocs-ci

Length of output: 4514


🏁 Script executed:

# The code snippet in the review doesn't match the current code in master
# Let me search across all branches for the pattern
cd /tmp/ocs-ci && git log --all --oneline --grep="input\|run_kw" | head -20

Repository: red-hat-storage/ocs-ci

Length of output: 803


🏁 Script executed:

# Search for recent branches that might have this change
cd /tmp/ocs-ci && for branch in $(git branch -r | head -30); do
  echo "=== Checking $branch ==="
  git show $branch:ocs_ci/utility/utils.py 2>/dev/null | grep -A 5 "run_kw\|if.*input.*not in kwargs" | head -10 && break
done

Repository: red-hat-storage/ocs-ci

Length of output: 98


🏁 Script executed:

# Try checking if maybe the code is in a pull request branch or pending commit
# Look for any mention of stdin handling changes
cd /tmp/ocs-ci && git log --all --oneline -30 -- ocs_ci/utility/utils.py

Repository: red-hat-storage/ocs-ci

Length of output: 1971


🏁 Script executed:

# Let's examine the actual current subprocess.run call in the current code
sed -n '825,860p' /tmp/ocs-ci/ocs_ci/utility/utils.py | cat -n

Repository: red-hat-storage/ocs-ci

Length of output: 1795


🏁 Script executed:

# Check if this is from a specific PR or review context
# Maybe the code is in /home/jailuser/git location mentioned earlier
ls -la /home/jailuser/git/ocs_ci/utility/ 2>/dev/null | head -20

Repository: red-hat-storage/ocs-ci

Length of output: 1300


🏁 Script executed:

# Check the other utils.py location
sed -n '843,850p' /home/jailuser/git/ocs_ci/utility/utils.py 2>/dev/null | cat -n

Repository: red-hat-storage/ocs-ci

Length of output: 491


🏁 Script executed:

# Try to find if run_kw appears anywhere in either location
rg "run_kw" /home/jailuser/git/ocs_ci/ 2>/dev/null | head -10
rg "run_kw" /tmp/ocs-ci/ocs_ci/ 2>/dev/null | head -10

Repository: red-hat-storage/ocs-ci

Length of output: 328


🏁 Script executed:

# Now let's see the full context around run_kw initialization
sed -n '800,860p' /home/jailuser/git/ocs_ci/utility/utils.py | cat -n

Repository: red-hat-storage/ocs-ci

Length of output: 2902


🏁 Script executed:

# Check if any callers pass stdin= to exec_cmd
rg "exec_cmd\(" /home/jailuser/git/ocs_ci --type py -A 3 | grep -E "(exec_cmd\(|stdin)" | head -50

Repository: red-hat-storage/ocs-ci

Length of output: 4764


🏁 Script executed:

# Let's verify the exact issue: check subprocess.run behavior with duplicate stdin
python3 << 'EOF'
import subprocess
import inspect

# Check subprocess.run signature
sig = inspect.signature(subprocess.run)
print("subprocess.run parameters:")
print(sig)
print("\n---")

# Test what happens with both stdin and **kwargs containing stdin
print("Test 1: Passing stdin twice as separate args")
try:
    # Simulate: subprocess.run(cmd, stdin=subprocess.PIPE, **{"stdin": subprocess.PIPE})
    result = subprocess.run(["echo", "test"], stdin=subprocess.PIPE, **{"stdin": subprocess.PIPE})
except TypeError as e:
    print(f"Error: {e}")

print("\nTest 2: Passing stdin via unpacking only")
try:
    result = subprocess.run(["echo", "test"], **{"stdin": subprocess.PIPE})
    print(f"Success with unpacking")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: red-hat-storage/ocs-ci

Length of output: 370


🏁 Script executed:

# Now let's specifically search for any callers passing stdin= to exec_cmd
rg "exec_cmd\(" /home/jailuser/git/ocs_ci --type py -A 5 | grep -B 5 "stdin" | head -40

Repository: red-hat-storage/ocs-ci

Length of output: 48


🏁 Script executed:

# Also check the history of this file to see when run_kw was added
cd /home/jailuser/git && git log --oneline -10 -- ocs_ci/utility/utils.py 2>/dev/null || echo "Not a git repo"

Repository: red-hat-storage/ocs-ci

Length of output: 123


🏁 Script executed:

# Let's verify the exact line numbers
sed -n '843,849p' /home/jailuser/git/ocs_ci/utility/utils.py | cat -n

Repository: red-hat-storage/ocs-ci

Length of output: 447


Handle both input and stdin kwargs to prevent duplicate keyword argument error.

The code checks if "input" not in kwargs: before setting stdin=subprocess.PIPE, which correctly avoids the subprocess constraint that forbids both input= and stdin= together. However, if a caller explicitly passes stdin=... as a kwarg, subprocess.run(cmd, **run_kw, **kwargs) will fail with TypeError: subprocess.run() got multiple values for keyword argument 'stdin'. Update the check to exclude both input and stdin:

💡 Minimal fix
-        if "input" not in kwargs:
+        if "input" not in kwargs and "stdin" not in kwargs:
             run_kw["stdin"] = subprocess.PIPE
🧰 Tools
🪛 Ruff (0.15.12)

[error] 847-847: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ocs_ci/utility/utils.py` around lines 845 - 847, The current logic sets
run_kw["stdin"]=subprocess.PIPE when "input" is not in kwargs but doesn't check
for an explicit "stdin" in kwargs, causing a duplicate-argument TypeError on
subprocess.run(cmd, **run_kw, **kwargs); change the guard to check both keys
(e.g., if "input" not in kwargs and "stdin" not in kwargs:) before assigning to
run_kw, leaving subprocess.run usage unchanged so callers supplying stdin or
input won't collide with run_kw.

Comment on lines +71 to +76
try:
if tls.is_tls_profile_available(silent=True):
log.info("Teardown: deleting leftover ocs-tls-profile")
tls.delete_tls_profile(wait=True, force=True)
except Exception as exc:
log.warning("Teardown: failed to delete TLSProfile: %s", exc)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Teardown should fail closed.

If the finalizer can't delete ocs-tls-profile, the suite only logs a warning and keeps going with cluster TLS state potentially modified. That can poison the next parametrized case or later suites. Re-raise after logging, or explicitly fail the finalizer.

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 75-75: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py`
around lines 71 - 76, The teardown currently swallows exceptions when attempting
to remove the TLS profile; update the finalizer so that after calling
tls.is_tls_profile_available(silent=True) and attempting
tls.delete_tls_profile(wait=True, force=True) any Exception caught is logged and
then re-raised (or explicitly raise a RuntimeError) to fail the finalizer and
prevent a poisoned cluster state; specifically modify the except block around
tls.delete_tls_profile to log the error (using log.warning or log.exception) and
then reraise exc so the test suite fails closed.

Comment on lines +241 to +244
log.info(
"Scanning operator/workload pod logs for TLS-related errors (last ~45m)"
)
assert_no_tls_errors_in_relevant_pod_logs(namespace, component, since="45m")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope the log scan to this test's lifetime.

A fixed since="45m" window makes the result depend on unrelated TLS noise from earlier activity on the same cluster. Any stale operator error inside that window will fail this test even when the lifecycle under test is clean. Capture the test start time and derive the log window from it instead of using a coarse constant.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py`
around lines 241 - 244, Record the test start time at the beginning of this test
(e.g., capture a datetime/timestamp into a variable like test_start_time) and
then pass a derived "since" value based on that timestamp to
assert_no_tls_errors_in_relevant_pod_logs instead of the fixed "45m" string;
update the call that currently reads
assert_no_tls_errors_in_relevant_pod_logs(namespace, component, since="45m") to
use the computed since (for example a RFC3339 timestamp or ISO string derived
from test_start_time) so the log scan window covers only the test's lifetime and
continues to reference assert_no_tls_errors_in_relevant_pod_logs(namespace,
component, since=derived_since).

Signed-off-by: Parag Kamble <pakamble@redhat.com>
Copy link
Copy Markdown

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: pakamble-ib-c09
Cluster Configuration:
PR Test Suite:
PR Test Path: tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Additional Test Params:
OCP VERSION: 4.22
OCS VERSION: 4.22
tested against branch: master

Job PASSED.

Signed-off-by: Parag Kamble <pakamble@redhat.com>
Copy link
Copy Markdown

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: pakamble-ib-c09
Cluster Configuration:
PR Test Suite:
PR Test Path: tests/functional/tlsprofile/test_centralized_tlsprofile_configuration.py
Additional Test Params:
OCP VERSION: 4.22
OCS VERSION: 4.22
tested against branch: master

Job PASSED.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XXL Verified Mark when PR was verified and log provided

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants