Skip to content

fix(kserve): add annotation to use HTTP logger endpoint for Raw deployments#689

Open
ruivieira wants to merge 1 commit intotrustyai-explainability:mainfrom
ruivieira:bugfix/RHOAIENG-38132-kserve-logger-http-annotation
Open

fix(kserve): add annotation to use HTTP logger endpoint for Raw deployments#689
ruivieira wants to merge 1 commit intotrustyai-explainability:mainfrom
ruivieira:bugfix/RHOAIENG-38132-kserve-logger-http-annotation

Conversation

@ruivieira
Copy link
Copy Markdown
Member

@ruivieira ruivieira commented Mar 27, 2026

When TrustyAI is deployed with KServe Raw InferenceServices, the operator injects an HTTPS logger URL. However, KServe's inference logger cannot verify the TLS certificate because *.svc.cluster.local certs are signed by OpenShift's internal Service CA, causing x509 verification failures.

This commit adds an opt-in annotation trustyai.opendatahub.io/kserve-logger-http on the TrustyAIService CR. When set to "true", the operator injects an HTTP logger URL instead of HTTPS for KServe Raw deployments, avoiding the TLS issue for cluster-internal traffic. HTTPS remains the default behavior.

Summary

  • Add trustyai.opendatahub.io/kserve-logger-http annotation support
  • When annotation="true", Raw deployments use HTTP logger URLs
  • HTTPS remains default (backward compatible)
  • No impact on Serverless or ModelMesh deployments

Test plan

  • go build ./... passes
  • make test passes in CI
  • Deploy with annotation and verify HTTP logger URL
  • Deploy without annotation and verify HTTPS logger URL (backward compat)

Fixes: RHOAIENG-38132

Summary by CodeRabbit

  • New Features

    • Added annotation support to configure HTTP logger URLs for KServe InferenceServices, enabling non-HTTPS communication for cluster-internal traffic and avoiding certificate verification issues in certain environments.
  • Tests

    • Introduced comprehensive test coverage for KServe logger URL configuration, validating HTTP and HTTPS logger assignment across multiple deployment scenarios.

…yments

When TrustyAI is deployed with KServe Raw InferenceServices, the operator
injects an HTTPS logger URL. However, KServe's inference logger cannot verify
the TLS certificate because *.svc.cluster.local certs are signed by
OpenShift's internal Service CA, causing x509 verification failures.

This commit adds an opt-in annotation `trustyai.opendatahub.io/kserve-logger-http`
on the TrustyAIService CR. When set to "true", the operator injects an HTTP
logger URL instead of HTTPS for KServe Raw deployments, avoiding the TLS
issue for cluster-internal traffic. HTTPS remains the default behavior.

## Summary
- Add `trustyai.opendatahub.io/kserve-logger-http` annotation support
- When annotation="true", Raw deployments use HTTP logger URLs
- HTTPS remains default (backward compatible)
- No impact on Serverless or ModelMesh deployments

## Test plan
- [ ] `go build ./...` passes
- [ ] `make test` passes in CI
- [ ] Deploy with annotation and verify HTTP logger URL
- [ ] Deploy without annotation and verify HTTPS logger URL (backward compat)

Fixes: RHOAIENG-38132

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ruivieira ruivieira self-assigned this Mar 27, 2026
@ruivieira ruivieira added the kind/bug Something isn't working label Mar 27, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

📝 Walkthrough

Walkthrough

This change introduces annotation-based control for KServe logger URL schemes. A new TrustyAIService CR annotation (kserveLoggerHTTPAnnotationKey) enables operators to inject HTTP (instead of HTTPS) logger URLs into KServe InferenceServices to avoid certificate verification failures for cluster-internal traffic. The reconciler reads this annotation and conditionally passes a boolean flag to the existing patchKServe function.

Changes

Cohort / File(s) Summary
Logger Configuration
controllers/tas/constants.go, controllers/tas/inference_services.go
Added annotation constant for HTTP logger selection; modified handleInferenceServices to read the annotation and conditionally pass !useHTTPLogger to patchKServe, switching logger URL scheme from HTTPS (default) to HTTP when annotation is set to "true".
Test Suite
controllers/tas/inference_services_test.go
New comprehensive Ginkgo test suite validating patchKServe and handleInferenceServices behavior across multiple scenarios: default HTTPS logger, HTTP logger when annotation is "true", and HTTPS fallback when annotation has non-"true" values.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 With a hop and a skip through annotations so fine,
HTTP or HTTPS, we draw the line,
No x509 troubles in clusters today,
Logger URLs dancing the safe, local way! 🌿

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding an annotation to enable HTTP logger endpoints for KServe Raw deployments, which is the primary objective of this PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/tas/inference_services_test.go`:
- Around line 23-31: Replace the fake client usage in the BeforeEach when
constructing testReconciler with an envtest-backed controller-runtime client:
start an envtest.Environment, register the scheme entries (same scheme.Scheme
used now), create a manager and use mgr.GetClient() (or client.New with
mgr.GetConfig()) as the Client for TrustyAIServiceReconciler, and assign
testCtx; ensure you stop/tear down the envtest environment in AfterEach and load
any required CRDs so the envtest API server behavior matches real controller
tests instead of using fake.NewClientBuilder().

In `@controllers/tas/inference_services.go`:
- Around line 255-256: When removing operator-managed logger config
(remove==true) patchKServe currently only compares the existing logger URL
against one computed scheme and can skip deletion if the annotation changed
(HTTP vs HTTPS); update patchKServe to compute both operator-managed variants
(HTTP and HTTPS) and treat the existing logger URL as removable if it equals
either variant (or matches the operator-managed host pattern) so removal does
not depend on which scheme was last stored; adjust logic where patchKServe is
called (the call with !useHTTPLogger) and inside patchKServe to check both
schemes when remove is true.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e1dd1156-eeae-479b-a0a1-597f28beb247

📥 Commits

Reviewing files that changed from the base of the PR and between 8c2cd26 and 7e8cf03.

📒 Files selected for processing (3)
  • controllers/tas/constants.go
  • controllers/tas/inference_services.go
  • controllers/tas/inference_services_test.go

Comment on lines +23 to +31
BeforeEach(func() {
testReconciler = &TrustyAIServiceReconciler{
Client: fake.NewClientBuilder().WithScheme(scheme.Scheme).Build(),
Scheme: scheme.Scheme,
EventRecorder: record.NewFakeRecorder(10),
Namespace: operatorNamespace,
}
testCtx = context.Background()
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use envtest-backed controller tests instead of fake client.

Line 25 builds the reconciler with fake.NewClientBuilder(), which does not meet the controller test harness requirement and can miss API-server behavior differences.

As per coding guidelines, "Use Ginkgo v2 with Gomega assertions and controller-runtime envtest for all controller tests."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/tas/inference_services_test.go` around lines 23 - 31, Replace the
fake client usage in the BeforeEach when constructing testReconciler with an
envtest-backed controller-runtime client: start an envtest.Environment, register
the scheme entries (same scheme.Scheme used now), create a manager and use
mgr.GetClient() (or client.New with mgr.GetConfig()) as the Client for
TrustyAIServiceReconciler, and assign testCtx; ensure you stop/tear down the
envtest environment in AfterEach and load any required CRDs so the envtest API
server behavior matches real controller tests instead of using
fake.NewClientBuilder().

Comment on lines +255 to +256
// Use HTTPS by default, unless the HTTP logger annotation is set
err := r.patchKServe(ctx, instance, infService, namespace, crName, remove, !useHTTPLogger)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Line 256 introduces scheme-coupled cleanup risk for remove=true.

patchKServe removal currently only deletes when the existing logger URL equals the computed scheme URL. After this change, if the annotation value changes before cleanup, removal can be skipped and the old logger remains configured.

💡 Proposed fix (make removal accept either operator-managed scheme)
 func (r *TrustyAIServiceReconciler) patchKServe(ctx context.Context, instance *trustyaiopendatahubiov1alpha1.TrustyAIService, infService kservev1beta1.InferenceService, namespace string, crName string, remove bool, useHTTPS bool) error {
@@
 	if remove {
-		if infService.Spec.Predictor.Logger == nil || *infService.Spec.Predictor.Logger.URL != url {
+		if infService.Spec.Predictor.Logger == nil || infService.Spec.Predictor.Logger.URL == nil {
 			return nil // Removing, but InferenceLogger is not set or is not set to the expected URL. Do nothing.
 		}
+		currentURL := *infService.Spec.Predictor.Logger.URL
+		httpsURL := utils.GenerateHTTPSKServeLoggerURL(crName, namespace)
+		httpURL := utils.GenerateKServeLoggerURL(crName, namespace)
+		if currentURL != httpsURL && currentURL != httpURL {
+			return nil // Do not remove non-operator-managed logger URLs.
+		}
 
 		// Remove the InferenceLogger
 		infService.Spec.Predictor.Logger = nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/tas/inference_services.go` around lines 255 - 256, When removing
operator-managed logger config (remove==true) patchKServe currently only
compares the existing logger URL against one computed scheme and can skip
deletion if the annotation changed (HTTP vs HTTPS); update patchKServe to
compute both operator-managed variants (HTTP and HTTPS) and treat the existing
logger URL as removable if it equals either variant (or matches the
operator-managed host pattern) so removal does not depend on which scheme was
last stored; adjust logic where patchKServe is called (the call with
!useHTTPLogger) and inside patchKServe to check both schemes when remove is
true.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 27, 2026

@ruivieira: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/trustyai-service-operator-e2e 7e8cf03 link true /test trustyai-service-operator-e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ruivieira ruivieira moved this to In Review in TrustyAI planning Mar 27, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: m-misiura

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working lgtm

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

2 participants