
ECOPROJECT-4359 | feat: Add metrics cache to reduce database queries for inventory statistics #1180

Open
AvielSegev wants to merge 1 commit into kubev2v:main from AvielSegev:stat-cache

Conversation

@AvielSegev
Collaborator

@AvielSegev AvielSegev commented May 17, 2026

Implements a thread-safe cache layer for inventory statistics to avoid
repeated database queries. The cache uses atomic operations and the
singleflight pattern to prevent duplicate concurrent refreshes.

Key features:

  • Configurable cooldown periods (5min-3hr) to balance freshness vs performance
  • Automatic invalidation on assessment create/update/delete operations
  • Thread-safe concurrent access using atomic.Pointer and atomic.Bool
  • Singleflight deduplication prevents thundering herd on cache misses

Changes:

  • Add MetricsCache type with GetStats() and RequestMetricsCacheRefresh()
  • Modify Statistics() to use cache instead of direct database queries
  • Trigger cache refresh after assessment mutations
  • Update MockStore in tests to implement new interface method

Signed-off-by: Aviel Segev <asegev@redhat.com>

Summary by CodeRabbit

  • New Features
    • Inventory statistics are now cached with cooldown-based refresh logic, reducing repeated database queries.
    • Creating, updating, or deleting an assessment flags the cache for refresh, so the next read recomputes the statistics once the minimum cooldown has elapsed.


@openshift-ci

openshift-ci Bot commented May 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nirarg for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented May 17, 2026

Warning

Rate limit exceeded

@AvielSegev has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 33 minutes and 7 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6cfb17d3-02b5-4f52-8e94-6f2cae8e5164

📥 Commits

Reviewing files that changed from the base of the PR and between 5684a18 and 67ac9bf.

📒 Files selected for processing (6)
  • internal/handlers/v1alpha1/sizer_test.go
  • internal/rvtools/jobs/worker.go
  • internal/service/assessment.go
  • internal/service/sizer_test.go
  • internal/store/metric_cache.go
  • internal/store/store.go
📝 Walkthrough

Walkthrough

This PR introduces a metrics caching layer for inventory statistics with cooldown-based refresh logic. A new MetricsCache component caches InventoryStats using atomic state and singleflight deduplication. The cache is integrated into the Store interface and DataStore, with Statistics() now delegating to the cache. Assessment mutations trigger cache refresh requests via a new RequestMetricsCacheRefresh() method chain.

Changes

Metrics Caching System

Layer | File(s) | Summary
Metrics cache implementation with cooldown logic | internal/store/metric_cache.go | MetricsCache keeps its state in atomics (a stats pointer, a lastRefresh timestamp, and a needsUpdate flag) and uses singleflight to deduplicate concurrent refresh calls. GetStats returns the cached stats when the cooldown rules permit; otherwise it refreshes from the assessment store. RequestMetricsCacheRefresh sets the needsUpdate flag. shouldRefresh returns true when the cache is empty, when the max cooldown has elapsed (to pick up changes made by other pods), or when needsUpdate is set and the min cooldown has elapsed.
Store interface and cache wiring | internal/store/store.go | The Store interface adds a RequestMetricsCacheRefresh() method. DataStore gains a MetricCache field, initialized via NewMetricsCache(assessment) in NewStore. Statistics(ctx) now delegates to MetricCache.GetStats(ctx) instead of listing assessments directly. DataStore.RequestMetricsCacheRefresh() forwards refresh requests to MetricCache.RequestMetricsCacheRefresh().
Assessment service cache refresh triggers | internal/service/assessment.go, internal/service/sizer_test.go | CreateAssessment, UpdateAssessment, and DeleteAssessment call as.store.RequestMetricsCacheRefresh() after successful database mutations to signal cache invalidation. MockStore in sizer_test.go adds a no-op RequestMetricsCacheRefresh() method to satisfy the updated Store interface.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant AssessmentService
  participant Store
  participant MetricsCache
  participant Assessment

  Client->>AssessmentService: CreateAssessment()
  AssessmentService->>Store: Create(assessment)
  AssessmentService->>Store: RequestMetricsCacheRefresh()
  Store->>MetricsCache: RequestMetricsCacheRefresh()
  Note over MetricsCache: needsUpdate = true

  Client->>Store: Statistics()
  Store->>MetricsCache: GetStats()
  alt shouldRefresh() = true
    MetricsCache->>Assessment: List()
    Assessment-->>MetricsCache: assessments
    MetricsCache->>MetricsCache: build InventoryStats
    Note over MetricsCache: stats cached,<br/>lastRefresh updated,<br/>needsUpdate = false
  else cached within cooldown
    Note over MetricsCache: return cached stats
  end
  MetricsCache-->>Store: InventoryStats
  Store-->>Client: InventoryStats

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm

Suggested reviewers

  • nirarg
  • ronenav
  • tupyy

Poem

🐰 A cache springs forth with cooling grace,
Stats flow swift through atomic space,
Cooldown timers guard the way,
Service mutations call refresh today!
Metrics bloom with singleflight care.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a metrics cache to reduce database queries for inventory statistics, which is the central focus of the entire changeset.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@AvielSegev AvielSegev marked this pull request as ready for review May 17, 2026 13:07
@AvielSegev AvielSegev requested a review from a team as a code owner May 17, 2026 13:07
@AvielSegev AvielSegev requested review from ronenav and tupyy and removed request for a team May 17, 2026 13:07

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/service/assessment.go (1)

237-254: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Missing cache refresh in UpdateAssessment early return path.

When assessment.SourceID != nil, the function updates the assessment and commits at line 249, then returns at line 254 without calling RequestMetricsCacheRefresh(). This means the metrics cache is not invalidated when assessments with source IDs are updated.

This is inconsistent with the other update path (line 262) and the create/delete operations.

🔧 Proposed fix
 		if _, err := store.Commit(ctx); err != nil {
 			return nil, err
 		}
+		as.store.RequestMetricsCacheRefresh()
 
 		tracer.Success().WithString("update_type", "with_new_snapshot").Log()
 		return as.GetAssessment(ctx, id)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/assessment.go` around lines 237 - 254, the early-return
branch in UpdateAssessment that handles assessment.SourceID != nil updates via
as.store.Assessment().Update and commits with store.Commit but never calls
RequestMetricsCacheRefresh(), leaving the metrics cache stale. Modify that
branch to call as.store.RequestMetricsCacheRefresh() (the same no-argument
store method the other update path uses) after a successful commit and before
tracer.Success() and the final return as.GetAssessment(ctx, id).
internal/store/metric_cache.go (1)

1-89: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add unit tests for MetricsCache.

This critical caching component lacks unit tests. The cooldown logic, concurrent access, and singleflight deduplication should be thoroughly tested.

Test coverage should include:

  • Initial refresh when cache is empty
  • Cache hit when cooldown not expired
  • Refresh after min cooldown when needsUpdate is set
  • Refresh after max cooldown regardless of needsUpdate
  • Concurrent GetStats calls (singleflight deduplication)
  • Error handling when assessment store fails
  • RequestMetricsCacheRefresh flag behavior

As per coding guidelines: "Make sure that the code has unit tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/store/metric_cache.go` around lines 1 - 89, Add a new unit test file
that exercises MetricsCache behavior: instantiate MetricsCache with a mock
Assessment implementation and write tests for GetStats to cover initial refresh
when cache is empty (verify List called and stats stored), cache hit when
cooldown not expired (verify List not called), refresh after min cooldown when
RequestMetricsCacheRefresh sets needsUpdate (verify List called after advancing
clock or manipulating lastRefresh), refresh after maxCooldownPeriod regardless
of needsUpdate (manipulate lastRefresh to older than maxCooldownPeriod and
verify refresh), concurrent GetStats calls (spawn multiple goroutines and assert
the mock List is only invoked once to test singleflight deduplication), error
propagation when the Assessment.List returns an error (assert GetStats returns
wrapped error), and RequestMetricsCacheRefresh toggling (call
RequestMetricsCacheRefresh and assert shouldRefresh behavior); use small helper
functions or a fake clock by directly setting mc.lastRefresh and mc.needsUpdate
to control timing, and reference MetricsCache, GetStats,
RequestMetricsCacheRefresh, shouldRefresh, and the Assessment.List method in
your tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/service/assessment.go`:
- Line 262: RequestMetricsCacheRefresh() is being invoked before the database
transaction is committed, causing premature cache invalidation and race
conditions; move the call so it executes after the transaction commit succeeds.
Specifically, locate the RequestMetricsCacheRefresh() call in
internal/service/assessment.go and relocate it to run only after the transaction
commit (e.g., after tx.Commit() or the function that finalizes the transaction
in this scope), ensuring it is not executed if the commit fails. Keep the same
call but place it immediately after the commit-success branch and before
returning to guarantee cache refresh only on successful persistence.
- Line 192: Move the call to as.store.RequestMetricsCacheRefresh() so it runs
only after the DB transaction successfully commits; currently it is invoked
before the commit (the commit call like tx.Commit() / Commit() occurs after line
192), causing premature cache invalidation and race conditions. Locate the
function/method where as.store.RequestMetricsCacheRefresh() is called (in the
assessment creation flow in internal/service/assessment.go) and relocate that
call to immediately after the transaction commit/Commit() succeeds, ensuring it
is not executed on rollback or commit failure.

In `@internal/store/metric_cache.go`:
- Line 64: The return currently does an unchecked type assertion on v:
"v.(model.InventoryStats)" which can panic if the closure changes; update the
code to perform a defensive assertion (stats, ok := v.(model.InventoryStats))
and return a zero value plus an error describing the unexpected type (e.g.,
fmt.Errorf("unexpected type from singleflight: %T", v)) when ok is false,
ensuring the calling function (the code that invokes the singleflight result)
propagates that error; add the fmt import if not present.
- Around line 78-81: The existing comment "Potential change by other pods" is
misleading because this code uses no distributed invalidation; update the
comment around the time.Since(time.Unix(0, last)) > maxCooldownPeriod check (and
any nearby mention of last and maxCooldownPeriod) to state explicitly that each
pod holds independent cache state, that entries may remain stale for up to
maxCooldownPeriod (currently 3 hours), and that this is a deliberate freshness
vs. performance trade-off (or note next steps such as exposing maxCooldownPeriod
as a config or implementing a distributed invalidation mechanism if you want
stronger consistency).
- Around line 37-65: Instrument MetricsCache.GetStats to emit structured logs
and metrics: log and increment a cache-hit counter when mc.stats.Load() returns
a fresh pointer (use a "cache_hits" counter and a logger on mc if present), and
on miss increment a "cache_misses" counter, record start time, then wrap the
mc.group.Do("refresh_stats", ...) call to observe duration and outcome; after Do
completes record a histogram/summary metric for refresh duration (e.g.,
cache_refresh_duration_seconds), increment a "cache_refresh_success" counter on
success and "cache_refresh_errors" on error, and log structured messages
including the operation ("refresh_stats"), duration, and the error value when
err != nil. Ensure you reference the existing symbols mc.stats.Load,
mc.shouldRefresh, mc.group.Do, and mc.lastRefresh/mc.needsUpdate when adding the
instrumentation so the placement is correct.

In `@internal/store/store.go`:
- Around line 112-114: The current DataStore.Statistics simply returns
s.MetricCache.GetStats(ctx) which fails hard when a refresh error occurs; change
it to attempt a cached fallback: call GetStats and if it returns an error, try
to retrieve stale cached stats (e.g., via a new or existing method like
MetricCache.GetCachedStats(ctx) or MetricCache.LastStats()) and return those
with a logged warning, but only propagate the original error when no cached
stats exist; implement logging inside DataStore.Statistics to record the refresh
failure and that stale data is being served.
- Line 41: Rename the exported DataStore field MetricCache to an unexported
metricCache and update the internal references to it (in NewStore, Statistics,
and RequestMetricsCacheRefresh), so external packages cannot mutate the cache
directly while code inside the store package keeps using it through the
private field.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e5d70c9c-68da-4a86-92dd-d5f90a7c949f

📥 Commits

Reviewing files that changed from the base of the PR and between 8afc415 and 5684a18.

📒 Files selected for processing (4)
  • internal/service/assessment.go
  • internal/service/sizer_test.go
  • internal/store/metric_cache.go
  • internal/store/store.go

Comment thread internal/service/assessment.go Outdated
Comment thread internal/service/assessment.go Outdated
Comment thread internal/store/metric_cache.go
Comment thread internal/store/metric_cache.go
Comment on lines +78 to +81
+	// Potential change by other pods
+	if time.Since(time.Unix(0, last)) > maxCooldownPeriod {
+		return true
+	}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify distributed cache invalidation limitations.

The comment "Potential change by other pods" is misleading. This implementation has no distributed cache invalidation—each pod maintains independent cache state. When pod A modifies an assessment, pod B's cache won't refresh until maxCooldownPeriod (3 hours) expires, potentially serving stale data.

If this trade-off is intentional (freshness vs. performance), consider clarifying the comment and documenting the staleness window in multi-pod deployments.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/store/metric_cache.go` around lines 78 - 81, The existing comment
"Potential change by other pods" is misleading because this code uses no
distributed invalidation; update the comment around the time.Since(time.Unix(0,
last)) > maxCooldownPeriod check (and any nearby mention of last and
maxCooldownPeriod) to state explicitly that each pod holds independent cache
state, that entries may remain stale for up to maxCooldownPeriod (currently 3
hours), and that this is a deliberate freshness vs. performance trade-off (or
note next steps such as exposing maxCooldownPeriod as a config or implementing a
distributed invalidation mechanism if you want stronger consistency).

Comment thread internal/store/store.go Outdated
Comment thread internal/store/store.go
Comment on lines 112 to +114
 func (s *DataStore) Statistics(ctx context.Context) (model.InventoryStats, error) {
-	assessments, err := s.Assessment().List(ctx, NewAssessmentQueryFilter())
-	if err != nil {
-		return model.InventoryStats{}, err
-	}
-	return model.NewInventoryStats(assessments), nil
+	return s.MetricCache.GetStats(ctx)
 }

🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff

Consider fallback when cache refresh fails.

When MetricCache.GetStats(ctx) fails (e.g., database temporarily unavailable during refresh), Statistics() returns an error. Should there be a fallback to return stale cached data with a warning, or is failing fast the intended behavior?

This depends on your availability vs. freshness requirements. If stale stats are acceptable during transient failures, consider:

  • Returning cached stats with a logged warning when refresh fails
  • Only failing when no cached stats exist

If failing fast is correct (e.g., stats must be fresh), document this behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/store/store.go` around lines 112 - 114, The current
DataStore.Statistics simply returns s.MetricCache.GetStats(ctx) which fails hard
when a refresh error occurs; change it to attempt a cached fallback: call
GetStats and if it returns an error, try to retrieve stale cached stats (e.g.,
via a new or existing method like MetricCache.GetCachedStats(ctx) or
MetricCache.LastStats()) and return those with a logged warning, but only
propagate the original error when no cached stats exist; implement logging
inside DataStore.Statistics to record the refresh failure and that stale data is
being served.

@AvielSegev AvielSegev force-pushed the stat-cache branch 4 times, most recently from 2ad971a to ed2751b on May 17, 2026 13:32
…for inventory statistics

Implements a thread-safe cache layer for inventory statistics to avoid
repeated database queries. The cache uses atomic operations and the
singleflight pattern to prevent duplicate concurrent refreshes.

Key features:
- Configurable cooldown periods (5min-3hr) to balance freshness vs performance
- Automatic invalidation on assessment create/update/delete operations
- Thread-safe concurrent access using atomic.Pointer and atomic.Bool
- Singleflight deduplication prevents thundering herd on cache misses

Changes:
- Add MetricsCache type with GetStats() and RequestMetricsCacheRefresh()
- Modify Statistics() to use cache instead of direct database queries
- Trigger cache refresh after assessment mutations
- Update MockStore in tests to implement new interface method

Signed-off-by: Aviel Segev <asegev@redhat.com>

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant