ECOPROJECT-4359 | feat: Add metrics cache to reduce database queries for inventory statistics#1180
ECOPROJECT-4359 | feat: Add metrics cache to reduce database queries for inventory statistics#1180AvielSegev wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughThis PR introduces a metrics caching layer for inventory statistics with cooldown-based refresh logic. A new ChangesMetrics Caching System
Sequence Diagram(s)sequenceDiagram
participant Client
participant AssessmentService
participant Store
participant MetricsCache
participant Assessment
Client->>AssessmentService: CreateAssessment()
AssessmentService->>Store: Create(assessment)
AssessmentService->>Store: RequestMetricsCacheRefresh()
Store->>MetricsCache: RequestMetricsCacheRefresh()
Note over MetricsCache: needsUpdate = true
Client->>Store: Statistics()
Store->>MetricsCache: GetStats()
alt shouldRefresh() = true
MetricsCache->>Assessment: List()
Assessment-->>MetricsCache: assessments
MetricsCache->>MetricsCache: build InventoryStats
Note over MetricsCache: stats cached,<br/>lastRefresh updated,<br/>needsUpdate = false
else cached within cooldown
Note over MetricsCache: return cached stats
end
MetricsCache-->>Store: InventoryStats
Store-->>Client: InventoryStats
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/service/assessment.go (1)
237-254:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winMissing cache refresh in UpdateAssessment early return path.
When
assessment.SourceID != nil, the function updates the assessment and commits at line 249, then returns at line 254 without callingRequestMetricsCacheRefresh(). This means the metrics cache is not invalidated when assessments with source IDs are updated.This is inconsistent with the other update path (line 262) and the create/delete operations.
🔧 Proposed fix
if _, err := store.Commit(ctx); err != nil { return nil, err } + as.store.RequestMetricsCacheRefresh() tracer.Success().WithString("update_type", "with_new_snapshot").Log() return as.GetAssessment(ctx, id) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/assessment.go` around lines 237 - 254, The early-return branch in UpdateAssessment that handles assessment.SourceID != nil updates via as.store.Assessment().Update and commits with store.Commit but fails to call RequestMetricsCacheRefresh(), leaving metrics cache stale; modify that branch to invoke RequestMetricsCacheRefresh(ctx) (or the existing service method used elsewhere) after a successful commit and before returning from UpdateAssessment (the same place other update paths call RequestMetricsCacheRefresh), keeping the tracer steps (tracer.Success) around the refresh call and then return as.GetAssessment(ctx, id).internal/store/metric_cache.go (1)
1-89:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd unit tests for MetricsCache.
This critical caching component lacks unit tests. The cooldown logic, concurrent access, and singleflight deduplication should be thoroughly tested.
Test coverage should include:
- Initial refresh when cache is empty
- Cache hit when cooldown not expired
- Refresh after min cooldown when needsUpdate is set
- Refresh after max cooldown regardless of needsUpdate
- Concurrent GetStats calls (singleflight deduplication)
- Error handling when assessment store fails
- RequestMetricsCacheRefresh flag behavior
As per coding guidelines: "Make sure that the code has unit tests."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/store/metric_cache.go` around lines 1 - 89, Add a new unit test file that exercises MetricsCache behavior: instantiate MetricsCache with a mock Assessment implementation and write tests for GetStats to cover initial refresh when cache is empty (verify List called and stats stored), cache hit when cooldown not expired (verify List not called), refresh after min cooldown when RequestMetricsCacheRefresh sets needsUpdate (verify List called after advancing clock or manipulating lastRefresh), refresh after maxCooldownPeriod regardless of needsUpdate (manipulate lastRefresh to older than maxCooldownPeriod and verify refresh), concurrent GetStats calls (spawn multiple goroutines and assert the mock List is only invoked once to test singleflight deduplication), error propagation when the Assessment.List returns an error (assert GetStats returns wrapped error), and RequestMetricsCacheRefresh toggling (call RequestMetricsCacheRefresh and assert shouldRefresh behavior); use small helper functions or a fake clock by directly setting mc.lastRefresh and mc.needsUpdate to control timing, and reference MetricsCache, GetStats, RequestMetricsCacheRefresh, shouldRefresh, and the Assessment.List method in your tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/service/assessment.go`:
- Line 262: RequestMetricsCacheRefresh() is being invoked before the database
transaction is committed, causing premature cache invalidation and race
conditions; move the call so it executes after the transaction commit succeeds.
Specifically, locate the RequestMetricsCacheRefresh() call in
internal/service/assessment.go and relocate it to run only after the transaction
commit (e.g., after tx.Commit() or the function that finalizes the transaction
in this scope), ensuring it is not executed if the commit fails. Keep the same
call but place it immediately after the commit-success branch and before
returning to guarantee cache refresh only on successful persistence.
- Line 192: Move the call to as.store.RequestMetricsCacheRefresh() so it runs
only after the DB transaction successfully commits; currently it is invoked
before the commit (the commit call like tx.Commit() / Commit() occurs after line
192), causing premature cache invalidation and race conditions. Locate the
function/method where as.store.RequestMetricsCacheRefresh() is called (in the
assessment creation flow in internal/service/assessment.go) and relocate that
call to immediately after the transaction commit/Commit() succeeds, ensuring it
is not executed on rollback or commit failure.
In `@internal/store/metric_cache.go`:
- Line 64: The return currently does an unchecked type assertion on v:
"v.(model.InventoryStats)" which can panic if the closure changes; update the
code to perform a defensive assertion (stats, ok := v.(model.InventoryStats))
and return a zero value plus an error describing the unexpected type (e.g.,
fmt.Errorf("unexpected type from singleflight: %T", v)) when ok is false,
ensuring the calling function (the code that invokes the singleflight result)
propagates that error; add the fmt import if not present.
- Around line 78-81: The existing comment "Potential change by other pods" is
misleading because this code uses no distributed invalidation; update the
comment around the time.Since(time.Unix(0, last)) > maxCooldownPeriod check (and
any nearby mention of last and maxCooldownPeriod) to state explicitly that each
pod holds independent cache state, that entries may remain stale for up to
maxCooldownPeriod (currently 3 hours), and that this is a deliberate freshness
vs. performance trade-off (or note next steps such as exposing maxCooldownPeriod
as a config or implementing a distributed invalidation mechanism if you want
stronger consistency).
- Around line 37-65: Instrument MetricsCache.GetStats to emit structured logs
and metrics: log and increment a cache-hit counter when mc.stats.Load() returns
a fresh pointer (use a "cache_hits" counter and a logger on mc if present), and
on miss increment a "cache_misses" counter, record start time, then wrap the
mc.group.Do("refresh_stats", ...) call to observe duration and outcome; after Do
completes record a histogram/summary metric for refresh duration (e.g.,
cache_refresh_duration_seconds), increment a "cache_refresh_success" counter on
success and "cache_refresh_errors" on error, and log structured messages
including the operation ("refresh_stats"), duration, and the error value when
err != nil. Ensure you reference the existing symbols mc.stats.Load,
mc.shouldRefresh, mc.group.Do, and mc.lastRefresh/mc.needsUpdate when adding the
instrumentation so the placement is correct.
In `@internal/store/store.go`:
- Around line 112-114: The current DataStore.Statistics simply returns
s.MetricCache.GetStats(ctx) which fails hard when a refresh error occurs; change
it to attempt a cached fallback: call GetStats and if it returns an error, try
to retrieve stale cached stats (e.g., via a new or existing method like
MetricCache.GetCachedStats(ctx) or MetricCache.LastStats()) and return those
with a logged warning, but only propagate the original error when no cached
stats exist; implement logging inside DataStore.Statistics to record the refresh
failure and that stale data is being served.
- Line 41: Change the exported field MetricCache to an unexported metricCache on
the Store struct and update all internal references to use metricCache;
specifically, rename the struct field MetricCache -> metricCache and replace
usages of Store.MetricCache in functions/methods referenced (previously at lines
noted for lookups) with Store.metricCache so external packages cannot directly
mutate the cache while internal code continues to access it via the new private
field.
---
Outside diff comments:
In `@internal/service/assessment.go`:
- Around line 237-254: The early-return branch in UpdateAssessment that handles
assessment.SourceID != nil updates via as.store.Assessment().Update and commits
with store.Commit but fails to call RequestMetricsCacheRefresh(), leaving
metrics cache stale; modify that branch to invoke
RequestMetricsCacheRefresh(ctx) (or the existing service method used elsewhere)
after a successful commit and before returning from UpdateAssessment (the same
place other update paths call RequestMetricsCacheRefresh), keeping the tracer
steps (tracer.Success) around the refresh call and then return
as.GetAssessment(ctx, id).
In `@internal/store/metric_cache.go`:
- Around line 1-89: Add a new unit test file that exercises MetricsCache
behavior: instantiate MetricsCache with a mock Assessment implementation and
write tests for GetStats to cover initial refresh when cache is empty (verify
List called and stats stored), cache hit when cooldown not expired (verify List
not called), refresh after min cooldown when RequestMetricsCacheRefresh sets
needsUpdate (verify List called after advancing clock or manipulating
lastRefresh), refresh after maxCooldownPeriod regardless of needsUpdate
(manipulate lastRefresh to older than maxCooldownPeriod and verify refresh),
concurrent GetStats calls (spawn multiple goroutines and assert the mock List is
only invoked once to test singleflight deduplication), error propagation when
the Assessment.List returns an error (assert GetStats returns wrapped error),
and RequestMetricsCacheRefresh toggling (call RequestMetricsCacheRefresh and
assert shouldRefresh behavior); use small helper functions or a fake clock by
directly setting mc.lastRefresh and mc.needsUpdate to control timing, and
reference MetricsCache, GetStats, RequestMetricsCacheRefresh, shouldRefresh, and
the Assessment.List method in your tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e5d70c9c-68da-4a86-92dd-d5f90a7c949f
📒 Files selected for processing (4)
internal/service/assessment.gointernal/service/sizer_test.gointernal/store/metric_cache.gointernal/store/store.go
| // Potential change by other pods | ||
| if time.Since(time.Unix(0, last)) > maxCooldownPeriod { | ||
| return true | ||
| } |
There was a problem hiding this comment.
Clarify distributed cache invalidation limitations.
The comment "Potential change by other pods" is misleading. This implementation has no distributed cache invalidation—each pod maintains independent cache state. When pod A modifies an assessment, pod B's cache won't refresh until maxCooldownPeriod (3 hours) expires, potentially serving stale data.
If this trade-off is intentional (freshness vs. performance), consider clarifying the comment and documenting the staleness window in multi-pod deployments.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/store/metric_cache.go` around lines 78 - 81, The existing comment
"Potential change by other pods" is misleading because this code uses no
distributed invalidation; update the comment around the time.Since(time.Unix(0,
last)) > maxCooldownPeriod check (and any nearby mention of last and
maxCooldownPeriod) to state explicitly that each pod holds independent cache
state, that entries may remain stale for up to maxCooldownPeriod (currently 3
hours), and that this is a deliberate freshness vs. performance trade-off (or
note next steps such as exposing maxCooldownPeriod as a config or implementing a
distributed invalidation mechanism if you want stronger consistency).
| func (s *DataStore) Statistics(ctx context.Context) (model.InventoryStats, error) { | ||
| assessments, err := s.Assessment().List(ctx, NewAssessmentQueryFilter()) | ||
| if err != nil { | ||
| return model.InventoryStats{}, err | ||
| } | ||
| return model.NewInventoryStats(assessments), nil | ||
| return s.MetricCache.GetStats(ctx) | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff
Consider fallback when cache refresh fails.
When MetricCache.GetStats(ctx) fails (e.g., database temporarily unavailable during refresh), Statistics() returns an error. Should there be a fallback to return stale cached data with a warning, or is failing fast the intended behavior?
This depends on your availability vs. freshness requirements. If stale stats are acceptable during transient failures, consider:
- Returning cached stats with a logged warning when refresh fails
- Only failing when no cached stats exist
If failing fast is correct (e.g., stats must be fresh), document this behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/store/store.go` around lines 112 - 114, The current
DataStore.Statistics simply returns s.MetricCache.GetStats(ctx) which fails hard
when a refresh error occurs; change it to attempt a cached fallback: call
GetStats and if it returns an error, try to retrieve stale cached stats (e.g.,
via a new or existing method like MetricCache.GetCachedStats(ctx) or
MetricCache.LastStats()) and return those with a logged warning, but only
propagate the original error when no cached stats exist; implement logging
inside DataStore.Statistics to record the refresh failure and that stale data is
being served.
2ad971a to
ed2751b
Compare
…for inventory statistics Implements a thread-safe cache layer for inventory statistics to avoid repeated database queries. The cache uses atomic operations and the singleflight pattern to prevent duplicate concurrent refreshes. Key features: - Configurable cooldown periods (5min-3hr) to balance freshness vs performance - Automatic invalidation on assessment create/update/delete operations - Thread-safe concurrent access using atomic.Pointer and atomic.Bool - Singleflight deduplication prevents thundering herd on cache misses Changes: - Add MetricsCache type with GetStats() and RequestMetricsCacheRefresh() - Modify Statistics() to use cache instead of direct database queries - Trigger cache refresh after assessment mutations - Update MockStore in tests to implement new interface method Signed-off-by: Aviel Segev <asegev@redhat.com>
Implements a thread-safe cache layer for inventory statistics to avoid
repeated database queries. The cache uses atomic operations and the
singleflight pattern to prevent duplicate concurrent refreshes.
Key features:
Changes:
Signed-off-by: Aviel Segev asegev@redhat.com
Summary by CodeRabbit