fix(cases): async case duration sync #2781
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8ee692687f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
✅ No security or compliance issues detected. Reviewed everything up to 32576dc.
1 issue found across 18 files
Confidence score: 3/5
- There is a concrete reliability risk in `tracecat/cases/durations/consumer.py`: failed/unacked jobs are only reclaimed on idle reads, so retries can be starved indefinitely when the stream stays busy.
- Given the medium severity (6/10) with fairly high confidence (8/10) and direct user-facing impact on retry behavior, this carries some merge risk rather than being a minor housekeeping issue.
- Pay close attention to `tracecat/cases/durations/consumer.py`: reclaim logic tied to idle reads may prevent timely recovery of failed/unacked jobs under sustained load.
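One way to avoid the starvation described above is to reclaim stale jobs on a time interval instead of only when a read comes back empty. The sketch below is a minimal in-memory illustration, not the PR's consumer: `JobSource`, `ReclaimScheduler`, and `poll_once` are hypothetical names, and with redis-py the two source methods would presumably wrap `xreadgroup` and `xautoclaim`.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Protocol


class JobSource(Protocol):
    """Hypothetical interface over a stream consumer group."""

    def read_new(self, count: int) -> list[str]: ...

    def reclaim_stale(self, count: int) -> list[str]: ...


@dataclass
class ReclaimScheduler:
    """Reports when a reclaim pass is due, independent of stream traffic."""

    interval_s: float
    clock: Callable[[], float] = time.monotonic
    _last: float = field(default=float("-inf"), init=False)

    def due(self) -> bool:
        now = self.clock()
        if now - self._last >= self.interval_s:
            self._last = now
            return True
        return False


def poll_once(source: JobSource, scheduler: ReclaimScheduler, count: int = 10) -> list[str]:
    """One loop iteration: reclaim on schedule first, then read new jobs."""
    jobs: list[str] = []
    if scheduler.due():
        # Runs even when new jobs keep arriving, so retries are not starved.
        jobs.extend(source.reclaim_stale(count))
    jobs.extend(source.read_new(count))
    return jobs
```

With this shape, a busy stream delays a retry by at most roughly `interval_s`, at the cost of one extra reclaim call per interval.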
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 19974b485d
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b21e7f525d
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8d25a0a4b
1 issue found across 5 files (changes from recent commits).
e8d25a0 to db63b32
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db63b32fd8
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bb491c2483
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32576dca0a
    fallbackEventType: CaseDurationAnchorEventType
    ): CaseDurationFormValues["start"] {
      if (isCaseDropdownEventType(anchor.event_type)) {
        const eventType = isCaseDurationAnchorEventType(anchor.event_type)
Avoid replacing legacy viewed anchors on save
When an existing duration definition still has a `case_viewed` anchor, this fallback initializes the edit form as `case_created`/`case_closed`. The submit handler always sends both anchors, so a user who only edits the name or description and saves will silently rewrite the legacy anchor and change the metric instead of preserving or explicitly migrating it. This affects workspaces with preexisting `case_viewed` duration definitions that are still returned by the backend compatibility path.
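One possible way to address this is to send only the fields the user actually touched, so a fallback value that merely initialized the form never reaches the backend. The sketch below shows the idea in Python for brevity, although the form code under review is TypeScript; `build_update_payload` and the field names are hypothetical and not taken from the PR.

```python
from typing import Any


def build_update_payload(
    form_values: dict[str, Any], dirty_fields: set[str]
) -> dict[str, Any]:
    """Return only the fields the user edited, in a stable order."""
    return {
        name: form_values[name]
        for name in sorted(dirty_fields)
        if name in form_values
    }


# The form was initialized with fallback anchors, but only the name was edited.
form_values = {
    "name": "Time to close (renamed)",
    "start_anchor": "case_created",  # fallback shown in the form, not a user choice
    "end_anchor": "case_closed",
}
payload = build_update_payload(form_values, dirty_fields={"name"})
```

Here `payload` contains only `name`, so the stored legacy anchor is left untouched unless the user explicitly changes it.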
Summary
Fixes ENG-1462: https://linear.app/tracecat/issue/ENG-1462/investigate-slow-case-loads-from-synchronous-duration-sync
This removes case-duration recomputation from case read paths and moves normal mutation-driven duration materialization into an after-commit, coalesced Redis stream consumer.
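The after-commit, coalesced enqueue described above can be pictured with a small in-memory sketch. It assumes coalescing is keyed by `(workspace_id, case_id)`, as the PR states; the class and method names (`PendingDurationSync`, `request`, `flush_after_commit`, `discard_on_rollback`) are hypothetical, and publishing to the Redis stream is left out.

```python
from dataclasses import dataclass, field

CaseKey = tuple[str, str]  # (workspace_id, case_id)


@dataclass
class PendingDurationSync:
    """Collects sync requests during a transaction and emits each case once."""

    # A dict is used as an ordered set so first-request order is preserved.
    _keys: dict[CaseKey, None] = field(default_factory=dict)

    def request(self, workspace_id: str, case_id: str) -> None:
        """Record that a case needs its durations recomputed."""
        self._keys.setdefault((workspace_id, case_id), None)

    def flush_after_commit(self) -> list[CaseKey]:
        """Return the deduplicated keys to enqueue, then reset."""
        keys = list(self._keys)
        self._keys.clear()
        return keys

    def discard_on_rollback(self) -> None:
        """Drop pending requests so a rolled-back mutation enqueues nothing."""
        self._keys.clear()


pending = PendingDurationSync()
pending.request("ws-1", "case-1")
pending.request("ws-1", "case-1")  # coalesced with the first request
pending.request("ws-1", "case-2")
to_enqueue = pending.flush_after_commit()
```

Flushing only after commit means a worker never picks up a job for changes that were rolled back, and several mutations to one case in a transaction produce a single job.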
Changes:
- Make `GET /cases/{case_id}/durations` read-only; it now lists materialized rows without syncing first.
- Add `duration_sync="async" | "inline" | "none"` to case event creation.
- Treat `CASE_VIEWED` as audit-only for durations and reject `case_viewed` as a duration anchor.
- … `(workspace_id, case_id)`, skips irrelevant event types, and uses per-case PG advisory locks.
- … `TRACECAT__CASE_DURATION_SYNC_ENABLED=false`, so the flag is a safe async-worker kill switch.
Motivation
Opening a case could previously trigger writes from GET-time paths:
- … `CASE_VIEWED` event,
- … `sync_case_durations()`, and
- `GET /durations` also synced durations before listing.
When workflows mutated the same case concurrently, those synchronous recomputes amplified write contention around `case_duration` and held request transactions open. Other non-case pages remained snappy because they did not hit this write-on-read path.
Benchmarks
Hot-case profile for before/after comparison:
- 1 case
- 40 duration definitions
- 300 history events
- 4 mutators x 8 mutations
- 12 case loads
- 3 baseline case loads
- 10 ms load interval
Hot-case old-path runs
- 113.8 ms, max 159.0 ms
- 184.0 ms, p95 222.2 ms, max 222.2 ms
- 291.5 ms, p95 928.3 ms, max 942.2 ms
- 70, ungranted 3, case_duration ungranted 0
- 138.0 ms, max 170.5 ms
- 122.6 ms, p95 320.3 ms, max 320.3 ms
- 170.5 ms, p95 209.5 ms, max 237.9 ms
- 131.0 ms, max 189.3 ms
- 124.7 ms, p95 292.5 ms, max 292.5 ms
- 155.2 ms, p95 179.6 ms, max 206.0 ms
Hot-case new async-worker runs
- 55.9 ms, p95 71.2 ms, max 71.2 ms
- 121.2 ms, p95 181.0 ms, max 181.0 ms
- 118.6 ms, p95 155.3 ms, max 189.0 ms
- 0
- 68.6 ms, p95 91.7 ms, max 91.7 ms
- 124.6 ms, p95 235.2 ms, max 235.2 ms
- 117.7 ms, p95 217.9 ms, max 221.9 ms
- 0
- 71.4 ms, p95 118.3 ms, max 118.3 ms
- 127.9 ms, p95 154.0 ms, max 154.0 ms
- 118.1 ms, p95 149.9 ms, max 150.6 ms
- 0
Main signal: the old hot update path had mutation p95 around 928.3 ms; the current PR commit is 149.9 ms in the same reduced hot-case profile.
Existing burst/health benchmark on new implementation
Profile:
- 20 cases
- 80 definitions
- 600 history events per case
- 1 update per case
- 50 ms health interval
- 1000 ms health timeout
- 315.2 ms, 322.4 ms, 322.5 ms, 20
- 2.0 ms, 2.3 ms, 2.3 ms, 4
- 5.3 ms, 142.3 ms, 142.3 ms, 14
- 1.6 ms, 2.3 ms, 2.3 ms, 6
- 2.9 ms, 3.4 ms, 3.4 ms, 4
- 47.5 ms, 193.7 ms, 193.7 ms, 14
- 2.7 ms, 3.4 ms, 3.4 ms, 6
Other values:
- 0.674 s
- 0, burst 0, cooldown 0
- 0
Verification
- `uv run pytest tests/unit/test_case_events_service.py tests/unit/test_cases_service.py tests/unit/test_case_duration_service.py tests/unit/test_case_duration_router.py tests/unit/test_case_duration_sync_consumer.py`: 107 passed
- `uv run ruff check ...`
- `uv run ruff format --check ...`
- `uv run basedpyright ...`: 0 errors, 0 warnings, 0 notes
- `TRACECAT_RUN_CASE_DURATION_BENCHMARKS=1 ... uv run pytest tests/integration/test_case_duration_benchmarks.py -k hot_case -s`
Summary by cubic
Moves case-duration recomputation off reads and event writes to an async Redis-stream consumer to fix ENG-1462 slow case loads. The durations endpoint is read-only, and the background worker starts with the API and reads backlog for reliable sync.
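The three `duration_sync` modes named in this PR (`"async"`, `"inline"`, `"none"`) suggest a small dispatch step at event-creation time. The sketch below is illustrative only: `dispatch_duration_sync`, its parameters, and its return strings are hypothetical, and falling back to inline sync when the worker is disabled is an assumption based on the PR's description of the flag as a kill switch.

```python
from typing import Callable, Literal

DurationSync = Literal["async", "inline", "none"]


def dispatch_duration_sync(
    mode: DurationSync,
    *,
    worker_enabled: bool,
    enqueue: Callable[[], None],
    sync_inline: Callable[[], None],
) -> str:
    """Route a duration sync request and report which path was taken."""
    if mode == "none":
        return "skipped"
    if mode == "async" and worker_enabled:
        enqueue()  # hand off to the background consumer
        return "enqueued"
    # "inline", or "async" with the worker disabled (assumed fallback).
    sync_inline()
    return "inline"


calls: list[str] = []
result = dispatch_duration_sync(
    "async",
    worker_enabled=True,
    enqueue=lambda: calls.append("enqueue"),
    sync_inline=lambda: calls.append("inline"),
)
```

Passing the two actions as callables keeps the routing decision testable without a database or Redis connection.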
New Features
- … `tracecat-case-duration-sync`.
- … `duration_sync="async" | "inline" | "none"`; create-case uses inline. Definition create/update enqueue cursor-paged backfills; when the worker is disabled, they backfill inline, streamed in small batches.
- Treat `CASE_VIEWED` as audit-only and reject it as a duration anchor; UI hides it from anchor options and falls back to safe defaults when editing.
- `GET /cases/{id}/durations` lists materialized rows only.
Bug Fixes
- … `case_closed`, `case_reopened` map to `status_changed`).
- … `TRACECAT__CASE_DURATION_SYNC_ENABLED` with `env_bool` so the async worker flag behaves correctly across environments.
- `CASE_VIEWED` events still enqueue duration sync when legacy definitions reference them; metadata-only definition updates (name/description) no longer trigger backfills.
Written for commit 32576dc. Summary will update on new commits.
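The `env_bool` item under Bug Fixes matters because a naive `bool(os.environ[...])` treats any non-empty string, including `"false"`, as true. The sketch below shows one way such parsing could work; `parse_env_bool` and its accepted spellings are hypothetical stand-ins, not the project's actual `env_bool` helper.

```python
import os
from collections.abc import Mapping
from typing import Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_env_bool(
    name: str,
    default: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Parse a boolean flag from the environment, case-insensitively."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default  # unset or blank: keep the default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


# A plain dict stands in for os.environ here.
env = {"TRACECAT__CASE_DURATION_SYNC_ENABLED": "false"}
enabled = parse_env_bool("TRACECAT__CASE_DURATION_SYNC_ENABLED", default=True, environ=env)
```

Raising on unrecognized values is a design choice in this sketch; a helper could equally fall back to the default.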