cache: patience-window + minimal-set first paint #544
Open
nadaverell wants to merge 2 commits into main
Replaces the 60s blanket SyncTimeout for the connecting screen with a softer two-stage gate:

1. Patience window (8s): wait for ALL critical informers. On the common path (fast cluster) nothing else triggers; first paint is complete and coherent.
2. Minimal-set fallback: once the patience window elapses, return as soon as pods/namespaces/nodes/services/deployments are synced. Any other critical informer still loading (ingresses, jobs, etc.) is promoted to deferred and joins the topology when ready.

No hard timeout: deferred informers retry indefinitely. The user-facing "give up" semantic that the 60s cap implied was always misleading, because informers stayed subscribed regardless. The patience window is a "render with what you have" boundary, not a failure cliff.

Cache progress is wired into the connection's progressMessage so the connecting screen shows "Loading cluster data… X of Y ready" instead of a static "Loading workloads…".

After first paint, slower informers arrive coalesced into 5s windows (was 3s) to keep the topology graph from jittering, and the home view shows a small inline indicator listing the kinds still loading.

Mechanism lives in pkg/k8score (CacheConfig.PatienceWindow, CacheConfig.MinimalSet, CacheConfig.SyncProgress); skyhook-connector's legacy SyncTimeout path is unchanged.
- Guard MinimalSet against typos / unmatched keys: log a warning at cache construction listing keys that don't correspond to any enabled critical informer, plus a separate warning when the effective minimal set is empty (would otherwise return at PatienceWindow with nothing meaningfully gated).
- Add PendingPromotedKinds(): a live-filtered view of promoted-at-first-paint informers that still aren't synced. Dashboard banner switches to this so the "still loading" pill drains as kinds arrive instead of showing the snapshot from connect time.
- After patience elapses, log distinguishes "first-paint blocked" (only minimal-set kinds matter) from generic "critical sync progress" so operators can see the actual blocker.
- New tests: legacy SyncTimeout promotion, MinimalSet typo, PatienceWindow without MinimalSet, PendingPromotedKinds live-filter contract.
- Comment cleanup: drop commit-flavored "(legacy)/(new)" parentheticals, fix SyncProgress godoc parameter names, trim restating-what comments.
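A minimal sketch of the first two items, under stated assumptions: `splitMinimalSet` and `pendingPromoted` are hypothetical free functions invented for this example, and the actual signature of `PendingPromotedKinds()` is not shown in this PR text. The sketch shows the typo guard as a partition of the configured keys, and the live filter as a re-check of sync state on every call.

```go
package main

import (
	"fmt"
	"log"
)

// splitMinimalSet (hypothetical) partitions the configured minimal set into
// keys that match an enabled critical informer and keys that match nothing.
func splitMinimalSet(minimal []string, enabledCritical map[string]bool) (effective, unmatched []string) {
	for _, key := range minimal {
		if enabledCritical[key] {
			effective = append(effective, key)
		} else {
			unmatched = append(unmatched, key)
		}
	}
	return effective, unmatched
}

// pendingPromoted (hypothetical) filters the kinds promoted at first paint
// down to those that are still not synced at the time of the call.
func pendingPromoted(promoted []string, synced func(kind string) bool) []string {
	var pending []string
	for _, kind := range promoted {
		if !synced(kind) {
			pending = append(pending, kind)
		}
	}
	return pending
}

func main() {
	enabled := map[string]bool{"pods": true, "nodes": true, "ingresses": true}
	effective, unmatched := splitMinimalSet([]string{"pods", "nodez"}, enabled)
	if len(unmatched) > 0 {
		log.Printf("minimal set keys match no enabled critical informer: %v", unmatched)
	}
	if len(effective) == 0 {
		log.Printf("effective minimal set is empty; nothing gates first paint after the patience window")
	}

	state := map[string]bool{}
	synced := func(kind string) bool { return state[kind] }
	promoted := []string{"ingresses", "jobs"}
	fmt.Println(pendingPromoted(promoted, synced)) // both kinds still loading
	state["ingresses"] = true
	fmt.Println(pendingPromoted(promoted, synced)) // list drains as kinds arrive
}
```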
Summary
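Replaces the 60s blanket sync timeout that gated first paint with a softer two-stage approach:
1. Patience window (8s): wait for all critical informers. On a fast cluster nothing else triggers, and first paint is complete and coherent.
2. Minimal-set fallback: once the patience window elapses, render as soon as pods, namespaces, nodes, services, and deployments are synced. Any other critical informer still loading is promoted to deferred and joins the topology when ready.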
The connecting screen now shows live progress ("Loading cluster data… X of Y ready") instead of a static "Loading workloads…". Once on the home view, a small inline indicator lists kinds still loading.
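As a rough illustration of how a progress callback could feed the connecting screen, the sketch below stores a formatted message on a connection-like struct. The `connection` type, `onSyncProgress`, and the `(ready, total int)` shape are assumptions made for the example; the real `SyncProgress` signature is not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// connection is a hypothetical stand-in for whatever holds the
// connecting-screen state; only the progress message is modeled.
type connection struct {
	mu              sync.Mutex
	progressMessage string
}

// onSyncProgress has an assumed (ready, total) shape. It rewrites the
// message each time the cache reports how many informers are synced.
func (c *connection) onSyncProgress(ready, total int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.progressMessage = fmt.Sprintf("Loading cluster data… %d of %d ready", ready, total)
}

// message returns the current progress text under the lock.
func (c *connection) message() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.progressMessage
}

func main() {
	conn := &connection{}
	conn.onSyncProgress(3, 7)
	fmt.Println(conn.message())
}
```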
Why
The previous behavior was "blank screen for up to 60s if any one informer was slow." On a small or slow API server (e.g. 2-node Talos) or with one flaky CRD-served kind, this was the worst of both worlds: misleading (informers stay subscribed regardless of the "timeout") and bad UX. The new shape commits to one render moment when the spine is ready and lets the rest stream in coalesced.
No new flags — defaults are tuned in code. The mechanism in `pkg/k8score` (`PatienceWindow`, `MinimalSet`, `SyncProgress`) is opt-in; skyhook-connector's legacy `SyncTimeout` path is unchanged.
Test plan