
cache: patience-window + minimal-set first paint #544

Open
nadaverell wants to merge 2 commits into main from feature/first-paint-minimal-set

@nadaverell

Summary

Replaces the 60s blanket sync timeout that gated first paint with a softer two-stage approach:

  1. Patience window (8s) — wait for all critical informers. Most clusters complete here and render once with everything.
  2. Minimal-set fallback — once the patience window elapses, return as soon as pods/namespaces/nodes/services/deployments are synced. Other critical informers still loading (ingresses, jobs, replicasets, etc.) are promoted to deferred and join the topology when ready, coalesced into 5s windows so the graph doesn't jitter.

The connecting screen now shows live progress ("Loading cluster data… X of Y ready") instead of a static "Loading workloads…". Once on the home view, a small inline indicator lists kinds still loading.
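One way the live progress text could be produced, sketched in Go under the assumption that the cache reports "ready out of total" through a callback and the connection keeps a message string the connecting screen polls. The `connection` type and `onSyncProgress` method are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// connection is a hypothetical holder for the text the connecting screen polls.
type connection struct {
	mu              sync.Mutex
	progressMessage string
}

// onSyncProgress is shaped like a progress callback: it receives how many
// critical informers are synced out of the total and rewrites the message.
func (c *connection) onSyncProgress(ready, total int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.progressMessage = fmt.Sprintf("Loading cluster data… %d of %d ready", ready, total)
}

// Message returns the current text under the lock, since the callback may
// run on a different goroutine than the reader.
func (c *connection) Message() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.progressMessage
}

func main() {
	conn := &connection{progressMessage: "Connecting…"}
	conn.onSyncProgress(3, 12)
	fmt.Println(conn.Message())
}
```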

Why

The previous behavior was "blank screen for up to 60s if any one informer was slow." On a small or slow API server (e.g. a 2-node Talos cluster), or with a single flaky CRD-served kind, this was the worst of both worlds: misleading, because informers stayed subscribed regardless of the "timeout", and bad UX. The new shape commits to a single render moment once the spine is ready and lets the rest stream in, coalesced.

No new flags — defaults are tuned in code. The mechanism in `pkg/k8score` (`PatienceWindow`, `MinimalSet`, `SyncProgress`) is opt-in; skyhook-connector's legacy `SyncTimeout` path is unchanged.

Test plan

  • `go test ./...` (new tests cover all-fast and slow-cluster promotion paths)
  • `tsc --noEmit` clean
  • Manual smoke: connect to a cluster, verify connecting screen shows progress and home renders within ~8s on healthy clusters
  • Manual smoke on a slow cluster: verify minimal-set fallback fires and the "still loading" indicator surfaces remaining kinds

Replaces the 60s blanket SyncTimeout for the connecting screen with a
softer two-stage gate:

1. Patience window (8s) — wait for ALL critical informers. On the
   common path (fast cluster) nothing else triggers; first paint is
   complete and coherent.
2. Minimal-set fallback — once the patience window elapses, return as
   soon as pods/namespaces/nodes/services/deployments are synced. Any
   other critical informer still loading (ingresses, jobs, etc.) is
   promoted to deferred and joins the topology when ready.

No hard timeout: deferred informers retry indefinitely. The user-facing
"give up" semantic that the 60s cap implied was always misleading —
informers stayed subscribed regardless. The patience window is a
"render with what you have" boundary, not a failure cliff.

Cache progress is wired into the connection's progressMessage so the
connecting screen shows "Loading cluster data… X of Y ready" instead of
a static "Loading workloads…". After first paint, slower informers
arrive coalesced into 5s windows (was 3s) to keep the topology graph
from jittering, and the home view shows a small inline indicator
listing the kinds still loading.

Mechanism lives in pkg/k8score (CacheConfig.PatienceWindow,
CacheConfig.MinimalSet, CacheConfig.SyncProgress); skyhook-connector's
legacy SyncTimeout path is unchanged.
nadaverell requested a review from hisco as a code owner on April 26, 2026 22:48
- Guard MinimalSet against typos and unmatched keys: at cache
  construction, log a warning listing keys that don't correspond to any
  enabled critical informer, plus a separate warning when the effective
  minimal set is empty (first paint would otherwise return at
  PatienceWindow with nothing meaningfully gated).

- Add PendingPromotedKinds() — live-filtered view of promoted-at-first-
  paint informers that still aren't synced. Dashboard banner switches
  to this so the "still loading" pill drains as kinds arrive instead
  of showing the snapshot from connect time.

- After the patience window elapses, the log distinguishes "first-paint
  blocked" (only minimal-set kinds matter) from generic "critical sync
  progress" so operators can see the actual blocker.

- New tests: legacy SyncTimeout promotion, MinimalSet typo, PatienceWindow
  without MinimalSet, PendingPromotedKinds live-filter contract.

- Comment cleanup: drop commit-flavored "(legacy)/(new)" parentheticals,
  fix SyncProgress godoc parameter names, trim comments that merely
  restate what the code does.
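The MinimalSet guard from the first bullet can be sketched as a pure check that returns warnings for the caller to log. The function name and warning wording are hypothetical, and this sketch warns about an empty effective set even when MinimalSet was left unset, which may differ from the real behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// validateMinimalSet splits the configured minimal set into keys that match
// an enabled critical informer and keys that don't, returning warnings for
// unmatched keys and for an effective set that ends up empty.
func validateMinimalSet(minimal []string, enabledCritical map[string]bool) (effective []string, warnings []string) {
	var unknown []string
	for _, k := range minimal {
		if enabledCritical[k] {
			effective = append(effective, k)
		} else {
			unknown = append(unknown, k)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown)
		warnings = append(warnings, fmt.Sprintf("MinimalSet keys match no enabled critical informer: %v", unknown))
	}
	if len(effective) == 0 {
		warnings = append(warnings, "effective MinimalSet is empty; first paint will return at PatienceWindow with nothing gated")
	}
	return effective, warnings
}

func main() {
	enabled := map[string]bool{"pods": true, "nodes": true, "services": true}
	effective, warnings := validateMinimalSet([]string{"pods", "node"}, enabled) // "node" is a typo
	fmt.Println(effective)
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```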
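The live-filter contract for PendingPromotedKinds() can be sketched as filtering the first-paint snapshot against current sync state on every call, so the "still loading" pill drains as kinds arrive. Only the method name comes from the PR; the `cacheState` receiver and `MarkSynced` are hypothetical stand-ins for the cache's internals.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheState is a hypothetical stand-in for the cache's bookkeeping.
type cacheState struct {
	mu       sync.RWMutex
	promoted []string        // snapshot of kinds promoted at first paint
	synced   map[string]bool // updated as informers finish syncing
}

// MarkSynced records that an informer for kind has synced.
func (c *cacheState) MarkSynced(kind string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.synced[kind] = true
}

// PendingPromotedKinds returns the promoted kinds that are still unsynced,
// recomputed on each call rather than frozen at connect time.
func (c *cacheState) PendingPromotedKinds() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	var pending []string
	for _, k := range c.promoted {
		if !c.synced[k] {
			pending = append(pending, k)
		}
	}
	return pending
}

func main() {
	c := &cacheState{promoted: []string{"ingresses", "jobs"}, synced: map[string]bool{}}
	fmt.Println(c.PendingPromotedKinds())
	c.MarkSynced("jobs")
	fmt.Println(c.PendingPromotedKinds())
}
```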