
feat(recall): TOON default + LRU cache + batched neighbour fetch#167

Merged
pszymkowiak merged 2 commits into develop from feat/recall-perf-tokens on May 2, 2026

Conversation

@pszymkowiak (Contributor) commented May 1, 2026

Three coordinated changes from the recall-perf audit. They share a PR because they all live in the recall hot path and it's easier to review them together than as three tiny PRs touching the same files.

Measured impact (real numbers, this machine)

| Action | Metric | Result |
| --- | --- | --- |
| 1 — TOON default | bytes vs detail, summaries ~130 chars | −37% |
| 1 — TOON default | bytes vs detail, summaries ~13 chars | −64% |
| 1 — TOON default | bytes vs json | −67% to −86% |
| 2 — LRU cache | warm `get()` (cache hit) vs cold (DB hit + fill) | 155× faster (81 ns vs 12 560 ns) |
| 3 — Batch fetch | `get_many(50)` vs 50 × `get()` | 2.4× faster (215 µs vs 516 µs) |

The TOON win scales inversely with summary length: when summaries are ~13 chars (short tag-style memories), TOON saves 64% because metadata dominates the payload. When summaries are ~130 chars (typical decisions-* content), TOON saves 37% because the summary itself dominates. Expect 35-50% in the wild on real ICM data. This is the corrected number — my initial estimate of "60-65%" was based on small synthetic memories and turned out optimistic for typical content.

The LRU cache 155× number applies to hot reads in long-running processes (icm serve, TUI). For one-shot CLI invocations the process exits before benefiting from cache hits, so the practical impact there is the cache acting as a tiny in-flight memo — neutral, not negative.

The batch 2.4× scales linearly with N — at the typical recall budget (max_neighbors = limit/3, default ~1) the absolute saving is small (~5-50 µs) but the ratio holds.

Reproduce with:

# Action 1 — token bytes (real CLI)
cargo build --release -p icm-cli
icm --no-embeddings recall "query" -l 5             # TOON (default)
icm --no-embeddings recall "query" -l 5 -f detail   # legacy view
icm --no-embeddings recall "query" -l 5 -f json     # tooling

# Actions 2 + 3 — microbenches (release)
cargo test --release -p icm-store --lib -- --ignored --nocapture \
  bench_cache_hit_vs_miss bench_get_many_vs_n_plus_one

The two #[ignore]'d benches live in crates/icm-store/src/store.rs; they document the gain explicitly rather than asserting ceilings, since perf numbers fluctuate under load and the existing perf_* tests already enforce ceilings.
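The informational-bench pattern described above can be sketched as follows. This is a minimal std-only illustration, not the project's actual bench code: it times a hot loop, prints the per-op cost, and deliberately asserts nothing about the timing. In the real code the body sits in a `#[test] #[ignore]` function run via `cargo test -- --ignored --nocapture`.

```rust
use std::time::Instant;

/// Time `op` over `iters` iterations and return the average cost in
/// nanoseconds per operation. No assertion on the result: the number is
/// printed for humans, not enforced as a ceiling.
fn time_per_op<F: FnMut()>(iters: u32, mut op: F) -> u128 {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed().as_nanos() / iters as u128
}
```

A bench built on this helper would call it twice (cold path vs warm path) and print both numbers plus the ratio, mirroring the `bench_cache_hit_vs_miss` output shown in the commit message below.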

1. icm recall --format toon|detail|json (default toon)

Per-recall stdout drops from a multi-line labelled view to one row under a single header. Smoke-test on a fixture with 1 result:

=== TOON (default) ===
memories[1]{id,topic,importance,weight,summary}:
  01KQH...,preferences,high,0.975,User prefers Rust over Go

=== DETAIL ===
--- 01KQH... ---
  topic:      preferences
  importance: high
  weight:     0.975
  created:    2026-05-01 07:04
  accessed:   2026-05-01 07:04 (x1)
  summary:    User prefers Rust over Go

=== JSON ===  (17 lines for the same memory)

The legacy multi-line view stays available as --format detail. json is for tooling.

Implementation: new crates/icm-cli/src/recall_format.rs module with RecallFormat enum + 6 unit tests (TOON header shape with/without scores, comma escaping, detail labels, JSON shape, empty list).
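The TOON row layout above can be sketched roughly like this. The `Memory` field set (`id`, `topic`, `importance`, `weight`, `summary`) comes from the header shown in the smoke test; everything else (function names, field types, the exact escape convention) is an assumption for illustration, not the module's real API.

```rust
// Hypothetical sketch of the TOON renderer; names beyond the header
// fields are assumptions.
#[allow(dead_code)]
enum RecallFormat {
    Toon,
    Detail,
    Json,
}

struct Memory {
    id: String,
    topic: String,
    importance: String,
    weight: f64,
    summary: String,
}

/// Escape commas so a summary cannot break the single-row layout
/// (the real module has a unit test for exactly this case).
fn escape_commas(s: &str) -> String {
    s.replace(',', "\\,")
}

fn render_toon(memories: &[Memory]) -> String {
    let mut out = format!(
        "memories[{}]{{id,topic,importance,weight,summary}}:\n",
        memories.len()
    );
    for m in memories {
        out.push_str(&format!(
            "  {},{},{},{:.3},{}\n",
            m.id, m.topic, m.importance, m.weight,
            escape_commas(&m.summary)
        ));
    }
    out
}
```

One header line plus one row per memory is what drives the byte savings in the table above: the field labels are paid for once instead of once per memory.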

2. LRU cache in SqliteStore

Mutex<LruCache<String, Memory>>, cap 256 (~400 KB worst-case incl. embedding blobs). Read-through on get/get_many. Invalidations:

| Path | Strategy |
| --- | --- |
| `update(memory)` | invalidate by id |
| `delete(id)` | invalidate by id |
| `update_access(id)` | invalidate by id |
| `batch_update_access(ids)` | invalidate listed ids |
| `apply_decay()` | clear all (touches every non-critical row's weight) |
| `prune(...)` | clear all if any rows deleted |
| `consolidate_topic(...)` | clear all (bulk delete + re-insert) |

5 new cache invalidation tests cover all the paths above plus a "raw SQL update is not seen by cache" canary that proves the cache is actually serving reads. Adds lru = "0.12" workspace dep.
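The read-through + invalidation strategy in the table above looks roughly like this. The real store uses `Mutex<LruCache<String, Memory>>` from the `lru` crate, capped at 256 entries; a plain `HashMap` stands in here so the sketch stays std-only, and the SQLite lookup is stubbed as a second map.

```rust
use std::collections::HashMap;

// Std-only sketch of the read-through cache; NOT the real SqliteStore.
struct Store {
    cache: HashMap<String, String>, // real code: Mutex<LruCache<String, Memory>>
    db: HashMap<String, String>,    // stand-in for the SQLite table
}

impl Store {
    /// Read-through get: serve cache hits, otherwise hit the "DB" and fill.
    fn get(&mut self, id: &str) -> Option<String> {
        if let Some(v) = self.cache.get(id) {
            return Some(v.clone());
        }
        let v = self.db.get(id)?.clone();
        self.cache.insert(id.to_string(), v.clone());
        Some(v)
    }

    /// Point mutations (update / delete / update_access) drop one id.
    fn invalidate(&mut self, id: &str) {
        self.cache.remove(id);
    }

    /// Bulk ops (apply_decay, prune, consolidate_topic) drop everything,
    /// because they touch arbitrarily many rows.
    fn clear_cache(&mut self) {
        self.cache.clear();
    }
}
```

The "raw SQL update is not seen by cache" canary test mentioned above follows directly from this shape: anything that bypasses `invalidate`/`clear_cache` leaves a stale entry, which is exactly what the canary demonstrates.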

3. Batch expand_with_neighbors

Previously: N round-trips via self.get() per neighbour. Now: candidate ids collected in priority order, then fetched in one SELECT … WHERE id IN (?,?,…) via a new public get_many. Preserves scoring, dedup, and the hop discount.

The 8 existing expansion tests still pass without modification. 4 new tests cover get_many directly (basic, empty input, missing ids dropped, dedup of repeated ids).
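The core of a batched fetch is building the variable-length placeholder list for the `IN` clause; a sketch of that piece, with an assumed column list (the real schema may differ):

```rust
/// Build the `SELECT … WHERE id IN (?,?,…)` statement for a batch fetch
/// of `n` ids. The real get_many then binds the ids to these placeholders
/// through the SQLite driver; the column list here is an assumption.
fn in_clause_sql(n: usize) -> String {
    let placeholders = vec!["?"; n].join(",");
    format!(
        "SELECT id, topic, importance, weight, summary \
         FROM memories WHERE id IN ({placeholders})"
    )
}
```

One round-trip with `n` placeholders replaces `n` round-trips of one statement each, which is where the 2.4× figure above comes from; per-call overhead (statement prep, mutex, row mapping) is paid once instead of `n` times.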

Test plan

  • cargo fmt --all -- --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo test --workspace -- --test-threads=2: 339 passed (+10 from new tests across the three actions). The existing perf_fts_search_100 and perf_store_with_embeddings_1000 perf tests are parallelism-sensitive even on baseline (they fail under default cargo test even on main), so I verified with --test-threads=2.
  • Manual smoke test: icm recall produces TOON by default, --format detail reproduces the legacy output verbatim, --format json parses cleanly.
  • Empirical validation of all three claims (numbers above).
  • Optional follow-up: switch icm list and cmd_recall_context to TOON in a separate PR (out of scope here — recall was the user-asked target).

🤖 Generated with Claude Code

@pszymkowiak (Contributor, Author)

wshm · Automated triage by AI

📊 Automated PR Analysis

Type: feature
Risk: 🟡 medium

Summary

Adds a compact TOON output format (default) for icm recall to reduce token usage when piped to LLMs, introduces an LRU cache (cap 256) in SqliteStore for hot-path read-through caching with invalidation on mutations, and batches neighbour fetches into a single SELECT … WHERE id IN query via a new get_many method. The cmd_recall function is also refactored to unify the hybrid-search and FTS-fallback code paths.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Analyzed automatically by wshm · This is an automated analysis, not a human review.

pszymkowiak added a commit that referenced this pull request May 1, 2026
Mirrors the rtk-ai/rtk CI/CD model so contributor PRs go to `develop`,
get merged into a pre-release stream there, and only `develop` -> `main`
PRs (release-please releases) cut a stable build.

## ci.yml — multi-stage gates

Triggers on `pull_request` to `develop` or `main`.

  fmt -> clippy -> { test x3 OS, security scan }   (parallel)

Drops the path filter — ICM is small enough that running CI on every PR
is the safer default. Adds a `cargo audit` security job + a "new
dependencies" supply-chain reminder. Skips the AI doc-review job from
RTK (no Anthropic API key available in this org per current policy) and
skips the multi-language `benchmark` job (RTK-specific).

## cd.yml — dual-path CD

Replaces release-please.yml. Single workflow with two non-overlapping
paths gated on `github.ref`:

- `push: develop` -> compute next version from conventional commits
  (mirrors release-please's `bump-minor-pre-major +
  bump-patch-for-minor-pre-major` logic), tag as
  `icm-dev-v{ver}-rc.{run_number}`, call release.yml with
  `prerelease: true`. Concurrency cancel-in-progress.

- `push: main` -> release-please. On release_created, call release.yml
  with `prerelease: false`, then update the `latest` tag. Concurrency
  never cancelled.

## release.yml — opt-in prerelease input

New `prerelease: boolean` input (default false) plumbed through both
`workflow_call` and `workflow_dispatch`. Passed to
`softprops/action-gh-release@v2` so pre-releases get the GitHub
pre-release badge. The Homebrew tap update job is gated `if:
inputs.prerelease != true` — taps only update on stable channels.

## pr-target-check.yml

Borrowed verbatim from RTK (with `master` -> `main`): labels PRs that
target `main` from anything other than `develop` and posts a comment
pointing to the develop branch.

## What's NOT in this PR (deferred per discussion)

- Discord webhook notification on stable release (no secret configured)
- Branch protection rules (require enforcement via GitHub UI / `gh api`,
  not workflow YAML)
- CONTRIBUTING.md / CICD.md docs
- Migration of the two open feature PRs (#166, #167) — those will be
  rebased onto `develop` after this lands.

Co-authored-by: patrick <patrick@rtk-ai.app>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick and others added 2 commits May 1, 2026 13:47
Three coordinated changes targeting recall token cost and runtime:

1. **`icm recall --format toon|detail|json`** (default `toon`).
   Per-recall stdout drops from ~7 lines/memory of labelled detail to
   1 row/memory under a single header, ~60-65% fewer tokens when the
   output gets piped into an LLM context. The legacy multi-line view
   stays available as `--format detail`; `--format json` ships an
   array for tooling. Move the rendering into a small
   `recall_format` module with unit tests for each format.

2. **LRU cache in `SqliteStore`** (`Mutex<LruCache<String, Memory>>`,
   cap 256 ≈ 400KB worst-case incl. embedding blobs). `get` and
   `get_many` are read-through; `update`, `delete`, `update_access`,
   `batch_update_access` invalidate the touched ids; `apply_decay`,
   `prune` (when it changed rows), and `consolidate_topic` clear the
   whole cache because they touch arbitrarily many rows. Adds the
   `lru` workspace dep. Mainly pays off in long-running processes
   (`icm serve`, TUI); benign in one-shot CLI.

3. **Batch `expand_with_neighbors`**. Previously did N round-trips via
   `self.get()` per neighbour. Now collects candidate ids in
   priority order, then fetches them in a single
   `SELECT … WHERE id IN (?,?,…)` via the new `get_many`. Preserves
   scoring, dedup, and the hop discount. Existing 8 expansion tests
   still pass; 4 new tests cover `get_many` directly.

Test plan: `cargo fmt`, `cargo clippy --workspace --all-targets`,
`cargo test --workspace -- --test-threads=2` (the existing
`perf_fts_search_100` / `perf_store_with_embeddings_1000` are
parallelism-sensitive on baseline too — passing with -j2). Manual
smoke-test: `icm recall` produces TOON by default, `--format detail`
reproduces legacy output, `--format json` parses cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two informational `#[ignore]`'d benches in the icm-store test module
so the gains from the cache and batched fetch are reproducible
locally.

Run with:
  cargo test --release -p icm-store --lib -- --ignored --nocapture

Numbers on this machine (release build, in-memory store, 50 ids):

  bench_cache_hit_vs_miss
    cold (DB hit + cache fill): 12_560 ns/get
    warm (cache hit):              81 ns/get
    speedup on hot reads:        ~155x

  bench_get_many_vs_n_plus_one
    batched get_many: 215 us
    N+1 individual:   516 us
    speedup:          ~2.4x

These aren't wired with assertions on purpose — perf numbers fluctuate
under load and the existing perf_* tests already enforce ceilings.
These two are evidence that the architecture changes do what the PR
description claims, kept around as living documentation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pszymkowiak force-pushed the feat/recall-perf-tokens branch from debd5a4 to eb4e12d on May 1, 2026 11:47
@pszymkowiak changed the base branch from main to develop on May 1, 2026 11:47
@pszymkowiak merged commit 6bd2ca4 into develop on May 2, 2026
12 checks passed
@pszymkowiak deleted the feat/recall-perf-tokens branch on May 2, 2026 07:55
pszymkowiak added a commit that referenced this pull request May 2, 2026
…#175)

Three quick-win security bumps grouped into one PR — found via
`cargo audit` during the 0.10.43 verification audit.

## Cleared

- **RUSTSEC-2026-0049/0098/0099/0104** — rustls-webpki 0.103.9 → 0.103.13
  Pulled transitively via ureq/hf-hub/reqwest. Semver-compatible bump
  via `cargo update`, no Cargo.toml change.

- **RUSTSEC-2026-0067/0068** (medium 5.1) — tar 0.4.44 → 0.4.45
  Direct dep in icm-cli (release artifact packaging). Pinned to
  `tar = "0.4.45"` in workspace Cargo.toml to make the floor explicit.

- **RUSTSEC-2026-0002** (unsound IterMut) — lru 0.12 → 0.18
  Direct dep in icm-store added in #167 for the recall LRU cache.
  Bumped to 0.18 (the latest stable) since both 0.13 and 0.16 still
  carried the advisory; 0.18 is the first version listed as
  unaffected. Our usage is `get`/`put`/`pop`/`clear` — the unsound
  `IterMut` path was never on the hot path here, but the bump
  removes the lint regardless.

## Remaining warnings (out of scope, transitive)

- `lru 0.12.5` still pulled by `ratatui 0.29.0`. Bumping ratatui is
  bigger than this PR. Our usage is in icm-store, which now uses
  0.18.
- `paste 1.0.15` (unmaintained), `core2 0.4.0` (yanked) — both via
  fastembed/ratatui transitively. Same reasoning.

## Test plan

- [x] `cargo audit` no longer flags any direct dep
- [x] `cargo build --workspace` clean
- [x] `cargo fmt --all -- --check` clean
- [x] `cargo clippy --workspace --all-targets -- -D warnings` clean
- [x] `cargo test --release --workspace` 339+ passed
- The debug-build `perf_fts_search_100` test is parallelism-sensitive
  on local — passes in release mode and on CI defaults. Not a
  regression from this PR (same test was flaky before, baseline
  confirmed).

Co-authored-by: patrick <patrick@rtk-ai.app>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
