
feat(recall): TOON default + LRU cache + batched neighbour fetch#167

Merged
pszymkowiak merged 2 commits into develop from feat/recall-perf-tokens on May 2, 2026

Conversation

@pszymkowiak (Contributor) commented May 1, 2026

Three coordinated changes from the recall-perf audit. They share a PR because they all live in the recall hot path and it's easier to review them together than as three tiny PRs touching the same files.

Measured impact (real numbers, this machine)

| Action | Metric | Result |
| --- | --- | --- |
| 1 — TOON default | bytes vs detail, summaries ~130 chars | −37% |
| 1 — TOON default | bytes vs detail, summaries ~13 chars | −64% |
| 1 — TOON default | bytes vs json | −67% to −86% |
| 2 — LRU cache | warm `get()` (cache hit) vs cold (DB hit + fill) | 155× faster (81 ns vs 12 560 ns) |
| 3 — Batch fetch | `get_many(50)` vs 50 × `get()` | 2.4× faster (215 µs vs 516 µs) |

The TOON win scales inversely with summary length: when summaries are ~13 chars (short tag-style memories), TOON saves 64% because metadata dominates the payload. When summaries are ~130 chars (typical decisions-* content), TOON saves 37% because the summary itself dominates. Expect 35-50% in the wild on real ICM data. This is the corrected number — my initial estimate of "60-65%" was based on small synthetic memories and turned out optimistic for typical content.

The LRU cache 155× number applies to hot reads in long-running processes (icm serve, TUI). For one-shot CLI invocations the process exits before benefiting from cache hits, so the practical impact there is the cache acting as a tiny in-flight memo — neutral, not negative.

The batch 2.4× scales linearly with N — at the typical recall budget (max_neighbors = limit/3, default ~1) the absolute saving is small (~5-50 µs) but the ratio holds.

Reproduce with:

# Action 1 — token bytes (real CLI)
cargo build --release -p icm-cli
icm --no-embeddings recall "query" -l 5             # TOON (default)
icm --no-embeddings recall "query" -l 5 -f detail   # legacy view
icm --no-embeddings recall "query" -l 5 -f json     # tooling

# Actions 2 + 3 — microbenches (release)
cargo test --release -p icm-store --lib -- --ignored --nocapture \
  bench_cache_hit_vs_miss bench_get_many_vs_n_plus_one

The two #[ignore]'d benches live in crates/icm-store/src/store.rs; they document the gain explicitly rather than asserting ceilings, since perf numbers fluctuate under load and the existing perf_* tests already enforce ceilings.
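The informational-bench pattern described above can be sketched as follows. This is a minimal std-only illustration, not the project's actual bench code: it times a hot loop, prints the per-op cost, and deliberately asserts nothing about the timing. In the real code the body sits in a `#[test] #[ignore]` function run via `cargo test -- --ignored --nocapture`.

```rust
use std::time::Instant;

/// Time `op` over `iters` iterations and return the average cost in
/// nanoseconds per operation. No assertion on the result: the number is
/// printed for humans, not enforced as a ceiling.
fn time_per_op<F: FnMut()>(iters: u32, mut op: F) -> u128 {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed().as_nanos() / iters as u128
}
```

A bench built on this helper would call it twice (cold path vs warm path) and print both numbers plus the ratio, mirroring the `bench_cache_hit_vs_miss` output shown in the commit message below.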

1. icm recall --format toon|detail|json (default toon)

Per-recall stdout drops from a multi-line labelled view to one row under a single header. Smoke-test on a fixture with 1 result:

=== TOON (default) ===
memories[1]{id,topic,importance,weight,summary}:
  01KQH...,preferences,high,0.975,User prefers Rust over Go

=== DETAIL ===
--- 01KQH... ---
  topic:      preferences
  importance: high
  weight:     0.975
  created:    2026-05-01 07:04
  accessed:   2026-05-01 07:04 (x1)
  summary:    User prefers Rust over Go

=== JSON ===  (17 lines for the same memory)

The legacy multi-line view stays available as --format detail. json is for tooling.

Implementation: new crates/icm-cli/src/recall_format.rs module with RecallFormat enum + 6 unit tests (TOON header shape with/without scores, comma escaping, detail labels, JSON shape, empty list).
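The TOON row layout above can be sketched roughly like this. The `Memory` field set (`id`, `topic`, `importance`, `weight`, `summary`) comes from the header shown in the smoke test; everything else (function names, field types, the exact escape convention) is an assumption for illustration, not the module's real API.

```rust
// Hypothetical sketch of the TOON renderer; names beyond the header
// fields are assumptions.
#[allow(dead_code)]
enum RecallFormat {
    Toon,
    Detail,
    Json,
}

struct Memory {
    id: String,
    topic: String,
    importance: String,
    weight: f64,
    summary: String,
}

/// Escape commas so a summary cannot break the single-row layout
/// (the real module has a unit test for exactly this case).
fn escape_commas(s: &str) -> String {
    s.replace(',', "\\,")
}

fn render_toon(memories: &[Memory]) -> String {
    let mut out = format!(
        "memories[{}]{{id,topic,importance,weight,summary}}:\n",
        memories.len()
    );
    for m in memories {
        out.push_str(&format!(
            "  {},{},{},{:.3},{}\n",
            m.id, m.topic, m.importance, m.weight,
            escape_commas(&m.summary)
        ));
    }
    out
}
```

One header line plus one row per memory is what drives the byte savings in the table above: the field labels are paid for once instead of once per memory.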

2. LRU cache in SqliteStore

Mutex<LruCache<String, Memory>>, cap 256 (~400 KB worst-case incl. embedding blobs). Read-through on get/get_many. Invalidations:

| Path | Strategy |
| --- | --- |
| `update(memory)` | invalidate by id |
| `delete(id)` | invalidate by id |
| `update_access(id)` | invalidate by id |
| `batch_update_access(ids)` | invalidate listed ids |
| `apply_decay()` | clear all (touches every non-critical row's weight) |
| `prune(...)` | clear all if any rows deleted |
| `consolidate_topic(...)` | clear all (bulk delete + re-insert) |

5 new cache invalidation tests cover all the paths above plus a "raw SQL update is not seen by cache" canary that proves the cache is actually serving reads. Adds lru = "0.12" workspace dep.
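The read-through + invalidation strategy in the table above looks roughly like this. The real store uses `Mutex<LruCache<String, Memory>>` from the `lru` crate, capped at 256 entries; a plain `HashMap` stands in here so the sketch stays std-only, and the SQLite lookup is stubbed as a second map.

```rust
use std::collections::HashMap;

// Std-only sketch of the read-through cache; NOT the real SqliteStore.
struct Store {
    cache: HashMap<String, String>, // real code: Mutex<LruCache<String, Memory>>
    db: HashMap<String, String>,    // stand-in for the SQLite table
}

impl Store {
    /// Read-through get: serve cache hits, otherwise hit the "DB" and fill.
    fn get(&mut self, id: &str) -> Option<String> {
        if let Some(v) = self.cache.get(id) {
            return Some(v.clone());
        }
        let v = self.db.get(id)?.clone();
        self.cache.insert(id.to_string(), v.clone());
        Some(v)
    }

    /// Point mutations (update / delete / update_access) drop one id.
    fn invalidate(&mut self, id: &str) {
        self.cache.remove(id);
    }

    /// Bulk ops (apply_decay, prune, consolidate_topic) drop everything,
    /// because they touch arbitrarily many rows.
    fn clear_cache(&mut self) {
        self.cache.clear();
    }
}
```

The "raw SQL update is not seen by cache" canary test mentioned above follows directly from this shape: anything that bypasses `invalidate`/`clear_cache` leaves a stale entry, which is exactly what the canary demonstrates.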

3. Batch expand_with_neighbors

Previously: N round-trips via self.get() per neighbour. Now: candidate ids collected in priority order, then fetched in one SELECT … WHERE id IN (?,?,…) via a new public get_many. Preserves scoring, dedup, and the hop discount.

The 8 existing expansion tests still pass without modification. 4 new tests cover get_many directly (basic, empty input, missing ids dropped, dedup of repeated ids).
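The core of a batched fetch is building the variable-length placeholder list for the `IN` clause; a sketch of that piece, with an assumed column list (the real schema may differ):

```rust
/// Build the `SELECT … WHERE id IN (?,?,…)` statement for a batch fetch
/// of `n` ids. The real get_many then binds the ids to these placeholders
/// through the SQLite driver; the column list here is an assumption.
fn in_clause_sql(n: usize) -> String {
    let placeholders = vec!["?"; n].join(",");
    format!(
        "SELECT id, topic, importance, weight, summary \
         FROM memories WHERE id IN ({placeholders})"
    )
}
```

One round-trip with `n` placeholders replaces `n` round-trips of one statement each, which is where the 2.4× figure above comes from; per-call overhead (statement prep, mutex, row mapping) is paid once instead of `n` times.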

Test plan

  • cargo fmt --all -- --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo test --workspace -- --test-threads=2: 339 passed (+10 from new tests across the three actions). The existing perf_fts_search_100 and perf_store_with_embeddings_1000 perf tests are parallelism-sensitive even on baseline (they fail under default cargo test even on main), so I verified with --test-threads=2.
  • Manual smoke test: icm recall produces TOON by default, --format detail reproduces the legacy output verbatim, --format json parses cleanly.
  • Empirical validation of all three claims (numbers above).
  • Optional follow-up: switch icm list and cmd_recall_context to TOON in a separate PR (out of scope here — recall was the user-asked target).

🤖 Generated with Claude Code

@pszymkowiak (Contributor, Author)

wshm · Automated triage by AI

📊 Automated PR Analysis

Type: feature
Risk: 🟡 medium

Summary

Adds a compact TOON output format (default) for icm recall to reduce token usage when piped to LLMs, introduces an LRU cache (cap 256) in SqliteStore for hot-path read-through caching with invalidation on mutations, and batches neighbour fetches into a single SELECT … WHERE id IN query via a new get_many method. The cmd_recall function is also refactored to unify the hybrid-search and FTS-fallback code paths.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Analyzed automatically by wshm · This is an automated analysis, not a human review.

pszymkowiak added a commit that referenced this pull request May 1, 2026
Mirrors the rtk-ai/rtk CI/CD model so contributor PRs go to `develop`,
get merged into a pre-release stream there, and only `develop` -> `main`
PRs (release-please releases) cut a stable build.

## ci.yml — multi-stage gates

Triggers on `pull_request` to `develop` or `main`.

  fmt -> clippy -> { test x3 OS, security scan }   (parallel)

Drops the path filter — ICM is small enough that running CI on every PR
is the safer default. Adds a `cargo audit` security job + a "new
dependencies" supply-chain reminder. Skips the AI doc-review job from
RTK (no Anthropic API key available in this org per current policy) and
skips the multi-language `benchmark` job (RTK-specific).

## cd.yml — dual-path CD

Replaces release-please.yml. Single workflow with two non-overlapping
paths gated on `github.ref`:

- `push: develop` -> compute next version from conventional commits
  (mirrors release-please's `bump-minor-pre-major +
  bump-patch-for-minor-pre-major` logic), tag as
  `icm-dev-v{ver}-rc.{run_number}`, call release.yml with
  `prerelease: true`. Concurrency cancel-in-progress.

- `push: main` -> release-please. On release_created, call release.yml
  with `prerelease: false`, then update the `latest` tag. Concurrency
  never cancelled.

## release.yml — opt-in prerelease input

New `prerelease: boolean` input (default false) plumbed through both
`workflow_call` and `workflow_dispatch`. Passed to
`softprops/action-gh-release@v2` so pre-releases get the GitHub
pre-release badge. The Homebrew tap update job is gated `if:
inputs.prerelease != true` — taps only update on stable channels.

## pr-target-check.yml

Borrowed verbatim from RTK (with `master` -> `main`): labels PRs that
target `main` from anything other than `develop` and posts a comment
pointing to the develop branch.

## What's NOT in this PR (deferred per discussion)

- Discord webhook notification on stable release (no secret configured)
- Branch protection rules (require enforcement via GitHub UI / `gh api`,
  not workflow YAML)
- CONTRIBUTING.md / CICD.md docs
- Migration of the two open feature PRs (#166, #167) — those will be
  rebased onto `develop` after this lands.

Co-authored-by: patrick <patrick@rtk-ai.app>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
patrick and others added 2 commits May 1, 2026 13:47
Three coordinated changes targeting recall token cost and runtime:

1. **`icm recall --format toon|detail|json`** (default `toon`).
   Per-recall stdout drops from ~7 lines/memory of labelled detail to
   1 row/memory under a single header, ~60-65% fewer tokens when the
   output gets piped into an LLM context. The legacy multi-line view
   stays available as `--format detail`; `--format json` ships an
   array for tooling. Move the rendering into a small
   `recall_format` module with unit tests for each format.

2. **LRU cache in `SqliteStore`** (`Mutex<LruCache<String, Memory>>`,
   cap 256 ≈ 400KB worst-case incl. embedding blobs). `get` and
   `get_many` are read-through; `update`, `delete`, `update_access`,
   `batch_update_access` invalidate the touched ids; `apply_decay`,
   `prune` (when it changed rows), and `consolidate_topic` clear the
   whole cache because they touch arbitrarily many rows. Adds the
   `lru` workspace dep. Mainly pays off in long-running processes
   (`icm serve`, TUI); benign in one-shot CLI.

3. **Batch `expand_with_neighbors`**. Previously did N round-trips via
   `self.get()` per neighbour. Now collects candidate ids in
   priority order, then fetches them in a single
   `SELECT … WHERE id IN (?,?,…)` via the new `get_many`. Preserves
   scoring, dedup, and the hop discount. Existing 8 expansion tests
   still pass; 4 new tests cover `get_many` directly.

Test plan: `cargo fmt`, `cargo clippy --workspace --all-targets`,
`cargo test --workspace -- --test-threads=2` (the existing
`perf_fts_search_100` / `perf_store_with_embeddings_1000` are
parallelism-sensitive on baseline too — passing with -j2). Manual
smoke-test: `icm recall` produces TOON by default, `--format detail`
reproduces legacy output, `--format json` parses cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two informational `#[ignore]`'d benches in the icm-store test module
so the gains from the cache and batched fetch are reproducible
locally.

Run with:
  cargo test --release -p icm-store --lib -- --ignored --nocapture

Numbers on this machine (release build, in-memory store, 50 ids):

  bench_cache_hit_vs_miss
    cold (DB hit + cache fill): 12_560 ns/get
    warm (cache hit):              81 ns/get
    speedup on hot reads:        ~155x

  bench_get_many_vs_n_plus_one
    batched get_many: 215 us
    N+1 individual:   516 us
    speedup:          ~2.4x

These aren't wired with assertions on purpose — perf numbers fluctuate
under load and the existing perf_* tests already enforce ceilings.
These two are evidence that the architecture changes do what the PR
description claims, kept around as living documentation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pszymkowiak force-pushed the feat/recall-perf-tokens branch from debd5a4 to eb4e12d on May 1, 2026 11:47
@pszymkowiak changed the base branch from main to develop on May 1, 2026 11:47
@pszymkowiak merged commit 6bd2ca4 into develop on May 2, 2026
12 checks passed
@pszymkowiak deleted the feat/recall-perf-tokens branch on May 2, 2026 07:55
pszymkowiak added a commit that referenced this pull request May 2, 2026
…#175)

Three quick-win security bumps grouped into one PR — found via
`cargo audit` during the 0.10.43 verification audit.

## Cleared

- **RUSTSEC-2026-0049/0098/0099/0104** — rustls-webpki 0.103.9 → 0.103.13
  Pulled transitively via ureq/hf-hub/reqwest. Semver-compatible bump
  via `cargo update`, no Cargo.toml change.

- **RUSTSEC-2026-0067/0068** (medium 5.1) — tar 0.4.44 → 0.4.45
  Direct dep in icm-cli (release artifact packaging). Pinned to
  `tar = "0.4.45"` in workspace Cargo.toml to make the floor explicit.

- **RUSTSEC-2026-0002** (unsound IterMut) — lru 0.12 → 0.18
  Direct dep in icm-store added in #167 for the recall LRU cache.
  Bumped to 0.18 (the latest stable) since both 0.13 and 0.16 still
  carried the advisory; 0.18 is the first version listed as
  unaffected. Our usage is `get`/`put`/`pop`/`clear` — the unsound
  `IterMut` path was never on the hot path here, but the bump
  removes the lint regardless.

## Remaining warnings (out of scope, transitive)

- `lru 0.12.5` still pulled by `ratatui 0.29.0`. Bumping ratatui is
  bigger than this PR. Our usage is in icm-store, which now uses
  0.18.
- `paste 1.0.15` (unmaintained), `core2 0.4.0` (yanked) — both via
  fastembed/ratatui transitively. Same reasoning.

## Test plan

- [x] `cargo audit` no longer flags any direct dep
- [x] `cargo build --workspace` clean
- [x] `cargo fmt --all -- --check` clean
- [x] `cargo clippy --workspace --all-targets -- -D warnings` clean
- [x] `cargo test --release --workspace` 339+ passed
- The debug-build `perf_fts_search_100` test is parallelism-sensitive
  on local — passes in release mode and on CI defaults. Not a
  regression from this PR (same test was flaky before, baseline
  confirmed).

Co-authored-by: patrick <patrick@rtk-ai.app>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
