feat(recall): TOON default + LRU cache + batched neighbour fetch#167
Merged
pszymkowiak merged 2 commits into develop on May 2, 2026
Conversation
pszymkowiak added a commit that referenced this pull request on May 1, 2026
Mirrors the rtk-ai/rtk CI/CD model so contributor PRs go to `develop`,
get merged into a pre-release stream there, and only `develop` -> `main`
PRs (release-please releases) cut a stable build.
## ci.yml — multi-stage gates
Triggers on `pull_request` to `develop` or `main`.
fmt -> clippy -> { test x3 OS, security scan } (parallel)
Drops the path filter — ICM is small enough that running CI on every PR
is the safer default. Adds a `cargo audit` security job + a "new
dependencies" supply-chain reminder. Skips the AI doc-review job from
RTK (no Anthropic API key available in this org per current policy) and
skips the multi-language `benchmark` job (RTK-specific).
## cd.yml — dual-path CD
Replaces release-please.yml. Single workflow with two non-overlapping
paths gated on `github.ref`:
- `push: develop` -> compute next version from conventional commits
(mirrors release-please's `bump-minor-pre-major +
bump-patch-for-minor-pre-major` logic), tag as
`icm-dev-v{ver}-rc.{run_number}`, call release.yml with
`prerelease: true`. Concurrency cancel-in-progress.
- `push: main` -> release-please. On release_created, call release.yml
with `prerelease: false`, then update the `latest` tag. Concurrency
never cancelled.
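The develop-path bump described above (mirroring release-please's `bump-minor-pre-major` + `bump-patch-for-minor-pre-major` options) can be sketched as follows. This is a hypothetical reimplementation for illustration only, not the actual cd.yml logic; `next_version` and `CommitKind` are invented names:

```rust
// Illustrative sketch of the pre-major bump rules, assuming the standard
// release-please semantics: while the version is 0.x, breaking changes bump
// minor (not major) and features bump patch (not minor).
#[derive(Clone, Copy)]
enum CommitKind {
    Breaking, // `feat!:` or a `BREAKING CHANGE:` footer
    Feat,     // `feat:`
    Fix,      // `fix:` and other patch-level types
}

fn next_version((major, minor, patch): (u64, u64, u64), kind: CommitKind) -> (u64, u64, u64) {
    let pre_major = major == 0;
    match kind {
        // bump-minor-pre-major: breaking changes bump minor while 0.x
        CommitKind::Breaking if pre_major => (0, minor + 1, 0),
        CommitKind::Breaking => (major + 1, 0, 0),
        // bump-patch-for-minor-pre-major: features bump patch while 0.x
        CommitKind::Feat if pre_major => (0, minor, patch + 1),
        CommitKind::Feat => (major, minor + 1, 0),
        CommitKind::Fix => (major, minor, patch + 1),
    }
}

fn main() {
    // e.g. a feat on 0.10.43 yields 0.10.44, tagged icm-dev-v0.10.44-rc.{run_number}
    let (ma, mi, pa) = next_version((0, 10, 43), CommitKind::Feat);
    println!("icm-dev-v{ma}.{mi}.{pa}-rc.1"); // prints icm-dev-v0.10.44-rc.1
}
```

The actual workflow computes this from the conventional-commit history rather than a single commit kind.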
## release.yml — opt-in prerelease input
New `prerelease: boolean` input (default false) plumbed through both
`workflow_call` and `workflow_dispatch`. Passed to
`softprops/action-gh-release@v2` so pre-releases get the GitHub
pre-release badge. The Homebrew tap update job is gated `if:
inputs.prerelease != true` — taps only update on stable channels.
## pr-target-check.yml
Borrowed verbatim from RTK (with `master` -> `main`): labels PRs that
target `main` from anything other than `develop` and posts a comment
pointing to the develop branch.
## What's NOT in this PR (deferred per discussion)
- Discord webhook notification on stable release (no secret configured)
- Branch protection rules (enforced via the GitHub UI / `gh api`,
  not workflow YAML)
- CONTRIBUTING.md / CICD.md docs
- Migration of the two open feature PRs (#166, #167) — those will be
rebased onto `develop` after this lands.
Co-authored-by: patrick <patrick@rtk-ai.app>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coordinated changes targeting recall token cost and runtime:

1. **`icm recall --format toon|detail|json`** (default `toon`). Per-recall stdout drops from ~7 lines/memory of labelled detail to 1 row/memory under a single header, ~60-65% fewer tokens when the output gets piped into an LLM context. The legacy multi-line view stays available as `--format detail`; `--format json` ships an array for tooling. Moves the rendering into a small `recall_format` module with unit tests for each format.

2. **LRU cache in `SqliteStore`** (`Mutex<LruCache<String, Memory>>`, cap 256 ≈ 400 KB worst-case incl. embedding blobs). `get` and `get_many` are read-through; `update`, `delete`, `update_access`, and `batch_update_access` invalidate the touched ids; `apply_decay`, `prune` (when it changed rows), and `consolidate_topic` clear the whole cache because they touch arbitrarily many rows. Adds the `lru` workspace dep. Mainly pays off in long-running processes (`icm serve`, TUI); benign in one-shot CLI.

3. **Batch `expand_with_neighbors`.** Previously did N round-trips via `self.get()` per neighbour. Now collects candidate ids in priority order, then fetches them in a single `SELECT … WHERE id IN (?,?,…)` via the new `get_many`. Preserves scoring, dedup, and the hop discount. The existing 8 expansion tests still pass; 4 new tests cover `get_many` directly.

Test plan: `cargo fmt`, `cargo clippy --workspace --all-targets`, `cargo test --workspace -- --test-threads=2` (the existing `perf_fts_search_100` / `perf_store_with_embeddings_1000` are parallelism-sensitive on baseline too; passing with `--test-threads=2`). Manual smoke-test: `icm recall` produces TOON by default, `--format detail` reproduces legacy output, `--format json` parses cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
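As a rough illustration of the batched shape above (one `IN (…)` statement instead of N round-trips), here is a hypothetical helper. `build_in_query` and the `memories` table/column names are invented for the sketch, not the actual icm-store code:

```rust
// Sketch: build one parameterized SELECT with N placeholders so a batch of
// ids is fetched in a single statement instead of N individual round-trips.
// Table/column names are hypothetical.
fn build_in_query(ids: &[&str]) -> (String, Vec<String>) {
    let placeholders = vec!["?"; ids.len()].join(",");
    let sql = format!("SELECT id, summary FROM memories WHERE id IN ({placeholders})");
    let params = ids.iter().map(|s| s.to_string()).collect();
    (sql, params)
}

fn main() {
    let (sql, params) = build_in_query(&["a1", "b2", "c3"]);
    println!("{sql}"); // SELECT id, summary FROM memories WHERE id IN (?,?,?)
    assert_eq!(params.len(), 3);
}
```

Note that `IN` does not preserve request order, so a caller that collected ids in priority order (as the PR describes) has to reorder the returned rows itself.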
Two informational `#[ignore]`'d benches in the icm-store test module
so the gains from the cache and batched fetch are reproducible
locally.
Run with:

```
cargo test --release -p icm-store --lib -- --ignored --nocapture
```
Numbers on this machine (release build, in-memory store, 50 ids):
bench_cache_hit_vs_miss
cold (DB hit + cache fill): 12_560 ns/get
warm (cache hit): 81 ns/get
speedup on hot reads: ~155x
bench_get_many_vs_n_plus_one
batched get_many: 215 us
N+1 individual: 516 us
speedup: ~2.4x
These aren't wired with assertions on purpose — perf numbers fluctuate
under load, and the existing perf_* tests already enforce ceilings. The
two benches are kept as living documentation: evidence that the
architecture changes do what the PR description claims.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
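The informational-bench pattern (print ns/op, assert nothing) can be sketched standalone. `ns_per_op` is a hypothetical helper shown as a plain program, not the crate's bench code; in the crate the equivalent lives behind `#[ignore]` and runs via `--ignored --nocapture`:

```rust
use std::time::Instant;

// Time a closure over `iters` runs and report the average in ns/op.
fn ns_per_op(iters: u32, mut f: impl FnMut()) -> u128 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() / iters as u128
}

fn main() {
    let data: Vec<u64> = (0..1_000).collect();
    let mut sink = 0u64;
    let warm = ns_per_op(10_000, || sink = sink.wrapping_add(data[500]));
    // Printed, not asserted: perf numbers fluctuate under load.
    println!("warm read: {warm} ns/op (sink={sink})");
}
```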
This was referenced May 2, 2026
pszymkowiak added a commit that referenced this pull request on May 2, 2026
…#175) Three quick-win security bumps grouped into one PR — found via `cargo audit` during the 0.10.43 verification audit.

## Cleared

- **RUSTSEC-2026-0049/0098/0099/0104** — rustls-webpki 0.103.9 → 0.103.13. Pulled transitively via ureq/hf-hub/reqwest. Semver-compatible bump via `cargo update`, no Cargo.toml change.
- **RUSTSEC-2026-0067/0068** (medium 5.1) — tar 0.4.44 → 0.4.45. Direct dep in icm-cli (release artifact packaging). Pinned to `tar = "0.4.45"` in the workspace Cargo.toml to make the floor explicit.
- **RUSTSEC-2026-0002** (unsound IterMut) — lru 0.12 → 0.18. Direct dep in icm-store, added in #167 for the recall LRU cache. Bumped to 0.18 (the latest stable) since both 0.13 and 0.16 still carried the advisory; 0.18 is the first version listed as unaffected. Our usage is `get`/`put`/`pop`/`clear` — the unsound `IterMut` path was never on the hot path here, but the bump removes the lint regardless.

## Remaining warnings (out of scope, transitive)

- `lru 0.12.5` still pulled by `ratatui 0.29.0`. Bumping ratatui is bigger than this PR. Our usage is in icm-store, which now uses 0.18.
- `paste 1.0.15` (unmaintained), `core2 0.4.0` (yanked) — both via fastembed/ratatui transitively. Same reasoning.

## Test plan

- [x] `cargo audit` no longer flags any direct dep
- [x] `cargo build --workspace` clean
- [x] `cargo fmt --all -- --check` clean
- [x] `cargo clippy --workspace --all-targets -- -D warnings` clean
- [x] `cargo test --release --workspace` 339+ passed
- The debug-build `perf_fts_search_100` test is parallelism-sensitive locally — passes in release mode and on CI defaults. Not a regression from this PR (same test was flaky before, baseline confirmed).

Co-authored-by: patrick <patrick@rtk-ai.app>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coordinated changes from the recall-perf audit. They share a PR because they all live in the recall hot path and it's easier to review them together than as three tiny PRs touching the same files.
## Measured impact (real numbers, this machine)
| Comparison | Result |
| --- | --- |
| `toon` vs `detail`, summaries ~130 chars | ~37% fewer tokens |
| `toon` vs `detail`, summaries ~13 chars | ~64% fewer tokens |
| warm `get()` (cache hit) vs cold (DB hit + fill) | 81 ns vs 12,560 ns per get (~155×) |
| `get_many(50)` vs 50 × `get()` | 215 µs vs 516 µs (~2.4×) |

The TOON win scales inversely with summary length: when summaries are ~13 chars (short tag-style memories), TOON saves 64% because metadata dominates the payload; when summaries are ~130 chars (typical `decisions-*` content), TOON saves 37% because the summary itself dominates. Expect 35-50% in the wild on real ICM data. This is the corrected number — my initial estimate of 60-65% was based on small synthetic memories and turned out optimistic for typical content.

The LRU cache's 155× number applies to hot reads in long-running processes (`icm serve`, TUI). For one-shot CLI invocations the process exits before benefiting from cache hits, so the practical impact there is the cache acting as a tiny in-flight memo — neutral, not negative.

The batch 2.4× scales linearly with N — at the typical recall budget (`max_neighbors = limit/3`, default ~1) the absolute saving is small (~5-50 µs), but the ratio holds.

Reproduce with `cargo test --release -p icm-store --lib -- --ignored --nocapture`.
The two `#[ignore]`'d benches live in `crates/icm-store/src/store.rs` and are explicit documentation of the gain rather than assertions (perf numbers fluctuate under load, and the existing `perf_*` tests already enforce ceilings).

## 1. `icm recall --format toon|detail|json` (default `toon`)

Per-recall stdout drops from a multi-line labelled view to one row under a single header (smoke-tested on a fixture with 1 result). The legacy multi-line view stays available as `--format detail`; `json` is for tooling.

Implementation: a new `crates/icm-cli/src/recall_format.rs` module with a `RecallFormat` enum + 6 unit tests (TOON header shape with/without scores, comma escaping, detail labels, JSON shape, empty list).
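The PR doesn't show TOON's grammar itself, only that the unit tests cover comma escaping. As a rough illustration of why a one-row-per-memory format needs it, here is a hypothetical CSV-style field escaper; the real `recall_format.rs` rules may differ:

```rust
// Hypothetical comma-escaping helper for a one-row-per-memory format.
// Not the actual recall_format.rs code.
fn escape_field(s: &str) -> String {
    if s.contains(',') || s.contains('"') || s.contains('\n') {
        // Quote the field and double any embedded quotes, CSV-style.
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

fn main() {
    let row = ["mem-01", "decision: use sqlite, not sled", "0.92"]
        .map(escape_field)
        .join(",");
    println!("{row}"); // mem-01,"decision: use sqlite, not sled",0.92
}
```

Without the escape, a summary containing a comma would shift every column after it, which is exactly the failure mode the "comma escaping" unit test guards against.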
## 2. LRU cache in `SqliteStore`

`Mutex<LruCache<String, Memory>>`, cap 256 (~400 KB worst-case incl. embedding blobs). Read-through on `get`/`get_many`. Invalidations:

- `update(memory)`, `delete(id)`, `update_access(id)`, `batch_update_access(ids)` — invalidate the touched ids
- `apply_decay()`, `prune(...)`, `consolidate_topic(...)` — clear the whole cache (they touch arbitrarily many rows)

5 new cache invalidation tests cover all the paths above plus a "raw SQL update is not seen by cache" canary that proves the cache is actually serving reads. Adds the `lru = "0.12"` workspace dep.
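The read-through + invalidate flow above can be sketched with std-only types. This toy replaces the `lru` crate with a HashMap plus a recency queue and stores plain strings instead of `Memory` rows; it is illustrative only, not the `SqliteStore` implementation:

```rust
use std::collections::{HashMap, VecDeque};

// Toy read-through cache: HashMap for entries, VecDeque for LRU order
// (front = least recently used). Capacity-bounded like the real cap-256 cache.
struct ReadThroughCache {
    cap: usize,
    map: HashMap<String, String>, // id -> payload (`Memory` in the real store)
    order: VecDeque<String>,
}

impl ReadThroughCache {
    fn new(cap: usize) -> Self {
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    fn touch(&mut self, id: &str) {
        self.order.retain(|k| k != id);
        self.order.push_back(id.to_string());
    }

    // Read-through get: serve from cache, else "hit the DB" via `load` and fill.
    fn get(&mut self, id: &str, load: impl Fn(&str) -> String) -> String {
        if let Some(v) = self.map.get(id).cloned() {
            self.touch(id);
            return v;
        }
        let v = load(id);
        if self.map.len() >= self.cap {
            if let Some(evict) = self.order.pop_front() {
                self.map.remove(&evict);
            }
        }
        self.map.insert(id.to_string(), v.clone());
        self.touch(id);
        v
    }

    // Writes that know their ids drop just those entries...
    fn invalidate(&mut self, id: &str) {
        self.map.remove(id);
        self.order.retain(|k| k != id);
    }

    // ...while bulk mutations (apply_decay, prune, consolidate_topic) clear everything.
    fn clear(&mut self) {
        self.map.clear();
        self.order.clear();
    }
}

fn main() {
    let mut cache = ReadThroughCache::new(2);
    let from_db = |id: &str| format!("row:{id}");
    assert_eq!(cache.get("a", from_db), "row:a"); // cold: loads and fills
    assert_eq!(cache.get("a", from_db), "row:a"); // warm: served from cache
    cache.invalidate("a");
    cache.clear();
    println!("ok");
}
```

The "raw SQL update is not seen by cache" canary test mentioned above works because a write that bypasses `invalidate`/`clear` leaves a stale entry, proving reads really come from the cache.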
## 3. Batch `expand_with_neighbors`

Previously: N round-trips via `self.get()` per neighbour. Now: candidate ids are collected in priority order, then fetched in one `SELECT … WHERE id IN (?,?,…)` via a new public `get_many`. Preserves scoring, dedup, and the hop discount.

The 8 existing expansion tests still pass without modification. 4 new tests cover `get_many` directly (basic, empty input, missing ids dropped, dedup of repeated ids).

## Test plan

- `cargo fmt --all -- --check` clean
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `cargo test --workspace -- --test-threads=2` — 339 passed (+10 new tests across the three changes). The existing `perf_fts_search_100` and `perf_store_with_embeddings_1000` perf tests are parallelism-sensitive on baseline (they fail under default `cargo test` even on `main`), so I ran with `--test-threads=2` to verify.
- Manual: `icm recall` produces TOON by default, `--format detail` reproduces the legacy output verbatim, `--format json` parses cleanly.

Follow-up: moving `icm list` and `cmd_recall_context` to TOON lands in a separate PR (out of scope here — `recall` was the user-asked target).

🤖 Generated with Claude Code