feat: verbatim transcripts — sessions + messages (refs #107) by pszymkowiak · Pull Request #108 · rtk-ai/icm

pszymkowiak · 2026-04-13T16:40:55Z

Summary

Add a third memory system — Transcripts — alongside the existing curated memories and knowledge-graph memoirs. A transcript is a verbatim recording of an agent conversation: every user turn, every assistant reply, every tool call stored as-is, no summarization.

Search happens at query time via FTS5 BM25 (boolean, phrase, prefix), not at write time. This is the pattern validated by MemPalace's 96.6% LongMemEval score: raw storage beats extraction for retrieval fidelity.
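To illustrate what "ranking at query time" means, here is a minimal sketch of BM25 scoring over raw, unprocessed messages. It is a toy written for this description, not FTS5 itself and not code from this PR: the `tokenize` and `bm25_search` names are hypothetical, the tokenizer is a simple lowercase/alphanumeric split, query terms are treated as OR, and the constants (k1 = 1.2, b = 0.75) are the commonly used defaults.

```rust
use std::collections::HashMap;

/// Lowercase and split on non-alphanumeric characters (stand-in tokenizer,
/// not the tokenizer FTS5 uses).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Score every stored message against `query` with BM25 and return
/// (message index, score) pairs, best match first. Messages containing
/// none of the query terms are omitted. Nothing is precomputed at write
/// time: all statistics are derived from the raw text when the query runs.
pub fn bm25_search(messages: &[&str], query: &str) -> Vec<(usize, f64)> {
    const K1: f64 = 1.2;
    const B: f64 = 0.75;

    if messages.is_empty() {
        return Vec::new();
    }
    let docs: Vec<Vec<String>> = messages.iter().map(|m| tokenize(m)).collect();
    let n = docs.len() as f64;
    // Guard against a zero average length when every message is empty.
    let avg_len = (docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n).max(1.0);
    let terms = tokenize(query);

    let mut hits = Vec::new();
    for (i, doc) in docs.iter().enumerate() {
        // Term frequencies for this message.
        let mut tf: HashMap<&str, f64> = HashMap::new();
        for tok in doc {
            *tf.entry(tok.as_str()).or_insert(0.0) += 1.0;
        }
        let mut score = 0.0;
        for term in &terms {
            let f = match tf.get(term.as_str()) {
                Some(f) => *f,
                None => continue,
            };
            // Number of messages containing the term.
            let df = docs.iter().filter(|d| d.contains(term)).count() as f64;
            // Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
            // Length normalisation: longer messages are penalised.
            let norm = 1.0 - B + B * doc.len() as f64 / avg_len;
            score += idf * f * (K1 + 1.0) / (f + K1 * norm);
        }
        if score > 0.0 {
            hits.push((i, score));
        }
    }
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}
```

With the three messages from the smoke test, a query for `postgres mysql` ranks the short user turn (which contains both terms) ahead of the assistant reply, and the `{}` tool payload is never returned.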

Closes the gap in issue #107 and positions ICM for three use cases it couldn't previously serve:

Session replay — post-mortem review, debugging what an agent actually did
Compliance / audit — SOC2, HIPAA, full trail (no lossy extraction)
Training data — collect gold-labeled sessions for fine-tuning / RAG

Why this belongs in ICM (not a separate tool)

Same SQLite file as memories/memoirs → one store, consistent backup
FTS5 is already in the binary; transcripts reuse the same index engine
Rust + SQLite write throughput is ~10× faster than ChromaDB-based alternatives and produces smaller on-disk files
Zero new runtime dependencies — no Python, no ChromaDB, no external service

Scope of this PR

Schema — `sessions` + `messages` tables, FTS5 virtual table with triggers, ON DELETE CASCADE
Core types — `Session`, `Message`, `Role`, `TranscriptHit`, `TranscriptStats`, `TranscriptStore` trait
Store — 8 `SqliteStore` methods (create, get, list, record, list messages, search, forget, stats)
CLI — `icm transcript {start-session, record, search, list-sessions, show, stats, forget}`
MCP — 5 new tools (total MCP surface: 22 → 27)
Tests — 8 unit tests covering FTS5 boolean/phrase, scoping, stats, cascade delete, ordering
README — new section under `## CLI` with end-to-end example
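The shape of the core types can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: field names follow the schema columns, but the field types, the three `Role` variants (taken from the smoke test), the method signatures, and the in-memory `MemoryStore` are all hypothetical, and timestamps and metadata are omitted for brevity.

```rust
use std::collections::HashMap;

/// Who produced a message. Only the roles used in the smoke test are listed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
pub struct Session {
    pub id: String,
    pub agent: String,
    pub project: Option<String>,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub id: u64,
    pub session_id: String,
    pub role: Role,
    /// Stored verbatim: no summarization, no extraction.
    pub content: String,
    pub tool_name: Option<String>,
    pub tokens: Option<u32>,
}

/// Hypothetical subset of a transcript store interface.
pub trait TranscriptStore {
    fn create_session(&mut self, agent: &str, project: Option<&str>) -> String;
    fn record_message(&mut self, session_id: &str, role: Role, content: &str)
        -> Result<u64, String>;
    fn list_session_messages(&self, session_id: &str) -> Vec<&Message>;
    fn forget_session(&mut self, session_id: &str) -> bool;
}

/// In-memory stand-in used only to demonstrate the contract.
#[derive(Default)]
pub struct MemoryStore {
    sessions: HashMap<String, Session>,
    messages: Vec<Message>,
    next_id: u64,
}

impl TranscriptStore for MemoryStore {
    fn create_session(&mut self, agent: &str, project: Option<&str>) -> String {
        self.next_id += 1;
        let id = format!("s{}", self.next_id);
        let session = Session {
            id: id.clone(),
            agent: agent.to_string(),
            project: project.map(str::to_string),
        };
        self.sessions.insert(id.clone(), session);
        id
    }

    fn record_message(&mut self, session_id: &str, role: Role, content: &str)
        -> Result<u64, String> {
        // Reject messages for sessions that do not exist.
        if !self.sessions.contains_key(session_id) {
            return Err(format!("unknown session: {session_id}"));
        }
        self.next_id += 1;
        self.messages.push(Message {
            id: self.next_id,
            session_id: session_id.to_string(),
            role,
            content: content.to_string(),
            tool_name: None,
            tokens: None,
        });
        Ok(self.next_id)
    }

    fn list_session_messages(&self, session_id: &str) -> Vec<&Message> {
        // Messages are appended in order, so this is chronological.
        self.messages.iter().filter(|m| m.session_id == session_id).collect()
    }

    fn forget_session(&mut self, session_id: &str) -> bool {
        // Mimics ON DELETE CASCADE: dropping a session drops its messages.
        let existed = self.sessions.remove(session_id).is_some();
        self.messages.retain(|m| m.session_id != session_id);
        existed
    }
}
```

The sketch mirrors three behaviours named in the test list: recording into a missing session is rejected, messages come back in chronological order, and forgetting a session removes its messages.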

Not in this PR — explicit follow-ups

TUI tab "Sessions" in `icm dashboard`
Web dashboard route `/sessions` + `/sessions/:id` thread viewer + REST endpoints
`icm hook transcript`: auto-capture from Claude Code / Gemini / Codex hooks
Optional RTK Cloud sync for team-tier paid audit / replay

Splitting keeps each PR reviewable (this one is already ~1.5k lines).

Manual smoke test (reproducible)

```bash
SID=$(icm --db /tmp/test.db transcript start-session -a claude-code -p demo)
icm --db /tmp/test.db transcript record -s "$SID" -r user -c "Postgres or MySQL?"
icm --db /tmp/test.db transcript record -s "$SID" -r assistant -c "Postgres: native JSONB, BRIN indexes."
icm --db /tmp/test.db transcript record -s "$SID" -r tool -c '{}' -t Bash --tokens 42
icm --db /tmp/test.db transcript search 'postgres OR mysql'
icm --db /tmp/test.db transcript search '"BRIN indexes"'
icm --db /tmp/test.db transcript show "$SID"
icm --db /tmp/test.db transcript stats
```

Test plan

`cargo test -p icm-store transcript` → 8/8 pass
`cargo build --release -p icm-cli` → clean
Manual end-to-end smoke via CLI (see above)
Reviewer: try FTS5 syntax edge cases (prefix `postg*`, negation `postgres NOT mysql`, proximity `NEAR(brin json, 5)`)
Reviewer: confirm `ON DELETE CASCADE` behaves correctly across SQLite versions (SQLite only enforces foreign keys on connections that set `PRAGMA foreign_keys = ON`)
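When probing FTS5 syntax edge cases, it helps to distinguish operator syntax from literal text. In FTS5 query syntax, a double-quoted string is a phrase, embedded double quotes are escaped by doubling them, and a trailing `*` after a string makes it a prefix query. The helpers below are a hypothetical sketch of building such queries from raw input; the names `fts5_phrase`, `fts5_prefix`, and `fts5_any` are not part of this PR, and whether the CLI escapes input this way is not stated here.

```rust
/// Wrap raw input as a single FTS5 phrase (string literal).
/// Embedded double quotes are escaped by doubling them, so barewords
/// such as OR, NOT, or NEAR inside the input are matched as text
/// instead of being parsed as operators.
pub fn fts5_phrase(input: &str) -> String {
    format!("\"{}\"", input.replace('"', "\"\""))
}

/// Build a prefix query: a quoted token followed by `*`.
pub fn fts5_prefix(token: &str) -> String {
    format!("{}*", fts5_phrase(token))
}

/// Join several literal terms with the boolean OR operator.
pub fn fts5_any(terms: &[&str]) -> String {
    terms
        .iter()
        .map(|t| fts5_phrase(t))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

For example, `fts5_prefix("postg")` yields `"postg"*` and `fts5_any(&["postgres", "mysql"])` yields `"postgres" OR "mysql"`, which correspond to the prefix and boolean cases listed above.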

Add a third memory system alongside memories (curated, decayed) and memoirs (knowledge graphs): transcripts are verbatim recordings of agent conversations, stored as-is with no summarization or extraction. Retrieval uses FTS5 BM25 at query time (boolean, phrase, prefix).

Motivation: issue #107 + MemPalace traction show real demand for raw conversation capture. Three use cases ICM couldn't previously serve: session replay for post-mortem review, compliance/audit trails, and training-data collection. Extraction is lossy for these; we need the actual bytes, not a summary.

Implementation is 100% Rust, same SQLite file as memories/memoirs, zero new runtime deps. FTS5 index writes ~10× faster than ChromaDB-based alternatives.

## Schema

- `sessions` (id, agent, project, started_at, updated_at, metadata)
- `messages` (id, session_id, role, content, tool_name, tokens, ts, metadata)
- `messages_fts` FTS5 virtual table on role + content + tool_name
- ON DELETE CASCADE from sessions → messages; triggers keep FTS in sync

## API surface

- Core: `TranscriptStore` trait + `Session`, `Message`, `Role`, `TranscriptHit`, `TranscriptStats` types.
- Store: 8 methods on `SqliteStore` (create_session, get_session, list_sessions, record_message, list_session_messages, search_transcripts, forget_session, transcript_stats).
- CLI: `icm transcript {start-session, record, search, list-sessions, show, stats, forget}` (7 subcommands).
- MCP: 5 tools (`icm_transcript_start_session`, `_record`, `_search`, `_show`, `_stats`); total MCP surface: 22 → 27.

## Tests

8 new unit tests in icm-store: create+record, missing-session rejection, FTS5 boolean + phrase, session/project scoping, stats breakdown, cascade delete, chronological ordering, list sorting.

## Follow-ups (separate PRs)

- TUI tab "Sessions" (icm dashboard)
- Web dashboard page `/sessions` with message thread viewer
- `icm hook transcript` wiring for auto-capture from Claude Code hooks
- Optional RTK Cloud sync (paid audit / session replay tier)

pszymkowiak · 2026-04-13T16:41:18Z

wshm · Automated triage by AI

📊 Automated PR Analysis


✨ Type	`feature`
🟡 Risk	`medium`

Summary

Adds a new 'Transcripts' subsystem to ICM for verbatim session replay, including new SQLite tables (sessions + messages with FTS5), a TranscriptStore trait with 8 methods on SqliteStore, 7 CLI subcommands, 5 new MCP tools, and 8 unit tests. This enables storing raw agent conversations for replay, audit, and training data use cases.

Review Checklist

Tests present
Breaking change
Docs updated

Linked issues: #107

Analyzed automatically by wshm · This is an automated analysis, not a human review.

pszymkowiak added feature storage transcript labels Apr 13, 2026

pszymkowiak added 2 commits April 13, 2026 19:37

style: cargo fmt

7d10ce1

fix(clippy): explicit param numbering in transcript search SQL

2928a09

pszymkowiak merged commit 85cad66 into main Apr 13, 2026
4 checks passed

pszymkowiak deleted the feat/verbatim-transcripts branch April 13, 2026 19:52

github-actions bot mentioned this pull request Apr 13, 2026

chore(main): release icm 0.10.24 #109

Merged

pszymkowiak mentioned this pull request Apr 16, 2026

Thanks, Critical features #107

Open

pszymkowiak merged 3 commits into main from feat/verbatim-transcripts