feat: verbatim transcripts — sessions + messages (refs #107) #108
Merged
pszymkowiak merged 3 commits into main on Apr 13, 2026
Add a third memory system alongside memories (curated, decayed) and memoirs (knowledge graphs): transcripts are verbatim recordings of agent conversations, stored as-is with no summarization or extraction. Retrieval uses FTS5 BM25 at query time (boolean, phrase, prefix).

Motivation: issue #107 and MemPalace's traction show real demand for raw conversation capture. Three use cases ICM couldn't previously serve: session replay for post-mortem review, compliance/audit trails, and training-data collection. Extraction is lossy for these: we need the actual bytes, not a summary.

Implementation is 100% Rust, same SQLite file as memories/memoirs, zero new runtime deps. FTS5 index writes ~10× faster than ChromaDB-based alternatives.

## Schema

- `sessions` (id, agent, project, started_at, updated_at, metadata)
- `messages` (id, session_id, role, content, tool_name, tokens, ts, metadata)
- `messages_fts`: FTS5 virtual table on role + content + tool_name
- ON DELETE CASCADE from sessions → messages; triggers keep FTS in sync

## API surface

- Core: `TranscriptStore` trait + `Session`, `Message`, `Role`, `TranscriptHit`, `TranscriptStats` types.
- Store: 8 methods on `SqliteStore` (create_session, get_session, list_sessions, record_message, list_session_messages, search_transcripts, forget_session, transcript_stats).
- CLI: `icm transcript {start-session, record, search, list-sessions, show, stats, forget}` (7 subcommands).
- MCP: 5 tools (`icm_transcript_start_session`, `_record`, `_search`, `_show`, `_stats`); total MCP surface: 22 → 27.

## Tests

8 new unit tests in icm-store: create+record, missing-session rejection, FTS5 boolean + phrase, session/project scoping, stats breakdown, cascade delete, chronological ordering, list sorting.

## Follow-ups (separate PRs)

- TUI tab "Sessions" (icm dashboard)
- Web dashboard page `/sessions` with message thread viewer
- `icm hook transcript` wiring for auto-capture from Claude Code hooks
- Optional RTK Cloud sync (paid audit / session replay tier)
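For readers who want a feel for the store contract without opening the diff, here is a minimal in-memory sketch in Rust. It is not the PR's code: `Role`, `Message`, `TranscriptStore` and the four method names come from the description above, but every field type, signature, and the `MemoryStore` type itself are assumptions made for illustration. It mirrors two behaviours the tests mention: recording into a missing session is rejected, and forgetting a session removes its messages.

```rust
// Hypothetical sketch: names follow the PR text, but all field types and
// signatures are assumptions, and storage is in memory rather than SQLite.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub session_id: String,
    pub role: Role,
    pub content: String, // stored verbatim, never summarized
    pub ts: u64,         // logical clock standing in for a real timestamp
}

pub trait TranscriptStore {
    fn create_session(&mut self, agent: &str, project: &str) -> String;
    fn record_message(&mut self, session_id: &str, role: Role, content: &str)
        -> Result<(), String>;
    fn list_session_messages(&self, session_id: &str) -> Vec<Message>;
    fn forget_session(&mut self, session_id: &str) -> bool;
}

#[derive(Default)]
pub struct MemoryStore {
    sessions: Vec<String>,
    messages: Vec<Message>,
    next_id: u64,
    clock: u64,
}

impl TranscriptStore for MemoryStore {
    fn create_session(&mut self, _agent: &str, _project: &str) -> String {
        // A counter (not the vector length) so IDs stay unique after deletes.
        self.next_id += 1;
        let id = format!("session-{}", self.next_id);
        self.sessions.push(id.clone());
        id
    }

    fn record_message(&mut self, session_id: &str, role: Role, content: &str)
        -> Result<(), String> {
        // Missing-session rejection.
        if !self.sessions.iter().any(|s| s == session_id) {
            return Err(format!("unknown session: {}", session_id));
        }
        self.clock += 1;
        self.messages.push(Message {
            session_id: session_id.to_string(),
            role,
            content: content.to_string(),
            ts: self.clock,
        });
        Ok(())
    }

    fn list_session_messages(&self, session_id: &str) -> Vec<Message> {
        // Chronological ordering within one session.
        let mut out: Vec<Message> = self
            .messages
            .iter()
            .filter(|m| m.session_id == session_id)
            .cloned()
            .collect();
        out.sort_by_key(|m| m.ts);
        out
    }

    fn forget_session(&mut self, session_id: &str) -> bool {
        // Emulates ON DELETE CASCADE: messages go away with their session.
        let before = self.sessions.len();
        self.sessions.retain(|s| s != session_id);
        self.messages.retain(|m| m.session_id != session_id);
        self.sessions.len() < before
    }
}

fn main() {
    let mut store = MemoryStore::default();
    let sid = store.create_session("claude-code", "demo");
    store.record_message(&sid, Role::User, "Postgres or MySQL?").unwrap();
    store
        .record_message(&sid, Role::Assistant, "Postgres: native JSONB, BRIN indexes.")
        .unwrap();
    for m in store.list_session_messages(&sid) {
        println!("{:?} @{}: {}", m.role, m.ts, m.content);
    }
    assert!(store.record_message("nope", Role::User, "x").is_err());
    assert!(store.forget_session(&sid));
    assert!(store.list_session_messages(&sid).is_empty());
}
```

In the real store the cascade and ordering are handled by SQLite (foreign key plus `ORDER BY`), so the `retain` and `sort_by_key` calls here are only stand-ins for that behaviour.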
📊 Automated PR Analysis
Summary
Adds a new 'Transcripts' subsystem to ICM for verbatim session replay, including new SQLite tables (sessions + messages with FTS5), a TranscriptStore trait with 8 methods on SqliteStore, 7 CLI subcommands, 5 new MCP tools, and 8 unit tests. This enables storing raw agent conversations for replay, audit, and training-data use cases.
Review Checklist
Linked issues: #107
Analyzed automatically by wshm · This is an automated analysis, not a human review.
Summary
Add a third memory system — Transcripts — alongside the existing curated memories and knowledge-graph memoirs. A transcript is a verbatim recording of an agent conversation: every user turn, every assistant reply, every tool call stored as-is, no summarization.
Search happens at query time via FTS5 BM25 (boolean, phrase, and prefix queries), not at write time. This is the pattern validated by MemPalace's 96.6% LongMemEval score: raw storage beats extraction for retrieval fidelity.
Closes the gap in issue #107 and positions ICM for three use cases it couldn't previously serve:
- Session replay for post-mortem review
- Compliance/audit trails
- Training-data collection
Why this belongs in ICM (not a separate tool)
Scope of this PR
Not in this PR — explicit follow-ups
Splitting keeps each PR reviewable (this one is already ~1.5k lines).
Manual smoke test (reproducible)
```bash
# Start a session and capture its ID (assumes start-session prints only the ID)
SID=$(icm --db /tmp/test.db transcript start-session -a claude-code -p demo)
# Record one user, one assistant, and one tool message
icm --db /tmp/test.db transcript record -s "$SID" -r user -c "Postgres or MySQL?"
icm --db /tmp/test.db transcript record -s "$SID" -r assistant -c "Postgres: native JSONB, BRIN indexes."
icm --db /tmp/test.db transcript record -s "$SID" -r tool -c '{}' -t Bash --tokens 42
# Boolean search, then exact-phrase search
icm --db /tmp/test.db transcript search 'postgres OR mysql'
icm --db /tmp/test.db transcript search '"BRIN indexes"'
# Replay the session and print aggregate stats
icm --db /tmp/test.db transcript show "$SID"
icm --db /tmp/test.db transcript stats
```
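The search commands in the smoke test return hits ranked by BM25. As a rough intuition for that ranking, here is a self-contained Rust sketch of the BM25 formula with the constants FTS5's built-in `bm25()` uses (k1 = 1.2, b = 0.75) and its convention of negating scores so that a smaller value means a better match. This is an illustration only, not the PR's code and not SQLite's implementation: the `tokenize` and `bm25` helpers are hypothetical, the tokenizer is a crude stand-in for FTS5's, query terms are combined with OR semantics, and the clamp of non-positive IDF to a small constant reflects my understanding of FTS5's behaviour.

```rust
// Constants used by FTS5's built-in bm25() ranking function.
const K1: f64 = 1.2;
const B: f64 = 0.75;

// Crude tokenizer (assumption): lowercase alphanumeric runs.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// BM25 score of every document for the given query terms (OR semantics).
/// Scores are negated, so a numerically smaller value is a better match.
/// Expects a non-empty `docs` slice.
fn bm25(docs: &[&str], query: &[&str]) -> Vec<f64> {
    let tokens: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokens.len() as f64;
    let avg_len = tokens.iter().map(|t| t.len()).sum::<usize>() as f64 / n;
    let mut scores = vec![0.0; docs.len()];
    for term in query {
        let term = term.to_lowercase();
        // Number of documents containing the term.
        let df = tokens.iter().filter(|t| t.contains(&term)).count() as f64;
        let mut idf = ((n - df + 0.5) / (df + 0.5)).ln();
        if idf <= 0.0 {
            idf = 1e-6; // very common terms contribute almost nothing
        }
        for (i, doc) in tokens.iter().enumerate() {
            let tf = doc.iter().filter(|t| **t == term).count() as f64;
            let norm = K1 * (1.0 - B + B * doc.len() as f64 / avg_len);
            scores[i] -= idf * tf * (K1 + 1.0) / (tf + norm);
        }
    }
    scores
}

fn main() {
    // The three message bodies recorded by the smoke test.
    let docs = [
        "Postgres or MySQL?",
        "Postgres: native JSONB, BRIN indexes.",
        "{}",
    ];
    for (doc, score) in docs.iter().zip(bm25(&docs, &["postgres", "mysql"])) {
        println!("{:+.6}  {}", score, doc);
    }
}
```

With these three messages, the user turn ranks first for `postgres OR mysql` because it is the only one containing the rarer term `mysql`; `postgres` appears in most messages, so it carries almost no weight.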
Test plan