|
1 | 1 | # Changelog |
2 | 2 |
|
| 3 | +## v0.11.6 — Tool-description tightening (+5% routing P@1) + OpenRouter backend |
| 4 | + |
| 5 | +First run of the routing-recall benchmark landed v0.11.4 at **P@1 = 18/20 = 90.0%** |
| 6 | +(`anthropic/claude-sonnet-4.5` via OpenRouter). The two misses were both semantic |
| 7 | +overlaps between adjacent tools. This release tightens 4 tool descriptions and |
| 8 | +re-runs the bench: **P@1 = 19/20 = 95.0%**, a net +5.0 points with one miss |
| 9 | +remaining (borderline — "show me the EmbeddingModel struct" routes to `ast_search` |
| 10 | +with `type=struct`, which returns the right answer albeit via the "enumerate" |
| 11 | +tool rather than the "inspect ONE" tool). |
| 12 | + |
| 13 | +### Tool-description changes (`src/mcp/tools.rs`) |
| 14 | + |
| 15 | +All stay under the 200-char registry limit. |
| 16 | + |
| 17 | +- **`get_call_graph`** — leads with `"Who calls X, what X calls"` + `"Returns a |
| 18 | + graph (not a flat list)"`. Fixed routing for "Who calls ensure_indexed?" |
| 19 | + (was → `find_references`, now → `get_call_graph`). |
| 20 | +- **`find_references`** — leads with `"Flat enumeration of all usage sites"` + |
| 21 | + explicit deflection: `"For 'who calls X?', use get_call_graph."`. |
| 22 | +- **`get_ast_node`** — leads with `"Inspect ONE named symbol"` + `"you have a |
| 23 | + symbol name (or node_id) and want its definition/body"` to claim the |
| 24 | + "show me X / signature of Y" intent. |
| 25 | +- **`ast_search`** — leads with `"Enumerate MULTIPLE symbols by structural |
| 26 | + criteria"` + deflection: `"For ONE known symbol, use get_ast_node."`. |
| 27 | + |
| 28 | +Pattern: each description now leads with a shape verb (`who calls`, `flat |
| 29 | +enumeration`, `inspect ONE`, `enumerate MULTIPLE`) and points at the |
| 30 | +adjacent tool when a query drifts into overlap. |
| 31 | + |
| 32 | +### Routing-bench OpenRouter backend (`tests/routing_bench.rs`) |
| 33 | + |
| 34 | +Auto-detects `ANTHROPIC_API_KEY` (native Messages API) or `OPENROUTER_API_KEY` |
| 35 | +(OpenAI-compatible `/chat/completions`). Tool schemas re-packaged as |
| 36 | +`{type: "function", function: {...}}` for the OpenRouter path. Model default |
| 37 | +`anthropic/claude-sonnet-4.5`; override with `ROUTING_BENCH_MODEL`. Anthropic |
| 38 | +wins if both keys present. |
| 39 | + |
| 40 | +### Baseline measurement (published) |
| 41 | + |
| 42 | +| Run | Backend / Model | P@1 | |
| 43 | +|-----|-----------------|-----| |
| 44 | +| v0.11.4 baseline | openrouter / anthropic/claude-sonnet-4.5 | 18/20 (90.0%) | |
| 45 | +| v0.11.6 post-tightening | openrouter / anthropic/claude-sonnet-4.5 | 19/20 (95.0%) | |
| 46 | + |
| 47 | +Cost ≈ $0.10/run. Threshold stays at 0.70; consider raising to 0.85 after two |
| 48 | +more releases confirm 95% as stable baseline (20-query sample is within model |
| 49 | +stochasticity range). |
| 50 | + |
3 | 51 | ## v0.11.5 — Hotfix: clippy 1.95 parity (`unnecessary_sort_by`) |
4 | 52 |
|
5 | 53 | `-D warnings` on stable clippy 1.95 flagged the two `sort_by(|a, b| b.0.cmp(&a.0))` |
|
0 commit comments