[Docs Search] Dedup root README/CLAUDE/MEMORY against same-name docs/research/ files (Draft)#4
hang-in wants to merge 43 commits into jaytoone:master
…ition
- tests/golden/bm25_memory_outputs.jsonl: 14 deterministic fixtures (6 categories)
categories: keyword_single(3) korean_paraphrase(2) english_code(2)
avoidance(2) empty_short(3) hooks_keyword(2)
- tests/golden/run_golden.py: fixture runner with --update flag
- docs/refactor/PRODUCTION_REFACTOR_PLAN.md: full refactor plan (Phase 0–9)
Capture env: HOME=/tmp/ctx_golden_home (isolated), CTX_DISABLE_SEMANTIC_RERANK=1,
CTX_CROSS_ENCODER=0, CTX_TELEMETRY=, CTX_DASHBOARD_INTERNAL=1
Corpus: .omc/decision_corpus.json HEAD=201c810 (217 entries)
Determinism: all 14 fixtures verified 2×-run identical; HAS_BM25=False
(rank_bm25 absent on python3.14) — G2-GREP+session-notes+world-model path captured
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds 12 new fixtures (_bm25path suffix) captured via .venv-golden/bin/python (rank-bm25 0.2.2 installed) to cover the HAS_BM25=True execution path. These fixtures expose G1 [RECENT DECISIONS] + G2-DOCS blocks absent in the 14 existing fallback fixtures (HAS_BM25=False).
Changes:
- tests/golden/bm25_memory_outputs.jsonl: 14 → 26 fixtures
- tests/golden/run_golden.py: support optional python_bin field per fixture; relative paths resolved from project root; missing interpreter is hard FAIL (not skip); HOME skeleton created for both /tmp/ctx_golden_home paths; removed "rank_bm25" token from docstring to avoid grep pollution
- .gitignore: .venv-golden/ (pre-existing addition, committed together)
All 26/26 fixtures pass: 14 fallback (system python3) + 12 BM25-path (.venv-golden).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… to G1 corpus
The previous commit itself became a G1 decision corpus entry, shifting BM25 rankings in 8 of 12 _bm25path fixtures (G1 top-7 changed). Re-captured all 12 _bm25path fixtures; all DETERMINISTIC.
26/26 fixtures pass: 14 fallback (system python3) + 12 BM25-path (.venv-golden).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rpus
Problem: G1 BM25 ranking in _bm25path fixtures drifted with each new git commit because bm25-memory.py rebuilds decision_corpus on HEAD change.
Fix:
- tests/golden/bm25_path_corpus_frozen.json: frozen 220-entry corpus (embeddings stripped, no head field); 62KB snapshot at b398ee8
- run_golden.py: inject frozen corpus before each _bm25path fixture run (writes .omc/decision_corpus.json with current HEAD + frozen corpus) so bm25-memory.py treats it as a fresh cache hit → BM25 ranking stable
- Re-captured 8 changed _bm25path fixtures against frozen corpus
26/26 fixtures pass (14 fallback + 12 BM25-path), stable across commits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move tokenize(), expand_query_tokens(), _KO_PARTICLES, _STOPWORDS, _SYNONYM_EXPANSION, and Porter stemmer block to _bm25/tokenizer.py. Orchestrator imports via sys.path.insert + from _bm25.tokenizer. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _AUTO_TUNE/_AUTO_TUNE_ACTIVE loader to _bm25/autotune.py. Orchestrator imports AUTO_TUNE, AUTO_TUNE_ACTIVE with _ aliases for backward compatibility. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _bge_rerank, _vec_embed, _cosine, semantic_rerank_filter, VEC_SOCK, BGE_SOCK, VEC_DISABLED, USE_CROSS_ENCODER to _bm25/rerank.py. _last_retrieval_scores stays in orchestrator (pre-ranker.py). Update 2 golden fixtures reflecting grep rank change from file shrink. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…emory, bm25-memory cache
- pytest infra: pyproject.toml [tool.pytest.ini_options] with testpaths, pythonpath, markers
- tests/unit/conftest.py: tmp_home, tmp_project, isolated_env, run_hook fixtures
- test_settings_patcher.py: 20 tests covering atomic write, backup, idempotency, dry-run, unpatch, corrupted JSON, partial-write safety (settings_patcher.py coverage 93%)
- test_install_cli.py: 28 tests covering _new_hooks_block structure, step_ functions, cmd_install/uninstall/status flows (install.py coverage 73%)
- test_chat_memory_fallback.py: 9 tests covering no vault.db, no vec-daemon socket, invalid stdin, excluded project (subprocess-based)
- test_bm25_memory_cache.py: 7 tests (2 skipped on fresh repo) covering cache path regression, HEAD change invalidation, cache hit, corrupted cache rebuild
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _is_decision, _is_structural_noise, _classify_query_type, get_git_head, build_decision_corpus, embed_corpus_items, get_decision_corpus to _bm25/corpus.py. corpus.py imports vec_embed from .rerank for embed_corpus_items. Update 5 golden fixtures for grep rank changes from file shrink. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move dense_rank_decisions, rrf_merge, bm25_rank_decisions, hybrid_rank_decisions to _bm25/ranker.py with last_retrieval_scores module-level dict. Orchestrator aliases _last_retrieval_scores = _ranker_scores so clear()/read remain backward-compatible. Update 2 golden fixtures. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _extra_doc_files, chunk_document, build_docs_bm25, bm25_search_docs, embed_docs_units, dense_rank_docs, hybrid_search_docs, _KO_EN_DOCS to _bm25/docs_search.py. dense_rank_docs updates ranker.last_retrieval_scores directly. Update 8 golden fixtures for grep rank changes. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _STOP_WORDS, _KO_EN, _CODE_EXT, _SKIP_PREFIXES, extract_keywords, find_db, log_retrieved_nodes, check_and_trigger_reindex, search_graph_for_prompt, search_files_by_grep to _bm25/code_search.py. Update 2 golden fixtures for grep rank changes. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _HOOKS_DIR, _HOOKS_TRIGGER_KWS, _build_hook_doc, search_hooks_files, _has_hooks_keywords to _bm25/hooks_search.py. hooks_search.py imports tokenize from .tokenizer. Update 7 golden fixtures for grep rank changes. Golden 26/26 PASS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ction/output/autotune
- session.py: get_world_model, get_session_decisions, consume_pending_decisions
- injection.py: write_injection_record + _collect_items (P1 utility tracking)
- output.py: build_header_lines + emit_output (header formatting + stdout emit)
- autotune.py: get_g1_top_k / get_g2d_top_k (project-type top_k dispatch)
- bm25-memory.py: 1837→300 lines; all modules ≤400 lines; 26/26 golden PASS
- fixtures: 2 updated for grep rank order change (bm25-memory.py size reduction)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… step
[Critical] pyproject.toml: add ctx_retriever.hooks._bm25 to packages list and ctx_retriever.hooks._bm25 = ["*.py"] to package-data, so wheel contains all 11 _bm25/*.py modules.
[Critical] src/cli/install.py step_copy_hooks(): add recursive copy of _bm25/ dir → ~/.claude/hooks/_bm25/ (idempotent, dirs_exist_ok pattern).
[Major 1] tests/unit/test_bm25_memory_cache.py: inject CLAUDE_PROJECT_DIR into hook_env and cwd= into _run_hook subprocess so hook targets tmp_project instead of real cwd. Convert 2 pytest.skip → assert, achieving 7/7 PASS.
[Major 2] src/hooks/chat-memory.py: guard bare import sqlite_vec with try/except → HAS_SQLITE_VEC flag. query_vault_vector() returns [] when HAS_SQLITE_VEC is False. Emits ⚠ warning to stderr on import failure.
[Major 2] tests/unit/test_chat_memory_fallback.py: strengthen test_chat_memory_no_crash_on_missing_sqlite_vec to require exit 0, ⚠ warning in stderr, and no traceback (was: only checked returncode is not None).
Result: 64 passed 0 skip, golden 26/26.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
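The optional-import guard described under [Major 2] can be sketched roughly as below. This is a minimal illustration rather than the code in chat-memory.py: the warning text, the `query_vault_vector` signature, and the placeholder body are assumptions made for the example.

```python
import sys

# Guarded import: a missing optional dependency flips a flag instead of
# crashing the hook at import time.
try:
    import sqlite_vec  # optional dependency; may be absent
    HAS_SQLITE_VEC = True
except ImportError as exc:
    sqlite_vec = None
    HAS_SQLITE_VEC = False
    # Warning wording is hypothetical; the commit only says a warning is emitted.
    print(f"⚠ sqlite_vec unavailable, vector search disabled: {exc}", file=sys.stderr)


def query_vault_vector(query: str, top_k: int = 5) -> list:
    """Hypothetical signature: return no vector hits when the extension is missing."""
    if not HAS_SQLITE_VEC:
        return []
    # The real vector query is outside the scope of this sketch.
    raise NotImplementedError("vector query omitted from this sketch")
```

With this shape the hook can still exit 0 and fall back to its non-vector paths, which is what the strengthened fallback test checks for.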
adaptive_trigger.py now uses src.hooks._bm25.tokenizer.tokenize() + expand_query_tokens() for corpus build and all query tokenization paths (_tfidf_retrieve, _concept_retrieve, _symbol_retrieve, _implicit_retrieve). Fallback to original regex path when _bm25 package is unavailable.
ranker.py gains score_corpus_bm25(tokenized_corpus, query_tokens), a generic low-level BM25 scorer returning a raw numpy score array, usable by both eval pipeline and production hook without G1-specific MMR/dedup overhead.
Acceptance:
- _HAS_UNIFIED_TOKENIZER = True (import verified)
- scripts/verify_bm25_unified.py → ALL CHECKS PASSED
- pytest tests/unit → 64 passed / 0 skip
- tests/golden/run_golden.py → 15/26 (identical to pre-change baseline)
- doc_retrieval_eval_v2.py → CTX R@3=0.740 (identical pre/post change)
Option A chosen: adaptive_trigger imports _bm25 directly. Rationale: minimal disruption to Wave 1 outputs, no new package needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace `from rank_bm25 import BM25Okapi` in doc_retrieval_eval_v2.py with `score_corpus_bm25` from src/hooks/_bm25/ranker.py — the canonical single BM25 primitive. BM25Okapi direct import now appears only in _bm25/ modules, not in eval scripts. All retrieval metrics identical to baseline (delta=0.0000 across R@3/R@5/NDCG@5/MRR for all three strategies). Update golden fixture for grep order change caused by removal of the rank_bm25 import line. golden: 26/26 PASS pytest: 64 PASS / 0 skip Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace direct BM25Okapi instantiation in bm25_retriever.py with score_corpus_bm25() from src/hooks/_bm25/ranker. Local _tokenize() retained for identifier-focused code vocabulary; adds None guard for score_corpus_bm25 return (rank_bm25 unavailable case). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace direct BM25Okapi import/instantiation in evaluate_bm25() with score_corpus_bm25() from src/hooks/_bm25/ranker. Whitespace-split tokenization preserved (intentional COIR code-search vocabulary choice). Adds None fallback for score_corpus_bm25 result. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds full telemetry instrumentation to bm25-memory.py orchestrator. Emits hook_complete (summary), prompt_received, g1_done, g2_docs_done, g2_code_done, g2_hooks_done events; captures fallback_reasons (vec_daemon_down, bge_daemon_down, mcp_db_stale, mcp_db_missing). _ctx_telemetry.py extended with 7 new event-type allowed-key entries. _log_event() wrapper now auto-injects hook= field. 6 new unit tests in test_bm25_memory_telemetry.py (70 total, 0 fail). Golden 26/26 PASS maintained. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…one#2/jaytoone#4 + Minor#1 + golden optB
Critical (install.py):
- step_copy_hooks: hash-compare → update if changed, backup before overwrite
- --force-hooks: skip hash check, always overwrite
- --no-update-hooks: legacy skip-existing behaviour
- returns (copied, updated, skipped, errors) 4-tuple
Major jaytoone#1 (bm25-memory.py):
- _TELEMETRY_ENABLED cached at module load (os.environ + Path.exists once)
- _log_event_impl lazy-imported on first enabled call
- disabled path: single bool check, zero I/O overhead
Major jaytoone#2 (scripts/verify_bm25_unified.py):
- self-contained sys.path insert → runs without PYTHONPATH=.
Major jaytoone#4 (code_search.py):
- search_files_by_grep sort key: (-count, path) for deterministic ties
Minor jaytoone#1 (settings_patcher.py):
- _save_atomic uses backup_made flag; new file → '' (not path)
golden option B (run_golden.py):
- _normalize_g2grep: parses JSON, normalizes file list in additionalContext
- fixtures: 25/26 → 26/26 PASS
- new test: tests/unit/test_code_search_sort.py (7 cases)
- updated tests/unit/test_install_cli.py (+4 tuple tests)
- updated tests/unit/test_settings_patcher.py (+2 _save_atomic cases)
- pytest: 70 → 82 PASS / 0 skip
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r commit)
tests/unit/test_code_search_sort.py was created during the Phase 9 follow-up patch (commit 86d0df7) but never staged. This commit adds it cleanly so the deterministic-sort regression guard is part of the tree. Also adds .coverage to .gitignore (ephemeral pytest-cov artifact).
- LICENSE: MIT preserved, original jaytoone/CTX copyright cited alongside
the tunaCtx fork copyright.
- README: trimmed to factual content per project intent.
- Top notice clearly marks this as a production-level refactor/augmentation
of jaytoone/CTX. Retrieval algorithm is upstream's; this fork only touches
Claude Code hook implementation safety.
- Removed paper section, removed marketing benchmark numbers, removed
PyPI/HuggingFace badges that referred to the upstream package.
- Kept: usage (where/how), install flow, control tags, opt-in telemetry,
what changed in this fork, test results (golden 26/26, pytest 82/0),
known follow-ups, accurate directory structure.
run_fixture() now returns (stdout, stderr, exit_code). Comparison logic checks expected_stderr only when the fixture has the field set — absent field = skip (backward-compat). --update also persists expected_stderr when already present. Existing 26 fixtures carry no expected_stderr field → no new failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
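The backward-compatible comparison rule can be sketched as follows. This is only an illustration of the "absent field = skip" logic: the helper name `fixture_matches` and the `expected_stdout` key are assumptions, and only `expected_stderr` is named in the commit.

```python
def fixture_matches(fixture: dict, stdout: str, stderr: str) -> bool:
    """Compare captured output against a fixture record.

    Hypothetical helper: stdout is always compared, stderr only when the
    fixture explicitly carries an expected_stderr field.
    """
    if stdout != fixture["expected_stdout"]:
        return False
    # An absent field means "do not check", so older fixtures keep passing.
    if "expected_stderr" in fixture and stderr != fixture["expected_stderr"]:
        return False
    return True
```

Checking for key presence rather than truthiness matters here: a fixture with `expected_stderr` set to an empty string still asserts that stderr is empty.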
Three new cases in test_settings_patcher.py:
- test_atomic_write_real_filesystem_rename: real disk write + backup check
- test_atomic_write_no_tmp_residual_on_new_file: no .tmp_ctx leftover
- test_atomic_write_backup_name_contains_timestamp: YYYYMMDD_HHMMSS pattern
All three run against real tmp_path (no mocks) to validate actual rename semantics, not just the os.replace call path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
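The behaviour these three cases exercise (write to a temporary file, back up any existing file under a timestamped name, then rename into place) can be sketched like this. It is not the `_save_atomic` implementation from settings_patcher.py: the function name `save_atomic`, the `.bak_` backup naming, and the JSON payload are assumptions; only the `.tmp_ctx` suffix, the `YYYYMMDD_HHMMSS` pattern, `os.replace`, and the empty-string return for a new file come from the commit notes.

```python
import json
import os
import shutil
import time
from pathlib import Path


def save_atomic(path: Path, data: dict) -> str:
    """Write JSON atomically; return the backup path, or '' for a new file."""
    backup = ""
    if path.exists():
        # Timestamped copy of the previous content (naming scheme is assumed).
        stamp = time.strftime("%Y%m%d_%H%M%S")
        backup_path = path.with_name(f"{path.name}.bak_{stamp}")
        shutil.copy2(path, backup_path)
        backup = str(backup_path)

    # Write the new content next to the target, then swap it in with one rename.
    tmp = path.with_name(path.name + ".tmp_ctx")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # replaces the target in one step; no tmp file remains
    return backup
```

Keeping the temporary file in the same directory as the target keeps the rename on one filesystem, which is what `os.replace` needs to swap the file in a single step.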
…ests
__init__.py now re-exports all public functions across 8 submodules so callers can use 'from _bm25 import tokenize, score_corpus_bm25' etc. Module-level state (AUTO_TUNE, AUTO_TUNE_ACTIVE, last_retrieval_scores) is intentionally excluded; access it via the submodule path.
Circular import check: all submodules use named 'from .x import y' imports; no 'from . import x' pattern found. No new side effects introduced; autotune.py file-read already runs when orchestrator loads.
test_bm25_init_reexport.py (10 cases):
- all __all__ names importable + callable
- no circular import on cold load
- module-level state not re-exported
- submodule imports still work
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously --uninstall only removed settings.json registrations. Now it also removes hook files and _bm25/ with safety guards:
- Hash comparison against package source (SHA-256). User-modified files → kept with warning; re-run with --force to override.
- _bm25/ removed only when all *.py files match source and no extras present. Extra user files → keep whole directory; --force overrides.
- --force flag added: bypass all hash checks, remove unconditionally.
- dry_run respected: all checks run, nothing deleted.
- Status output classifies each file as removed / kept / not_found.
test_uninstall_cleanup.py (10 cases):
- clean install removes matching files and _bm25/
- user-modified file kept without --force
- --force removes modified files
- dry_run does not delete
- not_found reported cleanly
- _bm25/ with extra files kept; --force removes
- cmd_uninstall integration: cleanup called, force flag forwarded
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
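The per-file guard described above can be sketched as follows. This is a minimal illustration and not the code in install.py: the helper names are hypothetical, and reporting "removed" during a dry run (the would-be outcome) is an assumption about how the status is surfaced.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def remove_if_unmodified(installed: Path, source: Path,
                         force: bool = False, dry_run: bool = False) -> str:
    """Classify one installed hook file as removed / kept / not_found."""
    if not installed.exists():
        return "not_found"
    # A hash mismatch means the user edited the file; keep it unless forced.
    if not force and sha256_of(installed) != sha256_of(source):
        return "kept"
    if not dry_run:
        installed.unlink()
    return "removed"
```

The same comparison could be applied to every `*.py` file in `_bm25/` before deciding whether the directory as a whole is safe to delete.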
New commits added during cycle-2 (golden runner stderr guard, atomic write test strengthening, _bm25 re-export, uninstall cleanup) entered the G1 decision corpus, shifting BM25 top-7 rankings in 6 BM25-path fixtures. No production behavior change — only corpus drift from natural git history evolution. Production code paths verified deterministic (same input → same output) via run_golden re-run. golden: 20/26 → 26/26 PASS
The original PRODUCTION_REFACTOR_PLAN.md listed `~/.claude/ctx-retrieval-events.jsonl` as the telemetry output path, but the actual implementation in `_ctx_telemetry.py:33` writes to `~/.claude/ctx-telemetry.jsonl`. Code and README are the source of truth; adding an inline footnote to the plan to prevent confusion in future cycles.
Comprehensive handoff document covering:
- Fork identity (what was/wasn't done; retrieval algo unchanged)
- Full work history (Phase 0 → Cycle-2, 18+ commits)
- Current code state + intentional residuals (BM25Okapi sites, archival benchmarks)
- ctx-install applied state (~/.claude paths, current limitations)
- BM25/semantic-layer activation (option B venv vs option C pipx)
- Verification commands for next session sanity check
- Known traps (golden git-history dependence, telemetry gate, cross-package imports)
- Upstream issue reference (jaytoone#1)
- "What not to do" guardrails for the next session
Goal: zero context loss when this conversation ends and a new session picks up.
…rement
Measured: 5 prompts × 4 states (CTX+CM/CM-only/CTX-only/baseline) on seCall + tunaFlow + tunaCtx repos via `claude -p --model opus` headless. Total: 20 measurements, $8.01 cost, Gemini-as-judge for ranking.
Key patterns:
- Synergy in code-search + Korean docstring scenarios (CTX+CM=1st)
- Sandbox permission conflict in tool-heavy scenarios (CTX+CM=4th, baseline=1st)
- CTX-only beats all combinations on commit-evolution analysis
- CTX cost-effective: $1.23 (CTX-only) vs $2.30 (both) for similar quality
Files:
- EVAL_RESULTS.md: full data + 4-state matrix + judge rankings + recommendations
- UPSTREAM_ISSUE_jaytoone.md: pre-drafted issue for jaytoone/CTX (Korean tokenization observations + fork-specific changes available as PRs if desired)
- UPSTREAM_ISSUE_mksglu.md: pre-drafted issue for mksglu/context-mode (headless permission denial pattern + tool-light vs tool-heavy heuristic suggestion)
Raw data in /tmp/eval-results/ (not committed).
…rtifact
Initial measurement showed CTX+CM (state A) ranking 4th in scenarios 2 and 5, attributed to "Context Mode sandbox conflict". Re-measured the same 8 cells with `claude -p --dangerously-skip-permissions` to isolate the permission layer:
- Scenario 2 A: "Permission needed. Asking the user..." (abort) → with skip-perm: full 30-commit analysis with feat/fix/Merge breakdown
- Scenario 5 A: "ctx_batch_execute 권한 거부됨" (Korean for "ctx_batch_execute permission denied"; partial fallback) → with skip-perm: precise .py TODO scan with .venv-golden noise filtered
Cost rises 13–21% with skip-perm: Context Mode's batch tool actually executes instead of being denied. Quality regression in default measurement was an artifact of headless `claude -p` not being able to surface permission prompts, not a defect in Context Mode.
Updates:
- EVAL_RESULTS.md: §시너지/충돌 (Synergy/Conflict) → "headless permission artifact" with proof. Recommendation now distinguishes interactive (always-on safe) from headless (skip-perm or off).
- UPSTREAM_ISSUE_mksglu.md: Pattern 1 strengthened with 8-measurement A/B data.
- Total measurement count: 20 → 28, total cost: $8.01 → $10.58.
- README: short summary block with key findings + links to full report and blog post.
- docs/community/BLOG_POST_eval_ko.md: Korean blog post draft covering this fork's context, the 5 scenarios × 4 states measurement, the skip-perm verification, limitations, and a three-line summary. ~2K words, information-focused with no marketing tone. Format-portable markdown that can be copy-pasted to Velog, Tistory, dev.to, or a company blog. Whether to publish it is left to the user.
Add explicit '검색 stack' ("search stack") bullet to the '어디에 어떻게 쓰는가' ("where and how it is used") section, listing the three layers (G1 time axis, G2 BM25 + cross-encoder rerank, chat-memory FTS5 + vec0 dense hybrid) to address community misperception that the project is BM25-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The R@5=0.152 figure cited as a "weakness" across several docs is the pre-fix baseline from 20260326-ctx-methodology-comparison.md. Subsequent generalization fixes and the iter11 re-measurement (Mean R@5=0.595, per benchmarks/results/reeval_external_iter11.json) supersede it.
- CLAUDE.md L91, L197: weakness/future-work wordings updated
- docs/refactor/PRODUCTION_REFACTOR_PLAN.md L263: footnote added
- README.md: external codebase measurement reference + link to upstream issue jaytoone#2 flagging the same inconsistency upstream
No retrieval algorithm change. Docs only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Header: last commit ca0c4b6, branch state, work dir corrected to /Users/d9ng/privateProject/tunaCtx (clone, not GitHub fork)
- §2 history: Cycle-3 row added (README stack bullet, R@5 stale refresh, upstream issue jaytoone#2)
- §4 constraints: pre-Cycle-3 'BM25 fallback / daemons down' state was resolved; pipx option C is now the deployed mode (vec/bge daemons running, hook commands using pipx python)
- §5 verification: golden expectation lowered to 15/26 with §6-1 pointer for fallback drift; commands switched to .venv-golden python
- §6-6 added: external R@5 multi-measurement landscape (0.152 / 0.495 / 0.595 / 0.744) with guidance to wait for upstream jaytoone#2 response before treating any single value as canonical
- §7 upstream: issue jaytoone#2 added; PR split guidance updated
- §8 next-session: directory path + commit hash + dual issue check
- §10 environment: pipx venv + daemon PIDs noted
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aytoone#1)
vec-daemon / bge-daemon and the three client hooks (chat-memory, utility-rate, _bm25/rerank) can now run on Windows where MSVC-built CPython lacks socket.AF_UNIX. POSIX behavior unchanged.
- AF_UNIX path stays gated by hasattr(socket, "AF_UNIX")
- TCP loopback fallback bound to 127.0.0.1 with CTX_VEC_PORT (29501) / CTX_BGE_PORT (29502) overrides
- SO_REUSEADDR gated to non-Windows (Windows semantics allow port hijacking; gemini-code-assist review)
- socket import hoisted to module top-level, removing _sock_mod / _sk workarounds (gemini-code-assist review)
Co-Authored-By: gemini-code-assist <noreply@google.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
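The transport selection described in this commit can be sketched as below. This is a rough illustration, not the daemon code: the function name `make_listener` and its parameters are hypothetical; the `hasattr(socket, "AF_UNIX")` gate, the 127.0.0.1 bind, the `CTX_VEC_PORT` override with default 29501, and the non-Windows `SO_REUSEADDR` gate come from the commit message.

```python
import os
import socket
import sys


def make_listener(sock_path: str, port_env: str = "CTX_VEC_PORT",
                  default_port: int = 29501) -> socket.socket:
    """Return a listening socket: AF_UNIX where available, TCP loopback otherwise."""
    if hasattr(socket, "AF_UNIX"):
        # POSIX path: unchanged behaviour, bind to a filesystem socket.
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if os.path.exists(sock_path):
            os.unlink(sock_path)  # clear a stale socket file from a previous run
        srv.bind(sock_path)
    else:
        # Fallback path: loopback-only TCP, port overridable via environment.
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        if sys.platform != "win32":
            # Skipped on Windows, where SO_REUSEADDR would let another
            # process bind the same port.
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", int(os.environ.get(port_env, default_port))))
    srv.listen()
    return srv
```

A client would mirror the same `hasattr` check to decide whether to connect to the socket path or to the loopback port.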
…dination)
- Header: last commit 29f241c, Cycle-3.5 marker, Fork PR row added, upstream issue jaytoone#2 marked CLOSED, jaytoone#1 reply state noted
- §2 history: Cycle-3.5 row added (PR merge + upstream issue replies)
- §6-6 R@5 narrative: 0.595 confirmed canonical by jaytoone, 0.744 marked superseded; the "단정 금지" ("do not assert a single value") guidance is lifted
- §6-7 added: README.md is excluded from upstream PR scope (fork and upstream have diverged on README persona; user decision)
- §7 upstream: 5-stage PR split plan documented + subtoken splitter flagged as separate cycle candidate (not in fork yet either)
- §8 next-session: commit hash + simplified issue-watch (only jaytoone#1 still awaiting response)
- §9 intentional-not-done: README inclusion in upstream PR added
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….tokenize
Goal-1 prep for upstream PR — make `_bm25.tokenizer.tokenize` the single
canonical entry point as already documented in `_bm25/__init__.py`
("eval and production share a single canonical tokenizer/scorer (Task C)").
Converted (each verified against original on baseline corpus):
- benchmarks/eval/g1_docs_bm25_eval.py — 1/8 sample diff (Porter stem add)
- benchmarks/eval/g1_longterm_baseline_eval.py — 3/8 diff (decimal preservation;
baseline numbers may shift)
- benchmarks/eval/g2_docs_paraphrase_eval.py — 0/8 diff (KO particle parity)
Out-of-scope (intentional divergence — reason annotated in source):
- src/cli/telemetry.py — identifier-frequency stats, not BM25 ranking
- src/retrieval/bm25_retriever.py — code-search needs raw TF (canonical's
dict.fromkeys() dedup flattens TF scoring)
Adds tests/regression/test_pr1_tokenizer_baseline.py to document delta and
guard against future regressions.
Validation: golden 26/26 PASS (production hook unaffected — eval-only changes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ch/ same-name
build_docs_bm25 indexed docs/research/*.md AND root extras (CLAUDE.md,
README.md, MEMORY.md) without dedup. When docs/research/README.md exists
(placeholder, ~843B) alongside root README.md (canonical fork persona,
~10KB), both are indexed under the same `name` ("README.md"). The
bm25_search_docs path that returns bm_filtered[:top_k] without rerank
(line 144) had no name dedup, so both copies could appear in the
G2-DOCS output block with identical first-line previews.
Fix: switch to a name-keyed dict during corpus build; root extras win
on collision (root README is canonical fork metadata; docs/research/
counterparts are placeholders).
Golden: 3 fixtures re-captured to reflect both this dedup and the
incidental G2-GREP shift (the new docstring contains "README", which
the user-prompt-driven grep now matches in docs_search.py itself):
- avoidance_fix_typo
- avoidance_fix_typo_bm25path
- korean_paraphrase_decision_mem_bm25path
Result: 24/26 → 26/26 PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
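The name-keyed build described in this commit can be sketched roughly as follows. It is an illustration of the root-wins-on-collision idea, not the code in docs_search.py: the helper name `collect_doc_units` and the `(name, text)` unit shape are assumptions made for the example.

```python
from pathlib import Path

ROOT_EXTRAS = ("CLAUDE.md", "README.md", "MEMORY.md")


def collect_doc_units(project_root: Path) -> list[tuple[str, str]]:
    """Return (name, text) pairs with at most one entry per file name.

    Hypothetical helper: the real build_docs_bm25 may shape its units differently.
    """
    by_name: dict[str, str] = {}

    # 1) docs/research/*.md first, sorted so iteration order is stable.
    research_dir = project_root / "docs" / "research"
    if research_dir.is_dir():
        for path in sorted(research_dir.glob("*.md")):
            by_name[path.name] = path.read_text(encoding="utf-8", errors="replace")

    # 2) Root extras second: assigning to an existing key overwrites the
    #    docs/research/ copy, so the root file wins on collision.
    for name in ROOT_EXTRAS:
        path = project_root / name
        if path.is_file():
            by_name[name] = path.read_text(encoding="utf-8", errors="replace")

    return list(by_name.items())
```

Because the dedup happens while the corpus is built, every downstream path sees one `README.md`, including the branch that returns `bm_filtered[:top_k]` without rerank.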
ranker.py had 3 sort sites that relied on Python's stable-sort guarantee
to keep equal-key items in input order. Stable sort is currently
guaranteed in CPython, but the upstream maintainer flagged this as a
"subtle non-determinism bug" worth addressing — the equal-key paths
were brittle to:
- input ordering changes (corpus iteration order, dict insertion)
- alternative interpreters (PyPy, future CPython changes)
- numpy float comparisons at epsilon boundaries
Sites fixed (matches existing pattern in code_search.py:233):
L52 dense_rank_decisions:
scored.sort(key=lambda x: -x[0])
→ scored.sort(key=lambda x: (-x[0], x[1].get("hash") or
(x[1].get("text") or "")[:20]))
L84 rrf_merge:
sorted(scores.keys(), key=lambda h: -scores[h])
→ sorted(scores.keys(), key=lambda h: (-scores[h], h))
L160 bm25_rank_decisions:
sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
→ sorted(range(len(corpus)), key=lambda i: (-scores[i], i))
Adds tests/regression/test_pr3_deterministic_sort.py with 5 cases:
- rrf_merge idempotent (same input → same output)
- rrf_merge equal-rank tiebreak independent of list_a/list_b order
- rrf_merge equal-score tiebreak by hash ascending
- dense_rank_decisions no-emb sanity
- bm25_rank_decisions index tiebreak
Validation: regression 5/5 PASS, golden 26/26 PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
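The tie-break pattern applied at these sort sites can be sketched in isolation. The two functions below are simplified stand-ins, not the ranker.py implementations: the names, the two-list signature, and the reciprocal-rank constant `k=60` are assumptions; only the sort keys `(-scores[h], h)` and `(-scores[i], i)` are taken from the commit.

```python
def rrf_merge_sketch(list_a: list[str], list_b: list[str], k: int = 60) -> list[str]:
    """Merge two rankings of hashes by reciprocal-rank fusion (simplified)."""
    scores: dict[str, float] = {}
    for ranking in (list_a, list_b):
        for rank, h in enumerate(ranking, start=1):
            scores[h] = scores.get(h, 0.0) + 1.0 / (k + rank)
    # Secondary key h: equal fused scores are ordered by hash, not by input order.
    return sorted(scores, key=lambda h: (-scores[h], h))


def rank_indices_sketch(scores: list[float]) -> list[int]:
    """Order corpus indices by descending score, lowest index first on ties."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))
```

Here `rrf_merge_sketch(["b", "a"], ["a", "b"])` and `rrf_merge_sketch(["a", "b"], ["b", "a"])` produce the same order, because both hashes get the same fused score and the explicit secondary key decides.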
Two reference docs for the upstream coordination cycle:
1. upstream-sync-2026-05-08.md: trial merge inventory
   - Cataloged 11 new commits on upstream/master since fork base
   - Found upstream commit 08e262b (Korean tokenizer eval fix) explicitly references hang-in/tunaCtx tokenizer.py, a partial pre-adoption of PR-1 motivation
   - Trial merge in isolated worktree produced 16 conflict files; b799aae (giant batch commit) drives ~80% of the conflict surface
   - Conclusion: ship upstream PRs as new commits branched from upstream/master, not as merges from fork master
2. upstream-issue-1-reply-draft.md: reply draft for issue jaytoone#1 comment 2 (jaytoone 2026-05-07)
   - Reorders 5-stage PR plan to 4 stages aligned with jaytoone's priorities (P0 tokenizer / P1 tests / P2 deterministic sort / PR-4 decomposition pending boundary review)
   - Drops sqlite_vec PR (already in 0.3.14)
   - Module boundary table for the 11-module decomposition
   - Co-maintain acceptance with proposed area-of-ownership split
Neither doc is the final issue comment; both are working drafts to be revised based on the audit findings now committed in dd27565, 4997fc3, 83b82cb.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e wrap)
Comment posted: jaytoone#1 (comment)
Body covers:
- 3 Draft PRs opened on jaytoone/CTX (jaytoone#3/jaytoone#4/jaytoone#5) mapped to jaytoone's P0/P1/P2 priorities
- sqlite_vec dropped from plan (already in 0.3.14 ba7df3d)
- Four audit findings:
  1. 08e262b already covers part of PR-1 (doc_retrieval_eval_v2.py); this PR covers remaining 3 sites + 2 intentionally-divergent annotated
  2. Test count corrected 82 -> 80 unit + 26 golden (audit re-classification: 23 PR-4-dependent, 66 fork-only)
  3. PR-2 carries an unrelated production-hook bug fix (build_docs_bm25 README/CLAUDE/MEMORY name-collision dedup) discovered during audit
  4. PR-3 ships 5 regression cases (idempotent / equal-rank / equal-score / no-emb / index tiebreak)
- Co-maintain accepted, area-of-ownership split proposed (hook hardening on us, algorithm/paper/benchmark on jaytoone)
- Order of operations: jaytoone boundary review -> either cherry-pick onto merged decomposition or re-author into upstream monolith
Supersedes the earlier draft at upstream-issue-1-reply-draft.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Good find @hang-in. The duplicate The name-keyed dict with root-wins-on-collision is the right fix: root Plan: Same as PR-3/PR-5 — will port the 24/26 → 26/26 after the fixture re-capture is a clean validation. Will include in the next patch. |
Sounds good — porting the dedup directly into Two small notes for when you do the port:
Thanks for the credit plan; happy to close this Draft once your port lands.
Summary
Found during the audit cycle for the test-suite extraction (#1, P1):
`build_docs_bm25` indexes `docs/research/*.md` and root extras (`CLAUDE.md`, `README.md`, `MEMORY.md`) without name-collision dedup. When `docs/research/README.md` exists alongside root `README.md`, both are indexed under the same `name`, and the `bm25_search_docs` path that returns `bm_filtered[:top_k]` without rerank can emit both.
In our golden suite this manifested as a duplicate `> README.md` line in the G2-DOCS output block: a real production-path issue, not just a fixture artifact.
Fix
Switch corpus build to a name-keyed dict; root extras win on collision (root `README` is canonical fork metadata; `docs/research/` counterparts are typically placeholders).
Focus commit
`4997fc3`: `fix(docs_search): dedup root README/CLAUDE/MEMORY against docs/research/ same-name`
Files in scope of this PR
- `src/hooks/_bm25/docs_search.py`
- `build_docs_bm25` inside `bm25-memory.py`
- `tests/golden/bm25_memory_outputs.jsonl`
Why Draft + relation to PR-2 from #1 conversation
The PR-2 promise from the #1 thread was the test suite (80 unit + 26 golden; the `82` reported earlier was off by 2 after audit re-classification). This PR includes the bug fix discovered during that audit. The full test-suite extraction is gated on the `_bm25/` package layout decision (same Draft caveat as PR-1).
Validation
Related: #1