Closed
43 commits
b24fb26
chore(refactor): capture golden fixtures for bm25-memory pre-decompos…
May 4, 2026
b36fc8d
chore(refactor): add BM25-path golden fixtures for bm25-memory
May 4, 2026
b398ee8
chore(refactor): refresh BM25-path fixtures after golden commit added…
May 4, 2026
fdbbf47
chore(refactor): stabilize BM25-path fixtures with frozen decision co…
May 4, 2026
92b61b4
refactor(bm25): extract tokenizer from bm25-memory.py
May 4, 2026
91789b7
refactor(bm25): extract autotune from bm25-memory.py
May 4, 2026
41d219c
refactor(bm25): extract rerank from bm25-memory.py
May 4, 2026
ad29af6
test(hooks): add unit tests for settings_patcher, install_cli, chat-m…
May 4, 2026
fe7f423
refactor(bm25): extract corpus from bm25-memory.py
May 4, 2026
b5f9da2
refactor(bm25): extract ranker from bm25-memory.py
May 4, 2026
1e0765c
refactor(bm25): extract docs_search from bm25-memory.py
May 4, 2026
edba882
refactor(bm25): extract code_search from bm25-memory.py
May 4, 2026
593442e
refactor(bm25): extract hooks_search from bm25-memory.py
May 4, 2026
d3c57c7
refactor(bm25): slim orchestrator to 300 lines — extract session/inje…
May 4, 2026
1401065
fix(packaging): include _bm25 sub-package in wheel + ctx-install copy…
May 4, 2026
d86fce0
refactor(bm25): unify tokenizer between eval and production
May 4, 2026
170ff25
refactor(bm25): unify BM25 in doc_retrieval_eval_v2 with _bm25 core
May 4, 2026
c31671c
refactor(bm25): unify bm25_retriever with _bm25 core
May 4, 2026
95006a2
refactor(bm25): unify coir_evaluator evaluate_bm25 with _bm25 core
May 4, 2026
330ada4
feat(telemetry): instrument bm25-memory hook with retrieval events
May 4, 2026
86d0df7
fix(phase9): apply 6 codex findings — Critical/Major/#1/#2/#4 + Minor…
May 4, 2026
75c84e0
test(code_search): add deterministic sort test (was missing from prio…
May 4, 2026
8ee52b7
docs: rewrite README + LICENSE for tunaCtx fork
May 4, 2026
8252f2f
test(golden): add optional stderr guard to golden runner
May 4, 2026
a14935f
test(install): strengthen atomic write tests with real filesystem rename
May 4, 2026
bfd8b01
refactor(_bm25): add package-level public re-exports + verification t…
May 4, 2026
550f24f
feat(install): strengthen --uninstall to clean up hook files and _bm25/
May 4, 2026
94bef6c
chore(golden): refresh BM25-path fixtures for cycle-2 commits
May 4, 2026
2a68213
docs(refactor): annotate telemetry jsonl path discrepancy in plan
May 4, 2026
ab499c5
docs(refactor): add HANDOFF.md for next-session continuity
May 4, 2026
136b95d
docs(eval): CTX × Context Mode 5 scenarios × 4 states empirical measu…
May 4, 2026
6fb60b3
docs(eval): clarify scenarios 2/5 'conflict' is headless permission a…
May 4, 2026
2cd0860
docs(eval): add 'Empirical eval' section to README + Korean blog post
May 4, 2026
dfc9ac9
chore(golden): refresh fixture after docs/ corpus drift (cycle-2 docs…
May 4, 2026
c901ebf
docs(readme): clarify retrieval stack is multi-layer, not BM25-only
May 5, 2026
ca0c4b6
docs: refresh stale R@5=0.152 references with iter11 measurement (0.595)
May 5, 2026
b27eb70
docs(handoff): wrap up Cycle-3 (docs hygiene) — refresh stale state
May 5, 2026
29f241c
feat(hooks): Windows TCP loopback fallback for AF_UNIX-less CPython (#1)
hang-in May 5, 2026
8a9035f
docs(handoff): wrap up Cycle-3.5 (PR #1 merge + upstream coordination)
May 5, 2026
dd27565
refactor(bm25): unify 3 eval-pipeline tokenizers with canonical _bm25…
May 7, 2026
4997fc3
fix(docs_search): dedup root README/CLAUDE/MEMORY against docs/resear…
May 7, 2026
83b82cb
fix(ranker): explicit tiebreak on 3 sort sites for deterministic output
May 7, 2026
425d650
docs(upstream): trial merge inventory + issue#1 reply draft
May 7, 2026
2 changes: 2 additions & 0 deletions .gitignore
@@ -51,3 +51,5 @@ plugin/scripts/
benchmarks/datasets/longmemeval/longmemeval_s
benchmarks/datasets/longmemeval/longmemeval_oracle
benchmarks/datasets/longmemeval/.cache/
.venv-golden/
.coverage
10 changes: 5 additions & 5 deletions CLAUDE.md
@@ -87,10 +87,10 @@ CTX = **Claude Code's automatic context injection system**.
- `docs/research/CTX_NEMOTRON_COMPARISON_REPORT.docx`

### Phase 3: CTX weakness analysis + alternatives survey (expert-research)
**CTX's 3 major weaknesses**:
1. External-codebase R@5=0.152 (heuristic overfitting)
2. keyword-query R@3=0.379 < BM25=0.667
3. No cross-file reasoning (multi-hop)
**CTX's 3 major weaknesses** (as diagnosed 2026-03-27):
1. ~~External-codebase R@5=0.152 (heuristic overfitting)~~ — **Updated: iter11 re-measurement (`benchmarks/results/reeval_external_iter11.json`) yields Mean R@5=0.595** (Flask 0.6462 / FastAPI 0.3870 / Requests 0.7526). 0.152 is the stale pre-fix baseline.
2. keyword-query R@3=0.379 < BM25=0.667 — **0.724 achieved** in Phase 5 (resolved)
3. No cross-file reasoning (multi-hop) — remaining weakness

**Immediately actionable improvement**: replace TF-IDF with BM25 (highest ROI)
- Results doc: `FromScratch/docs/research/20260327-ctx-alternatives-research.md`
@@ -194,7 +194,7 @@ def rank_ctx_doc(query, docs, bm25_index=None):
3. **G2 real-codebase Δ+0.200 improvement**: add an instruction-parsing → CTX-query conversion layer

### Mid-term (1-2 weeks)
3. **Improve external-codebase R@5=0.152**: AST-parser-based symbol extraction (drop heuristics)
3. **Further improve external-codebase R@5**: current Mean R@5=0.595 (iter11); FastAPI at 0.387 is the weakest spot. Consider AST-parser-based symbol extraction (dropping heuristics)
   - Improve `_index_symbols()` in `src/retrieval/adaptive_trigger.py`
4. **Cross-file reasoning**: widen import-graph BFS (currently limited to 2 hops)

3 changes: 2 additions & 1 deletion LICENSE
@@ -1,6 +1,7 @@
MIT License

Copyright (c) 2026 jaytoone
Copyright (c) 2026 hang-in (tunaCtx fork — production-level refactor)
Copyright (c) 2026 jaytoone (original CTX — https://github.com/jaytoone/CTX)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
599 changes: 208 additions & 391 deletions README.md

Large diffs are not rendered by default.

27 changes: 18 additions & 9 deletions benchmarks/eval/doc_retrieval_eval_v2.py
@@ -19,8 +19,13 @@
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import sys
import os as _os
sys.path.insert(0, _os.path.join(_os.path.dirname(_os.path.dirname(_os.path.dirname(
_os.path.abspath(__file__)))), 'src', 'hooks'))

import numpy as np
from rank_bm25 import BM25Okapi
from _bm25.ranker import score_corpus_bm25
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

@@ -270,11 +275,13 @@ def rank_tfidf(query: str, docs: List[DocFile],
def rank_ctx_doc(
query: "str | DocQuery",
docs: List[DocFile],
bm25_index: "BM25Okapi | None" = None,
bm25_index=None, # unused — kept for backward compat; doc_tokens used instead
doc_tokens: "List[List[str]] | None" = None,
) -> List[Tuple[str, float]]:
"""CTX-doc: heading match + BM25 (query_type-aware blending).

BM25 scoring via _bm25/ranker.score_corpus_bm25 (canonical single source).

keyword queries: BM25 dominant (heading overlap weight halved, bm25 norm unpenalized)
other queries: heading dominant (original weights)
"""
@@ -311,10 +318,12 @@
if score > 0:
scored[doc.rel_path] = score

# Stage 2: BM25 augmentation
if bm25_index is not None:
# Stage 2: BM25 augmentation via _bm25/ranker.score_corpus_bm25 (canonical)
if doc_tokens is not None:
q_tokens = re.findall(r'\b[a-z]{2,}\b', query_lower)
bm25_scores = bm25_index.get_scores(q_tokens)
bm25_scores = score_corpus_bm25(doc_tokens, q_tokens)
if bm25_scores is None:
bm25_scores = np.zeros(len(docs))
max_bm25 = float(np.max(bm25_scores)) if bm25_scores.max() > 0 else 1.0
for i, bm25_s in enumerate(bm25_scores):
fpath = docs[i].rel_path
@@ -445,22 +454,22 @@ def main() -> None:
)
tfidf_matrix = vectorizer.fit_transform([d.content for d in docs])

# Build BM25 index for CTX-doc augmentation (enriched: stem+heading for heading queries)
# Build enriched token lists for CTX-doc BM25 augmentation (stem+heading for heading queries)
# score_corpus_bm25 (_bm25/ranker.py) is the single canonical BM25 primitive — no BM25Okapi here
doc_token_lists_enriched = [_doc_tokens_with_stem(d) for d in docs]
bm25_idx = BM25Okapi(doc_token_lists_enriched)

print("Running evaluations...")

results = []

# Strategy 1: CTX-doc (query_type-aware routing)
# keyword queries: TF-only BM25 (rank_bm25) — matches/beats 0.724 baseline
# heading queries: heading match + BM25Okapi augmentation (rank_ctx_doc)
# heading queries: heading match + score_corpus_bm25 augmentation (rank_ctx_doc)
ctx_result = evaluate_strategy(
"CTX-doc (heading+BM25)",
valid_queries,
lambda q: (rank_bm25(q.text, docs) if q.query_type == "keyword"
else rank_ctx_doc(q, docs, bm25_index=bm25_idx)),
else rank_ctx_doc(q, docs, doc_tokens=doc_token_lists_enriched)),
)
results.append(ctx_result)

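The diff routes BM25 scoring through `score_corpus_bm25` from `_bm25/ranker.py`, but that module is not shown in this PR view. Judging only from its call sites (pre-tokenized corpus plus query tokens in, an array of per-document scores out, `None` for empty input), a minimal Okapi BM25 sketch could look like the following — the function name matches the diff, but the parameter defaults and exact scoring details are assumptions:

```python
import math
from collections import Counter

import numpy as np


def score_corpus_bm25(doc_tokens, query_tokens, k1=1.5, b=0.75):
    """Okapi BM25 over a pre-tokenized corpus; one score per document.

    Returns None for an empty corpus or query, mirroring the None check
    in the diff's Stage-2 augmentation code.
    """
    n_docs = len(doc_tokens)
    if n_docs == 0 or not query_tokens:
        return None
    avgdl = sum(len(d) for d in doc_tokens) / n_docs
    # document frequency for each distinct query term
    df = {t: sum(1 for d in doc_tokens if t in d) for t in set(query_tokens)}
    scores = np.zeros(n_docs)
    for i, d in enumerate(doc_tokens):
        tf = Counter(d)
        dl_norm = k1 * (1 - b + b * len(d) / avgdl)  # length normalization
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            scores[i] += idf * tf[t] * (k1 + 1) / (tf[t] + dl_norm)
    return scores
```

The diff then max-normalizes these scores before blending them with the heading-match scores, so only relative magnitudes matter here.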
9 changes: 3 additions & 6 deletions benchmarks/eval/g1_docs_bm25_eval.py
@@ -22,6 +22,9 @@
import anthropic
from rank_bm25 import BM25Okapi

sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "src" / "hooks"))
from _bm25.tokenizer import tokenize # noqa: E402 canonical (PR-1)


# ──────────────────────────────────────────────────────────────────────────────
# QA Pairs (same as g1_docs_memory_eval.py)
@@ -75,12 +78,6 @@
# Step 1: Build BM25 index over doc chunks
# ──────────────────────────────────────────────────────────────────────────────

def tokenize(text: str) -> List[str]:
"""Lowercase; preserve decimal numbers (0.724) and numeric ranges (7-30)."""
tokens = re.findall(r'\d+[-\u2013]\d+|\d+\.\d+|\w+', text.lower())
return [t for t in tokens if t]


def chunk_document(filename: str, content: str) -> List[str]:
"""Split a document by ## section headers. Each chunk = filename § header\ncontent."""
chunks = []
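The local `tokenize` deleted above is replaced by the canonical `_bm25.tokenizer.tokenize` import (PR-1). The canonical module itself is not in this view; assuming it preserves the behavior of the deleted duplicate, it works like this:

```python
import re


def tokenize(text):
    """Lowercase; preserve decimal numbers (0.724) and numeric ranges (7-30).

    Alternation order matters: the range and decimal branches must come
    before the generic word branch, which would otherwise split those
    tokens on '.', '-', or the en dash.
    """
    tokens = re.findall(r'\d+[-\u2013]\d+|\d+\.\d+|\w+', text.lower())
    return [t for t in tokens if t]
```

For example, `tokenize("R@3 rose to 0.724 over days 7-30")` keeps `0.724` and `7-30` as single tokens, which is what lets BM25 match the metric values quoted throughout the docs corpus.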
7 changes: 4 additions & 3 deletions benchmarks/eval/g1_longterm_baseline_eval.py
@@ -20,6 +20,9 @@
from pathlib import Path
from typing import Dict, List, Optional, Tuple

sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "src" / "hooks"))
from _bm25.tokenizer import tokenize as _canonical_tokenize # noqa: E402 canonical (PR-1)

# ── LLM client ───────────────────────────────────────────────────────────────

def get_llm_client():
@@ -264,9 +267,7 @@ def get_bm25_context(query: str, commit_corpus: List[Dict], top_k: int = 7) -> T
if not commit_corpus:
return "[Empty corpus]", 0

def tokenize(text: str) -> List[str]:
return re.findall(r'\b\w+\b', text.lower())

tokenize = _canonical_tokenize # PR-1: was local re.findall(r'\b\w+\b'); now canonical _bm25 tokenize
subjects = [c.get('subject', '') for c in commit_corpus]
tokenized = [tokenize(s) for s in subjects]
bm25 = BM25Okapi(tokenized)
21 changes: 4 additions & 17 deletions benchmarks/eval/g2_docs_paraphrase_eval.py
@@ -18,12 +18,16 @@

import json
import re
import sys
import time
from pathlib import Path
from typing import List, Tuple

from rank_bm25 import BM25Okapi

sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "src" / "hooks"))
from _bm25.tokenizer import tokenize # noqa: E402 canonical (PR-1)


# ──────────────────────────────────────────────────────────────────────────────
# 30 Paraphrase QA Pairs
@@ -317,23 +321,6 @@
# BM25 index construction
# ──────────────────────────────────────────────────────────────────────────────

_KO_PARTICLES = re.compile(
r'(와|과|이|가|은|는|을|를|의|에서|으로|에게|부터|까지|처럼|같이|보다|이나|며|에|로|도|만|나|고)$'
)


def tokenize(text: str) -> List[str]:
"""Preserve decimal numbers and numeric ranges. Strip Korean particles from mixed tokens."""
raw = re.findall(r'\d+[-\u2013]\d+|\d+\.\d+|\w+', text.lower())
result = []
for tok in raw:
cleaned = _KO_PARTICLES.sub('', tok)
if cleaned and cleaned != tok:
result.append(cleaned)
result.append(tok)
return list(dict.fromkeys(result))


def chunk_document(filename: str, content: str) -> List[str]:
"""Split by ## section headers."""
chunks = []
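The deleted `tokenize` above had one behavior worth recording before it disappears into the canonical module: for tokens ending in a Korean particle it emitted both the particle-stripped and the original form, then deduplicated while preserving order. A runnable reproduction of the deleted logic (renamed `tokenize_ko_legacy` to avoid implying it survives under this name in `_bm25`):

```python
import re

# trailing Korean particles stripped from tokens (from the deleted code)
_KO_PARTICLES = re.compile(
    r'(와|과|이|가|은|는|을|를|의|에서|으로|에게|부터|까지|처럼|같이|보다|이나|며|에|로|도|만|나|고)$'
)


def tokenize_ko_legacy(text):
    """Keep decimals/ranges; emit particle-stripped + original variants."""
    raw = re.findall(r'\d+[-\u2013]\d+|\d+\.\d+|\w+', text.lower())
    result = []
    for tok in raw:
        cleaned = _KO_PARTICLES.sub('', tok)
        if cleaned and cleaned != tok:
            result.append(cleaned)  # stripped variant first, so it also enters the BM25 index
        result.append(tok)
    return list(dict.fromkeys(result))  # order-preserving dedup
```

Emitting both forms lets a particle-free query term still match the inflected token in the corpus; whether the canonical `_bm25` tokenizer subsumes this is not shown in the diff.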
76 changes: 38 additions & 38 deletions benchmarks/results/doc_retrieval_eval_v2.md
@@ -1,66 +1,66 @@
# CTX Document Retrieval Evaluation v2

**Date**: 2026-04-03 09:58
**Corpus**: 62 .md files from docs/
**Date**: 2026-05-05 05:15
**Corpus**: 119 .md files from docs/
**Queries**: 100 (heading_exact + heading_paraphrase + keyword)
**Metrics**: Recall@3, Recall@5, NDCG@5, MRR
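The metric implementations used by the eval script are not part of this diff; for a single relevant document per query (as in this benchmark), the standard definitions reduce to the following sketch — function names are illustrative, not taken from the repo:

```python
import math


def recall_at_k(ranked, expected, k):
    """1.0 if the expected doc path appears in the top-k results, else 0.0."""
    return 1.0 if expected in ranked[:k] else 0.0


def mrr_single(ranked, expected):
    """Reciprocal rank of the expected doc; 0.0 if it never appears."""
    for rank, path in enumerate(ranked, start=1):
        if path == expected:
            return 1.0 / rank
    return 0.0


def ndcg_at_5(ranked, expected):
    """Binary relevance with one relevant doc: ideal DCG is 1.0 (hit at rank 1)."""
    for rank, path in enumerate(ranked[:5], start=1):
        if path == expected:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

The per-strategy numbers in the tables below are these per-query values averaged over the 100 queries.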

## Summary Table

| Strategy | Recall@3 | Recall@5 | NDCG@5 | MRR |
|----------|----------|----------|--------|-----|
| CTX-doc (heading+BM25) | **0.870** | **0.940** | 0.815 | 0.782 |
| BM25 | **0.590** | **0.760** | 0.594 | 0.562 |
| Dense TF-IDF | **0.560** | **0.670** | 0.546 | 0.537 |
| CTX-doc (heading+BM25) | **0.740** | **0.790** | 0.680 | 0.662 |
| BM25 | **0.490** | **0.590** | 0.443 | 0.424 |
| Dense TF-IDF | **0.490** | **0.610** | 0.472 | 0.452 |

## Per-Strategy Analysis

### CTX-doc (heading+BM25)
- Hits@3: 87/100 (87.0%)
- Hits@5: 94/100 (94.0%)
- NDCG@5: 0.815
- MRR: 0.782
- Hits@3: 74/100 (74.0%)
- Hits@5: 79/100 (79.0%)
- NDCG@5: 0.680
- MRR: 0.662

**Misses (top 5)**:
- [keyword] `show information about minimax without` → expected `research/20260328-ctx-downstream-eval-complete.md`
- [keyword] `find docs related to memory cross` → expected `research/20260325-long-session-context-management.md`
- [keyword] `which document covers trigger retrieval` → expected `paper_draft_outline.md`
- [keyword] `nemotron research documentation` → expected `research/20260329-ctx-paper-gap-analysis.md`
- [keyword] `find docs related to locagent source` → expected `research/20260327-ctx-alternatives-research.md`
- [keyword] `find docs related to dense import` → expected `research/20260326-ctx-methodology-comparison.md`
- [keyword] `which document covers graph retrieval` → expected `paper_draft_outline.md`
- [heading_exact] `original question` → expected `research/20260330-ctx-academic-critique-web-grounded.md`
- [keyword] `find docs related to beir locagent` → expected `research/20260327-ctx-alternatives-research.md`
- [keyword] `notes about evaluation quality` → expected `research/20260402-g2-evaluation-methods-research-summary.md`

### BM25
- Hits@3: 59/100 (59.0%)
- Hits@5: 76/100 (76.0%)
- NDCG@5: 0.594
- MRR: 0.562
- Hits@3: 49/100 (49.0%)
- Hits@5: 59/100 (59.0%)
- NDCG@5: 0.443
- MRR: 0.424

**Misses (top 5)**:
- [heading_paraphrase] `where is ctx — document index documented` → expected `DOC_INDEX.md`
- [heading_exact] `즉시 실행 순서` → expected `marketing/active_outreach_playbook.md`
- [heading_exact] `실험 설계` → expected `research/20260327-ctx-downstream-eval.md`
- [heading_exact] `[expert-research-v2] ctx 약점 보완 대안 기술 분석` → expected `research/20260327-ctx-alternatives-research.md`
- [heading_exact] `ctx architecture` → expected `ARCHITECTURE.md`
- [heading_paraphrase] `I need info on [expert-research-v2] ctx 실험 방식 상위 티어 논문 기준 평론` → expected `research/20260324-ctx-methodology-critique-top-tier.md`
- [heading_exact] `5개 실제 시나리오` → expected `research/20260328-ctx-real-codebase-g2-eval.md`
- [heading_paraphrase] `find documentation about [expert-research-v2] ctx 성과 평론 — 상위` → expected `research/20260326-ctx-results-review.md`
- [heading_exact] `g1: cross-session memory recall` → expected `research/20260327-ctx-downstream-eval.md`
- [keyword] `find docs related to dense import` → expected `research/20260326-ctx-methodology-comparison.md`

### Dense TF-IDF
- Hits@3: 56/100 (56.0%)
- Hits@5: 67/100 (67.0%)
- NDCG@5: 0.546
- MRR: 0.537
- Hits@3: 49/100 (49.0%)
- Hits@5: 61/100 (61.0%)
- NDCG@5: 0.472
- MRR: 0.452

**Misses (top 5)**:
- [heading_paraphrase] `where is ctx — document index documented` → expected `DOC_INDEX.md`
- [keyword] `which document covers memory codebase` → expected `research/20260402-production-context-retrieval-research.md`
- [heading_exact] `즉시 실행 순서` → expected `marketing/active_outreach_playbook.md`
- [heading_exact] `실험 설계` → expected `research/20260327-ctx-downstream-eval.md`
- [heading_exact] `[expert-research-v2] ctx 약점 보완 대안 기술 분석` → expected `research/20260327-ctx-alternatives-research.md`
- [heading_paraphrase] `documentation for ctx: trigger-driven dynamic context loadin` → expected `paper/CTX_paper_draft.md`
- [heading_paraphrase] `I need info on [expert-research-v2] ctx 실험 방식 상위 티어 논문 기준 평론` → expected `research/20260324-ctx-methodology-critique-top-tier.md`
- [heading_exact] `5개 실제 시나리오` → expected `research/20260328-ctx-real-codebase-g2-eval.md`
- [heading_paraphrase] `find documentation about [expert-research-v2] ctx 성과 평론 — 상위` → expected `research/20260326-ctx-results-review.md`
- [heading_exact] `g1: cross-session memory recall` → expected `research/20260327-ctx-downstream-eval.md`

## Per-Query-Type Breakdown

| Type | N | CTX R@3 | BM25 R@3 | Dense R@3 |
|------|---|---------|----------|-----------|
| heading_exact | 37 | 0.973 | 0.595 | 0.514 |
| heading_paraphrase | 31 | 1.000 | 0.548 | 0.613 |
| keyword | 32 | 0.625 | 0.625 | 0.562 |
| heading_exact | 32 | 0.812 | 0.531 | 0.469 |
| heading_paraphrase | 34 | 1.000 | 0.529 | 0.588 |
| keyword | 34 | 0.412 | 0.412 | 0.412 |

## Method Description

@@ -72,6 +72,6 @@

| Stat | Value |
|------|-------|
| Total docs | 62 |
| Average headings/doc | 14.5 |
| Average keywords/doc | 14.8 |
| Total docs | 119 |
| Average headings/doc | 19.0 |
| Average keywords/doc | 15.0 |