Add thrifty scoring backend to larch-usher CLI #113
Expose Thrifty/netam ML scoring through larch-usher CLI flags:
- --scoring-backend [parsimony|ml] to select scoring method
- --model-config FILE and --model-weights FILE for ML model paths
- --move-coeff-ml FLOAT to weight ML score relative to parsimony

When ML backend is active, fragment edges are scored using netam::crepe log-likelihood and blended with parsimony scores in the move accept/reject decision. The ML score is computed per-edge via poisson_context_log_likelihood and subtracted from the effective score (higher LL = better = lower score).

CMakeLists.txt links netam to larch-usher when USE_NETAM=yes. All ML code paths are gated behind #ifdef USE_NETAM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
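The sign convention in this commit message (higher LL = better = lower score) can be sketched as below. This is only an illustration under stated assumptions: `AdjustScoreSketch` and its parameter names are hypothetical and are not taken from the PR's code; the only thing drawn from the commit message is that the weighted log-likelihood is subtracted from the score.

```cpp
#include <cassert>

// Hypothetical helper (not the PR's actual API): blend a parsimony-style
// score with an ML log-likelihood term. Lower results are better.
double AdjustScoreSketch(double base_score, double ml_coeff, double fragment_ll) {
  // A higher log-likelihood is better, so the weighted LL is subtracted:
  // a positive LL lowers the effective score, a negative LL raises it.
  return base_score - ml_coeff * fragment_ll;
}
```

With `ml_coeff` set to 0 the function returns `base_score` unchanged, which corresponds to pure parsimony scoring.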
Change ML scoring from absolute fragment LL to a proper delta: compute edge log-likelihoods for both new (post-move) and old (pre-move) parent-child relationships in the fragment, return the difference. Only changed edges are scored (~5-10 per move), not full tree traversals.

Refactor ComputeFragmentMLScore into ComputeEdgeLL helper and delta computation. For each fragment edge, the old LL uses GetOld() to access pre-move compact genomes and parent topology. MoveNew nodes (created by SPR) have no old edge and contribute only to new LL.

Scoring coefficients now default based on --scoring-backend:
- parsimony (default): pscore=1, nodes=1, ml=0
- ml: pscore=0, nodes=1, ml=1.0

User-specified values always override. Validate that --move-coeff-ml requires --scoring-backend ml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
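A minimal sketch of the delta computation described in this commit message, assuming each changed fragment edge has already been reduced to a pair of log-likelihoods. `EdgeLLSketch` and `FragmentDeltaLLSketch` are hypothetical names, and the real code works on DAG fragments rather than plain vectors.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for one changed fragment edge: the log-likelihood of
// the post-move parent-child pair, and of the pre-move pair when one exists.
struct EdgeLLSketch {
  double new_ll;
  std::optional<double> old_ll;  // empty for nodes created by the move
};

// Returns sum(new LL) - sum(old LL) over the changed edges only.
double FragmentDeltaLLSketch(const std::vector<EdgeLLSketch>& edges) {
  double new_total = 0.0;
  double old_total = 0.0;
  for (const EdgeLLSketch& edge : edges) {
    new_total += edge.new_ll;
    if (edge.old_ll.has_value()) {
      old_total += *edge.old_ll;  // newly created nodes contribute nothing here
    }
  }
  return new_total - old_total;
}
```

A positive return value means the move increased the summed log-likelihood of the affected edges.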
When --scoring-backend ml is active, compute and display the total log-likelihood of the best parsimony tree at each iteration in the optimization log. Samples the min-parsimony tree from the DAG and sums per-edge log-likelihoods using the loaded netam model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move before_optimize.pb and after_optimize.pb writes behind #ifdef USE_TEST_LOGS (default: OFF). Previously after_optimize.pb was always written unconditionally, and before_optimize.pb was tied to KEEP_ASSERTS (which also gates the MAT/MADAG equality check, kept separate). Add USE_TEST_LOGS option to CMakeLists.txt, larch_compile_opts, and the pixi build env template.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When --quiet is passed, skip creating the optimization_log/ directory, writing logfile.csv, and writing intermediate DAGs. The logfile open/write/close and directory creation are now inside the existing write_intermediate_dag guard. Stdout logging (parsimony scores, tree counts, etc.) is unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The sampled tree from MinWeightSampleTree needs RecomputeCompactGenomes to populate node compact genomes from edge mutations. Without this, all nodes had empty CGs (equal to reference), producing LL=0 for all edges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The new-node penalty is scaled for parsimony (1 node ≈ 1 mutation) but is not meaningful on the log-likelihood scale. Default to nodes=0 for ML mode so only LL delta drives accept/reject.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
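The backend-dependent defaults, with nodes=0 for ML mode as of this commit, might be resolved along the following lines. This is a sketch under assumptions: `BackendSketch`, `CoeffsSketch`, and `ResolveCoeffsSketch` are hypothetical names, and using `std::optional` to represent "flag not given" is an illustrative choice rather than how larch-usher parses its arguments.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

enum class BackendSketch { kParsimony, kML };

struct CoeffsSketch {
  double pscore;
  double nodes;
  double ml;
};

// Hypothetical resolver: pick defaults from the backend, then let any
// user-specified value override the corresponding default.
CoeffsSketch ResolveCoeffsSketch(BackendSketch backend,
                                 std::optional<double> pscore,
                                 std::optional<double> nodes,
                                 std::optional<double> ml) {
  // --move-coeff-ml is only valid together with --scoring-backend ml.
  if (ml.has_value() && backend != BackendSketch::kML) {
    throw std::invalid_argument("--move-coeff-ml requires --scoring-backend ml");
  }
  const CoeffsSketch defaults = (backend == BackendSketch::kML)
                                    ? CoeffsSketch{0.0, 0.0, 1.0}   // ML mode
                                    : CoeffsSketch{1.0, 1.0, 0.0};  // parsimony mode
  return CoeffsSketch{pscore.value_or(defaults.pscore),
                      nodes.value_or(defaults.nodes),
                      ml.value_or(defaults.ml)};
}
```

Passing an explicit `pscore` together with the ML backend yields blended scoring, since only the unspecified coefficients fall back to their defaults.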
Extract the logging lambda and related setup into an OptimizationLogger class that owns the logfile, timer, intermediate DAG saves, and ML scoring. This replaces the large lambda with scattered #ifdef blocks.

CLI changes:
- Logging is now opt-in via -l,--logpath DIR (was on by default)
- --quiet flag removed (no longer needed)
- --inter-save now requires -l
- Intermediate DAGs and snapshots are written inside the log directory (e.g. <logdir>/intermediate.pb, <logdir>/snapshot.5.pb)
- BestTreeLL column added to logfile.csv when ML backend is active

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract MLScoringConfig struct with AdjustScore() method, replacing duplicated ML scoring blocks and loose ml_model_/ml_coeff_ members across three callback structs
- Branch upfront in ComputeMetrics for use_ua_free_parsimony to avoid wasting a full BinaryParsimonyScore traversal
- Fix BestTreeLL column missing from logfile header by setting ML model before opening the logfile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add --list-tags to larch-test to print all available test tags
- Add pixi run test-netam task for running all netam-tagged tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- WriteHeader/WriteRow: each line now ends with newline (CSV well-formed mid-run)
- Rename ComputeBestTreeLL -> ComputeBestParsimonyTreeLL
- FormatExt: use switch with Fail() for unsupported formats
- Comment on Merge_All_Moves_Found_Callback explaining why it skips ML scoring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
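The newline-terminated row convention from the first item above can be illustrated with a small sketch. `WriteCsvLineSketch` is a hypothetical helper, not the PR's WriteHeader/WriteRow, and the "Iteration" column name in the usage note is invented for illustration; only BestTreeLL appears in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: write one CSV record and terminate it with '\n', so
// the file consists of complete lines whenever it is inspected mid-run.
void WriteCsvLineSketch(std::ostream& out, const std::vector<std::string>& fields) {
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i > 0) {
      out << ',';  // separator goes between fields, never after the last one
    }
    out << fields[i];
  }
  out << '\n';  // terminator is written with the record, not before the next one
}
```

Calling it once with `{"Iteration", "BestTreeLL"}` and once with a data row leaves the stream ending in a newline after each call, which is what keeps a partially written log parseable.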
Clean Code Review

Ran a clean-code-reviewer analysis on the PR. Here's a summary of issues found and how they were addressed:
Strengths noted
🤖 PR Review: Add thrifty scoring backend to larch-usher CLI
Summary
This PR adds ML-based (Thrifty/netam) scoring to
No benchmarks or performance data are included in the PR description.
Concerns
1. Thread safety of ML scoring in callbacks
2. Per-edge forward passes (performance note)
Each
3. Score sign convention deserves a comment
In
    return base_score - coeff * fragment_ll;
This is correct (positive delta LL → lower effective score → accepted), but the sign inversion relative to the LL convention is non-obvious. A one-line comment explaining the sign would help readers.
4.
The kmer indices feed libtorch embedding lookups using weights trained in Python netam, so the C++ encoder's index layout must match netam's. netam.sequences.generate_kmers prepends "N" (index 0) and places the 4^k ACGT kmers at indices 1..4^k; previously the C++ encoder put the N-kmer placeholder at the last index (4^k), misaligning every kmer's embedding row. Prepend the N placeholder instead and update the test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
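The index layout described in this commit message (placeholder at 0, ACGT kmers at 1..4^k) can be sketched as follows. `KmerIndexSketch` is a hypothetical function, not the PR's encoder. It assumes the ACGT kmers are ordered lexicographically with A < C < G < T and that any kmer containing a non-ACGT character maps to the placeholder; neither assumption is stated in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical encoder: index 0 is the N placeholder, and the 4^k ACGT kmers
// occupy indices 1..4^k (assumed lexicographic order, A < C < G < T).
std::size_t KmerIndexSketch(std::string_view kmer) {
  std::size_t index = 0;
  for (char base : kmer) {
    std::size_t digit = 0;
    switch (base) {
      case 'A': digit = 0; break;
      case 'C': digit = 1; break;
      case 'G': digit = 2; break;
      case 'T': digit = 3; break;
      default: return 0;  // assumed: any non-ACGT base selects the placeholder
    }
    index = index * 4 + digit;  // read the kmer as a base-4 number
  }
  return index + 1;  // shift past the placeholder at index 0
}
```

Under these assumptions a 3-mer such as "AAA" lands on index 1 and "TTT" on index 64, whereas a layout with the placeholder at the end would have put them on 0 and 63 and shifted every embedding row by one.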
Summary
Expose Thrifty/netam model scoring through larch-usher CLI flags, enabling thrifty likelihood-guided DAG optimization alongside or instead of parsimony scoring.
- --scoring-backend [parsimony|ml] to select scoring method
- --model-config and --model-weights for ML model paths
- --move-coeff-ml to weight ML score relative to parsimony
- Parsimony mode defaults to pscore=1, ml=0; ML mode defaults to pscore=0, ml=1; blended scoring available via explicit coefficients
- Debug artifacts (before_optimize.pb, after_optimize.pb, intermediate_newick*.pb.gz) gated behind USE_TEST_LOGS cmake option (default: OFF)
- Logging extracted into an OptimizationLogger class

Related: matsengrp/bcr-larch-experiments#6
CLI API changes
New flags
- --scoring-backend [parsimony|ml]: … -DUSE_NETAM=yes build.
- --model-config FILE: … (--scoring-backend ml)
- --model-weights FILE: … (--scoring-backend ml)
- --move-coeff-ml FLOAT
Changed flags
- -l,--logpath DIR: … logfile.csv, intermediate DAGs, and --inter-save snapshots, all written inside the given directory.
- --inter-save N: Requires -l to be set.
Removed flags
- --quiet: … -l to opt in.
Logging directory layout (when -l DIR is used)
Default coefficient behavior
--move-coeff-pscore, --move-coeff-nodes, --move-coeff-ml
All coefficients can be overridden explicitly for blended scoring.
Example usage
Test plan
- USE_TEST_LOGS=OFF suppresses debug protobuf artifacts
- … -l enables it

🤖 Generated with Claude Code