Add thrifty scoring backend to larch-usher CLI by davidrich27 · Pull Request #113 · matsengrp/larch

davidrich27 · 2026-03-19T22:48:11Z

Summary

Expose Thrifty/netam model scoring through larch-usher CLI flags, enabling thrifty likelihood-guided DAG optimization alongside or instead of parsimony scoring.

Add --scoring-backend [parsimony|ml] to select scoring method
Add --model-config and --model-weights for ML model paths
Add --move-coeff-ml to weight ML score relative to parsimony
ML scoring computes per-edge delta log-likelihood (new LL - old LL) for fragment edges affected by each SPR move
Scoring coefficients default based on backend: parsimony mode defaults to pscore=1, ml=0; ML mode defaults to pscore=0, ml=1; blended scoring available via explicit coefficients
Log best-tree log-likelihood at each iteration when ML model is loaded
Gate debug artifacts (before_optimize.pb, after_optimize.pb, intermediate_newick*.pb.gz) behind USE_TEST_LOGS cmake option (default: OFF)
Refactor optimization logging into OptimizationLogger class

Related: matsengrp/bcr-larch-experiments#6

CLI API changes

New flags

Flag	Description
`--scoring-backend [parsimony\|ml]`	Select scoring method (default: parsimony). ML requires `-DUSE_NETAM=yes` build.
`--model-config FILE`	Path to YAML model config (required with `--scoring-backend ml`)
`--model-weights FILE`	Path to libtorch .pth weights (required with `--scoring-backend ml`)
`--move-coeff-ml FLOAT`	ML log-likelihood coefficient (default: 1.0 with ml backend, 0.0 otherwise)

Changed flags

Flag	Change
`-l,--logpath DIR`	Now opt-in (was on by default). Enables `logfile.csv`, intermediate DAGs, and `--inter-save` snapshots — all written inside the given directory.
`--inter-save N`	Now requires `-l` to be set.

Removed flags

Flag	Replacement
`--quiet`	No longer needed — logging is off by default. Use `-l` to opt in.

Logging directory layout (when `-l DIR` is used)

DIR/
  logfile.csv              # per-iteration metrics (+ BestTreeLL column when ML is active)
  intermediate.pb          # latest intermediate DAG (overwritten each iteration)
  snapshot.5.pb            # numbered snapshots (when --inter-save is used)
  snapshot.10.pb

Default coefficient behavior

Backend	`--move-coeff-pscore`	`--move-coeff-nodes`	`--move-coeff-ml`
parsimony (default)	1	1	0
ml	0	0	1

All coefficients can be overridden explicitly for blended scoring.

Example usage

# Parsimony only (unchanged default)
larch-usher -i input.pb -o output.pb -c 10

# ML scoring with Thrifty model
larch-usher -i input.pb -o output.pb -c 10 \
  --scoring-backend ml \
  --model-config data/linearham/ThriftyHumV0.2-45.yml \
  --model-weights data/linearham/ThriftyHumV0.2-45-libtorch.pth

# Blended parsimony + ML
larch-usher -i input.pb -o output.pb -c 10 \
  --scoring-backend ml \
  --model-config data/linearham/ThriftyHumV0.2-45.yml \
  --model-weights data/linearham/ThriftyHumV0.2-45-libtorch.pth \
  --move-coeff-pscore 1

# With optimization logging
larch-usher -i input.pb -o output.pb -c 10 \
  -l my_optimization_log

Test plan

Parsimony-only mode unchanged behavior
ML-only mode accepts/rejects moves based on likelihood delta
Blended mode combines parsimony + ML scores
USE_TEST_LOGS=OFF suppresses debug protobuf artifacts
Logging off by default; -l enables it

🤖 Generated with Claude Code

Expose Thrifty/netam ML scoring through larch-usher CLI flags: - --scoring-backend [parsimony|ml] to select scoring method - --model-config FILE and --model-weights FILE for ML model paths - --move-coeff-ml FLOAT to weight ML score relative to parsimony When ML backend is active, fragment edges are scored using netam::crepe log-likelihood and blended with parsimony scores in the move accept/reject decision. The ML score is computed per-edge via poisson_context_log_likelihood and subtracted from the effective score (higher LL = better = lower score). CMakeLists.txt links netam to larch-usher when USE_NETAM=yes. All ML code paths are gated behind #ifdef USE_NETAM. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Change ML scoring from absolute fragment LL to a proper delta: compute edge log-likelihoods for both new (post-move) and old (pre-move) parent-child relationships in the fragment, return the difference. Only changed edges are scored (~5-10 per move), not full tree traversals. Refactor ComputeFragmentMLScore into ComputeEdgeLL helper and delta computation. For each fragment edge, the old LL uses GetOld() to access pre-move compact genomes and parent topology. MoveNew nodes (created by SPR) have no old edge and contribute only to new LL. Scoring coefficients now default based on --scoring-backend: - parsimony (default): pscore=1, nodes=1, ml=0 - ml: pscore=0, nodes=1, ml=1.0 User-specified values always override. Validate that --move-coeff-ml requires --scoring-backend ml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When --scoring-backend ml is active, compute and display the total log-likelihood of the best parsimony tree at each iteration in the optimization log. Samples the min-parsimony tree from the DAG and sums per-edge log-likelihoods using the loaded netam model. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move before_optimize.pb and after_optimize.pb writes behind #ifdef USE_TEST_LOGS (default: OFF). Previously after_optimize.pb was always written unconditionally, and before_optimize.pb was tied to KEEP_ASSERTS (which also gates the MAT/MADAG equality check, kept separate). Add USE_TEST_LOGS option to CMakeLists.txt, larch_compile_opts, and the pixi build env template. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When --quiet is passed, skip creating the optimization_log/ directory, writing logfile.csv, and writing intermediate DAGs. The logfile open/write/close and directory creation are now inside the existing write_intermediate_dag guard. Stdout logging (parsimony scores, tree counts, etc.) is unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The sampled tree from MinWeightSampleTree needs RecomputeCompactGenomes to populate node compact genomes from edge mutations. Without this, all nodes had empty CGs (equal to reference), producing LL=0 for all edges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The new-node penalty is scaled for parsimony (1 node ≈ 1 mutation) but is not meaningful on the log-likelihood scale. Default to nodes=0 for ML mode so only LL delta drives accept/reject. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract the logging lambda and related setup into an OptimizationLogger class that owns the logfile, timer, intermediate DAG saves, and ML scoring. This replaces the large lambda with scattered #ifdef blocks. CLI changes: - Logging is now opt-in via -l,--logpath DIR (was on by default) - --quiet flag removed (no longer needed) - --inter-save now requires -l - Intermediate DAGs and snapshots are written inside the log directory (e.g. <logdir>/intermediate.pb, <logdir>/snapshot.5.pb) - BestTreeLL column added to logfile.csv when ML backend is active Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Extract MLScoringConfig struct with AdjustScore() method, replacing duplicated ML scoring blocks and loose ml_model_/ml_coeff_ members across three callback structs - Branch upfront in ComputeMetrics for use_ua_free_parsimony to avoid wasting a full BinaryParsimonyScore traversal - Fix BestTreeLL column missing from logfile header by setting ML model before opening the logfile Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add --list-tags to larch-test to print all available test tags - Add pixi run test-netam task for running all netam-tagged tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- WriteHeader/WriteRow: each line now ends with newline (CSV well-formed mid-run) - Rename ComputeBestTreeLL -> ComputeBestParsimonyTreeLL - FormatExt: use switch with Fail() for unsupported formats - Comment on Merge_All_Moves_Found_Callback explaining why it skips ML scoring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

davidrich27 · 2026-03-25T11:35:36Z

Clean Code Review

Ran a clean-code-reviewer analysis on the PR. Here's a summary of issues found and how they were addressed:

#	Issue	Resolution
1	`ComputeMetrics` wasted a full `BinaryParsimonyScore` traversal when `use_ua_free_parsimony_` was true	Branch upfront with `if/else`; added comment that `max_parsimony` always uses `MaxBinaryParsimonyScore`
2	ML scoring block duplicated across 3 callback structs (identical code + two loose members each)	Introduced `MLScoringConfig` struct with `AdjustScore()` method; each callback holds a single `ml_config_` member
3	`set_ml_scoring` lambda broke encapsulation by accessing members directly	Resolved by #2 — `ml_config_` is set via assignment
4	`timer_` never reset between iterations — `SecondsElapsed` cumulative?	No change needed — verified this matches old behavior. `SecondsElapsed` is cumulative wall clock from start; per-iteration time derivable by diffing rows
5	`WriteHeader()` had no trailing newline; CSV malformed mid-run	Header and rows now each end with `'\n'`
6	`ComputeBestTreeLL` misleading — computes LL of best-parsimony tree, not best-LL tree	Renamed to `ComputeBestParsimonyTreeLL`. DAG-level LL optimization (Felsenstein-like DP) not yet implemented
7	`FormatExt()` silently defaulted to `.pb` for unknown formats	Replaced with `switch` + `Fail()` on default
8	`Merge_All_Moves_Found_Callback` silently skipped ML scoring with no explanation	Added comment: this callback is for max DAG exploration (`--callback-option all-moves`), intentionally skips all scoring

Strengths noted

OptimizationLogger class is a clear improvement over the captured lambda
-l opt-in default is the right design
Coefficient resolution with std::optional and backend-dependent defaults is clean
ComputeFragmentMLScore documentation explains the non-obvious delta LL well
Fail-fast argument validation gives actionable error messages

matsen · 2026-03-26T10:59:20Z

🤖

PR Review: Add thrifty scoring backend to larch-usher CLI

Summary

This PR adds ML-based (Thrifty/netam) scoring to larch-usher's SPR move accept/reject decisions, enabling likelihood-guided DAG optimization. Key changes:

New CLI flags: --scoring-backend [parsimony|ml], --model-config, --model-weights, --move-coeff-ml
Delta LL scoring: ComputeFragmentMLScore computes (new_LL - old_LL) for edges affected by each SPR move — only changed edges, not full tree traversals
OptimizationLogger class: Refactored from a large lambda; owns logfile, timer, intermediate DAG saves, ML scoring
MLScoringConfig struct: Deduplicates ML scoring logic across 3 callback structs via AdjustScore() method
Logging changes: -l,--logpath is now opt-in (was on by default), --quiet removed, --inter-save requires -l
USE_TEST_LOGS cmake option: Gates before_optimize.pb/after_optimize.pb debug artifacts (default OFF)
Test utilities: --list-tags flag for larch-test, pixi run test-netam task

No benchmarks or performance data are included in the PR description.

Concerns

1. Thread safety of ML scoring in callbacks

ComputeFragmentMLScore is called from OnMove in the callback structs. OnMove runs inside moves_batch_.AddElement() which holds only a shared_lock on mat_mtx_ — meaning multiple OnMove calls (and thus multiple model->forward() calls) can execute concurrently. The netam::crepe model is accessed via a shared raw pointer. Is netam::crepe::forward() thread-safe for concurrent reads? If not, this is a data race. The torch::NoGradGuard is thread-local, but the model weights/buffers are shared.

2. Per-edge forward passes (performance note)

Each ComputeEdgeLL call does a full model forward pass (encode → forward → softmax). For ~5-10 fragment edges per move × thousands of moves per iteration, this could be a significant bottleneck. Batching edges into a single forward pass would help. Acceptable for an initial implementation, but worth flagging for future optimization.

3. Score sign convention deserves a comment

In MLScoringConfig::AdjustScore:

return base_score - coeff * fragment_ll;

This is correct (positive delta LL → lower effective score → accepted), but the sign inversion relative to the LL convention is non-obvious. A one-line comment explaining the sign would help readers.

4. `ComputeBestParsimonyTreeLL` ignores UA-free parsimony setting

ComputeBestParsimonyTreeLL always uses BinaryParsimonyScore to sample the "best" tree, even when use_ua_free_parsimony_ is true. ComputeMetrics correctly branches between UAFreeParsimonyScore and BinaryParsimonyScore, but the LL computation doesn't follow suit. When --ignore-root_edge_mutations is set, the tree sampled for LL computation may not be the one reported as having the best parsimony score.

5. `std::remove("intermediate_newick1.pb.gz")` cleanup is fragile

In produce_mat_impl.hpp, the code removes a hardcoded filename that matOptimize writes unconditionally. The comment explains the rationale well, but if matOptimize changes this filename or writes additional files, the cleanup silently becomes incomplete. Pragmatic but brittle.

Minor note (pre-existing)

The CSV header writes MaxParsimony for the min-parsimony column and WorstParsimony for max-parsimony. This is inherited from the old code — not introduced by this PR — but confusing.

Overall

Well-structured PR. The OptimizationLogger refactor and MLScoringConfig deduplication are clean improvements. The delta-LL scoring approach (only scoring changed edges) is the right design. The CLI API with backend-dependent defaults and explicit override is well-designed. David's clean code review comment already addressed several quality issues.

Main items to resolve: (1) thread safety of concurrent model access, and (4) UA-free parsimony mismatch in LL computation.

The kmer indices feed libtorch embedding lookups using weights trained in Python netam, so the C++ encoder's index layout must match netam's. netam.sequences.generate_kmers prepends "N" (index 0) and places the 4^k ACGT kmers at indices 1..4^k; previously the C++ encoder put the N-kmer placeholder at the last index (4^k), misaligning every kmer's embedding row. Prepend the N placeholder instead and update the test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… suite

davidrich27 and others added 11 commits March 18, 2026 21:49

Add --list-tags flag to larch-test and pixi test-netam task

1c7b2bd

- Add --list-tags to larch-test to print all available test tags - Add pixi run test-netam task for running all netam-tagged tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

davidrich27 changed the title ~~WIP: Add ML scoring backend to larch-usher CLI~~ Add thrifty scoring backend to larch-usher CLI Mar 25, 2026

davidrich27 marked this pull request as ready for review March 25, 2026 11:37

davidrich27 requested a review from matsen March 25, 2026 11:37

davidrich27 and others added 2 commits June 3, 2026 20:39

rename bcr-larch to larch-bcr to be consistent with other programs in…

2fb426e

… suite

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add thrifty scoring backend to larch-usher CLI#113

Add thrifty scoring backend to larch-usher CLI#113
davidrich27 wants to merge 13 commits into
pixi-environmentfrom
add-ml-scoring-cli

davidrich27 commented Mar 19, 2026 •

edited

Loading

Uh oh!

davidrich27 commented Mar 25, 2026

Uh oh!

matsen commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

davidrich27 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

CLI API changes

New flags

Changed flags

Removed flags

Logging directory layout (when -l DIR is used)

Default coefficient behavior

Example usage

Test plan

Uh oh!

davidrich27 commented Mar 25, 2026

Clean Code Review

Strengths noted

Uh oh!

matsen commented Mar 26, 2026

PR Review: Add thrifty scoring backend to larch-usher CLI

Summary

Concerns

1. Thread safety of ML scoring in callbacks

2. Per-edge forward passes (performance note)

3. Score sign convention deserves a comment

4. ComputeBestParsimonyTreeLL ignores UA-free parsimony setting

5. std::remove("intermediate_newick1.pb.gz") cleanup is fragile

Minor note (pre-existing)

Overall

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

davidrich27 commented Mar 19, 2026 •

edited

Loading

Logging directory layout (when `-l DIR` is used)

4. `ComputeBestParsimonyTreeLL` ignores UA-free parsimony setting

5. `std::remove("intermediate_newick1.pb.gz")` cleanup is fragile