feat: add orchestrated backtest pipeline (sweep -> walk-forward -> monte carlo) #177
Conversation
…nte carlo)

When sweep_params are provided, the backtest tool now runs a full analysis pipeline by default (pipeline=true): sweep -> significance gate -> walk-forward validation -> OOS data gate -> monte carlo risk simulation. Each stage reports a StageStatus (completed/skipped/failed) with reasons, designed for frontend rendering of pass/fail gate cards. Users can opt out with pipeline=false to get just the sweep result.

Key changes:
- Extract execute_from_returns() from monte_carlo.rs for in-process reuse
- Add PipelineResponse and StageInfo types for frontend-renderable stages
- Add pipeline.rs orchestrator with significance and OOS data gates
- Add Pipeline variant to BacktestToolResponse enum
- Add MIN_RETURNS_FOR_BOOTSTRAP constant (30) to constants.rs

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
Pull request overview
Adds an orchestrated “backtest pipeline” mode to the backtest tool so that, when sweeps are requested, the server can automatically chain follow-on analysis stages (walk-forward + Monte Carlo) and report per-stage statuses for frontend gate rendering.
Changes:
- Introduces `PipelineResponse`/`StageInfo`/`StageStatus` response types and wires them into `BacktestToolResponse` as a new `pipeline` variant.
- Adds a new `src/tools/pipeline.rs` orchestrator that runs sweep → significance gate → walk-forward → OOS data gate → Monte Carlo.
- Refactors Monte Carlo to support `execute_from_returns()` for reuse by the pipeline; adds a `MIN_RETURNS_FOR_BOOTSTRAP` constant.
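The PR names the response types (`PipelineResponse`, `StageInfo`, `StageStatus`) and a few fields (`name`, `status`, `reason`, `duration_ms`, `suggested_next_steps`) but not their full definitions. A minimal sketch of what frontend-renderable stage reporting could look like, with all other details assumed:

```rust
// Hypothetical sketch of the pipeline response types described in the PR.
// Type and field names come from the PR text; everything else (exact
// types, derives) is an assumption, not the actual src/tools/response_types code.

/// Outcome of one pipeline stage. Per the PR, a gate that fails marks
/// downstream stages Skipped, while an execution error is Failed.
#[derive(Debug, Clone, PartialEq)]
enum StageStatus {
    Completed,
    Skipped,
    Failed,
}

#[derive(Debug, Clone)]
struct StageInfo {
    name: String,
    status: StageStatus,
    /// Why the stage was skipped or failed, for pass/fail gate cards.
    reason: Option<String>,
    duration_ms: u64,
}

#[derive(Debug, Clone)]
struct PipelineResponse {
    stages: Vec<StageInfo>,
    suggested_next_steps: Vec<String>,
}

fn main() {
    let response = PipelineResponse {
        stages: vec![
            StageInfo {
                name: "sweep".to_string(),
                status: StageStatus::Completed,
                reason: None,
                duration_ms: 1_250,
            },
            StageInfo {
                name: "significance_gate".to_string(),
                status: StageStatus::Skipped,
                reason: Some("no combo passed the permutation test".to_string()),
                duration_ms: 0,
            },
        ],
        suggested_next_steps: vec!["loosen sweep ranges and re-run".to_string()],
    };
    for stage in &response.stages {
        println!("{}: {:?} {:?}", stage.name, stage.status, stage.reason);
    }
    println!("next: {:?}", response.suggested_next_steps);
}
```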
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/tools/response_types/pipeline.rs | New pipeline response schema for stage/gate rendering. |
| src/tools/response_types/mod.rs | Exposes the new pipeline response types. |
| src/tools/pipeline.rs | Implements sweep-following orchestration and gating logic. |
| src/tools/monte_carlo.rs | Extracts execute_from_returns() so Monte Carlo can run on derived returns. |
| src/tools/mod.rs | Registers the new pipeline tool module. |
| src/tools/backtest.rs | Adds pipeline param (default true for sweeps) and returns a new pipeline response variant. |
| src/server/mod.rs | Updates tool docs to mention the new pipeline behavior. |
| src/constants.rs | Adds MIN_RETURNS_FOR_BOOTSTRAP constant for Monte Carlo suitability gating. |
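The `MIN_RETURNS_FOR_BOOTSTRAP` constant gates whether the return series is long enough for Monte Carlo bootstrapping. A sketch of how such a suitability gate might look; the constant name and value (30) are from the PR, but the helper function and its signature are illustrative assumptions:

```rust
// Hypothetical sketch of the Monte Carlo suitability gate. Only the
// constant name and value come from the PR (src/constants.rs); the
// helper check_bootstrap_suitability is an assumed illustration.

/// Minimum number of per-bar returns required for bootstrap resampling
/// to be statistically meaningful.
const MIN_RETURNS_FOR_BOOTSTRAP: usize = 30;

/// Returns Err with a renderable reason when the return series is too
/// short to bootstrap, so the pipeline can mark the stage as skipped.
fn check_bootstrap_suitability(returns: &[f64]) -> Result<(), String> {
    if returns.len() < MIN_RETURNS_FOR_BOOTSTRAP {
        Err(format!(
            "only {} returns available; Monte Carlo bootstrap requires at least {}",
            returns.len(),
            MIN_RETURNS_FOR_BOOTSTRAP
        ))
    } else {
        Ok(())
    }
}

fn main() {
    let short_series = vec![0.01_f64; 10];
    match check_bootstrap_suitability(&short_series) {
        Ok(()) => println!("gate passed"),
        Err(reason) => println!("gate failed: {reason}"),
    }
}
```

Using a shared constant instead of a hardcoded `30` keeps the gate and the Monte Carlo tool in agreement, which is exactly the fix a later commit in this PR applies.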
Add POST /runs/pipeline (sync) and POST /tasks/pipeline (async) endpoints that mirror the MCP pipeline tool. Pipeline variant added to TaskKind for task manager tracking. https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
Add 4 integration tests covering:
- Significance gate fails → downstream stages skipped
- No permutation test → top combos pass significance gate
- Full pipeline end-to-end with NVDA fixture data
- Pipeline preserves sweep metadata (sweep_id, run_ids)

Also thread script_source and base_params through tools::walk_forward::execute so the pipeline resolves strategy source from the DB store instead of requiring filesystem access.

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
Walk-forward validation is always run as part of the backtest pipeline, not as a standalone MCP tool. REST endpoints (/walk-forward, /tasks/walk-forward) remain available for direct access. https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
- Add suggested_next_steps to PipelineResponse (consistent with other tools)
- Fix OOS data gate off-by-one: gate on returns.len() not equity.len()
- Fix has_permutation detection: use multiple_comparisons.is_some()
- Fix JoinError: distinguish panic vs cancellation in error messages
- Use MIN_RETURNS_FOR_BOOTSTRAP constant in monte_carlo.rs (was hardcoded 30)
- Update StageStatus docs to clarify gate-failed vs execution-error semantics
- Update backtest tool doc to mention oos_data_gate stage
- Validate non-empty sweep_params in REST pipeline handlers (400 not 500)
- Propagate DSL transpilation errors instead of swallowing them

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
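The OOS off-by-one fix rests on a simple invariant: an equity curve of N points yields only N-1 returns, so gating on `equity.len()` over-counts by one. A minimal sketch (the function name is illustrative, not the PR's actual helper):

```rust
// Illustrative sketch of why the OOS data gate must count returns, not
// equity points. The helper name returns_from_equity is an assumption.

/// Per-bar simple returns derived from an equity curve: N equity points
/// yield exactly N-1 returns.
fn returns_from_equity(equity: &[f64]) -> Vec<f64> {
    equity
        .windows(2)
        .map(|pair| pair[1] / pair[0] - 1.0)
        .collect()
}

fn main() {
    let equity = vec![100.0, 101.0, 99.0, 102.0];
    let returns = returns_from_equity(&equity);
    // 4 equity points -> 3 returns: a gate checking equity.len() >= 30
    // would pass series that are one return short of the bootstrap minimum.
    println!("{} equity points -> {} returns", equity.len(), returns.len());
}
```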
- Update sweep-only suggested_next_steps to reference backtest(pipeline=true) instead of removed walk_forward tool
- Use objective-aware metric in pipeline summary (not hardcoded Sharpe)
- Uppercase mc_label for consistent symbol casing in MonteCarloResponse

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
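"Objective-aware metric" here means the pipeline summary labels the ranking metric by whatever objective drove the sweep, rather than always saying "Sharpe". One possible shape, with the objective names and helper entirely assumed for illustration:

```rust
// Hypothetical sketch of objective-aware metric labeling for the
// pipeline summary. The objective strings and this helper are
// assumptions; the PR only states that the hardcoded Sharpe label
// was replaced with an objective-derived one.
fn summary_metric_label(objective: &str) -> String {
    match objective {
        "sharpe" => "Sharpe ratio".to_string(),
        "sortino" => "Sortino ratio".to_string(),
        "total_return" => "total return".to_string(),
        "max_drawdown" => "max drawdown".to_string(),
        // Fall back to a readable version of an unknown objective key.
        other => other.replace('_', " "),
    }
}

fn main() {
    println!("best combo by {}", summary_metric_label("sortino"));
}
```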
Previously, walk-forward base_params were derived from SweepResult.params (swept combo values only), which dropped non-swept params like CAPITAL, symbol, profiles, etc. Now run_pipeline accepts the original sweep base_params and passes them through to walk-forward. Test fixtures updated: SweepResult.params now only contains swept keys, with base params passed separately — matching real sweep behavior. https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
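The fix described above is a merge with the right precedence: start from the original base params (which carry CAPITAL, symbol, profiles, and other non-swept keys) and overlay only the winning combo's swept values. A simplified sketch using string maps (the real params are presumably JSON values, and the helper name is an assumption):

```rust
// Illustrative sketch of the base_params fix: swept combo values
// override, non-swept keys survive. Uses String maps for simplicity;
// the PR's actual param type and this helper's name are assumptions.
use std::collections::BTreeMap;

fn merge_walk_forward_params(
    base_params: &BTreeMap<String, String>,
    combo_params: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    // Clone the full base, then let the swept combo values win.
    let mut merged = base_params.clone();
    for (key, value) in combo_params {
        merged.insert(key.clone(), value.clone());
    }
    merged
}

fn main() {
    let mut base = BTreeMap::new();
    base.insert("CAPITAL".to_string(), "100000".to_string());
    base.insert("FAST".to_string(), "10".to_string());

    // SweepResult.params now only contains swept keys, per the PR.
    let mut combo = BTreeMap::new();
    combo.insert("FAST".to_string(), "20".to_string());

    let merged = merge_walk_forward_params(&base, &combo);
    println!("{merged:?}");
}
```

Deriving walk-forward params from the combo alone (the old behavior) is the degenerate case where `base_params` is empty, which is exactly how CAPITAL and symbol were being dropped.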
Pipeline is now opt-in: set pipeline=true to run the full validation chain (sweep -> walk-forward -> monte carlo). Default behavior is sweep-only, matching the previous behavior before the pipeline feature. https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
…est-pipeline-7nYFC

# Conflicts:
#   src/server/handlers/mod.rs
#   src/server/router.rs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@copilot apply changes based on the comments in this thread
```rust
// Stage 1: Sweep (already completed)
stages.push(StageInfo {
    name: "sweep".to_string(),
    status: StageStatus::Completed,
    reason: None,
    duration_ms: sweep_response.execution_time_ms,
```
stages["sweep"].duration_ms is populated from sweep_response.execution_time_ms, but when num_permutations > 0 the permutation-gate work happens after the sweep completes (in execute_sweep_raw via spawn_blocking) and does not update execution_time_ms. This can materially under-report the sweep stage duration in the pipeline UI when permutations are enabled. Consider returning the total sweep+permutation duration from execute_sweep_raw (or measuring elapsed time around sweep+permutation) and using that for the sweep stage duration, or splitting permutation testing into its own stage with its own duration.
Suggested change — replace:

```rust
// Stage 1: Sweep (already completed)
stages.push(StageInfo {
    name: "sweep".to_string(),
    status: StageStatus::Completed,
    reason: None,
    duration_ms: sweep_response.execution_time_ms,
```

with:

```rust
let num_permutations = base_params
    .get("num_permutations")
    .and_then(Value::as_u64)
    .unwrap_or(0);
let sweep_response_value = serde_json::to_value(&sweep_response).ok();
let total_sweep_duration_ms = sweep_response_value.as_ref().and_then(|value| {
    value
        .get("total_execution_time_ms")
        .and_then(Value::as_u64)
        .and_then(|duration_ms| duration_ms.try_into().ok())
        .or_else(|| {
            value
                .get("permutation_execution_time_ms")
                .and_then(Value::as_u64)
                .and_then(|duration_ms| duration_ms.try_into().ok())
                .map(|permutation_duration_ms| {
                    sweep_response
                        .execution_time_ms
                        .saturating_add(permutation_duration_ms)
                })
        })
});
let sweep_stage_duration_ms =
    total_sweep_duration_ms.unwrap_or(sweep_response.execution_time_ms);
let sweep_stage_reason = if num_permutations > 0 && total_sweep_duration_ms.is_none() {
    Some(
        "Permutation testing was enabled, but the sweep response did not include total sweep+permutation timing; displaying the reported sweep execution time only."
            .to_string(),
    )
} else {
    None
};

// Stage 1: Sweep (already completed)
stages.push(StageInfo {
    name: "sweep".to_string(),
    status: StageStatus::Completed,
    reason: sweep_stage_reason,
    duration_ms: sweep_stage_duration_ms,
```
When sweep_params are provided and pipeline=true, the backtest tool runs
a full analysis pipeline: sweep -> significance gate ->
walk-forward validation -> OOS data gate -> monte carlo risk simulation.
Each stage reports a StageStatus (completed/skipped/failed) with reasons,
designed for frontend rendering of pass/fail gate cards. The default
(pipeline=false) returns just the sweep result, matching the pre-pipeline behavior.
Key changes:
https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu