
feat: add orchestrated backtest pipeline (sweep -> walk-forward -> monte carlo) #177

Merged
michaelchu merged 16 commits into main from claude/document-backtest-pipeline-7nYFC on Apr 6, 2026

Conversation

Member

@michaelchu michaelchu commented Apr 6, 2026

When sweep_params are provided and pipeline=true, the backtest tool runs a full
analysis pipeline: sweep -> significance gate -> walk-forward validation ->
OOS data gate -> monte carlo risk simulation.

Each stage reports a StageStatus (completed/skipped/failed) with reasons,
designed for frontend rendering of pass/fail gate cards. The pipeline is
opt-in: with the default, pipeline=false, the tool returns just the sweep result.
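As a rough illustration of the stage reporting described above, the sketch below builds the list of gate cards for one run. It is not the code from pipeline.rs: the StageInfo field names follow the literal quoted later in review, but the u64 duration type, the stage names other than "sweep" and "oos_data_gate", the helper names, and the choice to report a non-passing gate as Failed with downstream stages Skipped are all assumptions.

```rust
/// The three stage states named in the PR description.
#[derive(Debug, Clone, PartialEq)]
pub enum StageStatus {
    Completed,
    Skipped,
    Failed,
}

/// One gate card. `duration_ms` as u64 is an assumption.
#[derive(Debug, Clone)]
pub struct StageInfo {
    pub name: String,
    pub status: StageStatus,
    pub reason: Option<String>,
    pub duration_ms: u64,
}

/// Value stated in the PR; using it as the OOS gate threshold is an assumption.
pub const MIN_RETURNS_FOR_BOOTSTRAP: usize = 30;

/// Hypothetical helper that builds a stage entry with a zero duration.
fn stage(name: &str, status: StageStatus, reason: Option<&str>) -> StageInfo {
    StageInfo {
        name: name.to_string(),
        status,
        reason: reason.map(str::to_string),
        duration_ms: 0,
    }
}

/// Hypothetical planner: decides each stage's status from the two gate inputs.
pub fn plan_stages(significance_passed: bool, oos_return_count: usize) -> Vec<StageInfo> {
    let mut stages = vec![stage("sweep", StageStatus::Completed, None)];

    if !significance_passed {
        stages.push(stage(
            "significance_gate",
            StageStatus::Failed,
            Some("no combination passed the significance gate"),
        ));
        // Everything downstream of a failed gate is reported as skipped.
        for name in ["walk_forward", "oos_data_gate", "monte_carlo"] {
            stages.push(stage(name, StageStatus::Skipped, Some("significance gate did not pass")));
        }
        return stages;
    }
    stages.push(stage("significance_gate", StageStatus::Completed, None));
    stages.push(stage("walk_forward", StageStatus::Completed, None));

    if oos_return_count < MIN_RETURNS_FOR_BOOTSTRAP {
        stages.push(stage(
            "oos_data_gate",
            StageStatus::Failed,
            Some("too few out-of-sample returns"),
        ));
        stages.push(stage("monte_carlo", StageStatus::Skipped, Some("OOS data gate did not pass")));
    } else {
        stages.push(stage("oos_data_gate", StageStatus::Completed, None));
        stages.push(stage("monte_carlo", StageStatus::Completed, None));
    }
    stages
}
```

Calling plan_stages(false, 100) in this sketch yields five entries: a completed sweep, a failed significance gate, and three skipped stages, which is the shape a frontend would need to render every card even when the run stops early.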

Key changes:

  • Extract execute_from_returns() from monte_carlo.rs for in-process reuse
  • Add PipelineResponse and StageInfo types for frontend-renderable stages
  • Add pipeline.rs orchestrator with significance and OOS data gates
  • Add Pipeline variant to BacktestToolResponse enum
  • Add MIN_RETURNS_FOR_BOOTSTRAP constant (30) to constants.rs
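To make the idea of running Monte Carlo directly from a return series concrete, here is a minimal sketch of a bootstrap that refuses to run on fewer than MIN_RETURNS_FOR_BOOTSTRAP observations. The function name bootstrap_from_returns, its signature, and the small linear congruential generator are hypothetical stand-ins chosen to keep the example dependency-free; the real execute_from_returns() may look quite different.

```rust
/// Value stated in the PR (30).
pub const MIN_RETURNS_FOR_BOOTSTRAP: usize = 30;

/// Tiny deterministic generator so the sketch needs no external crates.
/// This is an illustration only; production code would likely use a real RNG.
struct Lcg(u64);

impl Lcg {
    /// Advance the state and map it to an index in 0..len (len must be > 0).
    fn next_index(&mut self, len: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % len
    }
}

/// Hypothetical stand-in for a returns-based Monte Carlo entry point.
/// Resamples `returns` with replacement and returns the terminal equity
/// multiple (starting from 1.0) of each simulated path.
pub fn bootstrap_from_returns(
    returns: &[f64],
    n_paths: usize,
    seed: u64,
) -> Result<Vec<f64>, String> {
    if returns.len() < MIN_RETURNS_FOR_BOOTSTRAP {
        return Err(format!(
            "need at least {} returns for bootstrap, got {}",
            MIN_RETURNS_FOR_BOOTSTRAP,
            returns.len()
        ));
    }
    let mut rng = Lcg(seed);
    let mut finals = Vec::with_capacity(n_paths);
    for _ in 0..n_paths {
        let mut equity = 1.0_f64;
        // Each path has the same length as the observed series.
        for _ in 0..returns.len() {
            equity *= 1.0 + returns[rng.next_index(returns.len())];
        }
        finals.push(equity);
    }
    Ok(finals)
}
```

Taking a plain slice of returns rather than a file path or run identifier is what would let a pipeline hand walk-forward output to the simulation in-process, which appears to be the motivation for the extraction.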

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu

…nte carlo)

When sweep_params are provided, the backtest tool now runs a full analysis
pipeline by default (pipeline=true): sweep -> significance gate ->
walk-forward validation -> OOS data gate -> monte carlo risk simulation.

Each stage reports a StageStatus (completed/skipped/failed) with reasons,
designed for frontend rendering of pass/fail gate cards. Users can opt out
with pipeline=false to get just the sweep result.

Key changes:
- Extract execute_from_returns() from monte_carlo.rs for in-process reuse
- Add PipelineResponse and StageInfo types for frontend-renderable stages
- Add pipeline.rs orchestrator with significance and OOS data gates
- Add Pipeline variant to BacktestToolResponse enum
- Add MIN_RETURNS_FOR_BOOTSTRAP constant (30) to constants.rs

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
Copilot AI review requested due to automatic review settings April 6, 2026 03:46
Contributor

Copilot AI left a comment

Pull request overview

Adds an orchestrated “backtest pipeline” mode to the backtest tool so that, when sweeps are requested, the server can automatically chain follow-on analysis stages (walk-forward + Monte Carlo) and report per-stage statuses for frontend gate rendering.

Changes:

  • Introduces PipelineResponse / StageInfo / StageStatus response types and wires them into BacktestToolResponse as a new pipeline variant.
  • Adds a new src/tools/pipeline.rs orchestrator that runs sweep → significance gate → walk-forward → OOS data gate → Monte Carlo.
  • Refactors Monte Carlo to support execute_from_returns() for reuse by the pipeline; adds MIN_RETURNS_FOR_BOOTSTRAP constant.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/tools/response_types/pipeline.rs | New pipeline response schema for stage/gate rendering. |
| src/tools/response_types/mod.rs | Exposes the new pipeline response types. |
| src/tools/pipeline.rs | Implements sweep-following orchestration and gating logic. |
| src/tools/monte_carlo.rs | Extracts execute_from_returns() so Monte Carlo can run on derived returns. |
| src/tools/mod.rs | Registers the new pipeline tool module. |
| src/tools/backtest.rs | Adds pipeline param (default true for sweeps) and returns a new pipeline response variant. |
| src/server/mod.rs | Updates tool docs to mention the new pipeline behavior. |
| src/constants.rs | Adds MIN_RETURNS_FOR_BOOTSTRAP constant for Monte Carlo suitability gating. |

Comment thread src/tools/response_types/pipeline.rs
Comment thread src/tools/response_types/pipeline.rs Outdated
Comment thread src/tools/pipeline.rs Outdated
Comment thread src/tools/pipeline.rs
Comment thread src/tools/monte_carlo.rs
Comment thread src/server/mod.rs Outdated
claude added 2 commits April 6, 2026 04:06
Add POST /runs/pipeline (sync) and POST /tasks/pipeline (async) endpoints
that mirror the MCP pipeline tool. Pipeline variant added to TaskKind for
task manager tracking.

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu

Add 4 integration tests covering:
- Significance gate fails → downstream stages skipped
- No permutation test → top combos pass significance gate
- Full pipeline end-to-end with NVDA fixture data
- Pipeline preserves sweep metadata (sweep_id, run_ids)

Also thread script_source and base_params through
tools::walk_forward::execute so the pipeline resolves strategy
source from the DB store instead of requiring filesystem access.

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
Copilot AI review requested due to automatic review settings April 6, 2026 05:20
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Comment thread src/server/handlers/pipeline.rs
Comment thread src/server/handlers/tasks.rs
Comment thread src/tools/pipeline.rs Outdated
Comment thread src/tools/pipeline.rs Outdated
Comment thread src/tools/pipeline.rs Outdated
Comment thread src/tools/pipeline.rs
claude added 2 commits April 6, 2026 05:33
Walk-forward validation is always run as part of the backtest pipeline,
not as a standalone MCP tool. REST endpoints (/walk-forward, /tasks/walk-forward)
remain available for direct access.

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu

- Add suggested_next_steps to PipelineResponse (consistent with other tools)
- Fix OOS data gate off-by-one: gate on returns.len() not equity.len()
- Fix has_permutation detection: use multiple_comparisons.is_some()
- Fix JoinError: distinguish panic vs cancellation in error messages
- Use MIN_RETURNS_FOR_BOOTSTRAP constant in monte_carlo.rs (was hardcoded 30)
- Update StageStatus docs to clarify gate-failed vs execution-error semantics
- Update backtest tool doc to mention oos_data_gate stage
- Validate non-empty sweep_params in REST pipeline handlers (400 not 500)
- Propagate DSL transpilation errors instead of swallowing them

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
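The OOS data gate off-by-one in the list above comes down to an equity curve of N points producing only N - 1 period returns. The sketch below shows why gating on the return count is the stricter and more accurate check; the function names are hypothetical, and using MIN_RETURNS_FOR_BOOTSTRAP as the gate threshold is an assumption rather than something the commit message states.

```rust
/// Value stated in the PR (30).
pub const MIN_RETURNS_FOR_BOOTSTRAP: usize = 30;

/// Convert an equity curve into simple period-over-period returns.
/// A curve with N points yields N - 1 returns (and none for N < 2).
pub fn returns_from_equity(equity: &[f64]) -> Vec<f64> {
    equity.windows(2).map(|w| w[1] / w[0] - 1.0).collect()
}

/// Hypothetical gate: pass only when enough returns exist for a bootstrap.
/// Checking `equity.len()` here instead would admit a curve one point short.
pub fn oos_gate_passes(equity: &[f64]) -> bool {
    returns_from_equity(equity).len() >= MIN_RETURNS_FOR_BOOTSTRAP
}
```

With this sketch, a 30-point equity curve fails the gate because it has only 29 returns, while a 31-point curve passes.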
Copilot AI review requested due to automatic review settings April 6, 2026 05:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Comment thread src/tools/backtest.rs
Comment thread src/tools/pipeline.rs
Comment thread src/tools/pipeline.rs Outdated
- Update sweep-only suggested_next_steps to reference backtest(pipeline=true)
  instead of removed walk_forward tool
- Use objective-aware metric in pipeline summary (not hardcoded Sharpe)
- Uppercase mc_label for consistent symbol casing in MonteCarloResponse

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Comment thread src/tools/pipeline.rs Outdated
Comment thread tests/pipeline.rs Outdated
claude added 2 commits April 6, 2026 13:13
Previously, walk-forward base_params were derived from SweepResult.params
(swept combo values only), which dropped non-swept params like CAPITAL,
symbol, profiles, etc. Now run_pipeline accepts the original sweep
base_params and passes them through to walk-forward.

Test fixtures updated: SweepResult.params now only contains swept keys,
with base params passed separately — matching real sweep behavior.
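One way to picture the fix is as a merge: the walk-forward stage needs the original base params, with the winning combo's swept values layered on top. The sketch below assumes string-valued params, a hypothetical swept key FAST_PERIOD, and a hypothetical helper name; the commit only states that base_params are now passed through, so the exact merge behavior is a guess.

```rust
use std::collections::BTreeMap;

/// Simplified parameter map; the real code may well use JSON values instead.
pub type Params = BTreeMap<String, String>;

/// Hypothetical helper: start from the sweep's base params so non-swept keys
/// such as CAPITAL and symbol survive, then overlay the swept combo values.
pub fn walk_forward_params(base_params: &Params, swept_combo: &Params) -> Params {
    let mut merged = base_params.clone();
    for (key, value) in swept_combo {
        merged.insert(key.clone(), value.clone());
    }
    merged
}
```

Building the walk-forward input from the combo alone would return only the swept key in this example, which is the dropped-parameter behavior the commit describes.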

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu

Pipeline is now opt-in: set pipeline=true to run the full validation
chain (sweep -> walk-forward -> monte carlo). Default behavior is
sweep-only, matching the previous behavior before the pipeline feature.

https://claude.ai/code/session_01DEHwjSk7Y38DhefWeGZCLu
Copilot AI review requested due to automatic review settings April 6, 2026 14:40
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread src/tools/backtest.rs
Comment thread src/tools/pipeline.rs Outdated
Comment thread src/tools/pipeline.rs Outdated
Comment thread src/tools/pipeline.rs
claude and others added 2 commits April 6, 2026 14:49
…est-pipeline-7nYFC

# Conflicts:
#	src/server/handlers/mod.rs
#	src/server/router.rs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 6, 2026 15:11
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread src/tools/backtest.rs
Comment thread src/server/handlers/pipeline.rs
Comment thread src/server/handlers/tasks.rs
Comment thread src/server/handlers/pipeline.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 6, 2026 15:19
@michaelchu
Member Author

@copilot apply changes based on the comments in this thread

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Comment thread src/tools/pipeline.rs Outdated
Comment thread src/server/handlers/pipeline.rs
Comment thread src/server/handlers/tasks.rs Outdated
Copilot AI review requested due to automatic review settings April 6, 2026 15:52
@michaelchu michaelchu merged commit 2e199ef into main Apr 6, 2026
7 checks passed
@michaelchu michaelchu deleted the claude/document-backtest-pipeline-7nYFC branch April 6, 2026 15:56
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Comment thread src/tools/pipeline.rs
Comment on lines +44 to +49
    // Stage 1: Sweep (already completed)
    stages.push(StageInfo {
        name: "sweep".to_string(),
        status: StageStatus::Completed,
        reason: None,
        duration_ms: sweep_response.execution_time_ms,

Copilot AI Apr 6, 2026

stages["sweep"].duration_ms is populated from sweep_response.execution_time_ms, but when num_permutations > 0 the permutation gate work happens after the sweep completes (in execute_sweep_raw via spawn_blocking) and does not update execution_time_ms. This can materially under-report the sweep stage duration in pipeline UI when permutations are enabled. Consider returning the total sweep+permutation duration from execute_sweep_raw (or measuring elapsed time around sweep+permutation) and using that for the sweep stage duration, or splitting permutation testing into its own stage with its own duration.

Suggested change
-   // Stage 1: Sweep (already completed)
-   stages.push(StageInfo {
-       name: "sweep".to_string(),
-       status: StageStatus::Completed,
-       reason: None,
-       duration_ms: sweep_response.execution_time_ms,
+   let num_permutations = base_params
+       .get("num_permutations")
+       .and_then(Value::as_u64)
+       .unwrap_or(0);
+   let sweep_response_value = serde_json::to_value(&sweep_response).ok();
+   let total_sweep_duration_ms = sweep_response_value.as_ref().and_then(|value| {
+       value
+           .get("total_execution_time_ms")
+           .and_then(Value::as_u64)
+           .and_then(|duration_ms| duration_ms.try_into().ok())
+           .or_else(|| {
+               value
+                   .get("permutation_execution_time_ms")
+                   .and_then(Value::as_u64)
+                   .and_then(|duration_ms| duration_ms.try_into().ok())
+                   .map(|permutation_duration_ms| {
+                       sweep_response
+                           .execution_time_ms
+                           .saturating_add(permutation_duration_ms)
+                   })
+           })
+   });
+   let sweep_stage_duration_ms =
+       total_sweep_duration_ms.unwrap_or(sweep_response.execution_time_ms);
+   let sweep_stage_reason = if num_permutations > 0 && total_sweep_duration_ms.is_none() {
+       Some(
+           "Permutation testing was enabled, but the sweep response did not include total sweep+permutation timing; displaying the reported sweep execution time only."
+               .to_string(),
+       )
+   } else {
+       None
+   };
+   // Stage 1: Sweep (already completed)
+   stages.push(StageInfo {
+       name: "sweep".to_string(),
+       status: StageStatus::Completed,
+       reason: sweep_stage_reason,
+       duration_ms: sweep_stage_duration_ms,
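The other option raised in the comment, measuring elapsed time around sweep plus permutation testing, could look roughly like the sketch below. The timed helper and the two hypothetical_* functions are illustrative stand-ins, not functions from this repository; only std::time::Instant is a real API here.

```rust
use std::time::Instant;

/// Run `work` and return its result together with elapsed wall-clock
/// milliseconds, so one number covers everything inside the closure.
pub fn timed<T>(work: impl FnOnce() -> T) -> (T, u64) {
    let start = Instant::now();
    let out = work();
    let elapsed_ms = start.elapsed().as_millis() as u64;
    (out, elapsed_ms)
}

/// Hypothetical stand-in for the parameter sweep: returns one score per combo.
fn hypothetical_sweep() -> Vec<f64> {
    vec![1.2, 0.8, 1.5]
}

/// Hypothetical stand-in for the permutation gate: counts surviving combos.
fn hypothetical_permutation_gate(scores: &[f64]) -> usize {
    scores.iter().filter(|s| **s > 1.0).count()
}

/// Wrap both steps in a single timer so the reported sweep stage duration
/// includes the permutation work that runs after the sweep itself.
pub fn sweep_stage() -> (usize, u64) {
    timed(|| {
        let scores = hypothetical_sweep();
        hypothetical_permutation_gate(&scores)
    })
}
```

Compared with reading timing fields back out of the serialized response, a wrapper like this avoids depending on field names that may not exist, at the cost of touching the call site where the sweep is launched.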
