diff --git a/.github/agents/typeshed-update-agent.md b/.github/agents/typeshed-update-agent.md
index f7fcd719a65a..806a8fa3ff6f 100644
--- a/.github/agents/typeshed-update-agent.md
+++ b/.github/agents/typeshed-update-agent.md
@@ -20,8 +20,8 @@ The agent must follow the **Pyright Test Policy**.
 Update typeshed to <commit>
 
 Include the typeshed commit hash and summary.
-For now, create PRs against `origin` rather than `upstream`.
-Use the local `origin` remote as the PR base repository, and do not target `microsoft/pyright` unless explicitly asked.
+The PR must target the upstream Pyright repository (`microsoft/pyright`), not the fork repository.
+Use the fork branch as the PR head if needed, and verify the reported PR URL is under `github.com/microsoft/pyright`.
 
 ## Running The Update Script
 
diff --git a/.github/plans/pyright_benchmark_plan.md b/.github/plans/pyright_benchmark_plan.md
deleted file mode 100644
index db800e98aa29..000000000000
--- a/.github/plans/pyright_benchmark_plan.md
+++ /dev/null
@@ -1,1321 +0,0 @@
-# Pyright Ecosystem Performance, Correctness, and Heuristics Benchmark Plan
-
-## Goal
-
-Build a repeatable benchmark system for Pyright and Pylance that answers five questions on every meaningful change:
-
-1. Did diagnostics change?
-2. Did total runtime regress?
-3. Which phase regressed: parse, bind, type evaluation, import resolution, typeshed loading, completion building, or cache behavior?
-4. Which project shape triggered it?
-5. Can we safely tune evaluator heuristics such as recursion limits, union expansion limits, overload pruning thresholds, and protocol matching depth?
-
-The plan combines three ideas:
-
-- Use `mypy_primer` as the real-world project source of truth.
-- Use Ty/Ruff-style cold, warm, incremental, and language-server benchmarks.
-- Add Pyright-specific instrumentation so regressions and heuristic wins are explainable.
-
-## Status Update
-
-Current implementation status as of 2026-05-08:
-
-- Completed: benchmark test directory layout, shared benchmark utilities, benchmark README, parser/tokenizer JSON artifact output,
-  synthetic evaluator microbenchmarks, structured timing snapshots, evaluator phase timing metrics, curated ecosystem smoke
-  project manifest and selectors, and old/new/comparison report comparison helpers.
-- Completed: comparison helpers now support summary sections, largest regressions/improvements, threshold classification,
-  `old.json`, `new.json`, `comparison.json`, and `comparison.md` generation, plus loading reports back from disk and a
-  one-call compare-and-write flow.
-- Completed externally: CodSpeed bootstrap work has an initial PR in `bschnurr/pyright`, so the remaining local work is
-  to align this benchmark suite with that setup rather than starting CodSpeed integration from zero.
-- In progress: ecosystem benchmark runner implementation. The manifest, selectors, report schema, comparison pipeline,
-  and a `runEcosystemBenchmark.ts` entry point are in place for smoke-suite selection, local project execution with
-  per-project generated `pyrightconfig.json` files, packaged-CLI local runs, and report comparison. There is not yet a
-  `mypy_primer`-backed runner that prepares project checkouts, installs dependencies, honors project dates, and executes
-  base/head Pyright across the smoke suite automatically.
-- In progress: `mypy_primer` metadata synchronization now has a checked-in smoke snapshot, generated project metadata,
-  and local overrides for smoke-suite source roots. The sync parser handles the known upstream smoke metadata shape,
-  including `name_override`, `pyright_cmd=None`, dependency/install/platform/cost fields, duplicate derived names, and
-  portable checked-in `inputFile` paths.
-- Not started: heuristic sweep harness and LSP benchmarks.
-
-### Review Gaps To Address Next
-
-The branch review identified the following gaps that should be treated as near-term work before broad CI use:
-
-Current PR staging note: create PRs against `origin` (`bschnurr/pyright`) for now. Do not target `upstream`
-(`microsoft/pyright`) until the benchmark baseline and workflow shape are ready for upstream review.
-
-1. Add base/head Pyright orchestration. Project checkout preparation now exists, but CI still needs a workflow-level way
-  to build and pass distinct baseline and candidate Pyright commands for the selected project set.
-2. Add a checked-in main-branch baseline report. PR runs need a stable comparison target before CI artifact lookup is in
-  place. Running the smoke suite on a known `main` commit should update a committed baseline artifact that PRs can use
-  as `old.json` when producing comparison reports.
-
----
-
-## Core Objectives
-
-The benchmark system should support these goals:
-
-1. Detect diagnostic regressions on real projects.
-2. Detect total performance regressions.
-3. Attribute regressions to parser, binder, evaluator, import resolver, typeshed, LSP, completion, or memory behavior.
-4. Compare Pyright against Ty-style benchmark categories: cold check, warm check, time to first diagnostic, and incremental re-check.
-5. Reuse the same ecosystem strategy as `mypy_primer`.
-6. Benchmark Pylance/LSP operations like completion, hover, references, semantic tokens, and workspace load.
-7. Safely tune Pyright type-evaluator heuristics and bailout thresholds.
-8. Produce PR comments and artifacts that are useful to reviewers.
-9. Support local developer workflows for comparing a branch against `main`.
-
----
-
-## Why Reuse `mypy_primer`
-
-`mypy_primer` already solves a hard problem: maintaining a real-world corpus of typed Python projects that can be checked by different type checkers. It includes project metadata such as:
-
-```python
-Project(
-    location="https://github.com/pandas-dev/pandas",
-    pyright_cmd="{pyright} {paths}",
-    paths=["pandas"],
-    deps=[...],
-    expected_success=("mypy",),
-    cost={"mypy": 355, "ty": 14},
-)
-```
-
-Pyright should not invent a completely separate ecosystem list. Instead, Pyright should reuse the `mypy_primer` project list, then add Pyright-specific tags, benchmark tiers, performance metrics, and heuristic experiments.
-
-The role split should be:
-
-```text
-mypy_primer:
-  Did real-world diagnostics change?
-
-Pyright benchmark harness:
-  Why did performance change?
-  Which phase changed?
-  Which project pattern exposed it?
-  Which heuristic settings are safe?
-```
-
----
-
-## Benchmark Categories
-
-### 1. Microbenchmarks
-
-Run on every relevant PR.
-
-Purpose: catch parser, binder, evaluator, and completion hot-path regressions quickly.
-
-Example cases:
-
-```text
-micro/parser_large_file
-micro/tokenizer_comments_strings
-micro/binder_many_imports
-micro/union_expansion
-micro/large_union_narrowing
-micro/overload_many_candidates
-micro/overload_union_cross_product
-micro/protocol_many_members_match
-micro/protocol_many_members_mismatch
-micro/recursive_protocol
-micro/typed_dict_many_keys
-micro/typevar_constraint_matrix
-micro/deep_generic_alias_chain
-micro/literal_union_math
-micro/completion_list_building
-```
-
-Metrics:
-
-```text
-elapsedMs
-parseMs
-bindMs
-checkMs
-tokens/sec
-filesParsed
-filesBound
-filesChecked
-AST node count
-symbol count
-type cache hits/misses
-heapUsedMb
-```
-
-Use synthetic generators rather than committing giant hand-written Python files.
-
-Example generator targets:
-
-```text
-generateLargeUnionNarrowingCase(10, 50, 100, 250)
-generateManyOverloadsCase(10, 50, 100, 500)
-generateProtocolCase(members=50, match=true/false)
-generateLargeTypedDictCase(keys=100, 500, 1000)
-generateImportGraphCase(files=100, 1000)
-generateRecursiveAliasCase(depth=16, 32, 64, 128)
-```
-
----
-
-### 2. Ecosystem Smoke Benchmarks
-
-Run on most PRs that touch parser, binder, evaluator, import resolver, typeshed, or diagnostics.
-
-Use a curated subset of `mypy_primer` projects.
-
-Suggested smoke suite:
-
-```text
-black
-pytest
-attrs
-pydantic
-python-chess
-packaging
-rich
-mypy_primer
-django-modern-rest
-pandas
-```
-
-Reasoning:
-
-```text
-black:
-  Parser-heavy, practical codebase.
-
-pytest:
-  Large, dynamic Python codebase.
-
-attrs:
-  Dataclass-like patterns and decorators.
-
-pydantic:
-  Decorators, generics, validation model patterns.
-
-python-chess:
-  Relatively clean expected-success signal.
-
-packaging:
-  Small stable baseline.
-
-rich:
-  Practical typed library with meaningful structure.
-
-mypy_primer:
-  Typed tool codebase.
-
-django-modern-rest:
-  Web, Django-ish, pydantic-ish patterns.
-
-pandas:
-  Data-science, stubs-heavy, overload-heavy.
-```
-
-Target runtime: under 10–15 minutes.
-
-Metrics:
-
-```text
-diagnostic diff
-total runtime
-parse/bind/check/import resolver timings
-files analyzed
-memory usage
-phase-level deltas
-```
-
----
-
-### 3. Full Ecosystem Benchmarks
-
-Run nightly, manually, and on risky PRs.
-
-Use all `mypy_primer` projects that support Pyright via `pyright_cmd`.
-
-Use sharding:
-
-```yaml
-strategy:
-  matrix:
-    shard-index: [0, 1, 2, 3, 4, 5, 6, 7]
-```
-
-Inputs:
-
-```text
---suite full
---num-shards 8
---shard-index N
---project-date YYYY-MM-DD
-```
-
-The full run should compare:
-
-```text
-base commit vs head commit
-old diagnostics vs new diagnostics
-old metrics vs new metrics
-old phase timings vs new phase timings
-```
-
----
-
-### 4. Ty-Style Benchmarks
-
-Ty tracks more than one mode. Pyright should mirror the same broad categories:
-
-```text
-cold check:
-  Type-check a project from scratch.
-
-warm check:
-  Re-check with caches already populated.
-
-time to first diagnostic:
-  Start a language-server-like session and measure first diagnostics.
-
-incremental re-check:
-  Simulate an edit and measure diagnostics recomputation.
-```
-
-Benchmark operations:
-
-```text
-cold[project]
-warm[project]
-first_diagnostic[project]
-incremental[edit_private_function_body]
-incremental[edit_public_function_signature]
-incremental[edit_imported_symbol]
-incremental[edit_protocol_member]
-incremental[edit_type_alias]
-incremental[edit_pyproject_config]
-```
-
-Track:
-
-```text
-elapsedMs
-files invalidated
-files reparsed
-files rebound
-files rechecked
-diagnostics recomputed
-cache hits/misses
-memory before/after
-```
-
----
-
-### 5. Pylance/LSP Benchmarks
-
-CLI type checking does not exercise all user-visible performance paths. Add a dedicated LSP harness.
-
-Operations:
-
-```text
-lsp/open_workspace
-lsp/first_diagnostics
-lsp/completion_after_dot
-lsp/completion_import_statement
-lsp/completion_auto_imports_small
-lsp/completion_auto_imports_large
-lsp/hover_generic_call
-lsp/go_to_definition
-lsp/find_references
-lsp/rename_symbol
-lsp/document_symbols
-lsp/workspace_symbols
-lsp/semantic_tokens_large_file
-```
-
-Metrics:
-
-```text
-request latency p50/p95
-items produced
-items filtered
-auto-import candidates scanned
-sort/filter time
-symbol index lookup time
-diagnostics latency
-semantic token count
-heap before/after
-```
-
-Useful LSP stress workspaces:
-
-```text
-large venv
-pandas-like project
-django-like project
-repo with many exports
-repo with many same-named symbols
-repo with deep import graph
-```
-
----
-
-## Evaluator Heuristics Tuning
-
-This should be a first-class goal.
-
-Pyright has many evaluator heuristics and bailout thresholds. The benchmark suite should allow safe experimentation with:
-
-```text
-recursion limits
-union expansion limits
-overload candidate pruning
-protocol matching depth
-recursive type alias expansion
-speculative evaluation limits
-constraint solver bailout thresholds
-literal math / enum expansion thresholds
-TypedDict key analysis limits
-call-site cache eviction thresholds
-type cache sizing
-```
-
-The benchmark suite should answer:
-
-```text
-Can we lower or raise this limit?
-Does it improve performance?
-Does it change diagnostics?
-Does it reduce worst-case cliffs?
-Which real projects are affected?
-```
-
----
-
-## Evaluator Heuristic Sweeps
-
-Add a dedicated benchmark category:
-
-```text
-packages/pyright-internal/benchmarks/evaluatorHeuristics/
-  heuristicMatrix.json
-  runHeuristicSweep.ts
-  renderHeuristicReport.ts
-  cases/
-    recursiveAlias.ts
-    deepGenericAlias.ts
-    overloadUnionExpansion.ts
-    protocolRecursive.ts
-    constrainedTypeVarExplosion.ts
-    typedDictHugeKeySet.ts
-```
-
-Example `heuristicMatrix.json`:
-
-```json
-{
-  "recursionDepthLimit": [16, 32, 64, 128],
-  "unionExpansionLimit": [16, 32, 64, 128],
-  "overloadCandidateLimit": [32, 64, 128, 256],
-  "protocolMatchDepthLimit": [8, 16, 32, 64],
-  "typeAliasExpansionLimit": [16, 32, 64, 128],
-  "speculativeEvalLimit": [64, 128, 256, 512]
-}
-```
-
-Example command:
-
-```bash
-node runHeuristicSweep.js   --project pandas   --heuristic unionExpansionLimit   --values 16,32,64,128
-```
-
-Possible hidden/test-only override mechanism:
-
-```bash
-PYRIGHT_PERF_UNION_EXPANSION_LIMIT=32
-PYRIGHT_PERF_RECURSION_DEPTH_LIMIT=64
-PYRIGHT_PERF_PROTOCOL_DEPTH_LIMIT=16
-```
-
-Or a test-only config object:
-
-```ts
-const options = {
-  typeCheckingMode: "strict",
-  perfOptions: {
-    evaluatorHeuristics: {
-      unionExpansionLimit: 32,
-      recursionDepthLimit: 64,
-      protocolMatchDepthLimit: 16
-    }
-  }
-};
-```
-
----
-
-## Heuristic Instrumentation
-
-Add optional counters for when heuristics trigger:
-
-```text
-recursionLimitHitCount
-unionExpansionLimitHitCount
-overloadPrunedCandidateCount
-protocolDepthLimitHitCount
-typeAliasExpansionLimitHitCount
-speculativeEvalLimitHitCount
-constraintSolverBailoutCount
-maxTypeEvalRecursionDepth
-maxUnionExpansionSize
-maxProtocolMatchDepth
-maxOverloadCandidateCount
-```
-
-Example raw result:
-
-```json
-{
-  "case": "recursive_alias_depth_64",
-  "heuristic": "recursionDepthLimit",
-  "value": 32,
-  "diagnosticCount": 2,
-  "diagnosticDiff": false,
-  "elapsedMs": 84,
-  "checkMs": 72,
-  "bailoutCount": 1,
-  "maxObservedDepth": 31,
-  "cacheHitRate": 0.82
-}
-```
-
-Useful interpretation:
-
-```text
-pandas:
-  checkMs: +2.1%
-  overloadPrunedCandidateCount: 0
-  recursionLimitHitCount: 0
-
-pydantic:
-  checkMs: -14.8%
-  speculativeEvalLimitHitCount: +120
-  diagnosticDiff: false
-```
-
-That tells reviewers whether a heuristic helped safely.
-
----
-
-## Synthetic Cliff Tests
-
-Add synthetic cases that intentionally hit worst-case evaluator paths.
-
-```text
-synthetic[recursive_alias_depth][16,32,64,128]
-synthetic[overload_union_cross_product][4x4,8x8,16x16]
-synthetic[protocol_recursive_members][8,16,32]
-synthetic[generic_alias_chain][16,32,64,128]
-synthetic[constrained_typevar_matrix][4,8,16]
-synthetic[literal_union_math][32,64,128,256]
-synthetic[typed_dict_key_count][100,500,1000]
-```
-
-Goal: reveal complexity cliffs.
-
-Example output:
-
-```text
-recursive_alias_depth:
-  depth=16    8ms
-  depth=32   21ms
-  depth=64   98ms
-  depth=128  1100ms  ⚠️ cliff
-```
-
----
-
-## Real-Project Heuristic Targets
-
-Run heuristic sweeps against selected ecosystem projects.
-
-```text
-pandas:
-  overloads, stubs, data-science
-
-pydantic:
-  decorators, generics, dataclass-like transforms
-
-attrs:
-  dataclass-like, protocols
-
-sqlalchemy:
-  generics, overloads, ORM patterns
-
-xarray:
-  pandas/numpy typing, overloads
-
-jax:
-  numpy-style typing, generics
-
-pytest:
-  dynamic patterns, plugins
-
-django-modern-rest:
-  pydantic + web + serializers
-
-mypy_primer:
-  typed codebase, real tool
-```
-
-For each heuristic experiment, require:
-
-```text
-no unexpected diagnostic diff
-no new crashes
-no large increase in Unknown/Any if tracked
-performance improvement or reduced worst-case cliff
-```
-
----
-
-## Heuristic Decision Report
-
-Each heuristic sweep should produce a recommendation document.
-
-Example:
-
-```md
-# Heuristic sweep: unionExpansionLimit
-
-## Recommendation
-
-Keep default at 64.
-
-## Why
-
-- 32 improves worst-case synthetic benchmarks by 18–40%.
-- But 32 causes diagnostic diffs in pandas and xarray.
-- 64 avoids diffs and still prevents 128-depth explosion.
-- 128 gives no useful real-project benefit and increases check time in overload-heavy cases.
-
-## Results
-
-| Project | 32 | 64 | 128 | Diagnostic diff |
-|---|---:|---:|---:|---|
-| pandas | 41.2s | 44.0s | 46.7s | yes at 32 |
-| pydantic | 12.1s | 12.4s | 12.8s | no |
-| xarray | 31.4s | 33.0s | 36.5s | yes at 32 |
-```
-
-This turns heuristic tuning into an evidence-based process.
-
----
-
-## Project Tagging
-
-Add Pyright-specific tags on top of the `mypy_primer` manifest.
-
-Example `ecosystem-projects.overrides.json`:
-
-```json
-{
-  "pandas": {
-    "tags": ["large", "data-science", "numpy", "overloads", "stubs-heavy"]
-  },
-  "jax": {
-    "tags": ["large", "ml", "numpy", "generics", "overloads"]
-  },
-  "pydantic": {
-    "tags": ["decorators", "dataclass-like", "generics"]
-  },
-  "attrs": {
-    "tags": ["dataclass-like", "stubs", "protocols"]
-  },
-  "pytest": {
-    "tags": ["dynamic", "plugins", "large-tests"]
-  },
-  "django-modern-rest": {
-    "tags": ["django", "pydantic", "web"]
-  },
-  "sqlalchemy": {
-    "tags": ["orm", "generics", "overloads"]
-  },
-  "xarray": {
-    "tags": ["data-science", "pandas", "numpy", "large"]
-  }
-}
-```
-
-Commands:
-
-```bash
-node runEcosystemBenchmark.js --tag overloads
-node runEcosystemBenchmark.js --tag parser-heavy
-node runEcosystemBenchmark.js --tag data-science
-node runEcosystemBenchmark.js --tag decorators
-node runEcosystemBenchmark.js --tag completion-heavy
-```
-
-This lets a parser PR run parser-heavy projects, while an overload PR runs overload-heavy projects.
-
----
-
-## Metrics Model
-
-Every benchmark should emit structured JSON.
-
-Example:
-
-```json
-{
-  "benchmark": "cold[pandas]",
-  "suite": "ecosystem-smoke",
-  "project": "pandas",
-  "commit": "abc123",
-  "totalMs": 123456,
-  "parseMs": 1234,
-  "bindMs": 2345,
-  "checkMs": 100000,
-  "importResolverMs": 3456,
-  "typeshedLoadMs": 789,
-  "filesParsed": 1234,
-  "filesBound": 1234,
-  "filesChecked": 1200,
-  "sourceLines": 500000,
-  "tokenCount": 8000000,
-  "astNodeCount": 3000000,
-  "symbolCount": 400000,
-  "typeCacheHits": 123456,
-  "typeCacheMisses": 12345,
-  "overloadResolutionCount": 9876,
-  "unionExpansionCount": 1234,
-  "speculativeEvalCount": 2222,
-  "heuristicCounters": {
-    "recursionLimitHitCount": 0,
-    "unionExpansionLimitHitCount": 12,
-    "overloadPrunedCandidateCount": 300
-  },
-  "diagnosticCount": 42,
-  "heapUsedMb": 512
-}
-```
-
----
-
-## Comparison Output
-
-Generate:
-
-```text
-old.json
-new.json
-comparison.json
-comparison.md
-```
-
-### Checked-In Main Baseline
-
-PR comparisons need a stable baseline before the workflow can reliably fetch prior CI artifacts. Add a checked-in smoke
-baseline generated from a known `main` commit and use it as the default `old.json` input for PR smoke comparisons.
-
-Proposed layout:
-
-```text
-packages/pyright-internal/src/tests/benchmarks/baselines/
-  ecosystem-smoke-main.json
-  README.md
-```
-
-Baseline policy:
-
-- The checked-in baseline is generated only from `main` or an explicitly recorded main-branch commit.
-- The baseline report records the Pyright commit SHA, project snapshot date, benchmark suite, selected projects, Node and
-  Python versions, platform, and generated config mode.
-- PR benchmark runs generate `new.json` and compare it against `baselines/ecosystem-smoke-main.json` unless a fresher CI
-  artifact is supplied explicitly.
-- Updating the checked-in baseline should be a deliberate maintenance action after benchmark harness changes, project
-  snapshot refreshes, or accepted performance/diagnostic shifts on `main`.
-- The baseline should remain small and smoke-suite scoped. Full ecosystem and noisy exploratory runs should stay as CI
-  artifacts, not checked-in repository data.
-
-Near-term bootstrap command shape:
-
-```bash
-npm run build:cli:dev
-cd packages/pyright-internal
-npm run build
-npm run bench:ecosystem:run:local -- --suite smoke --project-root q:/path/to/main-checkouts --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-main
-```
-
-The implementation should add a runner option or script that copies the generated `baseline-report.json` into the
-checked-in baseline path and stamps it with the source commit. PR comparison mode should accept that baseline path without
-requiring the developer or workflow to manually rename files.
-
-Example Markdown report:
-
-```md
-# Pyright Ecosystem Benchmark
-
-Base: abc123
-Head: def456
-
-## Summary
-
-| Metric | Old | New | Delta |
-|---|---:|---:|---:|
-| Total time | 322.4s | 309.8s | -3.9% |
-| Parse time | 24.1s | 17.2s | -28.6% |
-| Bind time | 31.0s | 31.5s | +1.6% |
-| Check time | 250.7s | 247.9s | -1.1% |
-
-## Largest Regressions
-
-| Project | Old | New | Delta | Phase |
-|---|---:|---:|---:|---|
-| pandas | 58.2s | 63.1s | +8.4% | check |
-| jax | 41.0s | 43.7s | +6.6% | import resolver |
-
-## Largest Wins
-
-| Project | Old | New | Delta | Phase |
-|---|---:|---:|---:|---|
-| black | 11.2s | 8.0s | -28.6% | parse |
-```
-
----
-
-## Regression Thresholds
-
-Use both percent and absolute thresholds.
-
-Example:
-
-```json
-{
-  "failOnDiagnosticsDiff": true,
-  "warnTotalRegressionPct": 5,
-  "failTotalRegressionPct": 10,
-  "warnProjectRegressionPct": 10,
-  "failProjectRegressionPct": 20,
-  "minAbsoluteRegressionMs": 3000
-}
-```
-
-Reason: tiny projects can produce noisy percentage swings.
-
----
-
-## Project-Date Pinning
-
-Use a pinned project date for ecosystem stability.
-
-Example:
-
-```bash
-mypy_primer --type-checker pyright --project-date 2026-01-01
-```
-
-Store in the benchmark config:
-
-```json
-{
-  "projectDate": "2026-01-01"
-}
-```
-
-Update the date intentionally, maybe monthly, not accidentally on every run.
-
----
-
-## File Layout
-
-```text
-packages/pyright-internal/
-  src/tests/benchmarks/
-    README.md
-
-    micro/
-      runMicroBenchmarks.ts
-      cases/
-        parserLargeFile.ts
-        tokenizerStrings.ts
-        overloadCache.ts
-        unionExpansion.ts
-        recursiveAlias.ts
-        protocolMatching.ts
-        typedDictHuge.ts
-
-    ecosystem/
-      ecosystem-projects.generated.json
-      ecosystem-projects.overrides.json
-      syncMypyPrimerProjects.ts
-      runEcosystemBenchmark.ts
-      compareBenchmarkResults.ts
-      renderMarkdownReport.ts
-      projectTags.ts
-
-    lsp/
-      runLspBenchmarks.ts
-      lspPerfHarness.ts
-      scenarios/
-        completionLargeModule.json
-        completionAutoImports.json
-        hoverLargeUnion.json
-        semanticTokensLargeFile.json
-        findReferencesLargeWorkspace.json
-
-    evaluatorHeuristics/
-      heuristicMatrix.json
-      runHeuristicSweep.ts
-      renderHeuristicReport.ts
-      cases/
-        recursiveAlias.ts
-        deepGenericAlias.ts
-        overloadUnionExpansion.ts
-        protocolRecursive.ts
-        constrainedTypeVarExplosion.ts
-        typedDictHugeKeySet.ts
-
-    artifacts/
-      .gitignore
-```
-
----
-
-## CI Workflows
-
-### PR Smoke Benchmark
-
-```yaml
-name: Pyright ecosystem smoke benchmark
-
-on:
-  pull_request:
-    paths:
-      - 'packages/pyright/**'
-      - 'packages/pyright-internal/src/**'
-      - 'packages/pyright-internal/typeshed-fallback/**'
-
-jobs:
-  ecosystem-smoke:
-    runs-on: ubuntu-latest
-
-    steps:
-      - uses: actions/checkout@v4
-
-      - uses: actions/setup-node@v4
-
-      - uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - run: npm ci
-      - run: npm run build
-
-      - run: python -m pip install -U pip
-      - run: pip install git+https://github.com/hauntsaninja/mypy_primer.git
-
-      - name: Run smoke ecosystem benchmark
-        run: |
-          node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js \
-            --suite smoke \
-            --base origin/${{ github.base_ref }} \
-            --head ${{ github.sha }} \
-            --project-date 2026-01-01 \
-            --output artifacts/ecosystem-smoke
-
-      - uses: actions/upload-artifact@v4
-        with:
-          name: pyright-ecosystem-smoke
-          path: artifacts/ecosystem-smoke
-```
-
-### Nightly Full Benchmark
-
-```yaml
-name: Pyright ecosystem full benchmark
-
-on:
-  schedule:
-    - cron: '0 8 * * *'
-  workflow_dispatch:
-
-jobs:
-  full:
-    strategy:
-      fail-fast: false
-      matrix:
-        shard-index: [0,1,2,3,4,5,6,7]
-
-    runs-on: ubuntu-latest
-
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-node@v4
-      - uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - run: npm ci
-      - run: npm run build
-      - run: python -m pip install -U pip
-      - run: pip install git+https://github.com/hauntsaninja/mypy_primer.git
-
-      - run: |
-          node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js \
-            --suite full \
-            --num-shards 8 \
-            --shard-index ${{ matrix.shard-index }} \
-            --project-date 2026-01-01 \
-            --output artifacts/full-${{ matrix.shard-index }}
-```
-
-### Manual Targeted Benchmark
-
-```yaml
-on:
-  workflow_dispatch:
-    inputs:
-      tag:
-        description: 'Project tag: overloads, parser-heavy, data-science, decorators'
-        required: false
-      project:
-        description: 'Specific project regex'
-        required: false
-      heuristic:
-        description: 'Optional heuristic sweep name'
-        required: false
-```
-
----
-
-## Local Developer Commands
-
-Add scripts:
-
-```json
-{
-  "scripts": {
-    "bench:micro": "node packages/pyright-internal/benchmarks/micro/runMicroBenchmarks.js",
-    "bench:ecosystem:smoke": "node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js --suite smoke",
-    "bench:ecosystem:full": "node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js --suite full",
-    "bench:ecosystem:tag": "node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js --tag",
-    "bench:lsp": "node packages/pyright-internal/benchmarks/lsp/runLspBenchmarks.js",
-    "bench:heuristics": "node packages/pyright-internal/benchmarks/evaluatorHeuristics/runHeuristicSweep.js"
-  }
-}
-```
-
-Example usage:
-
-```bash
-npm run bench:micro
-npm run bench:ecosystem:smoke
-npm run bench:ecosystem:tag -- overloads
-npm run bench:lsp
-npm run bench:heuristics -- --heuristic recursionDepthLimit --values 16,32,64,128
-```
-
----
-
-## CodSpeed Integration
-
-Use CodSpeed for Tier 0 and selected stable microbenchmarks.
-
-Status update:
-
-- Initial CodSpeed setup already exists in an external PR in `bschnurr/pyright`.
-- The next step in this repo is to wire the stable microbenchmark subset into that setup once the local benchmark entry
-  points match the expected runner shape.
-
-Good candidates:
-
-```text
-parser large file
-tokenizer comments/strings
-union expansion
-overload many candidates
-protocol mismatch
-typed dict many keys
-completion list building
-```
-
-Do not start by putting all ecosystem benchmarks into CodSpeed. Use CodSpeed for stable, smaller, lower-noise cases. Use the ecosystem runner for heavier PR and nightly artifacts.
-
----
-
-## Optimization Use Cases
-
-### Parser/tokenizer rewrite
-
-Expected wins:
-
-```text
-parseMs lower
-token/sec higher
-totalMs lower on parser-heavy projects
-no diagnostic diff
-```
-
-Stress projects:
-
-```text
-black
-mypy
-pytest
-pandas
-sphinx-like docs projects
-```
-
-### Import resolver/cache changes
-
-Expected wins:
-
-```text
-importResolverMs lower
-typeshedLoadMs lower
-fewer filesystem stats
-fewer repeated module resolutions
-```
-
-Stress projects:
-
-```text
-pandas
-xarray
-jax
-scikit-learn
-django-style projects
-large venv workspace
-```
-
-### Overload resolution optimization
-
-Expected wins:
-
-```text
-checkMs lower
-overloadResolutionCount same or lower
-cache hit rate higher
-diagnostics unchanged
-```
-
-Stress projects:
-
-```text
-pandas
-jax
-xarray
-pydantic
-sqlalchemy
-numpy/scipy-stubs if included
-```
-
-### Evaluator heuristic tuning
-
-Expected wins:
-
-```text
-reduced worst-case cliffs
-fewer runaway expansions
-diagnostics unchanged
-bounded cache/memory growth
-```
-
-Stress cases:
-
-```text
-recursive aliases
-deep generic aliases
-union cross products
-large overload sets
-recursive protocols
-constrained TypeVar matrices
-```
-
-### Completion list building
-
-Expected wins:
-
-```text
-completion latency lower
-auto-import scan time lower
-sort/filter time lower
-items unchanged or intentionally improved
-```
-
-Stress workspaces:
-
-```text
-large venv
-pandas project
-django project
-repo with many exports
-```
-
-### Typeshed/stub changes
-
-Expected wins:
-
-```text
-diagnostic diffs explainable
-typeshedLoadMs stable
-checkMs stable
-Unknown/Any regressions detected if tracked
-```
-
-Stress projects:
-
-```text
-pandas
-requests users
-django-stubs users
-numpy/scipy-stubs users
-pydantic users
-```
-
----
-
-## MVP Implementation
-
-First useful version:
-
-1. [x] Add benchmark directory layout.
-2. [~] Add `syncMypyPrimerProjects.ts`.
-  - [x] Parse the checked-in smoke snapshot into generated metadata.
-  - [x] Write generated metadata from the built sync script back to the source tree.
-  - [x] Remove machine-local absolute `inputFile` paths from checked-in generated metadata.
-  - [x] Support full upstream `mypy_primer/projects.py` sync fields: `name_override`, `pyright_cmd=None`, deps,
-    install command, supported platforms, cost, and duplicate-location entries.
-3. [x] Generate `ecosystem-projects.generated.json`.
-4. [x] Add `ecosystem-projects.overrides.json`.
-  - [x] Add smoke-suite source root overrides for upstream entries that omit `paths`.
-5. [x] Add a smoke suite of 8–10 projects.
-6. [~] Add `runEcosystemBenchmark.ts`.
-  - [x] Parse smoke-suite selection inputs (`--suite`, `--tag`, `--project`, `--num-shards`, `--shard-index`, `--output`).
-  - [x] Write a selection manifest artifact for the resolved project set.
-  - [x] Compare existing ecosystem benchmark reports into `old.json`, `new.json`, `comparison.json`, and `comparison.md`.
-  - [x] Execute selected local project checkouts with provided baseline/candidate Pyright commands.
-  - [x] Generate per-project `pyrightconfig.json` files with config-relative source roots.
-  - [x] Add a packaged-CLI local run path for realistic local execution.
-  - [x] Prepare selected project checkouts with `--prepare-projects`.
-  - [x] Honor `--project-date` during checkout preparation.
-  - [x] Install project dependencies/stubs according to synced metadata with `--install-dependencies`.
-  - [ ] Run base vs head Pyright for the selected projects from synchronized `mypy_primer` checkouts.
-  - [x] Resolve the smoke suite from generated project metadata plus local overrides.
-  - [x] Preserve or deliberately merge project-level Pyright configuration instead of blindly replacing it.
-  - [x] Extend project-level `pyrightconfig.json` files when they exist.
-  - [x] Merge `[tool.pyright]` settings from `pyproject.toml` when no `pyrightconfig.json` exists, while preserving the
-    benchmark-owned include/exclude scope.
-  - [x] Include process status, stderr, and command details when Pyright does not emit JSON.
-7. [~] Run base vs head Pyright.
-  - [x] Execute local baseline/candidate commands against preexisting project checkouts.
-  - [x] Prepare project checkouts automatically when `--prepare-projects` is provided.
-  - [x] Honor `--project-date` during checkout preparation.
-  - [x] Install project dependencies/stubs according to synced metadata when `--install-dependencies` is provided.
-  - [ ] Build and pass distinct base/head Pyright commands automatically in CI.
-8. [ ] Capture:
-   - [x] total runtime
-   - [x] files analyzed
-   - [x] diagnostic count
-   - [x] severity counts
-  - [x] diagnostic diff
-   - [ ] process memory
-9. [~] Generate:
-  - [x] `old.json`
-  - [x] `new.json`
-  - [x] `comparison.json`
-  - [x] `comparison.md`
-  - [~] Wire these artifacts into an actual ecosystem benchmark runner output.
-  - [x] Include diagnostic and analyzed-file metrics in ecosystem comparison artifacts.
-  - [x] Add diagnostic-diff sections to comparison artifacts once diagnostic identities are captured.
-10. [ ] Add checked-in main-branch smoke baseline.
-  - [x] Add `src/tests/benchmarks/baselines/README.md` documenting checked-in baseline policy.
-  - [ ] Add `src/tests/benchmarks/baselines/ecosystem-smoke-main.json`.
-  - [x] Stamp refreshed baselines with source commit SHA, project snapshot date, refresh timestamp, and config mode.
-  - [x] Add a script or runner option to update the checked-in baseline from a verified main-branch run.
-  - [x] Make PR comparison mode default to the checked-in baseline when no explicit baseline report is supplied.
-11. [~] Add GitHub workflow.
-  - [x] Add a manual workflow for smoke comparison and baseline refresh runs.
-  - [x] In manual compare mode, run smoke benchmarks as `new.json` and compare against the checked-in main baseline.
-  - [x] In manual refresh mode, run smoke benchmarks and upload the refreshed checked-in baseline candidate.
-  - [ ] Add automatic PR triggering once the checked-in main baseline exists.
-12. [ ] Add one heuristic sweep:
-   - `recursionDepthLimit` or `unionExpansionLimit`
-13. [x] Add two synthetic heuristic cases:
-  - [x] recursive alias depth
-  - [x] overload union cross product
-14. [ ] Add one heuristic report:
-   - `heuristic-recommendation.md`
-
-MVP smoke project list:
-
-```text
-black
-pytest
-attrs
-pydantic
-python-chess
-packaging
-rich
-mypy_primer
-django-modern-rest
-pandas
-```
-
----
-
-## Longer-Term Implementation Stages
-
-### Stage 1: Correctness + wall time
-
-Use `mypy_primer` project list. Compare old vs new Pyright output and total runtime.
-
-### Stage 2: Phase metrics
-
-Add Pyright benchmark JSON output with parse, bind, check, import resolver, typeshed, and memory metrics.
-
-### Stage 3: LSP metrics
-
-Add Pylance-style LSP operation benchmark harness.
-
-### Stage 4: Heuristic sweeps
-
-Add test-only evaluator heuristic overrides and sweep reports.
-
-### Stage 5: PR comments
-
-Post concise benchmark summaries on PRs.
-
-### Stage 6: CodSpeed
-
-Use CodSpeed for stable microbenchmarks and low-noise hot paths.
-
-### Stage 7: Nightly dashboards
-
-Track trends over time for full ecosystem and heuristic counters.
-
----
-
-## Final Design Principle
-
-Use `mypy_primer` as the ecosystem correctness corpus, but own the Pyright performance and heuristic story.
-
-`mypy_primer` answers:
-
-```text
-Did behavior change on real projects?
-```
-
-The Pyright benchmark harness answers:
-
-```text
-Why did performance change?
-Which phase changed?
-Which project pattern exposed it?
-Which evaluator heuristic setting is safe?
-What should reviewers do with this information?
-```
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index ac0558702a80..fae015806ec4 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -12,7 +12,7 @@ on:
 
 jobs:
   build:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
+    if: github.repository == 'microsoft/pyright'
     runs-on: ubuntu-latest
     name: Build
 
@@ -43,7 +43,7 @@ jobs:
           path: packages/vscode-pyright/${{ env.VSIX_NAME }}
 
   create_release:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
+    if: github.repository == 'microsoft/pyright'
     runs-on: ubuntu-latest
     name: Create release
     needs: [build]
diff --git a/.github/workflows/mypy_primer_comment.yaml b/.github/workflows/mypy_primer_comment.yaml
index aa90dc84dac9..d9754a76d824 100644
--- a/.github/workflows/mypy_primer_comment.yaml
+++ b/.github/workflows/mypy_primer_comment.yaml
@@ -18,7 +18,7 @@ jobs:
   comment:
     name: Comment PR from mypy_primer
     runs-on: ubuntu-latest
-    if: ${{ (github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright') && github.event.workflow_run.conclusion == 'success' }}
+    if: ${{ github.event.workflow_run.conclusion == 'success' }}
     steps:
       - name: Download diffs
         uses: actions/github-script@v7
diff --git a/.github/workflows/mypy_primer_pr.yaml b/.github/workflows/mypy_primer_pr.yaml
index fc16feb937f3..5b1d4eb1d0c0 100644
--- a/.github/workflows/mypy_primer_pr.yaml
+++ b/.github/workflows/mypy_primer_pr.yaml
@@ -29,7 +29,6 @@ concurrency:
 
 jobs:
   mypy_primer:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
     name: Run mypy_primer on PR
     runs-on: ubuntu-latest
     permissions:
diff --git a/.github/workflows/mypy_primer_push.yaml b/.github/workflows/mypy_primer_push.yaml
index 08c191a3f157..db1c9270e479 100644
--- a/.github/workflows/mypy_primer_push.yaml
+++ b/.github/workflows/mypy_primer_push.yaml
@@ -22,7 +22,6 @@ concurrency:
 
 jobs:
   mypy_primer:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
     name: Run mypy_primer on push
     runs-on: ubuntu-latest
     permissions:
diff --git a/.github/workflows/pyright_ecosystem_benchmark.yaml b/.github/workflows/pyright_ecosystem_benchmark.yaml
deleted file mode 100644
index 5d2f40aa8e44..000000000000
--- a/.github/workflows/pyright_ecosystem_benchmark.yaml
+++ /dev/null
@@ -1,130 +0,0 @@
-name: Pyright ecosystem benchmark
-
-on:
-  workflow_dispatch:
-    inputs:
-      mode:
-        description: 'Run mode'
-        required: true
-        default: compare
-        type: choice
-        options:
-          - compare
-          - refresh-baseline
-      project_date:
-        description: 'Project checkout date passed to the ecosystem runner'
-        required: true
-        default: '2026-01-01'
-        type: string
-      project:
-        description: 'Optional project name regex filter'
-        required: false
-        default: ''
-        type: string
-      install_dependencies:
-        description: 'Install synced ecosystem project dependencies before running Pyright'
-        required: false
-        default: false
-        type: boolean
-
-concurrency:
-  group: ${{ github.workflow }}-${{ github.ref }}-${{ inputs.mode }}
-  cancel-in-progress: true
-
-jobs:
-  ecosystem-smoke:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
-    name: Ecosystem smoke benchmark
-    runs-on: ubuntu-latest
-    permissions:
-      contents: read
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-
-      - uses: actions/setup-node@v4
-        with:
-          node-version: '20'
-
-      - uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install dependencies
-        run: npm ci
-
-      - name: Build Pyright CLI and benchmark runner
-        run: |
-          npm run build:cli:dev
-          cd packages/pyright-internal
-          npm run build
-          npm run bench:ecosystem:sync
-
-      - name: Run ecosystem smoke comparison
-        if: ${{ inputs.mode == 'compare' }}
-        shell: bash
-        run: |
-          set -euo pipefail
-          cd packages/pyright-internal
-
-          baseline_path="./src/tests/benchmarks/baselines/ecosystem-smoke-main.json"
-          if [[ ! -f "$baseline_path" ]]; then
-            echo "Missing checked-in ecosystem smoke baseline at $baseline_path" >&2
-            exit 1
-          fi
-
-          args=(
-            --suite smoke
-            --project-root "$GITHUB_WORKSPACE/.ecosystem-projects"
-            --prepare-projects
-            --project-date "${{ inputs.project_date }}"
-            --candidate-executable "node ../pyright/index.js"
-            --main-baseline-report "$baseline_path"
-            --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-pr
-          )
-
-          if [[ -n "${{ inputs.project }}" ]]; then
-            args+=(--project "${{ inputs.project }}")
-          fi
-
-          if [[ "${{ inputs.install_dependencies }}" == "true" ]]; then
-            args+=(--install-dependencies)
-          fi
-
-          npm run bench:ecosystem:run -- "${args[@]}"
-
-      - name: Refresh ecosystem smoke baseline
-        if: ${{ inputs.mode == 'refresh-baseline' }}
-        shell: bash
-        run: |
-          set -euo pipefail
-          cd packages/pyright-internal
-
-          args=(
-            --suite smoke
-            --project-root "$GITHUB_WORKSPACE/.ecosystem-projects"
-            --prepare-projects
-            --project-date "${{ inputs.project_date }}"
-            --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-main
-            --baseline-source-commit "$GITHUB_SHA"
-          )
-
-          if [[ -n "${{ inputs.project }}" ]]; then
-            args+=(--project "${{ inputs.project }}")
-          fi
-
-          if [[ "${{ inputs.install_dependencies }}" == "true" ]]; then
-            args+=(--install-dependencies)
-          fi
-
-          npm run bench:ecosystem:update-main-baseline -- "${args[@]}"
-
-      - name: Upload ecosystem benchmark artifacts
-        uses: actions/upload-artifact@v4
-        with:
-          name: pyright-ecosystem-benchmark-${{ inputs.mode }}
-          path: |
-            packages/pyright-internal/src/tests/benchmarks/.generated/benchmark-results/
-            packages/pyright-internal/src/tests/benchmarks/baselines/ecosystem-smoke-main.json
-          if-no-files-found: warn
diff --git a/.github/workflows/validation.yml b/.github/workflows/validation.yml
index 5368eb2bb58e..7273dcd6ff48 100644
--- a/.github/workflows/validation.yml
+++ b/.github/workflows/validation.yml
@@ -14,7 +14,7 @@ on:
 
 jobs:
   typecheck:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
+    if: github.repository == 'microsoft/pyright'
     runs-on: ubuntu-latest
     name: Typecheck
 
@@ -34,7 +34,7 @@ jobs:
       - run: npx lerna exec --stream --no-bail -- tsc --noEmit
 
   style:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
+    if: github.repository == 'microsoft/pyright'
     runs-on: ubuntu-latest
     name: Style
 
@@ -57,7 +57,6 @@ jobs:
       - run: npm run check
 
   test:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
     strategy:
       fail-fast: false
       matrix:
@@ -127,7 +126,6 @@ jobs:
         working-directory: packages/pyright-internal
 
   build:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
     runs-on: ubuntu-latest
     name: Build
     needs: typecheck
@@ -153,7 +151,6 @@ jobs:
         working-directory: packages/vscode-pyright
 
   required:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
     runs-on: ubuntu-latest
     name: Required
     needs:
diff --git a/packages/pyright-internal/package.json b/packages/pyright-internal/package.json
index d218133453f6..ba66f68d1894 100644
--- a/packages/pyright-internal/package.json
+++ b/packages/pyright-internal/package.json
@@ -1,63 +1,58 @@
-{
-    "name": "pyright-internal",
-    "displayName": "pyright",
-    "description": "Type checker for the Python language",
-    "version": "1.1.409",
-    "license": "MIT",
-    "private": true,
-    "files": [
-        "dist"
-    ],
-    "scripts": {
-        "build": "tsc",
-        "clean": "shx rm -rf ./dist ./out",
-        "webpack:testserver": "rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development",
-        "webpack:testserver:watch": "npm run clean && rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development --watch",
-        "test": "npm run webpack:testserver && node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks",
-        "test:norebuild": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks",
-        "test:benchmark": "cross-env PYRIGHT_RUN_BENCHMARKS=1 node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testTimeout=300000 --runInBand --detectOpenHandles src/tests/benchmarks",
-        "bench:ecosystem:run": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js",
-        "bench:ecosystem:run:local": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js --baseline-executable \"node ../pyright/index.js\" --candidate-executable \"node ../pyright/index.js\"",
-        "bench:ecosystem:update-main-baseline": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js --baseline-executable \"node ../pyright/index.js\" --update-main-baseline",
-        "bench:ecosystem:sync": "node ./out/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.js",
-        "bench:ecosystem:smoke": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js --suite smoke --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-run",
-        "test:coverage": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks --reporters=jest-junit --reporters=default --coverage --coverageReporters=cobertura --coverageReporters=html --coverageReporters=json",
-        "test:imports": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest importResolver.test --forceExit --runInBand"
-    },
-    "dependencies": {
-        "@yarnpkg/fslib": "2.10.4",
-        "@yarnpkg/libzip": "2.3.0",
-        "chalk": "^4.1.2",
-        "chokidar": "^3.6.0",
-        "command-line-args": "^5.2.1",
-        "jsonc-parser": "^3.3.1",
-        "smol-toml": "^1.6.1",
-        "source-map-support": "^0.5.21",
-        "tmp": "^0.2.5",
-        "vscode-jsonrpc": "^9.0.0-next.8",
-        "vscode-languageserver": "^10.0.0-next.13",
-        "vscode-languageserver-protocol": "^3.17.6-next.13",
-        "vscode-languageserver-textdocument": "^1.0.11",
-        "vscode-languageserver-types": "^3.17.6-next.6",
-        "vscode-uri": "^3.1.0"
-    },
-    "devDependencies": {
-        "@rspack/cli": "^1.7.2",
-        "@rspack/core": "^1.7.2",
-        "@types/command-line-args": "^5.2.3",
-        "@types/fs-extra": "^11.0.4",
-        "@types/jest": "^30.0.0",
-        "@types/lodash": "^4.17.23",
-        "@types/node": "^22.19.6",
-        "@types/tmp": "^0.2.6",
-        "esbuild-loader": "^4.4.2",
-        "jest": "^30.2.0",
-        "jest-junit": "^16.0.0",
-        "shx": "^0.4.0",
-        "ts-jest": "^29.4.6",
-        "ts-loader": "^9.5.4",
-        "typescript": "~5.5.4",
-        "webpack": "^5.104.1",
-        "word-wrap": "1.2.5"
-    }
-}
+{
+    "name": "pyright-internal",
+    "displayName": "pyright",
+    "description": "Type checker for the Python language",
+    "version": "1.1.409",
+    "license": "MIT",
+    "private": true,
+    "files": [
+        "dist"
+    ],
+    "scripts": {
+        "build": "tsc",
+        "clean": "shx rm -rf ./dist ./out",
+        "webpack:testserver": "rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development",
+        "webpack:testserver:watch": "npm run clean && rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development --watch",
+        "test": "npm run webpack:testserver && node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks",
+        "test:norebuild": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks",
+        "test:benchmark": "cross-env PYRIGHT_RUN_BENCHMARKS=1 node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testTimeout=300000 --runInBand --detectOpenHandles src/tests/benchmarks",
+        "test:coverage": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks --reporters=jest-junit --reporters=default --coverage --coverageReporters=cobertura --coverageReporters=html --coverageReporters=json",
+        "test:imports": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest importResolver.test --forceExit --runInBand"
+    },
+    "dependencies": {
+        "@yarnpkg/fslib": "2.10.4",
+        "@yarnpkg/libzip": "2.3.0",
+        "chalk": "^4.1.2",
+        "chokidar": "^3.6.0",
+        "command-line-args": "^5.2.1",
+        "jsonc-parser": "^3.3.1",
+        "smol-toml": "^1.6.1",
+        "source-map-support": "^0.5.21",
+        "tmp": "^0.2.5",
+        "vscode-jsonrpc": "^9.0.0-next.8",
+        "vscode-languageserver": "^10.0.0-next.13",
+        "vscode-languageserver-protocol": "^3.17.6-next.13",
+        "vscode-languageserver-textdocument": "^1.0.11",
+        "vscode-languageserver-types": "^3.17.6-next.6",
+        "vscode-uri": "^3.1.0"
+    },
+    "devDependencies": {
+        "@rspack/cli": "^1.7.2",
+        "@rspack/core": "^1.7.2",
+        "@types/command-line-args": "^5.2.3",
+        "@types/fs-extra": "^11.0.4",
+        "@types/jest": "^30.0.0",
+        "@types/lodash": "^4.17.23",
+        "@types/node": "^22.19.6",
+        "@types/tmp": "^0.2.6",
+        "esbuild-loader": "^4.4.2",
+        "jest": "^30.2.0",
+        "jest-junit": "^16.0.0",
+        "shx": "^0.4.0",
+        "ts-jest": "^29.4.6",
+        "ts-loader": "^9.5.4",
+        "typescript": "~5.5.4",
+        "webpack": "^5.104.1",
+        "word-wrap": "1.2.5"
+    }
+}
diff --git a/packages/pyright-internal/src/common/timing.ts b/packages/pyright-internal/src/common/timing.ts
index 2487f5d6db33..29b41ad3f52e 100644
--- a/packages/pyright-internal/src/common/timing.ts
+++ b/packages/pyright-internal/src/common/timing.ts
@@ -10,24 +10,6 @@
 
 import { ConsoleInterface } from './console';
 
-export interface TimingStatSnapshot {
-    totalTimeMs: number;
-    callCount: number;
-}
-
-export interface TimingStatsSnapshot {
-    totalDurationMs: number;
-    findFiles: TimingStatSnapshot;
-    readFile: TimingStatSnapshot;
-    tokenize: TimingStatSnapshot;
-    parse: TimingStatSnapshot;
-    resolveImports: TimingStatSnapshot;
-    cycleDetection: TimingStatSnapshot;
-    bind: TimingStatSnapshot;
-    typeCheck: TimingStatSnapshot;
-    typeEvaluation: TimingStatSnapshot;
-}
-
 export class Duration {
     private _startTime: number;
 
@@ -84,20 +66,6 @@ export class TimingStat {
         const roundedTime = Math.round(totalTimeInSec * 100) / 100;
         return roundedTime.toString() + 'sec';
     }
-
-    getSnapshot(): TimingStatSnapshot {
-        return {
-            totalTimeMs: this.totalTime,
-            callCount: this.callCount,
-        };
-    }
-}
-
-function subtractTimingStatSnapshot(end: TimingStatSnapshot, start: TimingStatSnapshot): TimingStatSnapshot {
-    return {
-        totalTimeMs: end.totalTimeMs - start.totalTimeMs,
-        callCount: end.callCount - start.callCount,
-    };
 }
 
 export class TimingStats {
@@ -132,38 +100,6 @@ export class TimingStats {
     getTotalDuration() {
         return this.totalDuration.getDurationInSeconds();
     }
-
-    getSnapshot(): TimingStatsSnapshot {
-        return {
-            totalDurationMs: this.totalDuration.getDurationInMilliseconds(),
-            findFiles: this.findFilesTime.getSnapshot(),
-            readFile: this.readFileTime.getSnapshot(),
-            tokenize: this.tokenizeFileTime.getSnapshot(),
-            parse: this.parseFileTime.getSnapshot(),
-            resolveImports: this.resolveImportsTime.getSnapshot(),
-            cycleDetection: this.cycleDetectionTime.getSnapshot(),
-            bind: this.bindTime.getSnapshot(),
-            typeCheck: this.typeCheckerTime.getSnapshot(),
-            typeEvaluation: this.typeEvaluationTime.getSnapshot(),
-        };
-    }
-
-    getSnapshotDelta(start: TimingStatsSnapshot): TimingStatsSnapshot {
-        const end = this.getSnapshot();
-
-        return {
-            totalDurationMs: end.totalDurationMs - start.totalDurationMs,
-            findFiles: subtractTimingStatSnapshot(end.findFiles, start.findFiles),
-            readFile: subtractTimingStatSnapshot(end.readFile, start.readFile),
-            tokenize: subtractTimingStatSnapshot(end.tokenize, start.tokenize),
-            parse: subtractTimingStatSnapshot(end.parse, start.parse),
-            resolveImports: subtractTimingStatSnapshot(end.resolveImports, start.resolveImports),
-            cycleDetection: subtractTimingStatSnapshot(end.cycleDetection, start.cycleDetection),
-            bind: subtractTimingStatSnapshot(end.bind, start.bind),
-            typeCheck: subtractTimingStatSnapshot(end.typeCheck, start.typeCheck),
-            typeEvaluation: subtractTimingStatSnapshot(end.typeEvaluation, start.typeEvaluation),
-        };
-    }
 }
 
 export const timingStats = new TimingStats();
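
Note on the `timing.ts` change above: the removed `getSnapshot()` and `getSnapshotDelta()` helpers follow a snapshot-and-subtract pattern. Cumulative counters are captured before a run and again afterwards, then subtracted field by field so that only the work inside the run is reported. The block below is a minimal standalone sketch of that pattern, not the removed code. `PhaseCounter`, `PhaseSnapshot`, `subtractSnapshot`, and `measure` are hypothetical names, and the sketch tracks a single phase rather than the full set listed in `TimingStatsSnapshot`.

```typescript
// Hypothetical snapshot shape, mirroring the two fields of the removed TimingStatSnapshot.
interface PhaseSnapshot {
    totalTimeMs: number;
    callCount: number;
}

// Hypothetical cumulative counter for one analysis phase.
class PhaseCounter {
    private _totalTimeMs = 0;
    private _callCount = 0;

    // Record one call that took `elapsedMs` milliseconds.
    record(elapsedMs: number): void {
        this._totalTimeMs += elapsedMs;
        this._callCount++;
    }

    // Return a copy of the cumulative totals at this moment.
    getSnapshot(): PhaseSnapshot {
        return { totalTimeMs: this._totalTimeMs, callCount: this._callCount };
    }
}

// Field-by-field difference between two cumulative snapshots.
function subtractSnapshot(end: PhaseSnapshot, start: PhaseSnapshot): PhaseSnapshot {
    return {
        totalTimeMs: end.totalTimeMs - start.totalTimeMs,
        callCount: end.callCount - start.callCount,
    };
}

// Attribute only the work done inside `run` to the returned delta.
function measure(counter: PhaseCounter, run: () => void): PhaseSnapshot {
    const start = counter.getSnapshot();
    run();
    return subtractSnapshot(counter.getSnapshot(), start);
}

const parseCounter = new PhaseCounter();
parseCounter.record(12); // Earlier work that must not be attributed to the measured run.

const parseDelta = measure(parseCounter, () => {
    parseCounter.record(5);
    parseCounter.record(7);
});
```

Because the counters are process-wide and cumulative, subtracting a starting snapshot is what lets a benchmark report one run in isolation without resetting shared state.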
diff --git a/packages/pyright-internal/src/tests/benchmarks/README.md b/packages/pyright-internal/src/tests/benchmarks/README.md
deleted file mode 100644
index 8de27c6c9722..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/README.md
+++ /dev/null
@@ -1,130 +0,0 @@
-# Pyright Benchmarks
-
-This directory contains opt-in performance benchmarks for Pyright internals. They are excluded from the normal Jest test
-suite and run through the package benchmark script.
-
-```bash
-cd packages/pyright-internal
-npm run test:benchmark
-```
-
-Benchmark JSON artifacts are written under:
-
-```text
-src/tests/benchmarks/.generated/benchmark-results/
-```
-
-## Current Suites
-
-- `parserBenchmark.test.ts` measures parser throughput over representative Python corpora.
-- `tokenizerBenchmark.test.ts` measures tokenizer throughput and runs each corpus in a fresh child process to reduce
-    cross-test heap effects.
-- `evaluatorBenchmark.test.ts` measures cold analysis time for generated evaluator-heavy Python cases.
-- `ecosystemSmokeBenchmark.test.ts` validates the curated ecosystem smoke project manifest and writes it as a JSON
-    artifact derived from generated project metadata and local overrides for future mypy_primer-based runners.
-- `runEcosystemBenchmark.ts` provides the first ecosystem runner entry point: it resolves smoke-suite selection from CLI
-    filters, writes a run manifest artifact, executes selected local project checkouts with provided Pyright commands,
-    and compares existing or freshly executed ecosystem report files into
-    `old.json`/`new.json`/`comparison.json`/`comparison.md` artifacts, including diagnostic count metrics and added/removed
-    diagnostic summaries.
-- `syncMypyPrimerProjects.ts` is the first sync scaffold for normalizing `mypy_primer` project definitions into the
-    generated ecosystem metadata file consumed by the smoke manifest. The checked-in smoke snapshot now carries the
-    upstream `pyright_cmd` and `paths` data for the current smoke suite, so generated project configs can target real
-    source roots like `src`, `pandas`, `pydantic`, and `chess` instead of defaulting to the repo root.
-- `syntheticCases.ts` contains deterministic Python generators for recursive aliases, overload/union cross products,
-    protocol mismatches, generic alias chains, constrained TypeVar matrices, literal-union math, and large TypedDicts.
-- `ecosystemSmokeProjects.ts` derives the smoke project list from `ecosystem-projects.generated.json` and
-    `ecosystem-projects.overrides.json`, then exposes the existing tag/pattern/shard selection helpers.
-- `benchmarkComparison.ts` contains shared old/new result and report comparison helpers plus Markdown rendering for
-    summary, largest-regression, largest-improvement, threshold classification, `old.json`, `new.json`,
-    `comparison.json`, and `comparison.md` generation, including loading reports back from disk and writing the full
-    artifact set in one call.
-- `benchmarkUtils.ts` contains shared statistics, system metadata, corpus loading, JSON artifact writing, count
-    formatting, child-process benchmark helpers, and generated-source type analysis helpers.
-
-## Result Shape
-
-The current microbenchmark reports use this common envelope:
-
-```ts
-interface BenchmarkReport<ResultT> {
-    schemaVersion: number;
-    suiteName: string;
-    timestamp: string;
-    system: BenchmarkSystemInfo;
-    config: {
-        warmupIterations: number;
-        benchmarkIterations: number;
-    };
-    results: ResultT[];
-}
-```
-
-Individual suites add case-specific fields such as token count, AST node count, median time, p95 time, and throughput.
-Ecosystem benchmark results additionally preserve per-project fields like `filesAnalyzed`, diagnostic counts, normalized
-diagnostics, and total runtime so report artifacts can distinguish execution-scope changes from pure performance
-regressions.
-
-Generated per-project configs always own the benchmark `include` and `exclude` scope so local runs stay focused on source
-roots rather than tests. If a project has `pyrightconfig.json`, the generated config extends it. If a project only has
-`[tool.pyright]` in `pyproject.toml`, the runner copies those settings into the generated config and rebases known path
-fields like `extraPaths`, `stubPath`, `typeshedPath`, and `venvPath` relative to the generated config file.
-
-## Implementation Roadmap
-
-1. Extend microbenchmarks with deterministic generated cases for evaluator-heavy paths.
-2. Extend the ecosystem runner from selection-only manifest emission to base/head Pyright execution on a curated
-    mypy_primer-compatible project list.
-    The metadata source layer and first local execution path now exist; the next step is automated base/head ecosystem
-    execution driven from synchronized `mypy_primer` project checkouts.
-3. Use `TimingStats.getSnapshot()` for structured phase metrics rather than parsing CLI `--stats` text.
-4. Add heuristic counters and sweep reports for evaluator bailout thresholds.
-5. Add LSP operation benchmarks after CLI and ecosystem reporting are stable.
-
-## CodSpeed Notes
-
-Before adding CodSpeed integration, review the current CodSpeed documentation at <https://codspeed.io/docs>. Use CodSpeed
-only for stable, low-noise microbenchmarks at first; keep ecosystem, heuristic sweep, and LSP benchmarks in the JSON
-artifact/report workflow until their runtime and variance are better understood.
-
-Current status: initial CodSpeed setup already exists in an external PR in `bschnurr/pyright`. The next local step is to
-connect the stable microbenchmark subset in this directory to that setup rather than creating a second parallel CodSpeed
-path.
-
-Keep new benchmark cases deterministic and report-only by default. Performance thresholds should be introduced only after
-repeated runs establish noise levels.
-
-## Local Ecosystem Runs
-
-For real local ecosystem execution, use the packaged Pyright CLI rather than the internal `out/.../pyright.js`
-entrypoint. The packaged CLI picks up the bundled resources correctly and matches the way end users invoke Pyright.
-
-```bash
-cd q:/dev/pyright-benchmark-suite
-npm run build:cli:dev
-
-cd packages/pyright-internal
-npm run build
-npm run bench:ecosystem:sync
-npm run bench:ecosystem:run:local -- --suite smoke --project "black|attrs" --project-root q:/path/to/checkouts --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-local
-```
-
-`bench:ecosystem:run:local` defaults both baseline and candidate executables to `node ../pyright/index.js`, so the only
-required execution-specific arguments are the usual runner filters plus `--project-root` and `--output`.
-
-Add `--prepare-projects` to clone or update selected project checkouts under `--project-root`. When `--project-date` is
-provided, preparation checks out the newest project commit before that date. Add `--install-dependencies` to install
-synced dependency metadata and run synced install commands after checkout preparation.
-
-To refresh the checked-in smoke baseline from a verified main-branch run, execute the baseline side of the local runner,
-pass `--update-main-baseline`, and stamp the source commit:
-
-```bash
-npm run bench:ecosystem:update-main-baseline -- --suite smoke --project-root q:/path/to/main-checkouts --prepare-projects --project-date 2026-01-01 --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-main --baseline-source-commit <main-commit-sha>
-```
-
-PR comparison mode can then use the checked-in baseline by passing only the candidate report:
-
-```bash
-npm run bench:ecosystem:run -- --candidate-report ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-pr/candidate-report.json --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-pr-comparison
-```
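
Note on the deleted README above: its "Result Shape" section states that generated per-project configs always own the benchmark `include` and `exclude` scope, extend an existing `pyrightconfig.json` when there is one, and otherwise copy `[tool.pyright]` settings. The block below is a rough sketch of that precedence rule only. `ProjectConfigInputs` and `buildGeneratedConfig` are hypothetical names, the runner's real implementation is not shown in this diff, and the path rebasing for fields such as `extraPaths` and `venvPath` is deliberately omitted.

```typescript
type PyrightSettings = Record<string, unknown>;

// Hypothetical inputs; the real runner's data model is not shown in this diff.
interface ProjectConfigInputs {
    include: string[]; // Benchmark-owned source roots.
    exclude: string[]; // Benchmark-owned exclusions.
    existingConfigPath?: string; // Path to the project's pyrightconfig.json, if any.
    pyprojectSettings?: PyrightSettings; // Parsed [tool.pyright] table, if any.
}

function buildGeneratedConfig(inputs: ProjectConfigInputs): PyrightSettings {
    // Prefer extending an existing pyrightconfig.json; otherwise copy pyproject settings.
    const base: PyrightSettings = inputs.existingConfigPath
        ? { extends: inputs.existingConfigPath }
        : { ...(inputs.pyprojectSettings ?? {}) };

    // The benchmark-owned scope is spread last so it overrides any project-supplied scope.
    return { ...base, include: inputs.include, exclude: inputs.exclude };
}

const fromPyproject = buildGeneratedConfig({
    include: ['src'],
    exclude: ['tests'],
    pyprojectSettings: { include: ['.'], reportMissingImports: 'none' },
});

const fromExistingConfig = buildGeneratedConfig({
    include: ['src'],
    exclude: ['tests'],
    existingConfigPath: '../pyrightconfig.json',
    pyprojectSettings: { reportMissingImports: 'none' },
});
```

Spreading the benchmark scope last keeps project-level rule settings intact while ensuring a project's own `include` cannot widen the measured file set.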
diff --git a/packages/pyright-internal/src/tests/benchmarks/baselines/README.md b/packages/pyright-internal/src/tests/benchmarks/baselines/README.md
deleted file mode 100644
index 7caa38ed68f1..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/baselines/README.md
+++ /dev/null
@@ -1,7 +0,0 @@
-# Ecosystem Benchmark Baselines
-
-This directory is reserved for checked-in smoke benchmark baselines generated from `main` branch commits.
-
-`ecosystem-smoke-main.json` should be updated only from a deliberate main-branch run. PR comparisons can use that file as the default baseline when no fresher CI artifact is supplied.
-
-Full ecosystem reports and exploratory local runs should stay under `.generated/benchmark-results/` or CI artifacts rather than being checked in here.
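
Note on the test file deleted next: its `calculates percent deltas` test pins down four cases for `calculatePercentDelta`, including a zero baseline. The implementation in `benchmarkComparison.ts` is not part of this section, so the block below is only a sketch inferred from those four expectations. `percentDeltaSketch` is a hypothetical name, and the real helper may differ in rounding or edge-case handling.

```typescript
// Percent change from baseline to candidate.
// A zero baseline has no meaningful percentage unless the candidate is also zero.
function percentDeltaSketch(baseline: number, candidate: number): number | undefined {
    if (baseline === 0) {
        return candidate === 0 ? 0 : undefined;
    }

    // Multiply before dividing to limit floating-point drift for whole-number inputs.
    return ((candidate - baseline) * 100) / baseline;
}

const regression = percentDeltaSketch(100, 125);
const improvement = percentDeltaSketch(100, 80);
const unchangedZero = percentDeltaSketch(0, 0);
const undefinedFromZero = percentDeltaSketch(0, 10);
```

Returning `undefined` for a zero baseline with a nonzero candidate lets report rendering skip the percentage instead of printing `Infinity` or `NaN`.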
diff --git a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.test.ts b/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.test.ts
deleted file mode 100644
index 065468e39537..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.test.ts
+++ /dev/null
@@ -1,363 +0,0 @@
-/*
- * benchmarkComparison.test.ts
- * Copyright (c) Microsoft Corporation.
- *
- * Tests for benchmark result comparison helpers.
- */
-
-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
-
-import {
-    calculatePercentDelta,
-    classifyBenchmarkRegression,
-    compareAndWriteBenchmarkReportFiles,
-    compareBenchmarkReportFiles,
-    compareBenchmarkReports,
-    compareBenchmarkResultSets,
-    getBenchmarkRegressionThresholdResults,
-    loadBenchmarkReport,
-    renderBenchmarkComparisonMarkdown,
-    summarizeBenchmarkComparison,
-    writeBenchmarkComparisonArtifacts,
-    writeBenchmarkReportComparisonArtifacts,
-} from './benchmarkComparison';
-import { BenchmarkReport, benchmarkReportSchemaVersion } from './benchmarkUtils';
-
-const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS';
-
-interface TestResult {
-    name: string;
-    medianMs?: number;
-    tokensPerSec?: number;
-}
-
-const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip;
-
-benchmarkSuite('Benchmark Comparison', () => {
-    test('calculates percent deltas', () => {
-        expect(calculatePercentDelta(100, 125)).toBe(25);
-        expect(calculatePercentDelta(100, 80)).toBe(-20);
-        expect(calculatePercentDelta(0, 0)).toBe(0);
-        expect(calculatePercentDelta(0, 10)).toBeUndefined();
-    });
-
-    test('compares common benchmark results and tracks added and removed cases', () => {
-        const comparison = compareBenchmarkResultSets<TestResult>(
-            [
-                { name: 'large_file', medianMs: 100, tokensPerSec: 1000 },
-                { name: 'removed_case', medianMs: 50, tokensPerSec: 500 },
-            ],
-            [
-                { name: 'large_file', medianMs: 115, tokensPerSec: 1200 },
-                { name: 'added_case', medianMs: 10, tokensPerSec: 100 },
-            ],
-            (result) => result.name,
-            [
-                { name: 'medianMs', getValue: (result) => result.medianMs, minAbsoluteDelta: 5 },
-                {
-                    name: 'tokensPerSec',
-                    getValue: (result) => result.tokensPerSec,
-                    lowerIsBetter: false,
-                    minAbsoluteDelta: 10,
-                },
-            ]
-        );
-
-        expect(comparison.addedKeys).toEqual(['added_case']);
-        expect(comparison.removedKeys).toEqual(['removed_case']);
-        expect(comparison.compared).toHaveLength(1);
-        expect(comparison.compared[0].metrics).toEqual([
-            {
-                metric: 'medianMs',
-                baselineValue: 100,
-                candidateValue: 115,
-                absoluteDelta: 15,
-                percentDelta: 15,
-                direction: 'regression',
-            },
-            {
-                metric: 'tokensPerSec',
-                baselineValue: 1000,
-                candidateValue: 1200,
-                absoluteDelta: 200,
-                percentDelta: 20,
-                direction: 'improvement',
-            },
-        ]);
-    });
-
-    test('compares benchmark report envelopes', () => {
-        const comparison = compareBenchmarkReports<TestResult>(
-            createTestReport('parser', '2026-05-07T00:00:00.000Z', [{ name: 'case_a', medianMs: 100 }]),
-            createTestReport('parser', '2026-05-07T01:00:00.000Z', [{ name: 'case_a', medianMs: 90 }]),
-            (result) => result.name,
-            [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-        );
-
-        expect(comparison.schemaVersion).toBe(benchmarkReportSchemaVersion);
-        expect(comparison.suiteName).toBe('parser');
-        expect(comparison.baselineTimestamp).toBe('2026-05-07T00:00:00.000Z');
-        expect(comparison.candidateTimestamp).toBe('2026-05-07T01:00:00.000Z');
-        expect(comparison.compared[0].metrics[0].direction).toBe('improvement');
-    });
-
-    test('rejects incompatible benchmark report envelopes', () => {
-        expect(() =>
-            compareBenchmarkReports<TestResult>(
-                createTestReport('parser', '2026-05-07T00:00:00.000Z', []),
-                createTestReport('tokenizer', '2026-05-07T01:00:00.000Z', []),
-                (result) => result.name,
-                [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-            )
-        ).toThrow('different suites');
-
-        expect(() =>
-            compareBenchmarkReports<TestResult>(
-                { ...createTestReport('parser', '2026-05-07T00:00:00.000Z', []), schemaVersion: 0 },
-                createTestReport('parser', '2026-05-07T01:00:00.000Z', []),
-                (result) => result.name,
-                [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-            )
-        ).toThrow('Unsupported baseline benchmark report schema version');
-    });
-
-    test('renders a markdown comparison table', () => {
-        const comparison = compareBenchmarkResultSets<TestResult>(
-            [
-                { name: 'case_a', medianMs: 100 },
-                { name: 'case_b', medianMs: 100 },
-            ],
-            [
-                { name: 'case_a', medianMs: 110 },
-                { name: 'case_b', medianMs: 80 },
-            ],
-            (result) => result.name,
-            [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-        );
-        const markdown = renderBenchmarkComparisonMarkdown(comparison);
-
-        expect(markdown).toContain('## Summary');
-        expect(markdown).toContain('Regressions: 1');
-        expect(markdown).toContain('Improvements: 1');
-        expect(markdown).toContain('## Largest Regressions');
-        expect(markdown).toContain('## Largest Improvements');
-        expect(markdown).toContain('| case_a | medianMs | 100.00 | 110.00 |');
-    });
-
-    test('summarizes benchmark comparison directions', () => {
-        const comparison = compareBenchmarkResultSets<TestResult>(
-            [
-                { name: 'regression', medianMs: 100 },
-                { name: 'improvement', medianMs: 100 },
-                { name: 'unchanged', medianMs: 100 },
-            ],
-            [
-                { name: 'regression', medianMs: 120 },
-                { name: 'improvement', medianMs: 80 },
-                { name: 'unchanged', medianMs: 100 },
-            ],
-            (result) => result.name,
-            [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-        );
-
-        expect(summarizeBenchmarkComparison(comparison, 1)).toMatchObject({
-            comparedResultCount: 3,
-            metricCount: 3,
-            regressionCount: 1,
-            improvementCount: 1,
-            unchangedCount: 1,
-            largestRegressions: [{ key: 'regression' }],
-            largestImprovements: [{ key: 'improvement' }],
-        });
-    });
-
-    test('classifies regression thresholds', () => {
-        const comparison = compareBenchmarkResultSets<TestResult>(
-            [
-                { name: 'warning_case', medianMs: 100 },
-                { name: 'failure_case', medianMs: 100 },
-                { name: 'small_absolute_case', medianMs: 100 },
-                { name: 'improvement_case', medianMs: 100 },
-            ],
-            [
-                { name: 'warning_case', medianMs: 106 },
-                { name: 'failure_case', medianMs: 112 },
-                { name: 'small_absolute_case', medianMs: 104 },
-                { name: 'improvement_case', medianMs: 90 },
-            ],
-            (result) => result.name,
-            [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-        );
-        const thresholdResults = getBenchmarkRegressionThresholdResults(comparison, {
-            warnRegressionPct: 5,
-            failRegressionPct: 10,
-            minAbsoluteRegression: 5,
-        });
-
-        expect(thresholdResults.map((result) => [result.key, result.severity])).toEqual([
-            ['failure_case', 'failure'],
-            ['warning_case', 'warning'],
-        ]);
-
-        const improvement = comparison.compared.find((result) => result.key === 'improvement_case');
-        expect(improvement).toBeDefined();
-        expect(classifyBenchmarkRegression(improvement!.metrics[0], { warnRegressionPct: 5 })).toBe('none');
-    });
-
-    test('writes comparison artifacts', () => {
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-comparison-'));
-
-        try {
-            const comparison = compareBenchmarkResultSets<TestResult>(
-                [{ name: 'case_a', medianMs: 100 }],
-                [{ name: 'case_a', medianMs: 110 }],
-                (result) => result.name,
-                [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-            );
-            const paths = writeBenchmarkComparisonArtifacts(outputDir, comparison);
-
-            expect(paths.jsonPath).toBe(path.join(outputDir, 'comparison.json'));
-            expect(paths.markdownPath).toBe(path.join(outputDir, 'comparison.md'));
-            expect(JSON.parse(fs.readFileSync(paths.jsonPath, 'utf-8'))).toEqual(comparison);
-            expect(fs.readFileSync(paths.markdownPath, 'utf-8')).toContain('| case_a | medianMs |');
-        } finally {
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('writes report comparison artifact set', () => {
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-report-comparison-'));
-        const baselineReport = createTestReport('parser', '2026-05-07T00:00:00.000Z', [
-            { name: 'case_a', medianMs: 100 },
-        ]);
-        const candidateReport = createTestReport('parser', '2026-05-07T01:00:00.000Z', [
-            { name: 'case_a', medianMs: 110 },
-        ]);
-
-        try {
-            const comparison = compareBenchmarkReports<TestResult>(
-                baselineReport,
-                candidateReport,
-                (result) => result.name,
-                [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-            );
-            const paths = writeBenchmarkReportComparisonArtifacts(
-                outputDir,
-                baselineReport,
-                candidateReport,
-                comparison
-            );
-
-            expect(paths.oldJsonPath).toBe(path.join(outputDir, 'old.json'));
-            expect(paths.newJsonPath).toBe(path.join(outputDir, 'new.json'));
-            expect(paths.jsonPath).toBe(path.join(outputDir, 'comparison.json'));
-            expect(paths.markdownPath).toBe(path.join(outputDir, 'comparison.md'));
-            expect(JSON.parse(fs.readFileSync(paths.oldJsonPath, 'utf-8'))).toEqual(baselineReport);
-            expect(JSON.parse(fs.readFileSync(paths.newJsonPath, 'utf-8'))).toEqual(candidateReport);
-        } finally {
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('loads and compares benchmark report files', () => {
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-report-load-'));
-        const baselineReport = createTestReport('parser', '2026-05-07T00:00:00.000Z', [
-            { name: 'case_a', medianMs: 100 },
-        ]);
-        const candidateReport = createTestReport('parser', '2026-05-07T01:00:00.000Z', [
-            { name: 'case_a', medianMs: 110 },
-        ]);
-        const baselineReportPath = path.join(outputDir, 'old.json');
-        const candidateReportPath = path.join(outputDir, 'new.json');
-
-        try {
-            fs.writeFileSync(baselineReportPath, JSON.stringify(baselineReport, undefined, 2), 'utf-8');
-            fs.writeFileSync(candidateReportPath, JSON.stringify(candidateReport, undefined, 2), 'utf-8');
-
-            expect(loadBenchmarkReport<TestResult>(baselineReportPath)).toEqual(baselineReport);
-
-            const comparison = compareBenchmarkReportFiles<TestResult>(
-                baselineReportPath,
-                candidateReportPath,
-                (result) => result.name,
-                [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-            );
-
-            expect(comparison.suiteName).toBe('parser');
-            expect(comparison.compared[0].metrics[0].direction).toBe('regression');
-        } finally {
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('compares and writes benchmark report files in one call', () => {
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-report-compare-write-'));
-        const baselineReport = createTestReport('parser', '2026-05-07T00:00:00.000Z', [
-            { name: 'case_a', medianMs: 100 },
-        ]);
-        const candidateReport = createTestReport('parser', '2026-05-07T01:00:00.000Z', [
-            { name: 'case_a', medianMs: 110 },
-        ]);
-        const baselineReportPath = path.join(outputDir, 'source-old.json');
-        const candidateReportPath = path.join(outputDir, 'source-new.json');
-
-        try {
-            fs.writeFileSync(baselineReportPath, JSON.stringify(baselineReport, undefined, 2), 'utf-8');
-            fs.writeFileSync(candidateReportPath, JSON.stringify(candidateReport, undefined, 2), 'utf-8');
-
-            const paths = compareAndWriteBenchmarkReportFiles<TestResult>(
-                baselineReportPath,
-                candidateReportPath,
-                outputDir,
-                (result) => result.name,
-                [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-            );
-
-            expect(paths.oldJsonPath).toBe(path.join(outputDir, 'old.json'));
-            expect(paths.newJsonPath).toBe(path.join(outputDir, 'new.json'));
-            expect(paths.jsonPath).toBe(path.join(outputDir, 'comparison.json'));
-            expect(paths.markdownPath).toBe(path.join(outputDir, 'comparison.md'));
-            expect(JSON.parse(fs.readFileSync(paths.oldJsonPath, 'utf-8'))).toEqual(baselineReport);
-            expect(JSON.parse(fs.readFileSync(paths.newJsonPath, 'utf-8'))).toEqual(candidateReport);
-        } finally {
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('rejects duplicate result keys', () => {
-        expect(() =>
-            compareBenchmarkResultSets<TestResult>(
-                [
-                    { name: 'duplicate', medianMs: 1 },
-                    { name: 'duplicate', medianMs: 2 },
-                ],
-                [],
-                (result) => result.name,
-                [{ name: 'medianMs', getValue: (result) => result.medianMs }]
-            )
-        ).toThrow('Duplicate benchmark result key');
-    });
-});
-
-function createTestReport(suiteName: string, timestamp: string, results: TestResult[]): BenchmarkReport<TestResult> {
-    return {
-        schemaVersion: benchmarkReportSchemaVersion,
-        suiteName,
-        timestamp,
-        system: {
-            platform: 'test',
-            arch: 'test',
-            cpus: 'test',
-            cpuCount: 1,
-            totalMemoryMB: 1,
-            nodeVersion: 'test',
-        },
-        config: {
-            warmupIterations: 0,
-            benchmarkIterations: 1,
-        },
-        results,
-    };
-}
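Reviewer note: the expectations in the removed test file above (for example `calculatePercentDelta(0, 10)` being undefined, and a `tokensPerSec` increase counting as an improvement) follow from two small rules. The sketch below restates those rules in a standalone form so the test values can be checked by hand. It is not the removed module; `percentDelta` and `direction` are illustrative names, and the defaults (`lowerIsBetter = true`, `minAbsoluteDelta = 0`) are taken from what the tests appear to rely on.

```typescript
// Standalone sketch; `percentDelta` and `direction` are illustrative names,
// not the exports removed in this diff.
type Direction = 'improvement' | 'regression' | 'unchanged';

// Percent change relative to the baseline. Returns undefined when the
// baseline is 0 and the candidate is not, because the ratio is not defined.
function percentDelta(baseline: number, candidate: number): number | undefined {
    if (baseline === 0) {
        return candidate === 0 ? 0 : undefined;
    }
    return ((candidate - baseline) / Math.abs(baseline)) * 100;
}

// Classifies a change. Deltas at or below the noise floor are 'unchanged';
// otherwise the sign of the delta is read against the metric's polarity.
function direction(
    baseline: number,
    candidate: number,
    lowerIsBetter = true,
    minAbsoluteDelta = 0
): Direction {
    const delta = candidate - baseline;
    if (Math.abs(delta) <= minAbsoluteDelta) {
        return 'unchanged';
    }
    const isHigher = delta > 0;
    return lowerIsBetter === isHigher ? 'regression' : 'improvement';
}

console.log(percentDelta(100, 125)); // 25
console.log(direction(100, 115, true, 5)); // regression (time went up)
console.log(direction(1000, 1200, false, 10)); // improvement (throughput went up)
```

Under these assumptions, a metric only needs to declare its polarity and noise floor; the comparison itself stays identical for timings and throughput.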
diff --git a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.ts b/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.ts
deleted file mode 100644
index 6b863e2d8e37..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.ts
+++ /dev/null
@@ -1,462 +0,0 @@
-import * as fs from 'fs';
-import * as path from 'path';
-
-import { BenchmarkReport, benchmarkReportSchemaVersion } from './benchmarkUtils';
-
-export type BenchmarkMetricDirection = 'improvement' | 'regression' | 'unchanged';
-export type BenchmarkRegressionSeverity = 'none' | 'warning' | 'failure';
-
-export interface BenchmarkMetricDefinition<ResultT> {
-    name: string;
-    lowerIsBetter?: boolean;
-    minAbsoluteDelta?: number;
-    getValue: (result: ResultT) => number | undefined;
-}
-
-export interface BenchmarkMetricComparison {
-    metric: string;
-    baselineValue: number;
-    candidateValue: number;
-    absoluteDelta: number;
-    percentDelta: number | undefined;
-    direction: BenchmarkMetricDirection;
-}
-
-export interface BenchmarkResultComparison {
-    key: string;
-    metrics: BenchmarkMetricComparison[];
-}
-
-export interface BenchmarkResultSetComparison {
-    compared: BenchmarkResultComparison[];
-    addedKeys: string[];
-    removedKeys: string[];
-}
-
-export interface BenchmarkMetricComparisonSummaryEntry extends BenchmarkMetricComparison {
-    key: string;
-}
-
-export interface BenchmarkComparisonSummary {
-    comparedResultCount: number;
-    metricCount: number;
-    regressionCount: number;
-    improvementCount: number;
-    unchangedCount: number;
-    largestRegressions: BenchmarkMetricComparisonSummaryEntry[];
-    largestImprovements: BenchmarkMetricComparisonSummaryEntry[];
-}
-
-export interface BenchmarkRegressionThresholds {
-    warnRegressionPct?: number;
-    failRegressionPct?: number;
-    warnRegressionAbsolute?: number;
-    failRegressionAbsolute?: number;
-    minAbsoluteRegression?: number;
-}
-
-export interface BenchmarkRegressionThresholdResult extends BenchmarkMetricComparisonSummaryEntry {
-    severity: BenchmarkRegressionSeverity;
-}
-
-export interface BenchmarkReportComparison extends BenchmarkResultSetComparison {
-    schemaVersion: number;
-    suiteName: string;
-    baselineTimestamp: string;
-    candidateTimestamp: string;
-}
-
-export interface BenchmarkComparisonArtifactPaths {
-    jsonPath: string;
-    markdownPath: string;
-}
-
-export interface BenchmarkReportComparisonArtifactPaths extends BenchmarkComparisonArtifactPaths {
-    oldJsonPath: string;
-    newJsonPath: string;
-}
-
-export function calculatePercentDelta(baselineValue: number, candidateValue: number): number | undefined {
-    if (baselineValue === 0) {
-        return candidateValue === 0 ? 0 : undefined;
-    }
-
-    return ((candidateValue - baselineValue) / Math.abs(baselineValue)) * 100;
-}
-
-export function compareBenchmarkResultSets<ResultT>(
-    baselineResults: ReadonlyArray<ResultT>,
-    candidateResults: ReadonlyArray<ResultT>,
-    getKey: (result: ResultT) => string,
-    metrics: ReadonlyArray<BenchmarkMetricDefinition<ResultT>>
-): BenchmarkResultSetComparison {
-    const baselineByKey = indexResultsByKey(baselineResults, getKey);
-    const candidateByKey = indexResultsByKey(candidateResults, getKey);
-    const baselineKeys = [...baselineByKey.keys()].sort();
-    const candidateKeys = [...candidateByKey.keys()].sort();
-    const comparedKeys = baselineKeys.filter((key) => candidateByKey.has(key));
-
-    return {
-        compared: comparedKeys.map((key) =>
-            compareBenchmarkResult(key, baselineByKey.get(key)!, candidateByKey.get(key)!, metrics)
-        ),
-        addedKeys: candidateKeys.filter((key) => !baselineByKey.has(key)),
-        removedKeys: baselineKeys.filter((key) => !candidateByKey.has(key)),
-    };
-}
-
-export function compareBenchmarkReports<ResultT>(
-    baselineReport: BenchmarkReport<ResultT>,
-    candidateReport: BenchmarkReport<ResultT>,
-    getKey: (result: ResultT) => string,
-    metrics: ReadonlyArray<BenchmarkMetricDefinition<ResultT>>
-): BenchmarkReportComparison {
-    validateBenchmarkReportPair(baselineReport, candidateReport);
-
-    return {
-        schemaVersion: baselineReport.schemaVersion,
-        suiteName: baselineReport.suiteName,
-        baselineTimestamp: baselineReport.timestamp,
-        candidateTimestamp: candidateReport.timestamp,
-        ...compareBenchmarkResultSets(baselineReport.results, candidateReport.results, getKey, metrics),
-    };
-}
-
-export function loadBenchmarkReport<ResultT>(reportPath: string): BenchmarkReport<ResultT> {
-    const fileContents = fs.readFileSync(reportPath, 'utf-8');
-    return JSON.parse(fileContents) as BenchmarkReport<ResultT>;
-}
-
-export function compareBenchmarkReportFiles<ResultT>(
-    baselineReportPath: string,
-    candidateReportPath: string,
-    getKey: (result: ResultT) => string,
-    metrics: ReadonlyArray<BenchmarkMetricDefinition<ResultT>>
-): BenchmarkReportComparison {
-    return compareBenchmarkReports(
-        loadBenchmarkReport<ResultT>(baselineReportPath),
-        loadBenchmarkReport<ResultT>(candidateReportPath),
-        getKey,
-        metrics
-    );
-}
-
-export function compareAndWriteBenchmarkReportFiles<ResultT>(
-    baselineReportPath: string,
-    candidateReportPath: string,
-    outputDir: string,
-    getKey: (result: ResultT) => string,
-    metrics: ReadonlyArray<BenchmarkMetricDefinition<ResultT>>
-): BenchmarkReportComparisonArtifactPaths {
-    const baselineReport = loadBenchmarkReport<ResultT>(baselineReportPath);
-    const candidateReport = loadBenchmarkReport<ResultT>(candidateReportPath);
-    const comparison = compareBenchmarkReports(baselineReport, candidateReport, getKey, metrics);
-
-    return writeBenchmarkReportComparisonArtifacts(outputDir, baselineReport, candidateReport, comparison);
-}
-
-export function summarizeBenchmarkComparison(
-    comparison: BenchmarkResultSetComparison,
-    limit = 5
-): BenchmarkComparisonSummary {
-    const entries = getComparisonMetricEntries(comparison);
-    const regressions = entries.filter((entry) => entry.direction === 'regression');
-    const improvements = entries.filter((entry) => entry.direction === 'improvement');
-    const unchanged = entries.filter((entry) => entry.direction === 'unchanged');
-
-    return {
-        comparedResultCount: comparison.compared.length,
-        metricCount: entries.length,
-        regressionCount: regressions.length,
-        improvementCount: improvements.length,
-        unchangedCount: unchanged.length,
-        largestRegressions: sortMetricEntriesByMagnitude(regressions).slice(0, limit),
-        largestImprovements: sortMetricEntriesByMagnitude(improvements).slice(0, limit),
-    };
-}
-
-export function classifyBenchmarkRegression(
-    entry: BenchmarkMetricComparison,
-    thresholds: BenchmarkRegressionThresholds
-): BenchmarkRegressionSeverity {
-    if (entry.direction !== 'regression') {
-        return 'none';
-    }
-
-    const absoluteMagnitude = Math.abs(entry.absoluteDelta);
-    if (absoluteMagnitude < (thresholds.minAbsoluteRegression ?? 0)) {
-        return 'none';
-    }
-
-    if (exceedsRegressionThreshold(entry, thresholds.failRegressionPct, thresholds.failRegressionAbsolute)) {
-        return 'failure';
-    }
-
-    if (exceedsRegressionThreshold(entry, thresholds.warnRegressionPct, thresholds.warnRegressionAbsolute)) {
-        return 'warning';
-    }
-
-    return 'none';
-}
-
-export function getBenchmarkRegressionThresholdResults(
-    comparison: BenchmarkResultSetComparison,
-    thresholds: BenchmarkRegressionThresholds
-): BenchmarkRegressionThresholdResult[] {
-    return getComparisonMetricEntries(comparison)
-        .map((entry) => ({ ...entry, severity: classifyBenchmarkRegression(entry, thresholds) }))
-        .filter((entry) => entry.severity !== 'none')
-        .sort(compareThresholdResults);
-}
-
-export function renderBenchmarkComparisonMarkdown(comparison: BenchmarkResultSetComparison): string {
-    const summary = summarizeBenchmarkComparison(comparison);
-    const lines = [
-        '## Summary',
-        '',
-        `Compared cases: ${summary.comparedResultCount}`,
-        `Compared metrics: ${summary.metricCount}`,
-        `Regressions: ${summary.regressionCount}`,
-        `Improvements: ${summary.improvementCount}`,
-        `Unchanged: ${summary.unchangedCount}`,
-        '',
-    ];
-
-    appendMetricEntryTable(lines, '## Largest Regressions', summary.largestRegressions);
-    appendMetricEntryTable(lines, '## Largest Improvements', summary.largestImprovements);
-
-    lines.push(
-        '## Details',
-        '',
-        '| Case | Metric | Baseline | Candidate | Delta | Delta % | Direction |',
-        '|---|---:|---:|---:|---:|---:|---|'
-    );
-
-    for (const result of comparison.compared) {
-        for (const metric of result.metrics) {
-            lines.push(
-                `| ${result.key} | ${metric.metric} | ${formatMetric(metric.baselineValue)} | ${formatMetric(
-                    metric.candidateValue
-                )} | ${formatMetric(metric.absoluteDelta)} | ${formatPercent(metric.percentDelta)} | ${
-                    metric.direction
-                } |`
-            );
-        }
-    }
-
-    if (comparison.addedKeys.length > 0) {
-        lines.push('', `Added cases: ${comparison.addedKeys.join(', ')}`);
-    }
-
-    if (comparison.removedKeys.length > 0) {
-        lines.push('', `Removed cases: ${comparison.removedKeys.join(', ')}`);
-    }
-
-    return `${lines.join('\n')}\n`;
-}
-
-function appendMetricEntryTable(
-    lines: string[],
-    heading: string,
-    entries: ReadonlyArray<BenchmarkMetricComparisonSummaryEntry>
-): void {
-    lines.push(heading, '');
-
-    if (entries.length === 0) {
-        lines.push('None.', '');
-        return;
-    }
-
-    lines.push('| Case | Metric | Baseline | Candidate | Delta | Delta % |', '|---|---:|---:|---:|---:|---:|');
-
-    for (const entry of entries) {
-        lines.push(
-            `| ${entry.key} | ${entry.metric} | ${formatMetric(entry.baselineValue)} | ${formatMetric(
-                entry.candidateValue
-            )} | ${formatMetric(entry.absoluteDelta)} | ${formatPercent(entry.percentDelta)} |`
-        );
-    }
-
-    lines.push('');
-}
-
-function getComparisonMetricEntries(comparison: BenchmarkResultSetComparison): BenchmarkMetricComparisonSummaryEntry[] {
-    return comparison.compared.flatMap((result) => result.metrics.map((metric) => ({ key: result.key, ...metric })));
-}
-
-function sortMetricEntriesByMagnitude(
-    entries: ReadonlyArray<BenchmarkMetricComparisonSummaryEntry>
-): BenchmarkMetricComparisonSummaryEntry[] {
-    return [...entries].sort((left, right) => getMetricMagnitude(right) - getMetricMagnitude(left));
-}
-
-function getMetricMagnitude(entry: BenchmarkMetricComparison): number {
-    return Math.abs(entry.percentDelta ?? entry.absoluteDelta);
-}
-
-function exceedsRegressionThreshold(
-    entry: BenchmarkMetricComparison,
-    percentThreshold: number | undefined,
-    absoluteThreshold: number | undefined
-): boolean {
-    const percentMagnitude = entry.percentDelta === undefined ? undefined : Math.abs(entry.percentDelta);
-    const absoluteMagnitude = Math.abs(entry.absoluteDelta);
-
-    return (
-        (percentThreshold !== undefined && percentMagnitude !== undefined && percentMagnitude >= percentThreshold) ||
-        (absoluteThreshold !== undefined && absoluteMagnitude >= absoluteThreshold)
-    );
-}
-
-function compareThresholdResults(
-    left: BenchmarkRegressionThresholdResult,
-    right: BenchmarkRegressionThresholdResult
-): number {
-    const severityDelta = getSeverityRank(right.severity) - getSeverityRank(left.severity);
-    if (severityDelta !== 0) {
-        return severityDelta;
-    }
-
-    return getMetricMagnitude(right) - getMetricMagnitude(left);
-}
-
-function getSeverityRank(severity: BenchmarkRegressionSeverity): number {
-    switch (severity) {
-        case 'failure':
-            return 2;
-        case 'warning':
-            return 1;
-        case 'none':
-            return 0;
-    }
-}
-
-export function writeBenchmarkComparisonArtifacts(
-    outputDir: string,
-    comparison: BenchmarkResultSetComparison
-): BenchmarkComparisonArtifactPaths {
-    fs.mkdirSync(outputDir, { recursive: true });
-
-    const jsonPath = path.join(outputDir, 'comparison.json');
-    const markdownPath = path.join(outputDir, 'comparison.md');
-
-    fs.writeFileSync(jsonPath, JSON.stringify(comparison, undefined, 2), 'utf-8');
-    fs.writeFileSync(markdownPath, renderBenchmarkComparisonMarkdown(comparison), 'utf-8');
-
-    return { jsonPath, markdownPath };
-}
-
-export function writeBenchmarkReportComparisonArtifacts<ResultT>(
-    outputDir: string,
-    baselineReport: BenchmarkReport<ResultT>,
-    candidateReport: BenchmarkReport<ResultT>,
-    comparison: BenchmarkReportComparison
-): BenchmarkReportComparisonArtifactPaths {
-    fs.mkdirSync(outputDir, { recursive: true });
-
-    const oldJsonPath = path.join(outputDir, 'old.json');
-    const newJsonPath = path.join(outputDir, 'new.json');
-    fs.writeFileSync(oldJsonPath, JSON.stringify(baselineReport, undefined, 2), 'utf-8');
-    fs.writeFileSync(newJsonPath, JSON.stringify(candidateReport, undefined, 2), 'utf-8');
-
-    return {
-        oldJsonPath,
-        newJsonPath,
-        ...writeBenchmarkComparisonArtifacts(outputDir, comparison),
-    };
-}
-
-function validateBenchmarkReportPair<ResultT>(
-    baselineReport: BenchmarkReport<ResultT>,
-    candidateReport: BenchmarkReport<ResultT>
-): void {
-    validateBenchmarkReport(baselineReport, 'baseline');
-    validateBenchmarkReport(candidateReport, 'candidate');
-
-    if (baselineReport.suiteName !== candidateReport.suiteName) {
-        throw new Error(
-            `Cannot compare benchmark reports for different suites: ${baselineReport.suiteName}, ${candidateReport.suiteName}`
-        );
-    }
-}
-
-function validateBenchmarkReport<ResultT>(report: BenchmarkReport<ResultT>, label: string): void {
-    if (report.schemaVersion !== benchmarkReportSchemaVersion) {
-        throw new Error(
-            `Unsupported ${label} benchmark report schema version ${report.schemaVersion}; expected ${benchmarkReportSchemaVersion}.`
-        );
-    }
-}
-
-function compareBenchmarkResult<ResultT>(
-    key: string,
-    baselineResult: ResultT,
-    candidateResult: ResultT,
-    metrics: ReadonlyArray<BenchmarkMetricDefinition<ResultT>>
-): BenchmarkResultComparison {
-    return {
-        key,
-        metrics: metrics.flatMap((metric) => {
-            const baselineValue = metric.getValue(baselineResult);
-            const candidateValue = metric.getValue(candidateResult);
-
-            if (baselineValue === undefined || candidateValue === undefined) {
-                return [];
-            }
-
-            const absoluteDelta = candidateValue - baselineValue;
-            return [
-                {
-                    metric: metric.name,
-                    baselineValue,
-                    candidateValue,
-                    absoluteDelta,
-                    percentDelta: calculatePercentDelta(baselineValue, candidateValue),
-                    direction: getMetricDirection(absoluteDelta, metric),
-                },
-            ];
-        }),
-    };
-}
-
-function getMetricDirection<ResultT>(
-    absoluteDelta: number,
-    metric: BenchmarkMetricDefinition<ResultT>
-): BenchmarkMetricDirection {
-    const minAbsoluteDelta = metric.minAbsoluteDelta ?? 0;
-
-    if (Math.abs(absoluteDelta) <= minAbsoluteDelta) {
-        return 'unchanged';
-    }
-
-    const lowerIsBetter = metric.lowerIsBetter ?? true;
-    const isHigher = absoluteDelta > 0;
-
-    return lowerIsBetter === isHigher ? 'regression' : 'improvement';
-}
-
-function indexResultsByKey<ResultT>(
-    results: ReadonlyArray<ResultT>,
-    getKey: (result: ResultT) => string
-): Map<string, ResultT> {
-    const resultsByKey = new Map<string, ResultT>();
-
-    for (const result of results) {
-        const key = getKey(result);
-        if (resultsByKey.has(key)) {
-            throw new Error(`Duplicate benchmark result key: ${key}`);
-        }
-
-        resultsByKey.set(key, result);
-    }
-
-    return resultsByKey;
-}
-
-function formatMetric(value: number): string {
-    return value.toFixed(2);
-}
-
-function formatPercent(value: number | undefined): string {
-    return value === undefined ? 'n/a' : `${value.toFixed(2)}%`;
-}
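Reviewer note: the warn/fail classification removed above can be summarized with a reduced sketch. This version is an assumption-laden simplification: it handles only lower-is-better metrics and percent thresholds, whereas the removed code also accepted absolute thresholds and read the direction from the metric definition. `classify` and the `Thresholds` fields are hypothetical names chosen for brevity.

```typescript
// Simplified, standalone sketch of the warn/fail classification.
// `classify` and `Thresholds` are hypothetical; only percent thresholds
// and lower-is-better metrics are modeled here.
type Severity = 'none' | 'warning' | 'failure';

interface Thresholds {
    warnPct?: number;
    failPct?: number;
    minAbsolute?: number;
}

function classify(baseline: number, candidate: number, thresholds: Thresholds): Severity {
    const delta = candidate - baseline;

    // Improvements and unchanged values never trigger a threshold.
    if (delta <= 0) {
        return 'none';
    }

    // Small absolute changes are treated as noise, whatever the percentage.
    if (delta < (thresholds.minAbsolute ?? 0)) {
        return 'none';
    }

    // With a zero baseline the percent change is undefined, so a
    // percent-only rule cannot fire.
    if (baseline === 0) {
        return 'none';
    }

    const pct = (delta / Math.abs(baseline)) * 100;

    // Check the stricter threshold first so a failure is not reported as a warning.
    if (thresholds.failPct !== undefined && pct >= thresholds.failPct) {
        return 'failure';
    }
    if (thresholds.warnPct !== undefined && pct >= thresholds.warnPct) {
        return 'warning';
    }
    return 'none';
}

const thresholds: Thresholds = { warnPct: 5, failPct: 10, minAbsolute: 5 };
console.log(classify(100, 106, thresholds)); // warning
console.log(classify(100, 112, thresholds)); // failure
console.log(classify(100, 104, thresholds)); // none (below the absolute floor)
```

The inputs mirror the `warning_case`, `failure_case`, and `small_absolute_case` rows from the removed threshold test, which is why a 4 ms change is ignored even though no percent rule excludes it.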
diff --git a/packages/pyright-internal/src/tests/benchmarks/benchmarkUtils.ts b/packages/pyright-internal/src/tests/benchmarks/benchmarkUtils.ts
deleted file mode 100644
index 95836d2bd37e..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/benchmarkUtils.ts
+++ /dev/null
@@ -1,238 +0,0 @@
-import { execFileSync } from 'child_process';
-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
-
-import { ImportResolver } from '../../analyzer/importResolver';
-import { Program } from '../../analyzer/program';
-import { ConfigOptions } from '../../common/configOptions';
-import { NullConsole } from '../../common/console';
-import { DiagnosticCategory } from '../../common/diagnostic';
-import { FullAccessHost } from '../../common/fullAccessHost';
-import { RealTempFile, createFromRealFileSystem } from '../../common/realFileSystem';
-import { createServiceProvider } from '../../common/serviceProviderExtensions';
-import { TimingStatsSnapshot, timingStats } from '../../common/timing';
-import { UriEx } from '../../common/uri/uriUtils';
-
-export interface BenchmarkStats {
-    median: number;
-    p95: number;
-    min: number;
-    max: number;
-    avg: number;
-}
-
-export interface BenchmarkSystemInfo {
-    platform: string;
-    arch: string;
-    cpus: string;
-    cpuCount: number;
-    totalMemoryMB: number;
-    nodeVersion: string;
-}
-
-export interface BenchmarkReport<ResultT> {
-    schemaVersion: number;
-    suiteName: string;
-    timestamp: string;
-    system: BenchmarkSystemInfo;
-    config: {
-        warmupIterations: number;
-        benchmarkIterations: number;
-    };
-    results: ResultT[];
-}
-
-export interface TypeAnalysisSummary {
-    timing: TimingStatsSnapshot;
-    diagnosticCount: number;
-    errorCount: number;
-    warningCount: number;
-    informationCount: number;
-    statementCount: number;
-}
-
-export const benchmarkDataDir = path.resolve(__dirname, '..', 'benchmarkData');
-export const benchmarkResultsDir = path.join(__dirname, '.generated', 'benchmark-results');
-export const benchmarkReportSchemaVersion = 1;
-
-export function calculateStats(times: ReadonlyArray<number>): BenchmarkStats {
-    if (times.length === 0) {
-        throw new Error('Cannot calculate benchmark stats for an empty sample set.');
-    }
-
-    const sorted = [...times].sort((a, b) => a - b);
-    const len = sorted.length;
-
-    const median = len % 2 === 0 ? (sorted[len / 2 - 1] + sorted[len / 2]) / 2 : sorted[Math.floor(len / 2)];
-    const p95Index = Math.ceil(len * 0.95) - 1;
-    const p95 = sorted[Math.min(p95Index, len - 1)];
-    const min = sorted[0];
-    const max = sorted[len - 1];
-    const avg = times.reduce((a, b) => a + b, 0) / len;
-
-    return { median, p95, min, max, avg };
-}
-
-export function loadBenchmarkCorpus(filename: string): string {
-    const filePath = path.resolve(benchmarkDataDir, filename);
-    return fs.readFileSync(filePath, 'utf-8');
-}
-
-export function getSystemInfo(): BenchmarkSystemInfo {
-    const cpus = os.cpus();
-    return {
-        platform: os.platform(),
-        arch: os.arch(),
-        cpus: cpus[0]?.model ?? 'unknown',
-        cpuCount: cpus.length,
-        totalMemoryMB: Math.round(os.totalmem() / (1024 * 1024)),
-        nodeVersion: process.version,
-    };
-}
-
-export function createBenchmarkReport<ResultT>(
-    suiteName: string,
-    warmupIterations: number,
-    benchmarkIterations: number,
-    results: ResultT[]
-): BenchmarkReport<ResultT> {
-    return {
-        schemaVersion: benchmarkReportSchemaVersion,
-        suiteName,
-        timestamp: new Date().toISOString(),
-        system: getSystemInfo(),
-        config: {
-            warmupIterations,
-            benchmarkIterations,
-        },
-        results,
-    };
-}
-
-export function writeBenchmarkReport<ResultT>(
-    suiteName: string,
-    filePrefix: string,
-    report: BenchmarkReport<ResultT>
-): string {
-    const outputDir = path.join(benchmarkResultsDir, suiteName);
-    fs.mkdirSync(outputDir, { recursive: true });
-
-    const filename = `${filePrefix}-${new Date().toISOString().replace(/[:.]/g, '-')}.json`;
-    const outputPath = path.join(outputDir, filename);
-    fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8');
-    console.log(`\nBenchmark results written to: ${outputPath}`);
-
-    return outputPath;
-}
-
-export function formatCount(value: number): string {
-    return Math.round(value).toLocaleString();
-}
-
-export function getChildProcessOutput(error: unknown): string {
-    if (!(error instanceof Error)) {
-        return '';
-    }
-
-    const stdout = 'stdout' in error && typeof error.stdout === 'string' ? error.stdout : '';
-    const stderr = 'stderr' in error && typeof error.stderr === 'string' ? error.stderr : '';
-    return [stdout, stderr].filter((part) => part.length > 0).join('\n');
-}
-
-export function escapeRegExp(text: string): string {
-    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
-}
-
-export function runJestBenchmarkInFreshProcess<ResultT>(
-    testFilePath: string,
-    suiteName: string,
-    testName: string,
-    resultPrefix: string,
-    childModeEnv: string
-): ResultT {
-    const jestBinPath = path.resolve(__dirname, '..', '..', '..', 'node_modules', 'jest', 'bin', 'jest.js');
-
-    try {
-        const output = execFileSync(
-            process.execPath,
-            [
-                jestBinPath,
-                testFilePath,
-                '--runInBand',
-                '--forceExit',
-                '--testTimeout=300000',
-                '--testNamePattern',
-                `^${suiteName} ${escapeRegExp(testName)}$`,
-            ],
-            {
-                cwd: path.resolve(__dirname, '..', '..', '..'),
-                encoding: 'utf-8',
-                env: {
-                    ...process.env,
-                    [childModeEnv]: '1',
-                },
-            }
-        );
-
-        const resultLine = output.split(/\r?\n/).find((line) => line.startsWith(resultPrefix));
-
-        if (!resultLine) {
-            throw new Error(`Child benchmark for "${testName}" did not emit a result.\n${output}`);
-        }
-
-        return JSON.parse(resultLine.slice(resultPrefix.length)) as ResultT;
-    } catch (error) {
-        const output = getChildProcessOutput(error);
-        const message = error instanceof Error ? error.message : String(error);
-        throw new Error(`Child benchmark for "${testName}" failed.\n${message}${output ? `\n${output}` : ''}`);
-    }
-}
-
-export function analyzeBenchmarkSource(source: string, fileName: string): TypeAnalysisSummary {
-    (global as any).__rootDirectory = path.resolve(__dirname, '..', '..', '..');
-
-    const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-'));
-    const filePath = path.join(tempDir, fileName);
-    fs.writeFileSync(filePath, source, 'utf-8');
-
-    const tempFile = new RealTempFile();
-    const fileSystem = createFromRealFileSystem(tempFile);
-    const serviceProvider = createServiceProvider(fileSystem, new NullConsole(), tempFile);
-    const configOptions = new ConfigOptions(UriEx.file(tempDir));
-    configOptions.internalTestMode = true;
-
-    const importResolver = new ImportResolver(serviceProvider, configOptions, new FullAccessHost(serviceProvider));
-    const program = new Program(importResolver, configOptions, serviceProvider);
-    const fileUri = UriEx.file(filePath);
-    const startTiming = timingStats.getSnapshot();
-
-    try {
-        program.setTrackedFiles([fileUri]);
-
-        while (program.analyze()) {
-            // Continue until analysis completes.
-        }
-
-        const sourceFile = program.getSourceFile(fileUri);
-        if (!sourceFile) {
-            throw new Error(`Could not analyze generated benchmark file ${filePath}`);
-        }
-
-        const diagnostics = sourceFile.getDiagnostics(configOptions) ?? [];
-        const parseResults = sourceFile.getParseResults();
-
-        return {
-            timing: timingStats.getSnapshotDelta(startTiming),
-            diagnosticCount: diagnostics.length,
-            errorCount: diagnostics.filter((diag) => diag.category === DiagnosticCategory.Error).length,
-            warningCount: diagnostics.filter((diag) => diag.category === DiagnosticCategory.Warning).length,
-            informationCount: diagnostics.filter((diag) => diag.category === DiagnosticCategory.Information).length,
-            statementCount: parseResults?.parserOutput.parseTree.d.statements.length ?? 0,
-        };
-    } finally {
-        program.dispose();
-        serviceProvider.dispose();
-        fs.rmSync(tempDir, { force: true, recursive: true });
-    }
-}
diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.generated.json b/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.generated.json
deleted file mode 100644
index 6d2f1892938c..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.generated.json
+++ /dev/null
@@ -1,127 +0,0 @@
-[
-  {
-    "name": "attrs",
-    "mypyPrimerProject": "attrs",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/python-attrs/attrs",
-    "pyrightCommand": "{pyright}"
-  },
-  {
-    "name": "black",
-    "mypyPrimerProject": "black",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/psf/black",
-    "pyrightCommand": "{pyright} {paths}",
-    "paths": [
-      "src"
-    ]
-  },
-  {
-    "name": "django-modern-rest",
-    "mypyPrimerProject": "django-modern-rest",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/wemake-services/django-modern-rest",
-    "pyrightCommand": "{pyright}",
-    "paths": [
-      "dmr"
-    ]
-  },
-  {
-    "name": "mypy_primer",
-    "mypyPrimerProject": "mypy_primer",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/hauntsaninja/mypy_primer",
-    "pyrightCommand": "{pyright} {paths}",
-    "paths": [
-      "."
-    ]
-  },
-  {
-    "name": "packaging",
-    "mypyPrimerProject": "packaging",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/pypa/packaging",
-    "pyrightCommand": "{pyright} {paths}",
-    "paths": [
-      "src"
-    ]
-  },
-  {
-    "name": "pandas",
-    "mypyPrimerProject": "pandas",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/pandas-dev/pandas",
-    "pyrightCommand": "{pyright} {paths}",
-    "paths": [
-      "pandas"
-    ]
-  },
-  {
-    "name": "pydantic",
-    "mypyPrimerProject": "pydantic",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/pydantic/pydantic",
-    "pyrightCommand": "{pyright} {paths}",
-    "paths": [
-      "pydantic"
-    ]
-  },
-  {
-    "name": "pytest",
-    "mypyPrimerProject": "pytest",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/pytest-dev/pytest",
-    "pyrightCommand": "{pyright} {paths}",
-    "paths": [
-      "src",
-      "testing"
-    ]
-  },
-  {
-    "name": "python-chess",
-    "mypyPrimerProject": "python-chess",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/niklasf/python-chess",
-    "pyrightCommand": "{pyright} {paths}",
-    "paths": [
-      "chess"
-    ]
-  },
-  {
-    "name": "rich",
-    "mypyPrimerProject": "rich",
-    "source": {
-      "kind": "mypy-primer",
-      "inputFile": "mypy_primer.smoke_projects.snapshot.py"
-    },
-    "location": "https://github.com/Textualize/rich",
-    "pyrightCommand": "{pyright}"
-  }
-]
diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.overrides.json b/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.overrides.json
deleted file mode 100644
index 92bd86ed4ba9..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.overrides.json
+++ /dev/null
@@ -1,75 +0,0 @@
-{
-  "black": {
-    "includeInSmoke": true,
-    "smokeOrder": 0,
-    "cost": "medium",
-    "tags": ["parser-heavy", "typed-library"],
-    "reason": "Parser-heavy practical codebase with broad syntax coverage."
-  },
-  "pytest": {
-    "includeInSmoke": true,
-    "smokeOrder": 1,
-    "cost": "large",
-    "tags": ["dynamic", "plugins", "typed-library"],
-    "reason": "Large dynamic project with plugin patterns and pragmatic typing."
-  },
-  "attrs": {
-    "includeInSmoke": true,
-    "smokeOrder": 2,
-    "cost": "small",
-    "tags": ["dataclass-like", "decorators", "typed-library"],
-    "reason": "Dataclass-like decorator patterns with stable runtime.",
-    "sourcePaths": ["src"]
-  },
-  "pydantic": {
-    "includeInSmoke": true,
-    "smokeOrder": 3,
-    "cost": "medium",
-    "tags": ["decorators", "generics", "pydantic", "typed-library"],
-    "reason": "Decorator-heavy validation models with generics and dataclass-like transforms."
-  },
-  "python-chess": {
-    "includeInSmoke": true,
-    "smokeOrder": 4,
-    "cost": "small",
-    "tags": ["typed-library"],
-    "reason": "Clean typed library with a useful expected-success signal."
-  },
-  "packaging": {
-    "includeInSmoke": true,
-    "smokeOrder": 5,
-    "cost": "small",
-    "tags": ["typed-library"],
-    "reason": "Small stable baseline project for low-noise smoke runs."
-  },
-  "rich": {
-    "includeInSmoke": true,
-    "smokeOrder": 6,
-    "cost": "medium",
-    "tags": ["typed-library"],
-    "reason": "Practical typed library with meaningful module structure.",
-    "sourcePaths": ["rich"]
-  },
-  "mypy_primer": {
-    "includeInSmoke": true,
-    "smokeOrder": 7,
-    "cost": "small",
-    "tags": ["typed-library"],
-    "reason": "Typed tool codebase that anchors compatibility with the source project manifest.",
-    "sourcePaths": ["mypy_primer"]
-  },
-  "django-modern-rest": {
-    "includeInSmoke": true,
-    "smokeOrder": 8,
-    "cost": "medium",
-    "tags": ["django", "pydantic", "web"],
-    "reason": "Web project with Django-style and pydantic-style patterns."
-  },
-  "pandas": {
-    "includeInSmoke": true,
-    "smokeOrder": 9,
-    "cost": "large",
-    "tags": ["data-science", "large", "overloads", "stubs-heavy"],
-    "reason": "Data-science project that stresses overloads, stubs, and large-project behavior."
-  }
-}
\ No newline at end of file
diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeBenchmark.test.ts
deleted file mode 100644
index 7cd1a0261dc8..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeBenchmark.test.ts
+++ /dev/null
@@ -1,90 +0,0 @@
-/*
- * ecosystemSmokeBenchmark.test.ts
- * Copyright (c) Microsoft Corporation.
- *
- * Sanity checks and artifact emission for the curated ecosystem smoke benchmark manifest.
- */
-
-import { createBenchmarkReport, writeBenchmarkReport } from './benchmarkUtils';
-import {
-    ecosystemSmokeProjects,
-    getEcosystemSmokeProjectNames,
-    getEcosystemSmokeProjectTags,
-    getGeneratedEcosystemProject,
-    selectEcosystemSmokeProjects,
-} from './ecosystemSmokeProjects';
-
-const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS';
-
-interface EcosystemSmokeManifestResult {
-    suiteName: string;
-    projectCount: number;
-    tags: string[];
-    projects: typeof ecosystemSmokeProjects;
-}
-
-const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip;
-
-benchmarkSuite('Ecosystem Smoke Manifest', () => {
-    test('validates curated project metadata', () => {
-        const projectNames = getEcosystemSmokeProjectNames();
-        const uniqueProjectNames = new Set(projectNames);
-
-        expect(ecosystemSmokeProjects).toHaveLength(10);
-        expect(uniqueProjectNames.size).toBe(projectNames.length);
-        expect(projectNames).toEqual([
-            'black',
-            'pytest',
-            'attrs',
-            'pydantic',
-            'python-chess',
-            'packaging',
-            'rich',
-            'mypy_primer',
-            'django-modern-rest',
-            'pandas',
-        ]);
-
-        for (const project of ecosystemSmokeProjects) {
-            expect(project.mypyPrimerProject).toBeTruthy();
-            expect(project.tags.length).toBeGreaterThan(0);
-            expect(project.reason).toBeTruthy();
-        }
-
-        const result: EcosystemSmokeManifestResult = {
-            suiteName: 'ecosystem-smoke',
-            projectCount: ecosystemSmokeProjects.length,
-            tags: getEcosystemSmokeProjectTags(),
-            projects: ecosystemSmokeProjects,
-        };
-
-        writeBenchmarkReport(
-            'ecosystem-smoke',
-            'ecosystem-smoke-projects',
-            createBenchmarkReport('ecosystem-smoke', 0, 0, [result])
-        );
-    });
-
-    test('selects projects by tag, pattern, and shard', () => {
-        expect(selectEcosystemSmokeProjects({ tag: 'overloads' }).map((project) => project.name)).toEqual(['pandas']);
-        expect(
-            selectEcosystemSmokeProjects({ projectPattern: /django|pandas/ }).map((project) => project.name)
-        ).toEqual(['django-modern-rest', 'pandas']);
-
-        const shard0 = selectEcosystemSmokeProjects({ numShards: 2, shardIndex: 0 }).map((project) => project.name);
-        const shard1 = selectEcosystemSmokeProjects({ numShards: 2, shardIndex: 1 }).map((project) => project.name);
-        const combinedShards = [...shard0, ...shard1].sort();
-
-        expect(shard0).toEqual(['black', 'attrs', 'python-chess', 'rich', 'django-modern-rest']);
-        expect(shard1).toEqual(['pytest', 'pydantic', 'packaging', 'mypy_primer', 'pandas']);
-        expect(combinedShards).toEqual(getEcosystemSmokeProjectNames().sort());
-        expect(() => selectEcosystemSmokeProjects({ numShards: 2, shardIndex: 2 })).toThrow('shardIndex');
-    });
-
-    test('applies smoke overrides to generated source roots for pathless upstream projects', () => {
-        expect(getGeneratedEcosystemProject('attrs')?.paths).toEqual(['src']);
-        expect(getGeneratedEcosystemProject('rich')?.paths).toEqual(['rich']);
-        expect(getGeneratedEcosystemProject('mypy_primer')?.paths).toEqual(['mypy_primer']);
-        expect(getGeneratedEcosystemProject('black')?.paths).toEqual(['src']);
-    });
-});
diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeProjects.ts b/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeProjects.ts
deleted file mode 100644
index 1a54cb174ada..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeProjects.ts
+++ /dev/null
@@ -1,186 +0,0 @@
-import * as fs from 'fs';
-import * as path from 'path';
-
-import { GeneratedEcosystemProject } from './syncMypyPrimerProjects';
-
-export type EcosystemProjectCost = 'small' | 'medium' | 'large';
-
-export type EcosystemProjectTag =
-    | 'data-science'
-    | 'dataclass-like'
-    | 'decorators'
-    | 'django'
-    | 'dynamic'
-    | 'generics'
-    | 'large'
-    | 'overloads'
-    | 'parser-heavy'
-    | 'plugins'
-    | 'pydantic'
-    | 'stubs-heavy'
-    | 'typed-library'
-    | 'web';
-
-export interface EcosystemSmokeProject {
-    name: string;
-    mypyPrimerProject: string;
-    cost: EcosystemProjectCost;
-    tags: EcosystemProjectTag[];
-    reason: string;
-}
-
-export interface EcosystemSmokeProjectSelectionOptions {
-    tag?: EcosystemProjectTag;
-    projectPattern?: RegExp;
-    numShards?: number;
-    shardIndex?: number;
-}
-
-interface EcosystemProjectOverride {
-    includeInSmoke?: boolean;
-    smokeOrder?: number;
-    cost?: EcosystemProjectCost;
-    tags?: EcosystemProjectTag[];
-    reason?: string;
-    sourcePaths?: string[];
-}
-
-const generatedProjects = loadGeneratedProjects();
-const ecosystemProjectOverrides = loadProjectOverrides();
-
-const mergedGeneratedProjects = generatedProjects.map((project) => applyProjectOverrides(project));
-
-export const ecosystemSmokeProjects: readonly EcosystemSmokeProject[] = mergedGeneratedProjects
-    .map((project) => buildSmokeProject(project, ecosystemProjectOverrides[project.name]))
-    .filter((project): project is EcosystemSmokeProject => project !== undefined)
-    .sort((left, right) => getSmokeOrder(left.name) - getSmokeOrder(right.name));
-
-export function getEcosystemSmokeProjectNames(): string[] {
-    return ecosystemSmokeProjects.map((project) => project.name);
-}
-
-export function getGeneratedEcosystemProjects(): readonly GeneratedEcosystemProject[] {
-    return mergedGeneratedProjects;
-}
-
-export function getGeneratedEcosystemProject(projectName: string): GeneratedEcosystemProject | undefined {
-    return mergedGeneratedProjects.find((project) => project.name === projectName);
-}
-
-export function getEcosystemSmokeProjectsByTag(tag: EcosystemProjectTag): EcosystemSmokeProject[] {
-    return ecosystemSmokeProjects.filter((project) => project.tags.includes(tag));
-}
-
-export function getEcosystemSmokeProjectTags(): EcosystemProjectTag[] {
-    return Array.from(new Set(ecosystemSmokeProjects.flatMap((project) => project.tags))).sort();
-}
-
-export function selectEcosystemSmokeProjects(
-    options: EcosystemSmokeProjectSelectionOptions = {}
-): EcosystemSmokeProject[] {
-    const { tag, projectPattern, numShards, shardIndex } = options;
-    let projects = [...ecosystemSmokeProjects];
-
-    if (tag) {
-        projects = projects.filter((project) => project.tags.includes(tag));
-    }
-
-    if (projectPattern) {
-        projects = projects.filter((project) => matchesProjectPattern(projectPattern, project));
-    }
-
-    if (numShards !== undefined || shardIndex !== undefined) {
-        validateShardOptions(numShards, shardIndex);
-        projects = projects.filter((_, index) => index % numShards! === shardIndex);
-    }
-
-    return projects;
-}
-
-function matchesProjectPattern(pattern: RegExp, project: EcosystemSmokeProject): boolean {
-    pattern.lastIndex = 0;
-    const matchesName = pattern.test(project.name);
-    pattern.lastIndex = 0;
-    const matchesMypyPrimerProject = pattern.test(project.mypyPrimerProject);
-    pattern.lastIndex = 0;
-
-    return matchesName || matchesMypyPrimerProject;
-}
-
-function validateShardOptions(numShards: number | undefined, shardIndex: number | undefined): void {
-    if (numShards === undefined || shardIndex === undefined) {
-        throw new Error('Both numShards and shardIndex must be provided for ecosystem smoke project sharding.');
-    }
-
-    if (!Number.isInteger(numShards) || numShards <= 0) {
-        throw new Error('numShards must be a positive integer.');
-    }
-
-    if (!Number.isInteger(shardIndex) || shardIndex < 0 || shardIndex >= numShards) {
-        throw new Error('shardIndex must be an integer greater than or equal to 0 and less than numShards.');
-    }
-}
-
-function buildSmokeProject(
-    project: GeneratedEcosystemProject,
-    override: EcosystemProjectOverride | undefined
-): EcosystemSmokeProject | undefined {
-    if (!override?.includeInSmoke) {
-        return undefined;
-    }
-
-    if (!override.cost || !override.tags || override.tags.length === 0 || !override.reason) {
-        throw new Error(`Smoke project ${project.name} is missing required ecosystem metadata overrides.`);
-    }
-
-    return {
-        name: project.name,
-        mypyPrimerProject: project.mypyPrimerProject,
-        cost: override.cost,
-        tags: [...override.tags],
-        reason: override.reason,
-    };
-}
-
-function getSmokeOrder(projectName: string): number {
-    const smokeOrder = ecosystemProjectOverrides[projectName]?.smokeOrder;
-    if (smokeOrder === undefined) {
-        return Number.MAX_SAFE_INTEGER;
-    }
-
-    return smokeOrder;
-}
-
-function applyProjectOverrides(project: GeneratedEcosystemProject): GeneratedEcosystemProject {
-    const override = ecosystemProjectOverrides[project.name];
-    if (!override?.sourcePaths || override.sourcePaths.length === 0) {
-        return project;
-    }
-
-    return {
-        ...project,
-        paths: [...override.sourcePaths],
-    };
-}
-
-function loadGeneratedProjects(): GeneratedEcosystemProject[] {
-    return readJsonFile<GeneratedEcosystemProject[]>('ecosystem-projects.generated.json');
-}
-
-function loadProjectOverrides(): Record<string, EcosystemProjectOverride> {
-    return readJsonFile<Record<string, EcosystemProjectOverride>>('ecosystem-projects.overrides.json');
-}
-
-function readJsonFile<T>(filename: string): T {
-    const filePath = getBenchmarkFilePath(filename);
-    return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as T;
-}
-
-function getBenchmarkFilePath(filename: string): string {
-    const sourceFilePath = path.resolve(__dirname, filename);
-    if (fs.existsSync(sourceFilePath)) {
-        return sourceFilePath;
-    }
-
-    return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', filename);
-}
diff --git a/packages/pyright-internal/src/tests/benchmarks/evaluatorBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/evaluatorBenchmark.test.ts
deleted file mode 100644
index 0072374847c9..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/evaluatorBenchmark.test.ts
+++ /dev/null
@@ -1,213 +0,0 @@
-/*
- * evaluatorBenchmark.test.ts
- * Copyright (c) Microsoft Corporation.
- *
- * Synthetic type evaluator microbenchmarks.
- * Measures cold analysis time for generated Python cases that exercise evaluator-heavy paths.
- */
-
-import { TimingStatsSnapshot } from '../../common/timing';
-import {
-    TypeAnalysisSummary,
-    analyzeBenchmarkSource,
-    calculateStats,
-    createBenchmarkReport,
-    writeBenchmarkReport,
-} from './benchmarkUtils';
-import {
-    generateConstrainedTypeVarMatrixCase,
-    generateGenericAliasChainCase,
-    generateLiteralUnionMathCase,
-    generateOverloadUnionCrossProductCase,
-    generateProtocolMismatchCase,
-    generateRecursiveAliasCase,
-    generateTypedDictCase,
-} from './syntheticCases';
-
-const WARMUP_ITERATIONS = 1;
-const BENCHMARK_ITERATIONS = 5;
-const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS';
-
-interface BenchmarkCase {
-    name: string;
-    fileName: string;
-    scale: string;
-    code: string;
-    minDiagnosticCount: number;
-}
-
-interface BenchmarkResult {
-    caseName: string;
-    scale: string;
-    fileSizeBytes: number;
-    sourceLines: number;
-    iterations: number;
-    timesMs: number[];
-    medianMs: number;
-    p95Ms: number;
-    minMs: number;
-    maxMs: number;
-    avgMs: number;
-    diagnosticCount: number;
-    errorCount: number;
-    warningCount: number;
-    informationCount: number;
-    statementCount: number;
-    timing: TimingStatsSnapshot;
-}
-
-function benchmarkAnalyze(testCase: BenchmarkCase): BenchmarkResult {
-    const times: number[] = [];
-    let summary: TypeAnalysisSummary | undefined;
-
-    for (let i = 0; i < WARMUP_ITERATIONS; i++) {
-        analyzeBenchmarkSource(testCase.code, testCase.fileName);
-    }
-
-    for (let i = 0; i < BENCHMARK_ITERATIONS; i++) {
-        const start = performance.now();
-        summary = analyzeBenchmarkSource(testCase.code, testCase.fileName);
-        const elapsed = performance.now() - start;
-
-        times.push(elapsed);
-    }
-
-    if (!summary) {
-        throw new Error(`Benchmark case ${testCase.name} did not produce an analysis summary.`);
-    }
-
-    const stats = calculateStats(times);
-
-    return {
-        caseName: testCase.name,
-        scale: testCase.scale,
-        fileSizeBytes: Buffer.byteLength(testCase.code, 'utf-8'),
-        sourceLines: testCase.code.split('\n').length - 1,
-        iterations: BENCHMARK_ITERATIONS,
-        timesMs: times,
-        medianMs: stats.median,
-        p95Ms: stats.p95,
-        minMs: stats.min,
-        maxMs: stats.max,
-        avgMs: stats.avg,
-        diagnosticCount: summary.diagnosticCount,
-        errorCount: summary.errorCount,
-        warningCount: summary.warningCount,
-        informationCount: summary.informationCount,
-        statementCount: summary.statementCount,
-        timing: summary.timing,
-    };
-}
-
-function printResultTable(results: ReadonlyArray<BenchmarkResult>): void {
-    console.log('\n=== Evaluator Benchmark Results ===\n');
-    console.log(
-        `${'Case'.padEnd(34)} ${'Scale'.padEnd(12)} ${'Lines'.padStart(7)} ${'Diag'.padStart(5)} ${'Median'.padStart(
-            10
-        )} ${'Min'.padStart(10)} ${'Max'.padStart(10)} ${'Avg'.padStart(10)} ${'p95'.padStart(10)}`
-    );
-    console.log('-'.repeat(113));
-
-    for (const result of results) {
-        console.log(
-            `${result.caseName.padEnd(34)} ${result.scale.padEnd(12)} ${String(result.sourceLines).padStart(
-                7
-            )} ${String(result.diagnosticCount).padStart(5)} ${result.medianMs.toFixed(2).padStart(10)} ${result.minMs
-                .toFixed(2)
-                .padStart(10)} ${result.maxMs.toFixed(2).padStart(10)} ${result.avgMs
-                .toFixed(2)
-                .padStart(10)} ${result.p95Ms.toFixed(2).padStart(10)}`
-        );
-    }
-
-    console.log('');
-}
-
-const cases: BenchmarkCase[] = [
-    {
-        name: 'recursive_alias_depth',
-        fileName: 'recursiveAlias.py',
-        scale: 'depth=24',
-        code: generateRecursiveAliasCase(24),
-        minDiagnosticCount: 0,
-    },
-    {
-        name: 'overload_union_cross_product',
-        fileName: 'overloadUnionCrossProduct.py',
-        scale: '8x8',
-        code: generateOverloadUnionCrossProductCase(8),
-        minDiagnosticCount: 0,
-    },
-    {
-        name: 'protocol_many_members_mismatch',
-        fileName: 'protocolMismatch.py',
-        scale: 'members=40',
-        code: generateProtocolMismatchCase(40),
-        minDiagnosticCount: 1,
-    },
-    {
-        name: 'generic_alias_chain',
-        fileName: 'genericAliasChain.py',
-        scale: 'depth=32',
-        code: generateGenericAliasChainCase(32),
-        minDiagnosticCount: 0,
-    },
-    {
-        name: 'constrained_typevar_matrix',
-        fileName: 'constrainedTypeVarMatrix.py',
-        scale: '8x8',
-        code: generateConstrainedTypeVarMatrixCase(8),
-        minDiagnosticCount: 1,
-    },
-    {
-        name: 'literal_union_math',
-        fileName: 'literalUnionMath.py',
-        scale: 'width=64',
-        code: generateLiteralUnionMathCase(64),
-        minDiagnosticCount: 0,
-    },
-    {
-        name: 'typed_dict_many_keys',
-        fileName: 'typedDictManyKeys.py',
-        scale: 'keys=80',
-        code: generateTypedDictCase(80),
-        minDiagnosticCount: 0,
-    },
-];
-
-const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip;
-
-benchmarkSuite('Evaluator Benchmark', () => {
-    const allResults: BenchmarkResult[] = [];
-
-    for (const testCase of cases) {
-        test(`analyze ${testCase.name} ${testCase.scale}`, () => {
-            const result = benchmarkAnalyze(testCase);
-            allResults.push(result);
-
-            console.log(
-                `  ${testCase.name} ${testCase.scale}: median=${result.medianMs.toFixed(2)}ms, diagnostics=${
-                    result.diagnosticCount
-                }, check=${result.timing.typeCheck.totalTimeMs.toFixed(2)}ms, lines=${result.sourceLines}`
-            );
-
-            expect(result.statementCount).toBeGreaterThan(0);
-            expect(result.diagnosticCount).toBeGreaterThanOrEqual(testCase.minDiagnosticCount);
-            expect(result.medianMs).toBeLessThan(30000);
-        });
-    }
-
-    afterAll(() => {
-        if (allResults.length === 0) {
-            return;
-        }
-
-        printResultTable(allResults);
-
-        writeBenchmarkReport(
-            'evaluator',
-            'evaluator-benchmark',
-            createBenchmarkReport('evaluator', WARMUP_ITERATIONS, BENCHMARK_ITERATIONS, allResults)
-        );
-    });
-});
diff --git a/packages/pyright-internal/src/tests/benchmarks/mypy_primer.smoke_projects.snapshot.py b/packages/pyright-internal/src/tests/benchmarks/mypy_primer.smoke_projects.snapshot.py
deleted file mode 100644
index b75becfc7480..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/mypy_primer.smoke_projects.snapshot.py
+++ /dev/null
@@ -1,54 +0,0 @@
-from mypy_primer.model import Project
-
-
-def get_projects() -> list[Project]:
-    return [
-        Project(
-            location="https://github.com/hauntsaninja/mypy_primer",
-            pyright_cmd="{pyright} {paths}",
-            paths=["."],
-        ),
-        Project(
-            location="https://github.com/psf/black",
-            pyright_cmd="{pyright} {paths}",
-            paths=["src"],
-        ),
-        Project(
-            location="https://github.com/pytest-dev/pytest",
-            pyright_cmd="{pyright} {paths}",
-            paths=["src", "testing"],
-        ),
-        Project(
-            location="https://github.com/pandas-dev/pandas",
-            pyright_cmd="{pyright} {paths}",
-            paths=["pandas"],
-        ),
-        Project(
-            location="https://github.com/python-attrs/attrs",
-            pyright_cmd="{pyright}",
-        ),
-        Project(
-            location="https://github.com/Textualize/rich",
-            pyright_cmd="{pyright}",
-        ),
-        Project(
-            location="https://github.com/niklasf/python-chess",
-            pyright_cmd="{pyright} {paths}",
-            paths=["chess"],
-        ),
-        Project(
-            location="https://github.com/pypa/packaging",
-            pyright_cmd="{pyright} {paths}",
-            paths=["src"],
-        ),
-        Project(
-            location="https://github.com/pydantic/pydantic",
-            pyright_cmd="{pyright} {paths}",
-            paths=["pydantic"],
-        ),
-        Project(
-            location="https://github.com/wemake-services/django-modern-rest",
-            pyright_cmd="{pyright}",
-            paths=["dmr"],
-        ),
-    ]
diff --git a/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts
index 557415721546..2869777706cc 100644
--- a/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts
+++ b/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts
@@ -13,21 +13,20 @@
  *   src/tests/benchmarks/.generated/benchmark-results/parser/
  */
 
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+
 import { DiagnosticSink } from '../../common/diagnosticSink';
 import { ParseOptions, Parser } from '../../parser/parser';
-import {
-    calculateStats,
-    createBenchmarkReport,
-    formatCount,
-    loadBenchmarkCorpus,
-    writeBenchmarkReport,
-} from './benchmarkUtils';
 
 // --- Configuration ---
 
 const WARMUP_ITERATIONS = 3;
 const BENCHMARK_ITERATIONS = 10;
 
+const BENCHMARK_OUTPUT_DIR = path.join(__dirname, '.generated', 'benchmark-results', 'parser');
+
 // --- Types ---
 
 interface BenchmarkResult {
@@ -46,8 +45,70 @@ interface BenchmarkResult {
     errorCount: number;
 }
 
+interface BenchmarkReport {
+    timestamp: string;
+    system: {
+        platform: string;
+        arch: string;
+        cpus: string;
+        cpuCount: number;
+        totalMemoryMB: number;
+        nodeVersion: string;
+    };
+    config: {
+        warmupIterations: number;
+        benchmarkIterations: number;
+    };
+    results: BenchmarkResult[];
+}
+
 // --- Helpers ---
 
+function calculateStats(times: ReadonlyArray<number>): {
+    median: number;
+    p95: number;
+    min: number;
+    max: number;
+    avg: number;
+} {
+    const sorted = [...times].sort((a, b) => a - b);
+    const len = sorted.length;
+
+    const median = len % 2 === 0 ? (sorted[len / 2 - 1] + sorted[len / 2]) / 2 : sorted[Math.floor(len / 2)];
+    const p95Index = Math.ceil(len * 0.95) - 1;
+    const p95 = sorted[Math.min(p95Index, len - 1)];
+    const min = sorted[0];
+    const max = sorted[len - 1];
+    const avg = times.reduce((a, b) => a + b, 0) / len;
+
+    return { median, p95, min, max, avg };
+}
+
+function loadCorpus(filename: string): string {
+    const filePath = path.resolve(__dirname, '..', 'benchmarkData', filename);
+    return fs.readFileSync(filePath, 'utf-8');
+}
+
+function getSystemInfo(): BenchmarkReport['system'] {
+    const cpus = os.cpus();
+    return {
+        platform: os.platform(),
+        arch: os.arch(),
+        cpus: cpus[0]?.model ?? 'unknown',
+        cpuCount: cpus.length,
+        totalMemoryMB: Math.round(os.totalmem() / (1024 * 1024)),
+        nodeVersion: process.version,
+    };
+}
+
+function writeReport(report: BenchmarkReport): void {
+    fs.mkdirSync(BENCHMARK_OUTPUT_DIR, { recursive: true });
+    const filename = `parser-benchmark-${new Date().toISOString().replace(/[:.]/g, '-')}.json`;
+    const outputPath = path.join(BENCHMARK_OUTPUT_DIR, filename);
+    fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8');
+    console.log(`\nBenchmark results written to: ${outputPath}`);
+}
+
 function printResultTable(results: ReadonlyArray<BenchmarkResult>): void {
     console.log('\n=== Parser Benchmark Results ===\n');
     console.log(
@@ -66,9 +127,11 @@ function printResultTable(results: ReadonlyArray<BenchmarkResult>): void {
                 r.statementCount
             ).padStart(7)} ${String(r.errorCount).padStart(7)} ${r.medianMs.toFixed(2).padStart(10)} ${r.minMs
                 .toFixed(2)
-                .padStart(10)} ${r.maxMs.toFixed(2).padStart(10)} ${r.avgMs.toFixed(2).padStart(10)} ${formatCount(
+                .padStart(10)} ${r.maxMs.toFixed(2).padStart(10)} ${r.avgMs.toFixed(2).padStart(10)} ${Math.round(
                 r.nodesPerSec
-            ).padStart(12)}`
+            )
+                .toLocaleString()
+                .padStart(12)}`
         );
     }
     console.log('');
@@ -178,14 +241,14 @@ describe('Parser Benchmark', () => {
 
     for (const { name, file } of corpora) {
         test(`parse ${name}`, () => {
-            const code = loadBenchmarkCorpus(file);
+            const code = loadCorpus(file);
             const result = benchmarkParse(name, code);
             allResults.push(result);
 
             console.log(
                 `  ${name}: median=${result.medianMs.toFixed(2)}ms, nodes=${result.nodeCount}, stmts=${
                     result.statementCount
-                }, nodes/sec=${formatCount(result.nodesPerSec)}`
+                }, nodes/sec=${Math.round(result.nodesPerSec).toLocaleString()}`
             );
 
             // Sanity: parser should produce statements
@@ -196,7 +259,7 @@ describe('Parser Benchmark', () => {
     }
 
     test('scaled corpus (10x large_stdlib)', () => {
-        const base = loadBenchmarkCorpus('large_stdlib.py');
+        const base = loadCorpus('large_stdlib.py');
         const scaled = Array(10).fill(base).join('\n');
 
         const result = benchmarkParse('large_stdlib_10x', scaled);
@@ -205,7 +268,7 @@ describe('Parser Benchmark', () => {
         console.log(
             `  large_stdlib_10x: median=${result.medianMs.toFixed(2)}ms, nodes=${
                 result.nodeCount
-            }, nodes/sec=${formatCount(result.nodesPerSec)}`
+            }, nodes/sec=${Math.round(result.nodesPerSec).toLocaleString()}`
         );
 
         expect(result.statementCount).toBeGreaterThan(0);
@@ -218,10 +281,16 @@ describe('Parser Benchmark', () => {
 
         printResultTable(allResults);
 
-        writeBenchmarkReport(
-            'parser',
-            'parser-benchmark',
-            createBenchmarkReport('parser', WARMUP_ITERATIONS, BENCHMARK_ITERATIONS, allResults)
-        );
+        const report: BenchmarkReport = {
+            timestamp: new Date().toISOString(),
+            system: getSystemInfo(),
+            config: {
+                warmupIterations: WARMUP_ITERATIONS,
+                benchmarkIterations: BENCHMARK_ITERATIONS,
+            },
+            results: allResults,
+        };
+
+        writeReport(report);
     });
 });
diff --git a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.test.ts
deleted file mode 100644
index 8a19952b2b52..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.test.ts
+++ /dev/null
@@ -1,1024 +0,0 @@
-/*
- * runEcosystemBenchmark.test.ts
- * Copyright (c) Microsoft Corporation.
- *
- * Tests for the ecosystem benchmark runner entry point.
- */
-
-import { spawnSync } from 'child_process';
-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
-
-import { BenchmarkReport, benchmarkReportSchemaVersion } from './benchmarkUtils';
-import {
-    buildEcosystemBenchmarkManifest,
-    buildPyrightInvocation,
-    compareEcosystemBenchmarkReportData,
-    compareEcosystemBenchmarkReports,
-    EcosystemBenchmarkResult,
-    executePyrightProjectCommand,
-    getDefaultMainBaselineReportPath,
-    parseEcosystemBenchmarkArgs,
-    prepareEcosystemProjectCheckout,
-    runEcosystemBenchmark,
-    writeEcosystemBenchmarkManifest,
-    writeMainBaselineReport,
-    writeProjectPyrightConfig,
-} from './runEcosystemBenchmark';
-import { GeneratedEcosystemProject } from './syncMypyPrimerProjects';
-
-const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS';
-
-const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip;
-
-benchmarkSuite('Ecosystem Benchmark Runner', () => {
-    test('parses smoke runner arguments', () => {
-        const config = parseEcosystemBenchmarkArgs([
-            '--suite',
-            'smoke',
-            '--tag',
-            'overloads',
-            '--project',
-            'pandas',
-            '--num-shards',
-            '2',
-            '--shard-index',
-            '1',
-            '--project-date',
-            '2026-01-01',
-            '--output',
-            'artifacts/ecosystem-smoke',
-        ]);
-
-        expect(config.mode).toBe('select');
-        if (config.mode !== 'select') {
-            throw new Error('Expected selection mode.');
-        }
-
-        expect(config.suiteName).toBe('smoke');
-        expect(config.tag).toBe('overloads');
-        expect(config.projectPattern?.source).toBe('pandas');
-        expect(config.numShards).toBe(2);
-        expect(config.shardIndex).toBe(1);
-        expect(config.projectDate).toBe('2026-01-01');
-        expect(config.outputDir).toBe('artifacts/ecosystem-smoke');
-    });
-
-    test('builds a filtered smoke manifest', () => {
-        const config = parseEcosystemBenchmarkArgs(['--suite', 'smoke', '--tag', 'overloads', '--output', 'artifacts']);
-
-        expect(config.mode).toBe('select');
-        if (config.mode !== 'select') {
-            throw new Error('Expected selection mode.');
-        }
-
-        const manifest = buildEcosystemBenchmarkManifest(config);
-
-        expect(manifest.executionMode).toBe('selection-only');
-        expect(manifest.selectedProjectCount).toBe(1);
-        expect(manifest.selectedProjects.map((project) => project.name)).toEqual(['pandas']);
-    });
-
-    test('writes an ecosystem run manifest', () => {
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-runner-'));
-
-        try {
-            const config = parseEcosystemBenchmarkArgs([
-                '--suite',
-                'smoke',
-                '--project',
-                'django',
-                '--output',
-                outputDir,
-            ]);
-
-            expect(config.mode).toBe('select');
-            if (config.mode !== 'select') {
-                throw new Error('Expected selection mode.');
-            }
-
-            const manifest = buildEcosystemBenchmarkManifest(config);
-            const manifestPath = writeEcosystemBenchmarkManifest(outputDir, manifest);
-
-            expect(manifestPath).toBe(path.join(outputDir, 'ecosystem-run-manifest.json'));
-            expect(JSON.parse(fs.readFileSync(manifestPath, 'utf-8'))).toEqual(manifest);
-        } finally {
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('runs end to end and writes a manifest artifact', () => {
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-runner-main-'));
-
-        try {
-            const manifestPath = runEcosystemBenchmark([
-                '--suite',
-                'smoke',
-                '--tag',
-                'parser-heavy',
-                '--output',
-                outputDir,
-            ]);
-
-            expect(typeof manifestPath).toBe('string');
-
-            const manifest = JSON.parse(fs.readFileSync(manifestPath as string, 'utf-8'));
-
-            expect(manifest.selectedProjects.map((project: { name: string }) => project.name)).toEqual(['black']);
-        } finally {
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('rejects unsupported suite names', () => {
-        expect(() => parseEcosystemBenchmarkArgs(['--suite', 'full', '--output', 'artifacts'])).toThrow(
-            'Unsupported ecosystem benchmark suite'
-        );
-    });
-
-    test('parses comparison mode arguments', () => {
-        const config = parseEcosystemBenchmarkArgs([
-            '--baseline-report',
-            'old.json',
-            '--candidate-report',
-            'new.json',
-            '--output',
-            'artifacts',
-        ]);
-
-        expect(config).toEqual({
-            mode: 'compare',
-            baselineReportPath: 'old.json',
-            candidateReportPath: 'new.json',
-            outputDir: 'artifacts',
-        });
-    });
-
-    test('defaults comparison mode to the checked-in main baseline', () => {
-        const config = parseEcosystemBenchmarkArgs(['--candidate-report', 'new.json', '--output', 'artifacts']);
-
-        expect(config).toEqual({
-            mode: 'compare',
-            baselineReportPath: getDefaultMainBaselineReportPath(),
-            candidateReportPath: 'new.json',
-            outputDir: 'artifacts',
-        });
-    });
-
-    test('parses execution mode arguments', () => {
-        const config = parseEcosystemBenchmarkArgs([
-            '--suite',
-            'smoke',
-            '--project-root',
-            'q:/projects',
-            '--baseline-executable',
-            'node ./out/packages/pyright-internal/src/pyright.js',
-            '--output',
-            'artifacts',
-        ]);
-
-        expect(config).toEqual({
-            mode: 'execute',
-            suiteName: 'smoke',
-            outputDir: 'artifacts',
-            projectRoot: 'q:/projects',
-            projectDate: undefined,
-            tag: undefined,
-            projectPattern: undefined,
-            numShards: undefined,
-            shardIndex: undefined,
-            baselineExecutable: 'node ./out/packages/pyright-internal/src/pyright.js',
-            candidateExecutable: undefined,
-            mainBaselineReportPath: undefined,
-            baselineSourceCommit: undefined,
-            updateMainBaseline: undefined,
-            prepareProjects: undefined,
-            installDependencies: undefined,
-        });
-    });
-
-    test('parses main baseline source commit', () => {
-        const config = parseEcosystemBenchmarkArgs([
-            '--suite',
-            'smoke',
-            '--project-root',
-            'q:/projects',
-            '--baseline-executable',
-            'node ../pyright/index.js',
-            '--baseline-source-commit',
-            'abc123',
-            '--output',
-            'artifacts',
-        ]);
-
-        expect(config.mode).toBe('execute');
-        if (config.mode !== 'execute') {
-            throw new Error('Expected execution mode.');
-        }
-
-        expect(config.baselineSourceCommit).toBe('abc123');
-    });
-
-    test('parses project preparation flags', () => {
-        const config = parseEcosystemBenchmarkArgs([
-            '--suite',
-            'smoke',
-            '--project-root',
-            'q:/projects',
-            '--baseline-executable',
-            'node ../pyright/index.js',
-            '--prepare-projects',
-            '--install-dependencies',
-            '--output',
-            'artifacts',
-        ]);
-
-        expect(config.mode).toBe('execute');
-        if (config.mode !== 'execute') {
-            throw new Error('Expected execution mode.');
-        }
-
-        expect(config.prepareProjects).toBe(true);
-        expect(config.installDependencies).toBe(true);
-    });
-
-    test('builds a pyright invocation from project metadata', () => {
-        const invocation = buildPyrightInvocation(
-            'node ./dist/pyright.js',
-            {
-                name: 'black',
-                mypyPrimerProject: 'black',
-                source: { kind: 'mypy-primer' },
-                pyrightCommand: '{pyright} --lib {paths}',
-                paths: ['src', 'tests'],
-            },
-            'c:/temp/pyrightconfig.json'
-        );
-
-        expect(invocation.command).toBe('node');
-        expect(invocation.args).toEqual([
-            './dist/pyright.js',
-            '--lib',
-            '--outputjson',
-            '-p',
-            'c:/temp/pyrightconfig.json',
-        ]);
-    });
-
-    test('inserts a separator for node eval commands', () => {
-        const invocation = buildPyrightInvocation(
-            'node -e "require(\'./out/pyright.js\').main()"',
-            {
-                name: 'black',
-                mypyPrimerProject: 'black',
-                source: { kind: 'mypy-primer' },
-                pyrightCommand: '{pyright}',
-            },
-            'c:/temp/pyrightconfig.json'
-        );
-
-        expect(invocation.command).toBe('node');
-        expect(invocation.args).toEqual([
-            '-e',
-            "require('./out/pyright.js').main()",
-            '--',
-            '--outputjson',
-            '-p',
-            'c:/temp/pyrightconfig.json',
-        ]);
-    });
-
-    test('writes a project pyrightconfig.json with source-only includes', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-'));
-
-        try {
-            const configPath = writeProjectPyrightConfig(tempDir, {
-                name: 'pydantic',
-                mypyPrimerProject: 'pydantic',
-                source: { kind: 'mypy-primer' },
-                paths: ['src', 'tests', 'testdata'],
-            });
-            const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
-
-            expect(config.include).toEqual(['../src']);
-            expect(config.exclude).toContain('../**/tests');
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('falls back to configured paths when every path looks test-like', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-fallback-'));
-
-        try {
-            const configPath = writeProjectPyrightConfig(tempDir, {
-                name: 'example',
-                mypyPrimerProject: 'example',
-                source: { kind: 'mypy-primer' },
-                paths: ['tests'],
-            });
-            const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
-
-            expect(config.include).toEqual(['../tests']);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('extends an existing project pyrightconfig when writing benchmark config', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-extends-'));
-
-        try {
-            fs.writeFileSync(path.join(tempDir, 'pyrightconfig.json'), '{"typeCheckingMode":"strict"}', 'utf-8');
-
-            const configPath = writeProjectPyrightConfig(tempDir, {
-                name: 'example',
-                mypyPrimerProject: 'example',
-                source: { kind: 'mypy-primer' },
-                paths: ['src'],
-            });
-            const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
-
-            expect(config.extends).toBe('../pyrightconfig.json');
-            expect(config.include).toEqual(['../src']);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('merges pyproject tool pyright settings into benchmark config', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-pyproject-'));
-
-        try {
-            fs.writeFileSync(
-                path.join(tempDir, 'pyproject.toml'),
-                [
-                    '[tool.pyright]',
-                    'typeCheckingMode = "strict"',
-                    'include = ["tests"]',
-                    'extraPaths = ["typings"]',
-                    'stubPath = "stubs"',
-                ].join('\n'),
-                'utf-8'
-            );
-
-            const configPath = writeProjectPyrightConfig(tempDir, {
-                name: 'example',
-                mypyPrimerProject: 'example',
-                source: { kind: 'mypy-primer' },
-                paths: ['src'],
-            });
-            const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
-
-            expect(config.typeCheckingMode).toBe('strict');
-            expect(config.extraPaths).toEqual(['../typings']);
-            expect(config.stubPath).toBe('../stubs');
-            expect(config.include).toEqual(['../src']);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('executes a project command and captures benchmark results', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-execute-'));
-        const workingDirectory = path.join(tempDir, 'black');
-        const fakePyrightScriptPath = path.join(tempDir, 'fake-pyright.js');
-
-        try {
-            fs.mkdirSync(workingDirectory, { recursive: true });
-            fs.writeFileSync(
-                fakePyrightScriptPath,
-                [
-                    'const configArgIndex = process.argv.indexOf("-p");',
-                    'if (configArgIndex < 0) { throw new Error("missing -p"); }',
-                    'const fs = require("fs");',
-                    'const config = JSON.parse(fs.readFileSync(process.argv[configArgIndex + 1], "utf8"));',
-                    'if (JSON.stringify(config.include) !== JSON.stringify(["../src"])) {',
-                    '  throw new Error(`unexpected include paths: ${JSON.stringify(config.include)}`);',
-                    '}',
-                    'const result = {',
-                    '  generalDiagnostics: [{ severity: "error" }, { severity: "warning" }],',
-                    '  summary: {',
-                    '    filesAnalyzed: 3,',
-                    '    errorCount: 1,',
-                    '    warningCount: 1,',
-                    '    informationCount: 0,',
-                    '    timeInSec: 0.25',
-                    '  }',
-                    '};',
-                    'console.log(JSON.stringify(result));',
-                ].join('\n'),
-                'utf-8'
-            );
-
-            const result = executePyrightProjectCommand(
-                'black',
-                createGeneratedProject({
-                    pyrightCommand: `{pyright} "${fakePyrightScriptPath}" {paths}`,
-                    paths: ['src', 'tests'],
-                }),
-                workingDirectory,
-                process.execPath
-            );
-
-            expect(result.projectName).toBe('black');
-            expect(result.filesAnalyzed).toBe(3);
-            expect(result.diagnosticCount).toBe(2);
-            expect(result.errorCount).toBe(1);
-            expect(result.warningCount).toBe(1);
-            expect(result.informationCount).toBe(0);
-            expect(result.diagnostics).toEqual([
-                { file: undefined, severity: 'error', message: '' },
-                { file: undefined, severity: 'warning', message: '' },
-            ]);
-            expect(result.totalTimeMs).toBeGreaterThanOrEqual(0);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('prepares a project checkout from git metadata', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-prepare-'));
-        const sourceRepo = path.join(tempDir, 'source');
-        const checkoutDir = path.join(tempDir, 'checkout');
-
-        try {
-            fs.mkdirSync(sourceRepo, { recursive: true });
-            runGit(['init'], sourceRepo);
-            runGit(['config', 'core.autocrlf', 'false'], sourceRepo);
-            runGit(['config', 'user.email', 'pyright-benchmark@example.com'], sourceRepo);
-            runGit(['config', 'user.name', 'Pyright Benchmark'], sourceRepo);
-            fs.writeFileSync(path.join(sourceRepo, 'sample.py'), 'x = 1\n', 'utf-8');
-            runGit(['add', 'sample.py'], sourceRepo);
-            runGit(['commit', '-m', 'initial'], sourceRepo, {
-                GIT_AUTHOR_DATE: '2025-01-01T00:00:00Z',
-                GIT_COMMITTER_DATE: '2025-01-01T00:00:00Z',
-            });
-
-            prepareEcosystemProjectCheckout(
-                createGeneratedProject({ location: sourceRepo }),
-                checkoutDir,
-                '2026-01-01'
-            );
-
-            expect(fs.existsSync(path.join(checkoutDir, 'sample.py'))).toBe(true);
-            expect(runGit(['status', '--short'], checkoutDir)).toBe('');
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('includes command details when pyright emits no JSON', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-execute-error-'));
-        const workingDirectory = path.join(tempDir, 'black');
-        const fakePyrightScriptPath = path.join(tempDir, 'fake-pyright-error.js');
-
-        try {
-            fs.mkdirSync(workingDirectory, { recursive: true });
-            fs.writeFileSync(
-                fakePyrightScriptPath,
-                ['console.log("not json");', 'console.error("synthetic stderr");', 'process.exit(2);'].join('\n'),
-                'utf-8'
-            );
-
-            expect(() =>
-                executePyrightProjectCommand(
-                    'black',
-                    createGeneratedProject({
-                        pyrightCommand: `{pyright} "${fakePyrightScriptPath}" {paths}`,
-                        paths: ['src'],
-                    }),
-                    workingDirectory,
-                    process.execPath
-                )
-            ).toThrow(/Command: .*fake-pyright-error\.js[\s\S]*Exit status: 2[\s\S]*synthetic stderr/);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('resolves relative node script paths against the runner cwd during execution', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-relative-exec-'));
-        const workingDirectory = path.join(tempDir, 'projects', 'black');
-        const fakePyrightScriptPath = path.join(tempDir, 'fake-pyright-cli.js');
-        const previousCwd = process.cwd();
-
-        try {
-            fs.mkdirSync(workingDirectory, { recursive: true });
-            fs.writeFileSync(
-                fakePyrightScriptPath,
-                createFakePyrightScript({ errorCount: 0, warningCount: 0, informationCount: 0 }),
-                'utf-8'
-            );
-
-            process.chdir(tempDir);
-
-            const result = executePyrightProjectCommand(
-                'black',
-                createGeneratedProject({
-                    paths: ['src'],
-                }),
-                workingDirectory,
-                `"${process.execPath}" ./fake-pyright-cli.js`
-            );
-
-            expect(result.projectName).toBe('black');
-            expect(result.filesAnalyzed).toBe(3);
-            expect(result.diagnosticCount).toBe(0);
-        } finally {
-            process.chdir(previousCwd);
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('runs execution mode end to end and writes reports plus comparison artifacts', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-execution-main-'));
-        const projectRoot = tempDir;
-        const projectDir = path.join(projectRoot, 'black');
-        const outputDir = path.join(tempDir, 'artifacts');
-        const baselineScriptPath = path.join(tempDir, 'baseline-pyright.js');
-        const candidateScriptPath = path.join(tempDir, 'candidate-pyright.js');
-
-        try {
-            fs.mkdirSync(path.join(projectDir, 'src'), { recursive: true });
-            fs.writeFileSync(path.join(projectDir, 'src', 'sample.py'), 'x = 1\n', 'utf-8');
-
-            fs.writeFileSync(
-                baselineScriptPath,
-                createFakePyrightScript({ errorCount: 1, warningCount: 0, informationCount: 0 }),
-                'utf-8'
-            );
-            fs.writeFileSync(
-                candidateScriptPath,
-                createFakePyrightScript({ errorCount: 0, warningCount: 1, informationCount: 0 }),
-                'utf-8'
-            );
-
-            const artifactPaths = runEcosystemBenchmark([
-                '--suite',
-                'smoke',
-                '--tag',
-                'parser-heavy',
-                '--project-root',
-                projectRoot,
-                '--project-date',
-                '2026-01-01',
-                '--baseline-executable',
-                `"${process.execPath}" "${baselineScriptPath}"`,
-                '--candidate-executable',
-                `"${process.execPath}" "${candidateScriptPath}"`,
-                '--output',
-                outputDir,
-            ]);
-
-            expect(typeof artifactPaths).not.toBe('string');
-            expect(fs.existsSync((artifactPaths as { baselineReportPath: string }).baselineReportPath)).toBe(true);
-            expect(fs.existsSync((artifactPaths as { candidateReportPath: string }).candidateReportPath)).toBe(true);
-            expect(
-                fs.existsSync(
-                    (artifactPaths as { comparisonArtifactPaths: { jsonPath: string } }).comparisonArtifactPaths
-                        .jsonPath
-                )
-            ).toBe(true);
-
-            const baselineReport = JSON.parse(
-                fs.readFileSync((artifactPaths as { baselineReportPath: string }).baselineReportPath, 'utf-8')
-            );
-            const candidateReport = JSON.parse(
-                fs.readFileSync((artifactPaths as { candidateReportPath: string }).candidateReportPath, 'utf-8')
-            );
-
-            expect(baselineReport.results[0].filesAnalyzed).toBe(3);
-            expect(candidateReport.results[0].filesAnalyzed).toBe(3);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('compares candidate-only execution against a main baseline report when present', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-candidate-main-'));
-        const projectRoot = tempDir;
-        const projectDir = path.join(projectRoot, 'black');
-        const outputDir = path.join(tempDir, 'artifacts');
-        const candidateScriptPath = path.join(tempDir, 'candidate-pyright.js');
-        const mainBaselineReportPath = path.join(tempDir, 'baselines', 'ecosystem-smoke-main.json');
-
-        try {
-            fs.mkdirSync(path.join(projectDir, 'src'), { recursive: true });
-            fs.mkdirSync(path.dirname(mainBaselineReportPath), { recursive: true });
-            fs.writeFileSync(path.join(projectDir, 'src', 'sample.py'), 'x = 1\n', 'utf-8');
-            fs.writeFileSync(
-                candidateScriptPath,
-                createFakePyrightScript({ errorCount: 0, warningCount: 1, informationCount: 0 }),
-                'utf-8'
-            );
-            fs.writeFileSync(
-                mainBaselineReportPath,
-                JSON.stringify(
-                    createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [
-                        { projectName: 'black', diagnosticCount: 0, warningCount: 0 },
-                    ]),
-                    undefined,
-                    2
-                ),
-                'utf-8'
-            );
-
-            const artifactPaths = runEcosystemBenchmark([
-                '--suite',
-                'smoke',
-                '--tag',
-                'parser-heavy',
-                '--project-root',
-                projectRoot,
-                '--candidate-executable',
-                `"${process.execPath}" "${candidateScriptPath}"`,
-                '--main-baseline-report',
-                mainBaselineReportPath,
-                '--output',
-                outputDir,
-            ]);
-
-            expect(typeof artifactPaths).not.toBe('string');
-            expect((artifactPaths as { baselineReportPath?: string }).baselineReportPath).toBeUndefined();
-            expect(
-                fs.existsSync(
-                    (artifactPaths as { comparisonArtifactPaths: { jsonPath: string } }).comparisonArtifactPaths
-                        .jsonPath
-                )
-            ).toBe(true);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('copies execution baseline report into the main baseline path', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-main-baseline-'));
-        const projectRoot = tempDir;
-        const projectDir = path.join(projectRoot, 'black');
-        const outputDir = path.join(tempDir, 'artifacts');
-        const baselineScriptPath = path.join(tempDir, 'baseline-pyright.js');
-        const mainBaselineReportPath = path.join(tempDir, 'baselines', 'ecosystem-smoke-main.json');
-
-        try {
-            fs.mkdirSync(path.join(projectDir, 'src'), { recursive: true });
-            fs.writeFileSync(path.join(projectDir, 'src', 'sample.py'), 'x = 1\n', 'utf-8');
-            fs.writeFileSync(
-                baselineScriptPath,
-                createFakePyrightScript({ errorCount: 0, warningCount: 0, informationCount: 0 }),
-                'utf-8'
-            );
-
-            const artifactPaths = runEcosystemBenchmark([
-                '--suite',
-                'smoke',
-                '--tag',
-                'parser-heavy',
-                '--project-root',
-                projectRoot,
-                '--project-date',
-                '2026-01-01',
-                '--baseline-executable',
-                `"${process.execPath}" "${baselineScriptPath}"`,
-                '--update-main-baseline',
-                '--main-baseline-report',
-                mainBaselineReportPath,
-                '--baseline-source-commit',
-                'abc123',
-                '--output',
-                outputDir,
-            ]);
-
-            expect(typeof artifactPaths).not.toBe('string');
-            expect(fs.existsSync(mainBaselineReportPath)).toBe(true);
-            expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).results[0].projectName).toBe('black');
-            expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).mainBaseline.sourceCommit).toBe(
-                'abc123'
-            );
-            expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).mainBaseline.projectDate).toBe(
-                '2026-01-01'
-            );
-            expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).mainBaseline.configMode).toBe(
-                'generated-benchmark-config'
-            );
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('copies a report to a main baseline path', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-copy-baseline-'));
-        const sourceReportPath = path.join(tempDir, 'baseline-report.json');
-        const mainBaselineReportPath = path.join(tempDir, 'nested', 'ecosystem-smoke-main.json');
-
-        try {
-            fs.writeFileSync(sourceReportPath, '{"results":[]}', 'utf-8');
-
-            expect(writeMainBaselineReport(sourceReportPath, mainBaselineReportPath)).toBe(mainBaselineReportPath);
-            expect(fs.readFileSync(mainBaselineReportPath, 'utf-8')).toBe('{"results":[]}');
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('stamps copied main baseline metadata', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-stamp-baseline-'));
-        const sourceReportPath = path.join(tempDir, 'baseline-report.json');
-        const mainBaselineReportPath = path.join(tempDir, 'nested', 'ecosystem-smoke-main.json');
-
-        try {
-            fs.writeFileSync(sourceReportPath, '{"results":[]}', 'utf-8');
-
-            writeMainBaselineReport(sourceReportPath, mainBaselineReportPath, {
-                sourceCommit: 'abc123',
-                projectDate: '2026-01-01',
-                configMode: 'generated-benchmark-config',
-                refreshedAt: '2026-05-08T00:00:00.000Z',
-            });
-
-            expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8'))).toEqual({
-                results: [],
-                mainBaseline: {
-                    sourceCommit: 'abc123',
-                    projectDate: '2026-01-01',
-                    configMode: 'generated-benchmark-config',
-                    refreshedAt: '2026-05-08T00:00:00.000Z',
-                },
-            });
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('writes comparison artifacts from ecosystem benchmark reports', () => {
-        const reportsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-report-'));
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-compare-'));
-
-        try {
-            const baselinePath = path.join(reportsDir, 'old.json');
-            const candidatePath = path.join(reportsDir, 'new.json');
-
-            fs.writeFileSync(
-                baselinePath,
-                JSON.stringify(
-                    createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [
-                        { projectName: 'black', totalTimeMs: 100, maxMemoryMB: 250 },
-                    ]),
-                    undefined,
-                    2
-                ),
-                'utf-8'
-            );
-            fs.writeFileSync(
-                candidatePath,
-                JSON.stringify(
-                    createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [
-                        { projectName: 'black', totalTimeMs: 120, maxMemoryMB: 260 },
-                    ]),
-                    undefined,
-                    2
-                ),
-                'utf-8'
-            );
-
-            const artifactPaths = compareEcosystemBenchmarkReports(baselinePath, candidatePath, outputDir);
-
-            expect(JSON.parse(fs.readFileSync(artifactPaths.jsonPath, 'utf-8')).compared[0].key).toBe('black');
-            expect(fs.readFileSync(artifactPaths.markdownPath, 'utf-8')).toContain('Largest Regressions');
-            expect(JSON.parse(fs.readFileSync(artifactPaths.oldJsonPath, 'utf-8')).results[0].projectName).toBe(
-                'black'
-            );
-        } finally {
-            fs.rmSync(reportsDir, { force: true, recursive: true });
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('compares ecosystem diagnostic metrics when reports include them', () => {
-        const reportsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-diagnostics-'));
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-diagnostics-compare-'));
-
-        try {
-            const baselinePath = path.join(reportsDir, 'old.json');
-            const candidatePath = path.join(reportsDir, 'new.json');
-
-            fs.writeFileSync(
-                baselinePath,
-                JSON.stringify(
-                    createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [
-                        {
-                            projectName: 'black',
-                            diagnosticCount: 1,
-                            errorCount: 1,
-                            warningCount: 0,
-                            diagnostics: [{ file: 'src/a.py', severity: 'error', message: 'old diagnostic' }],
-                        },
-                    ]),
-                    undefined,
-                    2
-                ),
-                'utf-8'
-            );
-            fs.writeFileSync(
-                candidatePath,
-                JSON.stringify(
-                    createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [
-                        {
-                            projectName: 'black',
-                            diagnosticCount: 2,
-                            errorCount: 1,
-                            warningCount: 1,
-                            diagnostics: [
-                                { file: 'src/a.py', severity: 'error', message: 'old diagnostic' },
-                                { file: 'src/b.py', severity: 'warning', message: 'new diagnostic' },
-                            ],
-                        },
-                    ]),
-                    undefined,
-                    2
-                ),
-                'utf-8'
-            );
-
-            const artifactPaths = compareEcosystemBenchmarkReports(baselinePath, candidatePath, outputDir);
-            const comparison = JSON.parse(fs.readFileSync(artifactPaths.jsonPath, 'utf-8'));
-
-            expect(comparison.compared[0].metrics.map((metric: { metric: string }) => metric.metric)).toEqual([
-                'diagnosticCount',
-                'errorCount',
-                'warningCount',
-            ]);
-            expect(comparison.diagnosticDiffs).toEqual([
-                {
-                    projectName: 'black',
-                    added: ['warning | src/b.py | new diagnostic'],
-                    removed: [],
-                },
-            ]);
-            expect(fs.readFileSync(artifactPaths.markdownPath, 'utf-8')).toContain('diagnosticCount');
-            expect(fs.readFileSync(artifactPaths.markdownPath, 'utf-8')).toContain('## Diagnostic Diffs');
-        } finally {
-            fs.rmSync(reportsDir, { force: true, recursive: true });
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('builds diagnostic diffs from report data', () => {
-        const comparison = compareEcosystemBenchmarkReportData(
-            createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [
-                {
-                    projectName: 'black',
-                    diagnostics: [
-                        { file: 'src/a.py', severity: 'error', message: 'old diagnostic' },
-                        { file: 'src/stable.py', severity: 'warning', message: 'stable diagnostic' },
-                    ],
-                },
-            ]),
-            createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [
-                {
-                    projectName: 'black',
-                    diagnostics: [
-                        { file: 'src/b.py', severity: 'information', message: 'new diagnostic' },
-                        { file: 'src/stable.py', severity: 'warning', message: 'stable diagnostic' },
-                    ],
-                },
-            ])
-        );
-
-        expect(comparison.diagnosticDiffs).toEqual([
-            {
-                projectName: 'black',
-                added: ['information | src/b.py | new diagnostic'],
-                removed: ['error | src/a.py | old diagnostic'],
-            },
-        ]);
-    });
-
-    test('runs comparison mode end to end', () => {
-        const reportsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-report-main-'));
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-compare-main-'));
-
-        try {
-            const baselinePath = path.join(reportsDir, 'old.json');
-            const candidatePath = path.join(reportsDir, 'new.json');
-
-            fs.writeFileSync(
-                baselinePath,
-                JSON.stringify(
-                    createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [
-                        { projectName: 'black', totalTimeMs: 100 },
-                    ]),
-                    undefined,
-                    2
-                ),
-                'utf-8'
-            );
-            fs.writeFileSync(
-                candidatePath,
-                JSON.stringify(
-                    createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [
-                        { projectName: 'black', totalTimeMs: 95 },
-                    ]),
-                    undefined,
-                    2
-                ),
-                'utf-8'
-            );
-
-            const artifactPaths = runEcosystemBenchmark([
-                '--baseline-report',
-                baselinePath,
-                '--candidate-report',
-                candidatePath,
-                '--output',
-                outputDir,
-            ]);
-
-            expect(typeof artifactPaths).not.toBe('string');
-            expect(fs.existsSync((artifactPaths as { jsonPath: string }).jsonPath)).toBe(true);
-        } finally {
-            fs.rmSync(reportsDir, { force: true, recursive: true });
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-});
-
-function createEcosystemBenchmarkReport(
-    timestamp: string,
-    results: EcosystemBenchmarkResult[]
-): BenchmarkReport<EcosystemBenchmarkResult> {
-    return {
-        schemaVersion: benchmarkReportSchemaVersion,
-        suiteName: 'ecosystem-smoke',
-        timestamp,
-        system: {
-            platform: 'win32',
-            arch: 'x64',
-            cpus: 'test-cpu',
-            cpuCount: 8,
-            totalMemoryMB: 16384,
-            nodeVersion: process.version,
-        },
-        config: {
-            warmupIterations: 0,
-            benchmarkIterations: 1,
-        },
-        results,
-    };
-}
-
-function createGeneratedProject(overrides: Partial<GeneratedEcosystemProject> = {}): GeneratedEcosystemProject {
-    return {
-        name: 'black',
-        mypyPrimerProject: 'black',
-        source: { kind: 'mypy-primer' },
-        ...overrides,
-    };
-}
-
-function runGit(args: readonly string[], cwd: string, env: NodeJS.ProcessEnv = {}): string {
-    const result = spawnSync('git', args, {
-        cwd,
-        encoding: 'utf-8',
-        env: { ...process.env, ...env },
-    });
-
-    if (result.error) {
-        throw result.error;
-    }
-
-    if (result.status !== 0) {
-        throw new Error(
-            `git ${args.join(' ')} failed with ${result.status ?? 'unknown'}\n${result.stderr}\n${result.stdout}`
-        );
-    }
-
-    return result.stdout.trim();
-}
-
-function createFakePyrightScript(counts: {
-    errorCount: number;
-    warningCount: number;
-    informationCount: number;
-}): string {
-    const diagnosticEntries = [
-        ...Array.from({ length: counts.errorCount }, () => '{ severity: "error" }'),
-        ...Array.from({ length: counts.warningCount }, () => '{ severity: "warning" }'),
-        ...Array.from({ length: counts.informationCount }, () => '{ severity: "information" }'),
-    ].join(', ');
-
-    return [
-        'const result = {',
-        `  generalDiagnostics: [${diagnosticEntries}],`,
-        '  summary: {',
-        '    filesAnalyzed: 3,',
-        `    errorCount: ${counts.errorCount},`,
-        `    warningCount: ${counts.warningCount},`,
-        `    informationCount: ${counts.informationCount},`,
-        '    timeInSec: 0.25',
-        '  }',
-        '};',
-        'console.log(JSON.stringify(result));',
-    ].join('\n');
-}
diff --git a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.ts b/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.ts
deleted file mode 100644
index d22d0152b9f9..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.ts
+++ /dev/null
@@ -1,949 +0,0 @@
-import { spawnSync } from 'child_process';
-import commandLineArgs, { CommandLineOptions, OptionDefinition } from 'command-line-args';
-import * as fs from 'fs';
-import * as path from 'path';
-
-import { parse } from '../../common/tomlUtils';
-
-import {
-    BenchmarkMetricDefinition,
-    BenchmarkReportComparison,
-    BenchmarkReportComparisonArtifactPaths,
-    compareBenchmarkReports,
-    loadBenchmarkReport,
-    renderBenchmarkComparisonMarkdown,
-    writeBenchmarkReportComparisonArtifacts,
-} from './benchmarkComparison';
-import { BenchmarkReport, createBenchmarkReport } from './benchmarkUtils';
-import {
-    EcosystemProjectTag,
-    EcosystemSmokeProject,
-    getEcosystemSmokeProjectTags,
-    getGeneratedEcosystemProject,
-    selectEcosystemSmokeProjects,
-} from './ecosystemSmokeProjects';
-import { GeneratedEcosystemProject } from './syncMypyPrimerProjects';
-
-export interface EcosystemBenchmarkRunConfig {
-    mode: 'select';
-    suiteName: 'smoke';
-    outputDir: string;
-    projectDate?: string;
-    tag?: EcosystemProjectTag;
-    projectPattern?: RegExp;
-    numShards?: number;
-    shardIndex?: number;
-}
-
-export interface EcosystemBenchmarkComparisonConfig {
-    mode: 'compare';
-    baselineReportPath: string;
-    candidateReportPath: string;
-    outputDir: string;
-}
-
-export interface EcosystemBenchmarkExecutionConfig {
-    mode: 'execute';
-    suiteName: 'smoke';
-    outputDir: string;
-    projectRoot: string;
-    projectDate?: string;
-    tag?: EcosystemProjectTag;
-    projectPattern?: RegExp;
-    numShards?: number;
-    shardIndex?: number;
-    baselineExecutable?: string;
-    candidateExecutable?: string;
-    mainBaselineReportPath?: string;
-    baselineSourceCommit?: string;
-    updateMainBaseline?: boolean;
-    prepareProjects?: boolean;
-    installDependencies?: boolean;
-}
-
-export interface EcosystemBenchmarkResult {
-    projectName: string;
-    totalTimeMs?: number;
-    maxMemoryMB?: number;
-    filesAnalyzed?: number;
-    diagnosticCount?: number;
-    errorCount?: number;
-    warningCount?: number;
-    informationCount?: number;
-    diagnostics?: EcosystemBenchmarkDiagnostic[];
-}
-
-export interface EcosystemBenchmarkDiagnostic {
-    file?: string;
-    severity: string;
-    message: string;
-}
-
-export interface EcosystemBenchmarkDiagnosticDiff {
-    projectName: string;
-    added: string[];
-    removed: string[];
-}
-
-export interface EcosystemBenchmarkReportComparison extends BenchmarkReportComparison {
-    diagnosticDiffs: EcosystemBenchmarkDiagnosticDiff[];
-}
-
-export interface EcosystemBenchmarkManifest {
-    suiteName: 'smoke';
-    executionMode: 'selection-only' | 'command-execution';
-    outputDir: string;
-    projectDate?: string;
-    filters: {
-        tag?: EcosystemProjectTag;
-        projectPattern?: string;
-        numShards?: number;
-        shardIndex?: number;
-    };
-    selectedProjects: EcosystemSmokeProject[];
-    selectedProjectCount: number;
-    notes: string[];
-}
-
-export interface EcosystemBenchmarkExecutionArtifactPaths {
-    baselineReportPath?: string;
-    candidateReportPath?: string;
-    comparisonArtifactPaths?: BenchmarkReportComparisonArtifactPaths;
-}
-
-interface PyrightJsonResults {
-    generalDiagnostics: { file?: string; message?: string; severity: string }[];
-    summary: {
-        errorCount: number;
-        warningCount: number;
-        informationCount: number;
-        filesAnalyzed: number;
-        timeInSec: number;
-    };
-}
-
-interface ProjectPyrightConfigFile {
-    [key: string]: unknown;
-    extends?: string;
-    include: string[];
-    exclude: string[];
-}
-
-interface MainBaselineMetadata {
-    sourceCommit?: string;
-    projectDate?: string;
-    configMode: 'generated-benchmark-config';
-    refreshedAt: string;
-}
-
-export type EcosystemBenchmarkCommand =
-    | EcosystemBenchmarkRunConfig
-    | EcosystemBenchmarkComparisonConfig
-    | EcosystemBenchmarkExecutionConfig;
-
-const optionDefinitions: OptionDefinition[] = [
-    { name: 'suite', type: String },
-    { name: 'tag', type: String },
-    { name: 'project', type: String },
-    { name: 'num-shards', type: Number },
-    { name: 'shard-index', type: Number },
-    { name: 'project-date', type: String },
-    { name: 'project-root', type: String },
-    { name: 'baseline-executable', type: String },
-    { name: 'candidate-executable', type: String },
-    { name: 'baseline-report', type: String },
-    { name: 'candidate-report', type: String },
-    { name: 'main-baseline-report', type: String },
-    { name: 'update-main-baseline', type: Boolean },
-    { name: 'baseline-source-commit', type: String },
-    { name: 'prepare-projects', type: Boolean },
-    { name: 'install-dependencies', type: Boolean },
-    { name: 'output', type: String },
-];
-
-const benchmarkOwnedConfigKeys = new Set(['include', 'exclude', 'ignore', 'strict']);
-const pyrightPathArrayConfigKeys = new Set(['extraPaths']);
-const pyrightPathStringConfigKeys = new Set(['stubPath', 'typeshedPath', 'venvPath']);
-
-const ecosystemBenchmarkComparisonMetrics: readonly BenchmarkMetricDefinition<EcosystemBenchmarkResult>[] = [
-    { name: 'totalTimeMs', getValue: (result) => result.totalTimeMs },
-    { name: 'maxMemoryMB', getValue: (result) => result.maxMemoryMB },
-    { name: 'filesAnalyzed', lowerIsBetter: false, getValue: (result) => result.filesAnalyzed },
-    { name: 'diagnosticCount', getValue: (result) => result.diagnosticCount },
-    { name: 'errorCount', getValue: (result) => result.errorCount },
-    { name: 'warningCount', getValue: (result) => result.warningCount },
-    { name: 'informationCount', getValue: (result) => result.informationCount },
-];
-
-export function parseEcosystemBenchmarkArgs(args: string[]): EcosystemBenchmarkCommand {
-    const parsedArgs = commandLineArgs(optionDefinitions, { argv: args }) as CommandLineOptions;
-    const outputDir = parsedArgs.output as string | undefined;
-    if (!outputDir) {
-        throw new Error('The --output option is required.');
-    }
-
-    const baselineReportPath = parsedArgs['baseline-report'] as string | undefined;
-    const candidateReportPath = parsedArgs['candidate-report'] as string | undefined;
-    const mainBaselineReportPath = parsedArgs['main-baseline-report'] as string | undefined;
-    const baselineSourceCommit = parsedArgs['baseline-source-commit'] as string | undefined;
-    const baselineExecutable = parsedArgs['baseline-executable'] as string | undefined;
-    const candidateExecutable = parsedArgs['candidate-executable'] as string | undefined;
-
-    if (baselineReportPath || candidateReportPath) {
-        if (!candidateReportPath) {
-            throw new Error('The --candidate-report option is required when comparing ecosystem benchmark reports.');
-        }
-
-        return {
-            mode: 'compare',
-            baselineReportPath: baselineReportPath ?? mainBaselineReportPath ?? getDefaultMainBaselineReportPath(),
-            candidateReportPath,
-            outputDir,
-        };
-    }
-
-    const suiteName = (parsedArgs.suite as string | undefined) ?? 'smoke';
-
-    if (suiteName !== 'smoke') {
-        throw new Error(`Unsupported ecosystem benchmark suite "${suiteName}". Only "smoke" is implemented.`);
-    }
-
-    const tag = parsedArgs.tag as string | undefined;
-    if (tag && !getEcosystemSmokeProjectTags().includes(tag as EcosystemProjectTag)) {
-        throw new Error(`Unsupported ecosystem smoke tag "${tag}".`);
-    }
-
-    const projectPatternText = parsedArgs.project as string | undefined;
-
-    if (baselineExecutable || candidateExecutable) {
-        const projectRoot = parsedArgs['project-root'] as string | undefined;
-        if (!projectRoot) {
-            throw new Error('The --project-root option is required when executing ecosystem benchmarks.');
-        }
-
-        return {
-            mode: 'execute',
-            suiteName,
-            outputDir,
-            projectRoot,
-            projectDate: parsedArgs['project-date'] as string | undefined,
-            tag: tag as EcosystemProjectTag | undefined,
-            projectPattern: projectPatternText ? new RegExp(projectPatternText, 'i') : undefined,
-            numShards: parsedArgs['num-shards'] as number | undefined,
-            shardIndex: parsedArgs['shard-index'] as number | undefined,
-            baselineExecutable,
-            candidateExecutable,
-            mainBaselineReportPath,
-            baselineSourceCommit,
-            updateMainBaseline: parsedArgs['update-main-baseline'] as boolean | undefined,
-            prepareProjects: parsedArgs['prepare-projects'] as boolean | undefined,
-            installDependencies: parsedArgs['install-dependencies'] as boolean | undefined,
-        };
-    }
-
-    return {
-        mode: 'select',
-        suiteName,
-        outputDir,
-        projectDate: parsedArgs['project-date'] as string | undefined,
-        tag: tag as EcosystemProjectTag | undefined,
-        projectPattern: projectPatternText ? new RegExp(projectPatternText, 'i') : undefined,
-        numShards: parsedArgs['num-shards'] as number | undefined,
-        shardIndex: parsedArgs['shard-index'] as number | undefined,
-    };
-}
-
-export function getDefaultMainBaselineReportPath(): string {
-    return getWritableBenchmarkFilePath('baselines', 'ecosystem-smoke-main.json');
-}
-
-export function buildEcosystemBenchmarkManifest(config: EcosystemBenchmarkRunConfig): EcosystemBenchmarkManifest {
-    const selectedProjects = selectEcosystemSmokeProjects({
-        tag: config.tag,
-        projectPattern: config.projectPattern,
-        numShards: config.numShards,
-        shardIndex: config.shardIndex,
-    });
-
-    return {
-        suiteName: config.suiteName,
-        executionMode: 'selection-only',
-        outputDir: config.outputDir,
-        projectDate: config.projectDate,
-        filters: {
-            tag: config.tag,
-            projectPattern: config.projectPattern?.source,
-            numShards: config.numShards,
-            shardIndex: config.shardIndex,
-        },
-        selectedProjects,
-        selectedProjectCount: selectedProjects.length,
-        notes: [
-            'This runner currently resolves the ecosystem smoke selection and writes a manifest artifact.',
-            'Project execution against base/head Pyright is not implemented yet.',
-        ],
-    };
-}
-
-export function executeEcosystemBenchmark(
-    config: EcosystemBenchmarkExecutionConfig
-): EcosystemBenchmarkExecutionArtifactPaths {
-    const selectedProjects = selectEcosystemSmokeProjects({
-        tag: config.tag,
-        projectPattern: config.projectPattern,
-        numShards: config.numShards,
-        shardIndex: config.shardIndex,
-    });
-
-    if (config.prepareProjects) {
-        prepareEcosystemProjectCheckouts(
-            selectedProjects,
-            config.projectRoot,
-            config.projectDate,
-            config.installDependencies ?? false
-        );
-    }
-
-    const baselineResults = config.baselineExecutable
-        ? executeEcosystemBenchmarkSuite(selectedProjects, config.projectRoot, config.baselineExecutable)
-        : undefined;
-    const candidateResults = config.candidateExecutable
-        ? executeEcosystemBenchmarkSuite(selectedProjects, config.projectRoot, config.candidateExecutable)
-        : undefined;
-
-    const artifactPaths: EcosystemBenchmarkExecutionArtifactPaths = {};
-    fs.mkdirSync(config.outputDir, { recursive: true });
-
-    if (baselineResults) {
-        artifactPaths.baselineReportPath = writeNamedBenchmarkReport(
-            config.outputDir,
-            'baseline-report.json',
-            createBenchmarkReport('ecosystem-smoke', 0, 1, baselineResults)
-        );
-
-        if (config.updateMainBaseline) {
-            writeMainBaselineReport(
-                artifactPaths.baselineReportPath,
-                config.mainBaselineReportPath ?? getDefaultMainBaselineReportPath(),
-                {
-                    sourceCommit: config.baselineSourceCommit,
-                    projectDate: config.projectDate,
-                    configMode: 'generated-benchmark-config',
-                    refreshedAt: new Date().toISOString(),
-                }
-            );
-        }
-    }
-
-    if (candidateResults) {
-        artifactPaths.candidateReportPath = writeNamedBenchmarkReport(
-            config.outputDir,
-            'candidate-report.json',
-            createBenchmarkReport('ecosystem-smoke', 0, 1, candidateResults)
-        );
-    }
-
-    if (artifactPaths.baselineReportPath && artifactPaths.candidateReportPath) {
-        artifactPaths.comparisonArtifactPaths = compareAndWriteEcosystemBenchmarkReportFiles(
-            artifactPaths.baselineReportPath,
-            artifactPaths.candidateReportPath,
-            config.outputDir
-        );
-    } else if (artifactPaths.candidateReportPath) {
-        const mainBaselineReportPath = config.mainBaselineReportPath ?? getDefaultMainBaselineReportPath();
-        if (fs.existsSync(mainBaselineReportPath)) {
-            artifactPaths.comparisonArtifactPaths = compareAndWriteEcosystemBenchmarkReportFiles(
-                mainBaselineReportPath,
-                artifactPaths.candidateReportPath,
-                config.outputDir
-            );
-        }
-    }
-
-    return artifactPaths;
-}
-
-export function compareEcosystemBenchmarkReports(
-    baselineReportPath: string,
-    candidateReportPath: string,
-    outputDir: string
-): BenchmarkReportComparisonArtifactPaths {
-    return compareAndWriteEcosystemBenchmarkReportFiles(baselineReportPath, candidateReportPath, outputDir);
-}
-
-export function writeEcosystemBenchmarkManifest(outputDir: string, manifest: EcosystemBenchmarkManifest): string {
-    fs.mkdirSync(outputDir, { recursive: true });
-
-    const manifestPath = path.join(outputDir, 'ecosystem-run-manifest.json');
-    fs.writeFileSync(manifestPath, JSON.stringify(manifest, undefined, 2), 'utf-8');
-
-    return manifestPath;
-}
-
-export function writeMainBaselineReport(
-    sourceReportPath: string,
-    baselineReportPath: string,
-    metadata?: MainBaselineMetadata
-): string {
-    fs.mkdirSync(path.dirname(baselineReportPath), { recursive: true });
-
-    if (!metadata) {
-        fs.copyFileSync(sourceReportPath, baselineReportPath);
-        return baselineReportPath;
-    }
-
-    const report = JSON.parse(fs.readFileSync(sourceReportPath, 'utf-8')) as Record<string, unknown>;
-    report.mainBaseline = metadata;
-    fs.writeFileSync(baselineReportPath, JSON.stringify(report, undefined, 2), 'utf-8');
-    return baselineReportPath;
-}
-
-export function compareEcosystemBenchmarkReportData(
-    baselineReport: BenchmarkReport<EcosystemBenchmarkResult>,
-    candidateReport: BenchmarkReport<EcosystemBenchmarkResult>
-): EcosystemBenchmarkReportComparison {
-    return {
-        ...compareBenchmarkReports(
-            baselineReport,
-            candidateReport,
-            (result) => result.projectName,
-            ecosystemBenchmarkComparisonMetrics
-        ),
-        diagnosticDiffs: compareEcosystemDiagnosticResults(baselineReport.results, candidateReport.results),
-    };
-}
-
-export function runEcosystemBenchmark(
-    args: string[]
-): string | BenchmarkReportComparisonArtifactPaths | EcosystemBenchmarkExecutionArtifactPaths {
-    const command = parseEcosystemBenchmarkArgs(args);
-
-    if (command.mode === 'compare') {
-        const artifactPaths = compareEcosystemBenchmarkReports(
-            command.baselineReportPath,
-            command.candidateReportPath,
-            command.outputDir
-        );
-
-        console.log(`Comparison artifacts written to: ${command.outputDir}`);
-        return artifactPaths;
-    }
-
-    if (command.mode === 'execute') {
-        const artifactPaths = executeEcosystemBenchmark(command);
-        console.log(`Execution artifacts written to: ${command.outputDir}`);
-        return artifactPaths;
-    }
-
-    const manifest = buildEcosystemBenchmarkManifest(command);
-    const manifestPath = writeEcosystemBenchmarkManifest(command.outputDir, manifest);
-
-    console.log(`Selected ${manifest.selectedProjectCount} ecosystem project(s).`);
-    console.log(`Manifest written to: ${manifestPath}`);
-
-    return manifestPath;
-}
-
-function executeEcosystemBenchmarkSuite(
-    projects: readonly EcosystemSmokeProject[],
-    projectRoot: string,
-    executableCommand: string
-): EcosystemBenchmarkResult[] {
-    return projects.map((project) => executeEcosystemProject(project, projectRoot, executableCommand));
-}
-
-function prepareEcosystemProjectCheckouts(
-    projects: readonly EcosystemSmokeProject[],
-    projectRoot: string,
-    projectDate: string | undefined,
-    installDependencies: boolean
-): void {
-    fs.mkdirSync(projectRoot, { recursive: true });
-
-    for (const project of projects) {
-        const generatedProject = getGeneratedEcosystemProject(project.name);
-        if (!generatedProject) {
-            throw new Error(`No generated ecosystem metadata found for project ${project.name}.`);
-        }
-
-        prepareEcosystemProjectCheckout(generatedProject, path.join(projectRoot, generatedProject.name), projectDate);
-
-        if (installDependencies) {
-            installEcosystemProjectDependencies(generatedProject, path.join(projectRoot, generatedProject.name));
-        }
-    }
-}
-
-export function prepareEcosystemProjectCheckout(
-    project: GeneratedEcosystemProject,
-    workingDirectory: string,
-    projectDate: string | undefined
-): void {
-    if (!project.location) {
-        throw new Error(`Cannot prepare ecosystem project ${project.name}; no repository location is configured.`);
-    }
-
-    if (fs.existsSync(workingDirectory)) {
-        runRequiredProcess('git', ['fetch', '--all', '--tags'], workingDirectory, `update ${project.name}`);
-    } else {
-        runRequiredProcess('git', ['clone', project.location, workingDirectory], undefined, `clone ${project.name}`);
-    }
-
-    if (projectDate) {
-        const commit = runRequiredProcess(
-            'git',
-            ['rev-list', '-n', '1', `--before=${projectDate}`, 'HEAD'],
-            workingDirectory,
-            `resolve ${project.name} project-date commit`
-        ).trim();
-        if (!commit) {
-            throw new Error(`Could not find a ${project.name} commit before ${projectDate}.`);
-        }
-
-        runRequiredProcess('git', ['checkout', '--force', commit], workingDirectory, `checkout ${project.name}`);
-    }
-}
-
-function installEcosystemProjectDependencies(project: GeneratedEcosystemProject, workingDirectory: string): void {
-    if (project.dependencies && project.dependencies.length > 0) {
-        runRequiredProcess(
-            'python',
-            ['-m', 'pip', 'install', ...project.dependencies],
-            workingDirectory,
-            `install ${project.name} dependency metadata`
-        );
-    }
-
-    if (project.installCommand) {
-        runRequiredProcess(project.installCommand, [], workingDirectory, `run ${project.name} install command`, true);
-    }
-}
-
-function runRequiredProcess(
-    command: string,
-    args: readonly string[],
-    cwd: string | undefined,
-    description: string,
-    shell = false
-): string {
-    const result = spawnSync(command, args, {
-        cwd,
-        encoding: 'utf-8',
-        shell,
-    });
-
-    if (result.error) {
-        throw result.error;
-    }
-
-    if (result.status !== 0) {
-        throw new Error(
-            `Failed to ${description}.\nCommand: ${[command, ...args].join(' ')}\nExit status: ${
-                result.status ?? 'unknown'
-            }\nstderr:\n${(result.stderr ?? '').trim()}\nstdout:\n${(result.stdout ?? '').trim()}`
-        );
-    }
-
-    return result.stdout ?? '';
-}
-
-function executeEcosystemProject(
-    project: EcosystemSmokeProject,
-    projectRoot: string,
-    executableCommand: string
-): EcosystemBenchmarkResult {
-    const generatedProject = getGeneratedEcosystemProject(project.name);
-    if (!generatedProject) {
-        throw new Error(`No generated ecosystem metadata found for project ${project.name}.`);
-    }
-
-    const workingDirectory = path.join(projectRoot, generatedProject.name);
-    if (!fs.existsSync(workingDirectory)) {
-        throw new Error(`Expected ecosystem project checkout at ${workingDirectory}.`);
-    }
-
-    return executePyrightProjectCommand(project.name, generatedProject, workingDirectory, executableCommand);
-}
-
-export function executePyrightProjectCommand(
-    projectName: string,
-    project: GeneratedEcosystemProject,
-    workingDirectory: string,
-    executableCommand: string
-): EcosystemBenchmarkResult {
-    const pyrightConfigPath = writeProjectPyrightConfig(workingDirectory, project);
-    const invocation = resolvePyrightInvocationPaths(
-        buildPyrightInvocation(executableCommand, project, pyrightConfigPath),
-        process.cwd()
-    );
-    const startTime = process.hrtime.bigint();
-    const result = spawnSync(invocation.command, invocation.args, {
-        cwd: workingDirectory,
-        encoding: 'utf-8',
-    });
-    const elapsedMs = Number(process.hrtime.bigint() - startTime) / 1_000_000;
-
-    if (result.error) {
-        throw result.error;
-    }
-
-    const output = result.stdout?.trim();
-    if (!output) {
-        throw createPyrightExecutionError(projectName, invocation, result.status, result.stdout, result.stderr);
-    }
-
-    let jsonResults: PyrightJsonResults;
-    try {
-        jsonResults = JSON.parse(output) as PyrightJsonResults;
-    } catch (error) {
-        throw createPyrightExecutionError(projectName, invocation, result.status, result.stdout, result.stderr, error);
-    }
-    const diagnosticCount =
-        jsonResults.summary.errorCount + jsonResults.summary.warningCount + jsonResults.summary.informationCount;
-
-    return {
-        projectName,
-        totalTimeMs: Math.round(elapsedMs * 100) / 100,
-        filesAnalyzed: jsonResults.summary.filesAnalyzed,
-        diagnosticCount,
-        errorCount: jsonResults.summary.errorCount,
-        warningCount: jsonResults.summary.warningCount,
-        informationCount: jsonResults.summary.informationCount,
-        diagnostics: jsonResults.generalDiagnostics.map(normalizePyrightDiagnostic),
-    };
-}
-
-function compareAndWriteEcosystemBenchmarkReportFiles(
-    baselineReportPath: string,
-    candidateReportPath: string,
-    outputDir: string
-): BenchmarkReportComparisonArtifactPaths {
-    const baselineReport = loadBenchmarkReport<EcosystemBenchmarkResult>(baselineReportPath);
-    const candidateReport = loadBenchmarkReport<EcosystemBenchmarkResult>(candidateReportPath);
-    const comparison = compareEcosystemBenchmarkReportData(baselineReport, candidateReport);
-    const artifactPaths = writeBenchmarkReportComparisonArtifacts(
-        outputDir,
-        baselineReport,
-        candidateReport,
-        comparison
-    );
-
-    fs.writeFileSync(artifactPaths.markdownPath, renderEcosystemBenchmarkComparisonMarkdown(comparison), 'utf-8');
-    return artifactPaths;
-}
-
-function compareEcosystemDiagnosticResults(
-    baselineResults: readonly EcosystemBenchmarkResult[],
-    candidateResults: readonly EcosystemBenchmarkResult[]
-): EcosystemBenchmarkDiagnosticDiff[] {
-    const candidateByProject = new Map(candidateResults.map((result) => [result.projectName, result]));
-
-    return baselineResults.flatMap((baselineResult) => {
-        const candidateResult = candidateByProject.get(baselineResult.projectName);
-        if (!candidateResult) {
-            return [];
-        }
-
-        const baselineDiagnostics = getDiagnosticSignatureSet(baselineResult);
-        const candidateDiagnostics = getDiagnosticSignatureSet(candidateResult);
-        const added = [...candidateDiagnostics].filter((entry) => !baselineDiagnostics.has(entry)).sort();
-        const removed = [...baselineDiagnostics].filter((entry) => !candidateDiagnostics.has(entry)).sort();
-
-        return added.length > 0 || removed.length > 0
-            ? [{ projectName: baselineResult.projectName, added, removed }]
-            : [];
-    });
-}
-
-function getDiagnosticSignatureSet(result: EcosystemBenchmarkResult): Set<string> {
-    return new Set((result.diagnostics ?? []).map(formatDiagnosticSignature));
-}
-
-function normalizePyrightDiagnostic(
-    diagnostic: PyrightJsonResults['generalDiagnostics'][number]
-): EcosystemBenchmarkDiagnostic {
-    return {
-        file: diagnostic.file,
-        severity: diagnostic.severity,
-        message: diagnostic.message ?? '',
-    };
-}
-
-function formatDiagnosticSignature(diagnostic: EcosystemBenchmarkDiagnostic): string {
-    return [diagnostic.severity, diagnostic.file ?? '<unknown>', diagnostic.message].join(' | ');
-}
-
-function renderEcosystemBenchmarkComparisonMarkdown(comparison: EcosystemBenchmarkReportComparison): string {
-    const lines = [renderBenchmarkComparisonMarkdown(comparison).trimEnd(), '', '## Diagnostic Diffs', ''];
-
-    if (comparison.diagnosticDiffs.length === 0) {
-        lines.push('None.');
-        return `${lines.join('\n')}\n`;
-    }
-
-    for (const diff of comparison.diagnosticDiffs) {
-        lines.push(`### ${diff.projectName}`, '');
-        appendDiagnosticDiffList(lines, 'Added diagnostics', diff.added);
-        appendDiagnosticDiffList(lines, 'Removed diagnostics', diff.removed);
-    }
-
-    return `${lines.join('\n')}\n`;
-}
-
-function appendDiagnosticDiffList(lines: string[], heading: string, diagnostics: readonly string[]): void {
-    lines.push(`#### ${heading}`, '');
-
-    if (diagnostics.length === 0) {
-        lines.push('None.', '');
-        return;
-    }
-
-    for (const diagnostic of diagnostics) {
-        lines.push(`- ${diagnostic}`);
-    }
-
-    lines.push('');
-}
-
-function createPyrightExecutionError(
-    projectName: string,
-    invocation: { command: string; args: string[] },
-    status: number | null,
-    stdout: string | undefined,
-    stderr: string | undefined,
-    cause?: unknown
-): Error {
-    const stdoutPrefix = (stdout ?? '').trim().slice(0, 1000);
-    const stderrOutput = (stderr ?? '').trim();
-    const details = [
-        `Pyright execution for ${projectName} did not produce JSON output.`,
-        `Command: ${[invocation.command, ...invocation.args].join(' ')}`,
-        `Exit status: ${status ?? 'unknown'}`,
-    ];
-
-    if (cause instanceof Error) {
-        details.push(`JSON parse error: ${cause.message}`);
-    }
-
-    if (stderrOutput.length > 0) {
-        details.push(`stderr:\n${stderrOutput}`);
-    }
-
-    if (stdoutPrefix.length > 0) {
-        details.push(`stdout prefix:\n${stdoutPrefix}`);
-    }
-
-    return new Error(details.join('\n'));
-}
-
-export function buildPyrightInvocation(
-    executableCommand: string,
-    project: GeneratedEcosystemProject,
-    pyrightConfigPath?: string
-): { command: string; args: string[] } {
-    const template = project.pyrightCommand ?? '{pyright} {paths}';
-    const projectPaths = project.paths && project.paths.length > 0 ? project.paths : ['.'];
-    const tokens = tokenizeCommandTemplate(template);
-    const executableTokens = getExecutableCommandTokens(executableCommand);
-    if (executableTokens.length === 0) {
-        throw new Error('The Pyright executable command cannot be empty.');
-    }
-
-    const executableArgs = executableTokens.slice(1);
-    const pyrightArgs: string[] = [];
-    let command = executableTokens[0];
-    let insertedExecutable = false;
-
-    for (const token of tokens) {
-        if (token === '{pyright}') {
-            command = executableTokens[0];
-            insertedExecutable = true;
-            continue;
-        }
-
-        if (token === '{paths}') {
-            if (pyrightConfigPath) {
-                continue;
-            }
-
-            pyrightArgs.push(...projectPaths);
-            continue;
-        }
-
-        pyrightArgs.push(token);
-    }
-
-    if (!pyrightArgs.includes('--outputjson')) {
-        pyrightArgs.push('--outputjson');
-    }
-
-    if (pyrightConfigPath && !pyrightArgs.includes('-p') && !pyrightArgs.includes('--project')) {
-        pyrightArgs.push('-p', pyrightConfigPath);
-    }
-
-    const args = [...executableArgs];
-    if (requiresNodeArgumentSeparator(command, executableArgs, pyrightArgs)) {
-        args.push('--');
-    }
-
-    args.push(...pyrightArgs);
-
-    return { command, args };
-}
-
-export function writeProjectPyrightConfig(workingDirectory: string, project: GeneratedEcosystemProject): string {
-    const configDirectory = path.join(workingDirectory, '.pyright-benchmark');
-    fs.mkdirSync(configDirectory, { recursive: true });
-
-    const configPath = path.join(configDirectory, 'pyrightconfig.json');
-    const sourcePaths = selectProjectSourcePaths(project).map((entry) =>
-        getConfigRelativePath(configDirectory, path.resolve(workingDirectory, entry))
-    );
-    const projectConfigPath = path.join(workingDirectory, 'pyrightconfig.json');
-    const projectPyrightSettings = fs.existsSync(projectConfigPath)
-        ? {}
-        : readPyprojectPyrightSettings(workingDirectory, configDirectory);
-    const config: ProjectPyrightConfigFile = {
-        ...projectPyrightSettings,
-        extends: fs.existsSync(projectConfigPath)
-            ? getConfigRelativePath(configDirectory, projectConfigPath)
-            : undefined,
-        include: sourcePaths,
-        exclude: ['../**/test', '../**/tests', '../**/testing', '../**/test_*', '../**/*_test.py', '../**/*_tests.py'],
-    };
-
-    fs.writeFileSync(configPath, JSON.stringify(config, undefined, 2), 'utf-8');
-    return configPath;
-}
-
-function readPyprojectPyrightSettings(workingDirectory: string, configDirectory: string): Record<string, unknown> {
-    const pyprojectPath = path.join(workingDirectory, 'pyproject.toml');
-    if (!fs.existsSync(pyprojectPath)) {
-        return {};
-    }
-
-    const parsed = parse(fs.readFileSync(pyprojectPath, 'utf-8')) as { tool?: { pyright?: Record<string, unknown> } };
-    const pyrightSettings = parsed.tool?.pyright;
-    if (!pyrightSettings) {
-        return {};
-    }
-
-    const copiedSettings: Record<string, unknown> = {};
-    for (const [key, value] of Object.entries(pyrightSettings)) {
-        if (benchmarkOwnedConfigKeys.has(key)) {
-            continue;
-        }
-
-        copiedSettings[key] = rebasePyprojectConfigValue(key, value, workingDirectory, configDirectory);
-    }
-
-    return copiedSettings;
-}
-
-function rebasePyprojectConfigValue(
-    key: string,
-    value: unknown,
-    workingDirectory: string,
-    configDirectory: string
-): unknown {
-    if (pyrightPathArrayConfigKeys.has(key) && Array.isArray(value)) {
-        return value.map((entry) =>
-            typeof entry === 'string'
-                ? getConfigRelativePath(configDirectory, path.resolve(workingDirectory, entry))
-                : entry
-        );
-    }
-
-    if (pyrightPathStringConfigKeys.has(key) && typeof value === 'string') {
-        return getConfigRelativePath(configDirectory, path.resolve(workingDirectory, value));
-    }
-
-    return value;
-}
-
-function tokenizeCommandTemplate(template: string): string[] {
-    return Array.from(template.matchAll(/"([^"]*)"|'([^']*)'|\S+/g)).map((match) => match[1] ?? match[2] ?? match[0]);
-}
-
-function getExecutableCommandTokens(executableCommand: string): string[] {
-    return fs.existsSync(executableCommand) ? [executableCommand] : tokenizeCommandTemplate(executableCommand);
-}
-
-function resolvePyrightInvocationPaths(
-    invocation: { command: string; args: string[] },
-    baseDirectory: string
-): { command: string; args: string[] } {
-    const command = resolveExistingPath(baseDirectory, invocation.command);
-    const args = [...invocation.args];
-    const commandName = path.basename(command).toLowerCase();
-
-    if ((commandName === 'node' || commandName === 'node.exe') && args.length > 0) {
-        const firstArg = args[0];
-        if (firstArg !== '-e' && firstArg !== '--eval' && firstArg !== '--') {
-            args[0] = resolveExistingPath(baseDirectory, firstArg);
-        }
-    }
-
-    return { command, args };
-}
-
-function requiresNodeArgumentSeparator(command: string, executableArgs: string[], pyrightArgs: string[]): boolean {
-    if (pyrightArgs.length === 0) {
-        return false;
-    }
-
-    const commandName = path.basename(command).toLowerCase();
-    if (commandName !== 'node' && commandName !== 'node.exe') {
-        return false;
-    }
-
-    return executableArgs.includes('-e') || executableArgs.includes('--eval');
-}
-
-function selectProjectSourcePaths(project: GeneratedEcosystemProject): string[] {
-    const configuredPaths = project.paths && project.paths.length > 0 ? project.paths : ['.'];
-    const sourcePaths = configuredPaths.filter((entry) => !isTestLikePath(entry));
-
-    return sourcePaths.length > 0 ? sourcePaths : configuredPaths;
-}
-
-function isTestLikePath(entry: string): boolean {
-    return /(^|[\\/])(test|tests|testing|testdata)([\\/]|$)/i.test(entry);
-}
-
-function getConfigRelativePath(fromDirectory: string, targetPath: string): string {
-    const relativePath = path.relative(fromDirectory, targetPath);
-    return relativePath.length > 0 ? relativePath.replace(/\\/g, '/') : '.';
-}
-
-function resolveExistingPath(baseDirectory: string, entry: string): string {
-    if (path.isAbsolute(entry)) {
-        return entry;
-    }
-
-    const resolvedPath = path.resolve(baseDirectory, entry);
-    return fs.existsSync(resolvedPath) ? resolvedPath : entry;
-}
-
-function getWritableBenchmarkFilePath(...pathParts: string[]): string {
-    const sourceFilePath = path.resolve(__dirname, ...pathParts);
-    if (!sourceFilePath.includes(`${path.sep}out${path.sep}`)) {
-        return sourceFilePath;
-    }
-
-    return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', ...pathParts);
-}
-
-function writeNamedBenchmarkReport<ResultT>(
-    outputDir: string,
-    fileName: string,
-    report: BenchmarkReport<ResultT>
-): string {
-    const outputPath = path.join(outputDir, fileName);
-    fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8');
-    return outputPath;
-}
-
-if (require.main === module) {
-    runEcosystemBenchmark(process.argv.slice(2));
-}
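Reviewer note: the diagnostic comparison removed in this file reduces to a set difference over `severity | file | message` signature strings. The sketch below re-creates that idea in isolation so its behavior is easy to check. `Diagnostic`, `signatureOf`, and `diffDiagnostics` are hypothetical names chosen for illustration, not the deleted module's exports.

```typescript
// Hypothetical, simplified shape: the removed code kept only file, severity, and message.
interface Diagnostic {
    file?: string;
    severity: string;
    message: string;
}

// Builds the same "severity | file | message" key the removed helper produced.
function signatureOf(diagnostic: Diagnostic): string {
    return [diagnostic.severity, diagnostic.file ?? '<unknown>', diagnostic.message].join(' | ');
}

// Returns the signatures that appear on only one side, each list sorted.
function diffDiagnostics(
    baseline: readonly Diagnostic[],
    candidate: readonly Diagnostic[]
): { added: string[]; removed: string[] } {
    const baselineSet = new Set(baseline.map(signatureOf));
    const candidateSet = new Set(candidate.map(signatureOf));
    return {
        added: Array.from(candidateSet).filter((entry) => !baselineSet.has(entry)).sort(),
        removed: Array.from(baselineSet).filter((entry) => !candidateSet.has(entry)).sort(),
    };
}

// Two identical baseline diagnostics collapse into one signature, so dropping
// one of them in the candidate run produces no entry in either list.
const sampleDiff = diffDiagnostics(
    [
        { file: 'a.py', severity: 'error', message: 'X is not defined' },
        { file: 'a.py', severity: 'error', message: 'X is not defined' },
    ],
    [{ file: 'a.py', severity: 'error', message: 'X is not defined' }]
);
console.log(sampleDiff);
```

Because the signature omits line and column information, repeated messages in one file are indistinguishable to a set-based diff like this one. A change of that kind would only be visible through the per-project diagnostic counts.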
diff --git a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.test.ts b/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.test.ts
deleted file mode 100644
index 92f26f6f8558..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.test.ts
+++ /dev/null
@@ -1,223 +0,0 @@
-/*
- * syncMypyPrimerProjects.test.ts
- * Copyright (c) Microsoft Corporation.
- *
- * Tests for the mypy_primer project sync scaffold.
- */
-
-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
-
-import {
-    getBenchmarkSourceDirectory,
-    getDefaultMypyPrimerProjectSourcePath,
-    parseMypyPrimerProjectSource,
-    syncMypyPrimerProjects,
-    writeGeneratedEcosystemProjects,
-} from './syncMypyPrimerProjects';
-
-const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS';
-
-const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip;
-
-benchmarkSuite('Sync Mypy Primer Projects', () => {
-    test('parses project blocks from mypy_primer source', () => {
-        const projects = parseMypyPrimerProjectSource(
-            [
-                'Project(',
-                '    location="https://github.com/psf/black",',
-                '    pyright_cmd="{pyright} {paths}",',
-                '    paths=["src"],',
-                ')',
-                '',
-                'Project(',
-                '    location="https://github.com/pydantic/pydantic",',
-                '    pyright_cmd="{pyright} {paths}",',
-                '    paths=["pydantic", "tests"],',
-                ')',
-            ].join('\n'),
-            'projects.py'
-        );
-
-        expect(projects).toEqual([
-            {
-                name: 'black',
-                mypyPrimerProject: 'black',
-                source: { kind: 'mypy-primer', inputFile: 'projects.py' },
-                location: 'https://github.com/psf/black',
-                pyrightCommand: '{pyright} {paths}',
-                paths: ['src'],
-            },
-            {
-                name: 'pydantic',
-                mypyPrimerProject: 'pydantic',
-                source: { kind: 'mypy-primer', inputFile: 'projects.py' },
-                location: 'https://github.com/pydantic/pydantic',
-                pyrightCommand: '{pyright} {paths}',
-                paths: ['pydantic', 'tests'],
-            },
-        ]);
-    });
-
-    test('parses upstream metadata and filters pyright-disabled projects', () => {
-        const projects = parseMypyPrimerProjectSource(
-            [
-                'Project(',
-                '    location="https://github.com/example/project",',
-                '    name_override="example-project",',
-                '    pyright_cmd="{pyright} {paths}",',
-                '    paths=["src"],',
-                '    deps=["types-requests"],',
-                '    install_cmd="python -m pip install -e .",',
-                '    platforms=["linux", "darwin"],',
-                '    cost=2.5,',
-                ')',
-                '',
-                'Project(',
-                '    location="https://github.com/example/project",',
-                '    pyright_cmd="{pyright} {paths}",',
-                '    paths=["lib"],',
-                ')',
-                '',
-                'Project(',
-                '    location="https://github.com/example/skip-me",',
-                '    pyright_cmd=None,',
-                ')',
-            ].join('\n'),
-            'projects.py'
-        );
-
-        expect(projects).toEqual([
-            {
-                name: 'example-project',
-                mypyPrimerProject: 'project',
-                source: { kind: 'mypy-primer', inputFile: 'projects.py' },
-                location: 'https://github.com/example/project',
-                pyrightCommand: '{pyright} {paths}',
-                paths: ['src'],
-                dependencies: ['types-requests'],
-                installCommand: 'python -m pip install -e .',
-                supportedPlatforms: ['linux', 'darwin'],
-                cost: 2.5,
-            },
-            {
-                name: 'project',
-                mypyPrimerProject: 'project',
-                source: { kind: 'mypy-primer', inputFile: 'projects.py' },
-                location: 'https://github.com/example/project',
-                pyrightCommand: '{pyright} {paths}',
-                paths: ['lib'],
-            },
-        ]);
-    });
-
-    test('deduplicates project names derived from repeated locations', () => {
-        const projects = parseMypyPrimerProjectSource(
-            [
-                'Project(location="https://github.com/example/project", pyright_cmd="{pyright}")',
-                'Project(location="https://github.com/example/project", pyright_cmd="{pyright}")',
-            ].join('\n'),
-            'projects.py'
-        );
-
-        expect(projects.map((project) => project.name)).toEqual(['project', 'project-2']);
-    });
-
-    test('writes generated ecosystem projects', () => {
-        const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-mypy-primer-sync-'));
-        const outputPath = path.join(outputDir, 'ecosystem-projects.generated.json');
-
-        try {
-            writeGeneratedEcosystemProjects(outputPath, [
-                {
-                    name: 'black',
-                    mypyPrimerProject: 'black',
-                    source: { kind: 'manual-snapshot' },
-                },
-            ]);
-
-            expect(JSON.parse(fs.readFileSync(outputPath, 'utf-8'))).toEqual([
-                {
-                    name: 'black',
-                    mypyPrimerProject: 'black',
-                    source: { kind: 'manual-snapshot' },
-                },
-            ]);
-        } finally {
-            fs.rmSync(outputDir, { force: true, recursive: true });
-        }
-    });
-
-    test('syncs project definitions from an input file', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-mypy-primer-cli-'));
-        const inputPath = path.join(tempDir, 'projects.py');
-        const outputPath = path.join(tempDir, 'ecosystem-projects.generated.json');
-
-        try {
-            fs.writeFileSync(
-                inputPath,
-                [
-                    'Project(',
-                    '    location="https://github.com/psf/black",',
-                    '    pyright_cmd="{pyright} {paths}",',
-                    '    paths=["src"],',
-                    ')',
-                ].join('\n'),
-                'utf-8'
-            );
-
-            const writtenPath = syncMypyPrimerProjects(['--input', inputPath, '--output', outputPath]);
-
-            expect(writtenPath).toBe(outputPath);
-            expect(JSON.parse(fs.readFileSync(outputPath, 'utf-8'))[0].name).toBe('black');
-            expect(path.isAbsolute(JSON.parse(fs.readFileSync(outputPath, 'utf-8'))[0].source.inputFile)).toBe(false);
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-
-    test('stores checked-in snapshot input paths relative to the benchmark source directory', () => {
-        const projects = parseMypyPrimerProjectSource(
-            [
-                'Project(',
-                '    location="https://github.com/psf/black",',
-                '    pyright_cmd="{pyright} {paths}",',
-                '    paths=["src"],',
-                ')',
-            ].join('\n'),
-            path.join(getBenchmarkSourceDirectory(), 'mypy_primer.smoke_projects.snapshot.py')
-        );
-
-        expect(projects[0].source.inputFile).toBe('mypy_primer.smoke_projects.snapshot.py');
-    });
-
-    test('defaults to the checked-in smoke snapshot and creates the output directory', () => {
-        const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-mypy-primer-default-'));
-        const outputPath = path.join(tempDir, 'nested', 'ecosystem-projects.generated.json');
-
-        try {
-            const writtenPath = syncMypyPrimerProjects(['--output', outputPath]);
-            const projects = JSON.parse(fs.readFileSync(outputPath, 'utf-8'));
-
-            expect(writtenPath).toBe(outputPath);
-            expect(fs.existsSync(getDefaultMypyPrimerProjectSourcePath())).toBe(true);
-            expect(projects).toHaveLength(10);
-            expect(
-                projects.every(
-                    (project: { source: { inputFile?: string } }) => !path.isAbsolute(project.source.inputFile ?? '')
-                )
-            ).toBe(true);
-            expect(projects.find((project: { name: string }) => project.name === 'black')).toMatchObject({
-                pyrightCommand: '{pyright} {paths}',
-                paths: ['src'],
-            });
-            expect(projects.find((project: { name: string }) => project.name === 'django-modern-rest')).toMatchObject({
-                pyrightCommand: '{pyright}',
-                paths: ['dmr'],
-            });
-        } finally {
-            fs.rmSync(tempDir, { force: true, recursive: true });
-        }
-    });
-});
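Reviewer note: the tests removed above expect project names to be derived from the repository location and repeated names to receive a numeric suffix (`project`, `project-2`). The sketch below shows one way to get that behavior. It is an assumption about the approach, not the deleted implementation, and `deriveProjectName` and `ensureUniqueNames` are hypothetical helpers.

```typescript
// Hypothetical helper: use the last path segment of the repository URL as the name.
function deriveProjectName(location: string): string {
    const segments = location.replace(/\/+$/, '').split('/');
    return segments[segments.length - 1] || location;
}

// Hypothetical helper: keep the first occurrence of a name unchanged and
// append "-N" to the Nth repeat. It does not guard against a suffixed name
// colliding with a name that was already present in the input.
function ensureUniqueNames(names: readonly string[]): string[] {
    const seenCounts = new Map<string, number>();
    return names.map((name) => {
        const count = (seenCounts.get(name) ?? 0) + 1;
        seenCounts.set(name, count);
        return count === 1 ? name : `${name}-${count}`;
    });
}

const derivedNames = ['https://github.com/example/project', 'https://github.com/example/project'].map(
    deriveProjectName
);
const uniqueNames = ensureUniqueNames(derivedNames);
console.log(uniqueNames);
```

Counting occurrences in a `Map` keeps the pass linear and preserves input order, which matters if the caller sorts by name afterwards, as the parser under test appears to do.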
diff --git a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.ts b/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.ts
deleted file mode 100644
index 25b9caf09f6b..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.ts
+++ /dev/null
@@ -1,231 +0,0 @@
-import commandLineArgs, { CommandLineOptions, OptionDefinition } from 'command-line-args';
-import * as fs from 'fs';
-import * as path from 'path';
-
-export interface GeneratedEcosystemProject {
-    name: string;
-    mypyPrimerProject: string;
-    source: {
-        kind: 'manual-snapshot' | 'mypy-primer';
-        inputFile?: string;
-    };
-    location?: string;
-    pyrightCommand?: string;
-    paths?: string[];
-    dependencies?: string[];
-    installCommand?: string;
-    supportedPlatforms?: string[];
-    cost?: number;
-}
-
-const optionDefinitions: OptionDefinition[] = [
-    { name: 'input', type: String },
-    { name: 'output', type: String },
-];
-
-const defaultMypyPrimerProjectSourcePath = getBenchmarkFilePath('mypy_primer.smoke_projects.snapshot.py');
-
-export function parseMypyPrimerProjectSource(sourceText: string, inputFile?: string): GeneratedEcosystemProject[] {
-    const blocks = extractProjectBlocks(sourceText);
-
-    return ensureUniqueProjectNames(
-        blocks.flatMap((block) => {
-            const project = parseProjectBlock(block, inputFile);
-            return project ? [project] : [];
-        })
-    ).sort((left, right) => left.name.localeCompare(right.name));
-}
-
-export function writeGeneratedEcosystemProjects(
-    outputPath: string,
-    projects: readonly GeneratedEcosystemProject[]
-): void {
-    fs.mkdirSync(path.dirname(outputPath), { recursive: true });
-    fs.writeFileSync(outputPath, `${JSON.stringify(projects, undefined, 2)}\n`, 'utf-8');
-}
-
-export function syncMypyPrimerProjects(args: string[]): string {
-    const parsedArgs = commandLineArgs(optionDefinitions, { argv: args }) as CommandLineOptions;
-    const inputPath = (parsedArgs.input as string | undefined) ?? defaultMypyPrimerProjectSourcePath;
-    const outputPath =
-        (parsedArgs.output as string | undefined) ?? getWritableBenchmarkFilePath('ecosystem-projects.generated.json');
-
-    const sourceText = fs.readFileSync(inputPath, 'utf-8');
-    const projects = parseMypyPrimerProjectSource(sourceText, inputPath);
-    writeGeneratedEcosystemProjects(outputPath, projects);
-    console.log(`Wrote ${projects.length} ecosystem project definitions to ${outputPath}`);
-
-    return outputPath;
-}
-
-export function getBenchmarkSourceDirectory(): string {
-    return path.dirname(getWritableBenchmarkFilePath('ecosystem-projects.generated.json'));
-}
-
-export function getDefaultMypyPrimerProjectSourcePath(): string {
-    return defaultMypyPrimerProjectSourcePath;
-}
-
-function getBenchmarkFilePath(filename: string): string {
-    const sourceFilePath = path.resolve(__dirname, filename);
-    if (fs.existsSync(sourceFilePath)) {
-        return sourceFilePath;
-    }
-
-    return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', filename);
-}
-
-function getWritableBenchmarkFilePath(filename: string): string {
-    const sourceFilePath = path.resolve(__dirname, filename);
-    if (!sourceFilePath.includes(`${path.sep}out${path.sep}`)) {
-        return sourceFilePath;
-    }
-
-    return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', filename);
-}
-
-function extractProjectBlocks(sourceText: string): string[] {
-    const blocks: string[] = [];
-    let startIndex = sourceText.indexOf('Project(');
-
-    while (startIndex >= 0) {
-        let depth = 0;
-        let inString = false;
-        let stringQuote = '';
-        let previousChar = '';
-
-        for (let index = startIndex; index < sourceText.length; index++) {
-            const currentChar = sourceText[index];
-
-            if (inString) {
-                if (currentChar === stringQuote && previousChar !== '\\') {
-                    inString = false;
-                    stringQuote = '';
-                }
-            } else if (currentChar === '"' || currentChar === "'") {
-                inString = true;
-                stringQuote = currentChar;
-            } else if (currentChar === '(') {
-                depth += 1;
-            } else if (currentChar === ')') {
-                depth -= 1;
-                if (depth === 0) {
-                    blocks.push(sourceText.slice(startIndex, index + 1));
-                    startIndex = sourceText.indexOf('Project(', index + 1);
-                    break;
-                }
-            }
-
-            previousChar = currentChar;
-        }
-
-        if (depth !== 0) {
-            throw new Error('Failed to parse mypy_primer project definitions.');
-        }
-    }
-
-    return blocks;
-}
-
-function parseProjectBlock(block: string, inputFile?: string): GeneratedEcosystemProject | undefined {
-    const location = matchSingleQuotedOrDoubleQuotedValue(block, 'location');
-    if (matchNoneValue(block, 'pyright_cmd')) {
-        return undefined;
-    }
-
-    const pyrightCommand = matchSingleQuotedOrDoubleQuotedValue(block, 'pyright_cmd');
-    const paths = matchStringArrayValue(block, 'paths');
-    const dependencies = matchStringArrayValue(block, 'deps');
-    const installCommand = matchSingleQuotedOrDoubleQuotedValue(block, 'install_cmd');
-    const supportedPlatforms = matchStringArrayValue(block, 'platforms');
-    const cost = matchNumberValue(block, 'cost');
-    const nameOverride = matchSingleQuotedOrDoubleQuotedValue(block, 'name_override');
-    const mypyPrimerProject = deriveProjectName(location);
-    const normalizedInputFile = inputFile ? normalizeInputFileReference(inputFile) : undefined;
-
-    return {
-        name: nameOverride ?? mypyPrimerProject,
-        mypyPrimerProject,
-        source: {
-            kind: 'mypy-primer',
-            inputFile: normalizedInputFile,
-        },
-        location,
-        pyrightCommand,
-        paths,
-        dependencies,
-        installCommand,
-        supportedPlatforms,
-        cost,
-    };
-}
-
-function ensureUniqueProjectNames(projects: readonly GeneratedEcosystemProject[]): GeneratedEcosystemProject[] {
-    const nameCounts = new Map<string, number>();
-
-    return projects.map((project) => {
-        const count = (nameCounts.get(project.name) ?? 0) + 1;
-        nameCounts.set(project.name, count);
-
-        if (count === 1) {
-            return project;
-        }
-
-        return { ...project, name: `${project.name}-${count}` };
-    });
-}
-
-function normalizeInputFileReference(inputFile: string): string {
-    if (!path.isAbsolute(inputFile)) {
-        return inputFile.replace(/\\/g, '/');
-    }
-
-    const benchmarkRelativePath = path.relative(getBenchmarkSourceDirectory(), inputFile);
-    if (!benchmarkRelativePath.startsWith('..') && !path.isAbsolute(benchmarkRelativePath)) {
-        return benchmarkRelativePath.replace(/\\/g, '/');
-    }
-
-    const cwdRelativePath = path.relative(process.cwd(), inputFile);
-    if (!cwdRelativePath.startsWith('..') && !path.isAbsolute(cwdRelativePath)) {
-        return cwdRelativePath.replace(/\\/g, '/');
-    }
-
-    return path.basename(inputFile);
-}
-
-function deriveProjectName(location: string | undefined): string {
-    if (!location) {
-        throw new Error('Each mypy_primer project must define a location.');
-    }
-
-    const trimmedLocation = location.replace(/\/+$/, '');
-    const slashIndex = trimmedLocation.lastIndexOf('/');
-    return slashIndex >= 0 ? trimmedLocation.slice(slashIndex + 1) : trimmedLocation;
-}
-
-function matchSingleQuotedOrDoubleQuotedValue(block: string, fieldName: string): string | undefined {
-    const match = block.match(new RegExp(`${fieldName}\\s*=\\s*(['\"])(.*?)\\1`, 's'));
-    return match?.[2];
-}
-
-function matchNoneValue(block: string, fieldName: string): boolean {
-    return new RegExp(`${fieldName}\\s*=\\s*None(,|\\s|\\))`, 's').test(block);
-}
-
-function matchNumberValue(block: string, fieldName: string): number | undefined {
-    const match = block.match(new RegExp(`${fieldName}\\s*=\\s*(\\d+(?:\\.\\d+)?)`, 's'));
-    return match ? Number(match[1]) : undefined;
-}
-
-function matchStringArrayValue(block: string, fieldName: string): string[] | undefined {
-    const match = block.match(new RegExp(`${fieldName}\\s*=\\s*\\[(.*?)\\]`, 's'));
-    if (!match) {
-        return undefined;
-    }
-
-    return Array.from(match[1].matchAll(/(['\"])(.*?)\1/g)).map((entry) => entry[2]);
-}
-
-if (require.main === module) {
-    syncMypyPrimerProjects(process.argv.slice(2));
-}
diff --git a/packages/pyright-internal/src/tests/benchmarks/syntheticCases.ts b/packages/pyright-internal/src/tests/benchmarks/syntheticCases.ts
deleted file mode 100644
index 2a581bd65de0..000000000000
--- a/packages/pyright-internal/src/tests/benchmarks/syntheticCases.ts
+++ /dev/null
@@ -1,169 +0,0 @@
-export function generateRecursiveAliasCase(depth: number): string {
-    const lines = ['from typing import TypeAlias', '', 'Alias0: TypeAlias = int', 'value0: Alias0 = 1'];
-
-    for (let i = 1; i <= depth; i++) {
-        lines.push(`Alias${i}: TypeAlias = list[Alias${i - 1}]`);
-        lines.push(`value${i}: Alias${i} = [value${i - 1}]`);
-    }
-
-    lines.push('');
-    lines.push(`def use_alias(value: Alias${depth}) -> Alias${depth}:`);
-    lines.push('    return value');
-    lines.push('');
-    lines.push(`result = use_alias(value${depth})`);
-
-    return `${lines.join('\n')}\n`;
-}
-
-export function generateOverloadUnionCrossProductCase(width: number): string {
-    const lines = ['from typing import Literal, overload', '', ''];
-
-    for (let left = 0; left < width; left++) {
-        for (let right = 0; right < width; right++) {
-            lines.push('@overload');
-            lines.push(
-                `def combine(left: Literal[${left}], right: Literal[${right}]) -> Literal[${left + right}]: ...`
-            );
-        }
-    }
-
-    lines.push('def combine(left: int, right: int) -> int:');
-    lines.push('    return left + right');
-    lines.push('');
-
-    const union = Array.from({ length: width }, (_, index) => `Literal[${index}]`).join(' | ');
-    lines.push(`def use(left: ${union}, right: ${union}) -> int:`);
-    lines.push('    return combine(left, right)');
-    lines.push('');
-    lines.push('result = use(0, 1)');
-
-    return `${lines.join('\n')}\n`;
-}
-
-export function generateProtocolMismatchCase(memberCount: number): string {
-    const lines = ['from typing import Protocol', '', 'class Expected(Protocol):'];
-
-    for (let i = 0; i < memberCount; i++) {
-        lines.push(`    def member_${i}(self) -> int: ...`);
-    }
-
-    lines.push('');
-    lines.push('class Candidate:');
-
-    for (let i = 0; i < memberCount - 1; i++) {
-        lines.push(`    def member_${i}(self) -> int:`);
-        lines.push(`        return ${i}`);
-    }
-
-    lines.push('');
-    lines.push('def consume(value: Expected) -> None:');
-    lines.push('    pass');
-    lines.push('');
-    lines.push('consume(Candidate())');
-
-    return `${lines.join('\n')}\n`;
-}
-
-export function generateGenericAliasChainCase(depth: number): string {
-    const lines = [
-        'from typing import Generic, TypeAlias, TypeVar',
-        '',
-        'T = TypeVar("T")',
-        '',
-        'class Box(Generic[T]):',
-        '    def __init__(self, value: T) -> None:',
-        '        self.value = value',
-        '',
-        'Alias0: TypeAlias = int',
-        'value0: Alias0 = 1',
-    ];
-
-    for (let i = 1; i <= depth; i++) {
-        lines.push(`Alias${i}: TypeAlias = Box[Alias${i - 1}]`);
-        lines.push(`value${i}: Alias${i} = Box(value${i - 1})`);
-    }
-
-    lines.push('');
-    lines.push(`def unwrap(value${depth}: Alias${depth}) -> Alias0:`);
-
-    for (let i = depth; i > 0; i--) {
-        lines.push(`    value${i - 1} = value${i}.value`);
-    }
-
-    lines.push('    return value0');
-    lines.push('');
-    lines.push(`result = unwrap(value${depth})`);
-
-    return `${lines.join('\n')}\n`;
-}
-
-export function generateConstrainedTypeVarMatrixCase(width: number): string {
-    const classNames = Array.from({ length: width }, (_, index) => `Item${index}`);
-    const lines = ['from typing import TypeVar', ''];
-
-    for (const className of classNames) {
-        lines.push(`class ${className}:`);
-        lines.push('    pass');
-        lines.push('');
-    }
-
-    lines.push(`TItem = TypeVar("TItem", ${classNames.join(', ')})`);
-    lines.push('');
-    lines.push('def choose(left: TItem, right: TItem) -> TItem:');
-    lines.push('    return left');
-    lines.push('');
-
-    for (let left = 0; left < width; left++) {
-        for (let right = 0; right < width; right++) {
-            lines.push(`value_${left}_${right} = choose(Item${left}(), Item${right}())`);
-        }
-    }
-
-    return `${lines.join('\n')}\n`;
-}
-
-export function generateLiteralUnionMathCase(width: number): string {
-    const literals = Array.from({ length: width }, (_, index) => `Literal[${index}]`);
-    const lines = ['from typing import Literal', '', `Value = ${literals.join(' | ')}`, ''];
-
-    lines.push('def bump(value: Value) -> int:');
-
-    for (let i = 0; i < width - 1; i++) {
-        const prefix = i === 0 ? 'if' : 'elif';
-        lines.push(`    ${prefix} value == ${i}:`);
-        lines.push(`        return value + ${i}`);
-    }
-
-    lines.push('    return value');
-    lines.push('');
-    lines.push('def combine(left: Value, right: Value) -> int:');
-    lines.push('    return bump(left) + bump(right)');
-    lines.push('');
-    lines.push('result = combine(0, 1)');
-
-    return `${lines.join('\n')}\n`;
-}
-
-export function generateTypedDictCase(keyCount: number): string {
-    const lines = ['from typing import TypedDict', '', 'class Payload(TypedDict):'];
-
-    for (let i = 0; i < keyCount; i++) {
-        lines.push(`    key_${i}: int`);
-    }
-
-    lines.push('');
-    lines.push('payload: Payload = {');
-
-    for (let i = 0; i < keyCount; i++) {
-        lines.push(`    "key_${i}": ${i},`);
-    }
-
-    lines.push('}');
-    lines.push('');
-    lines.push('def total(value: Payload) -> int:');
-    lines.push('    return ' + Array.from({ length: keyCount }, (_, index) => `value["key_${index}"]`).join(' + '));
-    lines.push('');
-    lines.push('result = total(payload)');
-
-    return `${lines.join('\n')}\n`;
-}
diff --git a/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts
index d2ffb9e6a5e3..48c1521badfb 100644
--- a/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts
+++ b/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts
@@ -13,21 +13,20 @@
  *   src/tests/benchmarks/.generated/benchmark-results/tokenizer/
  */
 
+import { execFileSync } from 'child_process';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+
 import { Tokenizer } from '../../parser/tokenizer';
-import {
-    calculateStats,
-    createBenchmarkReport,
-    formatCount,
-    loadBenchmarkCorpus,
-    runJestBenchmarkInFreshProcess,
-    writeBenchmarkReport,
-} from './benchmarkUtils';
 
 // --- Configuration ---
 
 const WARMUP_ITERATIONS = 3;
 const BENCHMARK_ITERATIONS = 10;
 
+const BENCHMARK_OUTPUT_DIR = path.join(__dirname, '.generated', 'benchmark-results', 'tokenizer');
+const JEST_BIN_PATH = path.resolve(__dirname, '..', '..', '..', 'node_modules', 'jest', 'bin', 'jest.js');
 const CHILD_RESULT_PREFIX = '__TOKENIZER_BENCHMARK_RESULT__';
 const CHILD_MODE_ENV = 'PYRIGHT_TOKENIZER_BENCH_CHILD';
 const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS';
@@ -48,8 +47,70 @@ interface BenchmarkResult {
     tokensPerSec: number;
 }
 
+interface BenchmarkReport {
+    timestamp: string;
+    system: {
+        platform: string;
+        arch: string;
+        cpus: string;
+        cpuCount: number;
+        totalMemoryMB: number;
+        nodeVersion: string;
+    };
+    config: {
+        warmupIterations: number;
+        benchmarkIterations: number;
+    };
+    results: BenchmarkResult[];
+}
+
 // --- Helpers ---
 
+function calculateStats(times: ReadonlyArray<number>): {
+    median: number;
+    p95: number;
+    min: number;
+    max: number;
+    avg: number;
+} {
+    const sorted = [...times].sort((a, b) => a - b);
+    const len = sorted.length;
+
+    const median = len % 2 === 0 ? (sorted[len / 2 - 1] + sorted[len / 2]) / 2 : sorted[Math.floor(len / 2)];
+    const p95Index = Math.ceil(len * 0.95) - 1;
+    const p95 = sorted[Math.min(p95Index, len - 1)];
+    const min = sorted[0];
+    const max = sorted[len - 1];
+    const avg = times.reduce((a, b) => a + b, 0) / len;
+
+    return { median, p95, min, max, avg };
+}
+
+function loadCorpus(filename: string): string {
+    const filePath = path.resolve(__dirname, '..', 'benchmarkData', filename);
+    return fs.readFileSync(filePath, 'utf-8');
+}
+
+function getSystemInfo(): BenchmarkReport['system'] {
+    const cpus = os.cpus();
+    return {
+        platform: os.platform(),
+        arch: os.arch(),
+        cpus: cpus[0]?.model ?? 'unknown',
+        cpuCount: cpus.length,
+        totalMemoryMB: Math.round(os.totalmem() / (1024 * 1024)),
+        nodeVersion: process.version,
+    };
+}
+
+function writeReport(report: BenchmarkReport): void {
+    fs.mkdirSync(BENCHMARK_OUTPUT_DIR, { recursive: true });
+    const filename = `tokenizer-benchmark-${new Date().toISOString().replace(/[:.]/g, '-')}.json`;
+    const outputPath = path.join(BENCHMARK_OUTPUT_DIR, filename);
+    fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8');
+    console.log(`\nBenchmark results written to: ${outputPath}`);
+}
+
 function printResultTable(results: ReadonlyArray<BenchmarkResult>): void {
     console.log('\n=== Tokenizer Benchmark Results ===\n');
     console.log(
@@ -68,7 +129,7 @@ function printResultTable(results: ReadonlyArray<BenchmarkResult>): void {
                 .toFixed(2)
                 .padStart(10)} ${result.avgMs.toFixed(2).padStart(10)} ${result.p95Ms
                 .toFixed(2)
-                .padStart(10)} ${formatCount(result.tokensPerSec).padStart(12)}`
+                .padStart(10)} ${Math.round(result.tokensPerSec).toLocaleString().padStart(12)}`
         );
     }
     console.log('');
@@ -78,14 +139,55 @@ function emitChildResult(result: BenchmarkResult): void {
     process.stdout.write(`${CHILD_RESULT_PREFIX}${JSON.stringify(result)}\n`);
 }
 
+function getChildOutput(error: unknown): string {
+    if (!(error instanceof Error)) {
+        return '';
+    }
+
+    const stdout = 'stdout' in error && typeof error.stdout === 'string' ? error.stdout : '';
+    const stderr = 'stderr' in error && typeof error.stderr === 'string' ? error.stderr : '';
+    return [stdout, stderr].filter((part) => part.length > 0).join('\n');
+}
+
+function escapeRegExp(text: string): string {
+    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+
 function runBenchmarkInFreshProcess(testName: string): BenchmarkResult {
-    return runJestBenchmarkInFreshProcess(
-        __filename,
-        'Tokenizer Benchmark',
-        testName,
-        CHILD_RESULT_PREFIX,
-        CHILD_MODE_ENV
-    );
+    try {
+        const output = execFileSync(
+            process.execPath,
+            [
+                JEST_BIN_PATH,
+                __filename,
+                '--runInBand',
+                '--forceExit',
+                '--testTimeout=300000',
+                '--testNamePattern',
+                `^Tokenizer Benchmark ${escapeRegExp(testName)}$`,
+            ],
+            {
+                cwd: path.resolve(__dirname, '..', '..', '..'),
+                encoding: 'utf-8',
+                env: {
+                    ...process.env,
+                    [CHILD_MODE_ENV]: '1',
+                },
+            }
+        );
+
+        const resultLine = output.split(/\r?\n/).find((line) => line.startsWith(CHILD_RESULT_PREFIX));
+
+        if (!resultLine) {
+            throw new Error(`Child benchmark for "${testName}" did not emit a result.\n${output}`);
+        }
+
+        return JSON.parse(resultLine.slice(CHILD_RESULT_PREFIX.length)) as BenchmarkResult;
+    } catch (error) {
+        const output = getChildOutput(error);
+        const message = error instanceof Error ? error.message : String(error);
+        throw new Error(`Child benchmark for "${testName}" failed.\n${message}${output ? `\n${output}` : ''}`);
+    }
 }
 
 function benchmarkTokenize(corpusName: string, code: string): BenchmarkResult {
@@ -148,7 +250,7 @@ benchmarkSuite('Tokenizer Benchmark', () => {
     for (const { name, file } of corpora) {
         test(`tokenize ${name}`, () => {
             const result = isChildProcess
-                ? benchmarkTokenize(name, loadBenchmarkCorpus(file))
+                ? benchmarkTokenize(name, loadCorpus(file))
                 : runBenchmarkInFreshProcess(`tokenize ${name}`);
 
             if (!isChildProcess) {
@@ -156,9 +258,9 @@ benchmarkSuite('Tokenizer Benchmark', () => {
             }
 
             console.log(
-                `  ${name}: median=${result.medianMs.toFixed(2)}ms, tokens=${result.tokenCount}, tok/sec=${formatCount(
+                `  ${name}: median=${result.medianMs.toFixed(2)}ms, tokens=${result.tokenCount}, tok/sec=${Math.round(
                     result.tokensPerSec
-                )}`
+                ).toLocaleString()}`
             );
 
             if (isChildProcess) {
@@ -172,7 +274,7 @@ benchmarkSuite('Tokenizer Benchmark', () => {
 
     test('scaled corpus (10x large_stdlib)', () => {
         const result = isChildProcess
-            ? benchmarkTokenize('large_stdlib_10x', Array(10).fill(loadBenchmarkCorpus('large_stdlib.py')).join('\n'))
+            ? benchmarkTokenize('large_stdlib_10x', Array(10).fill(loadCorpus('large_stdlib.py')).join('\n'))
             : runBenchmarkInFreshProcess('scaled corpus (10x large_stdlib)');
 
         if (!isChildProcess) {
@@ -182,7 +284,7 @@ benchmarkSuite('Tokenizer Benchmark', () => {
         console.log(
             `  large_stdlib_10x: median=${result.medianMs.toFixed(2)}ms, tokens=${
                 result.tokenCount
-            }, tok/sec=${formatCount(result.tokensPerSec)}`
+            }, tok/sec=${Math.round(result.tokensPerSec).toLocaleString()}`
         );
 
         if (isChildProcess) {
@@ -199,10 +301,16 @@ benchmarkSuite('Tokenizer Benchmark', () => {
 
         printResultTable(allResults);
 
-        writeBenchmarkReport(
-            'tokenizer',
-            'tokenizer-benchmark',
-            createBenchmarkReport('tokenizer', WARMUP_ITERATIONS, BENCHMARK_ITERATIONS, allResults)
-        );
+        const report: BenchmarkReport = {
+            timestamp: new Date().toISOString(),
+            system: getSystemInfo(),
+            config: {
+                warmupIterations: WARMUP_ITERATIONS,
+                benchmarkIterations: BENCHMARK_ITERATIONS,
+            },
+            results: allResults,
+        };
+
+        writeReport(report);
     });
 });