diff --git a/.github/agents/typeshed-update-agent.md b/.github/agents/typeshed-update-agent.md index f7fcd719a65a..806a8fa3ff6f 100644 --- a/.github/agents/typeshed-update-agent.md +++ b/.github/agents/typeshed-update-agent.md @@ -20,8 +20,8 @@ The agent must follow the **Pyright Test Policy**. Update typeshed to Include the typeshed commit hash and summary. -For now, create PRs against `origin` rather than `upstream`. -Use the local `origin` remote as the PR base repository, and do not target `microsoft/pyright` unless explicitly asked. +The PR must target the upstream Pyright repository (`microsoft/pyright`), not the fork repository. +Use the fork branch as the PR head if needed, and verify the reported PR URL is under `github.com/microsoft/pyright`. ## Running The Update Script diff --git a/.github/plans/pyright_benchmark_plan.md b/.github/plans/pyright_benchmark_plan.md deleted file mode 100644 index db800e98aa29..000000000000 --- a/.github/plans/pyright_benchmark_plan.md +++ /dev/null @@ -1,1321 +0,0 @@ -# Pyright Ecosystem Performance, Correctness, and Heuristics Benchmark Plan - -## Goal - -Build a repeatable benchmark system for Pyright and Pylance that answers five questions on every meaningful change: - -1. Did diagnostics change? -2. Did total runtime regress? -3. Which phase regressed: parse, bind, type evaluation, import resolution, typeshed loading, completion building, or cache behavior? -4. Which project shape triggered it? -5. Can we safely tune evaluator heuristics such as recursion limits, union expansion limits, overload pruning thresholds, and protocol matching depth? - -The plan combines three ideas: - -- Use `mypy_primer` as the real-world project source of truth. -- Use Ty/Ruff-style cold, warm, incremental, and language-server benchmarks. -- Add Pyright-specific instrumentation so regressions and heuristic wins are explainable. 
- -## Status Update - -Current implementation status as of 2026-05-08: - -- Completed: benchmark test directory layout, shared benchmark utilities, benchmark README, parser/tokenizer JSON artifact output, - synthetic evaluator microbenchmarks, structured timing snapshots, evaluator phase timing metrics, curated ecosystem smoke - project manifest and selectors, and old/new/comparison report comparison helpers. -- Completed: comparison helpers now support summary sections, largest regressions/improvements, threshold classification, - `old.json`, `new.json`, `comparison.json`, and `comparison.md` generation, plus loading reports back from disk and a - one-call compare-and-write flow. -- Completed externally: CodSpeed bootstrap work has an initial PR in `bschnurr/pyright`, so the remaining local work is - to align this benchmark suite with that setup rather than starting CodSpeed integration from zero. -- In progress: ecosystem benchmark runner implementation. The manifest, selectors, report schema, comparison pipeline, - and a `runEcosystemBenchmark.ts` entry point are in place for smoke-suite selection, local project execution with - per-project generated `pyrightconfig.json` files, packaged-CLI local runs, and report comparison. There is not yet a - `mypy_primer`-backed runner that prepares project checkouts, installs dependencies, honors project dates, and executes - base/head Pyright across the smoke suite automatically. -- In progress: `mypy_primer` metadata synchronization now has a checked-in smoke snapshot, generated project metadata, - and local overrides for smoke-suite source roots. The sync parser handles the known upstream smoke metadata shape, - including `name_override`, `pyright_cmd=None`, dependency/install/platform/cost fields, duplicate derived names, and - portable checked-in `inputFile` paths. -- Not started: heuristic sweep harness and LSP benchmarks. 
- -### Review Gaps To Address Next - -The branch review identified the following gaps that should be treated as near-term work before broad CI use: - -Current PR staging note: create PRs against `origin` (`bschnurr/pyright`) for now. Do not target `upstream` -(`microsoft/pyright`) until the benchmark baseline and workflow shape are ready for upstream review. - -1. Add base/head Pyright orchestration. Project checkout preparation now exists, but CI still needs a workflow-level way - to build and pass distinct baseline and candidate Pyright commands for the selected project set. -2. Add a checked-in main-branch baseline report. PR runs need a stable comparison target before CI artifact lookup is in - place. Running the smoke suite on a known `main` commit should update a committed baseline artifact that PRs can use - as `old.json` when producing comparison reports. - ---- - -## Core Objectives - -The benchmark system should support these goals: - -1. Detect diagnostic regressions on real projects. -2. Detect total performance regressions. -3. Attribute regressions to parser, binder, evaluator, import resolver, typeshed, LSP, completion, or memory behavior. -4. Compare Pyright against Ty-style benchmark categories: cold check, warm check, time to first diagnostic, and incremental re-check. -5. Reuse the same ecosystem strategy as `mypy_primer`. -6. Benchmark Pylance/LSP operations like completion, hover, references, semantic tokens, and workspace load. -7. Safely tune Pyright type-evaluator heuristics and bailout thresholds. -8. Produce PR comments and artifacts that are useful to reviewers. -9. Support local developer workflows for comparing a branch against `main`. - ---- - -## Why Reuse `mypy_primer` - -`mypy_primer` already solves a hard problem: maintaining a real-world corpus of typed Python projects that can be checked by different type checkers. 
It includes project metadata such as: - -```python -Project( - location="https://github.com/pandas-dev/pandas", - pyright_cmd="{pyright} {paths}", - paths=["pandas"], - deps=[...], - expected_success=("mypy",), - cost={"mypy": 355, "ty": 14}, -) -``` - -Pyright should not invent a completely separate ecosystem list. Instead, Pyright should reuse the `mypy_primer` project list, then add Pyright-specific tags, benchmark tiers, performance metrics, and heuristic experiments. - -The role split should be: - -```text -mypy_primer: - Did real-world diagnostics change? - -Pyright benchmark harness: - Why did performance change? - Which phase changed? - Which project pattern exposed it? - Which heuristic settings are safe? -``` - ---- - -## Benchmark Categories - -### 1. Microbenchmarks - -Run on every relevant PR. - -Purpose: catch parser, binder, evaluator, and completion hot-path regressions quickly. - -Example cases: - -```text -micro/parser_large_file -micro/tokenizer_comments_strings -micro/binder_many_imports -micro/union_expansion -micro/large_union_narrowing -micro/overload_many_candidates -micro/overload_union_cross_product -micro/protocol_many_members_match -micro/protocol_many_members_mismatch -micro/recursive_protocol -micro/typed_dict_many_keys -micro/typevar_constraint_matrix -micro/deep_generic_alias_chain -micro/literal_union_math -micro/completion_list_building -``` - -Metrics: - -```text -elapsedMs -parseMs -bindMs -checkMs -tokens/sec -filesParsed -filesBound -filesChecked -AST node count -symbol count -type cache hits/misses -heapUsedMb -``` - -Use synthetic generators rather than committing giant hand-written Python files. 
- -Example generator targets: - -```text -generateLargeUnionNarrowingCase(10, 50, 100, 250) -generateManyOverloadsCase(10, 50, 100, 500) -generateProtocolCase(members=50, match=true/false) -generateLargeTypedDictCase(keys=100, 500, 1000) -generateImportGraphCase(files=100, 1000) -generateRecursiveAliasCase(depth=16, 32, 64, 128) -``` - ---- - -### 2. Ecosystem Smoke Benchmarks - -Run on most PRs that touch parser, binder, evaluator, import resolver, typeshed, or diagnostics. - -Use a curated subset of `mypy_primer` projects. - -Suggested smoke suite: - -```text -black -pytest -attrs -pydantic -python-chess -packaging -rich -mypy_primer -django-modern-rest -pandas -``` - -Reasoning: - -```text -black: - Parser-heavy, practical codebase. - -pytest: - Large, dynamic Python codebase. - -attrs: - Dataclass-like patterns and decorators. - -pydantic: - Decorators, generics, validation model patterns. - -python-chess: - Relatively clean expected-success signal. - -packaging: - Small stable baseline. - -rich: - Practical typed library with meaningful structure. - -mypy_primer: - Typed tool codebase. - -django-modern-rest: - Web, Django-ish, pydantic-ish patterns. - -pandas: - Data-science, stubs-heavy, overload-heavy. -``` - -Target runtime: under 10–15 minutes. - -Metrics: - -```text -diagnostic diff -total runtime -parse/bind/check/import resolver timings -files analyzed -memory usage -phase-level deltas -``` - ---- - -### 3. Full Ecosystem Benchmarks - -Run nightly, manually, and on risky PRs. - -Use all `mypy_primer` projects that support Pyright via `pyright_cmd`. - -Use sharding: - -```yaml -strategy: - matrix: - shard-index: [0, 1, 2, 3, 4, 5, 6, 7] -``` - -Inputs: - -```text ---suite full ---num-shards 8 ---shard-index N ---project-date YYYY-MM-DD -``` - -The full run should compare: - -```text -base commit vs head commit -old diagnostics vs new diagnostics -old metrics vs new metrics -old phase timings vs new phase timings -``` - ---- - -### 4. 
Ty-Style Benchmarks - -Ty tracks more than one mode. Pyright should mirror the same broad categories: - -```text -cold check: - Type-check a project from scratch. - -warm check: - Re-check with caches already populated. - -time to first diagnostic: - Start a language-server-like session and measure first diagnostics. - -incremental re-check: - Simulate an edit and measure diagnostics recomputation. -``` - -Benchmark operations: - -```text -cold[project] -warm[project] -first_diagnostic[project] -incremental[edit_private_function_body] -incremental[edit_public_function_signature] -incremental[edit_imported_symbol] -incremental[edit_protocol_member] -incremental[edit_type_alias] -incremental[edit_pyproject_config] -``` - -Track: - -```text -elapsedMs -files invalidated -files reparsed -files rebound -files rechecked -diagnostics recomputed -cache hits/misses -memory before/after -``` - ---- - -### 5. Pylance/LSP Benchmarks - -CLI type checking does not exercise all user-visible performance paths. Add a dedicated LSP harness. - -Operations: - -```text -lsp/open_workspace -lsp/first_diagnostics -lsp/completion_after_dot -lsp/completion_import_statement -lsp/completion_auto_imports_small -lsp/completion_auto_imports_large -lsp/hover_generic_call -lsp/go_to_definition -lsp/find_references -lsp/rename_symbol -lsp/document_symbols -lsp/workspace_symbols -lsp/semantic_tokens_large_file -``` - -Metrics: - -```text -request latency p50/p95 -items produced -items filtered -auto-import candidates scanned -sort/filter time -symbol index lookup time -diagnostics latency -semantic token count -heap before/after -``` - -Useful LSP stress workspaces: - -```text -large venv -pandas-like project -django-like project -repo with many exports -repo with many same-named symbols -repo with deep import graph -``` - ---- - -## Evaluator Heuristics Tuning - -This should be a first-class goal. - -Pyright has many evaluator heuristics and bailout thresholds. 
The benchmark suite should allow safe experimentation with: - -```text -recursion limits -union expansion limits -overload candidate pruning -protocol matching depth -recursive type alias expansion -speculative evaluation limits -constraint solver bailout thresholds -literal math / enum expansion thresholds -TypedDict key analysis limits -call-site cache eviction thresholds -type cache sizing -``` - -The benchmark suite should answer: - -```text -Can we lower or raise this limit? -Does it improve performance? -Does it change diagnostics? -Does it reduce worst-case cliffs? -Which real projects are affected? -``` - ---- - -## Evaluator Heuristic Sweeps - -Add a dedicated benchmark category: - -```text -packages/pyright-internal/benchmarks/evaluatorHeuristics/ - heuristicMatrix.json - runHeuristicSweep.ts - renderHeuristicReport.ts - cases/ - recursiveAlias.ts - deepGenericAlias.ts - overloadUnionExpansion.ts - protocolRecursive.ts - constrainedTypeVarExplosion.ts - typedDictHugeKeySet.ts -``` - -Example `heuristicMatrix.json`: - -```json -{ - "recursionDepthLimit": [16, 32, 64, 128], - "unionExpansionLimit": [16, 32, 64, 128], - "overloadCandidateLimit": [32, 64, 128, 256], - "protocolMatchDepthLimit": [8, 16, 32, 64], - "typeAliasExpansionLimit": [16, 32, 64, 128], - "speculativeEvalLimit": [64, 128, 256, 512] -} -``` - -Example command: - -```bash -node runHeuristicSweep.js --project pandas --heuristic unionExpansionLimit --values 16,32,64,128 -``` - -Possible hidden/test-only override mechanism: - -```bash -PYRIGHT_PERF_UNION_EXPANSION_LIMIT=32 -PYRIGHT_PERF_RECURSION_DEPTH_LIMIT=64 -PYRIGHT_PERF_PROTOCOL_DEPTH_LIMIT=16 -``` - -Or a test-only config object: - -```ts -const options = { - typeCheckingMode: "strict", - perfOptions: { - evaluatorHeuristics: { - unionExpansionLimit: 32, - recursionDepthLimit: 64, - protocolMatchDepthLimit: 16 - } - } -}; -``` - ---- - -## Heuristic Instrumentation - -Add optional counters for when heuristics trigger: - -```text 
-recursionLimitHitCount -unionExpansionLimitHitCount -overloadPrunedCandidateCount -protocolDepthLimitHitCount -typeAliasExpansionLimitHitCount -speculativeEvalLimitHitCount -constraintSolverBailoutCount -maxTypeEvalRecursionDepth -maxUnionExpansionSize -maxProtocolMatchDepth -maxOverloadCandidateCount -``` - -Example raw result: - -```json -{ - "case": "recursive_alias_depth_64", - "heuristic": "recursionDepthLimit", - "value": 32, - "diagnosticCount": 2, - "diagnosticDiff": false, - "elapsedMs": 84, - "checkMs": 72, - "bailoutCount": 1, - "maxObservedDepth": 31, - "cacheHitRate": 0.82 -} -``` - -Useful interpretation: - -```text -pandas: - checkMs: +2.1% - overloadPrunedCandidateCount: 0 - recursionLimitHitCount: 0 - -pydantic: - checkMs: -14.8% - speculativeEvalLimitHitCount: +120 - diagnosticDiff: false -``` - -That tells reviewers whether a heuristic helped safely. - ---- - -## Synthetic Cliff Tests - -Add synthetic cases that intentionally hit worst-case evaluator paths. - -```text -synthetic[recursive_alias_depth][16,32,64,128] -synthetic[overload_union_cross_product][4x4,8x8,16x16] -synthetic[protocol_recursive_members][8,16,32] -synthetic[generic_alias_chain][16,32,64,128] -synthetic[constrained_typevar_matrix][4,8,16] -synthetic[literal_union_math][32,64,128,256] -synthetic[typed_dict_key_count][100,500,1000] -``` - -Goal: reveal complexity cliffs. - -Example output: - -```text -recursive_alias_depth: - depth=16 8ms - depth=32 21ms - depth=64 98ms - depth=128 1100ms ⚠️ cliff -``` - ---- - -## Real-Project Heuristic Targets - -Run heuristic sweeps against selected ecosystem projects. 
- -```text -pandas: - overloads, stubs, data-science - -pydantic: - decorators, generics, dataclass-like transforms - -attrs: - dataclass-like, protocols - -sqlalchemy: - generics, overloads, ORM patterns - -xarray: - pandas/numpy typing, overloads - -jax: - numpy-style typing, generics - -pytest: - dynamic patterns, plugins - -django-modern-rest: - pydantic + web + serializers - -mypy_primer: - typed codebase, real tool -``` - -For each heuristic experiment, require: - -```text -no unexpected diagnostic diff -no new crashes -no large increase in Unknown/Any if tracked -performance improvement or reduced worst-case cliff -``` - ---- - -## Heuristic Decision Report - -Each heuristic sweep should produce a recommendation document. - -Example: - -```md -# Heuristic sweep: unionExpansionLimit - -## Recommendation - -Keep default at 64. - -## Why - -- 32 improves worst-case synthetic benchmarks by 18–40%. -- But 32 causes diagnostic diffs in pandas and xarray. -- 64 avoids diffs and still prevents 128-depth explosion. -- 128 gives no useful real-project benefit and increases check time in overload-heavy cases. - -## Results - -| Project | 32 | 64 | 128 | Diagnostic diff | -|---|---:|---:|---:|---| -| pandas | 41.2s | 44.0s | 46.7s | yes at 32 | -| pydantic | 12.1s | 12.4s | 12.8s | no | -| xarray | 31.4s | 33.0s | 36.5s | yes at 32 | -``` - -This turns heuristic tuning into an evidence-based process. - ---- - -## Project Tagging - -Add Pyright-specific tags on top of the `mypy_primer` manifest. 
- -Example `ecosystem-projects.overrides.json`: - -```json -{ - "pandas": { - "tags": ["large", "data-science", "numpy", "overloads", "stubs-heavy"] - }, - "jax": { - "tags": ["large", "ml", "numpy", "generics", "overloads"] - }, - "pydantic": { - "tags": ["decorators", "dataclass-like", "generics"] - }, - "attrs": { - "tags": ["dataclass-like", "stubs", "protocols"] - }, - "pytest": { - "tags": ["dynamic", "plugins", "large-tests"] - }, - "django-modern-rest": { - "tags": ["django", "pydantic", "web"] - }, - "sqlalchemy": { - "tags": ["orm", "generics", "overloads"] - }, - "xarray": { - "tags": ["data-science", "pandas", "numpy", "large"] - } -} -``` - -Commands: - -```bash -node runEcosystemBenchmark.js --tag overloads -node runEcosystemBenchmark.js --tag parser-heavy -node runEcosystemBenchmark.js --tag data-science -node runEcosystemBenchmark.js --tag decorators -node runEcosystemBenchmark.js --tag completion-heavy -``` - -This lets a parser PR run parser-heavy projects, while an overload PR runs overload-heavy projects. - ---- - -## Metrics Model - -Every benchmark should emit structured JSON. 
- -Example: - -```json -{ - "benchmark": "cold[pandas]", - "suite": "ecosystem-smoke", - "project": "pandas", - "commit": "abc123", - "totalMs": 123456, - "parseMs": 1234, - "bindMs": 2345, - "checkMs": 100000, - "importResolverMs": 3456, - "typeshedLoadMs": 789, - "filesParsed": 1234, - "filesBound": 1234, - "filesChecked": 1200, - "sourceLines": 500000, - "tokenCount": 8000000, - "astNodeCount": 3000000, - "symbolCount": 400000, - "typeCacheHits": 123456, - "typeCacheMisses": 12345, - "overloadResolutionCount": 9876, - "unionExpansionCount": 1234, - "speculativeEvalCount": 2222, - "heuristicCounters": { - "recursionLimitHitCount": 0, - "unionExpansionLimitHitCount": 12, - "overloadPrunedCandidateCount": 300 - }, - "diagnosticCount": 42, - "heapUsedMb": 512 -} -``` - ---- - -## Comparison Output - -Generate: - -```text -old.json -new.json -comparison.json -comparison.md -``` - -### Checked-In Main Baseline - -PR comparisons need a stable baseline before the workflow can reliably fetch prior CI artifacts. Add a checked-in smoke -baseline generated from a known `main` commit and use it as the default `old.json` input for PR smoke comparisons. - -Proposed layout: - -```text -packages/pyright-internal/src/tests/benchmarks/baselines/ - ecosystem-smoke-main.json - README.md -``` - -Baseline policy: - -- The checked-in baseline is generated only from `main` or an explicitly recorded main-branch commit. -- The baseline report records the Pyright commit SHA, project snapshot date, benchmark suite, selected projects, Node and - Python versions, platform, and generated config mode. -- PR benchmark runs generate `new.json` and compare it against `baselines/ecosystem-smoke-main.json` unless a fresher CI - artifact is supplied explicitly. -- Updating the checked-in baseline should be a deliberate maintenance action after benchmark harness changes, project - snapshot refreshes, or accepted performance/diagnostic shifts on `main`. 
-- The baseline should remain small and smoke-suite scoped. Full ecosystem and noisy exploratory runs should stay as CI - artifacts, not checked-in repository data. - -Near-term bootstrap command shape: - -```bash -npm run build:cli:dev -cd packages/pyright-internal -npm run build -npm run bench:ecosystem:run:local -- --suite smoke --project-root q:/path/to/main-checkouts --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-main -``` - -The implementation should add a runner option or script that copies the generated `baseline-report.json` into the -checked-in baseline path and stamps it with the source commit. PR comparison mode should accept that baseline path without -requiring the developer or workflow to manually rename files. - -Example Markdown report: - -```md -# Pyright Ecosystem Benchmark - -Base: abc123 -Head: def456 - -## Summary - -| Metric | Old | New | Delta | -|---|---:|---:|---:| -| Total time | 322.4s | 309.8s | -3.9% | -| Parse time | 24.1s | 17.2s | -28.6% | -| Bind time | 31.0s | 31.5s | +1.6% | -| Check time | 250.7s | 247.9s | -1.1% | - -## Largest Regressions - -| Project | Old | New | Delta | Phase | -|---|---:|---:|---:|---| -| pandas | 58.2s | 63.1s | +8.4% | check | -| jax | 41.0s | 43.7s | +6.6% | import resolver | - -## Largest Wins - -| Project | Old | New | Delta | Phase | -|---|---:|---:|---:|---| -| black | 11.2s | 8.0s | -28.6% | parse | -``` - ---- - -## Regression Thresholds - -Use both percent and absolute thresholds. - -Example: - -```json -{ - "failOnDiagnosticsDiff": true, - "warnTotalRegressionPct": 5, - "failTotalRegressionPct": 10, - "warnProjectRegressionPct": 10, - "failProjectRegressionPct": 20, - "minAbsoluteRegressionMs": 3000 -} -``` - -Reason: tiny projects can produce noisy percentage swings. - ---- - -## Project-Date Pinning - -Use a pinned project date for ecosystem stability. 
- -Example: - -```bash -mypy_primer --type-checker pyright --project-date 2026-01-01 -``` - -Store in the benchmark config: - -```json -{ - "projectDate": "2026-01-01" -} -``` - -Update the date intentionally, maybe monthly, not accidentally on every run. - ---- - -## File Layout - -```text -packages/pyright-internal/ - src/tests/benchmarks/ - README.md - - micro/ - runMicroBenchmarks.ts - cases/ - parserLargeFile.ts - tokenizerStrings.ts - overloadCache.ts - unionExpansion.ts - recursiveAlias.ts - protocolMatching.ts - typedDictHuge.ts - - ecosystem/ - ecosystem-projects.generated.json - ecosystem-projects.overrides.json - syncMypyPrimerProjects.ts - runEcosystemBenchmark.ts - compareBenchmarkResults.ts - renderMarkdownReport.ts - projectTags.ts - - lsp/ - runLspBenchmarks.ts - lspPerfHarness.ts - scenarios/ - completionLargeModule.json - completionAutoImports.json - hoverLargeUnion.json - semanticTokensLargeFile.json - findReferencesLargeWorkspace.json - - evaluatorHeuristics/ - heuristicMatrix.json - runHeuristicSweep.ts - renderHeuristicReport.ts - cases/ - recursiveAlias.ts - deepGenericAlias.ts - overloadUnionExpansion.ts - protocolRecursive.ts - constrainedTypeVarExplosion.ts - typedDictHugeKeySet.ts - - artifacts/ - .gitignore -``` - ---- - -## CI Workflows - -### PR Smoke Benchmark - -```yaml -name: Pyright ecosystem smoke benchmark - -on: - pull_request: - paths: - - 'packages/pyright/**' - - 'packages/pyright-internal/src/**' - - 'packages/pyright-internal/typeshed-fallback/**' - -jobs: - ecosystem-smoke: - runs-on: ubuntu-latest - - steps: - - uses: actions/checkout@v4 - - - uses: actions/setup-node@v4 - - - uses: actions/setup-python@v5 - with: - python-version: '3.11' - - - run: npm ci - - run: npm run build - - - run: python -m pip install -U pip - - run: pip install git+https://github.com/hauntsaninja/mypy_primer.git - - - name: Run smoke ecosystem benchmark - run: | - node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js \ - 
--suite smoke \ - --base origin/${{ github.base_ref }} \ - --head ${{ github.sha }} \ - --project-date 2026-01-01 \ - --output artifacts/ecosystem-smoke - - - uses: actions/upload-artifact@v4 - with: - name: pyright-ecosystem-smoke - path: artifacts/ecosystem-smoke -``` - -### Nightly Full Benchmark - -```yaml -name: Pyright ecosystem full benchmark - -on: - schedule: - - cron: '0 8 * * *' - workflow_dispatch: - -jobs: - full: - strategy: - fail-fast: false - matrix: - shard-index: [0,1,2,3,4,5,6,7] - - runs-on: ubuntu-latest - - steps: - - uses: actions/checkout@v4 - - uses: actions/setup-node@v4 - - uses: actions/setup-python@v5 - with: - python-version: '3.11' - - - run: npm ci - - run: npm run build - - run: python -m pip install -U pip - - run: pip install git+https://github.com/hauntsaninja/mypy_primer.git - - - run: | - node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js \ - --suite full \ - --num-shards 8 \ - --shard-index ${{ matrix.shard-index }} \ - --project-date 2026-01-01 \ - --output artifacts/full-${{ matrix.shard-index }} -``` - -### Manual Targeted Benchmark - -```yaml -on: - workflow_dispatch: - inputs: - tag: - description: 'Project tag: overloads, parser-heavy, data-science, decorators' - required: false - project: - description: 'Specific project regex' - required: false - heuristic: - description: 'Optional heuristic sweep name' - required: false -``` - ---- - -## Local Developer Commands - -Add scripts: - -```json -{ - "scripts": { - "bench:micro": "node packages/pyright-internal/benchmarks/micro/runMicroBenchmarks.js", - "bench:ecosystem:smoke": "node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js --suite smoke", - "bench:ecosystem:full": "node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js --suite full", - "bench:ecosystem:tag": "node packages/pyright-internal/benchmarks/ecosystem/runEcosystemBenchmark.js --tag", - "bench:lsp": "node 
packages/pyright-internal/benchmarks/lsp/runLspBenchmarks.js", - "bench:heuristics": "node packages/pyright-internal/benchmarks/evaluatorHeuristics/runHeuristicSweep.js" - } -} -``` - -Example usage: - -```bash -npm run bench:micro -npm run bench:ecosystem:smoke -npm run bench:ecosystem:tag -- overloads -npm run bench:lsp -npm run bench:heuristics -- --heuristic recursionDepthLimit --values 16,32,64,128 -``` - ---- - -## CodSpeed Integration - -Use CodSpeed for Tier 0 and selected stable microbenchmarks. - -Status update: - -- Initial CodSpeed setup already exists in an external PR in `bschnurr/pyright`. -- The next step in this repo is to wire the stable microbenchmark subset into that setup once the local benchmark entry - points match the expected runner shape. - -Good candidates: - -```text -parser large file -tokenizer comments/strings -union expansion -overload many candidates -protocol mismatch -typed dict many keys -completion list building -``` - -Do not start by putting all ecosystem benchmarks into CodSpeed. Use CodSpeed for stable, smaller, lower-noise cases. Use the ecosystem runner for heavier PR and nightly artifacts. 
- ---- - -## Optimization Use Cases - -### Parser/tokenizer rewrite - -Expected wins: - -```text -parseMs lower -token/sec higher -totalMs lower on parser-heavy projects -no diagnostic diff -``` - -Stress projects: - -```text -black -mypy -pytest -pandas -sphinx-like docs projects -``` - -### Import resolver/cache changes - -Expected wins: - -```text -importResolverMs lower -typeshedLoadMs lower -fewer filesystem stats -fewer repeated module resolutions -``` - -Stress projects: - -```text -pandas -xarray -jax -scikit-learn -django-style projects -large venv workspace -``` - -### Overload resolution optimization - -Expected wins: - -```text -checkMs lower -overloadResolutionCount same or lower -cache hit rate higher -diagnostics unchanged -``` - -Stress projects: - -```text -pandas -jax -xarray -pydantic -sqlalchemy -numpy/scipy-stubs if included -``` - -### Evaluator heuristic tuning - -Expected wins: - -```text -reduced worst-case cliffs -fewer runaway expansions -diagnostics unchanged -bounded cache/memory growth -``` - -Stress cases: - -```text -recursive aliases -deep generic aliases -union cross products -large overload sets -recursive protocols -constrained TypeVar matrices -``` - -### Completion list building - -Expected wins: - -```text -completion latency lower -auto-import scan time lower -sort/filter time lower -items unchanged or intentionally improved -``` - -Stress workspaces: - -```text -large venv -pandas project -django project -repo with many exports -``` - -### Typeshed/stub changes - -Expected wins: - -```text -diagnostic diffs explainable -typeshedLoadMs stable -checkMs stable -Unknown/Any regressions detected if tracked -``` - -Stress projects: - -```text -pandas -requests users -django-stubs users -numpy/scipy-stubs users -pydantic users -``` - ---- - -## MVP Implementation - -First useful version: - -1. [x] Add benchmark directory layout. -2. [~] Add `syncMypyPrimerProjects.ts`. 
- - [x] Parse the checked-in smoke snapshot into generated metadata. - - [x] Write generated metadata from the built sync script back to the source tree. - - [x] Remove machine-local absolute `inputFile` paths from checked-in generated metadata. - - [x] Support full upstream `mypy_primer/projects.py` sync fields: `name_override`, `pyright_cmd=None`, deps, - install command, supported platforms, cost, and duplicate-location entries. -3. [x] Generate `ecosystem-projects.generated.json`. -4. [x] Add `ecosystem-projects.overrides.json`. - - [x] Add smoke-suite source root overrides for upstream entries that omit `paths`. -5. [x] Add a smoke suite of 8–10 projects. -6. [~] Add `runEcosystemBenchmark.ts`. - - [x] Parse smoke-suite selection inputs (`--suite`, `--tag`, `--project`, `--num-shards`, `--shard-index`, `--output`). - - [x] Write a selection manifest artifact for the resolved project set. - - [x] Compare existing ecosystem benchmark reports into `old.json`, `new.json`, `comparison.json`, and `comparison.md`. - - [x] Execute selected local project checkouts with provided baseline/candidate Pyright commands. - - [x] Generate per-project `pyrightconfig.json` files with config-relative source roots. - - [x] Add a packaged-CLI local run path for realistic local execution. - - [x] Prepare selected project checkouts with `--prepare-projects`. - - [x] Honor `--project-date` during checkout preparation. - - [x] Install project dependencies/stubs according to synced metadata with `--install-dependencies`. - - [ ] Run base vs head Pyright for the selected projects from synchronized `mypy_primer` checkouts. - - [x] Resolve the smoke suite from generated project metadata plus local overrides. - - [x] Preserve or deliberately merge project-level Pyright configuration instead of blindly replacing it. - - [x] Extend project-level `pyrightconfig.json` files when they exist. 
- - [x] Merge `[tool.pyright]` settings from `pyproject.toml` when no `pyrightconfig.json` exists, while preserving the - benchmark-owned include/exclude scope. - - [x] Include process status, stderr, and command details when Pyright does not emit JSON. -7. [~] Run base vs head Pyright. - - [x] Execute local baseline/candidate commands against preexisting project checkouts. - - [x] Prepare project checkouts automatically when `--prepare-projects` is provided. - - [x] Honor `--project-date` during checkout preparation. - - [x] Install project dependencies/stubs according to synced metadata when `--install-dependencies` is provided. - - [ ] Build and pass distinct base/head Pyright commands automatically in CI. -8. [ ] Capture: - - [x] total runtime - - [x] files analyzed - - [x] diagnostic count - - [x] severity counts - - [x] diagnostic diff - - [ ] process memory -9. [~] Generate: - - [x] `old.json` - - [x] `new.json` - - [x] `comparison.json` - - [x] `comparison.md` - - [~] Wire these artifacts into an actual ecosystem benchmark runner output. - - [x] Include diagnostic and analyzed-file metrics in ecosystem comparison artifacts. - - [x] Add diagnostic-diff sections to comparison artifacts once diagnostic identities are captured. -10. [ ] Add checked-in main-branch smoke baseline. - - [x] Add `src/tests/benchmarks/baselines/README.md` documenting checked-in baseline policy. - - [ ] Add `src/tests/benchmarks/baselines/ecosystem-smoke-main.json`. - - [x] Stamp refreshed baselines with source commit SHA, project snapshot date, refresh timestamp, and config mode. - - [x] Add a script or runner option to update the checked-in baseline from a verified main-branch run. - - [x] Make PR comparison mode default to the checked-in baseline when no explicit baseline report is supplied. -11. [~] Add GitHub workflow. - - [x] Add a manual workflow for smoke comparison and baseline refresh runs. 
- - [x] In manual compare mode, run smoke benchmarks as `new.json` and compare against the checked-in main baseline. - - [x] In manual refresh mode, run smoke benchmarks and upload the refreshed checked-in baseline candidate. - - [ ] Add automatic PR triggering once the checked-in main baseline exists. -12. [ ] Add one heuristic sweep: - - `recursionDepthLimit` or `unionExpansionLimit` -13. [x] Add two synthetic heuristic cases: - - [x] recursive alias depth - - [x] overload union cross product -14. [ ] Add one heuristic report: - - `heuristic-recommendation.md` - -MVP smoke project list: - -```text -black -pytest -attrs -pydantic -python-chess -packaging -rich -mypy_primer -django-modern-rest -pandas -``` - ---- - -## Longer-Term Implementation Stages - -### Stage 1: Correctness + wall time - -Use `mypy_primer` project list. Compare old vs new Pyright output and total runtime. - -### Stage 2: Phase metrics - -Add Pyright benchmark JSON output with parse, bind, check, import resolver, typeshed, and memory metrics. - -### Stage 3: LSP metrics - -Add Pylance-style LSP operation benchmark harness. - -### Stage 4: Heuristic sweeps - -Add test-only evaluator heuristic overrides and sweep reports. - -### Stage 5: PR comments - -Post concise benchmark summaries on PRs. - -### Stage 6: CodSpeed - -Use CodSpeed for stable microbenchmarks and low-noise hot paths. - -### Stage 7: Nightly dashboards - -Track trends over time for full ecosystem and heuristic counters. - ---- - -## Final Design Principle - -Use `mypy_primer` as the ecosystem correctness corpus, but own the Pyright performance and heuristic story. - -`mypy_primer` answers: - -```text -Did behavior change on real projects? -``` - -The Pyright benchmark harness answers: - -```text -Why did performance change? -Which phase changed? -Which project pattern exposed it? -Which evaluator heuristic setting is safe? -What should reviewers do with this information? 
-```
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index ac0558702a80..fae015806ec4 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -12,7 +12,7 @@ on:
 
 jobs:
   build:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
+    if: github.repository == 'microsoft/pyright'
     runs-on: ubuntu-latest
     name: Build
 
@@ -43,7 +43,7 @@ jobs:
           path: packages/vscode-pyright/${{ env.VSIX_NAME }}
 
   create_release:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
+    if: github.repository == 'microsoft/pyright'
     runs-on: ubuntu-latest
     name: Create release
     needs: [build]
diff --git a/.github/workflows/mypy_primer_comment.yaml b/.github/workflows/mypy_primer_comment.yaml
index aa90dc84dac9..d9754a76d824 100644
--- a/.github/workflows/mypy_primer_comment.yaml
+++ b/.github/workflows/mypy_primer_comment.yaml
@@ -18,7 +18,7 @@ jobs:
   comment:
     name: Comment PR from mypy_primer
     runs-on: ubuntu-latest
-    if: ${{ (github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright') && github.event.workflow_run.conclusion == 'success' }}
+    if: ${{ github.event.workflow_run.conclusion == 'success' }}
     steps:
       - name: Download diffs
         uses: actions/github-script@v7
diff --git a/.github/workflows/mypy_primer_pr.yaml b/.github/workflows/mypy_primer_pr.yaml
index fc16feb937f3..5b1d4eb1d0c0 100644
--- a/.github/workflows/mypy_primer_pr.yaml
+++ b/.github/workflows/mypy_primer_pr.yaml
@@ -29,7 +29,6 @@ concurrency:
 
 jobs:
   mypy_primer:
-    if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright'
     name: Run mypy_primer on PR
     runs-on: ubuntu-latest
     permissions:
diff --git a/.github/workflows/mypy_primer_push.yaml b/.github/workflows/mypy_primer_push.yaml
index 08c191a3f157..db1c9270e479 100644
--- a/.github/workflows/mypy_primer_push.yaml
+++ b/.github/workflows/mypy_primer_push.yaml
@@ -22,7 +22,6 @@ concurrency:
 
 jobs:
   mypy_primer:
-    if:
github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright' name: Run mypy_primer on push runs-on: ubuntu-latest permissions: diff --git a/.github/workflows/pyright_ecosystem_benchmark.yaml b/.github/workflows/pyright_ecosystem_benchmark.yaml deleted file mode 100644 index 5d2f40aa8e44..000000000000 --- a/.github/workflows/pyright_ecosystem_benchmark.yaml +++ /dev/null @@ -1,130 +0,0 @@ -name: Pyright ecosystem benchmark - -on: - workflow_dispatch: - inputs: - mode: - description: 'Run mode' - required: true - default: compare - type: choice - options: - - compare - - refresh-baseline - project_date: - description: 'Project checkout date passed to the ecosystem runner' - required: true - default: '2026-01-01' - type: string - project: - description: 'Optional project name regex filter' - required: false - default: '' - type: string - install_dependencies: - description: 'Install synced ecosystem project dependencies before running Pyright' - required: false - default: false - type: boolean - -concurrency: - group: ${{ github.workflow }}-${{ github.ref }}-${{ inputs.mode }} - cancel-in-progress: true - -jobs: - ecosystem-smoke: - if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright' - name: Ecosystem smoke benchmark - runs-on: ubuntu-latest - permissions: - contents: read - steps: - - uses: actions/checkout@v4 - with: - fetch-depth: 0 - - - uses: actions/setup-node@v4 - with: - node-version: '20' - - - uses: actions/setup-python@v5 - with: - python-version: '3.11' - - - name: Install dependencies - run: npm ci - - - name: Build Pyright CLI and benchmark runner - run: | - npm run build:cli:dev - cd packages/pyright-internal - npm run build - npm run bench:ecosystem:sync - - - name: Run ecosystem smoke comparison - if: ${{ inputs.mode == 'compare' }} - shell: bash - run: | - set -euo pipefail - cd packages/pyright-internal - - baseline_path="./src/tests/benchmarks/baselines/ecosystem-smoke-main.json" - if [[ ! 
-f "$baseline_path" ]]; then - echo "Missing checked-in ecosystem smoke baseline at $baseline_path" >&2 - exit 1 - fi - - args=( - --suite smoke - --project-root "$GITHUB_WORKSPACE/.ecosystem-projects" - --prepare-projects - --project-date "${{ inputs.project_date }}" - --candidate-executable "node ../pyright/index.js" - --main-baseline-report "$baseline_path" - --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-pr - ) - - if [[ -n "${{ inputs.project }}" ]]; then - args+=(--project "${{ inputs.project }}") - fi - - if [[ "${{ inputs.install_dependencies }}" == "true" ]]; then - args+=(--install-dependencies) - fi - - npm run bench:ecosystem:run -- "${args[@]}" - - - name: Refresh ecosystem smoke baseline - if: ${{ inputs.mode == 'refresh-baseline' }} - shell: bash - run: | - set -euo pipefail - cd packages/pyright-internal - - args=( - --suite smoke - --project-root "$GITHUB_WORKSPACE/.ecosystem-projects" - --prepare-projects - --project-date "${{ inputs.project_date }}" - --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-main - --baseline-source-commit "$GITHUB_SHA" - ) - - if [[ -n "${{ inputs.project }}" ]]; then - args+=(--project "${{ inputs.project }}") - fi - - if [[ "${{ inputs.install_dependencies }}" == "true" ]]; then - args+=(--install-dependencies) - fi - - npm run bench:ecosystem:update-main-baseline -- "${args[@]}" - - - name: Upload ecosystem benchmark artifacts - uses: actions/upload-artifact@v4 - with: - name: pyright-ecosystem-benchmark-${{ inputs.mode }} - path: | - packages/pyright-internal/src/tests/benchmarks/.generated/benchmark-results/ - packages/pyright-internal/src/tests/benchmarks/baselines/ecosystem-smoke-main.json - if-no-files-found: warn diff --git a/.github/workflows/validation.yml b/.github/workflows/validation.yml index 5368eb2bb58e..7273dcd6ff48 100644 --- a/.github/workflows/validation.yml +++ b/.github/workflows/validation.yml @@ -14,7 +14,7 @@ on: jobs: typecheck: - if: 
github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright' + if: github.repository == 'microsoft/pyright' runs-on: ubuntu-latest name: Typecheck @@ -34,7 +34,7 @@ jobs: - run: npx lerna exec --stream --no-bail -- tsc --noEmit style: - if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright' + if: github.repository == 'microsoft/pyright' runs-on: ubuntu-latest name: Style @@ -57,7 +57,6 @@ jobs: - run: npm run check test: - if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright' strategy: fail-fast: false matrix: @@ -127,7 +126,6 @@ jobs: working-directory: packages/pyright-internal build: - if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright' runs-on: ubuntu-latest name: Build needs: typecheck @@ -153,7 +151,6 @@ jobs: working-directory: packages/vscode-pyright required: - if: github.repository == 'microsoft/pyright' || github.repository == 'bschnurr/pyright' runs-on: ubuntu-latest name: Required needs: diff --git a/packages/pyright-internal/package.json b/packages/pyright-internal/package.json index d218133453f6..ba66f68d1894 100644 --- a/packages/pyright-internal/package.json +++ b/packages/pyright-internal/package.json @@ -1,63 +1,58 @@ -{ - "name": "pyright-internal", - "displayName": "pyright", - "description": "Type checker for the Python language", - "version": "1.1.409", - "license": "MIT", - "private": true, - "files": [ - "dist" - ], - "scripts": { - "build": "tsc", - "clean": "shx rm -rf ./dist ./out", - "webpack:testserver": "rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development", - "webpack:testserver:watch": "npm run clean && rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development --watch", - "test": "npm run webpack:testserver && node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks", - 
"test:norebuild": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks", - "test:benchmark": "cross-env PYRIGHT_RUN_BENCHMARKS=1 node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testTimeout=300000 --runInBand --detectOpenHandles src/tests/benchmarks", - "bench:ecosystem:run": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js", - "bench:ecosystem:run:local": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js --baseline-executable \"node ../pyright/index.js\" --candidate-executable \"node ../pyright/index.js\"", - "bench:ecosystem:update-main-baseline": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js --baseline-executable \"node ../pyright/index.js\" --update-main-baseline", - "bench:ecosystem:sync": "node ./out/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.js", - "bench:ecosystem:smoke": "node ./out/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.js --suite smoke --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-run", - "test:coverage": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks --reporters=jest-junit --reporters=default --coverage --coverageReporters=cobertura --coverageReporters=html --coverageReporters=json", - "test:imports": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest importResolver.test --forceExit --runInBand" - }, - "dependencies": { - "@yarnpkg/fslib": "2.10.4", - "@yarnpkg/libzip": "2.3.0", - "chalk": "^4.1.2", - "chokidar": "^3.6.0", - "command-line-args": "^5.2.1", - "jsonc-parser": "^3.3.1", - "smol-toml": "^1.6.1", - "source-map-support": "^0.5.21", - "tmp": "^0.2.5", - "vscode-jsonrpc": "^9.0.0-next.8", - "vscode-languageserver": "^10.0.0-next.13", - 
"vscode-languageserver-protocol": "^3.17.6-next.13", - "vscode-languageserver-textdocument": "^1.0.11", - "vscode-languageserver-types": "^3.17.6-next.6", - "vscode-uri": "^3.1.0" - }, - "devDependencies": { - "@rspack/cli": "^1.7.2", - "@rspack/core": "^1.7.2", - "@types/command-line-args": "^5.2.3", - "@types/fs-extra": "^11.0.4", - "@types/jest": "^30.0.0", - "@types/lodash": "^4.17.23", - "@types/node": "^22.19.6", - "@types/tmp": "^0.2.6", - "esbuild-loader": "^4.4.2", - "jest": "^30.2.0", - "jest-junit": "^16.0.0", - "shx": "^0.4.0", - "ts-jest": "^29.4.6", - "ts-loader": "^9.5.4", - "typescript": "~5.5.4", - "webpack": "^5.104.1", - "word-wrap": "1.2.5" - } -} +{ + "name": "pyright-internal", + "displayName": "pyright", + "description": "Type checker for the Python language", + "version": "1.1.409", + "license": "MIT", + "private": true, + "files": [ + "dist" + ], + "scripts": { + "build": "tsc", + "clean": "shx rm -rf ./dist ./out", + "webpack:testserver": "rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development", + "webpack:testserver:watch": "npm run clean && rspack build --config ./src/tests/lsp/rspack.testserver.config.js --mode development --watch", + "test": "npm run webpack:testserver && node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks", + "test:norebuild": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks", + "test:benchmark": "cross-env PYRIGHT_RUN_BENCHMARKS=1 node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testTimeout=300000 --runInBand --detectOpenHandles src/tests/benchmarks", + "test:coverage": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest --forceExit --testPathIgnorePatterns src/tests/benchmarks --reporters=jest-junit --reporters=default --coverage --coverageReporters=cobertura 
--coverageReporters=html --coverageReporters=json", + "test:imports": "node --max-old-space-size=8192 --expose-gc ./node_modules/jest/bin/jest importResolver.test --forceExit --runInBand" + }, + "dependencies": { + "@yarnpkg/fslib": "2.10.4", + "@yarnpkg/libzip": "2.3.0", + "chalk": "^4.1.2", + "chokidar": "^3.6.0", + "command-line-args": "^5.2.1", + "jsonc-parser": "^3.3.1", + "smol-toml": "^1.6.1", + "source-map-support": "^0.5.21", + "tmp": "^0.2.5", + "vscode-jsonrpc": "^9.0.0-next.8", + "vscode-languageserver": "^10.0.0-next.13", + "vscode-languageserver-protocol": "^3.17.6-next.13", + "vscode-languageserver-textdocument": "^1.0.11", + "vscode-languageserver-types": "^3.17.6-next.6", + "vscode-uri": "^3.1.0" + }, + "devDependencies": { + "@rspack/cli": "^1.7.2", + "@rspack/core": "^1.7.2", + "@types/command-line-args": "^5.2.3", + "@types/fs-extra": "^11.0.4", + "@types/jest": "^30.0.0", + "@types/lodash": "^4.17.23", + "@types/node": "^22.19.6", + "@types/tmp": "^0.2.6", + "esbuild-loader": "^4.4.2", + "jest": "^30.2.0", + "jest-junit": "^16.0.0", + "shx": "^0.4.0", + "ts-jest": "^29.4.6", + "ts-loader": "^9.5.4", + "typescript": "~5.5.4", + "webpack": "^5.104.1", + "word-wrap": "1.2.5" + } +} diff --git a/packages/pyright-internal/src/common/timing.ts b/packages/pyright-internal/src/common/timing.ts index 2487f5d6db33..29b41ad3f52e 100644 --- a/packages/pyright-internal/src/common/timing.ts +++ b/packages/pyright-internal/src/common/timing.ts @@ -10,24 +10,6 @@ import { ConsoleInterface } from './console'; -export interface TimingStatSnapshot { - totalTimeMs: number; - callCount: number; -} - -export interface TimingStatsSnapshot { - totalDurationMs: number; - findFiles: TimingStatSnapshot; - readFile: TimingStatSnapshot; - tokenize: TimingStatSnapshot; - parse: TimingStatSnapshot; - resolveImports: TimingStatSnapshot; - cycleDetection: TimingStatSnapshot; - bind: TimingStatSnapshot; - typeCheck: TimingStatSnapshot; - typeEvaluation: TimingStatSnapshot; -} - 
export class Duration { private _startTime: number; @@ -84,20 +66,6 @@ export class TimingStat { const roundedTime = Math.round(totalTimeInSec * 100) / 100; return roundedTime.toString() + 'sec'; } - - getSnapshot(): TimingStatSnapshot { - return { - totalTimeMs: this.totalTime, - callCount: this.callCount, - }; - } -} - -function subtractTimingStatSnapshot(end: TimingStatSnapshot, start: TimingStatSnapshot): TimingStatSnapshot { - return { - totalTimeMs: end.totalTimeMs - start.totalTimeMs, - callCount: end.callCount - start.callCount, - }; } export class TimingStats { @@ -132,38 +100,6 @@ export class TimingStats { getTotalDuration() { return this.totalDuration.getDurationInSeconds(); } - - getSnapshot(): TimingStatsSnapshot { - return { - totalDurationMs: this.totalDuration.getDurationInMilliseconds(), - findFiles: this.findFilesTime.getSnapshot(), - readFile: this.readFileTime.getSnapshot(), - tokenize: this.tokenizeFileTime.getSnapshot(), - parse: this.parseFileTime.getSnapshot(), - resolveImports: this.resolveImportsTime.getSnapshot(), - cycleDetection: this.cycleDetectionTime.getSnapshot(), - bind: this.bindTime.getSnapshot(), - typeCheck: this.typeCheckerTime.getSnapshot(), - typeEvaluation: this.typeEvaluationTime.getSnapshot(), - }; - } - - getSnapshotDelta(start: TimingStatsSnapshot): TimingStatsSnapshot { - const end = this.getSnapshot(); - - return { - totalDurationMs: end.totalDurationMs - start.totalDurationMs, - findFiles: subtractTimingStatSnapshot(end.findFiles, start.findFiles), - readFile: subtractTimingStatSnapshot(end.readFile, start.readFile), - tokenize: subtractTimingStatSnapshot(end.tokenize, start.tokenize), - parse: subtractTimingStatSnapshot(end.parse, start.parse), - resolveImports: subtractTimingStatSnapshot(end.resolveImports, start.resolveImports), - cycleDetection: subtractTimingStatSnapshot(end.cycleDetection, start.cycleDetection), - bind: subtractTimingStatSnapshot(end.bind, start.bind), - typeCheck: 
subtractTimingStatSnapshot(end.typeCheck, start.typeCheck), - typeEvaluation: subtractTimingStatSnapshot(end.typeEvaluation, start.typeEvaluation), - }; - } } export const timingStats = new TimingStats(); diff --git a/packages/pyright-internal/src/tests/benchmarks/README.md b/packages/pyright-internal/src/tests/benchmarks/README.md deleted file mode 100644 index 8de27c6c9722..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/README.md +++ /dev/null @@ -1,130 +0,0 @@ -# Pyright Benchmarks - -This directory contains opt-in performance benchmarks for Pyright internals. They are excluded from the normal Jest test -suite and run through the package benchmark script. - -```bash -cd packages/pyright-internal -npm run test:benchmark -``` - -Benchmark JSON artifacts are written under: - -```text -src/tests/benchmarks/.generated/benchmark-results/ -``` - -## Current Suites - -- `parserBenchmark.test.ts` measures parser throughput over representative Python corpora. -- `tokenizerBenchmark.test.ts` measures tokenizer throughput and runs each corpus in a fresh child process to reduce - cross-test heap effects. -- `evaluatorBenchmark.test.ts` measures cold analysis time for generated evaluator-heavy Python cases. -- `ecosystemSmokeBenchmark.test.ts` validates the curated ecosystem smoke project manifest and writes it as a JSON - artifact derived from generated project metadata and local overrides for future mypy_primer-based runners. -- `runEcosystemBenchmark.ts` provides the first ecosystem runner entry point: it resolves smoke-suite selection from CLI - filters, writes a run manifest artifact, executes selected local project checkouts with provided Pyright commands, - and compares existing or freshly executed ecosystem report files into - `old.json`/`new.json`/`comparison.json`/`comparison.md` artifacts, including diagnostic count metrics and added/removed - diagnostic summaries. 
-- `syncMypyPrimerProjects.ts` is the first sync scaffold for normalizing `mypy_primer` project definitions into the - generated ecosystem metadata file consumed by the smoke manifest. The checked-in smoke snapshot now carries the - upstream `pyright_cmd` and `paths` data for the current smoke suite, so generated project configs can target real - source roots like `src`, `pandas`, `pydantic`, and `chess` instead of defaulting to the repo root. -- `syntheticCases.ts` contains deterministic Python generators for recursive aliases, overload/union cross products, - protocol mismatches, generic alias chains, constrained TypeVar matrices, literal-union math, and large TypedDicts. -- `ecosystemSmokeProjects.ts` derives the smoke project list from `ecosystem-projects.generated.json` and - `ecosystem-projects.overrides.json`, then exposes the existing tag/pattern/shard selection helpers. -- `benchmarkComparison.ts` contains shared old/new result and report comparison helpers plus Markdown rendering for - summary, largest-regression, largest-improvement, threshold classification, `old.json`, `new.json`, - `comparison.json`, and `comparison.md` generation, including loading reports back from disk and writing the full - artifact set in one call. -- `benchmarkUtils.ts` contains shared statistics, system metadata, corpus loading, JSON artifact writing, count - formatting, child-process benchmark helpers, and generated-source type analysis helpers. - -## Result Shape - -The current microbenchmark reports use this common envelope: - -```ts -interface BenchmarkReport { - schemaVersion: number; - suiteName: string; - timestamp: string; - system: BenchmarkSystemInfo; - config: { - warmupIterations: number; - benchmarkIterations: number; - }; - results: ResultT[]; -} -``` - -Individual suites add case-specific fields such as token count, AST node count, median time, p95 time, and throughput. 
-Ecosystem benchmark results additionally preserve per-project fields like `filesAnalyzed`, diagnostic counts, normalized -diagnostics, and total runtime so report artifacts can distinguish execution-scope changes from pure performance -regressions. - -Generated per-project configs always own the benchmark `include` and `exclude` scope so local runs stay focused on source -roots rather than tests. If a project has `pyrightconfig.json`, the generated config extends it. If a project only has -`[tool.pyright]` in `pyproject.toml`, the runner copies those settings into the generated config and rebases known path -fields like `extraPaths`, `stubPath`, `typeshedPath`, and `venvPath` relative to the generated config file. - -## Implementation Roadmap - -1. Extend microbenchmarks with deterministic generated cases for evaluator-heavy paths. -2. Extend the ecosystem runner from selection-only manifest emission to base/head Pyright execution on a curated - mypy_primer-compatible project list. - The metadata source layer and first local execution path now exist; the next step is automated base/head ecosystem - execution driven from synchronized `mypy_primer` project checkouts. -3. Use `TimingStats.getSnapshot()` for structured phase metrics rather than parsing CLI `--stats` text. -4. Add heuristic counters and sweep reports for evaluator bailout thresholds. -5. Add LSP operation benchmarks after CLI and ecosystem reporting are stable. - -## CodSpeed Notes - -Before adding CodSpeed integration, review the current CodSpeed documentation at . Use CodSpeed -only for stable, low-noise microbenchmarks at first; keep ecosystem, heuristic sweep, and LSP benchmarks in the JSON -artifact/report workflow until their runtime and variance are better understood. - -Current status: initial CodSpeed setup already exists in an external PR in `bschnurr/pyright`. 
The next local step is to -connect the stable microbenchmark subset in this directory to that setup rather than creating a second parallel CodSpeed -path. - -Keep new benchmark cases deterministic and report-only by default. Performance thresholds should be introduced only after -repeated runs establish noise levels. - -## Local Ecosystem Runs - -For real local ecosystem execution, use the packaged Pyright CLI rather than the internal `out/.../pyright.js` -entrypoint. The packaged CLI picks up the bundled resources correctly and matches the way end users invoke Pyright. - -```bash -cd q:/dev/pyright-benchmark-suite -npm run build:cli:dev - -cd packages/pyright-internal -npm run build -npm run bench:ecosystem:sync -npm run bench:ecosystem:run:local -- --suite smoke --project "black|attrs" --project-root q:/path/to/checkouts --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-local -``` - -`bench:ecosystem:run:local` defaults both baseline and candidate executables to `node ../pyright/index.js`, so the only -required execution-specific arguments are the usual runner filters plus `--project-root` and `--output`. - -Add `--prepare-projects` to clone or update selected project checkouts under `--project-root`. When `--project-date` is -provided, preparation checks out the newest project commit before that date. Add `--install-dependencies` to install -synced dependency metadata and run synced install commands after checkout preparation. 
- -To refresh the checked-in smoke baseline from a verified main-branch run, execute the baseline side of the local runner, -pass `--update-main-baseline`, and stamp the source commit: - -```bash -npm run bench:ecosystem:update-main-baseline -- --suite smoke --project-root q:/path/to/main-checkouts --prepare-projects --project-date 2026-01-01 --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-main --baseline-source-commit -``` - -PR comparison mode can then use the checked-in baseline by passing only the candidate report: - -```bash -npm run bench:ecosystem:run -- --candidate-report ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-pr/candidate-report.json --output ./src/tests/benchmarks/.generated/benchmark-results/ecosystem-pr-comparison -``` diff --git a/packages/pyright-internal/src/tests/benchmarks/baselines/README.md b/packages/pyright-internal/src/tests/benchmarks/baselines/README.md deleted file mode 100644 index 7caa38ed68f1..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/baselines/README.md +++ /dev/null @@ -1,7 +0,0 @@ -# Ecosystem Benchmark Baselines - -This directory is reserved for checked-in smoke benchmark baselines generated from `main` branch commits. - -`ecosystem-smoke-main.json` should be updated only from a deliberate main-branch run. PR comparisons can use that file as the default baseline when no fresher CI artifact is supplied. - -Full ecosystem reports and exploratory local runs should stay under `.generated/benchmark-results/` or CI artifacts rather than being checked in here. diff --git a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.test.ts b/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.test.ts deleted file mode 100644 index 065468e39537..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.test.ts +++ /dev/null @@ -1,363 +0,0 @@ -/* - * benchmarkComparison.test.ts - * Copyright (c) Microsoft Corporation. 
- * - * Tests for benchmark result comparison helpers. - */ - -import * as fs from 'fs'; -import * as os from 'os'; -import * as path from 'path'; - -import { - calculatePercentDelta, - classifyBenchmarkRegression, - compareAndWriteBenchmarkReportFiles, - compareBenchmarkReportFiles, - compareBenchmarkReports, - compareBenchmarkResultSets, - getBenchmarkRegressionThresholdResults, - loadBenchmarkReport, - renderBenchmarkComparisonMarkdown, - summarizeBenchmarkComparison, - writeBenchmarkComparisonArtifacts, - writeBenchmarkReportComparisonArtifacts, -} from './benchmarkComparison'; -import { BenchmarkReport, benchmarkReportSchemaVersion } from './benchmarkUtils'; - -const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS'; - -interface TestResult { - name: string; - medianMs?: number; - tokensPerSec?: number; -} - -const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip; - -benchmarkSuite('Benchmark Comparison', () => { - test('calculates percent deltas', () => { - expect(calculatePercentDelta(100, 125)).toBe(25); - expect(calculatePercentDelta(100, 80)).toBe(-20); - expect(calculatePercentDelta(0, 0)).toBe(0); - expect(calculatePercentDelta(0, 10)).toBeUndefined(); - }); - - test('compares common benchmark results and tracks added and removed cases', () => { - const comparison = compareBenchmarkResultSets( - [ - { name: 'large_file', medianMs: 100, tokensPerSec: 1000 }, - { name: 'removed_case', medianMs: 50, tokensPerSec: 500 }, - ], - [ - { name: 'large_file', medianMs: 115, tokensPerSec: 1200 }, - { name: 'added_case', medianMs: 10, tokensPerSec: 100 }, - ], - (result) => result.name, - [ - { name: 'medianMs', getValue: (result) => result.medianMs, minAbsoluteDelta: 5 }, - { - name: 'tokensPerSec', - getValue: (result) => result.tokensPerSec, - lowerIsBetter: false, - minAbsoluteDelta: 10, - }, - ] - ); - - expect(comparison.addedKeys).toEqual(['added_case']); - expect(comparison.removedKeys).toEqual(['removed_case']); - 
expect(comparison.compared).toHaveLength(1); - expect(comparison.compared[0].metrics).toEqual([ - { - metric: 'medianMs', - baselineValue: 100, - candidateValue: 115, - absoluteDelta: 15, - percentDelta: 15, - direction: 'regression', - }, - { - metric: 'tokensPerSec', - baselineValue: 1000, - candidateValue: 1200, - absoluteDelta: 200, - percentDelta: 20, - direction: 'improvement', - }, - ]); - }); - - test('compares benchmark report envelopes', () => { - const comparison = compareBenchmarkReports( - createTestReport('parser', '2026-05-07T00:00:00.000Z', [{ name: 'case_a', medianMs: 100 }]), - createTestReport('parser', '2026-05-07T01:00:00.000Z', [{ name: 'case_a', medianMs: 90 }]), - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ); - - expect(comparison.schemaVersion).toBe(benchmarkReportSchemaVersion); - expect(comparison.suiteName).toBe('parser'); - expect(comparison.baselineTimestamp).toBe('2026-05-07T00:00:00.000Z'); - expect(comparison.candidateTimestamp).toBe('2026-05-07T01:00:00.000Z'); - expect(comparison.compared[0].metrics[0].direction).toBe('improvement'); - }); - - test('rejects incompatible benchmark report envelopes', () => { - expect(() => - compareBenchmarkReports( - createTestReport('parser', '2026-05-07T00:00:00.000Z', []), - createTestReport('tokenizer', '2026-05-07T01:00:00.000Z', []), - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ) - ).toThrow('different suites'); - - expect(() => - compareBenchmarkReports( - { ...createTestReport('parser', '2026-05-07T00:00:00.000Z', []), schemaVersion: 0 }, - createTestReport('parser', '2026-05-07T01:00:00.000Z', []), - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ) - ).toThrow('Unsupported baseline benchmark report schema version'); - }); - - test('renders a markdown comparison table', () => { - const comparison = compareBenchmarkResultSets( - [ - { name: 'case_a', 
medianMs: 100 }, - { name: 'case_b', medianMs: 100 }, - ], - [ - { name: 'case_a', medianMs: 110 }, - { name: 'case_b', medianMs: 80 }, - ], - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ); - const markdown = renderBenchmarkComparisonMarkdown(comparison); - - expect(markdown).toContain('## Summary'); - expect(markdown).toContain('Regressions: 1'); - expect(markdown).toContain('Improvements: 1'); - expect(markdown).toContain('## Largest Regressions'); - expect(markdown).toContain('## Largest Improvements'); - expect(markdown).toContain('| case_a | medianMs | 100.00 | 110.00 |'); - }); - - test('summarizes benchmark comparison directions', () => { - const comparison = compareBenchmarkResultSets( - [ - { name: 'regression', medianMs: 100 }, - { name: 'improvement', medianMs: 100 }, - { name: 'unchanged', medianMs: 100 }, - ], - [ - { name: 'regression', medianMs: 120 }, - { name: 'improvement', medianMs: 80 }, - { name: 'unchanged', medianMs: 100 }, - ], - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ); - - expect(summarizeBenchmarkComparison(comparison, 1)).toMatchObject({ - comparedResultCount: 3, - metricCount: 3, - regressionCount: 1, - improvementCount: 1, - unchangedCount: 1, - largestRegressions: [{ key: 'regression' }], - largestImprovements: [{ key: 'improvement' }], - }); - }); - - test('classifies regression thresholds', () => { - const comparison = compareBenchmarkResultSets( - [ - { name: 'warning_case', medianMs: 100 }, - { name: 'failure_case', medianMs: 100 }, - { name: 'small_absolute_case', medianMs: 100 }, - { name: 'improvement_case', medianMs: 100 }, - ], - [ - { name: 'warning_case', medianMs: 106 }, - { name: 'failure_case', medianMs: 112 }, - { name: 'small_absolute_case', medianMs: 104 }, - { name: 'improvement_case', medianMs: 90 }, - ], - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ); - const 
thresholdResults = getBenchmarkRegressionThresholdResults(comparison, { - warnRegressionPct: 5, - failRegressionPct: 10, - minAbsoluteRegression: 5, - }); - - expect(thresholdResults.map((result) => [result.key, result.severity])).toEqual([ - ['failure_case', 'failure'], - ['warning_case', 'warning'], - ]); - - const improvement = comparison.compared.find((result) => result.key === 'improvement_case'); - expect(improvement).toBeDefined(); - expect(classifyBenchmarkRegression(improvement!.metrics[0], { warnRegressionPct: 5 })).toBe('none'); - }); - - test('writes comparison artifacts', () => { - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-comparison-')); - - try { - const comparison = compareBenchmarkResultSets( - [{ name: 'case_a', medianMs: 100 }], - [{ name: 'case_a', medianMs: 110 }], - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ); - const paths = writeBenchmarkComparisonArtifacts(outputDir, comparison); - - expect(paths.jsonPath).toBe(path.join(outputDir, 'comparison.json')); - expect(paths.markdownPath).toBe(path.join(outputDir, 'comparison.md')); - expect(JSON.parse(fs.readFileSync(paths.jsonPath, 'utf-8'))).toEqual(comparison); - expect(fs.readFileSync(paths.markdownPath, 'utf-8')).toContain('| case_a | medianMs |'); - } finally { - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('writes report comparison artifact set', () => { - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-report-comparison-')); - const baselineReport = createTestReport('parser', '2026-05-07T00:00:00.000Z', [ - { name: 'case_a', medianMs: 100 }, - ]); - const candidateReport = createTestReport('parser', '2026-05-07T01:00:00.000Z', [ - { name: 'case_a', medianMs: 110 }, - ]); - - try { - const comparison = compareBenchmarkReports( - baselineReport, - candidateReport, - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - 
); - const paths = writeBenchmarkReportComparisonArtifacts( - outputDir, - baselineReport, - candidateReport, - comparison - ); - - expect(paths.oldJsonPath).toBe(path.join(outputDir, 'old.json')); - expect(paths.newJsonPath).toBe(path.join(outputDir, 'new.json')); - expect(paths.jsonPath).toBe(path.join(outputDir, 'comparison.json')); - expect(paths.markdownPath).toBe(path.join(outputDir, 'comparison.md')); - expect(JSON.parse(fs.readFileSync(paths.oldJsonPath, 'utf-8'))).toEqual(baselineReport); - expect(JSON.parse(fs.readFileSync(paths.newJsonPath, 'utf-8'))).toEqual(candidateReport); - } finally { - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('loads and compares benchmark report files', () => { - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-report-load-')); - const baselineReport = createTestReport('parser', '2026-05-07T00:00:00.000Z', [ - { name: 'case_a', medianMs: 100 }, - ]); - const candidateReport = createTestReport('parser', '2026-05-07T01:00:00.000Z', [ - { name: 'case_a', medianMs: 110 }, - ]); - const baselineReportPath = path.join(outputDir, 'old.json'); - const candidateReportPath = path.join(outputDir, 'new.json'); - - try { - fs.writeFileSync(baselineReportPath, JSON.stringify(baselineReport, undefined, 2), 'utf-8'); - fs.writeFileSync(candidateReportPath, JSON.stringify(candidateReport, undefined, 2), 'utf-8'); - - expect(loadBenchmarkReport(baselineReportPath)).toEqual(baselineReport); - - const comparison = compareBenchmarkReportFiles( - baselineReportPath, - candidateReportPath, - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ); - - expect(comparison.suiteName).toBe('parser'); - expect(comparison.compared[0].metrics[0].direction).toBe('regression'); - } finally { - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('compares and writes benchmark report files in one call', () => { - const outputDir = 
fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-report-compare-write-')); - const baselineReport = createTestReport('parser', '2026-05-07T00:00:00.000Z', [ - { name: 'case_a', medianMs: 100 }, - ]); - const candidateReport = createTestReport('parser', '2026-05-07T01:00:00.000Z', [ - { name: 'case_a', medianMs: 110 }, - ]); - const baselineReportPath = path.join(outputDir, 'source-old.json'); - const candidateReportPath = path.join(outputDir, 'source-new.json'); - - try { - fs.writeFileSync(baselineReportPath, JSON.stringify(baselineReport, undefined, 2), 'utf-8'); - fs.writeFileSync(candidateReportPath, JSON.stringify(candidateReport, undefined, 2), 'utf-8'); - - const paths = compareAndWriteBenchmarkReportFiles( - baselineReportPath, - candidateReportPath, - outputDir, - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ); - - expect(paths.oldJsonPath).toBe(path.join(outputDir, 'old.json')); - expect(paths.newJsonPath).toBe(path.join(outputDir, 'new.json')); - expect(paths.jsonPath).toBe(path.join(outputDir, 'comparison.json')); - expect(paths.markdownPath).toBe(path.join(outputDir, 'comparison.md')); - expect(JSON.parse(fs.readFileSync(paths.oldJsonPath, 'utf-8'))).toEqual(baselineReport); - expect(JSON.parse(fs.readFileSync(paths.newJsonPath, 'utf-8'))).toEqual(candidateReport); - } finally { - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('rejects duplicate result keys', () => { - expect(() => - compareBenchmarkResultSets( - [ - { name: 'duplicate', medianMs: 1 }, - { name: 'duplicate', medianMs: 2 }, - ], - [], - (result) => result.name, - [{ name: 'medianMs', getValue: (result) => result.medianMs }] - ) - ).toThrow('Duplicate benchmark result key'); - }); -}); - -function createTestReport(suiteName: string, timestamp: string, results: TestResult[]): BenchmarkReport { - return { - schemaVersion: benchmarkReportSchemaVersion, - suiteName, - timestamp, - system: { - platform: 'test', 
- arch: 'test', - cpus: 'test', - cpuCount: 1, - totalMemoryMB: 1, - nodeVersion: 'test', - }, - config: { - warmupIterations: 0, - benchmarkIterations: 1, - }, - results, - }; -} diff --git a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.ts b/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.ts deleted file mode 100644 index 6b863e2d8e37..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/benchmarkComparison.ts +++ /dev/null @@ -1,462 +0,0 @@ -import * as fs from 'fs'; -import * as path from 'path'; - -import { BenchmarkReport, benchmarkReportSchemaVersion } from './benchmarkUtils'; - -export type BenchmarkMetricDirection = 'improvement' | 'regression' | 'unchanged'; -export type BenchmarkRegressionSeverity = 'none' | 'warning' | 'failure'; - -export interface BenchmarkMetricDefinition { - name: string; - lowerIsBetter?: boolean; - minAbsoluteDelta?: number; - getValue: (result: ResultT) => number | undefined; -} - -export interface BenchmarkMetricComparison { - metric: string; - baselineValue: number; - candidateValue: number; - absoluteDelta: number; - percentDelta: number | undefined; - direction: BenchmarkMetricDirection; -} - -export interface BenchmarkResultComparison { - key: string; - metrics: BenchmarkMetricComparison[]; -} - -export interface BenchmarkResultSetComparison { - compared: BenchmarkResultComparison[]; - addedKeys: string[]; - removedKeys: string[]; -} - -export interface BenchmarkMetricComparisonSummaryEntry extends BenchmarkMetricComparison { - key: string; -} - -export interface BenchmarkComparisonSummary { - comparedResultCount: number; - metricCount: number; - regressionCount: number; - improvementCount: number; - unchangedCount: number; - largestRegressions: BenchmarkMetricComparisonSummaryEntry[]; - largestImprovements: BenchmarkMetricComparisonSummaryEntry[]; -} - -export interface BenchmarkRegressionThresholds { - warnRegressionPct?: number; - failRegressionPct?: number; - 
warnRegressionAbsolute?: number; - failRegressionAbsolute?: number; - minAbsoluteRegression?: number; -} - -export interface BenchmarkRegressionThresholdResult extends BenchmarkMetricComparisonSummaryEntry { - severity: BenchmarkRegressionSeverity; -} - -export interface BenchmarkReportComparison extends BenchmarkResultSetComparison { - schemaVersion: number; - suiteName: string; - baselineTimestamp: string; - candidateTimestamp: string; -} - -export interface BenchmarkComparisonArtifactPaths { - jsonPath: string; - markdownPath: string; -} - -export interface BenchmarkReportComparisonArtifactPaths extends BenchmarkComparisonArtifactPaths { - oldJsonPath: string; - newJsonPath: string; -} - -export function calculatePercentDelta(baselineValue: number, candidateValue: number): number | undefined { - if (baselineValue === 0) { - return candidateValue === 0 ? 0 : undefined; - } - - return ((candidateValue - baselineValue) / Math.abs(baselineValue)) * 100; -} - -export function compareBenchmarkResultSets( - baselineResults: ReadonlyArray, - candidateResults: ReadonlyArray, - getKey: (result: ResultT) => string, - metrics: ReadonlyArray> -): BenchmarkResultSetComparison { - const baselineByKey = indexResultsByKey(baselineResults, getKey); - const candidateByKey = indexResultsByKey(candidateResults, getKey); - const baselineKeys = [...baselineByKey.keys()].sort(); - const candidateKeys = [...candidateByKey.keys()].sort(); - const comparedKeys = baselineKeys.filter((key) => candidateByKey.has(key)); - - return { - compared: comparedKeys.map((key) => - compareBenchmarkResult(key, baselineByKey.get(key)!, candidateByKey.get(key)!, metrics) - ), - addedKeys: candidateKeys.filter((key) => !baselineByKey.has(key)), - removedKeys: baselineKeys.filter((key) => !candidateByKey.has(key)), - }; -} - -export function compareBenchmarkReports( - baselineReport: BenchmarkReport, - candidateReport: BenchmarkReport, - getKey: (result: ResultT) => string, - metrics: ReadonlyArray> -): 
BenchmarkReportComparison { - validateBenchmarkReportPair(baselineReport, candidateReport); - - return { - schemaVersion: baselineReport.schemaVersion, - suiteName: baselineReport.suiteName, - baselineTimestamp: baselineReport.timestamp, - candidateTimestamp: candidateReport.timestamp, - ...compareBenchmarkResultSets(baselineReport.results, candidateReport.results, getKey, metrics), - }; -} - -export function loadBenchmarkReport(reportPath: string): BenchmarkReport { - const fileContents = fs.readFileSync(reportPath, 'utf-8'); - return JSON.parse(fileContents) as BenchmarkReport; -} - -export function compareBenchmarkReportFiles( - baselineReportPath: string, - candidateReportPath: string, - getKey: (result: ResultT) => string, - metrics: ReadonlyArray> -): BenchmarkReportComparison { - return compareBenchmarkReports( - loadBenchmarkReport(baselineReportPath), - loadBenchmarkReport(candidateReportPath), - getKey, - metrics - ); -} - -export function compareAndWriteBenchmarkReportFiles( - baselineReportPath: string, - candidateReportPath: string, - outputDir: string, - getKey: (result: ResultT) => string, - metrics: ReadonlyArray> -): BenchmarkReportComparisonArtifactPaths { - const baselineReport = loadBenchmarkReport(baselineReportPath); - const candidateReport = loadBenchmarkReport(candidateReportPath); - const comparison = compareBenchmarkReports(baselineReport, candidateReport, getKey, metrics); - - return writeBenchmarkReportComparisonArtifacts(outputDir, baselineReport, candidateReport, comparison); -} - -export function summarizeBenchmarkComparison( - comparison: BenchmarkResultSetComparison, - limit = 5 -): BenchmarkComparisonSummary { - const entries = getComparisonMetricEntries(comparison); - const regressions = entries.filter((entry) => entry.direction === 'regression'); - const improvements = entries.filter((entry) => entry.direction === 'improvement'); - const unchanged = entries.filter((entry) => entry.direction === 'unchanged'); - - return { - 
comparedResultCount: comparison.compared.length, - metricCount: entries.length, - regressionCount: regressions.length, - improvementCount: improvements.length, - unchangedCount: unchanged.length, - largestRegressions: sortMetricEntriesByMagnitude(regressions).slice(0, limit), - largestImprovements: sortMetricEntriesByMagnitude(improvements).slice(0, limit), - }; -} - -export function classifyBenchmarkRegression( - entry: BenchmarkMetricComparison, - thresholds: BenchmarkRegressionThresholds -): BenchmarkRegressionSeverity { - if (entry.direction !== 'regression') { - return 'none'; - } - - const absoluteMagnitude = Math.abs(entry.absoluteDelta); - if (absoluteMagnitude < (thresholds.minAbsoluteRegression ?? 0)) { - return 'none'; - } - - if (exceedsRegressionThreshold(entry, thresholds.failRegressionPct, thresholds.failRegressionAbsolute)) { - return 'failure'; - } - - if (exceedsRegressionThreshold(entry, thresholds.warnRegressionPct, thresholds.warnRegressionAbsolute)) { - return 'warning'; - } - - return 'none'; -} - -export function getBenchmarkRegressionThresholdResults( - comparison: BenchmarkResultSetComparison, - thresholds: BenchmarkRegressionThresholds -): BenchmarkRegressionThresholdResult[] { - return getComparisonMetricEntries(comparison) - .map((entry) => ({ ...entry, severity: classifyBenchmarkRegression(entry, thresholds) })) - .filter((entry) => entry.severity !== 'none') - .sort(compareThresholdResults); -} - -export function renderBenchmarkComparisonMarkdown(comparison: BenchmarkResultSetComparison): string { - const summary = summarizeBenchmarkComparison(comparison); - const lines = [ - '## Summary', - '', - `Compared cases: ${summary.comparedResultCount}`, - `Compared metrics: ${summary.metricCount}`, - `Regressions: ${summary.regressionCount}`, - `Improvements: ${summary.improvementCount}`, - `Unchanged: ${summary.unchangedCount}`, - '', - ]; - - appendMetricEntryTable(lines, '## Largest Regressions', summary.largestRegressions); - 
appendMetricEntryTable(lines, '## Largest Improvements', summary.largestImprovements); - - lines.push( - '## Details', - '', - '| Case | Metric | Baseline | Candidate | Delta | Delta % | Direction |', - '|---|---:|---:|---:|---:|---:|---|' - ); - - for (const result of comparison.compared) { - for (const metric of result.metrics) { - lines.push( - `| ${result.key} | ${metric.metric} | ${formatMetric(metric.baselineValue)} | ${formatMetric( - metric.candidateValue - )} | ${formatMetric(metric.absoluteDelta)} | ${formatPercent(metric.percentDelta)} | ${ - metric.direction - } |` - ); - } - } - - if (comparison.addedKeys.length > 0) { - lines.push('', `Added cases: ${comparison.addedKeys.join(', ')}`); - } - - if (comparison.removedKeys.length > 0) { - lines.push('', `Removed cases: ${comparison.removedKeys.join(', ')}`); - } - - return `${lines.join('\n')}\n`; -} - -function appendMetricEntryTable( - lines: string[], - heading: string, - entries: ReadonlyArray -): void { - lines.push(heading, ''); - - if (entries.length === 0) { - lines.push('None.', ''); - return; - } - - lines.push('| Case | Metric | Baseline | Candidate | Delta | Delta % |', '|---|---:|---:|---:|---:|---:|'); - - for (const entry of entries) { - lines.push( - `| ${entry.key} | ${entry.metric} | ${formatMetric(entry.baselineValue)} | ${formatMetric( - entry.candidateValue - )} | ${formatMetric(entry.absoluteDelta)} | ${formatPercent(entry.percentDelta)} |` - ); - } - - lines.push(''); -} - -function getComparisonMetricEntries(comparison: BenchmarkResultSetComparison): BenchmarkMetricComparisonSummaryEntry[] { - return comparison.compared.flatMap((result) => result.metrics.map((metric) => ({ key: result.key, ...metric }))); -} - -function sortMetricEntriesByMagnitude( - entries: ReadonlyArray -): BenchmarkMetricComparisonSummaryEntry[] { - return [...entries].sort((left, right) => getMetricMagnitude(right) - getMetricMagnitude(left)); -} - -function getMetricMagnitude(entry: 
BenchmarkMetricComparison): number { - return Math.abs(entry.percentDelta ?? entry.absoluteDelta); -} - -function exceedsRegressionThreshold( - entry: BenchmarkMetricComparison, - percentThreshold: number | undefined, - absoluteThreshold: number | undefined -): boolean { - const percentMagnitude = entry.percentDelta === undefined ? undefined : Math.abs(entry.percentDelta); - const absoluteMagnitude = Math.abs(entry.absoluteDelta); - - return ( - (percentThreshold !== undefined && percentMagnitude !== undefined && percentMagnitude >= percentThreshold) || - (absoluteThreshold !== undefined && absoluteMagnitude >= absoluteThreshold) - ); -} - -function compareThresholdResults( - left: BenchmarkRegressionThresholdResult, - right: BenchmarkRegressionThresholdResult -): number { - const severityDelta = getSeverityRank(right.severity) - getSeverityRank(left.severity); - if (severityDelta !== 0) { - return severityDelta; - } - - return getMetricMagnitude(right) - getMetricMagnitude(left); -} - -function getSeverityRank(severity: BenchmarkRegressionSeverity): number { - switch (severity) { - case 'failure': - return 2; - case 'warning': - return 1; - case 'none': - return 0; - } -} - -export function writeBenchmarkComparisonArtifacts( - outputDir: string, - comparison: BenchmarkResultSetComparison -): BenchmarkComparisonArtifactPaths { - fs.mkdirSync(outputDir, { recursive: true }); - - const jsonPath = path.join(outputDir, 'comparison.json'); - const markdownPath = path.join(outputDir, 'comparison.md'); - - fs.writeFileSync(jsonPath, JSON.stringify(comparison, undefined, 2), 'utf-8'); - fs.writeFileSync(markdownPath, renderBenchmarkComparisonMarkdown(comparison), 'utf-8'); - - return { jsonPath, markdownPath }; -} - -export function writeBenchmarkReportComparisonArtifacts( - outputDir: string, - baselineReport: BenchmarkReport, - candidateReport: BenchmarkReport, - comparison: BenchmarkReportComparison -): BenchmarkReportComparisonArtifactPaths { - fs.mkdirSync(outputDir, 
{ recursive: true }); - - const oldJsonPath = path.join(outputDir, 'old.json'); - const newJsonPath = path.join(outputDir, 'new.json'); - fs.writeFileSync(oldJsonPath, JSON.stringify(baselineReport, undefined, 2), 'utf-8'); - fs.writeFileSync(newJsonPath, JSON.stringify(candidateReport, undefined, 2), 'utf-8'); - - return { - oldJsonPath, - newJsonPath, - ...writeBenchmarkComparisonArtifacts(outputDir, comparison), - }; -} - -function validateBenchmarkReportPair( - baselineReport: BenchmarkReport, - candidateReport: BenchmarkReport -): void { - validateBenchmarkReport(baselineReport, 'baseline'); - validateBenchmarkReport(candidateReport, 'candidate'); - - if (baselineReport.suiteName !== candidateReport.suiteName) { - throw new Error( - `Cannot compare benchmark reports for different suites: ${baselineReport.suiteName}, ${candidateReport.suiteName}` - ); - } -} - -function validateBenchmarkReport(report: BenchmarkReport, label: string): void { - if (report.schemaVersion !== benchmarkReportSchemaVersion) { - throw new Error( - `Unsupported ${label} benchmark report schema version ${report.schemaVersion}; expected ${benchmarkReportSchemaVersion}.` - ); - } -} - -function compareBenchmarkResult( - key: string, - baselineResult: ResultT, - candidateResult: ResultT, - metrics: ReadonlyArray> -): BenchmarkResultComparison { - return { - key, - metrics: metrics.flatMap((metric) => { - const baselineValue = metric.getValue(baselineResult); - const candidateValue = metric.getValue(candidateResult); - - if (baselineValue === undefined || candidateValue === undefined) { - return []; - } - - const absoluteDelta = candidateValue - baselineValue; - return [ - { - metric: metric.name, - baselineValue, - candidateValue, - absoluteDelta, - percentDelta: calculatePercentDelta(baselineValue, candidateValue), - direction: getMetricDirection(absoluteDelta, metric), - }, - ]; - }), - }; -} - -function getMetricDirection( - absoluteDelta: number, - metric: BenchmarkMetricDefinition -): 
BenchmarkMetricDirection { - const minAbsoluteDelta = metric.minAbsoluteDelta ?? 0; - - if (Math.abs(absoluteDelta) <= minAbsoluteDelta) { - return 'unchanged'; - } - - const lowerIsBetter = metric.lowerIsBetter ?? true; - const isHigher = absoluteDelta > 0; - - return lowerIsBetter === isHigher ? 'regression' : 'improvement'; -} - -function indexResultsByKey( - results: ReadonlyArray, - getKey: (result: ResultT) => string -): Map { - const resultsByKey = new Map(); - - for (const result of results) { - const key = getKey(result); - if (resultsByKey.has(key)) { - throw new Error(`Duplicate benchmark result key: ${key}`); - } - - resultsByKey.set(key, result); - } - - return resultsByKey; -} - -function formatMetric(value: number): string { - return value.toFixed(2); -} - -function formatPercent(value: number | undefined): string { - return value === undefined ? 'n/a' : `${value.toFixed(2)}%`; -} diff --git a/packages/pyright-internal/src/tests/benchmarks/benchmarkUtils.ts b/packages/pyright-internal/src/tests/benchmarks/benchmarkUtils.ts deleted file mode 100644 index 95836d2bd37e..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/benchmarkUtils.ts +++ /dev/null @@ -1,238 +0,0 @@ -import { execFileSync } from 'child_process'; -import * as fs from 'fs'; -import * as os from 'os'; -import * as path from 'path'; - -import { ImportResolver } from '../../analyzer/importResolver'; -import { Program } from '../../analyzer/program'; -import { ConfigOptions } from '../../common/configOptions'; -import { NullConsole } from '../../common/console'; -import { DiagnosticCategory } from '../../common/diagnostic'; -import { FullAccessHost } from '../../common/fullAccessHost'; -import { RealTempFile, createFromRealFileSystem } from '../../common/realFileSystem'; -import { createServiceProvider } from '../../common/serviceProviderExtensions'; -import { TimingStatsSnapshot, timingStats } from '../../common/timing'; -import { UriEx } from '../../common/uri/uriUtils'; - 
-export interface BenchmarkStats { - median: number; - p95: number; - min: number; - max: number; - avg: number; -} - -export interface BenchmarkSystemInfo { - platform: string; - arch: string; - cpus: string; - cpuCount: number; - totalMemoryMB: number; - nodeVersion: string; -} - -export interface BenchmarkReport { - schemaVersion: number; - suiteName: string; - timestamp: string; - system: BenchmarkSystemInfo; - config: { - warmupIterations: number; - benchmarkIterations: number; - }; - results: ResultT[]; -} - -export interface TypeAnalysisSummary { - timing: TimingStatsSnapshot; - diagnosticCount: number; - errorCount: number; - warningCount: number; - informationCount: number; - statementCount: number; -} - -export const benchmarkDataDir = path.resolve(__dirname, '..', 'benchmarkData'); -export const benchmarkResultsDir = path.join(__dirname, '.generated', 'benchmark-results'); -export const benchmarkReportSchemaVersion = 1; - -export function calculateStats(times: ReadonlyArray): BenchmarkStats { - if (times.length === 0) { - throw new Error('Cannot calculate benchmark stats for an empty sample set.'); - } - - const sorted = [...times].sort((a, b) => a - b); - const len = sorted.length; - - const median = len % 2 === 0 ? (sorted[len / 2 - 1] + sorted[len / 2]) / 2 : sorted[Math.floor(len / 2)]; - const p95Index = Math.ceil(len * 0.95) - 1; - const p95 = sorted[Math.min(p95Index, len - 1)]; - const min = sorted[0]; - const max = sorted[len - 1]; - const avg = times.reduce((a, b) => a + b, 0) / len; - - return { median, p95, min, max, avg }; -} - -export function loadBenchmarkCorpus(filename: string): string { - const filePath = path.resolve(benchmarkDataDir, filename); - return fs.readFileSync(filePath, 'utf-8'); -} - -export function getSystemInfo(): BenchmarkSystemInfo { - const cpus = os.cpus(); - return { - platform: os.platform(), - arch: os.arch(), - cpus: cpus[0]?.model ?? 
'unknown', - cpuCount: cpus.length, - totalMemoryMB: Math.round(os.totalmem() / (1024 * 1024)), - nodeVersion: process.version, - }; -} - -export function createBenchmarkReport( - suiteName: string, - warmupIterations: number, - benchmarkIterations: number, - results: ResultT[] -): BenchmarkReport { - return { - schemaVersion: benchmarkReportSchemaVersion, - suiteName, - timestamp: new Date().toISOString(), - system: getSystemInfo(), - config: { - warmupIterations, - benchmarkIterations, - }, - results, - }; -} - -export function writeBenchmarkReport( - suiteName: string, - filePrefix: string, - report: BenchmarkReport -): string { - const outputDir = path.join(benchmarkResultsDir, suiteName); - fs.mkdirSync(outputDir, { recursive: true }); - - const filename = `${filePrefix}-${new Date().toISOString().replace(/[:.]/g, '-')}.json`; - const outputPath = path.join(outputDir, filename); - fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8'); - console.log(`\nBenchmark results written to: ${outputPath}`); - - return outputPath; -} - -export function formatCount(value: number): string { - return Math.round(value).toLocaleString(); -} - -export function getChildProcessOutput(error: unknown): string { - if (!(error instanceof Error)) { - return ''; - } - - const stdout = 'stdout' in error && typeof error.stdout === 'string' ? error.stdout : ''; - const stderr = 'stderr' in error && typeof error.stderr === 'string' ? 
error.stderr : ''; - return [stdout, stderr].filter((part) => part.length > 0).join('\n'); -} - -export function escapeRegExp(text: string): string { - return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); -} - -export function runJestBenchmarkInFreshProcess( - testFilePath: string, - suiteName: string, - testName: string, - resultPrefix: string, - childModeEnv: string -): ResultT { - const jestBinPath = path.resolve(__dirname, '..', '..', '..', 'node_modules', 'jest', 'bin', 'jest.js'); - - try { - const output = execFileSync( - process.execPath, - [ - jestBinPath, - testFilePath, - '--runInBand', - '--forceExit', - '--testTimeout=300000', - '--testNamePattern', - `^${suiteName} ${escapeRegExp(testName)}$`, - ], - { - cwd: path.resolve(__dirname, '..', '..', '..'), - encoding: 'utf-8', - env: { - ...process.env, - [childModeEnv]: '1', - }, - } - ); - - const resultLine = output.split(/\r?\n/).find((line) => line.startsWith(resultPrefix)); - - if (!resultLine) { - throw new Error(`Child benchmark for "${testName}" did not emit a result.\n${output}`); - } - - return JSON.parse(resultLine.slice(resultPrefix.length)) as ResultT; - } catch (error) { - const output = getChildProcessOutput(error); - const message = error instanceof Error ? error.message : String(error); - throw new Error(`Child benchmark for "${testName}" failed.\n${message}${output ? 
`\n${output}` : ''}`); - } -} - -export function analyzeBenchmarkSource(source: string, fileName: string): TypeAnalysisSummary { - (global as any).__rootDirectory = path.resolve(__dirname, '..', '..', '..'); - - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-benchmark-')); - const filePath = path.join(tempDir, fileName); - fs.writeFileSync(filePath, source, 'utf-8'); - - const tempFile = new RealTempFile(); - const fileSystem = createFromRealFileSystem(tempFile); - const serviceProvider = createServiceProvider(fileSystem, new NullConsole(), tempFile); - const configOptions = new ConfigOptions(UriEx.file(tempDir)); - configOptions.internalTestMode = true; - - const importResolver = new ImportResolver(serviceProvider, configOptions, new FullAccessHost(serviceProvider)); - const program = new Program(importResolver, configOptions, serviceProvider); - const fileUri = UriEx.file(filePath); - const startTiming = timingStats.getSnapshot(); - - try { - program.setTrackedFiles([fileUri]); - - while (program.analyze()) { - // Continue until analysis completes. - } - - const sourceFile = program.getSourceFile(fileUri); - if (!sourceFile) { - throw new Error(`Could not analyze generated benchmark file ${filePath}`); - } - - const diagnostics = sourceFile.getDiagnostics(configOptions) ?? []; - const parseResults = sourceFile.getParseResults(); - - return { - timing: timingStats.getSnapshotDelta(startTiming), - diagnosticCount: diagnostics.length, - errorCount: diagnostics.filter((diag) => diag.category === DiagnosticCategory.Error).length, - warningCount: diagnostics.filter((diag) => diag.category === DiagnosticCategory.Warning).length, - informationCount: diagnostics.filter((diag) => diag.category === DiagnosticCategory.Information).length, - statementCount: parseResults?.parserOutput.parseTree.d.statements.length ?? 
0, - }; - } finally { - program.dispose(); - serviceProvider.dispose(); - fs.rmSync(tempDir, { force: true, recursive: true }); - } -} diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.generated.json b/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.generated.json deleted file mode 100644 index 6d2f1892938c..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.generated.json +++ /dev/null @@ -1,127 +0,0 @@ -[ - { - "name": "attrs", - "mypyPrimerProject": "attrs", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/python-attrs/attrs", - "pyrightCommand": "{pyright}" - }, - { - "name": "black", - "mypyPrimerProject": "black", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/psf/black", - "pyrightCommand": "{pyright} {paths}", - "paths": [ - "src" - ] - }, - { - "name": "django-modern-rest", - "mypyPrimerProject": "django-modern-rest", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/wemake-services/django-modern-rest", - "pyrightCommand": "{pyright}", - "paths": [ - "dmr" - ] - }, - { - "name": "mypy_primer", - "mypyPrimerProject": "mypy_primer", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/hauntsaninja/mypy_primer", - "pyrightCommand": "{pyright} {paths}", - "paths": [ - "." 
- ] - }, - { - "name": "packaging", - "mypyPrimerProject": "packaging", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/pypa/packaging", - "pyrightCommand": "{pyright} {paths}", - "paths": [ - "src" - ] - }, - { - "name": "pandas", - "mypyPrimerProject": "pandas", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/pandas-dev/pandas", - "pyrightCommand": "{pyright} {paths}", - "paths": [ - "pandas" - ] - }, - { - "name": "pydantic", - "mypyPrimerProject": "pydantic", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/pydantic/pydantic", - "pyrightCommand": "{pyright} {paths}", - "paths": [ - "pydantic" - ] - }, - { - "name": "pytest", - "mypyPrimerProject": "pytest", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/pytest-dev/pytest", - "pyrightCommand": "{pyright} {paths}", - "paths": [ - "src", - "testing" - ] - }, - { - "name": "python-chess", - "mypyPrimerProject": "python-chess", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/niklasf/python-chess", - "pyrightCommand": "{pyright} {paths}", - "paths": [ - "chess" - ] - }, - { - "name": "rich", - "mypyPrimerProject": "rich", - "source": { - "kind": "mypy-primer", - "inputFile": "mypy_primer.smoke_projects.snapshot.py" - }, - "location": "https://github.com/Textualize/rich", - "pyrightCommand": "{pyright}" - } -] diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.overrides.json b/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.overrides.json deleted file mode 100644 index 92bd86ed4ba9..000000000000 --- 
a/packages/pyright-internal/src/tests/benchmarks/ecosystem-projects.overrides.json +++ /dev/null @@ -1,75 +0,0 @@ -{ - "black": { - "includeInSmoke": true, - "smokeOrder": 0, - "cost": "medium", - "tags": ["parser-heavy", "typed-library"], - "reason": "Parser-heavy practical codebase with broad syntax coverage." - }, - "pytest": { - "includeInSmoke": true, - "smokeOrder": 1, - "cost": "large", - "tags": ["dynamic", "plugins", "typed-library"], - "reason": "Large dynamic project with plugin patterns and pragmatic typing." - }, - "attrs": { - "includeInSmoke": true, - "smokeOrder": 2, - "cost": "small", - "tags": ["dataclass-like", "decorators", "typed-library"], - "reason": "Dataclass-like decorator patterns with stable runtime.", - "sourcePaths": ["src"] - }, - "pydantic": { - "includeInSmoke": true, - "smokeOrder": 3, - "cost": "medium", - "tags": ["decorators", "generics", "pydantic", "typed-library"], - "reason": "Decorator-heavy validation models with generics and dataclass-like transforms." - }, - "python-chess": { - "includeInSmoke": true, - "smokeOrder": 4, - "cost": "small", - "tags": ["typed-library"], - "reason": "Clean typed library with a useful expected-success signal." - }, - "packaging": { - "includeInSmoke": true, - "smokeOrder": 5, - "cost": "small", - "tags": ["typed-library"], - "reason": "Small stable baseline project for low-noise smoke runs." 
- }, - "rich": { - "includeInSmoke": true, - "smokeOrder": 6, - "cost": "medium", - "tags": ["typed-library"], - "reason": "Practical typed library with meaningful module structure.", - "sourcePaths": ["rich"] - }, - "mypy_primer": { - "includeInSmoke": true, - "smokeOrder": 7, - "cost": "small", - "tags": ["typed-library"], - "reason": "Typed tool codebase that anchors compatibility with the source project manifest.", - "sourcePaths": ["mypy_primer"] - }, - "django-modern-rest": { - "includeInSmoke": true, - "smokeOrder": 8, - "cost": "medium", - "tags": ["django", "pydantic", "web"], - "reason": "Web project with Django-style and pydantic-style patterns." - }, - "pandas": { - "includeInSmoke": true, - "smokeOrder": 9, - "cost": "large", - "tags": ["data-science", "large", "overloads", "stubs-heavy"], - "reason": "Data-science project that stresses overloads, stubs, and large-project behavior." - } -} \ No newline at end of file diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeBenchmark.test.ts deleted file mode 100644 index 7cd1a0261dc8..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeBenchmark.test.ts +++ /dev/null @@ -1,90 +0,0 @@ -/* - * ecosystemSmokeBenchmark.test.ts - * Copyright (c) Microsoft Corporation. - * - * Sanity checks and artifact emission for the curated ecosystem smoke benchmark manifest. 
- */ - -import { createBenchmarkReport, writeBenchmarkReport } from './benchmarkUtils'; -import { - ecosystemSmokeProjects, - getEcosystemSmokeProjectNames, - getEcosystemSmokeProjectTags, - getGeneratedEcosystemProject, - selectEcosystemSmokeProjects, -} from './ecosystemSmokeProjects'; - -const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS'; - -interface EcosystemSmokeManifestResult { - suiteName: string; - projectCount: number; - tags: string[]; - projects: typeof ecosystemSmokeProjects; -} - -const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip; - -benchmarkSuite('Ecosystem Smoke Manifest', () => { - test('validates curated project metadata', () => { - const projectNames = getEcosystemSmokeProjectNames(); - const uniqueProjectNames = new Set(projectNames); - - expect(ecosystemSmokeProjects).toHaveLength(10); - expect(uniqueProjectNames.size).toBe(projectNames.length); - expect(projectNames).toEqual([ - 'black', - 'pytest', - 'attrs', - 'pydantic', - 'python-chess', - 'packaging', - 'rich', - 'mypy_primer', - 'django-modern-rest', - 'pandas', - ]); - - for (const project of ecosystemSmokeProjects) { - expect(project.mypyPrimerProject).toBeTruthy(); - expect(project.tags.length).toBeGreaterThan(0); - expect(project.reason).toBeTruthy(); - } - - const result: EcosystemSmokeManifestResult = { - suiteName: 'ecosystem-smoke', - projectCount: ecosystemSmokeProjects.length, - tags: getEcosystemSmokeProjectTags(), - projects: ecosystemSmokeProjects, - }; - - writeBenchmarkReport( - 'ecosystem-smoke', - 'ecosystem-smoke-projects', - createBenchmarkReport('ecosystem-smoke', 0, 0, [result]) - ); - }); - - test('selects projects by tag, pattern, and shard', () => { - expect(selectEcosystemSmokeProjects({ tag: 'overloads' }).map((project) => project.name)).toEqual(['pandas']); - expect( - selectEcosystemSmokeProjects({ projectPattern: /django|pandas/ }).map((project) => project.name) - ).toEqual(['django-modern-rest', 'pandas']); - - const 
shard0 = selectEcosystemSmokeProjects({ numShards: 2, shardIndex: 0 }).map((project) => project.name); - const shard1 = selectEcosystemSmokeProjects({ numShards: 2, shardIndex: 1 }).map((project) => project.name); - const combinedShards = [...shard0, ...shard1].sort(); - - expect(shard0).toEqual(['black', 'attrs', 'python-chess', 'rich', 'django-modern-rest']); - expect(shard1).toEqual(['pytest', 'pydantic', 'packaging', 'mypy_primer', 'pandas']); - expect(combinedShards).toEqual(getEcosystemSmokeProjectNames().sort()); - expect(() => selectEcosystemSmokeProjects({ numShards: 2, shardIndex: 2 })).toThrow('shardIndex'); - }); - - test('applies smoke overrides to generated source roots for pathless upstream projects', () => { - expect(getGeneratedEcosystemProject('attrs')?.paths).toEqual(['src']); - expect(getGeneratedEcosystemProject('rich')?.paths).toEqual(['rich']); - expect(getGeneratedEcosystemProject('mypy_primer')?.paths).toEqual(['mypy_primer']); - expect(getGeneratedEcosystemProject('black')?.paths).toEqual(['src']); - }); -}); diff --git a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeProjects.ts b/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeProjects.ts deleted file mode 100644 index 1a54cb174ada..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/ecosystemSmokeProjects.ts +++ /dev/null @@ -1,186 +0,0 @@ -import * as fs from 'fs'; -import * as path from 'path'; - -import { GeneratedEcosystemProject } from './syncMypyPrimerProjects'; - -export type EcosystemProjectCost = 'small' | 'medium' | 'large'; - -export type EcosystemProjectTag = - | 'data-science' - | 'dataclass-like' - | 'decorators' - | 'django' - | 'dynamic' - | 'generics' - | 'large' - | 'overloads' - | 'parser-heavy' - | 'plugins' - | 'pydantic' - | 'stubs-heavy' - | 'typed-library' - | 'web'; - -export interface EcosystemSmokeProject { - name: string; - mypyPrimerProject: string; - cost: EcosystemProjectCost; - tags: EcosystemProjectTag[]; - 
reason: string; -} - -export interface EcosystemSmokeProjectSelectionOptions { - tag?: EcosystemProjectTag; - projectPattern?: RegExp; - numShards?: number; - shardIndex?: number; -} - -interface EcosystemProjectOverride { - includeInSmoke?: boolean; - smokeOrder?: number; - cost?: EcosystemProjectCost; - tags?: EcosystemProjectTag[]; - reason?: string; - sourcePaths?: string[]; -} - -const generatedProjects = loadGeneratedProjects(); -const ecosystemProjectOverrides = loadProjectOverrides(); - -const mergedGeneratedProjects = generatedProjects.map((project) => applyProjectOverrides(project)); - -export const ecosystemSmokeProjects: readonly EcosystemSmokeProject[] = mergedGeneratedProjects - .map((project) => buildSmokeProject(project, ecosystemProjectOverrides[project.name])) - .filter((project): project is EcosystemSmokeProject => project !== undefined) - .sort((left, right) => getSmokeOrder(left.name) - getSmokeOrder(right.name)); - -export function getEcosystemSmokeProjectNames(): string[] { - return ecosystemSmokeProjects.map((project) => project.name); -} - -export function getGeneratedEcosystemProjects(): readonly GeneratedEcosystemProject[] { - return mergedGeneratedProjects; -} - -export function getGeneratedEcosystemProject(projectName: string): GeneratedEcosystemProject | undefined { - return mergedGeneratedProjects.find((project) => project.name === projectName); -} - -export function getEcosystemSmokeProjectsByTag(tag: EcosystemProjectTag): EcosystemSmokeProject[] { - return ecosystemSmokeProjects.filter((project) => project.tags.includes(tag)); -} - -export function getEcosystemSmokeProjectTags(): EcosystemProjectTag[] { - return Array.from(new Set(ecosystemSmokeProjects.flatMap((project) => project.tags))).sort(); -} - -export function selectEcosystemSmokeProjects( - options: EcosystemSmokeProjectSelectionOptions = {} -): EcosystemSmokeProject[] { - const { tag, projectPattern, numShards, shardIndex } = options; - let projects = 
[...ecosystemSmokeProjects]; - - if (tag) { - projects = projects.filter((project) => project.tags.includes(tag)); - } - - if (projectPattern) { - projects = projects.filter((project) => matchesProjectPattern(projectPattern, project)); - } - - if (numShards !== undefined || shardIndex !== undefined) { - validateShardOptions(numShards, shardIndex); - projects = projects.filter((_, index) => index % numShards! === shardIndex); - } - - return projects; -} - -function matchesProjectPattern(pattern: RegExp, project: EcosystemSmokeProject): boolean { - pattern.lastIndex = 0; - const matchesName = pattern.test(project.name); - pattern.lastIndex = 0; - const matchesMypyPrimerProject = pattern.test(project.mypyPrimerProject); - pattern.lastIndex = 0; - - return matchesName || matchesMypyPrimerProject; -} - -function validateShardOptions(numShards: number | undefined, shardIndex: number | undefined): void { - if (numShards === undefined || shardIndex === undefined) { - throw new Error('Both numShards and shardIndex must be provided for ecosystem smoke project sharding.'); - } - - if (!Number.isInteger(numShards) || numShards <= 0) { - throw new Error('numShards must be a positive integer.'); - } - - if (!Number.isInteger(shardIndex) || shardIndex < 0 || shardIndex >= numShards) { - throw new Error('shardIndex must be an integer greater than or equal to 0 and less than numShards.'); - } -} - -function buildSmokeProject( - project: GeneratedEcosystemProject, - override: EcosystemProjectOverride | undefined -): EcosystemSmokeProject | undefined { - if (!override?.includeInSmoke) { - return undefined; - } - - if (!override.cost || !override.tags || override.tags.length === 0 || !override.reason) { - throw new Error(`Smoke project ${project.name} is missing required ecosystem metadata overrides.`); - } - - return { - name: project.name, - mypyPrimerProject: project.mypyPrimerProject, - cost: override.cost, - tags: [...override.tags], - reason: override.reason, - }; -} - -function 
getSmokeOrder(projectName: string): number { - const smokeOrder = ecosystemProjectOverrides[projectName]?.smokeOrder; - if (smokeOrder === undefined) { - return Number.MAX_SAFE_INTEGER; - } - - return smokeOrder; -} - -function applyProjectOverrides(project: GeneratedEcosystemProject): GeneratedEcosystemProject { - const override = ecosystemProjectOverrides[project.name]; - if (!override?.sourcePaths || override.sourcePaths.length === 0) { - return project; - } - - return { - ...project, - paths: [...override.sourcePaths], - }; -} - -function loadGeneratedProjects(): GeneratedEcosystemProject[] { - return readJsonFile('ecosystem-projects.generated.json'); -} - -function loadProjectOverrides(): Record<string, EcosystemProjectOverride> { - return readJsonFile<Record<string, EcosystemProjectOverride>>('ecosystem-projects.overrides.json'); -} - -function readJsonFile<T>(filename: string): T { - const filePath = getBenchmarkFilePath(filename); - return JSON.parse(fs.readFileSync(filePath, 'utf-8')) as T; -} - -function getBenchmarkFilePath(filename: string): string { - const sourceFilePath = path.resolve(__dirname, filename); - if (fs.existsSync(sourceFilePath)) { - return sourceFilePath; - } - - return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', filename); -} diff --git a/packages/pyright-internal/src/tests/benchmarks/evaluatorBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/evaluatorBenchmark.test.ts deleted file mode 100644 index 0072374847c9..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/evaluatorBenchmark.test.ts +++ /dev/null @@ -1,213 +0,0 @@ -/* - * evaluatorBenchmark.test.ts - * Copyright (c) Microsoft Corporation. - * - * Synthetic type evaluator microbenchmarks. - * Measures cold analysis time for generated Python cases that exercise evaluator-heavy paths. 
- */ - -import { TimingStatsSnapshot } from '../../common/timing'; -import { - TypeAnalysisSummary, - analyzeBenchmarkSource, - calculateStats, - createBenchmarkReport, - writeBenchmarkReport, -} from './benchmarkUtils'; -import { - generateConstrainedTypeVarMatrixCase, - generateGenericAliasChainCase, - generateLiteralUnionMathCase, - generateOverloadUnionCrossProductCase, - generateProtocolMismatchCase, - generateRecursiveAliasCase, - generateTypedDictCase, -} from './syntheticCases'; - -const WARMUP_ITERATIONS = 1; -const BENCHMARK_ITERATIONS = 5; -const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS'; - -interface BenchmarkCase { - name: string; - fileName: string; - scale: string; - code: string; - minDiagnosticCount: number; -} - -interface BenchmarkResult { - caseName: string; - scale: string; - fileSizeBytes: number; - sourceLines: number; - iterations: number; - timesMs: number[]; - medianMs: number; - p95Ms: number; - minMs: number; - maxMs: number; - avgMs: number; - diagnosticCount: number; - errorCount: number; - warningCount: number; - informationCount: number; - statementCount: number; - timing: TimingStatsSnapshot; -} - -function benchmarkAnalyze(testCase: BenchmarkCase): BenchmarkResult { - const times: number[] = []; - let summary: TypeAnalysisSummary | undefined; - - for (let i = 0; i < WARMUP_ITERATIONS; i++) { - analyzeBenchmarkSource(testCase.code, testCase.fileName); - } - - for (let i = 0; i < BENCHMARK_ITERATIONS; i++) { - const start = performance.now(); - summary = analyzeBenchmarkSource(testCase.code, testCase.fileName); - const elapsed = performance.now() - start; - - times.push(elapsed); - } - - if (!summary) { - throw new Error(`Benchmark case ${testCase.name} did not produce an analysis summary.`); - } - - const stats = calculateStats(times); - - return { - caseName: testCase.name, - scale: testCase.scale, - fileSizeBytes: Buffer.byteLength(testCase.code, 'utf-8'), - sourceLines: testCase.code.split('\n').length - 1, - iterations: 
BENCHMARK_ITERATIONS, - timesMs: times, - medianMs: stats.median, - p95Ms: stats.p95, - minMs: stats.min, - maxMs: stats.max, - avgMs: stats.avg, - diagnosticCount: summary.diagnosticCount, - errorCount: summary.errorCount, - warningCount: summary.warningCount, - informationCount: summary.informationCount, - statementCount: summary.statementCount, - timing: summary.timing, - }; -} - -function printResultTable(results: ReadonlyArray<BenchmarkResult>): void { - console.log('\n=== Evaluator Benchmark Results ===\n'); - console.log( - `${'Case'.padEnd(34)} ${'Scale'.padEnd(12)} ${'Lines'.padStart(7)} ${'Diag'.padStart(5)} ${'Median'.padStart( - 10 - )} ${'Min'.padStart(10)} ${'Max'.padStart(10)} ${'Avg'.padStart(10)} ${'p95'.padStart(10)}` - ); - console.log('-'.repeat(113)); - - for (const result of results) { - console.log( - `${result.caseName.padEnd(34)} ${result.scale.padEnd(12)} ${String(result.sourceLines).padStart( - 7 - )} ${String(result.diagnosticCount).padStart(5)} ${result.medianMs.toFixed(2).padStart(10)} ${result.minMs - .toFixed(2) - .padStart(10)} ${result.maxMs.toFixed(2).padStart(10)} ${result.avgMs - .toFixed(2) - .padStart(10)} ${result.p95Ms.toFixed(2).padStart(10)}` - ); - } - - console.log(''); -} - -const cases: BenchmarkCase[] = [ - { - name: 'recursive_alias_depth', - fileName: 'recursiveAlias.py', - scale: 'depth=24', - code: generateRecursiveAliasCase(24), - minDiagnosticCount: 0, - }, - { - name: 'overload_union_cross_product', - fileName: 'overloadUnionCrossProduct.py', - scale: '8x8', - code: generateOverloadUnionCrossProductCase(8), - minDiagnosticCount: 0, - }, - { - name: 'protocol_many_members_mismatch', - fileName: 'protocolMismatch.py', - scale: 'members=40', - code: generateProtocolMismatchCase(40), - minDiagnosticCount: 1, - }, - { - name: 'generic_alias_chain', - fileName: 'genericAliasChain.py', - scale: 'depth=32', - code: generateGenericAliasChainCase(32), - minDiagnosticCount: 0, - }, - { - name: 'constrained_typevar_matrix', - fileName: 
'constrainedTypeVarMatrix.py', - scale: '8x8', - code: generateConstrainedTypeVarMatrixCase(8), - minDiagnosticCount: 1, - }, - { - name: 'literal_union_math', - fileName: 'literalUnionMath.py', - scale: 'width=64', - code: generateLiteralUnionMathCase(64), - minDiagnosticCount: 0, - }, - { - name: 'typed_dict_many_keys', - fileName: 'typedDictManyKeys.py', - scale: 'keys=80', - code: generateTypedDictCase(80), - minDiagnosticCount: 0, - }, -]; - -const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip; - -benchmarkSuite('Evaluator Benchmark', () => { - const allResults: BenchmarkResult[] = []; - - for (const testCase of cases) { - test(`analyze ${testCase.name} ${testCase.scale}`, () => { - const result = benchmarkAnalyze(testCase); - allResults.push(result); - - console.log( - ` ${testCase.name} ${testCase.scale}: median=${result.medianMs.toFixed(2)}ms, diagnostics=${ - result.diagnosticCount - }, check=${result.timing.typeCheck.totalTimeMs.toFixed(2)}ms, lines=${result.sourceLines}` - ); - - expect(result.statementCount).toBeGreaterThan(0); - expect(result.diagnosticCount).toBeGreaterThanOrEqual(testCase.minDiagnosticCount); - expect(result.medianMs).toBeLessThan(30000); - }); - } - - afterAll(() => { - if (allResults.length === 0) { - return; - } - - printResultTable(allResults); - - writeBenchmarkReport( - 'evaluator', - 'evaluator-benchmark', - createBenchmarkReport('evaluator', WARMUP_ITERATIONS, BENCHMARK_ITERATIONS, allResults) - ); - }); -}); diff --git a/packages/pyright-internal/src/tests/benchmarks/mypy_primer.smoke_projects.snapshot.py b/packages/pyright-internal/src/tests/benchmarks/mypy_primer.smoke_projects.snapshot.py deleted file mode 100644 index b75becfc7480..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/mypy_primer.smoke_projects.snapshot.py +++ /dev/null @@ -1,54 +0,0 @@ -from mypy_primer.model import Project - - -def get_projects() -> list[Project]: - return [ - Project( - 
location="https://github.com/hauntsaninja/mypy_primer", - pyright_cmd="{pyright} {paths}", - paths=["."], - ), - Project( - location="https://github.com/psf/black", - pyright_cmd="{pyright} {paths}", - paths=["src"], - ), - Project( - location="https://github.com/pytest-dev/pytest", - pyright_cmd="{pyright} {paths}", - paths=["src", "testing"], - ), - Project( - location="https://github.com/pandas-dev/pandas", - pyright_cmd="{pyright} {paths}", - paths=["pandas"], - ), - Project( - location="https://github.com/python-attrs/attrs", - pyright_cmd="{pyright}", - ), - Project( - location="https://github.com/Textualize/rich", - pyright_cmd="{pyright}", - ), - Project( - location="https://github.com/niklasf/python-chess", - pyright_cmd="{pyright} {paths}", - paths=["chess"], - ), - Project( - location="https://github.com/pypa/packaging", - pyright_cmd="{pyright} {paths}", - paths=["src"], - ), - Project( - location="https://github.com/pydantic/pydantic", - pyright_cmd="{pyright} {paths}", - paths=["pydantic"], - ), - Project( - location="https://github.com/wemake-services/django-modern-rest", - pyright_cmd="{pyright}", - paths=["dmr"], - ), - ] diff --git a/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts index 557415721546..2869777706cc 100644 --- a/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts +++ b/packages/pyright-internal/src/tests/benchmarks/parserBenchmark.test.ts @@ -13,21 +13,20 @@ * src/tests/benchmarks/.generated/benchmark-results/parser/ */ +import * as fs from 'fs'; +import * as os from 'os'; +import * as path from 'path'; + import { DiagnosticSink } from '../../common/diagnosticSink'; import { ParseOptions, Parser } from '../../parser/parser'; -import { - calculateStats, - createBenchmarkReport, - formatCount, - loadBenchmarkCorpus, - writeBenchmarkReport, -} from './benchmarkUtils'; // --- Configuration --- const WARMUP_ITERATIONS = 3; 
const BENCHMARK_ITERATIONS = 10; +const BENCHMARK_OUTPUT_DIR = path.join(__dirname, '.generated', 'benchmark-results', 'parser'); + // --- Types --- interface BenchmarkResult { @@ -46,8 +45,70 @@ interface BenchmarkResult { errorCount: number; } +interface BenchmarkReport { + timestamp: string; + system: { + platform: string; + arch: string; + cpus: string; + cpuCount: number; + totalMemoryMB: number; + nodeVersion: string; + }; + config: { + warmupIterations: number; + benchmarkIterations: number; + }; + results: BenchmarkResult[]; +} + // --- Helpers --- +function calculateStats(times: ReadonlyArray<number>): { + median: number; + p95: number; + min: number; + max: number; + avg: number; +} { + const sorted = [...times].sort((a, b) => a - b); + const len = sorted.length; + + const median = len % 2 === 0 ? (sorted[len / 2 - 1] + sorted[len / 2]) / 2 : sorted[Math.floor(len / 2)]; + const p95Index = Math.ceil(len * 0.95) - 1; + const p95 = sorted[Math.min(p95Index, len - 1)]; + const min = sorted[0]; + const max = sorted[len - 1]; + const avg = times.reduce((a, b) => a + b, 0) / len; + + return { median, p95, min, max, avg }; +} + +function loadCorpus(filename: string): string { + const filePath = path.resolve(__dirname, '..', 'benchmarkData', filename); + return fs.readFileSync(filePath, 'utf-8'); +} + +function getSystemInfo(): BenchmarkReport['system'] { + const cpus = os.cpus(); + return { + platform: os.platform(), + arch: os.arch(), + cpus: cpus[0]?.model ?? 
'unknown', + cpuCount: cpus.length, + totalMemoryMB: Math.round(os.totalmem() / (1024 * 1024)), + nodeVersion: process.version, + }; +} + +function writeReport(report: BenchmarkReport): void { + fs.mkdirSync(BENCHMARK_OUTPUT_DIR, { recursive: true }); + const filename = `parser-benchmark-${new Date().toISOString().replace(/[:.]/g, '-')}.json`; + const outputPath = path.join(BENCHMARK_OUTPUT_DIR, filename); + fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8'); + console.log(`\nBenchmark results written to: ${outputPath}`); +} + function printResultTable(results: ReadonlyArray<BenchmarkResult>): void { console.log('\n=== Parser Benchmark Results ===\n'); console.log( @@ -66,9 +127,11 @@ function printResultTable(results: ReadonlyArray<BenchmarkResult>): void { r.statementCount ).padStart(7)} ${String(r.errorCount).padStart(7)} ${r.medianMs.toFixed(2).padStart(10)} ${r.minMs .toFixed(2) - .padStart(10)} ${r.maxMs.toFixed(2).padStart(10)} ${r.avgMs.toFixed(2).padStart(10)} ${formatCount( + .padStart(10)} ${r.maxMs.toFixed(2).padStart(10)} ${r.avgMs.toFixed(2).padStart(10)} ${Math.round( r.nodesPerSec - ).padStart(12)}` + ) + .toLocaleString() + .padStart(12)}` ); } console.log(''); @@ -178,14 +241,14 @@ describe('Parser Benchmark', () => { for (const { name, file } of corpora) { test(`parse ${name}`, () => { - const code = loadBenchmarkCorpus(file); + const code = loadCorpus(file); const result = benchmarkParse(name, code); allResults.push(result); console.log( ` ${name}: median=${result.medianMs.toFixed(2)}ms, nodes=${result.nodeCount}, stmts=${ result.statementCount - }, nodes/sec=${formatCount(result.nodesPerSec)}` + }, nodes/sec=${Math.round(result.nodesPerSec).toLocaleString()}` ); // Sanity: parser should produce statements @@ -196,7 +259,7 @@ describe('Parser Benchmark', () => { } test('scaled corpus (10x large_stdlib)', () => { - const base = loadBenchmarkCorpus('large_stdlib.py'); + const base = loadCorpus('large_stdlib.py'); const scaled = 
Array(10).fill(base).join('\n'); const result = benchmarkParse('large_stdlib_10x', scaled); @@ -205,7 +268,7 @@ describe('Parser Benchmark', () => { console.log( ` large_stdlib_10x: median=${result.medianMs.toFixed(2)}ms, nodes=${ result.nodeCount - }, nodes/sec=${formatCount(result.nodesPerSec)}` + }, nodes/sec=${Math.round(result.nodesPerSec).toLocaleString()}` ); expect(result.statementCount).toBeGreaterThan(0); @@ -218,10 +281,16 @@ describe('Parser Benchmark', () => { printResultTable(allResults); - writeBenchmarkReport( - 'parser', - 'parser-benchmark', - createBenchmarkReport('parser', WARMUP_ITERATIONS, BENCHMARK_ITERATIONS, allResults) - ); + const report: BenchmarkReport = { + timestamp: new Date().toISOString(), + system: getSystemInfo(), + config: { + warmupIterations: WARMUP_ITERATIONS, + benchmarkIterations: BENCHMARK_ITERATIONS, + }, + results: allResults, + }; + + writeReport(report); }); }); diff --git a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.test.ts deleted file mode 100644 index 8a19952b2b52..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.test.ts +++ /dev/null @@ -1,1024 +0,0 @@ -/* - * runEcosystemBenchmark.test.ts - * Copyright (c) Microsoft Corporation. - * - * Tests for the ecosystem benchmark runner entry point. 
- */ - -import { spawnSync } from 'child_process'; -import * as fs from 'fs'; -import * as os from 'os'; -import * as path from 'path'; - -import { BenchmarkReport, benchmarkReportSchemaVersion } from './benchmarkUtils'; -import { - buildEcosystemBenchmarkManifest, - buildPyrightInvocation, - compareEcosystemBenchmarkReportData, - compareEcosystemBenchmarkReports, - EcosystemBenchmarkResult, - executePyrightProjectCommand, - getDefaultMainBaselineReportPath, - parseEcosystemBenchmarkArgs, - prepareEcosystemProjectCheckout, - runEcosystemBenchmark, - writeEcosystemBenchmarkManifest, - writeMainBaselineReport, - writeProjectPyrightConfig, -} from './runEcosystemBenchmark'; -import { GeneratedEcosystemProject } from './syncMypyPrimerProjects'; - -const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS'; - -const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip; - -benchmarkSuite('Ecosystem Benchmark Runner', () => { - test('parses smoke runner arguments', () => { - const config = parseEcosystemBenchmarkArgs([ - '--suite', - 'smoke', - '--tag', - 'overloads', - '--project', - 'pandas', - '--num-shards', - '2', - '--shard-index', - '1', - '--project-date', - '2026-01-01', - '--output', - 'artifacts/ecosystem-smoke', - ]); - - expect(config.mode).toBe('select'); - if (config.mode !== 'select') { - throw new Error('Expected selection mode.'); - } - - expect(config.suiteName).toBe('smoke'); - expect(config.tag).toBe('overloads'); - expect(config.projectPattern?.source).toBe('pandas'); - expect(config.numShards).toBe(2); - expect(config.shardIndex).toBe(1); - expect(config.projectDate).toBe('2026-01-01'); - expect(config.outputDir).toBe('artifacts/ecosystem-smoke'); - }); - - test('builds a filtered smoke manifest', () => { - const config = parseEcosystemBenchmarkArgs(['--suite', 'smoke', '--tag', 'overloads', '--output', 'artifacts']); - - expect(config.mode).toBe('select'); - if (config.mode !== 'select') { - throw new Error('Expected 
selection mode.'); - } - - const manifest = buildEcosystemBenchmarkManifest(config); - - expect(manifest.executionMode).toBe('selection-only'); - expect(manifest.selectedProjectCount).toBe(1); - expect(manifest.selectedProjects.map((project) => project.name)).toEqual(['pandas']); - }); - - test('writes an ecosystem run manifest', () => { - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-runner-')); - - try { - const config = parseEcosystemBenchmarkArgs([ - '--suite', - 'smoke', - '--project', - 'django', - '--output', - outputDir, - ]); - - expect(config.mode).toBe('select'); - if (config.mode !== 'select') { - throw new Error('Expected selection mode.'); - } - - const manifest = buildEcosystemBenchmarkManifest(config); - const manifestPath = writeEcosystemBenchmarkManifest(outputDir, manifest); - - expect(manifestPath).toBe(path.join(outputDir, 'ecosystem-run-manifest.json')); - expect(JSON.parse(fs.readFileSync(manifestPath, 'utf-8'))).toEqual(manifest); - } finally { - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('runs end to end and writes a manifest artifact', () => { - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-runner-main-')); - - try { - const manifestPath = runEcosystemBenchmark([ - '--suite', - 'smoke', - '--tag', - 'parser-heavy', - '--output', - outputDir, - ]); - - expect(typeof manifestPath).toBe('string'); - - const manifest = JSON.parse(fs.readFileSync(manifestPath as string, 'utf-8')); - - expect(manifest.selectedProjects.map((project: { name: string }) => project.name)).toEqual(['black']); - } finally { - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('rejects unsupported suite names', () => { - expect(() => parseEcosystemBenchmarkArgs(['--suite', 'full', '--output', 'artifacts'])).toThrow( - 'Unsupported ecosystem benchmark suite' - ); - }); - - test('parses comparison mode arguments', () => { - const config = 
parseEcosystemBenchmarkArgs([ - '--baseline-report', - 'old.json', - '--candidate-report', - 'new.json', - '--output', - 'artifacts', - ]); - - expect(config).toEqual({ - mode: 'compare', - baselineReportPath: 'old.json', - candidateReportPath: 'new.json', - outputDir: 'artifacts', - }); - }); - - test('defaults comparison mode to the checked-in main baseline', () => { - const config = parseEcosystemBenchmarkArgs(['--candidate-report', 'new.json', '--output', 'artifacts']); - - expect(config).toEqual({ - mode: 'compare', - baselineReportPath: getDefaultMainBaselineReportPath(), - candidateReportPath: 'new.json', - outputDir: 'artifacts', - }); - }); - - test('parses execution mode arguments', () => { - const config = parseEcosystemBenchmarkArgs([ - '--suite', - 'smoke', - '--project-root', - 'q:/projects', - '--baseline-executable', - 'node ./out/packages/pyright-internal/src/pyright.js', - '--output', - 'artifacts', - ]); - - expect(config).toEqual({ - mode: 'execute', - suiteName: 'smoke', - outputDir: 'artifacts', - projectRoot: 'q:/projects', - projectDate: undefined, - tag: undefined, - projectPattern: undefined, - numShards: undefined, - shardIndex: undefined, - baselineExecutable: 'node ./out/packages/pyright-internal/src/pyright.js', - candidateExecutable: undefined, - mainBaselineReportPath: undefined, - baselineSourceCommit: undefined, - updateMainBaseline: undefined, - prepareProjects: undefined, - installDependencies: undefined, - }); - }); - - test('parses main baseline source commit', () => { - const config = parseEcosystemBenchmarkArgs([ - '--suite', - 'smoke', - '--project-root', - 'q:/projects', - '--baseline-executable', - 'node ../pyright/index.js', - '--baseline-source-commit', - 'abc123', - '--output', - 'artifacts', - ]); - - expect(config.mode).toBe('execute'); - if (config.mode !== 'execute') { - throw new Error('Expected execution mode.'); - } - - expect(config.baselineSourceCommit).toBe('abc123'); - }); - - test('parses project preparation 
flags', () => { - const config = parseEcosystemBenchmarkArgs([ - '--suite', - 'smoke', - '--project-root', - 'q:/projects', - '--baseline-executable', - 'node ../pyright/index.js', - '--prepare-projects', - '--install-dependencies', - '--output', - 'artifacts', - ]); - - expect(config.mode).toBe('execute'); - if (config.mode !== 'execute') { - throw new Error('Expected execution mode.'); - } - - expect(config.prepareProjects).toBe(true); - expect(config.installDependencies).toBe(true); - }); - - test('builds a pyright invocation from project metadata', () => { - const invocation = buildPyrightInvocation( - 'node ./dist/pyright.js', - { - name: 'black', - mypyPrimerProject: 'black', - source: { kind: 'mypy-primer' }, - pyrightCommand: '{pyright} --lib {paths}', - paths: ['src', 'tests'], - }, - 'c:/temp/pyrightconfig.json' - ); - - expect(invocation.command).toBe('node'); - expect(invocation.args).toEqual([ - './dist/pyright.js', - '--lib', - '--outputjson', - '-p', - 'c:/temp/pyrightconfig.json', - ]); - }); - - test('inserts a separator for node eval commands', () => { - const invocation = buildPyrightInvocation( - 'node -e "require(\'./out/pyright.js\').main()"', - { - name: 'black', - mypyPrimerProject: 'black', - source: { kind: 'mypy-primer' }, - pyrightCommand: '{pyright}', - }, - 'c:/temp/pyrightconfig.json' - ); - - expect(invocation.command).toBe('node'); - expect(invocation.args).toEqual([ - '-e', - "require('./out/pyright.js').main()", - '--', - '--outputjson', - '-p', - 'c:/temp/pyrightconfig.json', - ]); - }); - - test('writes a project pyrightconfig.json with source-only includes', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-')); - - try { - const configPath = writeProjectPyrightConfig(tempDir, { - name: 'pydantic', - mypyPrimerProject: 'pydantic', - source: { kind: 'mypy-primer' }, - paths: ['src', 'tests', 'testdata'], - }); - const config = JSON.parse(fs.readFileSync(configPath, 'utf-8')); - - 
expect(config.include).toEqual(['../src']); - expect(config.exclude).toContain('../**/tests'); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('falls back to configured paths when every path looks test-like', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-fallback-')); - - try { - const configPath = writeProjectPyrightConfig(tempDir, { - name: 'example', - mypyPrimerProject: 'example', - source: { kind: 'mypy-primer' }, - paths: ['tests'], - }); - const config = JSON.parse(fs.readFileSync(configPath, 'utf-8')); - - expect(config.include).toEqual(['../tests']); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('extends an existing project pyrightconfig when writing benchmark config', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-extends-')); - - try { - fs.writeFileSync(path.join(tempDir, 'pyrightconfig.json'), '{"typeCheckingMode":"strict"}', 'utf-8'); - - const configPath = writeProjectPyrightConfig(tempDir, { - name: 'example', - mypyPrimerProject: 'example', - source: { kind: 'mypy-primer' }, - paths: ['src'], - }); - const config = JSON.parse(fs.readFileSync(configPath, 'utf-8')); - - expect(config.extends).toBe('../pyrightconfig.json'); - expect(config.include).toEqual(['../src']); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('merges pyproject tool pyright settings into benchmark config', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-project-config-pyproject-')); - - try { - fs.writeFileSync( - path.join(tempDir, 'pyproject.toml'), - [ - '[tool.pyright]', - 'typeCheckingMode = "strict"', - 'include = ["tests"]', - 'extraPaths = ["typings"]', - 'stubPath = "stubs"', - ].join('\n'), - 'utf-8' - ); - - const configPath = writeProjectPyrightConfig(tempDir, { - name: 'example', - mypyPrimerProject: 'example', - source: { kind: 
'mypy-primer' }, - paths: ['src'], - }); - const config = JSON.parse(fs.readFileSync(configPath, 'utf-8')); - - expect(config.typeCheckingMode).toBe('strict'); - expect(config.extraPaths).toEqual(['../typings']); - expect(config.stubPath).toBe('../stubs'); - expect(config.include).toEqual(['../src']); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('executes a project command and captures benchmark results', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-execute-')); - const workingDirectory = path.join(tempDir, 'black'); - const fakePyrightScriptPath = path.join(tempDir, 'fake-pyright.js'); - - try { - fs.mkdirSync(workingDirectory, { recursive: true }); - fs.writeFileSync( - fakePyrightScriptPath, - [ - 'const configArgIndex = process.argv.indexOf("-p");', - 'if (configArgIndex < 0) { throw new Error("missing -p"); }', - 'const fs = require("fs");', - 'const config = JSON.parse(fs.readFileSync(process.argv[configArgIndex + 1], "utf8"));', - 'if (JSON.stringify(config.include) !== JSON.stringify(["../src"])) {', - ' throw new Error(`unexpected include paths: ${JSON.stringify(config.include)}`);', - '}', - 'const result = {', - ' generalDiagnostics: [{ severity: "error" }, { severity: "warning" }],', - ' summary: {', - ' filesAnalyzed: 3,', - ' errorCount: 1,', - ' warningCount: 1,', - ' informationCount: 0,', - ' timeInSec: 0.25', - ' }', - '};', - 'console.log(JSON.stringify(result));', - ].join('\n'), - 'utf-8' - ); - - const result = executePyrightProjectCommand( - 'black', - createGeneratedProject({ - pyrightCommand: `{pyright} "${fakePyrightScriptPath}" {paths}`, - paths: ['src', 'tests'], - }), - workingDirectory, - process.execPath - ); - - expect(result.projectName).toBe('black'); - expect(result.filesAnalyzed).toBe(3); - expect(result.diagnosticCount).toBe(2); - expect(result.errorCount).toBe(1); - expect(result.warningCount).toBe(1); - expect(result.informationCount).toBe(0); - 
expect(result.diagnostics).toEqual([ - { file: undefined, severity: 'error', message: '' }, - { file: undefined, severity: 'warning', message: '' }, - ]); - expect(result.totalTimeMs).toBeGreaterThanOrEqual(0); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('prepares a project checkout from git metadata', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-prepare-')); - const sourceRepo = path.join(tempDir, 'source'); - const checkoutDir = path.join(tempDir, 'checkout'); - - try { - fs.mkdirSync(sourceRepo, { recursive: true }); - runGit(['init'], sourceRepo); - runGit(['config', 'core.autocrlf', 'false'], sourceRepo); - runGit(['config', 'user.email', 'pyright-benchmark@example.com'], sourceRepo); - runGit(['config', 'user.name', 'Pyright Benchmark'], sourceRepo); - fs.writeFileSync(path.join(sourceRepo, 'sample.py'), 'x = 1\n', 'utf-8'); - runGit(['add', 'sample.py'], sourceRepo); - runGit(['commit', '-m', 'initial'], sourceRepo, { - GIT_AUTHOR_DATE: '2025-01-01T00:00:00Z', - GIT_COMMITTER_DATE: '2025-01-01T00:00:00Z', - }); - - prepareEcosystemProjectCheckout( - createGeneratedProject({ location: sourceRepo }), - checkoutDir, - '2026-01-01' - ); - - expect(fs.existsSync(path.join(checkoutDir, 'sample.py'))).toBe(true); - expect(runGit(['status', '--short'], checkoutDir)).toBe(''); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('includes command details when pyright emits no JSON', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-execute-error-')); - const workingDirectory = path.join(tempDir, 'black'); - const fakePyrightScriptPath = path.join(tempDir, 'fake-pyright-error.js'); - - try { - fs.mkdirSync(workingDirectory, { recursive: true }); - fs.writeFileSync( - fakePyrightScriptPath, - ['console.log("not json");', 'console.error("synthetic stderr");', 'process.exit(2);'].join('\n'), - 'utf-8' - ); - - expect(() 
=> - executePyrightProjectCommand( - 'black', - createGeneratedProject({ - pyrightCommand: `{pyright} "${fakePyrightScriptPath}" {paths}`, - paths: ['src'], - }), - workingDirectory, - process.execPath - ) - ).toThrow(/Command: .*fake-pyright-error\.js[\s\S]*Exit status: 2[\s\S]*synthetic stderr/); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('resolves relative node script paths against the runner cwd during execution', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-relative-exec-')); - const workingDirectory = path.join(tempDir, 'projects', 'black'); - const fakePyrightScriptPath = path.join(tempDir, 'fake-pyright-cli.js'); - const previousCwd = process.cwd(); - - try { - fs.mkdirSync(workingDirectory, { recursive: true }); - fs.writeFileSync( - fakePyrightScriptPath, - createFakePyrightScript({ errorCount: 0, warningCount: 0, informationCount: 0 }), - 'utf-8' - ); - - process.chdir(tempDir); - - const result = executePyrightProjectCommand( - 'black', - createGeneratedProject({ - paths: ['src'], - }), - workingDirectory, - `"${process.execPath}" ./fake-pyright-cli.js` - ); - - expect(result.projectName).toBe('black'); - expect(result.filesAnalyzed).toBe(3); - expect(result.diagnosticCount).toBe(0); - } finally { - process.chdir(previousCwd); - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('runs execution mode end to end and writes reports plus comparison artifacts', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-execution-main-')); - const projectRoot = tempDir; - const projectDir = path.join(projectRoot, 'black'); - const outputDir = path.join(tempDir, 'artifacts'); - const baselineScriptPath = path.join(tempDir, 'baseline-pyright.js'); - const candidateScriptPath = path.join(tempDir, 'candidate-pyright.js'); - - try { - fs.mkdirSync(path.join(projectDir, 'src'), { recursive: true }); - 
fs.writeFileSync(path.join(projectDir, 'src', 'sample.py'), 'x = 1\n', 'utf-8'); - - fs.writeFileSync( - baselineScriptPath, - createFakePyrightScript({ errorCount: 1, warningCount: 0, informationCount: 0 }), - 'utf-8' - ); - fs.writeFileSync( - candidateScriptPath, - createFakePyrightScript({ errorCount: 0, warningCount: 1, informationCount: 0 }), - 'utf-8' - ); - - const artifactPaths = runEcosystemBenchmark([ - '--suite', - 'smoke', - '--tag', - 'parser-heavy', - '--project-root', - projectRoot, - '--project-date', - '2026-01-01', - '--baseline-executable', - `"${process.execPath}" "${baselineScriptPath}"`, - '--candidate-executable', - `"${process.execPath}" "${candidateScriptPath}"`, - '--output', - outputDir, - ]); - - expect(typeof artifactPaths).not.toBe('string'); - expect(fs.existsSync((artifactPaths as { baselineReportPath: string }).baselineReportPath)).toBe(true); - expect(fs.existsSync((artifactPaths as { candidateReportPath: string }).candidateReportPath)).toBe(true); - expect( - fs.existsSync( - (artifactPaths as { comparisonArtifactPaths: { jsonPath: string } }).comparisonArtifactPaths - .jsonPath - ) - ).toBe(true); - - const baselineReport = JSON.parse( - fs.readFileSync((artifactPaths as { baselineReportPath: string }).baselineReportPath, 'utf-8') - ); - const candidateReport = JSON.parse( - fs.readFileSync((artifactPaths as { candidateReportPath: string }).candidateReportPath, 'utf-8') - ); - - expect(baselineReport.results[0].filesAnalyzed).toBe(3); - expect(candidateReport.results[0].filesAnalyzed).toBe(3); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('compares candidate-only execution against a main baseline report when present', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-candidate-main-')); - const projectRoot = tempDir; - const projectDir = path.join(projectRoot, 'black'); - const outputDir = path.join(tempDir, 'artifacts'); - const candidateScriptPath = 
path.join(tempDir, 'candidate-pyright.js'); - const mainBaselineReportPath = path.join(tempDir, 'baselines', 'ecosystem-smoke-main.json'); - - try { - fs.mkdirSync(path.join(projectDir, 'src'), { recursive: true }); - fs.mkdirSync(path.dirname(mainBaselineReportPath), { recursive: true }); - fs.writeFileSync(path.join(projectDir, 'src', 'sample.py'), 'x = 1\n', 'utf-8'); - fs.writeFileSync( - candidateScriptPath, - createFakePyrightScript({ errorCount: 0, warningCount: 1, informationCount: 0 }), - 'utf-8' - ); - fs.writeFileSync( - mainBaselineReportPath, - JSON.stringify( - createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [ - { projectName: 'black', diagnosticCount: 0, warningCount: 0 }, - ]), - undefined, - 2 - ), - 'utf-8' - ); - - const artifactPaths = runEcosystemBenchmark([ - '--suite', - 'smoke', - '--tag', - 'parser-heavy', - '--project-root', - projectRoot, - '--candidate-executable', - `"${process.execPath}" "${candidateScriptPath}"`, - '--main-baseline-report', - mainBaselineReportPath, - '--output', - outputDir, - ]); - - expect(typeof artifactPaths).not.toBe('string'); - expect((artifactPaths as { baselineReportPath?: string }).baselineReportPath).toBeUndefined(); - expect( - fs.existsSync( - (artifactPaths as { comparisonArtifactPaths: { jsonPath: string } }).comparisonArtifactPaths - .jsonPath - ) - ).toBe(true); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('copies execution baseline report into the main baseline path', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-main-baseline-')); - const projectRoot = tempDir; - const projectDir = path.join(projectRoot, 'black'); - const outputDir = path.join(tempDir, 'artifacts'); - const baselineScriptPath = path.join(tempDir, 'baseline-pyright.js'); - const mainBaselineReportPath = path.join(tempDir, 'baselines', 'ecosystem-smoke-main.json'); - - try { - fs.mkdirSync(path.join(projectDir, 'src'), { recursive: true }); - 
fs.writeFileSync(path.join(projectDir, 'src', 'sample.py'), 'x = 1\n', 'utf-8'); - fs.writeFileSync( - baselineScriptPath, - createFakePyrightScript({ errorCount: 0, warningCount: 0, informationCount: 0 }), - 'utf-8' - ); - - const artifactPaths = runEcosystemBenchmark([ - '--suite', - 'smoke', - '--tag', - 'parser-heavy', - '--project-root', - projectRoot, - '--project-date', - '2026-01-01', - '--baseline-executable', - `"${process.execPath}" "${baselineScriptPath}"`, - '--update-main-baseline', - '--main-baseline-report', - mainBaselineReportPath, - '--baseline-source-commit', - 'abc123', - '--output', - outputDir, - ]); - - expect(typeof artifactPaths).not.toBe('string'); - expect(fs.existsSync(mainBaselineReportPath)).toBe(true); - expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).results[0].projectName).toBe('black'); - expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).mainBaseline.sourceCommit).toBe( - 'abc123' - ); - expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).mainBaseline.projectDate).toBe( - '2026-01-01' - ); - expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8')).mainBaseline.configMode).toBe( - 'generated-benchmark-config' - ); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('copies a report to a main baseline path', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-copy-baseline-')); - const sourceReportPath = path.join(tempDir, 'baseline-report.json'); - const mainBaselineReportPath = path.join(tempDir, 'nested', 'ecosystem-smoke-main.json'); - - try { - fs.writeFileSync(sourceReportPath, '{"results":[]}', 'utf-8'); - - expect(writeMainBaselineReport(sourceReportPath, mainBaselineReportPath)).toBe(mainBaselineReportPath); - expect(fs.readFileSync(mainBaselineReportPath, 'utf-8')).toBe('{"results":[]}'); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('stamps copied 
main baseline metadata', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-stamp-baseline-')); - const sourceReportPath = path.join(tempDir, 'baseline-report.json'); - const mainBaselineReportPath = path.join(tempDir, 'nested', 'ecosystem-smoke-main.json'); - - try { - fs.writeFileSync(sourceReportPath, '{"results":[]}', 'utf-8'); - - writeMainBaselineReport(sourceReportPath, mainBaselineReportPath, { - sourceCommit: 'abc123', - projectDate: '2026-01-01', - configMode: 'generated-benchmark-config', - refreshedAt: '2026-05-08T00:00:00.000Z', - }); - - expect(JSON.parse(fs.readFileSync(mainBaselineReportPath, 'utf-8'))).toEqual({ - results: [], - mainBaseline: { - sourceCommit: 'abc123', - projectDate: '2026-01-01', - configMode: 'generated-benchmark-config', - refreshedAt: '2026-05-08T00:00:00.000Z', - }, - }); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('writes comparison artifacts from ecosystem benchmark reports', () => { - const reportsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-report-')); - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-compare-')); - - try { - const baselinePath = path.join(reportsDir, 'old.json'); - const candidatePath = path.join(reportsDir, 'new.json'); - - fs.writeFileSync( - baselinePath, - JSON.stringify( - createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [ - { projectName: 'black', totalTimeMs: 100, maxMemoryMB: 250 }, - ]), - undefined, - 2 - ), - 'utf-8' - ); - fs.writeFileSync( - candidatePath, - JSON.stringify( - createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [ - { projectName: 'black', totalTimeMs: 120, maxMemoryMB: 260 }, - ]), - undefined, - 2 - ), - 'utf-8' - ); - - const artifactPaths = compareEcosystemBenchmarkReports(baselinePath, candidatePath, outputDir); - - expect(JSON.parse(fs.readFileSync(artifactPaths.jsonPath, 'utf-8')).compared[0].key).toBe('black'); - 
expect(fs.readFileSync(artifactPaths.markdownPath, 'utf-8')).toContain('Largest Regressions'); - expect(JSON.parse(fs.readFileSync(artifactPaths.oldJsonPath, 'utf-8')).results[0].projectName).toBe( - 'black' - ); - } finally { - fs.rmSync(reportsDir, { force: true, recursive: true }); - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('compares ecosystem diagnostic metrics when reports include them', () => { - const reportsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-diagnostics-')); - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-diagnostics-compare-')); - - try { - const baselinePath = path.join(reportsDir, 'old.json'); - const candidatePath = path.join(reportsDir, 'new.json'); - - fs.writeFileSync( - baselinePath, - JSON.stringify( - createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [ - { - projectName: 'black', - diagnosticCount: 1, - errorCount: 1, - warningCount: 0, - diagnostics: [{ file: 'src/a.py', severity: 'error', message: 'old diagnostic' }], - }, - ]), - undefined, - 2 - ), - 'utf-8' - ); - fs.writeFileSync( - candidatePath, - JSON.stringify( - createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [ - { - projectName: 'black', - diagnosticCount: 2, - errorCount: 1, - warningCount: 1, - diagnostics: [ - { file: 'src/a.py', severity: 'error', message: 'old diagnostic' }, - { file: 'src/b.py', severity: 'warning', message: 'new diagnostic' }, - ], - }, - ]), - undefined, - 2 - ), - 'utf-8' - ); - - const artifactPaths = compareEcosystemBenchmarkReports(baselinePath, candidatePath, outputDir); - const comparison = JSON.parse(fs.readFileSync(artifactPaths.jsonPath, 'utf-8')); - - expect(comparison.compared[0].metrics.map((metric: { metric: string }) => metric.metric)).toEqual([ - 'diagnosticCount', - 'errorCount', - 'warningCount', - ]); - expect(comparison.diagnosticDiffs).toEqual([ - { - projectName: 'black', - added: ['warning | src/b.py | new diagnostic'], - removed: 
[], - }, - ]); - expect(fs.readFileSync(artifactPaths.markdownPath, 'utf-8')).toContain('diagnosticCount'); - expect(fs.readFileSync(artifactPaths.markdownPath, 'utf-8')).toContain('## Diagnostic Diffs'); - } finally { - fs.rmSync(reportsDir, { force: true, recursive: true }); - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('builds diagnostic diffs from report data', () => { - const comparison = compareEcosystemBenchmarkReportData( - createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [ - { - projectName: 'black', - diagnostics: [ - { file: 'src/a.py', severity: 'error', message: 'old diagnostic' }, - { file: 'src/stable.py', severity: 'warning', message: 'stable diagnostic' }, - ], - }, - ]), - createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [ - { - projectName: 'black', - diagnostics: [ - { file: 'src/b.py', severity: 'information', message: 'new diagnostic' }, - { file: 'src/stable.py', severity: 'warning', message: 'stable diagnostic' }, - ], - }, - ]) - ); - - expect(comparison.diagnosticDiffs).toEqual([ - { - projectName: 'black', - added: ['information | src/b.py | new diagnostic'], - removed: ['error | src/a.py | old diagnostic'], - }, - ]); - }); - - test('runs comparison mode end to end', () => { - const reportsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-report-main-')); - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-ecosystem-compare-main-')); - - try { - const baselinePath = path.join(reportsDir, 'old.json'); - const candidatePath = path.join(reportsDir, 'new.json'); - - fs.writeFileSync( - baselinePath, - JSON.stringify( - createEcosystemBenchmarkReport('2026-05-07T00:00:00.000Z', [ - { projectName: 'black', totalTimeMs: 100 }, - ]), - undefined, - 2 - ), - 'utf-8' - ); - fs.writeFileSync( - candidatePath, - JSON.stringify( - createEcosystemBenchmarkReport('2026-05-07T01:00:00.000Z', [ - { projectName: 'black', totalTimeMs: 95 }, - ]), - undefined, - 2 - ), - 'utf-8' 
- ); - - const artifactPaths = runEcosystemBenchmark([ - '--baseline-report', - baselinePath, - '--candidate-report', - candidatePath, - '--output', - outputDir, - ]); - - expect(typeof artifactPaths).not.toBe('string'); - expect(fs.existsSync((artifactPaths as { jsonPath: string }).jsonPath)).toBe(true); - } finally { - fs.rmSync(reportsDir, { force: true, recursive: true }); - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); -}); - -function createEcosystemBenchmarkReport( - timestamp: string, - results: EcosystemBenchmarkResult[] -): BenchmarkReport { - return { - schemaVersion: benchmarkReportSchemaVersion, - suiteName: 'ecosystem-smoke', - timestamp, - system: { - platform: 'win32', - arch: 'x64', - cpus: 'test-cpu', - cpuCount: 8, - totalMemoryMB: 16384, - nodeVersion: process.version, - }, - config: { - warmupIterations: 0, - benchmarkIterations: 1, - }, - results, - }; -} - -function createGeneratedProject(overrides: Partial<GeneratedEcosystemProject> = {}): GeneratedEcosystemProject { - return { - name: 'black', - mypyPrimerProject: 'black', - source: { kind: 'mypy-primer' }, - ...overrides, - }; -} - -function runGit(args: readonly string[], cwd: string, env: NodeJS.ProcessEnv = {}): string { - const result = spawnSync('git', args, { - cwd, - encoding: 'utf-8', - env: { ...process.env, ...env }, - }); - - if (result.error) { - throw result.error; - } - - if (result.status !== 0) { - throw new Error( - `git ${args.join(' ')} failed with ${result.status ?? 
'unknown'}\n${result.stderr}\n${result.stdout}` - ); - } - - return result.stdout.trim(); -} - -function createFakePyrightScript(counts: { - errorCount: number; - warningCount: number; - informationCount: number; -}): string { - const diagnosticEntries = [ - ...Array.from({ length: counts.errorCount }, () => '{ severity: "error" }'), - ...Array.from({ length: counts.warningCount }, () => '{ severity: "warning" }'), - ...Array.from({ length: counts.informationCount }, () => '{ severity: "information" }'), - ].join(', '); - - return [ - 'const result = {', - ` generalDiagnostics: [${diagnosticEntries}],`, - ' summary: {', - ' filesAnalyzed: 3,', - ` errorCount: ${counts.errorCount},`, - ` warningCount: ${counts.warningCount},`, - ` informationCount: ${counts.informationCount},`, - ' timeInSec: 0.25', - ' }', - '};', - 'console.log(JSON.stringify(result));', - ].join('\n'); -} diff --git a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.ts b/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.ts deleted file mode 100644 index d22d0152b9f9..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/runEcosystemBenchmark.ts +++ /dev/null @@ -1,949 +0,0 @@ -import { spawnSync } from 'child_process'; -import commandLineArgs, { CommandLineOptions, OptionDefinition } from 'command-line-args'; -import * as fs from 'fs'; -import * as path from 'path'; - -import { parse } from '../../common/tomlUtils'; - -import { - BenchmarkMetricDefinition, - BenchmarkReportComparison, - BenchmarkReportComparisonArtifactPaths, - compareBenchmarkReports, - loadBenchmarkReport, - renderBenchmarkComparisonMarkdown, - writeBenchmarkReportComparisonArtifacts, -} from './benchmarkComparison'; -import { BenchmarkReport, createBenchmarkReport } from './benchmarkUtils'; -import { - EcosystemProjectTag, - EcosystemSmokeProject, - getEcosystemSmokeProjectTags, - getGeneratedEcosystemProject, - selectEcosystemSmokeProjects, -} from './ecosystemSmokeProjects'; 
-import { GeneratedEcosystemProject } from './syncMypyPrimerProjects'; - -export interface EcosystemBenchmarkRunConfig { - mode: 'select'; - suiteName: 'smoke'; - outputDir: string; - projectDate?: string; - tag?: EcosystemProjectTag; - projectPattern?: RegExp; - numShards?: number; - shardIndex?: number; -} - -export interface EcosystemBenchmarkComparisonConfig { - mode: 'compare'; - baselineReportPath: string; - candidateReportPath: string; - outputDir: string; -} - -export interface EcosystemBenchmarkExecutionConfig { - mode: 'execute'; - suiteName: 'smoke'; - outputDir: string; - projectRoot: string; - projectDate?: string; - tag?: EcosystemProjectTag; - projectPattern?: RegExp; - numShards?: number; - shardIndex?: number; - baselineExecutable?: string; - candidateExecutable?: string; - mainBaselineReportPath?: string; - baselineSourceCommit?: string; - updateMainBaseline?: boolean; - prepareProjects?: boolean; - installDependencies?: boolean; -} - -export interface EcosystemBenchmarkResult { - projectName: string; - totalTimeMs?: number; - maxMemoryMB?: number; - filesAnalyzed?: number; - diagnosticCount?: number; - errorCount?: number; - warningCount?: number; - informationCount?: number; - diagnostics?: EcosystemBenchmarkDiagnostic[]; -} - -export interface EcosystemBenchmarkDiagnostic { - file?: string; - severity: string; - message: string; -} - -export interface EcosystemBenchmarkDiagnosticDiff { - projectName: string; - added: string[]; - removed: string[]; -} - -export interface EcosystemBenchmarkReportComparison extends BenchmarkReportComparison { - diagnosticDiffs: EcosystemBenchmarkDiagnosticDiff[]; -} - -export interface EcosystemBenchmarkManifest { - suiteName: 'smoke'; - executionMode: 'selection-only' | 'command-execution'; - outputDir: string; - projectDate?: string; - filters: { - tag?: EcosystemProjectTag; - projectPattern?: string; - numShards?: number; - shardIndex?: number; - }; - selectedProjects: EcosystemSmokeProject[]; - 
selectedProjectCount: number; - notes: string[]; -} - -export interface EcosystemBenchmarkExecutionArtifactPaths { - baselineReportPath?: string; - candidateReportPath?: string; - comparisonArtifactPaths?: BenchmarkReportComparisonArtifactPaths; -} - -interface PyrightJsonResults { - generalDiagnostics: { file?: string; message?: string; severity: string }[]; - summary: { - errorCount: number; - warningCount: number; - informationCount: number; - filesAnalyzed: number; - timeInSec: number; - }; -} - -interface ProjectPyrightConfigFile { - [key: string]: unknown; - extends?: string; - include: string[]; - exclude: string[]; -} - -interface MainBaselineMetadata { - sourceCommit?: string; - projectDate?: string; - configMode: 'generated-benchmark-config'; - refreshedAt: string; -} - -export type EcosystemBenchmarkCommand = - | EcosystemBenchmarkRunConfig - | EcosystemBenchmarkComparisonConfig - | EcosystemBenchmarkExecutionConfig; - -const optionDefinitions: OptionDefinition[] = [ - { name: 'suite', type: String }, - { name: 'tag', type: String }, - { name: 'project', type: String }, - { name: 'num-shards', type: Number }, - { name: 'shard-index', type: Number }, - { name: 'project-date', type: String }, - { name: 'project-root', type: String }, - { name: 'baseline-executable', type: String }, - { name: 'candidate-executable', type: String }, - { name: 'baseline-report', type: String }, - { name: 'candidate-report', type: String }, - { name: 'main-baseline-report', type: String }, - { name: 'update-main-baseline', type: Boolean }, - { name: 'baseline-source-commit', type: String }, - { name: 'prepare-projects', type: Boolean }, - { name: 'install-dependencies', type: Boolean }, - { name: 'output', type: String }, -]; - -const benchmarkOwnedConfigKeys = new Set(['include', 'exclude', 'ignore', 'strict']); -const pyrightPathArrayConfigKeys = new Set(['extraPaths']); -const pyrightPathStringConfigKeys = new Set(['stubPath', 'typeshedPath', 'venvPath']); - -const 
ecosystemBenchmarkComparisonMetrics: readonly BenchmarkMetricDefinition[] = [ - { name: 'totalTimeMs', getValue: (result) => result.totalTimeMs }, - { name: 'maxMemoryMB', getValue: (result) => result.maxMemoryMB }, - { name: 'filesAnalyzed', lowerIsBetter: false, getValue: (result) => result.filesAnalyzed }, - { name: 'diagnosticCount', getValue: (result) => result.diagnosticCount }, - { name: 'errorCount', getValue: (result) => result.errorCount }, - { name: 'warningCount', getValue: (result) => result.warningCount }, - { name: 'informationCount', getValue: (result) => result.informationCount }, -]; - -export function parseEcosystemBenchmarkArgs(args: string[]): EcosystemBenchmarkCommand { - const parsedArgs = commandLineArgs(optionDefinitions, { argv: args }) as CommandLineOptions; - const outputDir = parsedArgs.output as string | undefined; - if (!outputDir) { - throw new Error('The --output option is required.'); - } - - const baselineReportPath = parsedArgs['baseline-report'] as string | undefined; - const candidateReportPath = parsedArgs['candidate-report'] as string | undefined; - const mainBaselineReportPath = parsedArgs['main-baseline-report'] as string | undefined; - const baselineSourceCommit = parsedArgs['baseline-source-commit'] as string | undefined; - const baselineExecutable = parsedArgs['baseline-executable'] as string | undefined; - const candidateExecutable = parsedArgs['candidate-executable'] as string | undefined; - - if (baselineReportPath || candidateReportPath) { - if (!candidateReportPath) { - throw new Error('The --candidate-report option is required when comparing ecosystem benchmark reports.'); - } - - return { - mode: 'compare', - baselineReportPath: baselineReportPath ?? mainBaselineReportPath ?? getDefaultMainBaselineReportPath(), - candidateReportPath, - outputDir, - }; - } - - const suiteName = (parsedArgs.suite as string | undefined) ?? 
'smoke'; - - if (suiteName !== 'smoke') { - throw new Error(`Unsupported ecosystem benchmark suite "${suiteName}". Only "smoke" is implemented.`); - } - - const tag = parsedArgs.tag as string | undefined; - if (tag && !getEcosystemSmokeProjectTags().includes(tag as EcosystemProjectTag)) { - throw new Error(`Unsupported ecosystem smoke tag "${tag}".`); - } - - const projectPatternText = parsedArgs.project as string | undefined; - - if (baselineExecutable || candidateExecutable) { - const projectRoot = parsedArgs['project-root'] as string | undefined; - if (!projectRoot) { - throw new Error('The --project-root option is required when executing ecosystem benchmarks.'); - } - - return { - mode: 'execute', - suiteName, - outputDir, - projectRoot, - projectDate: parsedArgs['project-date'] as string | undefined, - tag: tag as EcosystemProjectTag | undefined, - projectPattern: projectPatternText ? new RegExp(projectPatternText, 'i') : undefined, - numShards: parsedArgs['num-shards'] as number | undefined, - shardIndex: parsedArgs['shard-index'] as number | undefined, - baselineExecutable, - candidateExecutable, - mainBaselineReportPath, - baselineSourceCommit, - updateMainBaseline: parsedArgs['update-main-baseline'] as boolean | undefined, - prepareProjects: parsedArgs['prepare-projects'] as boolean | undefined, - installDependencies: parsedArgs['install-dependencies'] as boolean | undefined, - }; - } - - return { - mode: 'select', - suiteName, - outputDir, - projectDate: parsedArgs['project-date'] as string | undefined, - tag: tag as EcosystemProjectTag | undefined, - projectPattern: projectPatternText ? 
new RegExp(projectPatternText, 'i') : undefined, - numShards: parsedArgs['num-shards'] as number | undefined, - shardIndex: parsedArgs['shard-index'] as number | undefined, - }; -} - -export function getDefaultMainBaselineReportPath(): string { - return getWritableBenchmarkFilePath('baselines', 'ecosystem-smoke-main.json'); -} - -export function buildEcosystemBenchmarkManifest(config: EcosystemBenchmarkRunConfig): EcosystemBenchmarkManifest { - const selectedProjects = selectEcosystemSmokeProjects({ - tag: config.tag, - projectPattern: config.projectPattern, - numShards: config.numShards, - shardIndex: config.shardIndex, - }); - - return { - suiteName: config.suiteName, - executionMode: 'selection-only', - outputDir: config.outputDir, - projectDate: config.projectDate, - filters: { - tag: config.tag, - projectPattern: config.projectPattern?.source, - numShards: config.numShards, - shardIndex: config.shardIndex, - }, - selectedProjects, - selectedProjectCount: selectedProjects.length, - notes: [ - 'This runner currently resolves the ecosystem smoke selection and writes a manifest artifact.', - 'Project execution against base/head Pyright is not implemented yet.', - ], - }; -} - -export function executeEcosystemBenchmark( - config: EcosystemBenchmarkExecutionConfig -): EcosystemBenchmarkExecutionArtifactPaths { - const selectedProjects = selectEcosystemSmokeProjects({ - tag: config.tag, - projectPattern: config.projectPattern, - numShards: config.numShards, - shardIndex: config.shardIndex, - }); - - if (config.prepareProjects) { - prepareEcosystemProjectCheckouts( - selectedProjects, - config.projectRoot, - config.projectDate, - config.installDependencies ?? false - ); - } - - const baselineResults = config.baselineExecutable - ? executeEcosystemBenchmarkSuite(selectedProjects, config.projectRoot, config.baselineExecutable) - : undefined; - const candidateResults = config.candidateExecutable - ? 
executeEcosystemBenchmarkSuite(selectedProjects, config.projectRoot, config.candidateExecutable) - : undefined; - - const artifactPaths: EcosystemBenchmarkExecutionArtifactPaths = {}; - fs.mkdirSync(config.outputDir, { recursive: true }); - - if (baselineResults) { - artifactPaths.baselineReportPath = writeNamedBenchmarkReport( - config.outputDir, - 'baseline-report.json', - createBenchmarkReport('ecosystem-smoke', 0, 1, baselineResults) - ); - - if (config.updateMainBaseline) { - writeMainBaselineReport( - artifactPaths.baselineReportPath, - config.mainBaselineReportPath ?? getDefaultMainBaselineReportPath(), - { - sourceCommit: config.baselineSourceCommit, - projectDate: config.projectDate, - configMode: 'generated-benchmark-config', - refreshedAt: new Date().toISOString(), - } - ); - } - } - - if (candidateResults) { - artifactPaths.candidateReportPath = writeNamedBenchmarkReport( - config.outputDir, - 'candidate-report.json', - createBenchmarkReport('ecosystem-smoke', 0, 1, candidateResults) - ); - } - - if (artifactPaths.baselineReportPath && artifactPaths.candidateReportPath) { - artifactPaths.comparisonArtifactPaths = compareAndWriteEcosystemBenchmarkReportFiles( - artifactPaths.baselineReportPath, - artifactPaths.candidateReportPath, - config.outputDir - ); - } else if (artifactPaths.candidateReportPath) { - const mainBaselineReportPath = config.mainBaselineReportPath ?? 
getDefaultMainBaselineReportPath(); - if (fs.existsSync(mainBaselineReportPath)) { - artifactPaths.comparisonArtifactPaths = compareAndWriteEcosystemBenchmarkReportFiles( - mainBaselineReportPath, - artifactPaths.candidateReportPath, - config.outputDir - ); - } - } - - return artifactPaths; -} - -export function compareEcosystemBenchmarkReports( - baselineReportPath: string, - candidateReportPath: string, - outputDir: string -): BenchmarkReportComparisonArtifactPaths { - return compareAndWriteEcosystemBenchmarkReportFiles(baselineReportPath, candidateReportPath, outputDir); -} - -export function writeEcosystemBenchmarkManifest(outputDir: string, manifest: EcosystemBenchmarkManifest): string { - fs.mkdirSync(outputDir, { recursive: true }); - - const manifestPath = path.join(outputDir, 'ecosystem-run-manifest.json'); - fs.writeFileSync(manifestPath, JSON.stringify(manifest, undefined, 2), 'utf-8'); - - return manifestPath; -} - -export function writeMainBaselineReport( - sourceReportPath: string, - baselineReportPath: string, - metadata?: MainBaselineMetadata -): string { - fs.mkdirSync(path.dirname(baselineReportPath), { recursive: true }); - - if (!metadata) { - fs.copyFileSync(sourceReportPath, baselineReportPath); - return baselineReportPath; - } - - const report = JSON.parse(fs.readFileSync(sourceReportPath, 'utf-8')) as Record; - report.mainBaseline = metadata; - fs.writeFileSync(baselineReportPath, JSON.stringify(report, undefined, 2), 'utf-8'); - return baselineReportPath; -} - -export function compareEcosystemBenchmarkReportData( - baselineReport: BenchmarkReport, - candidateReport: BenchmarkReport -): EcosystemBenchmarkReportComparison { - return { - ...compareBenchmarkReports( - baselineReport, - candidateReport, - (result) => result.projectName, - ecosystemBenchmarkComparisonMetrics - ), - diagnosticDiffs: compareEcosystemDiagnosticResults(baselineReport.results, candidateReport.results), - }; -} - -export function runEcosystemBenchmark( - args: string[] 
-): string | BenchmarkReportComparisonArtifactPaths | EcosystemBenchmarkExecutionArtifactPaths { - const command = parseEcosystemBenchmarkArgs(args); - - if (command.mode === 'compare') { - const artifactPaths = compareEcosystemBenchmarkReports( - command.baselineReportPath, - command.candidateReportPath, - command.outputDir - ); - - console.log(`Comparison artifacts written to: ${command.outputDir}`); - return artifactPaths; - } - - if (command.mode === 'execute') { - const artifactPaths = executeEcosystemBenchmark(command); - console.log(`Execution artifacts written to: ${command.outputDir}`); - return artifactPaths; - } - - const manifest = buildEcosystemBenchmarkManifest(command); - const manifestPath = writeEcosystemBenchmarkManifest(command.outputDir, manifest); - - console.log(`Selected ${manifest.selectedProjectCount} ecosystem project(s).`); - console.log(`Manifest written to: ${manifestPath}`); - - return manifestPath; -} - -function executeEcosystemBenchmarkSuite( - projects: readonly EcosystemSmokeProject[], - projectRoot: string, - executableCommand: string -): EcosystemBenchmarkResult[] { - return projects.map((project) => executeEcosystemProject(project, projectRoot, executableCommand)); -} - -function prepareEcosystemProjectCheckouts( - projects: readonly EcosystemSmokeProject[], - projectRoot: string, - projectDate: string | undefined, - installDependencies: boolean -): void { - fs.mkdirSync(projectRoot, { recursive: true }); - - for (const project of projects) { - const generatedProject = getGeneratedEcosystemProject(project.name); - if (!generatedProject) { - throw new Error(`No generated ecosystem metadata found for project ${project.name}.`); - } - - prepareEcosystemProjectCheckout(generatedProject, path.join(projectRoot, generatedProject.name), projectDate); - - if (installDependencies) { - installEcosystemProjectDependencies(generatedProject, path.join(projectRoot, generatedProject.name)); - } - } -} - -export function 
prepareEcosystemProjectCheckout( - project: GeneratedEcosystemProject, - workingDirectory: string, - projectDate: string | undefined -): void { - if (!project.location) { - throw new Error(`Cannot prepare ecosystem project ${project.name}; no repository location is configured.`); - } - - if (fs.existsSync(workingDirectory)) { - runRequiredProcess('git', ['fetch', '--all', '--tags'], workingDirectory, `update ${project.name}`); - } else { - runRequiredProcess('git', ['clone', project.location, workingDirectory], undefined, `clone ${project.name}`); - } - - if (projectDate) { - const commit = runRequiredProcess( - 'git', - ['rev-list', '-n', '1', `--before=${projectDate}`, 'HEAD'], - workingDirectory, - `resolve ${project.name} project-date commit` - ).trim(); - if (!commit) { - throw new Error(`Could not find a ${project.name} commit before ${projectDate}.`); - } - - runRequiredProcess('git', ['checkout', '--force', commit], workingDirectory, `checkout ${project.name}`); - } -} - -function installEcosystemProjectDependencies(project: GeneratedEcosystemProject, workingDirectory: string): void { - if (project.dependencies && project.dependencies.length > 0) { - runRequiredProcess( - 'python', - ['-m', 'pip', 'install', ...project.dependencies], - workingDirectory, - `install ${project.name} dependency metadata` - ); - } - - if (project.installCommand) { - runRequiredProcess(project.installCommand, [], workingDirectory, `run ${project.name} install command`, true); - } -} - -function runRequiredProcess( - command: string, - args: readonly string[], - cwd: string | undefined, - description: string, - shell = false -): string { - const result = spawnSync(command, args, { - cwd, - encoding: 'utf-8', - shell, - }); - - if (result.error) { - throw result.error; - } - - if (result.status !== 0) { - throw new Error( - `Failed to ${description}.\nCommand: ${[command, ...args].join(' ')}\nExit status: ${ - result.status ?? 'unknown' - }\nstderr:\n${(result.stderr ?? 
'').trim()}\nstdout:\n${(result.stdout ?? '').trim()}` - ); - } - - return result.stdout ?? ''; -} - -function executeEcosystemProject( - project: EcosystemSmokeProject, - projectRoot: string, - executableCommand: string -): EcosystemBenchmarkResult { - const generatedProject = getGeneratedEcosystemProject(project.name); - if (!generatedProject) { - throw new Error(`No generated ecosystem metadata found for project ${project.name}.`); - } - - const workingDirectory = path.join(projectRoot, generatedProject.name); - if (!fs.existsSync(workingDirectory)) { - throw new Error(`Expected ecosystem project checkout at ${workingDirectory}.`); - } - - return executePyrightProjectCommand(project.name, generatedProject, workingDirectory, executableCommand); -} - -export function executePyrightProjectCommand( - projectName: string, - project: GeneratedEcosystemProject, - workingDirectory: string, - executableCommand: string -): EcosystemBenchmarkResult { - const pyrightConfigPath = writeProjectPyrightConfig(workingDirectory, project); - const invocation = resolvePyrightInvocationPaths( - buildPyrightInvocation(executableCommand, project, pyrightConfigPath), - process.cwd() - ); - const startTime = process.hrtime.bigint(); - const result = spawnSync(invocation.command, invocation.args, { - cwd: workingDirectory, - encoding: 'utf-8', - }); - const elapsedMs = Number(process.hrtime.bigint() - startTime) / 1_000_000; - - if (result.error) { - throw result.error; - } - - const output = result.stdout?.trim(); - if (!output) { - throw createPyrightExecutionError(projectName, invocation, result.status, result.stdout, result.stderr); - } - - let jsonResults: PyrightJsonResults; - try { - jsonResults = JSON.parse(output) as PyrightJsonResults; - } catch (error) { - throw createPyrightExecutionError(projectName, invocation, result.status, result.stdout, result.stderr, error); - } - const diagnosticCount = - jsonResults.summary.errorCount + jsonResults.summary.warningCount + 
jsonResults.summary.informationCount; - - return { - projectName, - totalTimeMs: Math.round(elapsedMs * 100) / 100, - filesAnalyzed: jsonResults.summary.filesAnalyzed, - diagnosticCount, - errorCount: jsonResults.summary.errorCount, - warningCount: jsonResults.summary.warningCount, - informationCount: jsonResults.summary.informationCount, - diagnostics: jsonResults.generalDiagnostics.map(normalizePyrightDiagnostic), - }; -} - -function compareAndWriteEcosystemBenchmarkReportFiles( - baselineReportPath: string, - candidateReportPath: string, - outputDir: string -): BenchmarkReportComparisonArtifactPaths { - const baselineReport = loadBenchmarkReport(baselineReportPath); - const candidateReport = loadBenchmarkReport(candidateReportPath); - const comparison = compareEcosystemBenchmarkReportData(baselineReport, candidateReport); - const artifactPaths = writeBenchmarkReportComparisonArtifacts( - outputDir, - baselineReport, - candidateReport, - comparison - ); - - fs.writeFileSync(artifactPaths.markdownPath, renderEcosystemBenchmarkComparisonMarkdown(comparison), 'utf-8'); - return artifactPaths; -} - -function compareEcosystemDiagnosticResults( - baselineResults: readonly EcosystemBenchmarkResult[], - candidateResults: readonly EcosystemBenchmarkResult[] -): EcosystemBenchmarkDiagnosticDiff[] { - const candidateByProject = new Map(candidateResults.map((result) => [result.projectName, result])); - - return baselineResults.flatMap((baselineResult) => { - const candidateResult = candidateByProject.get(baselineResult.projectName); - if (!candidateResult) { - return []; - } - - const baselineDiagnostics = getDiagnosticSignatureSet(baselineResult); - const candidateDiagnostics = getDiagnosticSignatureSet(candidateResult); - const added = [...candidateDiagnostics].filter((entry) => !baselineDiagnostics.has(entry)).sort(); - const removed = [...baselineDiagnostics].filter((entry) => !candidateDiagnostics.has(entry)).sort(); - - return added.length > 0 || removed.length > 0 - ? 
[{ projectName: baselineResult.projectName, added, removed }]
-            : [];
-    });
-}
-
-function getDiagnosticSignatureSet(result: EcosystemBenchmarkResult): Set<string> {
-    return new Set((result.diagnostics ?? []).map(formatDiagnosticSignature));
-}
-
-function normalizePyrightDiagnostic(
-    diagnostic: PyrightJsonResults['generalDiagnostics'][number]
-): EcosystemBenchmarkDiagnostic {
-    return {
-        file: diagnostic.file,
-        severity: diagnostic.severity,
-        message: diagnostic.message ?? '',
-    };
-}
-
-function formatDiagnosticSignature(diagnostic: EcosystemBenchmarkDiagnostic): string {
-    return [diagnostic.severity, diagnostic.file ?? '', diagnostic.message].join(' | ');
-}
-
-function renderEcosystemBenchmarkComparisonMarkdown(comparison: EcosystemBenchmarkReportComparison): string {
-    const lines = [renderBenchmarkComparisonMarkdown(comparison).trimEnd(), '', '## Diagnostic Diffs', ''];
-
-    if (comparison.diagnosticDiffs.length === 0) {
-        lines.push('None.');
-        return `${lines.join('\n')}\n`;
-    }
-
-    for (const diff of comparison.diagnosticDiffs) {
-        lines.push(`### ${diff.projectName}`, '');
-        appendDiagnosticDiffList(lines, 'Added diagnostics', diff.added);
-        appendDiagnosticDiffList(lines, 'Removed diagnostics', diff.removed);
-    }
-
-    return `${lines.join('\n')}\n`;
-}
-
-function appendDiagnosticDiffList(lines: string[], heading: string, diagnostics: readonly string[]): void {
-    lines.push(`#### ${heading}`, '');
-
-    if (diagnostics.length === 0) {
-        lines.push('None.', '');
-        return;
-    }
-
-    for (const diagnostic of diagnostics) {
-        lines.push(`- ${diagnostic}`);
-    }
-
-    lines.push('');
-}
-
-function createPyrightExecutionError(
-    projectName: string,
-    invocation: { command: string; args: string[] },
-    status: number | null,
-    stdout: string | undefined,
-    stderr: string | undefined,
-    cause?: unknown
-): Error {
-    const stdoutPrefix = (stdout ?? '').trim().slice(0, 1000);
-    const stderrOutput = (stderr ?? 
'').trim(); - const details = [ - `Pyright execution for ${projectName} did not produce JSON output.`, - `Command: ${[invocation.command, ...invocation.args].join(' ')}`, - `Exit status: ${status ?? 'unknown'}`, - ]; - - if (cause instanceof Error) { - details.push(`JSON parse error: ${cause.message}`); - } - - if (stderrOutput.length > 0) { - details.push(`stderr:\n${stderrOutput}`); - } - - if (stdoutPrefix.length > 0) { - details.push(`stdout prefix:\n${stdoutPrefix}`); - } - - return new Error(details.join('\n')); -} - -export function buildPyrightInvocation( - executableCommand: string, - project: GeneratedEcosystemProject, - pyrightConfigPath?: string -): { command: string; args: string[] } { - const template = project.pyrightCommand ?? '{pyright} {paths}'; - const projectPaths = project.paths && project.paths.length > 0 ? project.paths : ['.']; - const tokens = tokenizeCommandTemplate(template); - const executableTokens = getExecutableCommandTokens(executableCommand); - if (executableTokens.length === 0) { - throw new Error('The Pyright executable command cannot be empty.'); - } - - const executableArgs = executableTokens.slice(1); - const pyrightArgs: string[] = []; - let command = executableTokens[0]; - let insertedExecutable = false; - - for (const token of tokens) { - if (token === '{pyright}') { - command = executableTokens[0]; - insertedExecutable = true; - continue; - } - - if (token === '{paths}') { - if (pyrightConfigPath) { - continue; - } - - pyrightArgs.push(...projectPaths); - continue; - } - - pyrightArgs.push(token); - } - - if (!pyrightArgs.includes('--outputjson')) { - pyrightArgs.push('--outputjson'); - } - - if (pyrightConfigPath && !pyrightArgs.includes('-p') && !pyrightArgs.includes('--project')) { - pyrightArgs.push('-p', pyrightConfigPath); - } - - const args = [...executableArgs]; - if (requiresNodeArgumentSeparator(command, executableArgs, pyrightArgs)) { - args.push('--'); - } - - args.push(...pyrightArgs); - - return { command, 
args };
-}
-
-export function writeProjectPyrightConfig(workingDirectory: string, project: GeneratedEcosystemProject): string {
-    const configDirectory = path.join(workingDirectory, '.pyright-benchmark');
-    fs.mkdirSync(configDirectory, { recursive: true });
-
-    const configPath = path.join(configDirectory, 'pyrightconfig.json');
-    const sourcePaths = selectProjectSourcePaths(project).map((entry) =>
-        getConfigRelativePath(configDirectory, path.resolve(workingDirectory, entry))
-    );
-    const projectConfigPath = path.join(workingDirectory, 'pyrightconfig.json');
-    const projectPyrightSettings = fs.existsSync(projectConfigPath)
-        ? {}
-        : readPyprojectPyrightSettings(workingDirectory, configDirectory);
-    const config: ProjectPyrightConfigFile = {
-        ...projectPyrightSettings,
-        extends: fs.existsSync(projectConfigPath)
-            ? getConfigRelativePath(configDirectory, projectConfigPath)
-            : undefined,
-        include: sourcePaths,
-        exclude: ['../**/test', '../**/tests', '../**/testing', '../**/test_*', '../**/*_test.py', '../**/*_tests.py'],
-    };
-
-    fs.writeFileSync(configPath, JSON.stringify(config, undefined, 2), 'utf-8');
-    return configPath;
-}
-
-function readPyprojectPyrightSettings(workingDirectory: string, configDirectory: string): Record<string, unknown> {
-    const pyprojectPath = path.join(workingDirectory, 'pyproject.toml');
-    if (!fs.existsSync(pyprojectPath)) {
-        return {};
-    }
-
-    const parsed = parse(fs.readFileSync(pyprojectPath, 'utf-8')) as { tool?: { pyright?: Record<string, unknown> } };
-    const pyrightSettings = parsed.tool?.pyright;
-    if (!pyrightSettings) {
-        return {};
-    }
-
-    const copiedSettings: Record<string, unknown> = {};
-    for (const [key, value] of Object.entries(pyrightSettings)) {
-        if (benchmarkOwnedConfigKeys.has(key)) {
-            continue;
-        }
-
-        copiedSettings[key] = rebasePyprojectConfigValue(key, value, workingDirectory, configDirectory);
-    }
-
-    return copiedSettings;
-}
-
-function rebasePyprojectConfigValue(
-    key: string,
-    value: unknown,
-    workingDirectory: string,
-    configDirectory: string
-): unknown { - if (pyrightPathArrayConfigKeys.has(key) && Array.isArray(value)) { - return value.map((entry) => - typeof entry === 'string' - ? getConfigRelativePath(configDirectory, path.resolve(workingDirectory, entry)) - : entry - ); - } - - if (pyrightPathStringConfigKeys.has(key) && typeof value === 'string') { - return getConfigRelativePath(configDirectory, path.resolve(workingDirectory, value)); - } - - return value; -} - -function tokenizeCommandTemplate(template: string): string[] { - return Array.from(template.matchAll(/"([^"]*)"|'([^']*)'|\S+/g)).map((match) => match[1] ?? match[2] ?? match[0]); -} - -function getExecutableCommandTokens(executableCommand: string): string[] { - return fs.existsSync(executableCommand) ? [executableCommand] : tokenizeCommandTemplate(executableCommand); -} - -function resolvePyrightInvocationPaths( - invocation: { command: string; args: string[] }, - baseDirectory: string -): { command: string; args: string[] } { - const command = resolveExistingPath(baseDirectory, invocation.command); - const args = [...invocation.args]; - const commandName = path.basename(command).toLowerCase(); - - if ((commandName === 'node' || commandName === 'node.exe') && args.length > 0) { - const firstArg = args[0]; - if (firstArg !== '-e' && firstArg !== '--eval' && firstArg !== '--') { - args[0] = resolveExistingPath(baseDirectory, firstArg); - } - } - - return { command, args }; -} - -function requiresNodeArgumentSeparator(command: string, executableArgs: string[], pyrightArgs: string[]): boolean { - if (pyrightArgs.length === 0) { - return false; - } - - const commandName = path.basename(command).toLowerCase(); - if (commandName !== 'node' && commandName !== 'node.exe') { - return false; - } - - return executableArgs.includes('-e') || executableArgs.includes('--eval'); -} - -function selectProjectSourcePaths(project: GeneratedEcosystemProject): string[] { - const configuredPaths = project.paths && project.paths.length > 0 ? 
project.paths : ['.']; - const sourcePaths = configuredPaths.filter((entry) => !isTestLikePath(entry)); - - return sourcePaths.length > 0 ? sourcePaths : configuredPaths; -} - -function isTestLikePath(entry: string): boolean { - return /(^|[\\/])(test|tests|testing|testdata)([\\/]|$)/i.test(entry); -} - -function getConfigRelativePath(fromDirectory: string, targetPath: string): string { - const relativePath = path.relative(fromDirectory, targetPath); - return relativePath.length > 0 ? relativePath.replace(/\\/g, '/') : '.'; -} - -function resolveExistingPath(baseDirectory: string, entry: string): string { - if (path.isAbsolute(entry)) { - return entry; - } - - const resolvedPath = path.resolve(baseDirectory, entry); - return fs.existsSync(resolvedPath) ? resolvedPath : entry; -} - -function getWritableBenchmarkFilePath(...pathParts: string[]): string { - const sourceFilePath = path.resolve(__dirname, ...pathParts); - if (!sourceFilePath.includes(`${path.sep}out${path.sep}`)) { - return sourceFilePath; - } - - return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', ...pathParts); -} - -function writeNamedBenchmarkReport( - outputDir: string, - fileName: string, - report: BenchmarkReport -): string { - const outputPath = path.join(outputDir, fileName); - fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8'); - return outputPath; -} - -if (require.main === module) { - runEcosystemBenchmark(process.argv.slice(2)); -} diff --git a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.test.ts b/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.test.ts deleted file mode 100644 index 92f26f6f8558..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.test.ts +++ /dev/null @@ -1,223 +0,0 @@ -/* - * syncMypyPrimerProjects.test.ts - * Copyright (c) Microsoft Corporation. - * - * Tests for the mypy_primer project sync scaffold. 
- */ - -import * as fs from 'fs'; -import * as os from 'os'; -import * as path from 'path'; - -import { - getBenchmarkSourceDirectory, - getDefaultMypyPrimerProjectSourcePath, - parseMypyPrimerProjectSource, - syncMypyPrimerProjects, - writeGeneratedEcosystemProjects, -} from './syncMypyPrimerProjects'; - -const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS'; - -const benchmarkSuite = process.env[RUN_BENCHMARKS_ENV] === '1' ? describe : describe.skip; - -benchmarkSuite('Sync Mypy Primer Projects', () => { - test('parses project blocks from mypy_primer source', () => { - const projects = parseMypyPrimerProjectSource( - [ - 'Project(', - ' location="https://github.com/psf/black",', - ' pyright_cmd="{pyright} {paths}",', - ' paths=["src"],', - ')', - '', - 'Project(', - ' location="https://github.com/pydantic/pydantic",', - ' pyright_cmd="{pyright} {paths}",', - ' paths=["pydantic", "tests"],', - ')', - ].join('\n'), - 'projects.py' - ); - - expect(projects).toEqual([ - { - name: 'black', - mypyPrimerProject: 'black', - source: { kind: 'mypy-primer', inputFile: 'projects.py' }, - location: 'https://github.com/psf/black', - pyrightCommand: '{pyright} {paths}', - paths: ['src'], - }, - { - name: 'pydantic', - mypyPrimerProject: 'pydantic', - source: { kind: 'mypy-primer', inputFile: 'projects.py' }, - location: 'https://github.com/pydantic/pydantic', - pyrightCommand: '{pyright} {paths}', - paths: ['pydantic', 'tests'], - }, - ]); - }); - - test('parses upstream metadata and filters pyright-disabled projects', () => { - const projects = parseMypyPrimerProjectSource( - [ - 'Project(', - ' location="https://github.com/example/project",', - ' name_override="example-project",', - ' pyright_cmd="{pyright} {paths}",', - ' paths=["src"],', - ' deps=["types-requests"],', - ' install_cmd="python -m pip install -e .",', - ' platforms=["linux", "darwin"],', - ' cost=2.5,', - ')', - '', - 'Project(', - ' location="https://github.com/example/project",', - ' pyright_cmd="{pyright} 
{paths}",', - ' paths=["lib"],', - ')', - '', - 'Project(', - ' location="https://github.com/example/skip-me",', - ' pyright_cmd=None,', - ')', - ].join('\n'), - 'projects.py' - ); - - expect(projects).toEqual([ - { - name: 'example-project', - mypyPrimerProject: 'project', - source: { kind: 'mypy-primer', inputFile: 'projects.py' }, - location: 'https://github.com/example/project', - pyrightCommand: '{pyright} {paths}', - paths: ['src'], - dependencies: ['types-requests'], - installCommand: 'python -m pip install -e .', - supportedPlatforms: ['linux', 'darwin'], - cost: 2.5, - }, - { - name: 'project', - mypyPrimerProject: 'project', - source: { kind: 'mypy-primer', inputFile: 'projects.py' }, - location: 'https://github.com/example/project', - pyrightCommand: '{pyright} {paths}', - paths: ['lib'], - }, - ]); - }); - - test('deduplicates project names derived from repeated locations', () => { - const projects = parseMypyPrimerProjectSource( - [ - 'Project(location="https://github.com/example/project", pyright_cmd="{pyright}")', - 'Project(location="https://github.com/example/project", pyright_cmd="{pyright}")', - ].join('\n'), - 'projects.py' - ); - - expect(projects.map((project) => project.name)).toEqual(['project', 'project-2']); - }); - - test('writes generated ecosystem projects', () => { - const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-mypy-primer-sync-')); - const outputPath = path.join(outputDir, 'ecosystem-projects.generated.json'); - - try { - writeGeneratedEcosystemProjects(outputPath, [ - { - name: 'black', - mypyPrimerProject: 'black', - source: { kind: 'manual-snapshot' }, - }, - ]); - - expect(JSON.parse(fs.readFileSync(outputPath, 'utf-8'))).toEqual([ - { - name: 'black', - mypyPrimerProject: 'black', - source: { kind: 'manual-snapshot' }, - }, - ]); - } finally { - fs.rmSync(outputDir, { force: true, recursive: true }); - } - }); - - test('syncs project definitions from an input file', () => { - const tempDir = 
fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-mypy-primer-cli-')); - const inputPath = path.join(tempDir, 'projects.py'); - const outputPath = path.join(tempDir, 'ecosystem-projects.generated.json'); - - try { - fs.writeFileSync( - inputPath, - [ - 'Project(', - ' location="https://github.com/psf/black",', - ' pyright_cmd="{pyright} {paths}",', - ' paths=["src"],', - ')', - ].join('\n'), - 'utf-8' - ); - - const writtenPath = syncMypyPrimerProjects(['--input', inputPath, '--output', outputPath]); - - expect(writtenPath).toBe(outputPath); - expect(JSON.parse(fs.readFileSync(outputPath, 'utf-8'))[0].name).toBe('black'); - expect(path.isAbsolute(JSON.parse(fs.readFileSync(outputPath, 'utf-8'))[0].source.inputFile)).toBe(false); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); - - test('stores checked-in snapshot input paths relative to the benchmark source directory', () => { - const projects = parseMypyPrimerProjectSource( - [ - 'Project(', - ' location="https://github.com/psf/black",', - ' pyright_cmd="{pyright} {paths}",', - ' paths=["src"],', - ')', - ].join('\n'), - path.join(getBenchmarkSourceDirectory(), 'mypy_primer.smoke_projects.snapshot.py') - ); - - expect(projects[0].source.inputFile).toBe('mypy_primer.smoke_projects.snapshot.py'); - }); - - test('defaults to the checked-in smoke snapshot and creates the output directory', () => { - const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pyright-mypy-primer-default-')); - const outputPath = path.join(tempDir, 'nested', 'ecosystem-projects.generated.json'); - - try { - const writtenPath = syncMypyPrimerProjects(['--output', outputPath]); - const projects = JSON.parse(fs.readFileSync(outputPath, 'utf-8')); - - expect(writtenPath).toBe(outputPath); - expect(fs.existsSync(getDefaultMypyPrimerProjectSourcePath())).toBe(true); - expect(projects).toHaveLength(10); - expect( - projects.every( - (project: { source: { inputFile?: string } }) => 
!path.isAbsolute(project.source.inputFile ?? '') - ) - ).toBe(true); - expect(projects.find((project: { name: string }) => project.name === 'black')).toMatchObject({ - pyrightCommand: '{pyright} {paths}', - paths: ['src'], - }); - expect(projects.find((project: { name: string }) => project.name === 'django-modern-rest')).toMatchObject({ - pyrightCommand: '{pyright}', - paths: ['dmr'], - }); - } finally { - fs.rmSync(tempDir, { force: true, recursive: true }); - } - }); -}); diff --git a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.ts b/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.ts deleted file mode 100644 index 25b9caf09f6b..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/syncMypyPrimerProjects.ts +++ /dev/null @@ -1,231 +0,0 @@ -import commandLineArgs, { CommandLineOptions, OptionDefinition } from 'command-line-args'; -import * as fs from 'fs'; -import * as path from 'path'; - -export interface GeneratedEcosystemProject { - name: string; - mypyPrimerProject: string; - source: { - kind: 'manual-snapshot' | 'mypy-primer'; - inputFile?: string; - }; - location?: string; - pyrightCommand?: string; - paths?: string[]; - dependencies?: string[]; - installCommand?: string; - supportedPlatforms?: string[]; - cost?: number; -} - -const optionDefinitions: OptionDefinition[] = [ - { name: 'input', type: String }, - { name: 'output', type: String }, -]; - -const defaultMypyPrimerProjectSourcePath = getBenchmarkFilePath('mypy_primer.smoke_projects.snapshot.py'); - -export function parseMypyPrimerProjectSource(sourceText: string, inputFile?: string): GeneratedEcosystemProject[] { - const blocks = extractProjectBlocks(sourceText); - - return ensureUniqueProjectNames( - blocks.flatMap((block) => { - const project = parseProjectBlock(block, inputFile); - return project ? 
[project] : []; - }) - ).sort((left, right) => left.name.localeCompare(right.name)); -} - -export function writeGeneratedEcosystemProjects( - outputPath: string, - projects: readonly GeneratedEcosystemProject[] -): void { - fs.mkdirSync(path.dirname(outputPath), { recursive: true }); - fs.writeFileSync(outputPath, `${JSON.stringify(projects, undefined, 2)}\n`, 'utf-8'); -} - -export function syncMypyPrimerProjects(args: string[]): string { - const parsedArgs = commandLineArgs(optionDefinitions, { argv: args }) as CommandLineOptions; - const inputPath = (parsedArgs.input as string | undefined) ?? defaultMypyPrimerProjectSourcePath; - const outputPath = - (parsedArgs.output as string | undefined) ?? getWritableBenchmarkFilePath('ecosystem-projects.generated.json'); - - const sourceText = fs.readFileSync(inputPath, 'utf-8'); - const projects = parseMypyPrimerProjectSource(sourceText, inputPath); - writeGeneratedEcosystemProjects(outputPath, projects); - console.log(`Wrote ${projects.length} ecosystem project definitions to ${outputPath}`); - - return outputPath; -} - -export function getBenchmarkSourceDirectory(): string { - return path.dirname(getWritableBenchmarkFilePath('ecosystem-projects.generated.json')); -} - -export function getDefaultMypyPrimerProjectSourcePath(): string { - return defaultMypyPrimerProjectSourcePath; -} - -function getBenchmarkFilePath(filename: string): string { - const sourceFilePath = path.resolve(__dirname, filename); - if (fs.existsSync(sourceFilePath)) { - return sourceFilePath; - } - - return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', filename); -} - -function getWritableBenchmarkFilePath(filename: string): string { - const sourceFilePath = path.resolve(__dirname, filename); - if (!sourceFilePath.includes(`${path.sep}out${path.sep}`)) { - return sourceFilePath; - } - - return path.resolve(__dirname, '..', '..', '..', '..', '..', '..', 'src', 'tests', 'benchmarks', filename); -} - -function 
extractProjectBlocks(sourceText: string): string[] { - const blocks: string[] = []; - let startIndex = sourceText.indexOf('Project('); - - while (startIndex >= 0) { - let depth = 0; - let inString = false; - let stringQuote = ''; - let previousChar = ''; - - for (let index = startIndex; index < sourceText.length; index++) { - const currentChar = sourceText[index]; - - if (inString) { - if (currentChar === stringQuote && previousChar !== '\\') { - inString = false; - stringQuote = ''; - } - } else if (currentChar === '"' || currentChar === "'") { - inString = true; - stringQuote = currentChar; - } else if (currentChar === '(') { - depth += 1; - } else if (currentChar === ')') { - depth -= 1; - if (depth === 0) { - blocks.push(sourceText.slice(startIndex, index + 1)); - startIndex = sourceText.indexOf('Project(', index + 1); - break; - } - } - - previousChar = currentChar; - } - - if (depth !== 0) { - throw new Error('Failed to parse mypy_primer project definitions.'); - } - } - - return blocks; -} - -function parseProjectBlock(block: string, inputFile?: string): GeneratedEcosystemProject | undefined { - const location = matchSingleQuotedOrDoubleQuotedValue(block, 'location'); - if (matchNoneValue(block, 'pyright_cmd')) { - return undefined; - } - - const pyrightCommand = matchSingleQuotedOrDoubleQuotedValue(block, 'pyright_cmd'); - const paths = matchStringArrayValue(block, 'paths'); - const dependencies = matchStringArrayValue(block, 'deps'); - const installCommand = matchSingleQuotedOrDoubleQuotedValue(block, 'install_cmd'); - const supportedPlatforms = matchStringArrayValue(block, 'platforms'); - const cost = matchNumberValue(block, 'cost'); - const nameOverride = matchSingleQuotedOrDoubleQuotedValue(block, 'name_override'); - const mypyPrimerProject = deriveProjectName(location); - const normalizedInputFile = inputFile ? normalizeInputFileReference(inputFile) : undefined; - - return { - name: nameOverride ?? 
mypyPrimerProject, - mypyPrimerProject, - source: { - kind: 'mypy-primer', - inputFile: normalizedInputFile, - }, - location, - pyrightCommand, - paths, - dependencies, - installCommand, - supportedPlatforms, - cost, - }; -} - -function ensureUniqueProjectNames(projects: readonly GeneratedEcosystemProject[]): GeneratedEcosystemProject[] { - const nameCounts = new Map<string, number>(); - - return projects.map((project) => { - const count = (nameCounts.get(project.name) ?? 0) + 1; - nameCounts.set(project.name, count); - - if (count === 1) { - return project; - } - - return { ...project, name: `${project.name}-${count}` }; - }); -} - -function normalizeInputFileReference(inputFile: string): string { - if (!path.isAbsolute(inputFile)) { - return inputFile.replace(/\\/g, '/'); - } - - const benchmarkRelativePath = path.relative(getBenchmarkSourceDirectory(), inputFile); - if (!benchmarkRelativePath.startsWith('..') && !path.isAbsolute(benchmarkRelativePath)) { - return benchmarkRelativePath.replace(/\\/g, '/'); - } - - const cwdRelativePath = path.relative(process.cwd(), inputFile); - if (!cwdRelativePath.startsWith('..') && !path.isAbsolute(cwdRelativePath)) { - return cwdRelativePath.replace(/\\/g, '/'); - } - - return path.basename(inputFile); -} - -function deriveProjectName(location: string | undefined): string { - if (!location) { - throw new Error('Each mypy_primer project must define a location.'); - } - - const trimmedLocation = location.replace(/\/+$/, ''); - const slashIndex = trimmedLocation.lastIndexOf('/'); - return slashIndex >= 0 ? 
trimmedLocation.slice(slashIndex + 1) : trimmedLocation; -} - -function matchSingleQuotedOrDoubleQuotedValue(block: string, fieldName: string): string | undefined { - const match = block.match(new RegExp(`${fieldName}\\s*=\\s*(['\"])(.*?)\\1`, 's')); - return match?.[2]; -} - -function matchNoneValue(block: string, fieldName: string): boolean { - return new RegExp(`${fieldName}\\s*=\\s*None(,|\\s|\\))`, 's').test(block); -} - -function matchNumberValue(block: string, fieldName: string): number | undefined { - const match = block.match(new RegExp(`${fieldName}\\s*=\\s*(\\d+(?:\\.\\d+)?)`, 's')); - return match ? Number(match[1]) : undefined; -} - -function matchStringArrayValue(block: string, fieldName: string): string[] | undefined { - const match = block.match(new RegExp(`${fieldName}\\s*=\\s*\\[(.*?)\\]`, 's')); - if (!match) { - return undefined; - } - - return Array.from(match[1].matchAll(/(['\"])(.*?)\1/g)).map((entry) => entry[2]); -} - -if (require.main === module) { - syncMypyPrimerProjects(process.argv.slice(2)); -} diff --git a/packages/pyright-internal/src/tests/benchmarks/syntheticCases.ts b/packages/pyright-internal/src/tests/benchmarks/syntheticCases.ts deleted file mode 100644 index 2a581bd65de0..000000000000 --- a/packages/pyright-internal/src/tests/benchmarks/syntheticCases.ts +++ /dev/null @@ -1,169 +0,0 @@ -export function generateRecursiveAliasCase(depth: number): string { - const lines = ['from typing import TypeAlias', '', 'Alias0: TypeAlias = int', 'value0: Alias0 = 1']; - - for (let i = 1; i <= depth; i++) { - lines.push(`Alias${i}: TypeAlias = list[Alias${i - 1}]`); - lines.push(`value${i}: Alias${i} = [value${i - 1}]`); - } - - lines.push(''); - lines.push(`def use_alias(value: Alias${depth}) -> Alias${depth}:`); - lines.push(' return value'); - lines.push(''); - lines.push(`result = use_alias(value${depth})`); - - return `${lines.join('\n')}\n`; -} - -export function generateOverloadUnionCrossProductCase(width: number): string { - const 
lines = ['from typing import Literal, overload', '', '']; - - for (let left = 0; left < width; left++) { - for (let right = 0; right < width; right++) { - lines.push('@overload'); - lines.push( - `def combine(left: Literal[${left}], right: Literal[${right}]) -> Literal[${left + right}]: ...` - ); - } - } - - lines.push('def combine(left: int, right: int) -> int:'); - lines.push(' return left + right'); - lines.push(''); - - const union = Array.from({ length: width }, (_, index) => `Literal[${index}]`).join(' | '); - lines.push(`def use(left: ${union}, right: ${union}) -> int:`); - lines.push(' return combine(left, right)'); - lines.push(''); - lines.push('result = use(0, 1)'); - - return `${lines.join('\n')}\n`; -} - -export function generateProtocolMismatchCase(memberCount: number): string { - const lines = ['from typing import Protocol', '', 'class Expected(Protocol):']; - - for (let i = 0; i < memberCount; i++) { - lines.push(` def member_${i}(self) -> int: ...`); - } - - lines.push(''); - lines.push('class Candidate:'); - - for (let i = 0; i < memberCount - 1; i++) { - lines.push(` def member_${i}(self) -> int:`); - lines.push(` return ${i}`); - } - - lines.push(''); - lines.push('def consume(value: Expected) -> None:'); - lines.push(' pass'); - lines.push(''); - lines.push('consume(Candidate())'); - - return `${lines.join('\n')}\n`; -} - -export function generateGenericAliasChainCase(depth: number): string { - const lines = [ - 'from typing import Generic, TypeAlias, TypeVar', - '', - 'T = TypeVar("T")', - '', - 'class Box(Generic[T]):', - ' def __init__(self, value: T) -> None:', - ' self.value = value', - '', - 'Alias0: TypeAlias = int', - 'value0: Alias0 = 1', - ]; - - for (let i = 1; i <= depth; i++) { - lines.push(`Alias${i}: TypeAlias = Box[Alias${i - 1}]`); - lines.push(`value${i}: Alias${i} = Box(value${i - 1})`); - } - - lines.push(''); - lines.push(`def unwrap(value${depth}: Alias${depth}) -> Alias0:`); - - for (let i = depth; i > 0; i--) { - 
lines.push(` value${i - 1} = value${i}.value`); - } - - lines.push(' return value0'); - lines.push(''); - lines.push(`result = unwrap(value${depth})`); - - return `${lines.join('\n')}\n`; -} - -export function generateConstrainedTypeVarMatrixCase(width: number): string { - const classNames = Array.from({ length: width }, (_, index) => `Item${index}`); - const lines = ['from typing import TypeVar', '']; - - for (const className of classNames) { - lines.push(`class ${className}:`); - lines.push(' pass'); - lines.push(''); - } - - lines.push(`TItem = TypeVar("TItem", ${classNames.join(', ')})`); - lines.push(''); - lines.push('def choose(left: TItem, right: TItem) -> TItem:'); - lines.push(' return left'); - lines.push(''); - - for (let left = 0; left < width; left++) { - for (let right = 0; right < width; right++) { - lines.push(`value_${left}_${right} = choose(Item${left}(), Item${right}())`); - } - } - - return `${lines.join('\n')}\n`; -} - -export function generateLiteralUnionMathCase(width: number): string { - const literals = Array.from({ length: width }, (_, index) => `Literal[${index}]`); - const lines = ['from typing import Literal', '', `Value = ${literals.join(' | ')}`, '']; - - lines.push('def bump(value: Value) -> int:'); - - for (let i = 0; i < width - 1; i++) { - const prefix = i === 0 ? 
'if' : 'elif'; - lines.push(` ${prefix} value == ${i}:`); - lines.push(` return value + ${i}`); - } - - lines.push(' return value'); - lines.push(''); - lines.push('def combine(left: Value, right: Value) -> int:'); - lines.push(' return bump(left) + bump(right)'); - lines.push(''); - lines.push('result = combine(0, 1)'); - - return `${lines.join('\n')}\n`; -} - -export function generateTypedDictCase(keyCount: number): string { - const lines = ['from typing import TypedDict', '', 'class Payload(TypedDict):']; - - for (let i = 0; i < keyCount; i++) { - lines.push(` key_${i}: int`); - } - - lines.push(''); - lines.push('payload: Payload = {'); - - for (let i = 0; i < keyCount; i++) { - lines.push(` "key_${i}": ${i},`); - } - - lines.push('}'); - lines.push(''); - lines.push('def total(value: Payload) -> int:'); - lines.push(' return ' + Array.from({ length: keyCount }, (_, index) => `value["key_${index}"]`).join(' + ')); - lines.push(''); - lines.push('result = total(payload)'); - - return `${lines.join('\n')}\n`; -} diff --git a/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts b/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts index d2ffb9e6a5e3..48c1521badfb 100644 --- a/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts +++ b/packages/pyright-internal/src/tests/benchmarks/tokenizerBenchmark.test.ts @@ -13,21 +13,20 @@ * src/tests/benchmarks/.generated/benchmark-results/tokenizer/ */ +import { execFileSync } from 'child_process'; +import * as fs from 'fs'; +import * as os from 'os'; +import * as path from 'path'; + import { Tokenizer } from '../../parser/tokenizer'; -import { - calculateStats, - createBenchmarkReport, - formatCount, - loadBenchmarkCorpus, - runJestBenchmarkInFreshProcess, - writeBenchmarkReport, -} from './benchmarkUtils'; // --- Configuration --- const WARMUP_ITERATIONS = 3; const BENCHMARK_ITERATIONS = 10; +const BENCHMARK_OUTPUT_DIR = path.join(__dirname, '.generated', 
'benchmark-results', 'tokenizer'); +const JEST_BIN_PATH = path.resolve(__dirname, '..', '..', '..', 'node_modules', 'jest', 'bin', 'jest.js'); const CHILD_RESULT_PREFIX = '__TOKENIZER_BENCHMARK_RESULT__'; const CHILD_MODE_ENV = 'PYRIGHT_TOKENIZER_BENCH_CHILD'; const RUN_BENCHMARKS_ENV = 'PYRIGHT_RUN_BENCHMARKS'; @@ -48,8 +47,70 @@ interface BenchmarkResult { tokensPerSec: number; } +interface BenchmarkReport { + timestamp: string; + system: { + platform: string; + arch: string; + cpus: string; + cpuCount: number; + totalMemoryMB: number; + nodeVersion: string; + }; + config: { + warmupIterations: number; + benchmarkIterations: number; + }; + results: BenchmarkResult[]; +} + // --- Helpers --- +function calculateStats(times: ReadonlyArray<number>): { + median: number; + p95: number; + min: number; + max: number; + avg: number; +} { + const sorted = [...times].sort((a, b) => a - b); + const len = sorted.length; + + const median = len % 2 === 0 ? (sorted[len / 2 - 1] + sorted[len / 2]) / 2 : sorted[Math.floor(len / 2)]; + const p95Index = Math.ceil(len * 0.95) - 1; + const p95 = sorted[Math.min(p95Index, len - 1)]; + const min = sorted[0]; + const max = sorted[len - 1]; + const avg = times.reduce((a, b) => a + b, 0) / len; + + return { median, p95, min, max, avg }; +} + +function loadCorpus(filename: string): string { + const filePath = path.resolve(__dirname, '..', 'benchmarkData', filename); + return fs.readFileSync(filePath, 'utf-8'); +} + +function getSystemInfo(): BenchmarkReport['system'] { + const cpus = os.cpus(); + return { + platform: os.platform(), + arch: os.arch(), + cpus: cpus[0]?.model ?? 
'unknown', + cpuCount: cpus.length, + totalMemoryMB: Math.round(os.totalmem() / (1024 * 1024)), + nodeVersion: process.version, + }; +} + +function writeReport(report: BenchmarkReport): void { + fs.mkdirSync(BENCHMARK_OUTPUT_DIR, { recursive: true }); + const filename = `tokenizer-benchmark-${new Date().toISOString().replace(/[:.]/g, '-')}.json`; + const outputPath = path.join(BENCHMARK_OUTPUT_DIR, filename); + fs.writeFileSync(outputPath, JSON.stringify(report, undefined, 2), 'utf-8'); + console.log(`\nBenchmark results written to: ${outputPath}`); +} + function printResultTable(results: ReadonlyArray<BenchmarkResult>): void { console.log('\n=== Tokenizer Benchmark Results ===\n'); console.log( @@ -68,7 +129,7 @@ function printResultTable(results: ReadonlyArray<BenchmarkResult>): void { .toFixed(2) .padStart(10)} ${result.avgMs.toFixed(2).padStart(10)} ${result.p95Ms .toFixed(2) - .padStart(10)} ${formatCount(result.tokensPerSec).padStart(12)}` + .padStart(10)} ${Math.round(result.tokensPerSec).toLocaleString().padStart(12)}` ); } console.log(''); @@ -78,14 +139,55 @@ function emitChildResult(result: BenchmarkResult): void { process.stdout.write(`${CHILD_RESULT_PREFIX}${JSON.stringify(result)}\n`); } +function getChildOutput(error: unknown): string { + if (!(error instanceof Error)) { + return ''; + } + + const stdout = 'stdout' in error && typeof error.stdout === 'string' ? error.stdout : ''; + const stderr = 'stderr' in error && typeof error.stderr === 'string' ? 
error.stderr : ''; + return [stdout, stderr].filter((part) => part.length > 0).join('\n'); +} + +function escapeRegExp(text: string): string { + return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); +} + function runBenchmarkInFreshProcess(testName: string): BenchmarkResult { - return runJestBenchmarkInFreshProcess( - __filename, - 'Tokenizer Benchmark', - testName, - CHILD_RESULT_PREFIX, - CHILD_MODE_ENV - ); + try { + const output = execFileSync( + process.execPath, + [ + JEST_BIN_PATH, + __filename, + '--runInBand', + '--forceExit', + '--testTimeout=300000', + '--testNamePattern', + `^Tokenizer Benchmark ${escapeRegExp(testName)}$`, + ], + { + cwd: path.resolve(__dirname, '..', '..', '..'), + encoding: 'utf-8', + env: { + ...process.env, + [CHILD_MODE_ENV]: '1', + }, + } + ); + + const resultLine = output.split(/\r?\n/).find((line) => line.startsWith(CHILD_RESULT_PREFIX)); + + if (!resultLine) { + throw new Error(`Child benchmark for "${testName}" did not emit a result.\n${output}`); + } + + return JSON.parse(resultLine.slice(CHILD_RESULT_PREFIX.length)) as BenchmarkResult; + } catch (error) { + const output = getChildOutput(error); + const message = error instanceof Error ? error.message : String(error); + throw new Error(`Child benchmark for "${testName}" failed.\n${message}${output ? `\n${output}` : ''}`); + } } function benchmarkTokenize(corpusName: string, code: string): BenchmarkResult { @@ -148,7 +250,7 @@ benchmarkSuite('Tokenizer Benchmark', () => { for (const { name, file } of corpora) { test(`tokenize ${name}`, () => { const result = isChildProcess - ? benchmarkTokenize(name, loadBenchmarkCorpus(file)) + ? 
benchmarkTokenize(name, loadCorpus(file)) : runBenchmarkInFreshProcess(`tokenize ${name}`); if (!isChildProcess) { @@ -156,9 +258,9 @@ benchmarkSuite('Tokenizer Benchmark', () => { } console.log( - ` ${name}: median=${result.medianMs.toFixed(2)}ms, tokens=${result.tokenCount}, tok/sec=${formatCount( + ` ${name}: median=${result.medianMs.toFixed(2)}ms, tokens=${result.tokenCount}, tok/sec=${Math.round( result.tokensPerSec - )}` + ).toLocaleString()}` ); if (isChildProcess) { @@ -172,7 +274,7 @@ benchmarkSuite('Tokenizer Benchmark', () => { test('scaled corpus (10x large_stdlib)', () => { const result = isChildProcess - ? benchmarkTokenize('large_stdlib_10x', Array(10).fill(loadBenchmarkCorpus('large_stdlib.py')).join('\n')) + ? benchmarkTokenize('large_stdlib_10x', Array(10).fill(loadCorpus('large_stdlib.py')).join('\n')) : runBenchmarkInFreshProcess('scaled corpus (10x large_stdlib)'); if (!isChildProcess) { @@ -182,7 +284,7 @@ benchmarkSuite('Tokenizer Benchmark', () => { console.log( ` large_stdlib_10x: median=${result.medianMs.toFixed(2)}ms, tokens=${ result.tokenCount - }, tok/sec=${formatCount(result.tokensPerSec)}` + }, tok/sec=${Math.round(result.tokensPerSec).toLocaleString()}` ); if (isChildProcess) { @@ -199,10 +301,16 @@ benchmarkSuite('Tokenizer Benchmark', () => { printResultTable(allResults); - writeBenchmarkReport( - 'tokenizer', - 'tokenizer-benchmark', - createBenchmarkReport('tokenizer', WARMUP_ITERATIONS, BENCHMARK_ITERATIONS, allResults) - ); + const report: BenchmarkReport = { + timestamp: new Date().toISOString(), + system: getSystemInfo(), + config: { + warmupIterations: WARMUP_ITERATIONS, + benchmarkIterations: BENCHMARK_ITERATIONS, + }, + results: allResults, + }; + + writeReport(report); }); });
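
Note on the inlined `calculateStats` helper: it computes the median by averaging the two middle samples for even-length input and takes p95 by the nearest-rank rule (`Math.ceil(len * 0.95) - 1`). The sketch below restates that logic in a standalone form so its behavior at `BENCHMARK_ITERATIONS = 10` can be checked. The name `summarizeSamples` and the empty-input guard are illustrative assumptions and are not part of the diff.

```typescript
// Hypothetical standalone restatement of the diff's calculateStats logic.
// The name `summarizeSamples` and the empty-input check are assumptions.
interface SampleStats {
    median: number;
    p95: number;
    min: number;
    max: number;
    avg: number;
}

function summarizeSamples(times: ReadonlyArray<number>): SampleStats {
    if (times.length === 0) {
        // The diff's helper does not guard this case; added here for safety.
        throw new Error('At least one timing sample is required.');
    }

    // Copy before sorting so the caller's array is not mutated.
    const sorted = [...times].sort((a, b) => a - b);
    const len = sorted.length;

    // Even length: mean of the two middle samples. Odd length: the middle sample.
    const median = len % 2 === 0 ? (sorted[len / 2 - 1] + sorted[len / 2]) / 2 : sorted[Math.floor(len / 2)];

    // Nearest-rank percentile, clamped to the last valid index.
    const p95 = sorted[Math.min(Math.ceil(len * 0.95) - 1, len - 1)];

    const avg = times.reduce((a, b) => a + b, 0) / len;

    return { median, p95, min: sorted[0], max: sorted[len - 1], avg };
}

// With ten samples, the nearest-rank p95 index is 9, i.e. the largest sample.
const tenSamples = summarizeSamples([5, 1, 4, 2, 3, 6, 7, 8, 9, 10]);
console.log(tenSamples);
```

With only ten measured iterations, the nearest-rank p95 coincides with the maximum sample, so the `p95Ms` and `maxMs` columns carry the same value unless the iteration count is raised to more than twenty.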