Parallelize pytest --collect-only in split_tests.py #171772

Merged
balloob merged 2 commits into dev from speed-up-split-tests on May 22, 2026
@bdraco (Member) commented May 21, 2026

Proposed change

A cProfile run of the "Split tests for full run" CI step shows that 99.6% of wall time is spent in the single pytest --collect-only subprocess. Fan it out across os.cpu_count() workers using ProcessPoolExecutor; round-robin chunking keeps each batch roughly equal, and tests/components is expanded one level deeper so the ~1000 integration subdirectories distribute evenly. Bucket output is unchanged because we still parse the same pytest -qq output, just aggregated from multiple invocations.

CI step time vs dev (4-core runner):

                         dev      PR        Δ
Run split_tests.py      309s    147s    -162s
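
The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in script/split_tests.py: the helper names `round_robin_chunks`, `collect_batch`, and `collect_in_parallel` are hypothetical, and the exact pytest arguments used by the real script may differ.

```python
import os
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor


def round_robin_chunks(items: list[str], n: int) -> list[list[str]]:
    """Deal items into at most n batches, one item per batch in turn.

    Neighbouring paths land in different batches, so a cluster of large
    directories does not end up in a single worker.
    """
    n = max(1, n)
    chunks = [items[i::n] for i in range(n)]
    return [chunk for chunk in chunks if chunk]  # drop empty batches


def collect_batch(paths: list[str]) -> str:
    """Run one pytest collection over a batch and return its stdout.

    Hypothetical helper: the flags mirror the ones named in the PR text.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "--collect-only", "-qq", *paths],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def collect_in_parallel(paths: list[str]) -> list[str]:
    """Collect every batch in its own worker process, preserving batch order."""
    batches = round_robin_chunks(paths, os.cpu_count() or 1)
    if not batches:
        return []
    with ProcessPoolExecutor(max_workers=len(batches)) as pool:
        return list(pool.map(collect_batch, batches))
```

Slicing with a stride (`items[i::n]`) is one simple way to get round-robin batches whose sizes differ by at most one item.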

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to developer documentation pull request:
  • Link to frontend pull request:

Checklist

  • I understand the code I am submitting and can explain how it works.
  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.
  • Any generated code has been carefully reviewed for correctness and compliance with project standards.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies a diff between library versions and ideally a link to the changelog/release notes is added to the PR description.

To help with the load of incoming pull requests:

cProfile showed 99.6% of split_tests.py wall time was spent in the
single pytest --collect-only subprocess.  Fan out the collection across
``os.cpu_count()`` workers; round-robin chunking keeps each batch
roughly equal, and tests/components is expanded one level deeper so
the ~1000 integration subdirectories distribute evenly.  Local wall
time dropped from ~132s to ~11s on an 18-core box.  Bucket output is
unchanged because we still parse the same pytest -qq output, just
aggregated from multiple invocations.
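
Aggregating several collection outputs might look like the sketch below. It assumes each non-empty line of the pytest -qq output has the form `path/to/test_file.py: N`; the function names are hypothetical and the parsing in the real script may differ.

```python
def parse_collect_output(output: str) -> dict[str, int]:
    """Parse 'path: count' lines from one collection run.

    Lines that do not end in ': <integer>' are skipped.
    """
    counts: dict[str, int] = {}
    for line in output.splitlines():
        path, sep, total = line.rpartition(": ")
        if not sep or not total.strip().isdigit():
            continue
        counts[path] = counts.get(path, 0) + int(total)
    return counts


def merge_outputs(outputs: list[str]) -> dict[str, int]:
    """Combine the per-file counts from several collection runs."""
    merged: dict[str, int] = {}
    for output in outputs:
        for path, total in parse_collect_output(output).items():
            merged[path] = merged.get(path, 0) + total
    return merged
```

Because every batch covers a disjoint set of paths, the merged mapping should contain the same per-file counts that a single invocation would have produced.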
Copilot AI review requested due to automatic review settings May 21, 2026 20:10
@home-assistant bot added the cla-signed, code-quality, and small-pr (PRs with less than 30 lines) labels May 21, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR speeds up the CI “Split tests for full run” step by parallelizing pytest --collect-only in script/split_tests.py, fanning out large directories (notably tests/components) into smaller batches collected concurrently.

Changes:

  • Run pytest --collect-only across multiple batches in parallel via ProcessPoolExecutor.
  • Add path fan-out logic to expand large directories one level deeper (and tests/components to per-integration paths).
  • Aggregate multiple pytest collection outputs back into the existing TestFolder/bucket-splitting logic.
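
The path fan-out in the second bullet could be sketched as below. This is only an illustration under assumptions: the function names and the `deeper` parameter are hypothetical, and the sketch expands only the named directories, whereas the real script also applies its own rules for other large directories.

```python
from pathlib import Path


def expand_one_level(path: Path) -> list[Path]:
    """Return the sorted direct children of a directory, or the path itself."""
    if not path.is_dir():
        return [path]
    return sorted(path.iterdir())


def fan_out(root: Path, deeper: frozenset[str] = frozenset({"components"})) -> list[Path]:
    """List the children of root, replacing selected directories by their children.

    With the default, root/components is replaced by one entry per
    integration subdirectory so the entries can be batched evenly.
    No filtering of non-test files is done here.
    """
    targets: list[Path] = []
    for child in sorted(root.iterdir()):
        if child.is_dir() and child.name in deeper:
            targets.extend(expand_one_level(child))
        else:
            targets.append(child)
    return targets
```

Sorting the entries keeps the resulting list deterministic, which makes the later batching reproducible between runs.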

Comment thread script/split_tests.py Outdated
Comment thread script/split_tests.py
Only pass directories and test_*.py files to pytest --collect-only so
helpers like tests/components/conftest.py and tests/components/common.py
are not treated as explicit collection targets, and bail out with a
clear error if no eligible paths are found instead of running pytest
with no arguments.
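
A filter along those lines might look like this sketch. The names `is_collection_target` and `eligible_paths` and the error text are hypothetical; only the rule itself (directories and test_*.py files, fail when nothing is left) comes from the commit message.

```python
from fnmatch import fnmatch
from pathlib import Path


def is_collection_target(path: Path) -> bool:
    """Accept directories and test_*.py files; reject other files."""
    if path.is_dir():
        return True
    return fnmatch(path.name, "test_*.py")


def eligible_paths(candidates: list[Path]) -> list[Path]:
    """Keep only valid collection targets and fail loudly if none remain.

    Exiting here avoids calling pytest with no path arguments.
    """
    eligible = [path for path in candidates if is_collection_target(path)]
    if not eligible:
        raise SystemExit("No eligible test paths found to collect")
    return eligible
```

With this rule, tests/components/conftest.py and tests/components/common.py are dropped, while integration directories and test modules pass through.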
@bdraco bdraco marked this pull request as ready for review May 21, 2026 20:38
@bdraco bdraco requested a review from a team as a code owner May 21, 2026 20:38
Copilot AI review requested due to automatic review settings May 21, 2026 20:38
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@balloob balloob merged commit deb8b5d into dev May 22, 2026
49 checks passed
@balloob balloob deleted the speed-up-split-tests branch May 22, 2026 02:58
@bdraco (Member, Author) commented May 22, 2026

thanks



3 participants