project_info: opt-in batched reverse-dependency graph for first-party Python by jasonwbarnett · Pull Request #23434 · pantsbuild/pants

jasonwbarnett · 2026-06-17T21:51:56Z

Problem

dependents and --changed-dependents build a reverse-dependency graph via map_addresses_to_dependents, which resolves the dependencies of every target individually. Each resolve_dependencies call fans out to tens of engine nodes, so the work is O(total targets) regardless of how little actually changed. On large Python monorepos this dominates the run — e.g. ~2m40s for --changed-dependents=transitive on a ~50k-target repo, even when only one file changed (see #23236).
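For reference, the per-target shape of the problem can be sketched in plain Python. This is a simplified model, not Pants code. Addresses are plain strings, and `resolve_dependencies` is a hypothetical callable that stands in for the engine's per-target resolution. The sketch makes one resolution call per target, so its cost tracks the total target count rather than the size of the change.

```python
from collections import defaultdict, deque
from typing import Callable, Iterable

Address = str  # simplified stand-in for a Pants Address


def reverse_graph_per_target(
    targets: Iterable[Address],
    resolve_dependencies: Callable[[Address], Iterable[Address]],
) -> dict[Address, set[Address]]:
    """Invert the dependency graph by resolving each target's dependencies.

    Makes exactly one resolve_dependencies call per target, however few
    files changed.
    """
    dependents: dict[Address, set[Address]] = defaultdict(set)
    for target in targets:
        for dep in resolve_dependencies(target):
            dependents[dep].add(target)
    return dict(dependents)


def transitive_dependents(
    reverse: dict[Address, set[Address]], changed: Iterable[Address]
) -> set[Address]:
    """Breadth-first walk of the reverse graph, in the spirit of
    --changed-dependents=transitive. Returns dependents only, not the
    changed targets themselves (unless a cycle leads back to them)."""
    seen: set[Address] = set()
    queue = deque(changed)
    while queue:
        for dependent in reverse.get(queue.popleft(), ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

Once the reverse graph exists, the walk over it is cheap. Building the graph is the expensive step.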

What this does

Adds an opt-in fast path, [dependents-inference].use_batched_python (off by default), that computes the same reverse graph in a handful of batched passes over first-party Python:

generator → generated-target edges, derived directly from the target set (no rule calls);
explicit dependencies=[...], reusing determine_explicitly_provided_dependencies (so parsing/ignores match exactly);
import inference, via a single native parse of all sources + one owner lookup per distinct imported module;
__init__.py inference, from the owned init files with the same empty-file filtering.
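The four passes can be modelled together in a small, self-contained sketch. Everything in it is hypothetical and simplified:

- addresses are strings, and file targets are named by their path;
- the output of the single native parse is passed in as `imports`;
- `owners_of` stands in for the owner lookup;
- unowned or ambiguous imports (zero or several owners) produce no edge;
- only owned, non-empty ancestor `__init__.py` files produce an edge (assumed semantics).

The key property is that `owners_of` runs once per distinct module, not once per import.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Iterable, Mapping

Address = str  # simplified stand-in; file targets are named by their path


def batched_reverse_graph(
    generated_by: Mapping[Address, Iterable[Address]],
    explicit_deps: Mapping[Address, Iterable[Address]],
    imports: Mapping[Address, Iterable[str]],
    owners_of: Callable[[str], tuple[Address, ...]],
    source_paths: Mapping[Address, str],
    init_files: Mapping[str, tuple[Address, bool]],  # path -> (owner, is_empty)
) -> dict[Address, set[Address]]:
    dependents: dict[Address, set[Address]] = defaultdict(set)

    # 1. Generator -> generated-target edges, read straight off the target set.
    for generator, generated in generated_by.items():
        for child in generated:
            dependents[child].add(generator)

    # 2. Explicit dependencies=[...] (assumed already parsed into addresses).
    for target, deps in explicit_deps.items():
        for dep in deps:
            dependents[dep].add(target)

    # 3. Import inference: one owner lookup per distinct module, not per import.
    parsed = {target: list(mods) for target, mods in imports.items()}
    owners = {m: owners_of(m) for m in {m for mods in parsed.values() for m in mods}}
    for target, mods in parsed.items():
        for module in mods:
            found = owners[module]
            # Zero owners (unowned) or several (ambiguous): no edge in this sketch.
            if len(found) == 1 and found[0] != target:
                dependents[found[0]].add(target)

    # 4. __init__.py inference: owned, non-empty ancestor init files only.
    for target, path in source_paths.items():
        for parent in PurePosixPath(path).parents:
            init = str(parent / "__init__.py")
            if init != path and init in init_files:
                owner, is_empty = init_files[init]
                if not is_empty:
                    dependents[owner].add(target)

    return dict(dependents)
```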

Backends opt in through a new ReverseDependencyGraphImpl union (in project_info.dependents). The Python implementation lives in the Python backend (dependency_inference/reverse_graph.py). A batched result is used only when it reproduces the per-target result exactly. Otherwise the implementation returns None and the caller falls back to the always-correct per-target algorithm.
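One way to picture the opt-in contract is a small protocol plus a dispatcher. The names `ReverseGraphImpl`, `compute`, and `map_to_dependents` are hypothetical. The sketch also assumes the caller takes the first implementation that does not decline. The real union members are Pants rules, not plain Python objects.

```python
from typing import Callable, Iterable, Optional, Protocol

Address = str
ReverseGraph = dict[Address, set[Address]]


class ReverseGraphImpl(Protocol):
    """Hypothetical shape of a backend's batched reverse-graph implementation."""

    def compute(self, targets: list[Address]) -> Optional[ReverseGraph]:
        """Return the complete reverse graph, or None to decline."""
        ...


def map_to_dependents(
    targets: list[Address],
    impls: Iterable[ReverseGraphImpl],
    per_target: Callable[[list[Address]], ReverseGraph],
) -> ReverseGraph:
    # Use a batched result only if some implementation produced one
    # (assumed policy: the first non-None result wins).
    for impl in impls:
        result = impl.compute(targets)
        if result is not None:
            return result
    # Every implementation declined: fall back to the always-correct algorithm.
    return per_target(targets)
```

With this contract, declining is always safe. The worst case is the existing per-target cost.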

It declines (falls back) for anything it does not reproduce: custom/other-language inference backends that apply to a target, parametrization, special-cased dependency fields, multiple resolves, conftest.py/asset inference, non-root source roots, or [python-infer].unowned_dependency_behavior=error.
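The decline conditions amount to a conservative guard that runs before any batched work. The sketch below uses a hypothetical `TargetInfo` summary. None of its field names come from the PR, and the real checks inspect Pants fields and subsystem options rather than booleans. Returning a reason string instead of a bare flag makes it easy to log why the fast path was skipped.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TargetInfo:
    """Hypothetical per-target facts relevant to the decline decision."""

    address: str
    other_inference_backends: tuple[str, ...] = ()
    parametrized: bool = False
    special_cased_dependencies: bool = False
    resolve: str = "python-default"
    uses_conftest_or_asset_inference: bool = False
    source_root: str = ""  # "" stands for the repository root


def decline_reason(
    targets: Iterable[TargetInfo], unowned_dependency_behavior: str
) -> Optional[str]:
    """Return why the batched path must decline, or None if it may proceed."""
    if unowned_dependency_behavior == "error":
        return "unowned_dependency_behavior=error"
    items = list(targets)
    if len({t.resolve for t in items}) > 1:
        return "multiple resolves"
    for t in items:
        if t.other_inference_backends:
            return f"{t.address}: other inference backends apply"
        if t.parametrized:
            return f"{t.address}: parametrized"
        if t.special_cased_dependencies:
            return f"{t.address}: special-cased dependency field"
        if t.uses_conftest_or_asset_inference:
            return f"{t.address}: conftest.py/asset inference"
        if t.source_root:
            return f"{t.address}: non-root source root"
    return None
```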

Correctness

A new equivalence test (reverse_graph_test.py) asserts the batched output is byte-identical to the per-target output across import / explicit / generator / __init__ edges (direct and transitive), and that the fallback path also matches. This was additionally validated end-to-end against the synthetic 50k-target repro and several fixtures.
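The shape of such an equivalence check can be sketched independently of Pants. In this simplified model (all names hypothetical), a reverse graph is rendered to bytes the way a dependents listing might be, both directly and transitively. A declined batched run (None) is treated as falling back to the per-target graph. In this toy, that fallback branch reduces to a self-comparison. In a real test it exercises the caller's dispatch.

```python
from collections import deque
from typing import Iterable, Optional

Graph = dict[str, set[str]]


def render_dependents(reverse: Graph, roots: Iterable[str], transitive: bool) -> bytes:
    """Sorted, newline-separated dependents of `roots`, encoded as bytes."""
    found: set[str] = set()
    frontier = deque(roots)
    while frontier:
        for dependent in reverse.get(frontier.popleft(), ()):
            if dependent not in found:
                found.add(dependent)
                if transitive:
                    frontier.append(dependent)
    return "\n".join(sorted(found)).encode()


def assert_batched_matches(
    batched: Optional[Graph], per_target: Graph, roots: list[str]
) -> None:
    """Raise unless the batched output (or its fallback) is byte-identical."""
    effective = per_target if batched is None else batched  # None => fallback
    for transitive in (False, True):
        got = render_dependents(effective, roots, transitive)
        want = render_dependents(per_target, roots, transitive)
        if got != want:
            raise AssertionError(f"transitive={transitive}: {got!r} != {want!r}")
```

Checking both modes matters. A batched graph that is missing one edge can still match for direct dependents and differ only transitively.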

Performance

On the synthetic 50k-target repro, --changed-dependents=transitive drops from ~2m40s → under 20s (the reverse-graph build itself goes from ~130s to ~6s; the batched parse replaces ~2.4M engine nodes with a handful).
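A back-of-the-envelope check of these figures follows. It uses only the numbers quoted in this PR (2m40s = 160 s) and assumes the engine nodes are spread roughly evenly across targets.

```latex
\frac{2.4\times 10^{6}\ \text{nodes}}{5\times 10^{4}\ \text{targets}} \approx 48\ \text{nodes/target},
\qquad
\frac{160\ \text{s}}{20\ \text{s}} = 8\times\ (\text{end to end, a lower bound}),
\qquad
\frac{130\ \text{s}}{6\ \text{s}} \approx 22\times\ (\text{reverse-graph build})
```

Roughly 48 nodes per target is consistent with the "tens of engine nodes" per resolve_dependencies call described under Problem.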

Scope / follow-ups

This is intentionally conservative and opt-in while coverage grows. Natural follow-ups: full source-root support (currently declines when source roots strip paths), conftest.py/asset inference, and multi-resolve support. Each currently triggers the correct per-target fallback.

Testing

New reverse_graph_test.py (batched == per-target, incl. fallback) passes.
Existing dependents_test.py passes (no change to the default path).
Validated against Pants 2.32.0.dev7 using the 50k-target repro and the __init__/explicit/import/source-root fixtures.

Note: opened as a draft. The full-repo lint/check/test :: suite was not run in my environment; I'm relying on CI to confirm BUILD files and lint.

Refs #23236.

🤖 Generated with Claude Code

…arty Python

`dependents` and `--changed-dependents` build a reverse-dependency graph by resolving the dependencies of every target individually via `map_addresses_to_dependents`. Each `resolve_dependencies` call fans out to tens of engine nodes. On large repositories this O(total targets) work dominates the run regardless of how little actually changed (e.g. ~2m40s for `--changed-dependents=transitive` on a ~50k-target Python repo).

This adds an opt-in fast path, `[dependents-inference].use_batched_python` (off by default), that computes the same reverse graph in a single batched pass over first-party Python sources:

* generator -> generated-target edges, derived from the target set;
* explicit `dependencies=[...]`, via `determine_explicitly_provided_dependencies`;
* import inference, via one native parse of all sources + one owner lookup per imported module;
* `__init__.py` inference, via the owned init files + the same empty-file filter.

Backends opt in through a new `ReverseDependencyGraphImpl` union. The result is used only when it can reproduce the per-target result exactly. Otherwise the implementation returns `None` so the caller falls back to the always-correct per-target algorithm. Unsupported cases (custom inference backends, parametrization, special-cased deps, multiple resolves, conftest/asset inference, non-root source roots, `unowned_dependency_behavior=error`) fall back automatically.

A new equivalence test asserts the batched output is identical to the per-target output across import/explicit/generator/`__init__` edges and the fallback path. On the 50k-target repro this takes `--changed-dependents` from ~2m40s to under 20s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…no-edge targets

Expands the opt-in batched first-party-Python reverse-dependency graph (`[dependents-inference].use_batched_python`) so it engages on large, real-world repositories instead of falling back to per-target:

- Parse sources grouped by source root. A single combined parse merges stripped paths that collide across source roots (e.g. a top-level `__init__.py` in two roots), which fails with a MergeDigests duplicate-entry error. Grouping by source root keeps stripped paths unique.
- Batch parametrized targets. Parametrization does not change a file's content, so the file is parsed once and its owners are resolved at each parametrization's own resolve. This is exact because inferred dependencies bypass `_fill_parameters`: only explicit deps are filled, and targets with explicit `deps` are routed per-target.
- Honor `[python-infer].ambiguity_resolution = by_source_root` with a second owner-lookup phase keyed on the importer's source root for ambiguous modules.
- Skip targets that contribute no edges (no explicit/special-cased deps, no applicable inference, not a generator) instead of resolving them per-target.

The batched and per-target algorithms produce identical reverse graphs. On a large first-party-Python repository (~56k targets, heavy resolve parametrization) the goal runs ~1.9x faster cold and ~1.7x faster warm, with byte-identical output.

Adds a parametrized-resolves equivalence test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
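The source-root collision in the first bullet can be modelled with plain paths. This sketch does not call MergeDigests. `strip_source_root`, `combined_collisions`, and `group_by_source_root` are hypothetical helpers. They show why one combined parse can see duplicate stripped paths while per-root groups cannot: within a single root, distinct files always strip to distinct relative paths.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def strip_source_root(path: str, root: str) -> str:
    """Path relative to its source root ("" means the repository root)."""
    return str(PurePosixPath(path).relative_to(root)) if root else path


def combined_collisions(roots_by_path: dict[str, str]) -> set[str]:
    """Stripped paths that would appear more than once in one combined parse."""
    counts = Counter(strip_source_root(p, r) for p, r in roots_by_path.items())
    return {stripped for stripped, n in counts.items() if n > 1}


def group_by_source_root(roots_by_path: dict[str, str]) -> dict[str, dict[str, str]]:
    """Source root -> {stripped path: original path}; one parse per group."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for path, root in roots_by_path.items():
        groups[root][strip_source_root(path, root)] = path
    return dict(groups)
```

For example, `src/a/__init__.py` and `src/b/__init__.py`, under roots `src/a` and `src/b`, both strip to `__init__.py`. They collide in a combined parse but land in separate groups.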

Jason Barnett and others added 2 commits June 17, 2026 19:39

jasonwbarnett wants to merge 2 commits into pantsbuild:main from altana-ai:claude/batched-dependents-python