project_info: opt-in batched reverse-dependency graph for first-party Python #23434

Draft
jasonwbarnett wants to merge 2 commits into pantsbuild:main from altana-ai:claude/batched-dependents-python

@jasonwbarnett

Contributor

Problem

dependents and --changed-dependents build a reverse-dependency graph via map_addresses_to_dependents, which resolves the dependencies of every target individually. Each resolve_dependencies call fans out to tens of engine nodes, so the work is O(total targets) regardless of how little actually changed. On large Python monorepos this dominates the run: for example, --changed-dependents=transitive takes ~2m40s on a ~50k-target repo, even when only one file changed (see #23236).
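To make the cost model concrete, here is a minimal toy sketch of the per-target approach. It is not the Pants implementation: `FORWARD_DEPS`, the local `resolve_dependencies` stand-in, and `map_addresses_to_dependents_per_target` are hypothetical names for illustration, and a dict read stands in for what would be an engine request per target.

```python
from collections import defaultdict

# Toy forward graph: target address -> direct dependencies.
# Assumption: in the real system each lookup is a separate engine request.
FORWARD_DEPS = {
    "src/app.py": ["src/lib.py", "src/util.py"],
    "src/lib.py": ["src/util.py"],
    "src/util.py": [],
}


def resolve_dependencies(address):
    """Hypothetical stand-in for resolving one target's dependencies."""
    return FORWARD_DEPS[address]


def map_addresses_to_dependents_per_target(all_addresses):
    """Invert the forward graph with one resolve call per target."""
    dependents = defaultdict(set)
    calls = 0
    for address in all_addresses:
        calls += 1  # one call per target, no matter what changed
        for dep in resolve_dependencies(address):
            dependents[dep].add(address)
    return dict(dependents), calls


graph, calls = map_addresses_to_dependents_per_target(sorted(FORWARD_DEPS))
```

The number of calls equals the number of targets, which is the O(total targets) behavior described above.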

What this does

Adds an opt-in fast path, [dependents-inference].use_batched_python (off by default), that computes the same reverse graph in a handful of batched passes over first-party Python:

  • generator → generated-target edges, derived directly from the target set (no rule calls);
  • explicit dependencies=[...], reusing determine_explicitly_provided_dependencies (so parsing/ignores match exactly);
  • import inference, via a single native parse of all sources + one owner lookup per distinct imported module;
  • __init__.py inference, from the owned init files with the same empty-file filtering.
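The four edge sources above can be sketched as a toy model in plain Python. This is an illustration under stated assumptions, not the code in reverse_graph.py: the standard-library `ast` module stands in for the native parser, `module_owners` is a hypothetical precomputed module-to-file map, and all function and parameter names are invented for the example.

```python
import ast
import posixpath
from collections import defaultdict


def parse_imports(source):
    """Collect absolute module names imported by one source file."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


def batched_reverse_graph(sources, explicit_deps, generators, module_owners):
    dependents = defaultdict(set)

    # 1. generator -> generated-target edges, read straight off the target set.
    for generator, generated in generators.items():
        for target in generated:
            dependents[target].add(generator)

    # 2. explicit dependencies=[...].
    for address, deps in explicit_deps.items():
        for dep in deps:
            dependents[dep].add(address)

    # 3. import inference: parse everything once, then do one owner
    #    lookup per distinct imported module.
    imports_by_file = {path: parse_imports(text) for path, text in sources.items()}
    distinct_modules = set().union(*imports_by_file.values())
    owner_of = {m: module_owners[m] for m in distinct_modules if m in module_owners}
    for path, modules in imports_by_file.items():
        for module in modules:
            owner = owner_of.get(module)
            if owner is not None and owner != path:
                dependents[owner].add(path)

    # 4. __init__.py inference, skipping init files with no content.
    for path in sources:
        directory = posixpath.dirname(path)
        while directory:
            init = posixpath.join(directory, "__init__.py")
            if init != path and sources.get(init, "").strip():
                dependents[init].add(path)
            directory = posixpath.dirname(directory)

    return dict(dependents)


result = batched_reverse_graph(
    sources={
        "pkg/__init__.py": "VERSION = '1'\n",
        "pkg/a.py": "import pkg.b\n",
        "pkg/b.py": "",
        "empty/__init__.py": "",
        "empty/c.py": "import pkg.a\n",
    },
    explicit_deps={"pkg/b.py": ["empty/c.py"]},
    generators={"pkg:lib": ["pkg/a.py", "pkg/b.py"]},
    module_owners={"pkg.a": "pkg/a.py", "pkg.b": "pkg/b.py"},
)
```

In this toy run, `empty/__init__.py` gains no dependents because it is empty, while `pkg/__init__.py` is recorded as a dependency of both files in its package.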

Backends opt in through a new ReverseDependencyGraphImpl union (in project_info.dependents). The Python implementation lives in the Python backend (dependency_inference/reverse_graph.py). The result is used only when it can reproduce the per-target result exactly; otherwise it returns None and the caller falls back to the always-correct per-target algorithm.

It declines (falls back) for anything it does not reproduce: custom/other-language inference backends that apply to a target, parametrization, special-cased dependency fields, multiple resolves, conftest.py/asset inference, non-root source roots, or [python-infer].unowned_dependency_behavior=error.
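The decline-and-fall-back contract can be sketched as follows. This is a hedged illustration only: the dict-based targets, the `custom_inference` and `special_cased_deps` keys, and the function names are hypothetical and do not reflect the actual ReverseDependencyGraphImpl union API.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

ReverseGraph = Dict[str, Set[str]]
Impl = Callable[[Sequence[dict]], Optional[ReverseGraph]]


def invert(targets: Sequence[dict]) -> ReverseGraph:
    """Build dependency -> dependents from each target's forward deps."""
    graph: ReverseGraph = {}
    for target in targets:
        for dep in target["deps"]:
            graph.setdefault(dep, set()).add(target["address"])
    return graph


def batched_python_impl(targets: Sequence[dict]) -> Optional[ReverseGraph]:
    """Hypothetical fast path: decline (return None) on any unsupported target."""
    for target in targets:
        if target.get("custom_inference") or target.get("special_cased_deps"):
            return None
    return invert(targets)


def per_target_impl(targets: Sequence[dict]) -> ReverseGraph:
    """Hypothetical always-correct path: never declines."""
    return invert(targets)


def reverse_graph(
    targets: Sequence[dict], impls: Sequence[Impl], fallback: Callable
) -> Tuple[ReverseGraph, str]:
    """Use the first implementation that does not decline, else the fallback."""
    for impl in impls:
        result = impl(targets)
        if result is not None:
            return result, impl.__name__
    return fallback(targets), fallback.__name__


supported = [
    {"address": "a", "deps": ["b"]},
    {"address": "b", "deps": []},
]
unsupported = supported + [{"address": "c", "deps": ["b"], "custom_inference": True}]

fast_graph, fast_used = reverse_graph(supported, [batched_python_impl], per_target_impl)
slow_graph, slow_used = reverse_graph(unsupported, [batched_python_impl], per_target_impl)
```

The design choice this illustrates is that the fast path never returns a partial answer: it either covers the whole target set or hands the entire computation back to the per-target algorithm.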

Correctness

A new equivalence test (reverse_graph_test.py) asserts the batched output is byte-identical to the per-target output across import / explicit / generator / __init__ edges (direct and transitive), and that the fallback path also matches. This was additionally validated end-to-end against the synthetic 50k-target repro and several fixtures.
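The shape of such an equivalence check might look like the sketch below. It is not the contents of reverse_graph_test.py: both graph builders are toy stand-ins, and the fixture and all function names are hypothetical. It compares a canonical byte serialization for direct edges and the full closure for transitive dependents.

```python
import json
from collections import defaultdict, deque


def per_target_graph(forward):
    """Toy per-target build: visit each target and record its edges."""
    graph = defaultdict(set)
    for target in forward:
        for dep in forward[target]:
            graph[dep].add(target)
    return dict(graph)


def batched_graph(forward):
    """Toy batched build: flatten all edges first, then group them."""
    edges = sorted((dep, target) for target, deps in forward.items() for dep in deps)
    graph = {}
    for dep, target in edges:
        graph.setdefault(dep, set()).add(target)
    return graph


def transitive_dependents(graph, root):
    """Breadth-first walk over the reverse graph starting at root."""
    seen, queue = set(), deque([root])
    while queue:
        for dependent in graph.get(queue.popleft(), ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def canonical_bytes(graph):
    """Order-independent byte form, so equality means byte-identical output."""
    return json.dumps({k: sorted(v) for k, v in graph.items()}, sort_keys=True).encode()


def check_equivalence(forward):
    expected, actual = per_target_graph(forward), batched_graph(forward)
    assert canonical_bytes(actual) == canonical_bytes(expected)
    for root in forward:
        assert transitive_dependents(actual, root) == transitive_dependents(expected, root)


FIXTURE = {"app": ["lib"], "lib": ["util"], "util": [], "cli": ["app", "util"]}
check_equivalence(FIXTURE)
```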

Performance

On the synthetic 50k-target repro, --changed-dependents=transitive drops from ~2m40s to under 20s (the reverse-graph build itself goes from ~130s to ~6s; the batched parse replaces ~2.4M engine nodes with a handful).
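As a quick back-of-the-envelope reading of those figures, the implied ratios work out as below. The inputs are the approximate timings quoted above; "under 20s" is treated as an upper bound, so the end-to-end ratio is a lower bound rather than a measurement.

```python
before_total_s = 2 * 60 + 40  # ~2m40s end to end
after_total_s = 20            # "under 20s", used as an upper bound
before_graph_s = 130          # reverse-graph build before
after_graph_s = 6             # reverse-graph build after

# End-to-end speedup is at least this much, since the new run is under 20s.
total_speedup_at_least = before_total_s / after_total_s

# Speedup of the reverse-graph build alone.
graph_speedup = before_graph_s / after_graph_s

# Share of the original run spent building the reverse graph.
graph_share_before = before_graph_s / before_total_s

print(f"end to end: at least {total_speedup_at_least:.1f}x")
print(f"graph build: about {graph_speedup:.1f}x")
print(f"graph share of original run: {graph_share_before:.0%}")
```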

Scope / follow-ups

This is intentionally conservative and opt-in while coverage grows. Natural follow-ups: full source-root support (currently declines when source roots strip paths), conftest.py/asset inference, and multi-resolve support. Each currently triggers the correct per-target fallback.

Testing

  • New reverse_graph_test.py (batched == per-target, incl. fallback) passes.
  • Existing dependents_test.py passes (no change to the default path).
  • Validated against 2.32.0.dev7 + the 50k-target repro and __init__/explicit/import/source-root fixtures.

Note: opened as a draft. The full repo lint/check/test :: suite was not run in my environment; I am relying on CI to confirm BUILD/lint.

Refs #23236.

🤖 Generated with Claude Code

Jason Barnett and others added 2 commits June 17, 2026 19:39
…arty Python

`dependents` and `--changed-dependents` build a reverse-dependency graph by
resolving the dependencies of every target individually via
`map_addresses_to_dependents`. Each `resolve_dependencies` call fans out to
tens of engine nodes, so on large repositories this O(total targets) work
dominates the run (e.g. ~2m40s for `--changed-dependents=transitive` on a
~50k-target Python repo) regardless of how little actually changed.

This adds an opt-in fast path, `[dependents-inference].use_batched_python`
(off by default), that computes the same reverse graph in a single batched
pass over first-party Python sources:
  * generator -> generated-target edges, derived from the target set;
  * explicit `dependencies=[...]`, via `determine_explicitly_provided_dependencies`;
  * import inference, via one native parse of all sources + one owner lookup
    per imported module;
  * `__init__.py` inference, via the owned init files + the same empty-file filter.

Backends opt in through a new `ReverseDependencyGraphImpl` union; the result is
used only when it can reproduce the per-target result exactly, and otherwise
returns `None` so the caller falls back to the always-correct per-target
algorithm. Unsupported cases (custom inference backends, parametrization,
special-cased deps, multiple resolves, conftest/asset inference, non-root
source roots, `unowned_dependency_behavior=error`) fall back automatically.

A new equivalence test asserts the batched output is identical to the
per-target output across import/explicit/generator/`__init__` edges and the
fallback path. On the 50k-target repro this takes `--changed-dependents` from
~2m40s to under 20s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…no-edge targets

Expands the opt-in batched first-party-Python reverse-dependency graph
(`[dependents-inference].use_batched_python`) so it engages on large,
real-world repositories instead of falling back to per-target:

- Parse sources grouped by source root. A single combined parse merges stripped
  paths that collide across source roots (e.g. a top-level `__init__.py` in two
  roots), which fails with a MergeDigests duplicate-entry error; grouping by
  source root keeps stripped paths unique.
- Batch parametrized targets. Parametrization does not change a file's content,
  so the file is parsed once and its owners are resolved at each
  parametrization's own resolve. This is exact because inferred dependencies
  bypass `_fill_parameters` (only explicit deps are filled, and explicit-`deps`
  targets are routed per-target).
- Honor `[python-infer].ambiguity_resolution = by_source_root` with a second
  owner-lookup phase keyed on the importer's source root for ambiguous modules.
- Skip targets that contribute no edges (no explicit/special-cased deps, no
  applicable inference, not a generator) instead of resolving them per-target.

The batched and per-target algorithms produce identical reverse graphs; on a
large first-party-Python repository (~56k targets, heavy resolve parametrization)
the goal runs ~1.9x faster cold and ~1.7x warm with byte-identical output. Adds a
parametrized-resolves equivalence test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>