project_info: opt-in batched reverse-dependency graph for first-party Python #23434
Draft
jasonwbarnett wants to merge 2 commits into
…arty Python
`dependents` and `--changed-dependents` build a reverse-dependency graph by
resolving the dependencies of every target individually via
`map_addresses_to_dependents`. Each `resolve_dependencies` call fans out to
tens of engine nodes, so on large repositories this O(total targets) work
dominates the run (e.g. ~2m40s for `--changed-dependents=transitive` on a
~50k-target Python repo) regardless of how little actually changed.
This adds an opt-in fast path, `[dependents-inference].use_batched_python`
(off by default), that computes the same reverse graph in a single batched
pass over first-party Python sources:
* generator -> generated-target edges, derived from the target set;
* explicit `dependencies=[...]`, via `determine_explicitly_provided_dependencies`;
* import inference, via one native parse of all sources + one owner lookup
per imported module;
* `__init__.py` inference, via the owned init files + the same empty-file filter.
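The batched idea above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the PR's implementation: addresses are plain strings, `build_reverse_graph` and its three input mappings are hypothetical names, and the real code runs as Pants engine rules. It shows the two ingredients the list describes: each distinct imported module is resolved to an owner once, and all edges are inverted in a single pass.

```python
from collections import defaultdict


def build_reverse_graph(imports_by_target, owner_by_module, explicit_deps):
    """Build dependency -> dependents from batched inputs (hypothetical helper)."""
    # One owner lookup per distinct imported module, not per (target, import) pair.
    distinct_modules = set().union(*imports_by_target.values())
    owner_of = {m: owner_by_module[m] for m in distinct_modules if m in owner_by_module}

    reverse = defaultdict(set)
    # Inferred edges: the importing target depends on the module's owner.
    for target, modules in imports_by_target.items():
        for module in modules:
            owner = owner_of.get(module)
            if owner is not None:  # unowned modules (e.g. stdlib) add no edge
                reverse[owner].add(target)
    # Explicit edges from `dependencies=[...]`.
    for target, deps in explicit_deps.items():
        for dep in deps:
            reverse[dep].add(target)
    return dict(reverse)


# Toy inputs, invented for illustration only.
imports_by_target = {
    "src/app.py": {"lib.util", "os"},
    "src/cli.py": {"lib.util", "app"},
}
owner_by_module = {"lib.util": "src/lib/util.py", "app": "src/app.py"}
explicit_deps = {"src/cli.py": {"src/data.json"}}

reverse = build_reverse_graph(imports_by_target, owner_by_module, explicit_deps)
```

The cost is proportional to the number of edges plus the number of distinct modules, rather than one dependency resolution per target.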
Backends opt in through a new `ReverseDependencyGraphImpl` union; the result is
used only when it can reproduce the per-target result exactly, and otherwise
returns `None` so the caller falls back to the always-correct per-target
algorithm. Unsupported cases (custom inference backends, parametrization,
special-cased deps, multiple resolves, conftest/asset inference, non-root
source roots, `unowned_dependency_behavior=error`) fall back automatically.
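The decline-and-fall-back contract described above might look roughly like the following. This is a minimal sketch with hypothetical names (`reverse_graph`, `toy_batched`, `toy_per_target`); it does not use the actual `ReverseDependencyGraphImpl` union or any Pants API, and the "decline when an address contains `@`" rule is just a stand-in for one unsupported case.

```python
from typing import Callable, Optional

ReverseGraph = dict[str, set[str]]


def reverse_graph(
    targets: list[str],
    *,
    use_batched: bool,
    batched: Callable[[list[str]], Optional[ReverseGraph]],
    per_target: Callable[[list[str]], ReverseGraph],
) -> ReverseGraph:
    """Use the batched result only when the fast path did not decline."""
    if use_batched:
        result = batched(targets)
        if result is not None:
            return result
    # Either the option is off or the fast path returned None.
    return per_target(targets)


calls: list[str] = []  # records which path ran, for the demo below


def toy_per_target(targets: list[str]) -> ReverseGraph:
    calls.append("per_target")
    return {"lib": {t for t in targets if t != "lib"}}


def toy_batched(targets: list[str]) -> Optional[ReverseGraph]:
    calls.append("batched")
    if any("@" in t for t in targets):  # stand-in for an unsupported case
        return None
    return {"lib": {t for t in targets if t != "lib"}}


plain = reverse_graph(
    ["lib", "app"], use_batched=True, batched=toy_batched, per_target=toy_per_target
)
fallback = reverse_graph(
    ["lib", "app@resolve=a"], use_batched=True, batched=toy_batched, per_target=toy_per_target
)
```

Returning `None` instead of a partial graph keeps the per-target algorithm as the single source of truth whenever exactness cannot be guaranteed.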
A new equivalence test asserts the batched output is identical to the
per-target output across import/explicit/generator/`__init__` edges and the
fallback path. On the 50k-target repro this takes `--changed-dependents` from
~2m40s to under 20s.
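An equivalence check of this kind can be illustrated as follows. The helper names `normalize` and `graph_diff` are hypothetical and the real test in the PR is not shown here; the sketch also assumes that an empty dependents set is equivalent to a missing key, which the actual test may or may not do.

```python
def normalize(graph):
    """Freeze dependents sets and drop empty entries (an assumption of this sketch)."""
    return {dep: frozenset(dependents) for dep, dependents in graph.items() if dependents}


def graph_diff(batched, per_target):
    """Return {dependency: (batched_dependents, per_target_dependents)} where they differ."""
    a, b = normalize(batched), normalize(per_target)
    return {
        dep: (a.get(dep, frozenset()), b.get(dep, frozenset()))
        for dep in a.keys() | b.keys()
        if a.get(dep) != b.get(dep)
    }


same = graph_diff({"lib": {"app", "cli"}}, {"lib": {"cli", "app"}, "unused": set()})
different = graph_diff({"lib": {"app"}}, {"lib": {"app", "cli"}})
```

Reporting the differing entries, rather than only a boolean, makes a failing equivalence test easier to diagnose.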
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…no-edge targets

Expands the opt-in batched first-party-Python reverse-dependency graph (`[dependents-inference].use_batched_python`) so it engages on large, real-world repositories instead of falling back to per-target:

- Parse sources grouped by source root. A single combined parse merges stripped paths that collide across source roots (e.g. a top-level `__init__.py` in two roots), which fails with a MergeDigests duplicate-entry error; grouping by source root keeps stripped paths unique.
- Batch parametrized targets. Parametrization does not change a file's content, so the file is parsed once and its owners are resolved at each parametrization's own resolve. This is exact because inferred dependencies bypass `_fill_parameters` (only explicit deps are filled, and explicit-`deps` targets are routed per-target).
- Honor `[python-infer].ambiguity_resolution = by_source_root` with a second owner-lookup phase keyed on the importer's source root for ambiguous modules.
- Skip targets that contribute no edges (no explicit/special-cased deps, no applicable inference, not a generator) instead of resolving them per-target.

The batched and per-target algorithms produce identical reverse graphs; on a large first-party-Python repository (~56k targets, heavy resolve parametrization) the goal runs ~1.9x faster cold and ~1.7x warm with byte-identical output. Adds a parametrized-resolves equivalence test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
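To see why grouping by source root avoids the collision described in the first bullet, consider this small sketch. It is an illustration only: `group_by_source_root` is a hypothetical helper working on plain path strings, and the paths and roots are invented; the real change operates on Pants digests.

```python
from collections import defaultdict


def group_by_source_root(paths, roots):
    """Map each source root to its files' stripped paths (longest matching root wins)."""
    by_length = sorted(roots, key=len, reverse=True)
    groups = defaultdict(list)
    for path in paths:
        # Files outside every listed root are grouped under "" and left unstripped.
        root = next((r for r in by_length if path.startswith(r + "/")), "")
        stripped = path[len(root) + 1:] if root else path
        groups[root].append(stripped)
    return dict(groups)


paths = ["src/a/__init__.py", "src/a/pkg/mod.py", "src/b/__init__.py"]
groups = group_by_source_root(paths, roots=["src/a", "src/b"])

# Merging every group into one batch would contain "__init__.py" twice,
# which is the kind of duplicate entry a single combined parse trips over.
combined = [p for stripped in groups.values() for p in stripped]
```

Within each group the stripped paths stay unique, so one parse per source root sidesteps the duplicate while still batching all files under that root.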
Problem
`dependents` and `--changed-dependents` build a reverse-dependency graph via `map_addresses_to_dependents`, which resolves the dependencies of every target individually. Each `resolve_dependencies` call fans out to tens of engine nodes, so the work is `O(total targets)` regardless of how little actually changed. On large Python monorepos this dominates the run, e.g. ~2m40s for `--changed-dependents=transitive` on a ~50k-target repo, even when only one file changed (see #23236).

What this does

Adds an opt-in fast path, `[dependents-inference].use_batched_python` (off by default), that computes the same reverse graph in a handful of batched passes over first-party Python:

- generator -> generated-target edges, derived from the target set;
- explicit `dependencies=[...]`, reusing `determine_explicitly_provided_dependencies` (so parsing/ignores match exactly);
- import inference, via one native parse of all sources + one owner lookup per imported module;
- `__init__.py` inference, from the owned init files with the same empty-file filtering.

Backends opt in through a new `ReverseDependencyGraphImpl` union (in `project_info.dependents`). The Python implementation lives in the Python backend (`dependency_inference/reverse_graph.py`). The result is used only when it can reproduce the per-target result exactly; otherwise it returns `None` and the caller falls back to the always-correct per-target algorithm.

It declines (falls back) for anything it does not reproduce: custom/other-language inference backends that apply to a target, parametrization, special-cased dependency fields, multiple resolves, `conftest.py`/asset inference, non-root source roots, or `[python-infer].unowned_dependency_behavior=error`.

Correctness

A new equivalence test (`reverse_graph_test.py`) asserts the batched output is byte-identical to the per-target output across import / explicit / generator / `__init__` edges (direct and transitive), and that the fallback path also matches. This was additionally validated end-to-end against the synthetic 50k-target repro and several fixtures.

Performance

On the synthetic 50k-target repro, `--changed-dependents=transitive` drops from ~2m40s to under 20s (the reverse-graph build itself goes from ~130s to ~6s; the batched parse replaces ~2.4M engine nodes with a handful).

Scope / follow-ups

This is intentionally conservative and opt-in while coverage grows. Natural follow-ups: full source-root support (currently declines when source roots strip paths), `conftest.py`/asset inference, and multi-resolve support. Each currently triggers the correct per-target fallback.

Testing

- `reverse_graph_test.py` (batched == per-target, incl. fallback) passes.
- `dependents_test.py` passes (no change to the default path).
- `__init__`/explicit/import/source-root fixtures.

Note: opened as a draft; the full repo `lint/check/test ::` suite was not run in my environment, so this relies on CI to confirm BUILD/lint.

Refs #23236.
🤖 Generated with Claude Code