Reuse rollout token counts across limit checks by xeophon · Pull Request #1799 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-21T08:59:44Z

Overview

Reduce synchronous token-limit overhead by avoiding repeated reconstruction of the same derived branch paths. The trace node graph remains the source of truth, limit precedence and soft-cap behavior are unchanged, and the commit only changes the production interception server.

What changed

Return immediately when no token cap is configured.
Count directly from Trace.nodes when canonical append order proves the graph is a single root-to-leaf path.
Materialize Trace.branches once for compacted, subagent, or otherwise non-linear graphs, then reuse that view across enabled input, output, and total token checks.
Preserve the existing max_turns → input → output → total precedence and >= boundaries.
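
The control flow in the list above can be sketched as follows. This is a minimal illustration, not the code in `server.py`: the `Limits` dataclass, its field names, the returned labels, and the `count_tokens` callable are all hypothetical stand-ins. It shows only the early return, the precedence order, and the `>=` boundaries.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Limits:
    # Hypothetical field names; None means the cap is not configured.
    max_turns: Optional[int] = None
    max_input_tokens: Optional[int] = None
    max_output_tokens: Optional[int] = None
    max_total_tokens: Optional[int] = None


def first_limit_reached(
    limits: Limits,
    turns: int,
    count_tokens: Callable[[], Tuple[int, int, int]],
) -> Optional[str]:
    """Return the name of the first limit hit, or None if none is hit."""
    # max_turns keeps the highest precedence and needs no token counts.
    if limits.max_turns is not None and turns >= limits.max_turns:
        return "max_turns"

    caps = (
        ("input", limits.max_input_tokens),
        ("output", limits.max_output_tokens),
        ("total", limits.max_total_tokens),
    )
    # Early return: with no token cap configured, the graph is never touched.
    if all(cap is None for _, cap in caps):
        return None

    # Count once, then reuse the same numbers for every enabled cap.
    counts = dict(zip(("input", "output", "total"), count_tokens()))
    for name, cap in caps:
        if cap is not None and counts[name] >= cap:  # >= boundary
            return name
    return None
```

Passing the counter as a callable keeps the counting lazy, so the uncapped and `max_turns` cases do no token work at all.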

Why

Trace.nodes stores each message once, while Trace.branches is an uncached derived view built by finding leaves and walking each parent chain. The previous input, output, and total properties each requested that view independently. When several token caps were enabled and still below their thresholds, the same graph paths were reconstructed up to three times.

The canonical linear case can safely use the existing node order without allocating a branch view. Other graph shapes continue through the established branch abstraction, but share one snapshot rather than rebuilding it for every count. This keeps arbitrary node ordering and branching semantics on the existing path.
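
A rough sketch of the two counting paths, under stated assumptions: `Node`, its `parent` index, and `output_mask` are hypothetical stand-ins for the real trace types, and treating input as total minus masked output tokens is an assumption made for illustration. The branch builder mirrors the description above (find leaves, walk each parent chain), and the linear check mirrors "each node's parent is the previous index".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a trace node.
    parent: Optional[int]    # index of the parent node, None for the root
    token_ids: List[int]
    output_mask: List[int]   # assumed: 1 marks an output token, 0 an input token


def is_canonical_linear(nodes: List[Node]) -> bool:
    # Node 0 is the root and every later node's parent is the previous index.
    return all(n.parent == (i - 1 if i > 0 else None) for i, n in enumerate(nodes))


def materialize_branches(nodes: List[Node]) -> List[List[Node]]:
    # Find leaves (nodes that are nobody's parent), then walk each parent chain.
    parents = {n.parent for n in nodes if n.parent is not None}
    branches = []
    for leaf in range(len(nodes)):
        if leaf in parents:
            continue
        path, i = [], leaf
        while i is not None:
            path.append(nodes[i])
            i = nodes[i].parent
        branches.append(path[::-1])  # store in root-to-leaf order
    return branches


def count_tokens(nodes: List[Node]) -> Tuple[int, int, int]:
    # The linear fast path treats the node list itself as the only branch;
    # otherwise the branch view is built once and shared by all three sums.
    paths = [nodes] if is_canonical_linear(nodes) else materialize_branches(nodes)
    total = sum(len(n.token_ids) for path in paths for n in path)
    output = sum(sum(n.output_mask) for path in paths for n in path)
    return total - output, output, total  # (input, output, total)
```

In this sketch a shared trunk is counted once per branch, because the sums run over per-branch paths. For a linear trace the fast path and the branch path return the same numbers, which is what makes the shortcut safe.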

Performance

Measurements use median time.perf_counter() wall time with GC before each repetition; peak Python allocation is measured separately with tracemalloc so tracing overhead does not affect timings.
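
The methodology above can be approximated with a small standard-library harness. This is a sketch of the approach, not the excluded benchmark script; the function names and the default repetition count are assumptions.

```python
import gc
import statistics
import time
import tracemalloc
from typing import Callable


def median_wall_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall time in milliseconds, with a GC pass before each repetition."""
    samples = []
    for _ in range(repeats):
        gc.collect()  # start every repetition from a collected heap
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def peak_alloc_mib(fn: Callable[[], object]) -> float:
    """Peak traced Python allocation in MiB, measured in a separate run."""
    gc.collect()
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 * 1024)
```

Timing and allocation use separate runs because `tracemalloc` hooks every allocation and would otherwise inflate the wall-time samples.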

| Workload | Before | After | Time saved | Peak allocation |
| --- | --- | --- | --- | --- |
| 200k-node linear stress case, 1 token/node | 88.784 ms | 23.526 ms | 65.258 ms (73.5%) | 12.00 MiB → < 0.01 MiB (below display precision) |
| 2k nodes / 1k shared-trunk branches, 1 token/node | 217.673 ms | 125.654 ms | 92.019 ms (42.3%) | 8.23 MiB → 8.23 MiB |
| 2k-node linear long-horizon case, 32 tokens/node | 0.732 ms | 0.315 ms | 0.417 ms (57.0%) | 0.16 MiB → < 0.01 MiB (below display precision) |
| 2k nodes / 10 branches, 32 tokens/node | 5.341 ms | 3.600 ms | 1.741 ms (32.6%) | 0.19 MiB → 0.19 MiB |
| 10k nodes / 10 branches, 32 tokens/node | 26.767 ms | 18.172 ms | 8.595 ms (32.1%) | 0.93 MiB → 0.93 MiB |

The branched peak remains unchanged because both paths hold at most one full branch snapshot at a time; the saving is reduced allocation churn and graph-walk CPU from eliminating additional snapshots. Since limit checks run synchronously from the interception session, the wall-time reduction also shortens the corresponding event-loop stall.

At higher token density, mask summation becomes a larger share of the work: the 2k-node / 10-branch case at 128 tokens per node measured 8.711 ms → 6.964 ms, saving 1.747 ms (20.1%). This is expected because the change targets graph reconstruction rather than token-mask arithmetic.
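
For reference, the "Time saved" figures follow directly from the before/after medians. The snippet below recomputes them from the numbers reported in this description; the workload labels are shortened and the helper name is illustrative.

```python
# (before_ms, after_ms) pairs copied from the measurements in this description.
MEASUREMENTS = {
    "200k linear, 1 tok/node": (88.784, 23.526),
    "2k nodes / 1k branches, 1 tok/node": (217.673, 125.654),
    "2k linear, 32 tok/node": (0.732, 0.315),
    "2k nodes / 10 branches, 32 tok/node": (5.341, 3.600),
    "10k nodes / 10 branches, 32 tok/node": (26.767, 18.172),
    "2k nodes / 10 branches, 128 tok/node": (8.711, 6.964),
}


def time_saved(before_ms: float, after_ms: float) -> tuple:
    """Return (saved milliseconds, saved percent of the 'before' time)."""
    saved = before_ms - after_ms
    return round(saved, 3), round(100.0 * saved / before_ms, 1)


for label, (before, after) in MEASUREMENTS.items():
    saved_ms, saved_pct = time_saved(before, after)
    print(f"{label}: {saved_ms} ms ({saved_pct}%)")
```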

Scope

The commit contains only verifiers/v1/interception/server.py. Benchmark scripts, focused test scaffolding, project metadata, and lockfiles are intentionally excluded.

Note

Medium Risk
Limit enforcement semantics must stay aligned with existing branch-based totals on branched traces; mistakes would allow extra turns or stop rollouts early.

Overview
RolloutLimits.reached no longer relies on Trace’s prompt_len / completion_len / total_tokens helpers, which each walked Trace.branches independently when several token caps were enabled.

It returns immediately when no token caps are set. For traces in canonical linear append order (each node’s parent is the previous index), it counts from trace.nodes directly and skips building a branch view. For branched or non-linear graphs, it builds trace.branches once and sums per-branch lengths for input, output, and total checks.

max_turns precedence and >= stop boundaries are unchanged; the only change is how counts are derived for the synchronous pre-turn check in the interception server.

Reviewed by Cursor Bugbot for commit af6db01. Bugbot is set up for automated code reviews on this repo.

Note

Reuse rollout token counts across limit checks in `RolloutLimits.reached`

Adds an early return in RolloutLimits.reached when all token caps are None, avoiding unnecessary computation.
For traces forming a single linear chain, computes token counts directly from node-level data (node.token_ids lengths and masked token counts) rather than trace-level aggregates.
For non-linear (branched) traces, replaces single trace-level aggregates (trace.prompt_len, etc.) with sums across trace.branches.
Behavioral Change: token cap comparisons now use different aggregation paths depending on graph topology, which may produce different values than before for branched traces.

Macroscope summarized af6db01.

macroscopeapp · 2026-06-21T09:07:28Z

Approvability

Verdict: Needs human review

This PR changes how token limits are calculated during rollouts - from using trace-level properties directly to summing across branches. Since this modifies limit-checking logic that controls when rollouts stop, the behavioral implications warrant human review.


Reuse rollout token counts across limits

af6db01

xeophon wants to merge 1 commit into feat/nano-as-v1 from codex/reuse-rollout-token-counts
