Reuse rollout token counts across limit checks #1799
Open
xeophon wants to merge 1 commit into
Approvability
Verdict: Needs human review
This PR changes how token limits are calculated during rollouts, from using trace-level properties directly to summing across branches. Since this modifies limit-checking logic that controls when rollouts stop, the behavioral implications warrant human review.
Overview
Reduce synchronous token-limit overhead by avoiding repeated reconstruction of the same derived branch paths. The trace node graph remains the source of truth, limit precedence and soft-cap behavior are unchanged, and the commit only changes the production interception server.
What changed
- Count directly from `Trace.nodes` when canonical append order proves the graph is a single root-to-leaf path.
- Build `Trace.branches` once for compacted, subagent, or otherwise non-linear graphs, then reuse that view across enabled input, output, and total token checks.
- Keep the `max_turns` → input → output → total precedence and `>=` boundaries unchanged.
Why
`Trace.nodes` stores each message once, while `Trace.branches` is an uncached derived view built by finding leaves and walking each parent chain. The previous input, output, and total properties each requested that view independently. When several token caps were enabled and still below their thresholds, the same graph paths were reconstructed up to three times.
The canonical linear case can safely use the existing node order without allocating a branch view. Other graph shapes continue through the established branch abstraction, but share one snapshot rather than rebuilding it for every count. This keeps arbitrary node ordering and branching semantics on the existing path.
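The counting strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code in `verifiers/v1/interception/server.py`: the `Node` and `Trace` classes, the `is_output` mask, and the helpers `is_linear`, `count_path`, and `token_counts` are all hypothetical stand-ins, and treating "output" as the masked tokens and "input" as the remainder is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical trace node: one message, stored once."""
    parent: Optional[int]   # index of the parent node, None for the root
    token_ids: List[int]
    is_output: List[bool]   # assumed per-token mask: True = generated token


@dataclass
class Trace:
    """Hypothetical trace; `nodes` is the source of truth."""
    nodes: List[Node] = field(default_factory=list)

    @property
    def branches(self) -> List[List[Node]]:
        """Uncached derived view: find leaves, then walk each parent chain."""
        parents = {n.parent for n in self.nodes if n.parent is not None}
        paths = []
        for i in range(len(self.nodes)):
            if i in parents:
                continue  # has a child, so it is not a leaf
            path, j = [], i
            while j is not None:
                path.append(self.nodes[j])
                j = self.nodes[j].parent
            paths.append(path[::-1])  # root-to-leaf order
        return paths


def is_linear(trace: Trace) -> bool:
    """True when every node's parent is the previous index (root has none)."""
    return all(
        node.parent == (i - 1 if i else None)
        for i, node in enumerate(trace.nodes)
    )


def count_path(path: List[Node]) -> Tuple[int, int, int]:
    """Return (input, output, total) token counts for one path."""
    total = sum(len(n.token_ids) for n in path)
    output = sum(sum(n.is_output) for n in path)
    return total - output, output, total


def token_counts(trace: Trace) -> Tuple[int, int, int]:
    """Return (input, output, total), building the branch view at most once."""
    if is_linear(trace):
        return count_path(trace.nodes)   # no branch view allocated
    branches = trace.branches            # single snapshot, reused for all sums
    counts = [count_path(b) for b in branches]
    return tuple(sum(c[k] for c in counts) for k in range(3))
```

In this sketch a linear trace has exactly one branch containing every node, so counting from `nodes` gives the same result as summing over `branches`, which is the property the fast path depends on.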
Performance
Measurements use median `time.perf_counter()` wall time with GC before each repetition; peak Python allocation is measured separately with `tracemalloc` so tracing overhead does not affect timings.
The branched peak remains unchanged because both paths hold at most one full branch snapshot at a time; the saving is reduced allocation churn and graph-walk CPU from eliminating additional snapshots. Since limit checks run synchronously from the interception session, the wall-time reduction also shortens the corresponding event-loop stall.
At higher token density, mask summation becomes a larger share of the work: the 2k-node / 10-branch case at 128 tokens per node measured 8.711 ms → 6.964 ms, saving 1.747 ms (20.1%). This is expected because the change targets graph reconstruction rather than token-mask arithmetic.
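For readers who want to reproduce this style of measurement, a harness along the following lines would match the methodology described in this section. The benchmark scripts themselves are not part of the commit, so the function names `median_wall_time` and `peak_allocation` and the default repetition count are assumptions for illustration only.

```python
import gc
import statistics
import time
import tracemalloc


def median_wall_time(fn, repetitions: int = 25) -> float:
    """Median wall time in seconds, with a GC pass before each repetition."""
    samples = []
    for _ in range(repetitions):
        gc.collect()  # start each repetition from a collected heap
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def peak_allocation(fn) -> int:
    """Peak traced Python allocation in bytes, measured in a separate run.

    Kept apart from the timing loop so tracemalloc's overhead does not
    distort the wall-time samples.
    """
    gc.collect()
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak
```

A call such as `median_wall_time(lambda: check(trace))` followed by `peak_allocation(lambda: check(trace))`, where `check` is whatever limit check is under test, yields the two figures independently.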
Scope
The commit contains only `verifiers/v1/interception/server.py`. Benchmark scripts, focused test scaffolding, project metadata, and lockfiles are intentionally excluded.
Note
Medium Risk
Limit enforcement semantics must stay aligned with existing branch-based totals on branched traces; mistakes would allow extra turns or stop rollouts early.
Overview
`RolloutLimits.reached` no longer relies on `Trace`'s `prompt_len` / `completion_len` / `total_tokens` helpers, which each walked `Trace.branches` independently when several token caps were enabled.
It returns immediately when no token caps are set. For traces in canonical linear append order (each node's parent is the previous index), it counts from `trace.nodes` directly and skips building a branch view. For branched or non-linear graphs, it builds `trace.branches` once and sums per-branch lengths for input, output, and total checks.
`max_turns` precedence and `>=` stop boundaries are unchanged; the only change is how counts are derived for the synchronous pre-turn check in the interception server.
Reviewed by Cursor Bugbot for commit af6db01.
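The check order summarized here (turn limit first, then input, output, and total, each with a `>=` boundary, and no counting at all when every token cap is unset) can be sketched as below. The `Limits` class, its field names, the string return value, and the `count_tokens` callable are hypothetical and are not the actual `RolloutLimits` API; placing the early return after the `max_turns` check is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Limits:
    """Hypothetical stand-in for the PR's `RolloutLimits`."""
    max_turns: Optional[int] = None
    max_input_tokens: Optional[int] = None
    max_output_tokens: Optional[int] = None
    max_total_tokens: Optional[int] = None

    def reached(
        self,
        turns: int,
        count_tokens: Callable[[], Tuple[int, int, int]],
    ) -> Optional[str]:
        """Return the name of the first limit hit, or None.

        `count_tokens` returns (input, output, total) and is only called
        when at least one token cap is set.
        """
        # Turn limit has the highest precedence.
        if self.max_turns is not None and turns >= self.max_turns:
            return "max_turns"
        caps = (
            ("input", self.max_input_tokens),
            ("output", self.max_output_tokens),
            ("total", self.max_total_tokens),
        )
        if all(cap is None for _, cap in caps):
            return None  # early return: no counting work at all
        counts = count_tokens()  # computed once, shared by all three checks
        for (name, cap), count in zip(caps, counts):
            if cap is not None and count >= cap:
                return name
        return None
```

Passing the counter as a callable keeps the sketch independent of any trace type while still showing that the counts are derived at most once per check.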
Note
Reuse rollout token counts across limit checks in `RolloutLimits.reached`
- Return early from `RolloutLimits.reached` when all token caps are `None`, avoiding unnecessary computation.
- For linear traces, count tokens directly from nodes (`node.token_ids` lengths and masked token counts) rather than trace-level aggregates.
- For branched traces, replace trace-level aggregates (`trace.prompt_len`, etc.) with sums across `trace.branches`.
Macroscope summarized af6db01.