Avoid full-context routed-expert padding by xeophon · Pull Request #1794 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-21T08:26:20Z

Overview

Keep the V1 routed-expert payload unchanged and construct the engine's one-row-short padding only for the final node slice. Node alignment and owned-array behavior stay the same, while the transient work scales with the retained suffix instead of the full prompt context.

Design

The engine may omit routing for the final token because no forward pass follows it. Attribution still repeats the last available routing row, but now does so locally, only when the final token-bearing node ends exactly one row beyond the payload. Complete slices continue to receive their existing owned copies, and empty or out-of-range slices remain unset.

This localizes the padding to `arr[off:]` plus one repeated copy of the last row. It avoids materializing a padded full-context array that is immediately discarded after new-node attribution.
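As an illustration of the slicing rule described above, here is a minimal NumPy sketch. It is not the code in `graph.py`: the helper name `slice_with_local_padding`, its signature, and the choice to also pad a node that starts exactly at the end of the payload are assumptions made for the example.

```python
import numpy as np


def slice_with_local_padding(arr, off, end):
    """Return an owned copy of rows [off, end) of arr, or None if unset.

    Hypothetical helper, not the PR's implementation.
    """
    n = arr.shape[0]
    if off < 0 or off >= end:
        return None  # empty slice stays unset
    if end <= n:
        return arr[off:end].copy()  # complete slice: plain owned copy
    if end == n + 1 and 0 < n and off <= n:
        # One row short: allocate only the retained suffix plus one row,
        # then repeat the last available routing row in the final slot.
        out = np.empty((end - off,) + arr.shape[1:], dtype=arr.dtype)
        out[:-1] = arr[off:]
        out[-1] = arr[-1]
        return out
    return None  # out of range by more than one row stays unset


# Small stand-in for the routed-expert payload: 5 tokens, 2 layers, 2 experts.
payload = np.arange(5 * 2 * 2, dtype=np.uint8).reshape(5, 2, 2)
final_node = slice_with_local_padding(payload, 3, 6)
print(final_node.shape)  # (3, 2, 2)
```

The result matches what slicing a globally padded array would give, but the temporary allocation is sized by `end - off` rather than by the full context length.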

Performance

A PEP 723 benchmark used a uint8 routed-expert tensor shaped (1,000,000, 8, 4), with 900,000 reused rows and 100,001 retained new rows (100,000 payload rows plus the one repeated padding row):

| Measurement | Before | After | Saved |
| --- | --- | --- | --- |
| Median synchronous copy time | 0.495 ms | 0.046 ms | 0.449 ms (90.7%) |
| Peak traced allocation | 33.570 MiB | 3.052 MiB | 30.518 MiB (90.9%) |
| Fresh-process maximum RSS | 98.141 MiB | 67.391 MiB | 30.750 MiB (31.3%) |
| Large-array bytes copied | 35,200,064 B | 3,200,032 B | 32,000,032 B (90.9%) |

The retained result remains 3,200,032 bytes in both cases; the savings come entirely from removing the context-sized temporary copy.
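The byte counts in the last table row can be reproduced from the tensor shape alone. The short calculation below is a sketch that assumes the "before" figure is the padded full-context copy plus the retained slice copy; the variable names are illustrative only.

```python
# uint8 tensor shaped (1_000_000, 8, 4): one byte per element.
payload_rows = 1_000_000
row_bytes = 8 * 4
reused_rows = 900_000

# The final node ends one row past the payload, so it retains one extra row.
retained_rows = payload_rows + 1 - reused_rows
retained_bytes = retained_rows * row_bytes

# Assumed decomposition of the old path: full padded copy + retained copy.
padded_full_bytes = (payload_rows + 1) * row_bytes
before_bytes = padded_full_bytes + retained_bytes
after_bytes = retained_bytes
saved_bytes = before_bytes - after_bytes

print(f"{retained_rows:,} rows retained")           # 100,001 rows retained
print(f"before: {before_bytes:,} B")                # before: 35,200,064 B
print(f"after:  {after_bytes:,} B")                 # after:  3,200,032 B
print(f"saved:  {saved_bytes:,} B "
      f"({saved_bytes / before_bytes:.1%})")        # saved:  32,000,032 B (90.9%)
```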

Note

Fix full-context padding in `_attribute_routed_experts` to pad only the final node

Previously, when the routing array was one row short, the entire array was padded globally by appending a copy of the last row. Now, padding is applied locally and only to the final node that consumes the missing row, keeping all other nodes' routed_experts slices aligned to their actual token counts.

Changes are isolated to graph.py.

Macroscope summarized fe33a92.

macroscopeapp · 2026-06-21T08:28:55Z

Approvability

Verdict: Needs human review

This change modifies how expert routing arrays are padded during model inference, shifting from upfront full-context padding to lazy final-node padding. Although it is an optimization, it changes processing logic in ML infrastructure and therefore warrants verification.

Avoid full routed-expert padding copy

fe33a92

xeophon wants to merge 1 commit into feat/nano-as-v1 from codex/avoid-routed-expert-padding
