
feat: persistent runtime pool + warm in-runtime scoring #1766

Closed
mikasenghaas wants to merge 2 commits into feat/nano-as-v1 from feat/v1-persistent-runtime


@mikasenghaas (Member) commented Jun 19, 2026

Summary

Adds a persistent runtime mode + warm in-runtime scoring so the heavy per-rollout costs (sandbox/container provisioning, and re-importing a verifier's deps every rollout) are paid once per run instead of once per rollout.

  • A persistent flag on the base runtime config (BaseRuntimeConfig.persistent, inherited by subprocess/docker/prime/modal; CLI --harness.runtime.persistent true). A persistent runtime is taken from an eval/train-level RuntimePool, reused across rollouts, and torn down only at the end of the run.
  • Wired into Environment.serving() alongside the existing shared-tools / interception pools, and injected into every Rollout via episode(), so it covers both the eval CLI and the env server. The env server's serving() spans the whole run, so during training the pool lives for the entire run.
  • Rollout acquires/releases a pooled runtime instead of make_runtime/stop (ephemeral path unchanged). Acquire happens inside the rollout's try, so a provisioning failure is captured on the trace like a normal start() failure.
  • Runtime.reset() clears the per-rollout workspace between reuses (subprocess recreates /tmp/<name>; docker/prime/modal empty the workdir) so reuse stays isolated; the provisioned resource and warm workers survive. Persistent mode therefore suits tasksets whose per-rollout state is workspace-local (e.g. gsm8k / math).
  • run_uv_script(..., warm=True): the subprocess runtime routes the call to a long-lived worker that loads the script as a module once (so its heavy top-level imports are paid once) and then serves many calls, each mapping args → stdout. Warm mode is gated on persistent (otherwise the worker would die with its rollout). Scripts opt in by exposing main(argv) -> str while staying runnable cold with uv run via an if __name__ == "__main__": print(main(sys.argv[1:])) footer.
  • gsm8k-v1 / aime24-v1 / math-env-v1 opt in: each verify.py is converted to the dual-mode main(argv) shape (the math ones return instead of calling sys.exit, so a reused worker survives the early-out paths), and each correctness reward passes warm=True.
  • Split the runtime factory (RuntimeConfig / make_runtime / runtime_is_local) into runtimes/factory.py so the pool can build runtimes without importing the package (which would create an import cycle); runtimes/__init__ re-exports them.
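For illustration, a minimal sketch of the dual-mode script shape described above. The argv convention (prediction, then gold answer) and the `_extract_number` helper are hypothetical; the real gsm8k/math verify.py scripts use math_verify, which this sketch replaces with a plain numeric comparison so it runs without extra dependencies.

```python
import sys
from typing import Optional


def _extract_number(text: str) -> Optional[float]:
    # Hypothetical parser: drop thousands separators, then parse as a float.
    cleaned = text.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def main(argv: list[str]) -> str:
    """Score one (prediction, gold) pair and return the reward as a string.

    Every path returns instead of calling sys.exit, so a warm worker that
    imported this module once can keep calling main() after an early out.
    """
    if len(argv) != 2:
        return "0.0"  # early out: wrong arity
    pred, gold = _extract_number(argv[0]), _extract_number(argv[1])
    if pred is None or gold is None:
        return "0.0"  # malformed input scores 0.0
    return "1.0" if pred == gold else "0.0"


if __name__ == "__main__":
    # Cold mode: still runnable as a plain script, e.g. `uv run verify.py 72 72`.
    print(main(sys.argv[1:]))
```

The same file serves both paths: a cold `uv run` hits the footer, while a warm worker imports the module and calls `main` directly.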

Why two layers

The provisioning win (skip re-creating a container/sandbox per rollout) needs only the pool, and helps remote runtimes most (the tb2 bench measured 2.5–7.4s sandbox+tunnel provisioning per rollout). The import-once win needs a warm worker, which only survives if the runtime persists — so the pool is the foundation and warm scoring rides on it. For subprocess, provisioning is ~free (start/stop is 0.2 ms); the entire per-rollout cost is the fresh-process import math_verify (~0.22 s), which the warm worker removes.
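The warm-worker side can be sketched as follows, assuming a hypothetical JSON-lines protocol over stdin/stdout (the PR does not spell out the actual wire format); `load_script` and `serve` are illustrative names. Because the module is loaded under a name other than `__main__`, the script's cold-mode footer does not fire, and its top-level imports run exactly once.

```python
import importlib.util
import json
import sys
from types import ModuleType
from typing import TextIO


def load_script(path: str, name: str = "warm_script") -> ModuleType:
    # Import the script once; heavy top-level imports (e.g. math_verify) run only here.
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def serve(module: ModuleType, requests: TextIO, responses: TextIO) -> None:
    # Hypothetical protocol: one JSON array of argv per request line,
    # one JSON object per response line.
    for line in requests:
        if not line.strip():
            continue
        argv = json.loads(line)
        try:
            reply = {"stdout": module.main(argv)}
        except Exception as exc:  # keep the worker alive across bad calls
            reply = {"error": repr(exc)}
        responses.write(json.dumps(reply) + "\n")
        responses.flush()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage sketch: python warm_worker.py path/to/verify.py
    serve(load_script(sys.argv[1]), sys.stdin, sys.stdout)
```

Catching exceptions per request is a design choice for the sketch: one bad call yields an error line instead of killing a worker that later rollouts rely on.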

Verification

uv run python bench/persistent_runtime.py <N> <concurrency> — runs gsm8k's real verify.py through the framework API (no model generation, to isolate runtime/scoring overhead). ephemeral = make_runtime + start + run_uv_script(cold) + stop per call (today's per-rollout scoring path); persistent = pool.acquire + run_uv_script(warm=True) + release.

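| N | concurrency | ephemeral (total, per call) | persistent (total, per call) | speedup |
|---:|---:|---|---|---:|
| 256 | 128 | 3.95 s (15.5 ms/call) | 1.21 s (4.7 ms/call) | 3.3x |
| 1000 | 128 | 15.31 s (15.3 ms/call) | 1.27 s (1.27 ms/call) | 12.0x |
| 4000 | 128 | 63.28 s (15.8 ms/call) | 2.16 s (0.54 ms/call) | 29.4x |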

The win grows with N: ephemeral stays pinned at ~15.3 ms/call (the per-call import), while persistent's one-time worker import (one per pooled runtime) amortizes away, so the speedup is largest when N ≫ concurrency (as in a long training run).
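The per-call figures in the table equal total wall time divided by N (e.g. 1.21 s / 256 ≈ 4.7 ms). A rough sketch of a harness that measures this way, assuming asyncio and a hypothetical `call` coroutine that wraps either path (make_runtime + start + run_uv_script + stop, or pool.acquire + run_uv_script(warm=True) + release); it is not the actual bench/persistent_runtime.py.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def bench(
    call: Callable[[], Awaitable[str]], n: int, concurrency: int
) -> tuple[float, float]:
    """Run `call` n times with at most `concurrency` in flight.

    Returns (total wall-clock seconds, wall-clock ms per call).
    """
    sem = asyncio.Semaphore(concurrency)

    async def one() -> str:
        async with sem:  # cap in-flight calls at `concurrency`
            return await call()

    start = time.perf_counter()
    await asyncio.gather(*(one() for _ in range(n)))
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / n * 1000
```

Measured this way, ms/call is an amortized throughput number, not per-call latency, which is why a one-time import per pooled runtime shrinks it as N grows.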

Correctness: a small gsm8k-v1 eval with --harness.runtime.persistent true vs default produces identical rewards (reward=1.000, 0 errors), logs rollouts as (pooled), and tears the pool down at the end (runtime pool: tearing down N persistent runtime(s)). Warm worker output verified correct across edge cases (e.g. a comma-grouped answer like 1,000; malformed input → 0.0). ruff clean; tests/v1/test_configs.py green; full test suite collects.

Scope / follow-ups

  • Warm workers are currently implemented on the subprocess runtime. Persistence works on every runtime (and on remote runtimes also skips the per-rollout sandbox+tunnel provisioning). Warm workers on docker/prime/modal (a small in-runtime HTTP service reached via the existing expose/host_endpoint plumbing — structurally like a tool server) are a natural follow-up.
  • Persistent mode reuses a runtime across rollouts/tasks; the per-rollout workspace is reset, but anything a taskset installs outside the workspace persists — so it's opt-in and best for workspace-local state. Tasks with per-task images get a separate pool per image (pool keyed by resolved config).
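A minimal asyncio sketch of a pool keyed by resolved config, assuming a hypothetical `PooledRuntime` protocol with `start`/`reset`/`stop` coroutines (the real RuntimePool API may differ): acquire reuses an idle runtime for the same key or provisions a new one, release resets the workspace and returns the runtime to the idle list, and teardown stops everything at the end of the run.

```python
import asyncio
from collections import defaultdict
from typing import Callable, Hashable, Protocol


class PooledRuntime(Protocol):  # hypothetical interface for the sketch
    async def start(self) -> None: ...
    async def reset(self) -> None: ...
    async def stop(self) -> None: ...


class KeyedRuntimePool:
    """Hypothetical sketch: one idle list per resolved-config key (e.g. per image)."""

    def __init__(self, factory: Callable[[Hashable], PooledRuntime]) -> None:
        self._factory = factory
        self._idle: dict = defaultdict(list)  # key -> idle runtimes
        self._all: list = []  # every runtime provisioned, for teardown

    async def acquire(self, key: Hashable) -> PooledRuntime:
        idle = self._idle[key]
        if idle:
            return idle.pop()  # reuse: provisioning already paid
        runtime = self._factory(key)
        await runtime.start()  # provision once per pooled runtime
        self._all.append(runtime)
        return runtime

    async def release(self, key: Hashable, runtime: PooledRuntime) -> None:
        await runtime.reset()  # wipe the per-rollout workspace only
        self._idle[key].append(runtime)

    async def teardown(self) -> None:
        # End of run: stop every runtime the pool created.
        await asyncio.gather(*(r.stop() for r in self._all))
        self._all.clear()
        self._idle.clear()
```

Because there is no `await` between checking the idle list and popping from it, two concurrent acquires on one event loop cannot receive the same runtime.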

Breaking

None. persistent defaults to False (ephemeral, today's behavior); warm defaults to False. run_uv_script gains an optional trailing warm kwarg.

mikasenghaas and others added 2 commits June 19, 2026 05:41
Add `persistent` to the base runtime config: a persistent runtime is taken from an
eval/train-level pool, reused across rollouts (per-rollout workspace reset between
uses), and torn down only at the end of the run — so expensive provisioning and warm
in-runtime workers are paid once, not per rollout.

- BaseRuntimeConfig.persistent (all four runtime configs); RuntimePool wired into
  Environment.serving() (eval + env-server/train) and injected into rollouts; Rollout
  acquires/releases instead of make_runtime/stop. The env server's serving() spans the
  whole run, so during training the pool lives for the entire run.
- Runtime.reset() clears the per-rollout workspace (subprocess/docker/prime/modal) so
  reuse stays isolated; warm workers + the provisioned resource survive.
- run_uv_script(warm=True): the subprocess runtime routes to a long-lived worker that
  imports the script's deps once (gated on persistent — the worker dies with an
  ephemeral runtime). Scripts opt in via main(argv)->str, staying uv-run-able cold.
  gsm8k verify.py converted + its reward passes warm=True.
- Split the runtime factory (RuntimeConfig/make_runtime) into runtimes/factory.py so
  the pool can build runtimes without importing the package (which would be an import cycle).
- bench/persistent_runtime.py: gsm8k scoring ~12x faster at 1000 rollouts / 128
  concurrency (3.3x@256 → 29x@4000), import paid once per pooled runtime not per call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same math-verify pattern as gsm8k — convert verify.py to main(argv)->str (early
returns instead of sys.exit, so a reused warm worker survives them) + __main__ footer,
and pass warm=True. So aime/math scoring pays import math_verify once per pooled runtime
on a persistent runtime, like gsm8k.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review comment on verifiers/v1/runtimes/subprocess.py (around line 186):

```python
warm worker would die with the rollout that spawned it, so it only pays off when reused)."""
data = script.encode() if isinstance(script, str) else script
interpreter, path = await self._resolve_interpreter(data)
if warm and self.config.persistent:
```


🟡 Medium runtimes/subprocess.py:186

When warm=True and self.config.persistent is true, the env parameter is silently dropped — _run_warm has no env parameter and the warm worker protocol has no mechanism to set per-call environment variables. A caller passing env={"FOO": "bar"} with warm=True will get execution without those variables applied, with no error or warning. Consider falling back to the non-warm path when env is provided.

Suggested change:

```diff
-if warm and self.config.persistent:
+if warm and self.config.persistent and not env:
```

Evidence trail:
verifiers/v1/runtimes/subprocess.py lines 170-188 (REVIEWED_COMMIT): `run_uv_script` signature accepts `env`, line 186-187 warm path drops it. Lines 190-204: `_run_warm` has no `env` parameter. Lines 206-229: `_spawn_warm` and the warm worker protocol have no mechanism for per-call env vars. Line 188: non-warm path uses `env` via `self.run(...)`.
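One way to express the reviewer's suggestion, written as a hypothetical standalone helper rather than the actual run_uv_script code (`SketchConfig` and `pick_path` are illustrative names): the warm worker is used only when warm is requested, the runtime is persistent, and no per-call env is set; otherwise the call falls back to the cold path, with a warning when env forced the fallback.

```python
import warnings
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SketchConfig:  # hypothetical stand-in for the runtime config
    persistent: bool = False


def pick_path(config: SketchConfig, warm: bool, env: Optional[Mapping[str, str]]) -> str:
    """Decide between the warm worker and a cold `uv run` for one call."""
    if not warm:
        return "cold"
    if not config.persistent:
        # Warm is gated on persistent: on an ephemeral runtime the worker
        # would die with its rollout before it could be reused.
        return "cold"
    if env:
        # The warm worker has no per-call env channel; fall back instead of
        # silently dropping the caller's variables.
        warnings.warn("env is set with warm=True; using the cold path", stacklevel=2)
        return "cold"
    return "warm"
```

Warning only in the env case keeps the default (warm=False, or a non-persistent runtime) quiet, matching the PR's opt-in behavior.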

@mikasenghaas (Member, Author):

issue was nfs, not code
