
Fix: Episode freeze with staggered agent death in multi-agent environments#2

Open
yoosunghong wants to merge 2 commits into GPUOpen-LibrariesAndSDKs:main from yoosunghong:fix/staggered-death-freeze

Conversation

@yoosunghong

Hello, I would like to express my sincere gratitude for your dedication to this project. Schola has been an invaluable tool for my research.

While using it, I identified a minor issue in a multi-agent setup and prepared a potential fix. I'm not sure if this is the best approach, but I wanted to share it in hopes that it might be useful. I would greatly appreciate any feedback you may have.


Problem

In multi-agent environments using Schola + RLlib with NEXT_STEP autoreset, agents dying at different timesteps cause two compounding failures:

  1. Episode Freeze (Python → C++): When an agent dies, RLlib stops sending its actions. Unreal's Step() receives an incomplete action map, causing the environment to stall.
  2. Stale Data Leak (C++ → Python): Unreal kept re-emitting observations for dead agents because Step() could overwrite their bTerminated flags back to false, preventing AllAgentsCompleted() from ever returning true.

Result: Any multi-agent episode with staggered deaths hangs permanently after the first agent dies.


Root Cause

  • Python (RayEnv.step): Raw RLlib actions (live agents only) were forwarded directly without padding for dead agents.
  • C++ (AbstractGymConnector): Step() was called with all received actions, allowing environment implementations to accidentally clear terminal flags on dead agents.

Fix Details

Python Side (schola/rllib/env.py)

  • _make_noop_action(): Generates zero-valued actions matching any action space structure.
  • Action Padding: RayEnv and RayVecEnv now pad previously-dead agents with no-op actions.
  • Data Filtering: Observations/rewards for already-dead agents are filtered out from the response.
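The no-op helper is easiest to picture as a recursive walk over the action structure. The sketch below is illustrative only, not the PR's actual code: it zeroes a *sample* action rather than deriving the no-op from the Gymnasium space object, and all names are assumptions.

```python
import numpy as np

def make_noop_action(sample_action):
    """Return a zero-valued action with the same nested structure as
    sample_action (hypothetical stand-in for the PR's _make_noop_action)."""
    if isinstance(sample_action, dict):
        return {k: make_noop_action(v) for k, v in sample_action.items()}
    if isinstance(sample_action, tuple):
        return tuple(make_noop_action(v) for v in sample_action)
    if isinstance(sample_action, np.ndarray):
        return np.zeros_like(sample_action)  # Box / MultiDiscrete payloads
    return type(sample_action)(0)            # Discrete scalars
```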

C++ Side (AbstractGymConnector.cpp)

  • Snapshot & Filter: In the NextStep branch, it now snapshots terminal agents before Step() and builds a LiveActions map.
  • State Restoration: Manually restores terminal flags for previously-dead agents post-step to prevent accidental "revival."
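In Python terms, the control flow of this C++ guard looks roughly like the sketch below. It is illustrative only: `AgentState` and `guarded_step` are stand-ins for Schola's FAgentState and the connector's call into Blueprint Step(), not the actual API.

```python
class AgentState:
    """Minimal stand-in for Schola's per-agent state (FAgentState)."""
    def __init__(self):
        self.terminated = False
        self.truncated = False

def guarded_step(step_fn, actions, agent_states):
    # Snapshot flags for agents that are already terminal before this step.
    snapshot = {aid: (st.terminated, st.truncated)
                for aid, st in agent_states.items()
                if st.terminated or st.truncated}
    # Forward only live-agent actions to the underlying Step().
    live_actions = {aid: a for aid, a in actions.items() if aid not in snapshot}
    step_fn(live_actions)
    # Restore terminal flags so a careless Step() cannot "revive" dead agents.
    for aid, (term, trunc) in snapshot.items():
        agent_states[aid].terminated = term
        agent_states[aid].truncated = trunc
```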

Testing & Verification

Unit Tests (No Unreal Required)

python -m pytest Test/rllib/test_staggered_death.py -v
  • Covers: No-op generation, staggered death flow, padding logic, and __all__ computation.

Integration Test (UE5.6 Environment)

Verified against a real UE5.6 environment with 3 agents dying at steps 5, 10, and 15.

  • ep_len_mean: 14.0 (expected 15, tolerance ±2)
  • Episodes completed: 274 (30 PPO iterations)
  • Hang detected: No
  • Reproduction Project: https://github.com/yoosunghong/ScholaStaggeredTest

📝 Compliance Checklist

  • Python code is formatted using Black.
  • C++ code follows the Unreal Style Guide.
  • All new tests pass locally.

@amd-alexcann amd-alexcann self-requested a review April 16, 2026 14:27
@amd-alexcann
Collaborator

amd-alexcann commented Apr 16, 2026

Hi @yoosunghong thanks for reaching out, we are in the process of moving to open development so you should see more work from us on here in the near future, and we would love to see further suggestions and contributions from you!

I have a concern about padding the actions with zero-valued no-ops, given that in applications like hierarchical RL with Ray, distinguishing between a no-op and a valid action will be relevant. Can this be handled by checking if an action exists in the map before applying it to the agent?

For the filtering of the Actions/terminateds and truncateds, can we implement the handling in TScholaEnvironment or a helper function? That way we won't need to redo the handling for the other reset protocols as well (e.g. NextStep).

@yoosunghong yoosunghong force-pushed the fix/staggered-death-freeze branch from 1b91e29 to 572375a Compare April 17, 2026 02:38
Snapshot previously-dead agents before delegating to the Blueprint
Step(), forward only live-agent actions, and restore the full
pre-step snapshot afterwards. Replaces the per-field flag patch
and the Python-side no-op padding with a single wrapper-level
guard that covers every reset protocol (Disabled, SameStep,
NextStep) without duplication in AbstractGymConnector.
Restoring the full FAgentState (rather than cherry-picking
bTerminated/bTruncated/Reward) prevents stale observations or
future FAgentState members from leaking out of the Blueprint
boundary for agents that are already terminal.
@yoosunghong yoosunghong force-pushed the fix/staggered-death-freeze branch from 572375a to 5f844e1 Compare April 18, 2026 03:16
centralise dead-agent filter in BaseRayEnv
Add BaseRayEnv._filter_dead_agents() static helper that strips already-dead
agents from all five gRPC return dicts (obs, rewards, terminateds, truncateds,
infos) before they reach RLlib.  Previously only RayEnv had inline protection;
RayVecEnv was unprotected and would crash identically under staggered death.
RayVecEnv.step() skips the filter for any env slot whose _reset_on_next_step
flag is True, preventing the prior episode's dead-agent set from stripping the
fresh observations Unreal returns on the NEXT_STEP autoreset transition.
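A minimal sketch of what such a shared helper might look like (the signature and names are assumptions, not the actual Schola code):

```python
def filter_dead_agents(dead_agents, obs, rewards, terminateds, truncateds, infos):
    """Strip agents that were already dead from all five per-agent dicts
    before they reach RLlib (sketch of a _filter_dead_agents-style helper)."""
    def strip(d):
        # Special keys such as "__all__" survive because they are never
        # recorded in dead_agents.
        return {aid: v for aid, v in d.items() if aid not in dead_agents}
    return (strip(obs), strip(rewards), strip(terminateds),
            strip(truncateds), strip(infos))
```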
@yoosunghong
Author

yoosunghong commented Apr 18, 2026

Hi @amd-alexcann,

Thank you so much for the warm welcome and the insightful feedback! I am glad to hear about the move to open development and would love to continue contributing to Schola's growth.

I have pushed an update that addresses your concerns:

  1. No-op padding → Action map check
    I have entirely removed _make_noop_action() and the zero-padding logic. As you suggested, the C++ side now filters dead agents by checking for their existence in the action map before passing them to Execute_Step(). This ensures that no-ops and valid actions remain distinguishable, maintaining compatibility with hierarchical RL.

  2. Centralized filtering helper
    I moved the dead-agent filtering logic into a shared static helper, BaseRayEnv._filter_dead_agents(). Both RayEnv.step() and RayVecEnv.step() now utilize this helper. This prevents duplication and ensures that all protocols (including NEXT_STEP) are covered. Interestingly, this refactoring revealed that RayVecEnv previously lacked dead-agent filtering, which is now resolved.

Note on RayVecEnv: I added a guard to skip filtering when _reset_on_next_step is True. This ensures that when Unreal returns fresh observations during a NEXT_STEP autoreset transition, they aren't accidentally stripped by the previous episode's termination state.
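The membership check described in point 1 above can be sketched like this (a Python stand-in for the C++ loop around Execute_Step(); all names are illustrative):

```python
def apply_actions(actions, agents):
    """Apply only the actions actually present in the map. Agents RLlib has
    stopped acting for (e.g. dead agents) are skipped, so a genuine absence
    of an action is never confused with a zero-valued one."""
    for agent_id, act_fn in agents.items():
        if agent_id in actions:   # existence check instead of no-op padding
            act_fn(actions[agent_id])
```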

Testing & Verification

All 10 staggered-death unit tests passed.

Regression suites are clean.

Re-verified the Unreal integration test end-to-end (both RayEnv and RayVecEnv) to confirm the hang is resolved without side effects.

I look forward to your further thoughts!
