
Fix: Episode freeze with staggered agent death in multi-agent environments#2

Open
yoosunghong wants to merge 2 commits into GPUOpen-LibrariesAndSDKs:main from yoosunghong:fix/staggered-death-freeze

Conversation

@yoosunghong

Hello, I would like to express my sincere gratitude for your dedication to this project. Schola has been an invaluable tool for my research.

While using it, I identified a minor issue in a multi-agent setup and prepared a potential fix. I'm not sure if this is the best approach, but I wanted to share it in hopes that it might be useful. I would greatly appreciate any feedback you may have.


Problem

In multi-agent environments using Schola + RLlib with NEXT_STEP autoreset, agents dying at different timesteps cause two compounding failures:

  1. Episode Freeze (Python → C++): When an agent dies, RLlib stops sending its actions. Unreal's Step() receives an incomplete action map, causing the environment to stall.
  2. Stale Data Leak (C++ → Python): Unreal kept re-emitting observations for dead agents because Step() could overwrite their bTerminated flags back to false, preventing AllAgentsCompleted() from ever returning true.

Result: Any multi-agent episode with staggered deaths hangs permanently after the first agent dies.


Root Cause

  • Python (RayEnv.step): Raw RLlib actions (live agents only) were forwarded directly without padding for dead agents.
  • C++ (AbstractGymConnector): Step() was called with all received actions, allowing environment implementations to accidentally clear terminal flags on dead agents.

Fix Details

Python Side (schola/rllib/env.py)

  • _make_noop_action(): Generates zero-valued actions matching any action space structure.
  • Action Padding: RayEnv and RayVecEnv now pad previously-dead agents with no-op actions.
  • Data Filtering: Observations/rewards for already-dead agents are filtered out from the response.
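The no-op helper is easiest to picture as a recursive walk over the action structure. The sketch below is illustrative only, not the PR's actual code: it zeroes a *sample* action rather than deriving the no-op from the Gymnasium space object, and all names are assumptions.

```python
import numpy as np

def make_noop_action(sample_action):
    """Return a zero-valued action with the same nested structure as
    sample_action (hypothetical stand-in for the PR's _make_noop_action)."""
    if isinstance(sample_action, dict):
        return {k: make_noop_action(v) for k, v in sample_action.items()}
    if isinstance(sample_action, tuple):
        return tuple(make_noop_action(v) for v in sample_action)
    if isinstance(sample_action, np.ndarray):
        return np.zeros_like(sample_action)  # Box / MultiDiscrete payloads
    return type(sample_action)(0)            # Discrete scalars
```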

C++ Side (AbstractGymConnector.cpp)

  • Snapshot & Filter: In the NextStep branch, it now snapshots terminal agents before Step() and builds a LiveActions map.
  • State Restoration: Manually restores terminal flags for previously-dead agents post-step to prevent accidental "revival."
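In Python terms, the control flow of this C++ guard looks roughly like the sketch below. It is illustrative only: `AgentState` and `guarded_step` are stand-ins for Schola's FAgentState and the connector's call into Blueprint Step(), not the actual API.

```python
class AgentState:
    """Minimal stand-in for Schola's per-agent state (FAgentState)."""
    def __init__(self):
        self.terminated = False
        self.truncated = False

def guarded_step(step_fn, actions, agent_states):
    # Snapshot flags for agents that are already terminal before this step.
    snapshot = {aid: (st.terminated, st.truncated)
                for aid, st in agent_states.items()
                if st.terminated or st.truncated}
    # Forward only live-agent actions to the underlying Step().
    live_actions = {aid: a for aid, a in actions.items() if aid not in snapshot}
    step_fn(live_actions)
    # Restore terminal flags so a careless Step() cannot "revive" dead agents.
    for aid, (term, trunc) in snapshot.items():
        agent_states[aid].terminated = term
        agent_states[aid].truncated = trunc
```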

Testing & Verification

Unit Tests (No Unreal Required)

python -m pytest Test/rllib/test_staggered_death.py -v
  • Covers: No-op generation, staggered death flow, padding logic, and __all__ computation.

Integration Test (UE5.6 Environment)

Verified against a real UE5.6 environment with 3 agents dying at steps 5, 10, and 15.

  • ep_len_mean: 14.0 (expected 15, tolerance ±2)
  • Episodes completed: 274 (30 PPO iterations)
  • Hang detected: No
  • Reproduction Project: https://github.com/yoosunghong/ScholaStaggeredTest

📝 Compliance Checklist

  • Python code is formatted using Black.
  • C++ code follows the Unreal Style Guide.
  • All new tests pass locally.

@amd-alexcann amd-alexcann self-requested a review April 16, 2026 14:27
@amd-alexcann
Collaborator

amd-alexcann commented Apr 16, 2026

Hi @yoosunghong thanks for reaching out, we are in the process of moving to open development so you should see more work from us on here in the near future, and we would love to see further suggestions and contributions from you!

I have a concern about padding the actions with zero-valued no-ops, given that in applications like hierarchical RL with Ray, distinguishing between a no-op and a valid action will be relevant. Can this be handled by checking if an action exists in the map before applying it to the agent?

For the filtering of the Actions/terminateds and truncateds, can we implement the handling in TScholaEnvironment or a helper function? That way we won't need to redo the handling for the other reset protocols as well (e.g. NextStep).

@yoosunghong yoosunghong force-pushed the fix/staggered-death-freeze branch from 1b91e29 to 572375a Compare April 17, 2026 02:38
Snapshot previously-dead agents before delegating to the Blueprint
Step(), forward only live-agent actions, and restore the full
pre-step snapshot afterwards. Replaces the per-field flag patch
and the Python-side no-op padding with a single wrapper-level
guard that covers every reset protocol (Disabled, SameStep,
NextStep) without duplication in AbstractGymConnector.
Restoring the full FAgentState (rather than cherry-picking
bTerminated/bTruncated/Reward) prevents stale observations or
future FAgentState members from leaking out of the Blueprint
boundary for agents that are already terminal.
@yoosunghong yoosunghong force-pushed the fix/staggered-death-freeze branch from 572375a to 5f844e1 Compare April 18, 2026 03:16
centralise dead-agent filter in BaseRayEnv
Add BaseRayEnv._filter_dead_agents() static helper that strips already-dead
agents from all five gRPC return dicts (obs, rewards, terminateds, truncateds,
infos) before they reach RLlib.  Previously only RayEnv had inline protection;
RayVecEnv was unprotected and would crash identically under staggered death.
RayVecEnv.step() skips the filter for any env slot whose _reset_on_next_step
flag is True, preventing the prior episode's dead-agent set from stripping the
fresh observations Unreal returns on the NEXT_STEP autoreset transition.
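A minimal sketch of what such a shared helper might look like (the signature and names are assumptions, not the actual Schola code):

```python
def filter_dead_agents(dead_agents, obs, rewards, terminateds, truncateds, infos):
    """Strip agents that were already dead from all five per-agent dicts
    before they reach RLlib (sketch of a _filter_dead_agents-style helper)."""
    def strip(d):
        # Special keys such as "__all__" survive because they are never
        # recorded in dead_agents.
        return {aid: v for aid, v in d.items() if aid not in dead_agents}
    return (strip(obs), strip(rewards), strip(terminateds),
            strip(truncateds), strip(infos))
```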
@yoosunghong
Author

yoosunghong commented Apr 18, 2026

Hi @amd-alexcann,

Thank you so much for the warm welcome and the insightful feedback! I am glad to hear about the move to open development and would love to continue contributing to Schola's growth.

I have pushed an update that addresses your concerns:

  1. No-op padding → Action map check
    I have entirely removed _make_noop_action() and the zero-padding logic. As you suggested, the C++ side now filters dead agents by checking for their existence in the action map before passing them to Execute_Step(). This ensures that no-ops and valid actions remain distinguishable, maintaining compatibility with hierarchical RL.

  2. Centralized filtering helper
    I moved the dead-agent filtering logic into a shared static helper, BaseRayEnv._filter_dead_agents(). Both RayEnv.step() and RayVecEnv.step() now utilize this helper. This prevents duplication and ensures that all protocols (including NEXT_STEP) are covered. Interestingly, this refactoring revealed that RayVecEnv previously lacked dead-agent filtering, which is now resolved.

Note on RayVecEnv: I added a guard to skip filtering when _reset_on_next_step is True. This ensures that when Unreal returns fresh observations during a NEXT_STEP autoreset transition, they aren't accidentally stripped by the previous episode's termination state.
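The membership check described in point 1 above can be sketched like this (a Python stand-in for the C++ loop around Execute_Step(); all names are illustrative):

```python
def apply_actions(actions, agents):
    """Apply only the actions actually present in the map. Agents RLlib has
    stopped acting for (e.g. dead agents) are skipped, so a genuine absence
    of an action is never confused with a zero-valued one."""
    for agent_id, act_fn in agents.items():
        if agent_id in actions:   # existence check instead of no-op padding
            act_fn(actions[agent_id])
```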

Testing & Verification

All 10 staggered-death unit tests passed.

Regression suites are clean.

Re-verified the Unreal integration test end-to-end (both RayEnv and RayVecEnv) to confirm the hang is resolved without side effects.

I look forward to your further thoughts!
