Fix: Episode freeze with staggered agent death in multi-agent environments #2
Hi @yoosunghong, thanks for reaching out. We are in the process of moving to open development, so you should see more work from us on here in the near future, and we would love to see further suggestions and contributions from you! I have a concern about padding the actions with zero-based no-ops: in applications like hierarchical RL with Ray, distinguishing between a no-op and a valid action will be relevant. Can this be handled by checking whether an action exists in the map before applying it to the agent? For the filtering of the actions, terminateds, and truncateds, can we implement the handling in TScholaEnvironment or a helper function? That way we won't need to redo the handling for the other reset protocols as well (e.g. NextStep).
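A minimal Python sketch of the "check whether an action exists in the map" idea, for illustration only. The function name `apply_present_actions` and the `apply` callback are hypothetical and not part of Schola; the point is that an absent entry means "no action this step", which stays distinguishable from a valid zero-valued action.

```python
from typing import Any, Callable, Dict, Hashable, List


def apply_present_actions(
    agent_ids: List[Hashable],
    actions: Dict[Hashable, Any],
    apply: Callable[[Hashable, Any], None],
) -> List[Hashable]:
    """Apply an action only to agents that have an entry in `actions`.

    Hypothetical helper: returns the ids that were skipped because no
    action was supplied for them (for example, agents that already died).
    """
    skipped = []
    for agent_id in agent_ids:
        if agent_id in actions:
            # A present entry is always a real action, even if its value is 0.
            apply(agent_id, actions[agent_id])
        else:
            # No entry: leave the agent untouched instead of inventing a no-op.
            skipped.append(agent_id)
    return skipped
```

With this shape, an action value of `0` for a live agent is applied as a real action, while a dead agent simply has no key in the map.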
1b91e29 to 572375a
Snapshot previously-dead agents before delegating to the Blueprint Step(), forward only live-agent actions, and restore the full pre-step snapshot afterwards. Replaces the per-field flag patch and the Python-side no-op padding with a single wrapper-level guard that covers every reset protocol (Disabled, SameStep, NextStep) without duplication in AbstractGymConnector. Restoring the full FAgentState (rather than cherry-picking bTerminated/bTruncated/Reward) prevents stale observations or future FAgentState members from leaking out of the Blueprint boundary for agents that are already terminal.
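The snapshot, forward, restore sequence in this commit message can be sketched in Python as follows. This is an illustration of the control flow only, not the C++ code from the PR: `AgentState` is a hypothetical stand-in for FAgentState, and `guarded_step` and `inner_step` are made-up names.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class AgentState:
    # Hypothetical stand-in for FAgentState; the real struct may differ.
    observation: Any = None
    reward: float = 0.0
    terminated: bool = False
    truncated: bool = False


def guarded_step(
    states: Dict[str, AgentState],
    actions: Dict[str, Any],
    inner_step: Callable[[Dict[str, Any]], None],
) -> None:
    """Run `inner_step` so that it cannot alter already-terminal agents."""
    # 1. Snapshot the full state of every agent that is already terminal.
    snapshot = {
        agent_id: copy.deepcopy(state)
        for agent_id, state in states.items()
        if state.terminated or state.truncated
    }
    # 2. Forward only the actions that belong to live agents.
    live_actions = {
        agent_id: action
        for agent_id, action in actions.items()
        if agent_id in states and agent_id not in snapshot
    }
    inner_step(live_actions)
    # 3. Restore the whole pre-step state, not just selected fields.
    for agent_id, saved in snapshot.items():
        states[agent_id] = saved
```

Restoring the whole object rather than individual fields means a field added to the state later is covered without touching the guard.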
572375a to 5f844e1
centralise dead-agent filter in BaseRayEnv

Add BaseRayEnv._filter_dead_agents() static helper that strips already-dead agents from all five gRPC return dicts (obs, rewards, terminateds, truncateds, infos) before they reach RLlib. Previously only RayEnv had inline protection; RayVecEnv was unprotected and would crash identically under staggered death. RayVecEnv.step() skips the filter for any env slot whose _reset_on_next_step flag is True, preventing the prior episode's dead-agent set from stripping the fresh observations Unreal returns on the NEXT_STEP autoreset transition.
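A rough sketch of what such a filter and the per-slot skip could look like. The signatures below are assumptions for illustration and are not the actual BaseRayEnv._filter_dead_agents() from the PR; tracking which agents are dead is left to the caller.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

StepDicts = Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any], Dict[str, Any], Dict[str, Any]]


def filter_dead_agents(
    dead: Set[str],
    obs: Dict[str, Any],
    rewards: Dict[str, Any],
    terminateds: Dict[str, Any],
    truncateds: Dict[str, Any],
    infos: Dict[str, Any],
) -> StepDicts:
    """Drop entries for agents in `dead` from all five step dicts."""

    def keep(d: Dict[str, Any]) -> Dict[str, Any]:
        # "__all__" is RLlib's episode-level key, not an agent id, so keep it.
        return {k: v for k, v in d.items() if k == "__all__" or k not in dead}

    return keep(obs), keep(rewards), keep(terminateds), keep(truncateds), keep(infos)


def filter_vector_step(
    results: Sequence[StepDicts],
    dead_sets: Sequence[Set[str]],
    reset_on_next_step: Sequence[bool],
) -> List[StepDicts]:
    """Filter each env slot, skipping slots that are mid-autoreset."""
    filtered = []
    for result, dead, resetting in zip(results, dead_sets, reset_on_next_step):
        # A resetting slot carries fresh observations for a new episode, so
        # the previous episode's dead set must not be applied to it.
        filtered.append(result if resetting else filter_dead_agents(dead, *result))
    return filtered
```

Passing the dead set in explicitly keeps the helper stateless, which is one way a single function could serve both the single-env and vectorised wrappers.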
Hi @amd-alexcann, Thank you so much for the warm welcome and the insightful feedback! I am glad to hear about the move to open development and would love to continue contributing to Schola's growth. I have pushed an update that addresses your concerns:
Note on RayVecEnv: I added a guard to skip filtering when _reset_on_next_step is True. This ensures that when Unreal returns fresh observations during a NEXT_STEP autoreset transition, they aren't accidentally stripped by the previous episode's termination state.
Testing & Verification
- All 10 staggered-death unit tests passed.
- Regression suites are clean.
- Re-verified the Unreal integration test end-to-end (both RayEnv and RayVecEnv) to confirm the hang is resolved without side effects.
I look forward to your further thoughts!
Hello, I would like to express my sincere gratitude for your dedication to this project. Schola has been an invaluable tool for my research.
While using it, I identified a minor issue in a multi-agent setup and prepared a potential fix. I'm not sure if this is the best approach, but I wanted to share it in hopes that it might be useful. I would greatly appreciate any feedback you may have.
Problem
In multi-agent environments using Schola + RLlib with NEXT_STEP autoreset, agents dying at different timesteps cause two compounding failures:
- Step() receives an incomplete action map, causing the environment to stall.
- Step() could overwrite bTerminated flags back to false, preventing AllAgentsCompleted() from ever returning true.
Root Cause
- RayEnv.step: Raw RLlib actions (live agents only) were forwarded directly without padding for dead agents.
- AbstractGymConnector: Step() was called with all received actions, allowing environment implementations to accidentally clear terminal flags on dead agents.
Fix Details
Python Side (schola/rllib/env.py)
- _make_noop_action(): Generates zero-valued actions matching any action space structure.
- RayEnv and RayVecEnv now pad previously-dead agents with no-op actions.
C++ Side (AbstractGymConnector.cpp)
- In the NextStep branch, it now snapshots terminal agents before Step() and builds a LiveActions map.
Testing & Verification
Unit Tests (No Unreal Required)
__all__ computation.
Integration Test (UE5.6 Environment)
Verified against a real UE5.6 environment with 3 agents dying at steps 5, 10, and 15.
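For readers without an Unreal setup, the staggered-death scenario can be approximated with a small pure-Python simulation. This is a toy model, not the integration test itself: the agent names and the `run_staggered_episode` function are hypothetical, and only the death steps (5, 10, 15) come from the description above.

```python
from typing import Dict, List, Set

# Death steps taken from the integration test; agent names are made up.
DEATH_STEP = {"agent_0": 5, "agent_1": 10, "agent_2": 15}


def run_staggered_episode(
    death_step: Dict[str, int], max_steps: int = 100
) -> List[Dict[str, bool]]:
    """Return the per-step `terminateds` dicts for one simulated episode."""
    dead: Set[str] = set()
    history = []
    for step in range(1, max_steps + 1):
        # Only agents that have not died yet report a result this step.
        live = [agent for agent in death_step if agent not in dead]
        terminateds = {agent: step >= death_step[agent] for agent in live}
        dead.update(agent for agent, done in terminateds.items() if done)
        # The episode ends only once every agent has terminated.
        terminateds["__all__"] = len(dead) == len(death_step)
        history.append(terminateds)
        if terminateds["__all__"]:
            break
    return history
```

In this model each agent reports `True` exactly once, on the step it dies, and is absent from later dicts; `__all__` becomes `True` only on the step the last agent dies, which is the behaviour the fix is meant to guarantee.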
📝 Compliance Checklist