Commit 572375a

Reviewer Feedback Incorporated #1
Centralise staggered-death handling in TScholaEnvironment::Step

Snapshot previously-dead agents before delegating to the Blueprint Step(), forward only live-agent actions, and restore the full pre-step snapshot afterwards. Replaces the per-field flag patch and the Python-side no-op padding with a single wrapper-level guard that covers every reset protocol (Disabled, SameStep, NextStep) without duplication in AbstractGymConnector.

Restoring the full FAgentState (rather than cherry-picking bTerminated/bTruncated/Reward) prevents stale observations or future FAgentState members from leaking out of the Blueprint boundary for agents that are already terminal.
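The difference between the per-field patch and the full-snapshot restore described in the commit message can be sketched in plain C++. This is an illustration only: `AgentState`, `StateMap`, `RestorePerField`, and `RestoreFull` are hypothetical stand-ins built on `std::map`, not the Schola types or functions, and the `observation` member is an assumed example of a field the per-field patch would miss.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for FAgentState; the members are illustrative only.
struct AgentState
{
    bool terminated = false;
    bool truncated = false;
    float reward = 0.0f;
    std::vector<float> observation;
};

using StateMap = std::map<std::string, AgentState>;

// Per-field patch: copies only the flags and the reward from the snapshot,
// so whatever observation the inner step wrote stays in the output entry.
void RestorePerField(const StateMap& snapshot, StateMap& out)
{
    for (const auto& entry : snapshot)
    {
        AgentState& state = out[entry.first];
        state.terminated = entry.second.terminated;
        state.truncated = entry.second.truncated;
        state.reward = entry.second.reward;
    }
}

// Full restore: overwrites the whole entry, so members added to the struct
// later are covered without touching this function.
void RestoreFull(const StateMap& snapshot, StateMap& out)
{
    for (const auto& entry : snapshot)
    {
        out[entry.first] = entry.second;
    }
}
```

With the per-field version, an agent that is already terminal keeps its flags but reports whatever observation the inner step produced; the full restore returns the snapshot entry unchanged.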
1 parent f7e9e57 commit 572375a

2 files changed, +403 -2 lines changed

Source/ScholaTraining/Public/Environment/EnvironmentInterface.h

Lines changed: 42 additions & 2 deletions
@@ -119,12 +119,52 @@ class SCHOLATRAINING_API TScholaEnvironment : public TScriptInterface<T>, public
 
     /**
      * @brief Execute a step through the Blueprint interface.
-     * @param[in] InActions Map of agent names to their actions.
+     *
+     * Handles staggered agent death: agents that were already terminated or truncated
+     * before this step have their terminal state preserved. Only actions for live agents
+     * are forwarded to the Blueprint (dead agents have no entry in InActions since the
+     * Python side only sends actions for agents RLlib is actively managing).
+     *
+     * This logic is centralised here so it automatically covers every reset protocol
+     * (Disabled, SameStep, NextStep) without duplication in AbstractGymConnector.
+     *
+     * @param[in] InActions Map of agent names to their actions (live agents only).
      * @param[out] OutAgentStates Map of agent names to their resulting states.
      */
     void Step(const TMap<FString, TInstancedStruct<FPoint>>& InActions, TMap<FString, FAgentState>& OutAgentStates) override
     {
-        T::Execute_Step(this->GetObject(), InActions, OutAgentStates);
+        // Snapshot previously-dead agents before stepping.
+        TMap<FString, FAgentState> DeadAgentStates;
+        for (const auto& Pair : OutAgentStates)
+        {
+            if (Pair.Value.bTerminated || Pair.Value.bTruncated)
+            {
+                DeadAgentStates.Add(Pair.Key, Pair.Value);
+            }
+        }
+
+        // Build a filtered action map that excludes dead agents. Python only sends
+        // actions for live agents, but this guard also prevents any accidental
+        // dead-agent entry from reaching Execute_Step.
+        TMap<FString, TInstancedStruct<FPoint>> LiveActions;
+        for (const auto& ActionPair : InActions)
+        {
+            if (!DeadAgentStates.Contains(ActionPair.Key))
+            {
+                LiveActions.Add(ActionPair.Key, ActionPair.Value);
+            }
+        }
+
+        T::Execute_Step(this->GetObject(), LiveActions, OutAgentStates);
+
+        // Restore the full pre-step snapshot for previously-dead agents. A per-field
+        // patch would leave observations, info, and any future FAgentState members
+        // leaking stale Blueprint output; the snapshot is the source of truth for
+        // dead agents, so overwrite the entry verbatim.
+        for (const auto& DeadPair : DeadAgentStates)
+        {
+            OutAgentStates.Add(DeadPair.Key, DeadPair.Value);
+        }
     };
 
 };
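
The snapshot, filter, and restore sequence in the hunk above can be modelled outside Unreal as a rough sketch. Everything here is an assumption made for illustration: `GuardedStep`, `InnerStep`, `ActionMap`, and `StateMap` are hypothetical names, `std::map` stands in for `TMap`, an `int` stands in for the action struct, and a `std::function` stands in for the Blueprint `Execute_Step` call. The sketch also assumes, as the diff does, that the caller passes in the state map from the previous step, so terminal flags are still readable before the inner step runs.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins; not the Schola types.
struct AgentState
{
    bool terminated = false;
    bool truncated = false;
    float reward = 0.0f;
};

using ActionMap = std::map<std::string, int>;
using StateMap = std::map<std::string, AgentState>;
using InnerStep = std::function<void(const ActionMap&, StateMap&)>;

void GuardedStep(const InnerStep& inner, const ActionMap& actions, StateMap& states)
{
    // 1. Snapshot agents that were already terminal before this step.
    StateMap dead;
    for (const auto& entry : states)
    {
        if (entry.second.terminated || entry.second.truncated)
        {
            dead.insert(entry);
        }
    }

    // 2. Forward only the actions that belong to live agents.
    ActionMap live;
    for (const auto& entry : actions)
    {
        if (dead.count(entry.first) == 0)
        {
            live.insert(entry);
        }
    }

    // 3. Delegate to the inner step, which may overwrite any entry.
    inner(live, states);

    // 4. Restore the snapshot verbatim. operator[] assignment is used because
    //    std::map::insert would not overwrite an existing key.
    for (const auto& entry : dead)
    {
        states[entry.first] = entry.second;
    }
}
```

If the inner step resets or rewrites the entry of an agent that died on an earlier step, step 4 puts the snapshot back, and step 2 ensures the inner step never receives an action for that agent.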
