rl-hack

Here are 2 public repositories matching this topic...

RewardGuard / Reward-Guard

Plug-and-play reward monitoring for RL training loops. Catch reward hacking, component imbalance, and starvation before they tank your run. Drop in a single .step() call to get balance reports, automatic weight correction, alignment scores, and out-of-the-box integrations with WandB, TensorBoard, and Stable-Baselines3 (SB3). → rewardguard.dev
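The blurb describes per-step monitoring of individual reward components for dominance ("imbalance"), starvation, and weight correction. The following is a minimal plain-Python sketch of that general idea under stated assumptions. It is not RewardGuard's API. `RewardComponentMonitor`, `step`, `report`, `rebalance`, the rolling window, and the 0.6 dominance / 0.05 starvation thresholds are all hypothetical choices made for illustration.

```python
from collections import deque


class RewardComponentMonitor:
    """Hypothetical sketch of per-component reward monitoring.

    NOT RewardGuard's API: the class, method names, and thresholds are
    illustrative assumptions.
    """

    def __init__(self, weights, dominance=0.6, starvation=0.05, window=100):
        self.weights = dict(weights)   # assumed positive per-component weights
        self.dominance = dominance     # share above which a component "dominates"
        self.starvation = starvation   # share below which a component is "starved"
        self.window = window           # recent steps kept per component
        self.history = {}              # component name -> deque of |raw value|

    def step(self, components):
        """Record one step's raw reward components and return a balance report."""
        for name, value in components.items():
            buf = self.history.setdefault(name, deque(maxlen=self.window))
            buf.append(abs(value))
        return self.report()

    def shares(self):
        """Fraction of total weighted reward magnitude contributed by each component."""
        means = {
            name: self.weights.get(name, 1.0) * sum(buf) / len(buf)
            for name, buf in self.history.items()
            if buf
        }
        total = sum(means.values())
        if total == 0:
            return {name: 0.0 for name in means}
        return {name: m / total for name, m in means.items()}

    def report(self):
        """Flag components whose share is above the dominance or below the starvation threshold."""
        shares = self.shares()
        return {
            "shares": shares,
            "dominant": [n for n, s in shares.items() if s > self.dominance],
            "starved": [n for n, s in shares.items() if s < self.starvation],
        }

    def rebalance(self, rate=0.5):
        """Scale weights toward equal shares; rate=1.0 equalizes them in one call."""
        shares = self.shares()
        if not shares:
            return dict(self.weights)
        target = 1.0 / len(shares)
        for name, share in shares.items():
            if share > 0:  # a component that never fired cannot be rescaled
                self.weights[name] = self.weights.get(name, 1.0) * (target / share) ** rate
        return dict(self.weights)


monitor = RewardComponentMonitor({"progress": 1.0, "bonus": 1.0, "penalty": 1.0})
for _ in range(10):
    report = monitor.step({"progress": 0.1, "bonus": 5.0, "penalty": -0.01})
print(report["dominant"], report["starved"])  # ['bonus'] ['progress', 'penalty']

monitor.rebalance(rate=1.0)
after = monitor.report()
print(after["dominant"], after["starved"])  # [] []
```

In this sketch the history stores raw magnitudes and weights are applied only when shares are computed, so a rebalance takes effect immediately without discarding the window. A real tool would likely use a gentler `rate` (or cap per-step changes) so that weight correction does not itself destabilize training.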

python machine-learning reinforcement-learning openai-gym alignment rl ai-safety rl-environment rl-hack reward-hacking