.. currentmodule:: torchtune.rlhf

Components and losses for RLHF algorithms like PPO and DPO.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    estimate_advantages
    get_rewards_ppo
    truncate_sequence_at_first_stop_token
    loss.PPOLoss
    loss.DPOLoss
    loss.RSOLoss