
GRPO

Technique · 1 mention from 1 source

Group Relative Policy Optimization - a reinforcement learning algorithm used in training language models.

Mentioned by

Nathan Lambert · ✓ High confidence
"If we look at the GRPO equation, this one is famous for this because essentially the reward given to the agent is based on how good a given action—an action is a completion—is relative to the other answers to that same problem."

Attribution: Nathan explains the GRPO algorithm in the context of reward calculation for RL training
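The quote describes GRPO's core idea: a completion's reward is judged relative to the other completions sampled for the same prompt, rather than against a learned value baseline. A minimal sketch of that group-relative normalization is below; the function name is illustrative, not from any specific library, and real GRPO implementations combine these advantages with a clipped policy-gradient objective.

```python
def group_relative_advantages(rewards):
    """Normalize rewards within a group of completions for one prompt.

    Each completion's advantage is its reward relative to the group:
    (reward - group mean) / group standard deviation.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    eps = 1e-8  # avoid division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for the same problem, scored by a reward function:
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Completions above the group mean get positive advantage; below, negative.
```

Because the baseline is just the group mean, GRPO needs no separate value network: sampling several answers to the same problem is enough to tell which ones were better than average.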