GRPO
technique
Group Relative Policy Optimization - a reinforcement learning algorithm for training language models that scores each sampled response relative to the other responses sampled for the same prompt.
Also mentioned (1)
Casual references without a clear endorsement
Nathan Lambert
mentioned
"If we look at the GRPO equation, this one is famous for this because essentially the reward given..."
▶ 1:48:49