← Back

GRPO

technique

Group Relative Policy Optimization - a reinforcement learning algorithm used in training language models.

Also mentioned (1)

Casual references without a clear endorsement

Nathan Lambert mentioned "If we look at the GRPO equation, this one is famous for this because essentially the reward given..." ▶ 1:48:49