
GRPO

Technique · 1 mention from 1 source

Group Relative Policy Optimization - a reinforcement learning algorithm used in training language models.

Mentioned by

Nathan Lambert · ✓ High confidence
"If we look at the GRPO equation, this one is famous for this because essentially the reward given to the agent is based on how good a given action—an action is a completion—is relative to the other answers to that same problem."

Attribution: Nathan explains the GRPO algorithm in the context of reward calculation for RL training
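The quote describes GRPO's core idea: a completion's reward is judged relative to the other completions sampled for the same prompt, rather than against a learned value baseline. A minimal sketch of that group-relative normalization is below; the function name is illustrative, not from any specific library, and real GRPO implementations combine these advantages with a clipped policy-gradient objective.

```python
def group_relative_advantages(rewards):
    """Normalize rewards within a group of completions for one prompt.

    Each completion's advantage is its reward relative to the group:
    (reward - group mean) / group standard deviation.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    eps = 1e-8  # avoid division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for the same problem, scored by a reward function:
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Completions above the group mean get positive advantage; below, negative.
```

Because the baseline is just the group mean, GRPO needs no separate value network: sampling several answers to the same problem is enough to tell which ones were better than average.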