GRPO
technique
1 mention from 1 source
Group Relative Policy Optimization — a reinforcement learning algorithm used in training language models, in which each sampled completion's reward is judged relative to the other completions generated for the same prompt.
All mentions
"If we look at the GRPO equation, this one is famous for this because essentially the reward given to the agent is based on how good a given action—an action is a completion—is relative to the other answers to that same problem."
From: State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490 • ▶ 1:48:49 • Jan 2026
Attribution: Nathan explains the GRPO algorithm in the context of reward calculation for RL training
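The quote describes the core of GRPO's reward calculation: each completion sampled for a problem gets an advantage based on how its reward compares to the other completions in the same group, commonly by normalizing with the group's mean and standard deviation. A minimal sketch in Python — function name, reward values, and the epsilon guard are illustrative, not from the source:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each completion's reward
    against the group of completions for the same prompt.
    A sketch of the common mean/std normalization; details vary."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for one problem, scored 0/1 by a verifier.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Completions above the group mean get positive advantage,
# those below get negative advantage; they sum to ~0.
print(advantages)
```

Because the baseline is the group mean rather than a learned value function, a correct answer to an easy problem (where every completion scored well) earns little advantage, while a correct answer to a hard problem stands out — which is the "relative to the other answers to that same problem" behavior described in the quote.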