Direct Preference Optimization
technique
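A simpler alternative to reinforcement learning from human feedback (RLHF) for training language models on human preference data: it optimizes the model directly on pairs of preferred and rejected responses, without fitting an explicit reward model or running an RL loop.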
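To make "without explicit reward models" concrete, here is a minimal sketch of the per-pair DPO objective as it is usually stated: the loss is -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]), where y_w is the preferred response, y_l the rejected one, pi the model being trained and pi_ref a frozen reference model. It assumes the sequence log-probabilities (token log-probs summed over each response) have already been computed. The function names and the beta default of 0.1 are illustrative assumptions, not taken from any particular library.

```python
import math
from typing import Sequence


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumed default; beta is a tunable hyperparameter
) -> float:
    """Mean DPO loss over a batch of preference pairs (hypothetical helper).

    Each sequence holds one sequence log-probability per pair:
    policy_* from the model being trained, ref_* from the frozen reference.
    """
    n = len(policy_chosen)
    if not (n == len(policy_rejected) == len(ref_chosen) == len(ref_rejected)) or n == 0:
        raise ValueError("all inputs must be non-empty and the same length")
    total = 0.0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward of each response is beta * log(pi / pi_ref);
        # the loss pushes the chosen reward above the rejected one.
        margin = beta * ((pc - rc) - (pr - rr))
        total += neg_log_sigmoid(margin)
    return total / n
```

When the policy still equals the reference model the margin is zero and the loss is log 2; raising the chosen response's log-probability (or lowering the rejected one's) relative to the reference reduces it. The only learned quantity is the policy itself, which is the sense in which no separate reward model is trained.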
Also mentioned (1)
Casual references without a clear endorsement
Nathan Lambert
mentioned
"the famous paper, Direct Preference Optimization, which is a much simpler way of solving the prob..."
Timestamp: 2:10:12