Direct Preference Optimization

Technique · 1 mention from 1 source

A simpler alternative to reinforcement learning for training language models using human preferences without explicit reward models.
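To make the idea concrete, here is a minimal sketch of the DPO objective for a single preference pair. It assumes you already have summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model; the function name and signature are illustrative, not from any particular library.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response
    under the policy (logp_*) or the frozen reference (ref_logp_*).
    beta controls how far the policy may drift from the reference.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref)
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Logistic (Bradley-Terry) loss on the reward margin: no separate
    # reward model and no RL rollout is needed.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the policy assigns relatively more probability to the preferred response than the reference does, which is why DPO can replace the reward-model-plus-RL pipeline with a single supervised-style objective.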


Nathan Lambert (✓ High confidence)
"the famous paper, Direct Preference Optimization, which is a much simpler way of solving the problem than RL. The derivations in the appendix skip steps of math."

Attribution: Nathan mentions DPO as a famous and simpler alternative to RL for preference learning