MMLU

other

Massive Multitask Language Understanding: a multiple-choice benchmark for evaluating AI models across diverse academic subjects and knowledge areas.

Also mentioned (1)

Casual references without a clear endorsement

Sebastian Raschka mentioned "even something simpler like MMLU, which is a multiple-choice benchmark. If you just change the fo..." (at 1:46:50)