
MMLU


Massive Multitask Language Understanding: a multiple-choice benchmark for evaluating AI models across 57 academic subjects and knowledge areas.


Mentions

Sebastian Raschka:
"even something simpler like MMLU, which is a multiple-choice benchmark. If you just change the format slightly, like, I don't know, if you use a dot instead of a parenthesis or something like that, the model accuracy will vastly differ."

Attribution: Sebastian uses MMLU as an example of how format sensitivity affects model evaluation.
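As a concrete illustration of the format sensitivity Raschka describes, the sketch below renders the same MMLU-style question with dot-labeled options ("A.") versus parenthesis-labeled options ("A)"). The question, choices, and function name are invented for illustration; the point is only that the two prompts differ, and models have been observed to score differently on such superficially equivalent formats.

```python
# Hypothetical sketch (not from the transcript): two renderings of one
# MMLU-style multiple-choice item, differing only in the answer-label
# separator. Models can produce noticeably different accuracy on the two.

QUESTION = "Which planet is known as the Red Planet?"
CHOICES = ["Venus", "Mars", "Jupiter", "Saturn"]

def format_prompt(question, choices, label_style="dot"):
    """Render a multiple-choice prompt.

    label_style: "dot" produces labels like "A.", "paren" produces "A)".
    """
    sep = "." if label_style == "dot" else ")"
    lines = [question]
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}{sep} {choice}")
    lines.append("Answer:")
    return "\n".join(lines)

dot_prompt = format_prompt(QUESTION, CHOICES, "dot")
paren_prompt = format_prompt(QUESTION, CHOICES, "paren")

print(dot_prompt)
print()
print(paren_prompt)
```

The content is identical in both renderings; only the label separator changes, which is exactly the kind of cosmetic variation Raschka says can vastly shift measured accuracy.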