I used to be obsessed with which LLM was the smartest, until I actually tried using them for some tasks and realized that the smaller models did the same task way faster.
So I switched my focus from "what's the smartest model" to "what's the smallest one that can do my task?"
With that lens, "scores high on general intelligence benchmarks" actually becomes a measure of how overqualified the model is, and how much time, money and energy you are wasting.