A couple of years ago, the dominant question in every engineering meeting, every Slack thread, every developer blog was: which model is the best? People ran benchmarks. They argued about MMLU scores.
They debated GPT-4 vs Claude vs Gemini like it was a sports rivalry. The energy made sense. These were genuinely new capabilities, and figuring out who was leading felt important. But that question is