If you've tried building anything serious on top of large language models (LLMs) recently, you've probably run into this: "thinking" is supposed to make models better. In practice, it makes your infrastructure worse. That isn't a model problem; it's an infrastructure and abstraction problem.
And it's getting worse as teams scale across multiple AI providers. Let's break down exactly where things go