If you’ve tried to build anything serious on top of LLMs recently, you’ve probably run into this: “Thinking” is supposed to make models better. In practice, it makes your infrastructure worse.

Let’s break down where it actually hurts.

## The illusion of “just turn on reasoning”

At a high level, you’d expect something simple:

- Turn reasoning on → better answers
- Turn reasoning off → cheaper, faster
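
In code, that mental model is a single switch. Here’s a minimal sketch of the expectation, assuming a hypothetical `build_request` helper and an OpenAI-style `reasoning_effort` knob (the exact parameter name and values vary by provider):

```python
from typing import Any

def build_request(prompt: str, reasoning: bool) -> dict[str, Any]:
    """Build a request payload under the naive 'one switch' mental model."""
    payload: dict[str, Any] = {
        "model": "some-reasoning-model",  # placeholder, not a real model name
        "messages": [{"role": "user", "content": prompt}],
    }
    # Expectation: one knob trades answer quality for cost and latency.
    payload["reasoning_effort"] = "high" if reasoning else "low"
    return payload

print(build_request("Summarize this incident report.", reasoning=True))
```

That is the promise. The rest of this post is about why the switch doesn’t behave like a switch once real traffic hits it.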