# How neural networks learn, and why the obvious approach breaks immediately

Every optimizer you'll ever use (Adam, AdamW, Lion, LAMB) is an answer to a problem that gradient descent creates. To understand why those answers exist, you need to feel the problem first.

## The loss surface is a landscape you can't see

Imagine you're blindfolded, standing somewhere on a hilly terrain. Your only tool is the slope you can feel directly under your feet.
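To make the blindfolded-hiker metaphor concrete, here is a minimal sketch of vanilla gradient descent on a toy two-dimensional loss. The loss function, starting point, and step sizes are illustrative assumptions, not anything from a real network: the point is only that the optimizer never sees the surface, it only queries the local slope, and one fixed step size must serve every direction at once.

```python
import numpy as np

# Toy "terrain": an elongated bowl, steep along y, shallow along x.
# The optimizer can't see this surface; it can only feel the local slope.
def loss(w):
    x, y = w
    return 0.5 * (x**2 + 10.0 * y**2)

def grad(w):
    x, y = w
    return np.array([x, 10.0 * y])  # analytic gradient of the toy loss

def gradient_descent(lr, steps=30):
    w = np.array([4.0, 3.0])  # arbitrary starting point on the terrain
    for _ in range(steps):
        w = w - lr * grad(w)  # step downhill along the felt slope
    return loss(w)

# One fixed step size has to work in both directions at once:
print(gradient_descent(lr=0.05))  # converges, but crawls along the shallow x axis
print(gradient_descent(lr=0.25))  # diverges: too large for the steep y axis
```

Run it and the tension is visible immediately: the small step size converges but wastes steps in the shallow direction, while the larger one overshoots the steep direction and blows up. That mismatch between directions is exactly the problem the adaptive optimizers named above were built to answer.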