I deployed Ollama on Kubernetes, and the GPU worker node locked up mid-rollout: no logs, no error, just a dead pod that wouldn’t terminate and a new one that wouldn’t schedule. It wasn’t a crash. It wasn’t a timeout. It was a deadlock I’d never seen before. I had expected a smooth rollout — Ollama is a single-container, single-GPU workload. I set up a Deployment with a single replica, used a Persistent