Spot instances can cost up to 80% less than on-demand capacity, but AWS can reclaim them with only a 2-minute warning. If you are serving short, stateless web requests, that’s fine. But if you are running modern LLM reasoning workloads, where a single request can take minutes to process, a 2-minute warning is a death sentence.
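To make the 2-minute window concrete, here is a minimal sketch of how a worker might read the Spot interruption notice from the EC2 instance metadata service and check whether an in-flight request can still finish. The metadata paths and the notice format (`action` plus a UTC `time`) are the ones AWS documents; the helper names, the `safety_margin` value, and the idea of having a per-request time estimate at all are assumptions made for illustration, not part of any particular serving stack.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Parse a notice such as {"action": "terminate", "time": "2024-01-01T00:02:00Z"}."""
    notice = json.loads(notice_body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


def can_finish(estimated_seconds: float, notice_body: str, now: datetime,
               safety_margin: float = 10.0) -> bool:
    """Hypothetical policy: finish only if the request fits before the deadline."""
    remaining = seconds_until_interruption(notice_body, now)
    return estimated_seconds + safety_margin <= remaining


def fetch_interruption_notice(timeout: float = 1.0):
    """Return the notice body, or None if no interruption is scheduled (HTTP 404).

    Only works on an EC2 instance; uses IMDSv2 session tokens.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    notice_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(notice_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise
```

A worker would poll `fetch_interruption_notice()` every few seconds. For a short web request, `can_finish` is almost always true; for a reasoning request with several minutes left to run, it is false the moment the notice arrives, which is exactly the problem.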
Losing a node means losing gigabytes of computed KV cache and instantly snapping the client's connection. User experience g
