There is a class of AWS incident I have started calling the "everything looked fine in testing" failure. The pattern is consistent. You design a serverless API: a Lambda function with sensible defaults, wired through API Gateway, pointing at DynamoDB.

You test it in dev throughout the week. Latency is acceptable.

Costs track to plan. Then a campaign drops, or a new enterprise customer brings their t