My team runs an e-commerce operation that pushes around 80,000 product descriptions through LLMs every month. We were spending $800+ a month on GPT-4o API calls. Last month we moved the bulk generation pipeline to Llama 4 Maverick running locally via Ollama.

Monthly cost dropped to about $40 in electricity. Here's the full setup, what worked, what didn't, and where we still use cloud APIs. Why bother run