I've been running DeepSeek behind LangChain for a few months for a side project. It worked fine, until one day I noticed something weird: DeepSeek's pricing page advertises cached input tokens at ~10% of the cache-miss price, but my bills didn't reflect that at all. I dug in.
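For a sense of the gap I was looking at, here is a back-of-the-envelope sketch of what the discount should do to an input bill. The function name, the token counts, and the $1.00 per million price are all placeholders I made up for illustration, not DeepSeek's actual numbers; the only figure taken from the pricing page is the ~10% ratio.

```python
def blended_input_cost(total_tokens, hit_fraction, miss_price_per_m, hit_discount=0.10):
    """Input cost in dollars when hit_fraction of the tokens are billed at the cached rate.

    hit_discount is the cached price as a fraction of the cache-miss price
    (~10% per the pricing page; everything else here is an assumption).
    """
    hit_tokens = total_tokens * hit_fraction
    miss_tokens = total_tokens - hit_tokens
    hit_price_per_m = miss_price_per_m * hit_discount
    return (miss_tokens * miss_price_per_m + hit_tokens * hit_price_per_m) / 1_000_000


# Hypothetical price: $1.00 per 1M input tokens on a cache miss (placeholder only).
MISS_PRICE = 1.00

no_cache = blended_input_cost(10_000_000, 0.0, MISS_PRICE)
mostly_cached = blended_input_cost(10_000_000, 0.9, MISS_PRICE)
print(f"0% hits: ${no_cache:.2f}, 90% hits: ${mostly_cached:.2f}")
```

With those placeholder numbers, a workload where 90% of input tokens hit the cache should cost roughly a fifth of the uncached one. My bills looked like the 0% case.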
The cache is byte-prefix based. The moment your request's prefix differs from the previous one by even a single character, you pay f