Google has revealed its eighth generation of custom TPUs at Cloud Next 2026, and unlike previous generations, this release is not one chip but two. The new TPU 8t and TPU 8i have been designed specifically for training and inference, respectively.

The reason for this is simple: as AI models move from answering questions to autonomously executing multi-step tasks, the requirements for building those models and running them in production have diverged sharply. Specialisation was simply the best route for tackling both jobs.

TPU 8t: Built to train faster

The TPU 8t chip is Google's training beast.

One superpod scales to 9,600 chips and delivers 121 exaflops of compute, with twice the inter-chip bandwidth of the previous TPU generation. The objective is to cut the time it takes to train a frontier model from months to weeks. The chip also offers 10x faster access to storage via TPUDirect and achieves over 97% productive compute time thanks to automated fault detection and routing, and it pairs with Google's new Virgo Network to scale up to one million chips within a single logical cluster.
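Those pod-level figures imply roughly 12.6 petaflops per chip. The back-of-envelope sketch below uses only the two numbers quoted above; everything else is straightforward division:

```python
# Back-of-envelope: per-chip compute implied by the superpod specs above.
pod_exaflops = 121        # 121 exaflops per superpod (from the article)
chips_per_pod = 9_600     # chips per superpod (from the article)

petaflops_per_chip = pod_exaflops * 1_000 / chips_per_pod
print(f"~{petaflops_per_chip:.1f} PFLOPS per chip")  # ~12.6 PFLOPS
```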

TPU 8i: Built to think faster

The TPU 8i is designed for inference, specifically for the low-latency, high-throughput demands of AI agents working in swarms. Its standout feature is memory: 288 GB of high-bandwidth memory paired with 384 MB of on-chip SRAM, three times more than the previous generation. Keeping a model's active working set on-chip eliminates the processor idle time that compounds at agent scale.
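Why does idle time "compound"? A toy calculation makes the intuition concrete. Every number below is hypothetical, chosen purely for illustration; the point is that an agent chains many model calls, so a small per-token memory stall multiplies across the whole task:

```python
# Toy model of stall compounding (all numbers hypothetical, for intuition only).
calls_per_agent_task = 50     # a multi-step agent chains many model calls
tokens_per_call = 500
compute_ms_per_token = 5.0    # hypothetical pure-compute time per token
stall_ms_per_token = 2.0      # hypothetical idle time waiting on off-chip memory

def task_seconds(stall_ms: float) -> float:
    per_token_ms = compute_ms_per_token + stall_ms
    return calls_per_agent_task * tokens_per_call * per_token_ms / 1000

print(f"with memory stalls: {task_seconds(stall_ms_per_token):.0f} s per task")  # 175 s
print(f"stalls removed:     {task_seconds(0.0):.0f} s per task")                 # 125 s
```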

Google also doubled interconnect bandwidth to 19.2 Tb/s and introduced a new Boardfly topology that cuts the maximum network diameter by over 50%. The result is 80% better performance-per-dollar than the previous generation, meaning businesses can serve roughly twice the user volume at the same cost.
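The "roughly twice" framing follows directly from the 80% figure; a quick sanity check (only the 80% improvement comes from the article):

```python
# Sanity check: 80% better performance-per-dollar at a fixed budget.
improvement = 0.80                 # from the article
old_volume = 1.0                   # normalised users served per dollar, old gen
new_volume = old_volume * (1 + improvement)
print(f"{new_volume:.1f}x the volume at the same cost")  # 1.8x, i.e. roughly double
```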

Why it matters

Both chips run on Google's own Axion ARM-based CPU host and support JAX, PyTorch, SGLang and vLLM out of the box, with bare-metal access for customers who need it. They will be generally available later in 2026 as part of Google's AI Hypercomputer stack. The TPU has always been Google's answer to the question of what happens when you build silicon around the workload rather than the other way around.
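Since JAX is supported out of the box, a first smoke test on a provisioned instance might look like the minimal sketch below. It assumes a TPU-backed JAX runtime is already installed and configured; nothing in it is specific to the new chips:

```python
# Minimal JAX smoke test on a TPU-backed runtime (assumes JAX is installed
# with TPU support and the host can see the accelerators).
import jax
import jax.numpy as jnp

print(jax.devices())  # should list TPU devices on a TPU host

@jax.jit
def matmul(a, b):
    # jit-compiled through XLA, which targets the TPU backend
    return a @ b

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (1024, 1024), dtype=jnp.bfloat16)
b = jax.random.normal(key, (1024, 1024), dtype=jnp.bfloat16)

out = matmul(a, b)
print(out.shape, out.dtype)
```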

With agents now doing the work, the workload just got a lot more complex.