April 22, 2026 · Infrastructure · Agents

Google's 8th-Gen TPUs: One Chip for Training, One for Agent Loops

Google split its TPU line in two. At Cloud Next 2026, the 8th generation arrives as TPU 8t for training and TPU 8i for inference. The 8t pod scales to 9,600 chips with 2 petabytes of shared memory and 121 exaFLOPS of compute, roughly 3x the per-pod compute of the previous-generation Ironwood. The 8i ships with 288 GB of HBM and 384 MB of on-chip SRAM, doubles interconnect bandwidth to 19.2 Tb/s, and claims 80% better performance per dollar.

The split is the actual point. Google's pitch is that agentic workloads do not look like one-shot training or one-shot inference. They are continuous loops where a model reasons, calls a tool, plans the next step, executes, then learns from the outcome. That changes what the chip needs to be good at. Latency-sensitive serving with massive KV cache reuse looks nothing like dense backprop on trillion-parameter clusters. So Google built two chips instead of one.
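The loop described above can be sketched in a few lines. This is a minimal illustration of the reason/tool-call/execute/learn cycle, not Google's stack: every function and name here is hypothetical, and the "learning" step is just accumulating outcomes into the working context, which is the KV-cache-heavy serving pattern an inference chip like the 8i would be tuned for.

```python
# Hypothetical sketch of an agentic loop: reason, pick a tool, execute,
# then fold the outcome back into context before the next step.
# All names are illustrative; no real agent framework or TPU API is implied.

def reason(context):
    """Decide which tool to call next, or None when the goal is met."""
    return None if "answer" in context else "search"

def call_tool(tool, context):
    """Stand-in for a real tool call (search, code execution, API request)."""
    return {"tool": tool, "result": f"result-for-step-{len(context)}"}

def agent_loop(goal, max_steps=4):
    context = {"goal": goal}
    for step in range(max_steps):
        tool = reason(context)
        if tool is None:
            break  # model decided it is done
        outcome = call_tool(tool, context)
        # "Learn" from the outcome: append it to the working context so the
        # next reasoning pass sees it. Context grows every iteration, which
        # is why KV-cache reuse dominates this workload.
        context[f"step_{step}"] = outcome
        if step == 2:  # pretend the third tool call yields the answer
            context["answer"] = outcome["result"]
    return context

if __name__ == "__main__":
    print(agent_loop("demo")["answer"])
```

The point of the sketch is the shape of the workload: many short, latency-sensitive forward passes over an ever-growing context, rather than one long dense training pass.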

Both chips are co-designed with DeepMind, both promise up to 2x performance per watt over Ironwood, and both reach general availability later this year through AI Hypercomputer. This is also Google's loudest move yet to pull customers off Nvidia on inference economics, where Google believes its TPU advantage is largest. Source: https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/