
Our eighth generation TPUs: two chips for the agentic era

https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/
94•xnx•1h ago

Comments

TheMrZZ•1h ago
> A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.

This seems impressive. I don't know much about the space, so maybe it's not actually that great, but from my POV it looks like a competitive advantage for Google.

cyanydeez•12m ago
It is. It'll still not create AGI without some breakthrough in instruction-vs-data separation of concerns.
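As a rough sanity check on the figures quoted above, here's a back-of-the-envelope sketch in Python (the per-chip numbers it derives are approximations from the quoted pod totals, not official specs):

    # Per-chip share of the quoted superpod figures.
    chips = 9_600
    hbm_total_pb = 2        # petabytes of shared HBM, as quoted
    pod_exaflops = 121      # quoted pod compute

    hbm_per_chip_gb = hbm_total_pb * 1e6 / chips    # PB -> GB
    pflops_per_chip = pod_exaflops * 1e3 / chips    # EFLOPS -> PFLOPS
    print(f"~{hbm_per_chip_gb:.0f} GB HBM, ~{pflops_per_chip:.1f} PFLOPS per chip")
    # -> ~208 GB HBM, ~12.6 PFLOPS per chip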
NoiseBert69•1h ago
That cooling system looks crazy. What an unbelievable density.
Keyframe•1h ago
While others have been capturing the news cycle's eyes, it seems to me Google has quietly been going from strength to strength in the background, capturing consumer market share without much (any?) infrastructure trouble, given how vertically integrated in AI they've been since day one. At one point they even seemed like a lost cause, but they're like a tide... just growing all around.
baq•1h ago
you've never tried to use gemini 3 I guess - that thing was so unreliable it might as well not be offered; there's also a reason why everybody here is excited for claude and codex, but not really for antigravity.

that said, I actually agree: google IMHO silently dominates the 'normie business' chatbot area. gemini is low key great for day to day stuff.

youniverse•1h ago
Yeah I think there will be a time in a few years (1-2?) when both Google and Apple will get to eat their cake. They aren't playing the same game of speed running unpolished product releases every month to double their valuation. They have time to think and observe and put out something really polished. At least that's the hope! :)
echelon•45m ago
That's because these mega monopolies have diverse income streams and have grown like cancers to tax every system and economy that touches the internet.

Anthropic and OpenAI are having to fight like hell to secure market share. Google just gets to sit back and relax with its browser and android monopolies.

Why did our regulators fall asleep at the wheel? Google owns 92% of "URL bar" surface area and turned it into a Google search trademark dragnet. Now Anthropic has to bid for its own products against its competitors and absorb a 15+% CAC, which is just a Google tax.

Now consider all the bullshit Google gets to do with Android, owning that with an iron fist. Every piece of software pays a 30% tax, has to jump through hoops, and even finding it is subject to the same bidding process.

These companies need to be broken up.

Google would be healthier for the economy and its own investors as six different companies. And they shouldn't be allowed to set the rules for mobile apps or tax other people's IP and trademarks.

harrall•16m ago
Google invented the AI architecture that Anthropic and OpenAI based their entire companies on, built on years of research at Google.

Of course they should have to fight with the inventors of the technology they’re using.

someguyiguess•13m ago
> Google invented the AI architecture that Anthropic and OpenAI based their entire companies on

Source?

ckcheng•7m ago
Unless you don’t think Attention Is All You Need?

https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

IncreasePosts•7m ago
"Attention Is All You Need" was a paper by a bunch of Google researchers
vibe42•58m ago
Their latest open models are pretty competitive with other open models, with some innovation around the smaller sizes (2-4 GB).

They're helping close the distance to realistic-quality inference on phones and other smaller devices.

WarmWash•35m ago
AI adoption isn't existential to Google like it is to OAI and Anthropic. They also can't produce hype like the other two, because anything they say is just going to come off as corporate drivel.
amazingamazing•1h ago
If AI ends up having a winner, I struggle to see how it doesn't end with Google winning, because they own the entire stack, or Apple, because they will have deployed the most potentially AI-capable edge sites.
aliljet•1h ago
The real problem is that scientists doing this sort of early work more often than not want to burn hardware under their desks. Renting infrastructure in Google Cloud isn't the only way...
nickandbro•1h ago
I am curious what workloads Citadel Securities is running on these TPUs? Are you telling me they need the latest TPUs for market insights?
vibe42•1h ago
Training their own, closed, internal models on their own data sets? Probably a good way to squeeze out some market trading signals.
nickandbro•54m ago
Reminds me of when hedge funds started laying increasingly shorter fiber-optic cable lines to achieve the lowest possible latency for high-frequency trading.
written-beyond•42m ago
I thought these TPUs were primarily used for inference?
vlovich123•14m ago
TPU8t is for training. But even so, once you've trained, you need to run the model too. And these kinds of models already have a huge latency hit, so there's not much harm in running them away from the trading switches.
pmb•1h ago
At this point, when you are doing big AI you basically have to buy it from NVidia or rent it from Google. And Google can design their chips and engine and systems in a whole-datacenter context, centralizing some aspects that are impossible for chip vendors to centralize, so I suspect that when things get really big, Google's systems will always be more cost-efficient.

(disclosure: I am long GOOG, for this and a few other reasons)

sigmoid10•1h ago
I'd bet that too if their management wasn't so incredibly uninspiring. Like, Apple under Cook was also pretty mild and a huge step down from Jobs, but Google feels like it fell off a cliff. If it wasn't for OpenAI releasing ChatGPT, they might still be sitting on that tech while only testing it internally. Now it drives their entire chip R&D.
WarmWash•37m ago
To be fair, I don't think any of the AI players wanted what OAI did. Sam grabbed first-mover advantage at the cost of this insane race everyone else got forced into.
hkpack•29m ago
I am not a fan of the era when the CEO is expected to be a cult-leader type of person.

Cook did very well in all areas as well as in not trying to create a cult.

whattheheckheck•23m ago
What would an inspiring leader do differently for you?
someguyiguess•14m ago
Inspire
akersten•20m ago
I'd go long Google too if using Gemini CLI felt anything close to the experience I get with Codex or Claude. They might have great hardware but it's worthless if their flagship coding agent gets stuck in loops trying to find the end of turn token.
fourside•17m ago
Of the big three, Gemini gives me the worst responses for the type of tasks I give it. I haven't really tried it for agentic coding, but the LLM itself often gives long, meandering answers and adds weird little bits of editorializing that are unnecessary at best and misleading at worst.
surajrmal•10m ago
Gemini CLI isn't a great product, unfortunately. While it's tied to a GUI, Antigravity is a far superior agent harness. I suggest comparing that to Claude Code instead.
paulmist•1h ago
At $15/GB for HBM4, the 331.8 TB of HBM4 per pod is about $5 million...
nsteel•58m ago
It's HBM3e
zozbot234•30m ago
$15/GB is the retail price for DIMM sticks. Is HBM4 really that cheap?
selectodude•17m ago
HBM is just DRAM stacked directly next to the die. The expensive part is gluing it on there. The chips themselves are pretty much the same.
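For what it's worth, the arithmetic a few comments up checks out, treating $15/GB as the commenter's placeholder rather than an actual HBM contract price:

    # Sanity check on the ~$5M estimate above; $15/GB is the commenter's
    # assumed price, not a quoted HBM3e/HBM4 contract figure.
    tb_per_pod = 331.8
    usd_per_gb = 15
    total_usd = tb_per_pod * 1_000 * usd_per_gb    # TB -> GB
    print(f"${total_usd / 1e6:.2f}M per pod")      # -> $4.98M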
vibe42•1h ago
The pics of the cooling system are pretty good sci-fi / cyberpunk / steampunk inspo.

If the whole AI bubble spectacularly collapses, at least we got a lot of cool pics of custom hardware!

NitpickLawyer•55m ago
> If the whole AI bubble spectacularly collapses

Every other news story for the past month has been about lacking capacity. Everyone is having scaling issues, with more demand than they can cover. Anthropic has been struggling for a few months, which is especially visible when the EU timezone is still up and the US east coast comes online: everything grinds to a halt. MS has been pausing new subscriptions for GH Copilot, also because of a lack of capacity. And yet people are still on bubble this, collapse that? I don't get it. Is it becoming a meme? Are people seriously seeing something I don't? For the past 3 years models have kept improving, capabilities have gone from toy to actually working, and there's no sign of stopping. It's weird.

vibe42•36m ago
Both are possible; increasing demand and bubble collapse.

The way this could happen is if model commoditization increases - e.g. some AI labs keep publishing large open models that increasingly close the gap to the closed frontier models.

Also, if consumer hardware keeps getting better and models get so good that most people can have most of their usage satisfied by smaller models running on their laptops, they won't pay a ton for large frontier models.

hgoel•27m ago
There's a massive amount of demand at the current price point, but that does not exclude a bubble, considering the current cost to consumers is lower than what capacity expansion costs.

Though nowadays it feels like the bubble is going to end up being mainly an OpenAI issue. The others are at least vaguely trying to balance expansion with revenue, without counting on inventing a computer god.

nsteel•57m ago
This link has more on the architecture: https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...
fulafel•54m ago
"TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation" sounds impressive especially as the previous generation is so recent (2025).

Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

dataking•48m ago
Vera Rubin will have Groq chips focused on fast inference, so it points toward a trend. Also, with energy needs so high, why not reach for every feasible optimization?
xnx•46m ago
Nvidia said in March that they're working on specialized inference hardware, but they don't have any right now. You can do inference on Nvidia's current hardware offerings, but it's not as efficient.
FuriouslyAdrift•35m ago
AMD has been doing inference chips for many years and is the leader in HPC.

https://www.amd.com/en/products/accelerators/instinct.html

zozbot234•32m ago
The "training" chips will probably be quite usable for slower, higher-throughput inference at scale. I expect that to be quite popular eventually for non-time-sensitive uses.
cmptrnerd6•38m ago
Which company is building the silicon for Google? Is it TSMC? What node size? I didn't see it with a quick search; sorry if it was in the post.
wina•30m ago
TSMC, through Broadcom
varispeed•33m ago
I can't help but think we will be "laughing" at this in 10 years' time, like we laugh at steam engines or the abacus.
iandanforth•30m ago
Anyone know if these are already powering all of Gemini services, some of them, or none yet? It's hard to tell if this will result in improvements in speed, lower costs, etc, or if those will be invisible, or have already happened.
kamranjon•28m ago
It's interesting that, of the large inference providers, Google has one of the most inconvenient policies around model deprecation. They deprecate models exactly 1 year after releasing them and force you to move onto their next generation of models. I had assumed, because they are using their own silicon, that they would actually be able to offer better stability, but the opposite seems to be true. Their rate limiting is also much stricter than OpenAI's, for example. I wonder how much of this is related to these TPUs vs. just strange policy decisions.
gordonhart•24m ago
It's frustrating how cavalier they are about killing old Gemini releases. My read is that once a new model is serving >90% of volume, which happens pretty quickly as most tools will just run the latest+greatest model, the standard Google cost/benefit analysis is applied and the old thing is unceremoniously switched off. It's actually surprising that they recently extended the EOL date for Gemini 2.5. Google has never been a particularly customer-obsessed company...
surajrmal•4m ago
What benefit is there to sticking on older models? If the API is the same, what are the switching costs?
jbellis•7m ago
Flash 2 isn't even at EOL until June but we started seeing ~90% error rates getting 429s over the weekend. (So we switched to GPT 5.4 nano.)
WarmWash•20m ago
What's interesting to note, as someone who uses Gemini, ChatGPT, and Claude, is that Gemini consistently uses drastically fewer tokens than the other two. It seems like Gemini is where it is because it has a much smaller thinking budget.

It's hard to reconcile this because Google likely has the most compute and at the lowest cost, so why aren't they gassing the hell out of inference compute like the other two? Maybe all the other services they provide are too heavy? Maybe they are trying to be more training heavy? I don't know, but it's interesting to see.
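Thinking budgets map directly onto serving cost, since output (including reasoning) tokens are typically priced several times higher than input tokens. A toy illustration, with entirely hypothetical prices and token counts chosen only to show the shape of the effect:

    # Hypothetical per-million-token prices and token counts,
    # not any vendor's actual pricing.
    usd_per_m_in, usd_per_m_out = 1.0, 8.0

    def request_cost(in_tokens, out_tokens):
        return in_tokens / 1e6 * usd_per_m_in + out_tokens / 1e6 * usd_per_m_out

    print(request_cost(5_000, 20_000))   # long thinking budget: ~$0.165
    print(request_cost(5_000, 4_000))    # short thinking budget: ~$0.037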

someguyiguess•16m ago
They have to have SOME competitive advantage. What reason is there to use Gemini over Claude or ChatGPT? It's not producing nearly the quality of output.
magicalhippo•9m ago
Well, comparing Gemini 3.1 Pro vs ChatGPT 5.4 Pro, it's much faster at replying. Of course, if it actually thinks less, then that helps a lot towards that. For most of my personal and work use cases, I prefer waiting a bit longer for a better answer.
RationPhantoms•15m ago
They just released their enterprise agentic platform today so my expectation is that might be the gravity well for the Fortune 500's to park their inference on.
magicalhippo•10m ago
I've been trying Gemini Pro using their $20-ish Google One subscription for a couple of months, and I also find it consistently does fewer web searches to verify information than, say, ChatGPT 5.4 Pro, which I have through work.

I was planning on comparing them on coding but I didn't get the Gemini VSCode add-in to work so yeah, no dice.

The Android and web apps are also riddled with bugs, including ones that make you lose your chat history from threads if you switch between them. Not cool.

I'll be cancelling my Google One subscription this month.

zshn25•13m ago
It would be interesting to benchmark a short training / inference run on the latest of TPU vs. NVIDIA GPU per cost basis
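A minimal version of such a benchmark is easy to sketch. Something like the following (Python with JAX, since the same code runs on both TPUs and NVIDIA GPUs; the hourly price is a placeholder to fill in from the provider's price list) yields a crude FLOPS-per-dollar figure for dense matmul, which is only a proxy for real training or inference workloads:

    import time
    import jax
    import jax.numpy as jnp

    N, iters = 8192, 50
    x = jnp.ones((N, N), dtype=jnp.bfloat16)

    matmul = jax.jit(lambda a: a @ a)
    matmul(x).block_until_ready()        # compile and warm up

    t0 = time.perf_counter()
    for _ in range(iters):
        y = matmul(x)
    y.block_until_ready()                # wait for all queued device work
    dt = time.perf_counter() - t0

    tflops = 2 * N**3 * iters / dt / 1e12    # matmul FLOP count
    usd_per_hour = 10.0                      # placeholder on-demand price
    print(f"{tflops:.0f} TFLOP/s, {tflops / usd_per_hour:.1f} TFLOP/s per $/hr")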
jmyeet•12m ago
In recent discussions about Tim Apple [sic] moving on, there was debate about whether Apple flopped on AI, which is my opinion. Of course you had the false dichotomy of doing nothing or burning money faster than the US military like OpenAI does.

IMHO that happy medium is Google. Not having to pay the NVidia tax will likely be a huge competitive advantage. And nobody builds data centers as cost-effectively as Google. It's kind of crazy to be talking ExaFLOPS and Tb/s here. From some quick Googling:

- The first MegaFLOPS CPU was in 1964

- A Cray supercomputer hit GigaFLOPS in 1988 with workstations hitting it in the 1990s. Consumer CPUs I think hit this around 1999 with the Pentium 3 at 1GHz+;

- It was the 2010s before we saw off-the-shelf TFLOPS;

- It was only last year that a single chip hit PetaFLOPS. I see the IBM Roadrunner hit this in 2008, but that was ~13,000 CPUs so...

Obviously this is nearly 10,000 TPUs to get to ~121 EFLOPS (FP4, admittedly), but that's still an astounding number. It means each one is doing ~12 PFLOPS (FP4).

I saw a claim that Claude Mythos cost ~$10B to train. I personally believe Google can (or soon will be able to) do this for an order of magnitude less at least.

I would love to know the true cost/token of Claude, ChatGPT and Gemini. I think you'll find Google has a massive cost advantage here.
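Taking the comment's own endpoints (~1 MFLOPS per CPU in 1964 vs. ~12 PFLOPS per chip now, treating "now" as 2026), the implied growth rate is easy to compute; the result is only as good as those rough milestones:

    import math

    ratio = 12e15 / 1e6          # ~12 PFLOPS per chip vs ~1 MFLOPS in 1964
    years = 2026 - 1964
    doubling_years = years / math.log2(ratio)
    print(f"{ratio:.1e}x over {years} years, doubling every "
          f"{doubling_years:.1f} years")    # -> ~1.9 years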

someguyiguess•8m ago
Apple has not flopped on AI as you say. They are just focused on privacy and are likely waiting for the time when local models become efficient enough to run on iPhones (which is quickly becoming a reality).

Google could probably train models for orders of magnitude less money as you say, but they aren't. They are not capable of creating high quality models like OpenAI and Anthropic are. Their company is just too disorganized and chaotic.

Anecdotally, I don't know a single person who uses Gemini on purpose.

himata4113•7m ago
I already felt that gemini 3 proved what is possible if you train a model for efficiency. If I had to guess, the pro and flash variants are 5x to 10x smaller than opus and gpt-5 class models.

They produce drastically fewer tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution: they produce broken tool calls and generally struggle with 'agentic' tasks. For raw problem solving without tools or search, though, they match opus and gpt while presumably being a fraction of the size.

I feel like google will surprise everyone with a model that's an entire generation beyond SOTA once they go from prototyping to making a model that's no longer a preview. All models up till now feel like prototypes that were pushed to GA just so they have something to show to investors and to integrate into their suite as a proof of concept.

SecretDreams•6m ago
They are missing a header to show the transition in discussion from TPU8t to 8i!

Thanks for posting otherwise.

Edit: actually, looks like the header got captured as a figure caption by accident.

How Skopx Learns Your Business While You Work

https://skopx.com/resources/live-platform-business-context
1•skopx•13s ago•0 comments

Open Benchmark: Text Normalization in Commercial Streaming TTS Models

https://async-vocie-ai-text-to-speech-normalization-benchmark.static.hf.space/index.html
1•baghdasaryana•56s ago•0 comments

Push Notifications Can Betray Your Privacy (and What to Do About It)

https://www.eff.org/deeplinks/2026/04/how-push-notifications-can-betray-your-privacy-and-what-do-...
1•u1hcw9nx•2m ago•0 comments

Don't read the PDF, write the parser

https://adriacidre.com/blog/self-healing-parsers-instead-of-vision/
1•kumulo•3m ago•1 comments

Context Bloat in AI Agents

https://glama.ai/blog/2025-12-16-what-is-context-bloat-in-mcp
1•OmShree0709•3m ago•0 comments

Linus Torvalds on AI code review: Anybody who thinks all AI is slop is in denial

https://lore.kernel.org/intel-gfx/CAHk-=wi_drr4Ls9KtXW1k8L2FUDF0YdnyjvKmPgLXHDFnnRWEg@mail.gmail....
3•victordw•3m ago•1 comments

A record-setting 31.4 Tbps attack caps a year of DDoS assaults

https://blog.cloudflare.com/ddos-threat-report-2025-q4/
1•theorchid•4m ago•0 comments

Tim Cook to Be Replaced by Near-Identical, More Expensive CEO with a Nicer Camera

https://unsourcednews.com/tim-cook-to-be-replaced-by-near-identical-more-expensive-ceo-with-a-nic...
2•01-_-•4m ago•0 comments

Show HN: CatchAll – slowest web search API that outperforms everything on recall

https://platform.newscatcherapi.com/catchall/try
2•artembugara•4m ago•0 comments

TurboOCR: CUDA and TensorRT OCR Server at 270 img/s

https://github.com/aiptimizer/TurboOCR
1•pfdomizer•4m ago•0 comments

Show HN: Ohita – a tool to simplify API key management for AI agents

https://ohita.tech/
1•jusasiiv•4m ago•0 comments

Statutory Copyleft

https://www.thomas-huehn.com/statutory-copyleft/
1•Brajeshwar•5m ago•0 comments

Google puts AI agents at heart of its enterprise money-making push

https://www.reuters.com/business/google-puts-ai-agents-heart-its-enterprise-money-making-push-202...
1•tartoran•5m ago•0 comments

Show HN: Sift – a minimal news app (looking for UI/UX feedback)

https://apps.apple.com/us/app/sift-curated-news/id6761124682
1•Roshan_Roy•6m ago•0 comments

DOJ charges SPLC with fraud for paying white supremacist groups $3M

https://nypost.com/2026/04/21/us-news/doj-charges-southern-poverty-law-center-with-fraud-for-payi...
1•anonymousiam•6m ago•0 comments

Show HN: Stonks-CLI – track your investment portfolio from your terminal

https://github.com/igoropaniuk/stonks-cli
1•friedchocolate•8m ago•0 comments

I spent 20 years building an AI agent engine, and what v6 got right

https://labsai.medium.com/why-i-spent-20-years-building-an-ai-agent-engine-and-what-version-6-fin...
1•ginccc•10m ago•0 comments

UK lawmakers approve lifetime smoking ban for today's under-18s

https://www.reuters.com/business/healthcare-pharmaceuticals/uk-lawmakers-approve-lifetime-smoking...
1•tartoran•11m ago•0 comments

Show HN: API Ingest – Agentic Search in API Docs

https://github.com/mohidbt/api-ingest
1•mohidbutt•13m ago•0 comments

Show HN: An MCP server that fact-checks AI bug diagnoses against AST evidence

https://github.com/EruditeCoder108/unravelai
1•EruditeCoder108•15m ago•0 comments

Prinesh Where R U?

1•triple_t•16m ago•0 comments

Inko 0.20.0: reducing heap allocations by 50%

https://inko-lang.org/news/inko-0-20-0-reducing-heap-allocations-by-50/
1•YorickPeterse•17m ago•0 comments

Probing the Planck scale with quantum computation

https://arxiv.org/abs/2604.06322
1•Tyyps•18m ago•0 comments

Australian social media ban marred by weak platform checks, tech providers say

https://www.reuters.com/legal/litigation/australian-social-media-ban-marred-by-weak-platform-chec...
1•1vuio0pswjnm7•19m ago•0 comments

AudioRoute – Capture system audio into any DAW on macOS

https://audio-route.com/
1•vyunikov•20m ago•0 comments

YouTube complies with Indonesia's social media curbs, minister says

https://www.reuters.com/business/media-telecom/youtube-complies-with-indonesias-social-media-curb...
1•1vuio0pswjnm7•20m ago•0 comments

Critical RCE Vulnerability in LiteLLM Proxy

https://aisafe.io/blog/critical-rce-vulnerability-in-litellm-proxy
2•fedex_00•23m ago•0 comments

If a bird flu pandemic starts, we may have an mRNA vaccine ready

https://www.newscientist.com/article/2523838-if-a-bird-flu-pandemic-starts-we-may-have-an-mrna-va...
1•Brajeshwar•25m ago•0 comments

Millions of renters hit by unlawful data collection

https://www.smh.com.au/technology/millions-of-renters-hit-by-unlawful-data-collection-20260422-p5...
2•thedays•25m ago•0 comments

Building the Google Photos Web UI (2018)

https://medium.com/google-design/google-photos-45b714dfbed1
1•stephen-hill•26m ago•0 comments