They do voluntarily offer a way to signal that the data Googlebot sees is not to be used for training (for now, and assuming you take them at their word), but AFAIK there is no way to stop them doing RAG on your content without destroying your SEO in the process.
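For what it's worth, the signal in question is presumably the Google-Extended robots.txt token, which is meant to opt your content out of Gemini training without affecting Search crawling. A minimal example, assuming that's the mechanism they mean:

    # robots.txt -- opt out of use for AI training without blocking Search
    User-agent: Google-Extended
    Disallow: /

    # Regular Googlebot stays allowed, so indexing/SEO is unaffected
    User-agent: Googlebot
    Allow: /

But as the comment above says, that only covers training, not what they do with crawled content at query time.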
Arguably, OpenAI's main raison d'être was to be a counterweight to that pre-2023 Google AI dominance. But I'd also argue that OpenAI lost its way.
I think we can be reasonably sure that search, Gmail, and some flavor of AI will live on, but other than that, Google apps are basically end-of-life at launch.
Google will have no problem discontinuing Google "AI" if they finally notice that people want a computer to shut up rather than talk at them.
How do you define big? My understanding is that they failed to compete with Facebook and decided to redirect resources elsewhere.
The hype when it was first coming to market was intense. But then nobody could get access because they heavily restricted sign ups.
By the time it was in "open beta" (IIRC some 6-7 months later), the hype had long since died and nobody cared about it anymore.
Nvidia is tied down to support previous and existing customers while Google can still easily shift things around without needing to worry too much about external dependencies.
Totally possible, but the second-order effects are much more complex than "leader once and for all". The path to victory for China is not a war in defiance of the West, but a war the West would not care about.
As long as "tomorrow" is a better day to invade Taiwan than today is, China will wait for tomorrow.
What I'm sure about is that a processing unit purpose-built for a task is more efficient than a general-purpose unit designed to accommodate all tasks.
More and more, the economics of computing boils down to energy usage, and ultimately to physical limits: the more efficient the process, the less energy it consumes.
As a layman, that makes general sense to me. Maybe a future where productivity is measured more by energy efficiency than by monetary gain pushes the economy in better directions.
Cryptocurrency and LLMs seem like they'll play out that story over the next 10 years.
Am I misunderstanding "TPU" in the context of the article?
With simulations becoming key to training models doesn't this seem like a huge problem for Google?
To quote The Next Platform: "An Ironwood cluster linked with Google’s absolutely unique optical circuit switch interconnect can bring to bear 9,216 Ironwood TPUs with a combined 1.77 PB of HBM memory... This makes a rackscale Nvidia system based on 144 “Blackwell” GPU chiplets with an aggregate of 20.7 TB of HBM memory look like a joke."
Nvidia may have the superior architecture at the single-chip level, but for large-scale distributed training (and inference) they currently have nothing that rivals Google's optical switching scalability.
While the B200 wins on raw FP8 throughput (~9000 vs 4614 TFLOPs), that makes sense given NVIDIA has optimized for the single-chip game for over 20 years. But the bottleneck here isn't the chip—it's the domain size.
NVIDIA's top-tier NVL72 tops out at an NVLink domain of 72 Blackwell GPUs. Meanwhile, Google is connecting 9216 chips at 9.6Tbps to deliver nearly 43 ExaFlops. NVIDIA has the ecosystem (CUDA, community, etc.), but until they can match that interconnect scale, they simply don't compete in this weight class.
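Quick back-of-the-envelope check on those figures (the per-chip numbers are the publicly quoted ones; the 192 GB of HBM per chip is inferred from the 1.77 PB total, so treat it as an assumption):

    # Sanity check of the pod-scale numbers quoted above.
    ironwood_chips = 9216
    ironwood_fp8_tflops = 4614     # per-chip dense FP8, as quoted
    ironwood_hbm_gb = 192          # per-chip HBM, inferred from the 1.77 PB total

    pod_exaflops = ironwood_chips * ironwood_fp8_tflops / 1e6
    pod_hbm_pb = ironwood_chips * ironwood_hbm_gb / 1e6
    print(f"Ironwood pod: ~{pod_exaflops:.1f} EFLOPS FP8, ~{pod_hbm_pb:.2f} PB HBM")
    # -> ~42.5 EFLOPS and ~1.77 PB, matching the figures quoted above

    nvl72_hbm_tb = 20.7            # NVL72 aggregate HBM, as quoted
    print(f"Pod vs NVL72 HBM: ~{pod_hbm_pb * 1000 / nvl72_hbm_tb:.0f}x")
    # -> roughly 85x more HBM in a single interconnect domain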
Why? To me, it seems better for the market, if the best models and the best hardware were not controlled by the same company.
The truth is the LLM boom has opened the first major crack in Google's position as the front page of the web (the biggest since Facebook), in the same way the web in the long run made Windows so irrelevant that Microsoft seemingly doesn't care about it at all.
Sparse models deliver the same quality of results but have fewer coefficients to process; in the case described in the link above, sixteen (16) times fewer.
This means these models need roughly 8 times less storage, can be 16 or more times faster, and use 16+ times less energy.
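A toy sketch of where those factors come from (CSR on CPU with unstructured sparsity, which is not how production sparse LLMs are implemented, but the accounting is the same):

    import numpy as np
    from scipy.sparse import csr_matrix

    n = 4096
    dense = np.random.randn(n, n).astype(np.float32)

    # Keep only 1/16 of the coefficients, zero out the rest.
    mask = np.random.rand(n, n) < 1.0 / 16
    sparse = csr_matrix(dense * mask)

    dense_flops = 2 * n * n              # one multiply-accumulate per entry
    sparse_flops = 2 * sparse.nnz        # only nonzero entries do work
    print(f"FLOPs ratio: {dense_flops / sparse_flops:.1f}x")    # ~16x

    # Storage: CSR keeps a value plus a column index per nonzero, which is why
    # the savings are ~8x rather than 16x when indices are as wide as the values.
    dense_bytes = dense.nbytes
    sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes
    print(f"Storage ratio: {dense_bytes / sparse_bytes:.1f}x")  # ~8x

The speed and energy numbers only materialize if the hardware can actually skip the zeros, which is exactly the caveat raised below.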
TPUs are not all that good at handling sparse matrices. They can be used to train the dense versions, but inference efficiency with sparse matrices may not be all that great.
https://docs.cloud.google.com/tpu/docs/system-architecture-t...
Does anyone have a sense of why CUDA is more important for training than inference?
What does that even mean in a neural-net context?
> numerical stability
It would also be nice to expand on that a bit.
Further, it's worth noting that Ironwood, Google's v7 TPU, supports only up to BF16 (a 16-bit floating-point format with the range of FP32 but much less precision). Many training processes rely on larger types and quantize later, so this breaks a lot of assumptions. Yet Google surprised everyone and actually trained Gemini 3 with just that type, so I think a lot of people are reconsidering those assumptions.
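For anyone unfamiliar: bfloat16 is literally the top 16 bits of a float32, so it keeps the full 8-bit exponent (range) but drops 16 mantissa bits (precision). A small sketch emulating it by truncation (real BF16 conversion rounds rather than truncates, but the idea is the same):

    import numpy as np

    def to_bf16(x: np.ndarray) -> np.ndarray:
        """Emulate bfloat16 by keeping only the top 16 bits of each float32."""
        bits = x.astype(np.float32).view(np.uint32)
        return (bits & 0xFFFF0000).view(np.float32)

    print(to_bf16(np.array([1.001])))   # -> [1.]     small increments vanish (low precision)
    print(to_bf16(np.array([1e38])))    # -> ~[1e38]  huge values survive (FP32-like range),
                                        #    where FP16 would already overflow to inf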
Another factor is that training is always done with batches, while inference batch size depends on the number of concurrent users. This means training tends to be compute-bound, where supporting the latest data types is critical, whereas inference is often bottlenecked by memory, which does not lend itself to product differentiation: if you put the same memory into your chip as your competitor, the difference is going to be much smaller.
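A rough roofline-style way to see it: the arithmetic intensity of a weight matrix multiply grows with batch size, so large training batches hit the compute ceiling while small inference batches sit on the memory-bandwidth ceiling. The chip numbers below are made up, accelerator-class figures, not any specific part:

    # Roofline-style sketch for y = W @ x with a (d x d) weight matrix.
    d = 8192
    bytes_per_weight = 2                        # bf16/fp16 weights

    def regime(batch, peak_tflops=1000.0, mem_bw_tbs=3.0):
        flops = 2 * d * d * batch               # one multiply-accumulate per weight per token
        weight_bytes = d * d * bytes_per_weight
        compute_time = flops / (peak_tflops * 1e12)
        memory_time = weight_bytes / (mem_bw_tbs * 1e12)
        return "compute-bound" if compute_time > memory_time else "memory-bound"

    for batch in (1, 16, 256, 4096):
        print(batch, regime(batch))
    # small batches (inference-like) -> memory-bound, HBM bandwidth dominates;
    # large batches (training-like)  -> compute-bound, datatype support dominates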
"Meta in talks to spend billions on Google's chips, The Information reports"
https://www.reuters.com/business/meta-talks-spend-billions-g...
blibble•1h ago
turning a giant lumbering ship around is not easy
sofixa•1h ago
Nothing prevents them per se, but it would risk cannibalising their highly profitable (IIRC 50% margin) higher end cards.
llm_nerd•1h ago
To put it into perspective, the tensor cores deliver about 2,000 TFLOPs of FP8 (half that for FP16), and this is all tensor FMA/MAC, which comprises the bulk of the compute for AI workloads. The CUDA cores -- the rest of the GPU -- deliver more in the 70 TFLOP range.
So if data centres are buying nvidia hardware for AI, they already are buying focused TPU chips that almost incidentally have some other hardware that can do some other stuff.
I mean, GPUs still have a lot of non-tensor general uses in the sciences, finance, etc, and TPUs don't touch that, but yes a lot of nvidia GPUs are being sold as a focused TPU-like chip.
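As a rough ratio (taking those figures at face value, and ignoring that FP8 tensor throughput and general CUDA throughput aren't strictly comparable):

    # Share of peak throughput coming from the tensor cores,
    # using the approximate figures quoted above.
    tensor_fp8_tflops = 2000
    cuda_tflops = 70
    share = tensor_fp8_tflops / (tensor_fp8_tflops + cuda_tflops)
    print(f"~{share:.0%} of peak FLOPs are tensor-core FLOPs")  # ~97%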
LogicFailsMe•1h ago
The real challenge is getting the TPU to do more general purpose computation. But that doesn't make for as good a story. And the point about Google arbitrarily raising the prices as soon as they think they have the upper hand is good old fashioned capitalism in action.
timmg•1h ago
The big difference is that Google is both the chip designer *and* the AI company. So they get both sets of profits.
Both Google and Nvidia contract TSMC to make their chips. Nvidia then sells theirs at a huge profit, and OpenAI (for example) buys them at that inflated price and puts them into production.
So while Nvidia is "selling shovels", Google is making their own shovels and has their own mines.
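To make the "shovels" point concrete with purely hypothetical numbers (the margin here is an illustrative assumption, not a reported figure):

    # Hypothetical margin-stacking illustration; the numbers are made up.
    tsmc_cost = 1.0                    # normalized cost of the silicon itself
    nvidia_margin = 0.70               # assumed gross margin on the GPU sale
    openai_cost = tsmc_cost / (1 - nvidia_margin)
    google_cost = tsmc_cost            # pays TSMC (and its own design/R&D) directly
    print(f"OpenAI pays ~{openai_cost:.1f}x per unit of silicon, Google ~{google_cost:.1f}x")
    # Google still carries its own chip design and R&D costs, so the real gap is
    # smaller, but the "shovel-maker's" margin stays in house.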
Workaccount2•44m ago
Everyone using Nvidia hardware has a lot of overlap in requirements, but they also all have enough architectural differences that they won't be able to match Google.
OpenAI announced they will be designing their own chips for exactly this reason, but that also becomes another extremely capital-intensive investment for them.
This also doesn't get into the fact that Google already has S-tier datacenters and datacenter construction/management capabilities.