Pretty much every hardware vendor has an NPU
Even the latest NVIDIA Blackwell GPUs are general purpose, albeit with negligible "graphics" capabilities. They can run fairly arbitrary C/C++ code with only some limitations, and the area of the chip dedicated to matrix products (the "tensor units") is relatively small: less than 20% of the die!
Conversely, the Google TPUs dedicate a large area of each chip to pure tensor ops, hence the name.
This is partly why Google's Gemini models are reportedly around 4x cheaper to serve than OpenAI's GPT-5 models.
Jensen Huang has said in recent interviews that he stands by the decision to keep the NVIDIA GPUs more general purpose, because this makes them flexible and able to be adapted to future AI designs, not just the current architectures.
That may or may not pan out.
I strongly suspect that the winning chip architecture will have about 80% of its area dedicated to tensor units, very little onboard cache, and model weights streamed in from High Bandwidth Flash (HBF). That would be dramatically cheaper and lower-power than the hardware typically used today.
Something to consider is that as the matrices in a model scale up, the compute needed for their matrix multiplications grows as the cube of their dimension, while the miscellaneous operations such as softmax, ReLU, etc. scale only linearly with the length of the vectors involved.
Hence, as models scale into the trillions of parameters, the matrix multiplications ("tensor" ops) dominate everything else.
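A back-of-the-envelope sketch of that scaling argument (the constants below are made up for illustration; only the growth rates matter):

```python
# Matmul of two d x d matrices costs ~2*d^3 FLOPs (a multiply-accumulate
# for each of d^2 outputs, summed over d terms). Elementwise ops like
# softmax or ReLU on a length-d vector cost O(d).
def matmul_flops(d: int) -> int:
    return 2 * d ** 3

def elementwise_flops(d: int) -> int:
    return 5 * d  # rough made-up constant per element

for d in (1_024, 8_192, 65_536):
    ratio = matmul_flops(d) / elementwise_flops(d)
    print(f"d={d:>6}: matmul/elementwise FLOP ratio ~ {ratio:,.0f}")
```

Doubling the dimension multiplies matmul cost by 8 but elementwise cost by only 2, so the ratio grows quadratically, and at trillion-parameter scale the non-tensor ops are a rounding error.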
>First wave of Ryzen AI desktop CPUs targets business PCs rather than DIYers.
I wanted a better Strix Halo (which has 128GB of unified RAM and 40 CUs on the 8080S-or-something iGPU).
This looks like the normal Ryzen mobile chips, but with fewer CUs.
Microsoft: "Friendship ended with Intel, now AMD is my best friend"
cebert•2d ago
himata4113•1h ago
aljgz•1h ago
wood_spirit•1h ago
But does this extrapolate to the current way of doing AI fitting into normal life in a way people actually like? The way Microsoft etc. is trying to put AI in everything kinda suggests that no, it isn't what users want.
I’d like voice control on my PC or phone; that’s a use for these NPUs. But I imagine it’s like AR: what we all want until it arrives, and then it’s meh.
snovv_crash•1h ago
vbezhenar•1h ago
It's similar to performance/efficiency cores. I don't need power efficiency, and I'd actually buy a CPU that doesn't make that distinction.
orbital-decay•1h ago
wtallis•43m ago
I think the overall trend is now moving somewhat away from having the CPU and GPU on one die. Intel's been splitting things up into several chiplets for most of their recent generations of processors, AMD's desktop processors have been putting the iGPU on a different die than the CPU cores for both of the generations that have an iGPU, their high-end mobile part does the same, even NVIDIA has done it that way.
Where we still see monolithic SoCs as a single die is mostly smaller, low-power parts used in devices that wouldn't have the power budget for a discrete GPU. But as this article shows, sometimes those mobile parts get packaged for a desktop socket to fill a hole in the product line without designing an entirely new piece of silicon.
fodkodrasz•1h ago
That is fine. Most ordinary users can benefit from these very basic use cases, which can be accelerated.
I guess people said the same about video encoding acceleration, and now they use it on a daily basis for video conferencing, for example.
wtallis•52m ago
You don't do anything involving realtime image, video, or sound processing? You don't want ML-powered denoising and other enhancements for your webcam, live captions/transcription for video, OCR allowing you to select and copy text out of any image, object and face recognition for your photo library enabling semantic search? I can agree that local LLMs aren't for everybody—especially the kind of models you can fit on a consumer machine that isn't very high-end—but NPUs aren't really meant for LLMs, anyways, and there are still other kinds of ML tasks.
> It's similar to performance/efficiency cores. I don't need power efficiency, and I'd actually buy a CPU that doesn't make that distinction.
Do you insist that your CPU cores must be completely homogeneous? AMD, Intel, Qualcomm and Apple are all making at least some processors where the smaller CPU cores aren't optimized for power efficiency so much as maximizing total multi-core throughput with the available die area. It's a pretty straightforward consequence of Amdahl's Law that only a few of your CPU cores need the absolute highest single-thread performance, and if you have the option of replacing the rest with a significantly larger number of smaller cores that individually have most of the performance of the larger cores, you'll come out ahead.
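The area-budget trade-off can be sketched with made-up numbers (the per-core area and performance figures below are hypothetical, chosen only to show the shape of the argument):

```python
# Hypothetical comparison: spend a fixed die area on all big cores,
# or on a few big cores plus many smaller ones. Numbers are invented
# for illustration, not measurements of any real CPU.
BIG_AREA, BIG_PERF = 4.0, 1.0      # one big core: 4 area units, perf 1.0
SMALL_AREA, SMALL_PERF = 1.0, 0.6  # one small core: 1 area unit, perf 0.6

def throughput(big: int, small: int) -> float:
    """Total multi-threaded throughput of a given core mix."""
    return big * BIG_PERF + small * SMALL_PERF

AREA = 32.0
homogeneous = throughput(int(AREA // BIG_AREA), 0)                 # 8 big cores
hetero = throughput(2, int((AREA - 2 * BIG_AREA) // SMALL_AREA))   # 2 big + 24 small

print(f"all big cores   : {homogeneous:.1f}")  # 8.0
print(f"2 big + 24 small: {hetero:.1f}")       # 16.4
```

Single-thread latency still comes from the two big cores, but total throughput roughly doubles, which is the Amdahl's Law point: only the serial fraction needs the fastest cores.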
throwa356262•43m ago
Besides, most of what you mentioned doesn't run on the NPU anyway; those are usually standard GPU workloads.
wtallis•39m ago
And on the platforms that have a NPU with a usable programming model and good vendor support, the NPU absolutely does get used for those tasks. More fragmented platforms like Windows PCs are least likely to make good use of their NPUs, but it's still common to see laptop OEMs shipping the right software components to get some of those tasks running on the NPU. (And Microsoft does still seem to want to promote that; their AI PC branding efforts aren't pure marketing BS.)
anematode•24m ago
skirmish•1h ago
kijin•1h ago