I was ranting about this to my friends: Wall Street is now banking on tech firms to produce the illusion of growth and returns, rather than repackaging and selling subprime mortgages.
The tech sector seems to have a never-ending supply of things to spur investment and growth: cloud computing, SaaS, mobile, social media, IoT, crypto, the Metaverse, and now AI.
Some useful, some not so much.
Tech firms are under a lot of pressure to produce growth; the sector is filled with very smart people and wields influence on public policy. The flip side is that the mortgage crisis, at least before it collapsed, got more Americans into home ownership (even if they weren't ready for it). I'm not sure the tech sector's meteoric rise has been as helpful (sentiment among locals in US tech hubs suggests an overall feeling of dissatisfaction with tech).
Oh right, for their data centers. I could see this being useful there too; it would bring costs down further.
Yes, in the sense that this is at least partially inspired by Apple's vertical integration playbook, which has now been extended to their own data centers based on custom Apple Silicon¹ and a built-for-purpose, hardened edition of Darwin².
¹ https://security.apple.com/blog/private-cloud-compute/
² https://en.wikipedia.org/wiki/Darwin_(operating_system)
Arguably that's a GPU? Other than (currently) exotic ways to run LLMs like photonics or giant SRAM tiles, there isn't a device that's better at inference than GPUs, and they have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.
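To put very rough numbers on that (the parameter count and byte sizes below are my own illustrative assumptions, not figures from the thread), here's a back-of-envelope sketch of why both regimes end up wanting the same class of hardware, i.e. as much HBM and matmul throughput as you can get:

    # Back-of-envelope memory for a hypothetical 70B-parameter model.
    # All numbers are illustrative assumptions, not vendor figures.
    PARAMS = 70e9   # assumed parameter count
    FP16 = 2        # bytes per parameter in fp16/bf16

    weights_gb = PARAMS * FP16 / 1e9
    print(f"weights (inference or training): {weights_gb:.0f} GB")    # ~140 GB

    # Training with Adam in mixed precision also keeps gradients and optimizer
    # state resident, commonly ~16 bytes/param before counting activations.
    training_state_gb = PARAMS * 16 / 1e9
    print(f"training resident state:         {training_state_gb:.0f} GB")  # ~1120 GB

Either way you're sharding a model across a pile of accelerators with big HBM and fast matrix units, which is why the same silicon tends to be good at both jobs.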
Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...
"…the company announced its approach to solving that problem with its Rubin CPX— Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."
In fact - I'd say we're looking at this backwards - GPUs used to be the thing that did math fast and put the result into a buffer where something else could draw it to a screen. Now a "GPU" is still a thing that does math fast, but now sometimes, you don't include the hardware to put the pixels on a screen.
So maybe - CPX is "just" a GPU but with more generic naming that aligns with its use cases.
https://www.cdotrends.com/story/3823/groq-ai-chip-delivers-b...
Similarly, Tenstorrent seems to be building something that you could consider "better", at least insofar as the goal is to be open.
Long term, I wonder if we're exiting the "platform compute" era, for want of a better term. By that I mean compute which can run more or less any operating system, software, etc. If everyone is siloed into their own vertically integrated hardware+operating system stack, the results will be awful for free software.
I think they do all deep learning for Gemini on their own silicon.
But they also invented AI as we know it when they introduced the transformer architecture, and they've been more invested in machine learning than most companies for a very long time.
Maybe if you restrict it, similarly to the DeepSeek paper, to "Gemini uses TPUs for the final successful training run and for scaled inference", you might be correct, but there's no way that GPUs aren't involved, at minimum for comparability and faster iteration during the extremely buggy and error-prone slog to the final training run. And the theoretical and algorithmic innovations that often happen at Google and do make their way into Gemini are certainly sometimes done on Nvidia GPUs as well.
GCP has a lot of GPUs, likely on the order of at least 1 million in its fleet today (and I'm probably underestimating). Some of that is used internally and made available to their engineering staff. What constitutes "deep learning for Gemini" is very much open to interpretation.
> The software titan is rather late to the custom silicon party. While Amazon and Google have been building custom CPUs and AI accelerators for years, Microsoft only revealed its Maia AI accelerators in late 2023.
They are too late for now. Realistically, hardware takes a couple of generations to become a serious contender, and by the time Microsoft has a chance to learn from its hardware mistakes the "AI" bubble will have popped.
But there will probably be some little LLM tools that do end up having practical value; maybe there will be a happy line-crossing point for MS, and they'll have cheap in-house compute when the models actually need to turn a profit.
Not really, and for the same reason Chinese players like Biren are leapfrogging: much of the work profile in AI/ML is "embarrassingly parallel".
If you are able to negotiate competitive fabrication and energy supply deals, you can mass produce your way into providing "good enough" performance.
The kind of buyer who cares about hardware performance in training isn't in the market for cloud-offered services.
There was a split at MS where the 'Next Gen' Bayesian work was being done in the US and the frequentist work was being shipped off to China. Chris Bishop being promoted to head of MSR Cambridge didn't help.
Microsoft really is an institutionally stupid organization so I have no idea on which direction they actually go. My best guess is that it’s all talk.
So unless they also solve that issue with their own hardware, it will be like the TPU, which sees usage primarily at Google, or within very specific use cases.
There are only so many super talented software engineers to go around. If you're going to become an expert in something, you're going to pick what everyone else is using first.
I don't know. The transformer architecture uses only a limited number of primitives. Once you have ported those to your new architecture, you're good to go.
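As a rough illustration of how small that set of primitives is (this is a deliberately simplified, single-head sketch in plain NumPy, not anything a vendor actually ships), a transformer block is essentially matmuls, a softmax, a layernorm, and a pointwise nonlinearity:

    # A simplified transformer block written only in terms of the primitives
    # you'd need to port: matmul, softmax, layernorm, and a nonlinearity.
    # Single-head attention and tiny shapes are illustrative simplifications.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def layernorm(x, eps=1e-5):
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def gelu(x):
        return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

    def block(x, Wq, Wk, Wv, Wo, W1, W2):
        # Attention: three matmuls, a softmax, and one more matmul.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        att = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
        x = layernorm(x + att @ Wo)
        # MLP: two matmuls and a pointwise nonlinearity.
        return layernorm(x + gelu(x @ W1) @ W2)

    # Toy usage: 4 tokens, model width 8, hidden width 32.
    d, h = 8, 32
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, d))
    shapes = [(d, d)] * 4 + [(d, h), (h, d)]
    out = block(x, *(rng.normal(size=s) * 0.1 for s in shapes))
    print(out.shape)  # (4, 8)

Port those few kernels (plus the embedding and elementwise glue around them) and most of the workload runs; the hard part is making them fast, not making them exist.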
Also, Google has been using TPUs for a long time now, and *they* never hit a brick wall for a lack of CUDA.
The name of the game has been custom SoCs and ASICs for a couple of years now, because inference and model training are "embarrassingly parallel" problems, and models optimized for older hardware can provide gains similar to models run on unoptimized but more performant hardware.
Same reason H100s remain a mainstay in the industry today, as their performance profile is well understood now.
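For the inference side at least, "embarrassingly parallel" just means independent requests can be fanned out over as many "good enough" devices as you can buy, with no cross-device communication. A minimal sketch, where the pool size and the fake per-device work are placeholders:

    # Independent requests sharded across N device workers; throughput scales
    # with worker count because nothing needs to talk to anything else.
    from concurrent.futures import ProcessPoolExecutor

    def run_on_device(device_id, prompts):
        # Stand-in for one accelerator holding a model replica and running
        # a batch of independent prompts through it.
        return [(device_id, f"output for {p}") for p in prompts]

    def shard(items, n):
        return [items[i::n] for i in range(n)]

    if __name__ == "__main__":
        prompts = [f"prompt-{i}" for i in range(32)]
        n_devices = 4  # assumed fleet size
        with ProcessPoolExecutor(max_workers=n_devices) as pool:
            futures = [pool.submit(run_on_device, d, batch)
                       for d, batch in enumerate(shard(prompts, n_devices))]
            results = [r for f in futures for r in f.result()]
        print(len(results))  # 32 -- add devices, get more throughput

That's the structural reason "good enough" silicon in volume can compete: the bottleneck is how many replicas you can feed with fab capacity and power, not how clever any single chip is.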
Is anyone else getting crypto flashbacks?
wish with one hand and shit in the other, see which fills first
https://www.electronicdesign.com/technologies/analog/article...
https://www.analog.com/en/resources/analog-dialogue/articles...
Just look at the implosion of the XBox business.
And I'm guessing that the decline is due to executive meddling.
What is it that executives do again? Beyond collecting many millions of dollars a year, that is.