720x RDNA5 AT0 XL, 25,920 GB VRAM, 23,040 GB system RAM
~$10 million
Who is the target market here?
I have no idea who would buy this. Maybe if you think Vera Rubin is three years out? But NV ships, man, they are shipping.
Can it run Crysis?
-- Jensen Huang
It's funny, though... we're using DeepSeek now for features in our service, and based on our customer type we thought they would be completely against sending their data to a third party. We thought we'd have to do everything locally. But they seem OK with DeepSeek, which is practically free. And the few customers that still worry about privacy may not justify such a high price point.
If private inference is actually non-negotiable, then sure, put GPUs in your colo and enjoy the infra pain, vendor weirdness, and the meeting where finance learns what those power numbers meant.
There's a lot there that makes sense and, I think, needs to be considered. But a lot just seems to come out of the blue, included without connection, in my view. It feels like maybe they are in-group messages that I don't understand. How this is framed as being against democracy is unclear to me, and revolting. I do think we must grapple with the world as it is, and this post is strongly in that area, but letting fear be the dominant emotion is one of the main definitions of conservatism, and its use here to scare us sounds bad.
And his politics are a derivative of Great Man Theory, and his positions on things like democracy follow from that. The idea, espoused by some of the VC/tech elite like Peter Thiel, is that singular, hardworking genius individuals can change the world on their own, and everyone who is not in this top 0.1% is a borderline NPC.
They do this both because of their genius/hardwork, and also because they are willing to break the rules that are set forth by this bottom 99.9%.
I'm starting to call this ideology Authoritarian techno-Libertarianism. It's a deliberately oxymoronic name, because these "Great Men" are definitely trying to change the world, i.e., they are trying to impose their goals and values on the world without getting the buy-in of other people.
That's the "authoritarian" part. And the "libertarian" part is that they go about imposing their will on the world by doing it all themselves, through their own hard work.
Think "person invents a world-changing technology that some people think is bad, and just releases it open source for anyone to use." AI models are a great example, in fact. Once that technology is out there, the genie cannot be put back into the bottle, and a ton of people are going to lose their jobs, etc.
A disdain for democracy follows directly from things like this. You don't wait for people to vote to allow you to change the world by inventing something new. You just do it and watch the results.
Still, this is a great idea, and one I hope takes off. I think there's a good argument that the future of AI is in locally-trained models for everyone, rather than relying on a big company's own model.
One thought: the ability to conveniently get this onto a 240V circuit would be nice. Having to find two different 120V circuits to plug this into will be a pain for many folks.
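The two-circuit problem is basically arithmetic. A minimal sketch, assuming 15 A breakers and the common rule of thumb that a continuous load should stay at or below 80% of the breaker rating (check your local electrical code; the 15 A default, the 2,800 W load, and the function names are illustrative assumptions, not specs of this box):

```python
import math


def continuous_watts(volts: float, breaker_amps: float = 15.0, derate: float = 0.8) -> float:
    """Usable continuous power on one circuit, assuming an 80% continuous-load derating."""
    return volts * breaker_amps * derate


def circuits_needed(load_watts: float, volts: float, breaker_amps: float = 15.0) -> int:
    """Smallest number of identical circuits whose combined continuous capacity covers the load."""
    return math.ceil(load_watts / continuous_watts(volts, breaker_amps))


if __name__ == "__main__":
    for v in (120, 240):
        print(f"{v} V / 15 A: {continuous_watts(v):.0f} W continuous")
    # Hypothetical 2,800 W machine: how many circuits at each voltage?
    print("120 V circuits:", circuits_needed(2800, 120))
    print("240 V circuits:", circuits_needed(2800, 240))
```

Doubling the voltage at the same breaker rating doubles the usable power, which is why one 240V circuit can replace two 120V ones.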
If it shipped with like 4090+ (for a higher price) it’d be more tempting.
https://x.com/__tinygrad__/status/1983917797781426511
Stopped due to rising GPU prices:
Can confirm.
Maybe the volume for them is manageable enough that well-intentioned but poor-quality PRs can be politely (or otherwise, culture depending) disregarded, and the method of generation is not important.
Then you could focus fire on fixing whatever issues you prefer, like the script kiddies did with DDoS in the old days.
But let’s be real, $12k is kinda pushing it: what kind of people are gonna spend $65k or even $10M (lmao WTAF) on a boutique thing like this? I don't think these kinds of things go in datacenters (happy to be corrected), and they are way too expensive (and probably way too HOT) to just go in a home or even an office “closet”.
I had the same feeling as throwadem when reading this. Your comment clarifies what they meant by "everyone".
Nowadays I find most things work fine on Arm. Sometimes something needs to be built from source, which is genuinely annoying. But moving from CUDA to ROCm is often more like a rewrite than a recompile.
For $5K one can get a desktop PC with an RTX 5090, which has 3x more compute but 4x less VRAM, so depending on the workload it may be a better option.
I’m pretty curious to see any benchmarks on inference on VRAM vs. unified memory (UM).
So for an LLM, inference is relatively slow because of that bandwidth, but you can load much bigger, smarter models than you could on any consumer GPU.
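A rough way to see why bandwidth dominates: in single-stream decoding, each generated token has to read (roughly) all active weights from memory once, so memory bandwidth divided by bytes per token gives a ceiling on tokens/s. This is a back-of-the-envelope sketch that ignores KV-cache reads, compute limits, and batching; the bandwidth figures below are hypothetical placeholders, not measured specs of any machine mentioned here.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, active_params_b: float, bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound LLM.

    Assumes each token reads every active weight exactly once and nothing else
    (no KV-cache traffic, perfect bandwidth utilization).
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical: a 70B dense model quantized to ~4 bits (0.5 bytes per parameter)
    for name, bw in [("unified memory, ~256 GB/s", 256.0), ("discrete GPU, ~1,800 GB/s", 1800.0)]:
        print(f"{name}: ~{max_decode_tokens_per_s(bw, 70, 0.5):.1f} tok/s ceiling")
```

The same formula explains the trade-off in the comment: a big unified-memory pool lets the model fit at all, but the lower bandwidth caps how fast it can decode.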
Machines with the 4xx chips are coming next month so maybe wait a week or two.
It's soldered LPDDR5X with AMD Strix Halo... SGLang and llama.cpp can do that pretty well these days. And it's, you know, half the price, and you're not locked into the Nvidia ecosystem.
You can check what each model does on AMD Strix Halo here:
Mac Studio or Mac Mini, depending on which gives you the highest amount of unified memory for ~$5k.
$12,000, $65,000, $10,000,000.
The town near my hometown has 650–800 houses (according to ChatGPT).
Crazy.
I'm running a 70B model now that's okay, but it's still fairly tight. And I've got 16 GB more VRAM than the red v2.
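For a sense of why 70B is tight: the weights alone take about params × bytes per param. A minimal sketch; the 10% overhead for runtime buffers is an illustrative assumption, and the KV cache for long contexts comes on top of this.

```python
def weight_footprint_gb(params_b: float, bits_per_param: float, overhead: float = 0.10) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes), plus a flat overhead fraction."""
    bytes_total = params_b * 1e9 * bits_per_param / 8
    return bytes_total * (1 + overhead) / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"70B @ {bits}-bit: ~{weight_footprint_gb(70, bits):.0f} GB")
```

Even at 4-bit, a 70B model lands in the mid-to-high 30s of GB before any KV cache, so it only fits comfortably on multi-GPU or large-VRAM setups.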
I'm also confused why this is 12U. My whole rig is 4u.
The green v2 has better GPUs. But for $65k, I'd expect a much better CPU and 256 GB of RAM. It's not like a Threadripper 7000 is going to break the bank.
I'm glad this exists but it's... honestly pretty perplexing
The thing that’s less useful is the 64 GB VRAM / 128 GB system RAM config: even the large MoE models only need 20B for the router, and the rest of the VRAM is essentially wasted (mixing experts between VRAM and system RAM has basically no performance benefit).
Can't you offload the KV cache to system RAM, or even storage? It would make it possible to run with longer contexts, even with some overhead. AIUI, local AI frameworks include support for caching some of the KV in VRAM using an LRU policy, so the overhead would be tolerable.
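A toy sketch of that idea: keep a fixed number of KV-cache blocks in a fast tier (standing in for VRAM) and evict the least recently used ones to a slow tier (standing in for system RAM or disk). The class and method names are hypothetical, and real frameworks manage paged tensors on devices rather than Python objects; this only illustrates the LRU policy itself.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable


class TwoTierKVCache:
    """Hypothetical two-tier KV-block store with LRU eviction from the fast tier."""

    def __init__(self, fast_capacity: int) -> None:
        self.fast_capacity = fast_capacity
        self.fast: "OrderedDict[Hashable, Any]" = OrderedDict()  # "VRAM": most recent at the end
        self.slow: Dict[Hashable, Any] = {}  # "system RAM / disk"
        self.promotions = 0  # how many times a slow-tier transfer was paid

    def put(self, block_id: Hashable, block: Any) -> None:
        self.slow.pop(block_id, None)
        self.fast[block_id] = block
        self.fast.move_to_end(block_id)  # newest write is most recently used
        self._evict()

    def get(self, block_id: Hashable) -> Any:
        if block_id in self.fast:
            self.fast.move_to_end(block_id)  # mark as most recently used
            return self.fast[block_id]
        # Fast-tier miss: pull the block back (in real hardware, the PCIe-bound step).
        block = self.slow.pop(block_id)  # raises KeyError if the block was never stored
        self.promotions += 1
        self.fast[block_id] = block
        self._evict()
        return block

    def _evict(self) -> None:
        while len(self.fast) > self.fast_capacity:
            old_id, old_block = self.fast.popitem(last=False)  # least recently used
            self.slow[old_id] = old_block


if __name__ == "__main__":
    cache = TwoTierKVCache(fast_capacity=2)
    cache.put("b0", "kv0")
    cache.put("b1", "kv1")
    cache.put("b2", "kv2")  # b0 is evicted to the slow tier
    cache.get("b0")         # b0 is promoted back; b1 (now LRU) is evicted
    print(list(cache.fast), list(cache.slow), cache.promotions)
```

The overhead stays tolerable only as long as most lookups hit the fast tier; every promotion is a transfer over the slow link.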
With that said, people are trying to extend VRAM into system RAM or even NVMe storage, but as soon as you hit the PCI bus with the high bandwidth layers like KV cache, you eliminate a lot of the performance benefit that you get from having fast memory near the GPU die.
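To put rough numbers on that: compare the time to stream the same data over a host link versus from local GPU memory. The bandwidth values and the 8 GB KV-cache size are illustrative assumptions (roughly PCIe 4.0 x16 theoretical one-way bandwidth vs. a fast local memory part), not specs of any box in this thread.

```python
def transfer_ms(gigabytes: float, bandwidth_gb_s: float) -> float:
    """Time in milliseconds to move `gigabytes` at a sustained `bandwidth_gb_s`."""
    return gigabytes / bandwidth_gb_s * 1000.0


if __name__ == "__main__":
    kv_gb = 8.0         # hypothetical KV cache touched once per decode step
    pcie_gb_s = 32.0    # assumed: ~PCIe 4.0 x16, one direction, theoretical
    vram_gb_s = 1000.0  # assumed: local GPU memory
    slow = transfer_ms(kv_gb, pcie_gb_s)
    fast = transfer_ms(kv_gb, vram_gb_s)
    print(f"over PCIe: {slow:.0f} ms/step, from VRAM: {fast:.0f} ms/step, ~{slow / fast:.0f}x slower")
```

Under these assumptions the bus is more than an order of magnitude slower, which is the performance cliff the comment describes.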
I'm almost sure it’s possible to custom-build a machine as powerful as their red v2 within a $9k budget. And have a lot of fun along the way.
I'm currently shopping for offline hardware and it is very hard to estimate the performance I will get before dropping $12K, and would love to have a baseline that I can at least always get e.g. 40 tok/s running GPT-OSS-120B using Ollama on Ubuntu out of the box.
Sorry, what? Is this just a scam?
I could swear I filed a GitHub issue asking about the plans for that but I don't see it. Anyway I think he mentioned it when explaining tinygrad at one point and I have wondered why that hasn't got more attention.
As far as boxes, I wish that there were more MI355X available for normal hourly rental. Or any.
Since when did our perception of "tiny" blow up in size in tech? Is it the influence of "hello world" Electron apps consuming 100 MB of memory while idle setting the new standard? Anyway, being an AI bro seems like an expensive hobby...
Not revolutionary in any way, but nice. Unless I'm missing something here?