I understand the current hardware limitations and that you can't just put a frontier LLM in a black box and hook it up to your existing MBP via USB-C. In my estimation, something like an Apple Mac Studio M3 (256GB or more of unified memory) is one possible option ($7,500 - $10,000) for running a 405B open-weights model... but it wouldn't be very fast, and it wouldn't come close to the quality or workflow of Claude Code.
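To make the 256GB claim concrete, here's a back-of-envelope sketch of the weights-only memory footprint of a 405B-parameter model at common quantization levels (KV cache and activations would add more on top; the byte-per-parameter figures are standard, not specific to any one model):

```python
# Weights-only memory for a 405B-parameter model at different precisions.
PARAMS = 405e9

def weights_gb(bytes_per_param: float) -> float:
    """GB needed just to hold the weights at the given precision."""
    return PARAMS * bytes_per_param / 1e9

for label, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{label}: ~{weights_gb(bpp):.1f} GB")
# FP16 ~810 GB, INT8 ~405 GB, INT4 ~202.5 GB:
# only the 4-bit quant fits in 256GB of unified memory,
# and with little headroom left for long contexts.
```

So a 256GB Mac Studio really is right at the edge: it takes an aggressive 4-bit quantization before the weights even fit, which is part of why it wouldn't be fast or full-quality.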
To really run a current frontier LLM locally at >30 tokens per second would probably require four A100s. Add in NVLink bridges, expensive cooling, 256GB of RAM, a cool case with LED lights (optional), and we're talking about ~$60,000? $80,000?
So my question is: How many hardware generations, or what specific architectural shifts (specialized ASICs, better quantization, etc.), do we need before we can buy a dedicated co-processor box that sits on a desk and runs a Sonnet-level agent at viable speeds... at a price point where it makes sense vs. spending $500-$2,000 per month per developer on API fees? For me, the "makes sense, here's the credit card" price point might be $10,000 right now, but I could be wrong.
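The buy-vs-rent arithmetic is simple enough to sketch. Using the figures from this post ($10,000 box, $500-$2,000/month in API fees; ignoring power, depreciation, and utilization):

```python
# Break-even: one-time box price vs. recurring per-developer API spend.
def breakeven_months(box_price: float, monthly_api: float) -> float:
    """Months of API spend it takes to equal the box's sticker price."""
    return box_price / monthly_api

for monthly in (500, 2000):
    m = breakeven_months(10_000, monthly)
    print(f"${monthly}/mo API -> box pays for itself in {m:.0f} months")
# $2,000/mo: break-even in 5 months; $500/mo: 20 months.
```

At the heavy-usage end the box pays for itself inside half a year, which is why the $10,000 price point feels like the threshold where the credit card comes out.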
And a related question: Who will do this? Anthropic could probably make a killing right now IF they could sell "Claude Code in a box for $10,000," but would they ever want to? It would cannibalize the majority of their business. But Apple might do this. And it might be only one or two generations of hardware upgrades away. They just need a frontier LLM to stick in the box.