I understand the current hardware limitations and that you can't just put a frontier LLM in a black box and hook it up to your existing MBP via USB-C. In my estimation, something like an Apple Mac Studio M3 (256GB or more of unified memory) is one possible option ($7,500 - $10,000) for running a 405B open-weights model... but it wouldn't be very fast, and it wouldn't come close to the quality or workflow of Claude Code.
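To make the 256GB claim concrete, here's a back-of-envelope sketch of the weights-only memory footprint of a 405B-parameter model at common quantization levels (KV cache and activations would add more on top; the byte-per-parameter figures are standard, not specific to any one model):

```python
# Weights-only memory for a 405B-parameter model at different precisions.
PARAMS = 405e9

def weights_gb(bytes_per_param: float) -> float:
    """GB needed just to hold the weights at the given precision."""
    return PARAMS * bytes_per_param / 1e9

for label, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{label}: ~{weights_gb(bpp):.1f} GB")
# FP16 ~810 GB, INT8 ~405 GB, INT4 ~202.5 GB:
# only the 4-bit quant fits in 256GB of unified memory,
# and with little headroom left for long contexts.
```

So a 256GB Mac Studio really is right at the edge: it takes an aggressive 4-bit quantization before the weights even fit, which is part of why it wouldn't be fast or full-quality.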
To really run a current frontier LLM locally at >30 tokens per second would probably require four A100s. Add in NVLink bridges, expensive cooling, 256GB of RAM, a cool case with LED lights (optional), and we're talking about ~$60,000? $80,000?
So my question is: How many hardware generations, or what specific architectural shifts (specialized ASICs, better quantization, etc.), do we need before we can buy a dedicated co-processor box that sits on a desk and runs a Sonnet-level agent at viable speeds... at a price point where it makes sense vs. spending $500-$2,000 per month per developer on API fees? For me, the "makes sense, here's the credit card" price point might be $10,000 right now, but I could be wrong.
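The buy-vs-rent arithmetic is simple enough to sketch. Using the figures from this post ($10,000 box, $500-$2,000/month in API fees; ignoring power, depreciation, and utilization):

```python
# Break-even: one-time box price vs. recurring per-developer API spend.
def breakeven_months(box_price: float, monthly_api: float) -> float:
    """Months of API spend it takes to equal the box's sticker price."""
    return box_price / monthly_api

for monthly in (500, 2000):
    m = breakeven_months(10_000, monthly)
    print(f"${monthly}/mo API -> box pays for itself in {m:.0f} months")
# $2,000/mo: break-even in 5 months; $500/mo: 20 months.
```

At the heavy-usage end the box pays for itself inside half a year, which is why the $10,000 price point feels like the threshold where the credit card comes out.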
And a related question: Who will do this? Anthropic could probably make a killing right now IF they could sell "Claude Code in a box for $10,000," but would they ever want to? It would cannibalize the majority of their business. But Apple might do this. And it might be only one or two generations of hardware upgrades away. They just need a frontier LLM to stick in the box.