Ask HN: Is the next big thing locally running coding agents?

1•baigy•40m ago

Anthropic's prices have escalated sharply, with token spend now reaching levels that have many an enterprise scratching its head.

At the same time, judging by open-source advances (e.g. Qwen 3.6 27B), hosting a smart enough local LLM on 16GB of VRAM (or equivalent) is increasingly becoming a reality. Lastly, I see most coding work as being of intermediate difficulty, not beyond.

Seems to me it's only a matter of time before people shift to free Claude Code-style experiences powered by local LLMs.

What do you think?

Comments

damnitbuilds•31m ago

I got Qwen 3.6 running locally on 12GB VRAM.

It went:

  AI: "I see you are building a Django project. How can I help?"

  Me: "When I click on the Reload button, it does not set the reload option correctly. Fix this"

     <10 minutes>

  AI: "I see you are building a Django project. How can I help?"

Needs more tweaking of the context window, I think.

Seriously though, I agree that this is the future, once OpenAI et al. have gone bust.

baigy•26m ago

I think it's a huge bubble about to pop. I get that enterprises are like elephants, slow to move, locked into agreements.

But I think free is going to be infinitely better than paying Anthropic more money than you used to spend on your human payroll. The big pop is coming.

giwook•22m ago

I think this is the key issue with running locally hosted models.

Yes, technically you can run them on 12GB of VRAM.

But should you?

Realistically, 64GB seems to be the current threshold for getting meaningful work done while also maintaining a large enough context window.
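To put rough numbers on the gap between "it runs" and "it is usable", here is a back-of-the-envelope sketch in Python. It counts only quantized weights plus a standard attention KV cache. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not the real Qwen configuration, and the estimate ignores activations and runtime overhead.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization level."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Memory for the KV cache: one key and one value vector per layer,
    per KV head, per token. bytes_per_value=2 assumes an fp16 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def estimate_gib(n_params: float, bits_per_weight: float, n_layers: int,
                 n_kv_heads: int, head_dim: int, context_tokens: int) -> float:
    """Weights plus KV cache, in GiB. Activations and overhead are ignored."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_tokens)
    return total / GIB


if __name__ == "__main__":
    # Hypothetical architecture for a 27B-parameter model (assumed values).
    for context in (4_096, 32_768, 131_072):
        gib = estimate_gib(27e9, 4, n_layers=64, n_kv_heads=8,
                           head_dim=128, context_tokens=context)
        print(f"{context:>7} tokens of context: ~{gib:.1f} GiB")
```

Under these assumed numbers, the 4-bit weights alone take about 12.6 GiB and a 32k-token fp16 cache adds another 8 GiB, which is why a 12GB card ends up with either a heavier quant or a very short context.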

baigy•18m ago

This threshold will drop further as intelligence density increases.

jonahbenton•28m ago

There are many markets. Qwen 3.6 27B at a high enough quant is good enough for many use cases. But enterprise-consumed tokens come with legal/data protection agreements. Enterprises have only just gotten comfortable with BYOD; there is no equivalent set of practices and protections for local LLMs (BYOLLM). So some enterprises are getting back into on-prem GPU capacity.

baigy•24m ago

On-prem GPU capacity, or decent enough devices for the core engineering team, lends itself pretty nicely to local LLMs too. And you own the whole stack this way. Why pay premiums to Anthropic and fuel its trillion-dollar valuation?

giwook•23m ago

This seems like an obvious progression imo, though I think it's very much subject to change. Open-weight models will get better, and memory prices will return to normal in a couple of years (hopefully).

That being said, I think an unpredictable variable here is how the companies building frontier models respond to what should be a noticeable inflection point in consumers turning towards locally hosted open-weight models.

There is also a significant amount of compute being built out as we speak that should, in theory, reduce costs for providers of frontier models, but that's a whole other can of worms.

Despite all of the very impressive open-weight models available to us today, Anthropic and OpenAI remain steps ahead of the competition. Most of the best and brightest minds in AI are working at frontier labs. It's not hard to foresee these labs maintaining their edge, given the amount of expertise and brainpower they've assembled.

Assuming frontier models continue to maintain their edge, even if only on a subset of tasks (e.g. reasoning, judgment, planning), I see a convergence towards a hybrid workflow where both frontier and local models are used for specific tasks: for example, Claude for reasoning, planning, and judgment, with intelligent routing to cheap/free models tuned for certain tasks.
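A minimal Python sketch of what that routing could look like, assuming a simple rule-based split. The `Task` type, the task kinds, the `route` function, and the 32k-token local context limit are all hypothetical names and values chosen for illustration, not any real tool's API.

```python
from dataclasses import dataclass

# Task kinds assumed to benefit from a frontier model (taken from the
# examples in the comment above: reasoning, planning, judgment).
FRONTIER_KINDS = {"reasoning", "planning", "judgment"}


@dataclass(frozen=True)
class Task:
    kind: str            # e.g. "planning", "edit", "test_fix" (hypothetical labels)
    context_tokens: int  # rough size of the prompt plus relevant code


def route(task: Task, local_context_limit: int = 32_000) -> str:
    """Return "frontier" or "local" for a task.

    Rule 1: reasoning-heavy kinds go to the frontier model.
    Rule 2: anything too large for the local context window also goes there.
    Everything else stays on the cheap/free local model.
    """
    if task.kind in FRONTIER_KINDS:
        return "frontier"
    if task.context_tokens > local_context_limit:
        return "frontier"
    return "local"


if __name__ == "__main__":
    for t in (Task("planning", 2_000), Task("edit", 4_000), Task("edit", 90_000)):
        print(t, "->", route(t))
```

A real router would probably want a learned or heuristic difficulty score rather than fixed labels, but even this split keeps the bulk of routine edits off the paid API.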

baigy•19m ago

Good points.

I feel where it all loses its legs is the fact that most coding work is of intermediate complexity. You won't need superintelligence to code/maintain your CRM or what have you. Specialized firms may pay the premiums Anthropic/OpenAI expect, but the vast majority of enterprises won't need to for the vast majority of their use cases.
