Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

48•cloudking•2h ago

Has anyone here fully swapped Claude/GPT for a local model as their main coding tool, not just for side experiments? If so, please share your setup and performance (e.g tok/s)

Comments

tumetab1•59m ago

Not yet, tried Gemma 4 on an Apple M4 but the tok/s is significant lower than the cloud offering.

Also,the lack of enterprise tooling to help selected an appropriate model and tooling to run a local LLM does not help.

arjie•59m ago

Not “local” and not interactive coding but sharing since it might be helpful. I have 2x RTX Pro 6000 Blackwell running DeepSeek V4 Flash. I get 160 tok/s raw but it’s a reasoning model. For my use case, I have it auto-write code and another system auto-review the code.

I occasionally use it with pi to write some code and it’s blazing fast but it’s mostly habit that keeps me with CC and Codex.

leptons•5m ago

Have you measured your electricity consumption for this rig? I have to wonder how much it would cost you per month.

HappySweeney•48m ago

I have an optane and lots of ram, so I tried full-fat models for writing some function overnight, as I get about 0.7 t/s. My current go-to test is to update a scalar function to transpose a bit-matrix to one using avx512. the cloud models all play with that like its nothing. Kimi 2.6 and GLM 5.1 both failed miserably.

kertoip_1•37m ago

Just attach OpenRouter to your coding agent tool and try yourself. All relevant open weight models are there. Every person have different needs and expectations

christkv•31m ago

Waiting for this https://github.com/antirez/ds4 to stabilize for strix halo.

acc_297•26m ago

I've been wondering lately if it would help to take a medium sized model and either in cloud or some local setup actually do Reinforcement Learning from Human Feedback (RLHF) on every prompt as a chore - I don't know if trying to manually finetune a model to your use habits would ruin it or help - ideally if you were diligent you could get rid of some of the ticks that make models for the general public difficult to work with e.g. overly sycophantic, overly verbose, annoying tendency to explain via analogies

but perhaps one individuals prompt feedback just isn't going to ever be enough I'm not sure how much you need (I know people working at big companies that have purchased in-house agents fine-tuned on internal documents etc.. and apparently these end up with bizarre behaviours not necessarily more helpful than the standard models)

I'd like to be able to essentially edit every response given by an agent and then finetune on the difference between what it produced and how I edited the text. Personally I would just remove a lot of the adjectives and try to distill the responses to core responses but I worry based on some of the work done by Owain Evans and other alignment researchers that this can sometimes push agents into tricky-to-predict tendancies.

rolisz•14m ago

I'm interested in trying something similar. I was thinking to do this for my OpenClaw agent.

About Owain Evans work: I think he did SFT. On Twitter someone was saying that RL is not as susceptible to what he showed. I'd like to try that

dude250711•25m ago

Yes, running a local model on a natural wetware substrate here.

Recommended setup: plenty of nutrients, some caffeine and a quiet environment.

Performance - not currently measured in tokens: roughly average.

HPsquared•20m ago

I personally get about 50 tokens per hour.

K0balt•23m ago

Pretty good results with qwen 3.6 27b dense. I’d say it’s about equal to (Claude) haiku 4.5 maybe sonnet depending on the task.

kandros•21m ago

I’d rather ask my butcher than Haiku for coding tasks

Razengan•21m ago

Related: Are there any viable distributed AI models?

Like how we've had SETI at Home, Folding at Home, BitTorrent etc. People are clearly willing to donate their computer resources to distributed projects.

Maybe in a dAI network anyone could submit content for training on, and each user running a "node" could have their own custom private conditions on which type of content to accept for training or inference.

Like someone who dislikes anime could say "never accept anime related content or queries" so their node would basically opt-out from any data or questions about anime.

joshuamoyers•14m ago

I think it'd be very hard to achieve viable tokens/s or get arithmetic intensity to be high enough in general, since many cases in existing training and inference are memory bandwidth limited. Definitely seems possible to conceptually have a slow pipeline that is distributed though.

_davide_•18m ago

i used to mix remote and local minimax 2.7(q3) on my strix halo, it run at 30 tg and 220 tokens pp... it was a bit painful slow, but it was a good feeling i could stay offline. unfortunately m3 which is at opus .8 levels is 460b parameters and doesn't even fit in 128gb of memory, let alone a big context. strix halo feels like a toy for ai purposes. https://kyuz0.github.io/amd-strix-halo-toolboxes/

ryandrake•16m ago

Always a bit disappointed in the details in these kinds of threads. When you do get answers, they're never specific enough to try out on your own. It'll be something like "I use Qwen 3.5 and get great results!" OK but what quantization are you using? What llama parameters? What context size? What GPU are you running it on, and how much VRAM does it have? Are you hosting it on a separate box, or running it locally on your dev machine? What coding agent tool are you using, and how is it configured / hooked up to the model?

anonymousiam•15m ago

This was posted shortly after your Ask HN post:

My Homelab AI Dev Platform

https://news.ycombinator.com/item?id=48542433

ecshafer•13m ago

I work with a few models on servers, so not local, but self hosted with ollama. gemma-4, glm 4.7 flash, and qwen 3.6. glm is the best at coding agentically. But I still don't think any of them reach the levels of gpt 5.5 or opus 4.8.

system2•12m ago

Until I can buy an 80GB VRAM GPU, I won't attempt to do it. A local LLM is always missing something that needs a bigger model.

mitchell_h•9m ago

Tried. The context windows just weren't big enough.

deadbabe•7m ago

Prompt more directly instead of open ended.

nfrankel•6m ago

I tried. It works in theory: https://blog.frankel.ch/tokensparsamkeit-coding-assistants/#...

Results depend on the model, of course, and your computer is the limit. Mine wasn't up to the task, unfortunately.

codinhood•5m ago

I don't think you're going to get many "true" answers to this. The opportunity cost of not using the latest and best models is just too much right now.

Every month I research this and come to the same conclusion: the time, effort, and cost required to get local models (and the coding tools around them) to perform even close to Claude Code with sonnet/opus just not worth it right now. If it was, it would be distributive enough to be in the news.

Not that I'm discounting someone hasn't already solved this, just trying to Occam razor my way out of diving too deep down rabbit holes.

SkitterKherpi•4m ago

It has so far been the kind of thing that always feels like the next version of the local models would be the one that is just good enough.

Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Ask HN: What are you working on? (June 2026)

I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models

Ask HN: What agentic directory structure do you use?

Ask HN: How are thinking efforts implemented?

Tell HN: Claude is completely unusable for biology

I'm Eric Ries, author of "The Lean Startup" and new book "Incorruptible" – AMA

What I have done with Claude Code in the last 60 days being a non tech person

Ask HN: Favorite text heavy blogs that are a joy to read?

Ask HN: Why hasn't there been a real competitor to Ticketmaster yet?

Ask HN: How do you handle release notes for non-technical users?

AWS Bedrock to require sharing data with Anthropic for Mythos and future models

Ask HN: Want to build something open source on nights and weekends together?

Ask HN: What does your local LLM setup looks like?

Notes on DeepSeek

Ask HN: Would it be useful to have a slop button in addition to flag?

Ask HN: Are most corporate SWE jobs performative?

Is there a name for the type of comments agents add where they leak the prompt?

Ask HN: What are tools you have made for yourself since the advent of AI?

Ask HN: Year of Linux Desktop is fun with LLMs

Ask HN: How do you get into a flow state when using AI to code?

I procrastinate by building tools to stop me from procrastinating: A sad story

Ask HN: Are you still using a Vision Pro?

Ask HN: Is anyone shorting the overspend in AI yet?

Ask HN: Do you have AI psychosis?

FTX's former Anthropic stake would be worth about $75B at today's valuation

Story of How Im Running an Unlimited $6/Month AI Provider on 4x RTX 3090s

Ask HN: Is there a metric for AI code quality?

Ask HN: What internal tool did you build that became a product?

What if we legally required politicians to work regular jobs 2 months a year?

Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Comments

Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Ask HN: What are you working on? (June 2026)

I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models

Ask HN: What agentic directory structure do you use?

Ask HN: How are thinking efforts implemented?

Tell HN: Claude is completely unusable for biology

I'm Eric Ries, author of "The Lean Startup" and new book "Incorruptible" – AMA

What I have done with Claude Code in the last 60 days being a non tech person

Ask HN: Favorite text heavy blogs that are a joy to read?

Ask HN: Why hasn't there been a real competitor to Ticketmaster yet?

Ask HN: How do you handle release notes for non-technical users?

AWS Bedrock to require sharing data with Anthropic for Mythos and future models

Ask HN: Want to build something open source on nights and weekends together?

Ask HN: What does your local LLM setup looks like?

Notes on DeepSeek

Ask HN: Would it be useful to have a slop button in addition to flag?

Ask HN: Are most corporate SWE jobs performative?

Is there a name for the type of comments agents add where they leak the prompt?

Ask HN: What are tools you have made for yourself since the advent of AI?

Ask HN: Year of Linux Desktop is fun with LLMs

Ask HN: How do you get into a flow state when using AI to code?

I procrastinate by building tools to stop me from procrastinating: A sad story

Ask HN: Are you still using a Vision Pro?

Ask HN: Is anyone shorting the overspend in AI yet?

Ask HN: Do you have AI psychosis?

FTX's former Anthropic stake would be worth about $75B at today's valuation

Story of How Im Running an Unlimited $6/Month AI Provider on 4x RTX 3090s

Ask HN: Is there a metric for AI code quality?

Ask HN: What internal tool did you build that became a product?

What if we legally required politicians to work regular jobs 2 months a year?