frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

A guide to local coding models

https://www.aiforswes.com/p/you-dont-need-to-spend-100mo-on-claude
120•mpweiher•2h ago

Comments

nzeid•2h ago
I appreciate the author's modesty but the flip-flopping was a little confusing. If I'm not mistaken, the conclusion is that by "self-hosting" you save money in all cases, but you cripple performance in scenarios where you need to squeeze out the kind of quality that requires hardware that's impractical to cobble together at home or within a laptop.

I am still toying with the notion of assembling an LLM tower with a few old GPUs but I don't use LLMs enough at the moment to justify it.

a_victorp•1h ago
If you ever do it, please make a guide! I've been toying with the same notion myself
suprjami•1h ago
If you want to do it cheap, get a desktop motherboard with two PCIe slots and two GPUs.

Cheap tier is dual 3060 12G. Runs 24B Q6 and 32B Q4 at 16 tok/sec. The limitation is VRAM for large context. 1000 lines of code is ~20k tokens. 32k tokens is is ~10G VRAM.

Expensive tier is dual 3090 or 4090 or 5090. You'd be able to run 32B Q8 with large context, or a 70B Q6.

For software, llama.cpp and llama-swap. GGUF models from HuggingFace. It just works.

If you need more than that, you're into enterprise hardware with 4+ PCIe slots which costs as much as a car and the power consumption of a small country. You're better to just pay for Claude Code.

satvikpendem•52m ago
Jeff Geerling has (not quite but sort of) guides: https://news.ycombinator.com/item?id=46338016
cloudhead•2h ago
In my experience the latest models (Opus 4.5, GPT 5.2) Are _just_ starting to keep up with the problems I'm throwing at them, and I really wish they did a better job, so I think we're still 1-2 years away from local models not wasting developer time outside of CRUD web apps.
OptionOfT•1h ago
Eh, these things are trained on existing data. The further you are from that the worse the models get.

I've noticed that I need to be a lot more specific in those cases, up to the point where being more specific is slowing me down, partially because I don't always know what the right thing is.

cloudhead•33m ago
For sure, and I guess that's kind of my point -- if the OP says local coding models are now good enough, then it's probably because he's using things that are towards the middle of the distribution.
simonw•2h ago
> I realized I looked at this more from the angle of a hobbiest paying for these coding tools. Someone doing little side projects—not someone in a production setting. I did this because I see a lot of people signing up for $100/mo or $200/mo coding subscriptions for personal projects when they likely don’t need to.

Are people really doing that?

If that's you, know that you can get a LONG way on the $20/month plans from OpenAI and Anthropic. The OpenAI one in particular is a great deal, because Codex is charged a whole lot lower than Claude.

The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.

hamdingers•1h ago
And as a hobbyist the time to sign up for the $20/month plan is after you've spent $20 on tokens at least a couple times.

YMMV based on the kinds of side projects you do, but it's definitely been cheaper for me in the long run to pay by token, and the flexibility it offers is great.

iOSThrowAway•1h ago
I spent $240 in one week through the API and realized the $20/month was a no-brainer.
__mharrison__•1h ago
I'm convinced the $20 gpt plus plan is the best plan right now. You can use Codex with gpt5.2. I've been very impressed with this.

(I also have the same MBP the author has and have used Aider with Qwen locally.)

baq•1h ago
bit the bullet this week and paid for a month of claude and a month of chatgpt plus. claude seems to have much lower token limits, both aggregate and rate-limited and GPT-5.2 isn't a bad model at all. $20 for claude is not enough even for a hobby project (after one day!), openai looks like it might be.
InsideOutSanta•48m ago
I feel like a lot of the criticism the GPT-5.x models receive only applies to specific use cases. I prefer these models over Anthropic's because they are less creative and less likely to take freedoms interpreting my prompts.

Sonnet 4.5 is great for vibe coding. You can give it a relatively vague prompt and it will take the initiative to interpret it in a reasonable way. This is good for non-programmers who just want to give the model a vague idea and end up with a working, sensible product.

But I usually do not want that, I do not want the model to take liberties and be creative. I want the model to do precisely what I tell it and nothing more. In my experience, te GPT-5.x models are a better fit for that way of working.

andix•36m ago
From my personal experience it's around 50:50 between Claude and Codex. Some people strongly prefer one over the other. I couldn't figure out yet why.

I just can't accept how slow codex is, and that you can't really use it interactively because of that. I prefer to just watch Claude code work and stop it once I don't like the direction it's taking.

asabla•10m ago
From my point of view, you're either choosing between instruction following or more creative solutions.

Codex models tend to be extremely good at following instructions, to the point that it won't do any additional work unless you ask it to. GPT-5.1 and GPT-5.2 on the other hand is a little bit more creative.

Models from Anthropics on the other hand is a lot more loosy goosy on the instructions, and you need to keep an eye on it much more often.

I'm using models interchangeably from both providers all the time depending on the task at hand. No real preference if one is better then the other, they're just specialized on different things

wyre•1h ago
Me. Currently using Claude Max for personal coding projects. I've been on Claude's $20 plan and would run out of tokens. I don't want to give my money to OpenAI. So far these projects have not returned their value back to me, but I am viewing it as an investment in learning best pratices with these coding tools.
satvikpendem•55m ago
> If that's you, know that you can get a LONG way on the $20/month plans from OpenAI and Anthropic.

> The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.

These are the same people, by and large. What I have seen is users who purely vibe code everything and run into the limits of the $20/m models and pay up for the more expensive ones. Essentially they're trading learning coding (and time, in some cases, it's not always faster to vibe code than do it yourself) for money.

maddmann•47m ago
If this is the new way code is written then they are arguably learning how to code. Jury is still out though, but I think you are being a bit dismissive.
smcleod•53m ago
On a $20/mo plan doing any sort of agentic coding you'll hit the 5hr window limits in less than 20 minutes.
andix•44m ago
It really depends. When building a lot of new features it happens quite fast. With some attention to context length I was often able to go for over an hour on the 20$ claude plan.

If you're doing mostly smaller changes, you can go all day with the 20$ Claude plan without hitting the limits. Especially if you need to thoroughly review the AI changes for correctness, instead of relying on automated tests.

allenu•8m ago
I find that I use it on isolated changes where Claude doesn’t really need to access a ton of files to figure out what to do and I can easily use it without hitting limits. The only time I hit the 4-5 hour limit is when I’m going nuts on a prototype idea and vibe coding absolutely everything, and usually when I hit the limit, I’m pretty mentally spent anyway so I use it as a sign to go do something else. I suppose everyone has different styles and different codebases, but for me I can pretty easily stay under the limit without that it’s hard to justify $100 or $200 a month.
simonw•31m ago
With Codex it only happened to me once in my 4.5hr session here: https://simonwillison.net/2025/Dec/15/porting-justhtml/

Claude Code is a whole lot less generous though.

jwpapi•50m ago
Not everybody is broke.
haritha-j•42m ago
I’ve been using vs code copilot pro for a few months and never really had any issue, once you hit the limit for one model, you generally still have a bunch more models to choose from. Unless I was vibe coding massive amounts of code without looking to testing, it’s hard to imagine I will run out of all the available pro models.
minimaxir•21m ago
Claude 4.5 Opus on Claude Code's $20 plan is funny because you get about 2-3 prompts on any nontrivial task before you hit the session limit.

If I wasn't only using it for side projects I'd have to cough up the $200 out of necessity.

simonw•1h ago
This story talks about MLX and Ollama but doesn't mention LM Studio - https://lmstudio.ai/

LM Studio can run both MLX and GGUF models but does so from an Ollama style (but more full-featured) macOS GUI. They also have a very actively maintained model catalog at https://lmstudio.ai/models

ZeroCool2u•1h ago
LMStudio is so much better than Ollama it's silly it's not more popular.
thehamkercat•1h ago
LMStudio is not open source though, ollama is

but people should use llama.cpp instead

behnamoh•52m ago
> LMStudio is not open source though, ollama is

and why should that affect usage? it's not like ollama users fork the repo before installing it.

thehamkercat•50m ago
It was worth mentioning.
smcleod•49m ago
I suspect Ollama is at least partly moving away open source as they look to raise capitol, when they released their replacement desktop app they did so as closed source. You're absolutely right that people should be using llama.cpp - not only is it truly open source but it's significantly faster, has better model support, many more features, better maintained and the development community is far more active.
midius•1h ago
Makes me think it's a sponsored post.
Cadwhisker•1h ago
LMStudio? No, it's the easiest way to run am LLM locally that I've seen to the point where I've stopped looking at other alternatives.

It's cross-platform (Win/Mac/Linux), detects the most appropriate GPU in your system and tells you whether the model you want to download will run within it's RAM footprint.

It lets you set up a local server that you can access through API calls as if you were remotely connected to an online service.

vunderba•1h ago
FWIW, Ollama already does most of this:

- Cross-platform

- Sets up a local API server

The tradeoff is a somewhat higher learning curve, since you need to manually browse the model library and choose the model/quantization that best fit your workflow and hardware. OTOH, it's also open-source unlike LMStudio which is proprietary.

randallsquared•49m ago
I assumed from the name that it only ran llama-derived models, rather than whatever is available at huggingface. Is that not the case?
fenykep•35m ago
No, they have quite a broad list of models: https://ollama.com/search

[edit] Oh and apparently you can also directly run some models directly from HuggingFace: https://huggingface.co/docs/hub/ollama

thehamkercat•59m ago
I think you should mention that LM Studio isn't open source.

I mean, what's the point of using local models if you can't trust the app itself?

satvikpendem•57m ago
Depends what people use them for, not every user of local models is doing so for privacy, some just don't like paying for online models.
thehamkercat•51m ago
Most LLM sites are now offering free plans, and they are usually better than what you can run locally, So I think people are running local models for privacy 99% of the time
behnamoh•51m ago
> I mean, what's the point of using local models if you can't trust the app itself?

and you think ollama doesn't do telemetry/etc. just because it's open source?

thehamkercat•50m ago
That's why i suggested using llama.cpp in my other comment.
evacchi•12m ago
ramalama.ai is worth mentioning too
maranas•1h ago
Cline + RooCode and VSCode already works really well with local models like qwen3-coder or even the latest gpt-oss. It is not as plug-and-play as Claude but it gets you to a point where you only have to do the last 5% of the work
NelsonMinar•1h ago
"This particular [80B] model is what I’m using with 128GB of RAM". The author then goes on to breezily suggest you try the 4B model instead of you only have 8GB of RAM. With no discussion of exactly what a hit in quality you'll be taking doing that.
Workaccount2•1h ago
I'm curious what the mental calculus was that a $5k laptop would competitively benchmark against SOTA models for the next 5 years was.

Somewhat comically, the author seems to have made it about 2 days. Out of 1,825. I think the real story is the folly of fixating your eyes on shiny new hardware and searching for justifications. I'm too ashamed to admit how many times I've done that dance...

Local models are purely for fun, hobby, and extreme privacy paranoia. If you really want privacy beyond a ToS guarantee, just lease a server (I know they can still be spying on that, but it's a threshold.)

ekjhgkejhgk•1h ago
I agree with everything you said, and yet I cannot help but respect a person who wants to do it himself. It reminds me of the hacker culture of the 80s and 90s.
slicktux•22m ago
Agreed, Everyone seems to shun the DIY hacker now a days; saying things like “I’ll just pay for it”. It’s not about just NOT paying for it but doing it yourself and learning how to do it so that you can pass the knowledge on and someone else can do it.
davidw•20m ago
I loathe the idea of being beholden to large corporations for what may be a key part of this job in the future.
satvikpendem•53m ago
> I'm curious what the mental calculus was that a $5k laptop would competitively benchmark against SOTA models for the next 5 years was.

Well, the hardware remains the same but local models get better and more efficient, so I don't think there is much difference between paying 5k for online models over 5 years vs getting a laptop (and well, you'll need a laptop anyway, so why not just get a good enough one to run local models in the first place?).

smcleod•47m ago
My 2023 Macbook Pro (M2 Max) is coming up to 3 years old and I can run models locally that are arguably "better" than what was considered SOTA about 1.5 years ago. This is of course not an exact comparison but it's close enough to give some perspective.
holyknight•1h ago
your premise would've been right, if memory wouldn't skyrocketed like 400% in like 2 weeks.
freeone3000•59m ago
What are you doing with these models that you’re going above free tier on copilot?
satvikpendem•51m ago
Some just like privacy and working without internet, I for example travel regularly on the train and like to have my laptop when there's not always good WiFi.
ardme•55m ago
Isnt the math of buying Nvidia stock with what you pay for all the hardware and then just paying $20 a month for codex with the annual returns better?
andix•50m ago
I wouldn't run local models on the development PC. Instead run them on a box in another room or another location. Less fan noise and it won't influence the performance of the pc you're working on.

Latency is not an issue at all for LLMs, even a few hundred ms won't matter.

It doesn't make a lot of sense to me, except when working offline while traveling.

snoman•16m ago
Less of a concern these days with hardware like a Mac Studio or Nvidia dgx which are accessible and aren’t noisy at all.
raw_anon_1111•21m ago
I don’t think I’ve ever read an article where I knew the author was completely wrong about all of their assumptions was that they admitted it themselves and left the bad assumptions in the article.

The above paragraph is meant to be a compliment.

But justifying it based on keeping his Mac for five years is crazy. At the rate things are moving, coding models are going to get so much better in a year, the gap is going to widen.

Also in the case of his father where he is working for a company that must use a self hosted model or any other company that needed it, would a $10K Mac Studio with 512GB RAM be worth it? What about two Mac Studios connected over Thunderbolt using the newly released support in macOS 26?

https://news.ycombinator.com/item?id=46248644

chrisischris•6m ago
The resource contention point is real, running local models alongside Docker and your actual dev environment is where this can fall apart. One comment here mentions running models on a separate box to avoid impacting your dev machine. That's the right idea but idle GPUs are everywhere the infrastructure to actually tap into them is what's missing. Currently building something along these lines. https://sporeintel.com/
SamDc73•2m ago
If privacy is your top priority, then sure spend a few grand on hardware and run everything locally.

Personally, I run a few local models (around 30B params is the ceiling on my hardware at 8k context), and I still keep a $200 ChatGPT subscription cause I'm not spending $5-6k just to run models like K2 or GLM-4.6 (they’re usable, but clearly behind OpenAI, Claude, or Gemini for my workflow)

I was got excited about aescoder-4b (model that specialize in web design only) after its DesignArena benchmarks, but it falls apart on large codebases and is mediocre at Tailwind

That said, I think there’s real potential in small, highly specialized models like 4B model trained only for FastAPI, Tailwind or a single framework. Until that actually exists and works well, I’m sticking with remote services.

A guide to local coding models

https://www.aiforswes.com/p/you-dont-need-to-spend-100mo-on-claude
120•mpweiher•2h ago•59 comments

Logging Sucks

https://loggingsucks.com/
485•FlorinSays•5h ago•169 comments

I'm just having fun

https://jyn.dev/i-m-just-having-fun/
63•lemper•5d ago•24 comments

Show HN: Books mentioned on Hacker News in 2025

https://hackernews-readings-613604506318.us-west1.run.app
295•seinvak•7h ago•114 comments

More on whether useful quantum computing is "imminent"

https://scottaaronson.blog/?p=9425
46•A_D_E_P_T•2h ago•30 comments

Rue: Higher level than Rust, lower level than Go

https://rue-lang.dev/
44•ingve•2h ago•31 comments

The gift card accountability sink

https://www.bitsaboutmoney.com/archive/gift-card-accountability-sink/
61•walterbell•2h ago•36 comments

Evaluating Chain-of-Thought Monitorability

https://openai.com/index/evaluating-chain-of-thought-monitorability/
10•mfiguiere•2d ago•0 comments

Disney Imagineering Debuts Next-Generation Robotic Character, Olaf

https://disneyparksblog.com/disney-experiences/robotic-olaf-marks-new-era-of-disney-innovation/
18•ChrisArchitect•1h ago•9 comments

Show HN: WalletWallet – create Apple passes from anything

https://walletwallet.alen.ro/
251•alentodorov•7h ago•76 comments

I Program on the Subway

https://www.scd31.com/posts/programming-on-the-subway
142•evankhoury•5d ago•91 comments

Show HN: Autograd.c – a tiny ML framework built from scratch

https://github.com/sueszli/autograd.c
32•sueszli•5d ago•1 comments

Mullvad VPN: "This is a Chat Control 3.0 attempt."

https://mastodon.online/@mullvadnet/115742530333573065
393•janandonly•5h ago•119 comments

I can't upgrade to Windows 11, now leave me alone

https://idiallo.com/byte-size/cant-update-to-windows-11-leave-me-alone
278•firefoxd•4h ago•258 comments

CO2 batteries that store grid energy take off globally

https://spectrum.ieee.org/co2-battery-energy-storage
113•rbanffy•8h ago•88 comments

E.W.Dijkstra Archive

https://www.cs.utexas.edu/~EWD/welcome.html
99•surprisetalk•8h ago•8 comments

Structured outputs create false confidence

https://boundaryml.com/blog/structured-outputs-create-false-confidence
105•gmays•8h ago•53 comments

ARIN Public Incident Report – 4.10 Misissuance Error

https://www.arin.net/announcements/20251212/
128•immibis•8h ago•33 comments

Get an AI code review in 10 seconds

https://oldmanrahul.com/2025/12/19/ai-code-review-trick/
81•oldmanrahul•6h ago•42 comments

Autoland Saves King Air, Everyone Reported Safe

https://avbrief.com/autoland-saves-king-air-everyone-reported-safe/
60•bradleybuda•6h ago•27 comments

Coarse Is Better

https://borretti.me/article/coarse-is-better
173•_dain_•10h ago•94 comments

Ruby website redesigned

https://www.ruby-lang.org/en/
348•psxuaw•16h ago•134 comments

Indoor tanning makes youthful skin much older on a genetic level

https://www.ucsf.edu/news/2025/12/431206/indoor-tanning-makes-youthful-skin-much-older-genetic-level
212•SanjayMehta•18h ago•153 comments

You’re not burnt out, you’re existentially starving

https://neilthanedar.com/youre-not-burnt-out-youre-existentially-starving/
169•thanedar•5h ago•181 comments

Engineering dogmas it's time to retire

https://newsletter.manager.dev/p/5-engineering-dogmas-its-time-to
11•kiyanwang•2h ago•3 comments

Three ways to solve problems

https://andreasfragner.com/writing/three-ways-to-solve-problems
102•42point2•9h ago•20 comments

Show HN: Shittp – Volatile Dotfiles over SSH

https://github.com/FOBshippingpoint/shittp
113•sdovan1•11h ago•61 comments

Waymo halts service during S.F. blackout after causing traffic jams

https://missionlocal.org/2025/12/sf-waymo-halts-service-blackout/
161•rwoll•18h ago•247 comments

FWS – pip-installable embedded process supervisor with PTY/pipe/dtach back ends

14•mrsurge•3d ago•3 comments

Show HN: Jmail – Google Suite for Epstein files

https://www.jmail.world
1363•lukeigel•1d ago•320 comments