
A Guide to Local Coding Models

https://www.aiforswes.com/p/you-dont-need-to-spend-100mo-on-claude
81•mpweiher•2h ago

Comments

nzeid•1h ago
I appreciate the author's modesty, but the flip-flopping was a little confusing. If I'm not mistaken, the conclusion is that self-hosting saves money in all cases, but it cripples performance in scenarios where the quality you need requires hardware that's impractical to cobble together at home or in a laptop.

I am still toying with the notion of assembling an LLM tower with a few old GPUs but I don't use LLMs enough at the moment to justify it.

a_victorp•1h ago
If you ever do it, please make a guide! I've been toying with the same notion myself
suprjami•36m ago
If you want to do it cheap, get a desktop motherboard with two PCIe slots and two GPUs.

Cheap tier is dual 3060 12G. Runs 24B Q6 and 32B Q4 at 16 tok/sec. The limitation is VRAM for large context: 1000 lines of code is ~20k tokens, and 32k tokens is ~10G VRAM.

Expensive tier is dual 3090 or 4090 or 5090. You'd be able to run 32B Q8 with large context, or a 70B Q6.

For software, llama.cpp and llama-swap. GGUF models from HuggingFace. It just works.
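
(For illustration, a minimal llama-swap config sketch; the model name, path, and flags here are placeholders, not from the comment:)

    # llama-swap config.yaml sketch: one llama.cpp server per model,
    # swapped in on demand. Model name, path, and flags are illustrative.
    models:
      "qwen2.5-coder-32b":
        cmd: llama-server --port ${PORT} -m /models/Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf -ngl 99 -c 32768
        ttl: 300  # unload after 5 minutes idle (if your version supports ttl)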

If you need more than that, you're into enterprise hardware with 4+ PCIe slots, which costs as much as a car and has the power consumption of a small country. You're better off just paying for Claude Code.

satvikpendem•8m ago
Jeff Geerling has (not quite but sort of) guides: https://news.ycombinator.com/item?id=46338016
cloudhead•1h ago
In my experience the latest models (Opus 4.5, GPT 5.2) are _just_ starting to keep up with the problems I'm throwing at them, and I really wish they did a better job, so I think we're still 1-2 years away from local models not wasting developer time outside of CRUD web apps.
OptionOfT•1h ago
Eh, these things are trained on existing data. The further you are from that, the worse the models get.

I've noticed that I need to be a lot more specific in those cases, up to the point where being more specific is slowing me down, partially because I don't always know what the right thing is.

simonw•1h ago
> I realized I looked at this more from the angle of a hobbyist paying for these coding tools. Someone doing little side projects—not someone in a production setting. I did this because I see a lot of people signing up for $100/mo or $200/mo coding subscriptions for personal projects when they likely don’t need to.

Are people really doing that?

If that's you, know that you can get a LONG way on the $20/month plans from OpenAI and Anthropic. The OpenAI one in particular is a great deal, because Codex usage is charged at a much lower rate than Claude.

The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.

hamdingers•55m ago
And as a hobbyist the time to sign up for the $20/month plan is after you've spent $20 on tokens at least a couple times.

YMMV based on the kinds of side projects you do, but it's definitely been cheaper for me in the long run to pay by token, and the flexibility it offers is great.

iOSThrowAway•44m ago
I spent $240 in one week through the API and realized the $20/month was a no-brainer.
__mharrison__•44m ago
I'm convinced the $20 ChatGPT Plus plan is the best plan right now. You can use Codex with GPT-5.2. I've been very impressed with this.

(I also have the same MBP the author has and have used Aider with Qwen locally.)

baq•28m ago
bit the bullet this week and paid for a month of claude and a month of chatgpt plus. claude seems to have much lower token limits, both aggregate and rate-limited, and GPT-5.2 isn't a bad model at all. $20 for claude is not enough even for a hobby project (after one day!), openai looks like it might be.
wyre•21m ago
Me. Currently using Claude Max for personal coding projects. I've been on Claude's $20 plan and would run out of tokens. I don't want to give my money to OpenAI. So far these projects have not returned their value back to me, but I am viewing it as an investment in learning best practices with these coding tools.
satvikpendem•12m ago
> If that's you, know that you can get a LONG way on the $20/month plans from OpenAI and Anthropic.

> The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.

These are the same people, by and large. What I have seen is users who purely vibe code everything, run into the limits of the $20/mo models, and pay up for the more expensive ones. Essentially they're trading learning coding (and time, in some cases; it's not always faster to vibe code than to do it yourself) for money.

smcleod•10m ago
On a $20/mo plan doing any sort of agentic coding you'll hit the 5hr window limits in less than 20 minutes.
jwpapi•7m ago
Not everybody is broke.
simonw•1h ago
This story talks about MLX and Ollama but doesn't mention LM Studio - https://lmstudio.ai/

LM Studio can run both MLX and GGUF models but does so from an Ollama-style (but more full-featured) macOS GUI. They also have a very actively maintained model catalog at https://lmstudio.ai/models

ZeroCool2u•51m ago
LMStudio is so much better than Ollama that it's silly it's not more popular.
thehamkercat•39m ago
LMStudio is not open source though, ollama is

but people should use llama.cpp instead

behnamoh•8m ago
> LMStudio is not open source though, ollama is

and why should that affect usage? it's not like ollama users fork the repo before installing it.

thehamkercat•7m ago
It was worth mentioning.
smcleod•6m ago
I suspect Ollama is at least partly moving away from open source as they look to raise capital; when they released their replacement desktop app, they did so as closed source. You're absolutely right that people should be using llama.cpp - not only is it truly open source, it's significantly faster, has better model support and many more features, is better maintained, and its development community is far more active.
midius•46m ago
Makes me think it's a sponsored post.
Cadwhisker•38m ago
LMStudio? No, it's the easiest way to run an LLM locally that I've seen, to the point where I've stopped looking at other alternatives.

It's cross-platform (Win/Mac/Linux), detects the most appropriate GPU in your system, and tells you whether the model you want to download will run within its RAM footprint.

It lets you set up a local server that you can access through API calls as if you were remotely connected to an online service.
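
(As a sketch of what that looks like, assuming LM Studio's default OpenAI-compatible endpoint on port 1234; the model name is a placeholder:)

    # Query a local LM Studio server through its OpenAI-compatible API.
    # Assumes the local server is running on LM Studio's default port
    # (1234) with a model loaded; the model name below is a placeholder.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="qwen2.5-coder-32b-instruct",
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(response.choices[0].message.content)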

vunderba•34m ago
FWIW, Ollama already does most of this:

- Cross-platform

- Sets up a local API server

The tradeoff is a somewhat higher learning curve, since you need to manually browse the model library and choose the model/quantization that best fits your workflow and hardware. OTOH, it's also open source, unlike LMStudio, which is proprietary.

randallsquared•5m ago
I assumed from the name that it only ran llama-derived models, rather than whatever is available at huggingface. Is that not the case?
thehamkercat•15m ago
I think you should mention that LM Studio isn't open source.

I mean, what's the point of using local models if you can't trust the app itself?

satvikpendem•14m ago
Depends on what people use them for; not every user of local models is doing so for privacy, some just don't like paying for online models.
thehamkercat•8m ago
Most LLM sites are now offering free plans, and they are usually better than what you can run locally, so I think people are running local models for privacy 99% of the time.
behnamoh•8m ago
> I mean, what's the point of using local models if you can't trust the app itself?

and you think ollama doesn't do telemetry/etc. just because it's open source?

thehamkercat•7m ago
That's why I suggested using llama.cpp in my other comment.
maranas•54m ago
Cline + RooCode and VSCode already work really well with local models like qwen3-coder or even the latest gpt-oss. It's not as plug-and-play as Claude, but it gets you to a point where you only have to do the last 5% of the work.
NelsonMinar•51m ago
"This particular [80B] model is what I’m using with 128GB of RAM". The author then goes on to breezily suggest you try the 4B model instead of you only have 8GB of RAM. With no discussion of exactly what a hit in quality you'll be taking doing that.
Workaccount2•47m ago
I'm curious what the mental calculus was that said a $5k laptop would benchmark competitively against SOTA models for the next 5 years.

Somewhat comically, the author seems to have made it about 2 days. Out of 1,825. I think the real story is the folly of fixating on shiny new hardware and searching for justifications. I'm too ashamed to admit how many times I've done that dance...

Local models are purely for fun, hobby, and extreme privacy paranoia. If you really want privacy beyond a ToS guarantee, just lease a server (I know they can still be spying on that, but it's a threshold.)

ekjhgkejhgk•39m ago
I agree with everything you said, and yet I cannot help but respect a person who wants to do it himself. It reminds me of the hacker culture of the 80s and 90s.
satvikpendem•10m ago
> I'm curious what the mental calculus was that said a $5k laptop would benchmark competitively against SOTA models for the next 5 years.

Well, the hardware stays the same but local models keep getting better and more efficient, so I don't think there's much difference between paying $5k for online models over 5 years vs getting a laptop (and you'll need a laptop anyway, so why not just get one good enough to run local models in the first place?).

holyknight•22m ago
Your premise would've been right if memory prices hadn't skyrocketed like 400% in the space of two weeks.
freeone3000•15m ago
What are you doing with these models that you're going above the free tier on Copilot?
satvikpendem•8m ago
Some just like privacy and working without internet. I, for example, travel regularly by train and like to be able to use my laptop when there isn't good WiFi.
ardme•12m ago
Isn't the math better if you buy Nvidia stock with what you'd pay for all the hardware and then just pay $20 a month for Codex out of the annual returns?
andix•7m ago
I wouldn't run local models on the development PC. Instead, run them on a box in another room or another location: less fan noise, and it won't affect the performance of the PC you're working on.

Latency is not an issue at all for LLMs, even a few hundred ms won't matter.

Running them on the machine you're working on doesn't make a lot of sense to me, except when working offline while traveling.
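
(Concretely, the client just points at the box's LAN address instead of localhost; a hypothetical example, with the address, port, and model name made up:)

    # Same OpenAI-compatible client as above, pointed at a model server
    # running on another machine on the LAN. Address, port, and model
    # name are hypothetical.
    from openai import OpenAI

    client = OpenAI(base_url="http://192.168.1.50:8080/v1", api_key="unused")
    reply = client.chat.completions.create(
        model="qwen3-coder",
        messages=[{"role": "user", "content": "Explain this stack trace."}],
    )
    print(reply.choices[0].message.content)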
