
Usage-based pricing killing your vibe, here's how to roll your own local AI

https://www.theregister.com/2026/05/02/local_ai_coding_agents/
21•Bender•2h ago

Comments

_345•1h ago
It's a seriously degraded experience from a developer's perspective. Okay, you've finally got one local LLM installed after configuring everything perfectly; what happens when you want to run a second instance? Now you've blown past your VRAM and system RAM limits, and you're stuck with just one.

Furthermore, the model they recommend doesn't quite reach ~gpt-5.4-mini-level performance. That quality dip means you may as well just pay for something like Kimi K2.6 via OpenRouter if you want something roughly at or above Sonnet 4.6 in performance as a backup for when you run out of Anthropic/OpenAI usage.
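
A rough sketch of that kind of fallback, assuming a local OpenAI-compatible server (llama.cpp or Ollama style) on localhost:8080 and an OpenRouter key; the endpoint, model IDs, and timeout are placeholders, not recommendations:

  # Try the local OpenAI-compatible server first; fall back to OpenRouter
  # when it is unreachable or overloaded. Model names are placeholders.
  import os
  from openai import OpenAI

  LOCAL = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
  REMOTE = OpenAI(base_url="https://openrouter.ai/api/v1",
                  api_key=os.environ["OPENROUTER_API_KEY"])

  def complete(messages):
      try:
          return LOCAL.chat.completions.create(
              model="local-model", messages=messages, timeout=30)
      except Exception:
          # Out of VRAM, server busy, or not running: pay per token instead.
          return REMOTE.chat.completions.create(
              model="moonshotai/kimi-k2", messages=messages)

  reply = complete([{"role": "user", "content": "Summarize this diff."}])
  print(reply.choices[0].message.content)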

2ndorderthought•1h ago
Why are you running 2 instances anyway? If you want that workflow, just rent a few EC2 GPU instances and fire away.
vidarh•1h ago
If you're going to rent a few EC2 GPU instances, you might as well funnel things through OpenRouter. Not that many of us have workflows where trusting an LLM provider is a problem but sending the data to EC2 is not.

As for why, why would you not? Sitting around waiting for a single assistant is an inefficient use of time; I tend to have more like 4-10 instances running in parallel.

jen20•1h ago
> Not that many of us have workflows where trusting an LLM provider is a problem but sending the data to EC2 is not.

I'd imagine plenty of people who have a problem with trusting fly-by-night inference providers, or model owners with opt-out policies [1] [2] about training on your data, would be more than happy to send data to EC2, or even run the same models in Amazon Bedrock.

[1]: https://github.blog/news-insights/company-news/updates-to-gi...

[2]: https://help.openai.com/en/articles/5722486-how-your-data-is...

2ndorderthought•33m ago
I absolutely see no reason to send company IP, future plans, and current code base to any other company.

I also do not run 10 agents at the same time. There's no way I could keep up with the volume of work from doing that in any meaningful way.

killingtime74•18m ago
Does your company self-host everything, though? Many are already in the cloud, so why single out LLMs as the one thing not to use the cloud for?
0xbadcafebee•1h ago
Not sure why you got downvoted. 95% of people should be paying for a subscription. It's far cheaper, far more scalable, and far less hassle.

Local AI only makes sense for a couple of use cases:

  - Privacy
  - Constant churning on tokens
  - Latency
  - Availability
Local AI is "cheaper" when you already have the hardware sitting around, like an old MacBook or gaming GPU, or when the API cost (subscriptions will all run out if you churn 24/7) is too high to bear. I'm surprised companies are still selling their old MacBooks to employees when they could be turning them into Beowulf clusters for cheap AI compute on long-running jobs (the cost is just electricity).
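
For the "cost is just electricity" point, a back-of-the-envelope with assumed numbers (350W sustained draw, €0.30/kWh; adjust for your own hardware and tariff):

  # Rough monthly electricity cost for one always-on inference box.
  # Wattage and price per kWh are assumptions, not measurements.
  watts, hours_per_month, eur_per_kwh = 350, 24 * 30, 0.30
  print(watts / 1000 * hours_per_month * eur_per_kwh)  # ~75.6 EUR/month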

If usage-based pricing is killing your vibe, find a cheaper subscription with higher limits. Here's a list of them compared on price-per-request-limit: https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/...

xscott•1h ago
I think you're right about the cost/benefit trade-off in general, but I do wonder how much of the "compaction" Codex and Claude do is about keeping context fresh and how much is about saving them runtime costs.

If you've got a 1M token context, but they constantly summarize it down to something much smaller, is it really 1M tokens of benefit? With a local model, you can use all 256k tokens on your own terms. However, I don't have any benchmarks to know for sure.
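
For illustration, the compaction being described amounts to something like this: once the live history exceeds a token budget, older turns are replaced with a model-written summary. A toy sketch against an assumed local endpoint, with a crude chars/4 token estimate standing in for a real tokenizer:

  # Toy "compaction": once the history exceeds a token budget, replace the
  # older turns with a model-written summary of them.
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
  BUDGET = 8000  # tokens to keep live; with a local model you pick this yourself

  def approx_tokens(messages):
      # Crude estimate; a real harness would use the model's tokenizer.
      return sum(len(m["content"]) for m in messages) // 4

  def compact(history):
      if approx_tokens(history) <= BUDGET:
          return history
      old, recent = history[:-4], history[-4:]
      summary = client.chat.completions.create(
          model="local-model",
          messages=old + [{"role": "user",
                           "content": "Summarize the conversation so far in a few bullet points."}],
      ).choices[0].message.content
      return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent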

xscott•1h ago
Your point about caliber/quality is fair, but I have been pretty astonished by some of the newer/better models (Gemma 4 variants, GPT-OSS before that).

However, there's not much of a memory increase from running multiple sessions in parallel against one model. It's an HTTP server, and other than some caching, it's basically stateless.
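
A small illustration of that statelessness, assuming an OpenAI-compatible server on localhost:8080: parallel "sessions" are just interleaved HTTP requests, each carrying its own full history, all served by the one loaded model:

  # Several independent "sessions" against one local server: the server keeps
  # no per-session state, so each request ships its own message history.
  import asyncio
  from openai import AsyncOpenAI

  client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="unused")

  async def session(name, history):
      resp = await client.chat.completions.create(model="local-model", messages=history)
      return name, resp.choices[0].message.content

  async def main():
      sessions = {
          "refactor": [{"role": "user", "content": "Rename foo() to bar() in this file..."}],
          "docs":     [{"role": "user", "content": "Draft a README section for..."}],
      }
      return await asyncio.gather(*(session(n, h) for n, h in sessions.items()))

  print(asyncio.run(main()))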

iib•7m ago
Doesn't llama.cpp (or similar) have to evict the KV cache for this, so that performance is degraded when running multiple sessions? Or how do you load a model in memory and then use it in multiple sessions? I'm still learning this stuff.
janice1999•1h ago
A 24GB Nvidia RTX 3090 Ti is ~2000 euros.
2ndorderthought•1h ago
Which is how many months of Claude, or Claude + ChatGPT for when Claude is down? And do you own anything after using those subscriptions? Can you pick and choose from dozens of models and whatever comes next? Can you play video games with your Claude subscription?
beej71•1h ago
Believe me when I say that I want to run local models, and I do. But in my testing, 24 GB doesn't get you much brainpower.
2ndorderthought•29m ago
Have you tried the latest qwen3.6 models?

For most of my questions an 8-9B model works great. The upshot is not having ChatGPT/Meta sell my data or target me with random thoughts later.

ekjhgkejhgk•9m ago
We're in the same boat. I would rather have NO LLM than an LLM that collects my data (which you should assume is all of them, unless you've been asleep for the last 20 years).

Fortunately, I don't have to pick one or the other; instead I run Qwen 3.6 35B A3B. It's a bit slow with my 8GB GPU (I'm in the process of getting a bigger one), but again, to me the choice isn't "what's the best I can get", it's "what's the best local I can get".
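
For reference, running a model bigger than the card usually means offloading only some layers to the 8GB GPU and keeping the rest in system RAM. A sketch with the llama-cpp-python bindings; the file name and layer count are made-up values to tune for your own hardware:

  # Partial GPU offload: put as many layers as fit in 8GB of VRAM on the GPU,
  # leave the rest on the CPU. Path and layer count are placeholders.
  from llama_cpp import Llama

  llm = Llama(
      model_path="models/qwen3-35b-a3b-q4_k_m.gguf",  # hypothetical local file
      n_gpu_layers=20,   # raise until VRAM runs out, lower if it OOMs
      n_ctx=16384,
  )
  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "Explain this stack trace..."}])
  print(out["choices"][0]["message"]["content"])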

efficax•52m ago
Qwen3.6 does a good job locally, except it can take 20-30 minutes to respond to a prompt on a Mac Studio with 32GB of RAM.
2ndorderthought•32m ago
Yeah, you probably do want to use a GPU for models of that size.

I also wonder what quantization you're using? If you haven't tried other quants, I really would.

efficax•28m ago
This is qwen3.6:27b-coding-nvfp4. It's only an M1. If they ever ship an M5 Studio with 96GB of RAM, that's my next upgrade path for local LLM experiments.

You can get work done with them if you have a harness that can drive outcomes without needing feedback (I've been building a TDD red-to-green agent harness lately that is very effective when given a good plan upfront). So if you can stand waiting a few days for results that would only take hours with a model deployed on frontier Nvidia hardware, you can get them this way.
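
A stripped-down sketch of that red-to-green loop, assuming a pytest project and a local OpenAI-compatible endpoint; a real harness would apply diffs, sandbox the edits, and handle fenced output rather than overwriting one hypothetical file:

  # Minimal red-to-green loop: run the tests, feed failures to the model,
  # write its suggested file back, repeat. Endpoint and model are placeholders.
  import pathlib, subprocess
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
  target = pathlib.Path("src/feature.py")  # hypothetical file under test

  for attempt in range(10):
      run = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
      if run.returncode == 0:
          print("green after", attempt, "attempts")
          break
      prompt = (f"Tests are failing:\n{run.stdout[-4000:]}\n\n"
                f"Current {target}:\n{target.read_text()}\n\n"
                "Return only the full corrected file contents.")
      reply = client.chat.completions.create(
          model="local-model",
          messages=[{"role": "user", "content": prompt}])
      target.write_text(reply.choices[0].message.content)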

datadrivenangel•10m ago
The time delay is the real issue. Much, much slower wall-clock time.

Securing a DoD Contractor: Finding a Multi-Tenant Authorization Vulnerability

https://www.strix.ai/blog/how-strix-found-zero-auth-vulnerability-dod-backed-startup
117•bearsyankees•2h ago•48 comments

I am worried about Bun

https://wwj.dev/posts/i-am-worried-about-bun/
236•remote-dev•3h ago•142 comments

Talking to strangers at the gym

https://thienantran.com/talking-to-35-strangers-at-the-gym/
873•thitran•8h ago•431 comments

How OpenAI delivers low-latency voice AI at scale

https://openai.com/index/delivering-low-latency-voice-ai-at-scale/
31•Sean-Der•46m ago•12 comments

GameStop makes $55.5B takeover offer for eBay

https://www.bbc.co.uk/news/articles/cn0p8yled1do
555•n1b0m•10h ago•492 comments

Microsoft Edge stores all passwords in memory in clear text, even when unused

https://twitter.com/L1v1ng0ffTh3L4N/status/2051308329880719730
175•cft•2h ago•75 comments

Does Employment Slow Cognitive Decline? Evidence from Labor Market Shocks

https://www.nber.org/papers/w35117
130•littlexsparkee•4h ago•108 comments

Redis array: short story of a long development process

https://antirez.com/news/164
173•antirez•6h ago•66 comments

US healthcare marketplaces shared citizenship and race data with ad tech giants

https://techcrunch.com/2026/05/04/us-healthcare-marketplaces-shared-citizenship-and-race-data-wit...
309•ZeidJ•3h ago•101 comments

Let's Talk about LLMs

https://www.b-list.org/weblog/2026/apr/09/llms/
44•cdrnsf•2h ago•20 comments

UK Fuel Price Intelligence

https://www.fuelinsight.co.uk
125•theazureguy•5h ago•55 comments

Pomiferous: The most extensive apples (pommes) database

https://pomiferous.com/
73•Ariarule•5h ago•23 comments

Stop big tech from making users behave in ways they don't want to

https://economist.com/by-invitation/2026/04/29/stop-big-tech-from-making-users-behave-in-ways-the...
154•andsoitis•3h ago•96 comments

How Monero's proof of work works

https://blog.alcazarsec.com/tech/posts/how-moneros-proof-of-work-works
181•alcazar•6h ago•146 comments

Heat pump sales rise across Europe

https://www.pv-magazine.com/2026/05/04/heat-pump-sales-rise-17-across-europe-in-q1-as-energy-pric...
120•doener•2h ago•38 comments

Formatting a 25M-line codebase overnight

https://stripe.dev/blog/formatting-an-entire-25-million-line-codebase-overnight-the-rubyfmt-story
3•r00k•17m ago•0 comments

1966 Ford Mustang Converted into a Tesla with Working 'Full Self-Driving'

https://electrek.co/2026/05/02/tesla-1966-mustang-ev-conversion-full-self-driving/
68•Brajeshwar•5h ago•53 comments

Sierra Raises $950M at $15B Valuation

https://sierra.ai/blog/better-customer-experiences-built-on-sierra
50•doppp•4h ago•70 comments

White House Considers Vetting A.I. Models Before They Are Released

https://www.nytimes.com/2026/05/04/technology/trump-ai-models.html
56•jbegley•1h ago•31 comments

Show HN: nfsdiag - a NFS diagnostic application

https://github.com/lsferreira42/nfsdiag
13•lsferreira42•2d ago•0 comments

A little comparison between R and Kap

https://blog.dhsdevelopments.com/a-little-comparison-between-r-and-kap
4•tosh•2d ago•0 comments

Offenders sentenced up to 10 years for spying on TSMC

https://www.taipeitimes.com/News/front/archives/2026/04/28/2003856358
69•ironyman•2h ago•0 comments

Newton's law of gravity passes its biggest test

https://www.science.org/content/article/newton-s-law-gravity-passes-its-biggest-test-ever
104•pseudolus•7h ago•87 comments

'Kitten Space Agency', the Spiritual Successor to 'Kerbal Space Program' (2025)

https://www.space.com/entertainment/space-games/kitten-space-agency-is-the-spiritual-successor-to...
78•Tomte•2h ago•28 comments

Trillions in Retirement Dollars Flow into Opaque Trusts

https://www.bloomberg.com/news/features/2026-05-03/trillions-in-us-retirement-dollars-flow-into-o...
75•koolhead17•3h ago•12 comments

The Visible Zorker: Zork 3

https://eblong.com/infocom/visi/zork3/
11•zarlez•3h ago•0 comments

Using “underdrawings” for accurate text and numbers

https://samcollins.blog/underdrawings/
346•samcollins•3d ago•126 comments

OpenAI, Google, and Microsoft Back Bill to Fund 'AI Literacy' in Schools

https://www.404media.co/literacy-in-future-technologies-artificial-intelligence-act-adam-schiff-m...
87•cdrnsf•4h ago•79 comments

BYOMesh – New LoRa mesh radio offers 100x the bandwidth

https://partyon.xyz/@nullagent/116499715071759135
460•nullagent•1d ago•149 comments

Why are neural networks and cryptographic ciphers so similar? (2025)

https://reiner.org/neural-net-ciphers
102•jxmorris12•2d ago•32 comments