frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Claude Code: connect to a local model when your quota runs out

https://boxc.net/blog/2026/claude-code-connecting-to-local-models-when-your-quota-runs-out/
68•fugu2•3d ago

Comments

baalimago•1h ago
Or better yet: Connect to some trendy AI (or web3) company's chatbot. It almost always outputs good coding tips
alexhans•1h ago
Useful tip.

From a strategic standpoint of privacy, cost and control, I immediately went for local models, because that allowed to baseline tradeoffs and it also made it easier to understand where vendor lock-in could happen, or not get too narrow in perspective (e.g. llama.cpp/open router depending on local/cloud [1] ).

With the explosion of popularity of CLI tools (claude/continue/codex/kiro/etc) it still makes sense to be able to do the same, even if you can use several strategies to subsidize your cloud costs (being aware of the lack of privacy tradeoffs).

I would absolutely pitch that and evals as one small practice that will have compounding value for any "automation" you want to design in the future, because at some point you'll care about cost, risks, accuracy and regressions.

[1] - https://alexhans.github.io/posts/aider-with-open-router.html

[2] - https://www.reddit.com/r/LocalLLaMA

mogoman•1h ago
can you recommend a setup with ollama and a cli tool? Do you know if I need a licence for Claude if I only use my own local LLM?
alexhans•53m ago
What are your needs/constraints (hardware constraints definitely a big one)?

The one I mentioned called continue.dev [1] is easy to try out and see if it meets your needs.

Hitting local models with it should be very easy (it calls APIs at a specific port)

[1] - https://github.com/continuedev/continue

drifkin•8m ago
we recently added a `launch` command to Ollama, so you can set up tools like Claude Code easily: https://ollama.com/blog/launch

tldr; `ollama launch claude`

glm-4.7-flash is a nice local model for this sort of thing if you have a machine that can run it

cyanydeez•25m ago
I think control should be top of the list here. You're talking about building work flows, products and long term practices around something that's inherently non-deterministic.

And the probability that any given model you use today is the same as what you use tomorrow is doubly doubtful:

1. The model itself will change as they try to improve the cost-per-test improves. This will necessarily make your expectations non-deterministic.

2. The "harness" around that model will change as business-cost is tightened and the amount of context around the model is changed to improve the business case which generates the most money.

Then there's the "cataclysmic" lockout cost where you accidently use the wrong tool that gets you locked out of the entire ecosystem and you are black listed, like a gambler in vegas who figures out how to count cards and it works until the house's accountant identifies you as a non-negligible customer cost.

It's akin to anti-union arguments where everyone "buying" into the cloud AI circus thinks they're going to strike gold and completely ignores the fact that very few will and if they really wanted a better world and more control, they'd unionize and limit their illusions of grandeur. It should be an easy argument to make, but we're seeing about 1/3 of the population are extremely susceptible to greed based illusions.,

swyx•1h ago
i mean the other obvious answer is to plug in to the other claude code proxies that other model companies have made for you:

https://docs.z.ai/devpack/tool/claude

https://www.cerebras.ai/blog/introducing-cerebras-code

or i guess one of the hosted gpu providers

if you're basically a homelabber and wanted an excuse to run quantized models on your own device go for it but dont lie and mutter under your own tin foil hat that its a realistic replacement

zingar•1h ago
I guess I should be able to use this config to point Claude at the GitHub copilot licensed models (including anthropic models). That’s pretty great. About 2/3 of the way through every day I’m forced to switch from Claude (pro license) to amp free and the different ergonomics are quite jarring. Open source folks get copilot tokens for free so that’s another pro license I don’t have to worry about.
hkpatel3•50m ago
Openrouter can also be used with claude code. https://openrouter.ai/docs/guides/claude-code-integration
raw_anon_1111•28m ago
Or just don’t use Claude Code and use Codex CLI. I have yet to hit a quota with Codex working all day. I hit the Claude limits within an hour or less.

This is with my regular $20/month ChatGpT subscription and my $200 a year (company reimbursed) Claude subscription.

esafak•27m ago
Or they could just let people use their own harnesses again...
usef-•21m ago
That wouldn't solve this problem.

And they do? That's what the API is.

The subscription always seemed clearly advertised for client usage, not general API usage, to me. I don't know why people are surprised after hacking the auth out of the client. (note in clients they can control prompting patterns for caching etc, it can be cheaper)

esafak•15m ago
End users -- people who use harnesses -- have subscriptions so that makes no sense. General API usage is for production.
usef-•2m ago
"Production" what?

The API is for using the model. It can be in dev, or experiments, or anything...

wkirby•20m ago
My experience thus far is that the local models are a) pretty slow and b) prone to making broken tool calls. Because of (a) the iteration loop slows down enough to where I wander off to do other tasks, meaning that (b) is way more problematic because I don't see it for who knows how long.

This is, however, a major improvement from ~6 months ago when even a single token `hi` from an agentic CLI could take >3 minutes to generate a response. I suspect the parallel processing of LMStudio 0.4.x and some better tuning of the initial context payload is responsible.

6 months from now, who knows?

btbuildem•13m ago
I'm confused, wasn't this already available via env vars? ANTHROPIC_BASE_URL and so on, and yes you may have to write a thin proxy to wrap the calls to fit whatever backend you're using.

I've been running CC with Qwen3-Coder-30B (FP8) and I find it just as fast, but not nearly as clever.

eek2121•11m ago
I gotta say, the local models are catching up quick. Claude is definitely still ahead, but things are moving right along.
mcbuilder•8m ago
Opencode has been a thing for a while now

Voxtral Transcribe 2

https://mistral.ai/news/voxtral-transcribe-2
532•meetpateltech•6h ago•133 comments

Yawning has an unexpected influence on the fluid inside your brain

https://www.newscientist.com/article/2513692-yawning-has-an-unexpected-influence-on-the-fluid-ins...
111•MDWolinski•5d ago•55 comments

Claude Code: connect to a local model when your quota runs out

https://boxc.net/blog/2026/claude-code-connecting-to-local-models-when-your-quota-runs-out/
68•fugu2•3d ago•19 comments

We built a real-world benchmark for AI code review

https://www.qodo.ai/blog/how-we-built-a-real-world-benchmark-for-ai-code-review/
9•benocodes•37m ago•1 comments

Building a 24-bit arcade CRT display adapter from scratch

https://www.scd31.com/posts/building-an-arcade-display-adapter
82•evakhoury•4h ago•19 comments

The Codex app is cool, and it illustrates the shift left of IDEs and coding GUIs

https://www.benshoemaker.us/writing/codex-app-launch/
14•straydusk•1h ago•1 comments

The Singularity Is Always Near (2006)

https://kk.org/thetechnium/the-singularity/
16•rmason•1d ago•3 comments

AI is killing B2B SaaS

https://nmn.gl/blog/ai-killing-b2b-saas
103•namanyayg•4h ago•156 comments

Tractor

https://incoherency.co.uk/blog/stories/tractor.html
104•surprisetalk•1d ago•35 comments

Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation

https://arxiv.org/abs/2602.00294
134•fheinsen•7h ago•66 comments

A sane but bull case on Clawdbot / OpenClaw

https://brandon.wang/2026/clawdbot
212•brdd•1d ago•348 comments

Arcan-A12: Weaving a Different Web

https://www.divergent-desktop.org/blog/2026/01/26/a12web/
35•ingenieroariel•5h ago•12 comments

RS-SDK: Drive RuneScape with Claude Code

https://github.com/MaxBittker/rs-sdk
68•evakhoury•5h ago•27 comments

Litestream Writable VFS

https://fly.io/blog/litestream-writable-vfs/
40•emschwartz•6d ago•3 comments

Turn any website into a live, structured data feed

https://www.meter.sh/
11•chadwebscraper•2h ago•8 comments

Converge (YC S23) Is Hiring Product Engineers (NYC, In-Person)

https://www.runconverge.com/careers/product-engineer
1•thomashlvt•4h ago

No More Hidden Changes: How MySQL 9.6 Transforms Foreign Key Management

https://blogs.oracle.com/mysql/no-more-hidden-changes-how-mysql-9-6-transforms-foreign-key-manage...
11•ksec•4d ago•4 comments

Claude Code for Infrastructure

https://www.fluid.sh/
74•aspectrr•3h ago•62 comments

Technocracy 2.0

https://brooklynrail.org/2026/02/field-notes/technocracy-2-0/
29•antonomon•1h ago•14 comments

Study: emotional support from social media found to reduce anxiety

https://news.uark.edu/articles/80669/emotional-support-from-social-media-found-to-reduce-anxiety
59•giuliomagnifico•4h ago•62 comments

Coding Agent VMs on NixOS with Microvm.nix

https://michael.stapelberg.ch/posts/2026-02-01-coding-agent-microvm-nix/
70•secure•3d ago•35 comments

Claude Is a Space to Think

https://www.anthropic.com/news/claude-is-a-space-to-think
282•meetpateltech•9h ago•141 comments

A case study in PDF forensics: The Epstein PDFs

https://pdfa.org/a-case-study-in-pdf-forensics-the-epstein-pdfs/
210•DuffJohnson•7h ago•109 comments

Show HN: Ghidra MCP Server – 110 tools for AI-assisted reverse engineering

https://github.com/bethington/ghidra-mcp
252•xerzes•14h ago•63 comments

Old Insurance Maps – Georeferencing Sanborn Fire Insurance Maps on Modern Maps

https://oldinsurancemaps.net/
73•lapetitejort•1w ago•22 comments

Guinea worm on track to be 2nd eradicated human disease; only 10 cases in 2025

https://arstechnica.com/health/2026/02/guinea-worm-on-track-to-be-2nd-eradicated-human-disease-on...
207•bookofjoe•7h ago•88 comments

Show HN: Interactive California Budget (By Claude Code)

https://california-budget.com
12•sberens•1h ago•6 comments

FBI couldn't get into WaPo reporter's iPhone because Lockdown Mode enabled

https://www.404media.co/fbi-couldnt-get-into-wapo-reporters-iphone-because-it-had-lockdown-mode-e...
516•robin_reala•7h ago•429 comments

Show HN: SymDerive – A functional, stateless symbolic math library

19•dinunnob•3d ago•5 comments

Show HN: EpsteIn – Search the Epstein files for your LinkedIn connections

https://github.com/cfinke/EpsteIn
42•cfinke•2h ago•8 comments