AI Coding at Home Without Going Broke

https://stephen.bochinski.dev/blog/2026/06/13/ai-coding-at-home-without-going-broke/

95•sbochins•2h ago

Comments

atreids•1h ago

I find just going via Deepseek's platform API directly, using their V4 flash model, and hooking into a harness like Opencode more than acceptable. Think I've spent maybe $10 over a couple of weeks.

I did explore self-hosting models but hardware right now is just too expensive.

Yoric•1h ago

Directly at DeepSeek? It was my understanding (but I didn't check) that some other AI operators were providing (some of?) DeepSeek's model for cheaper prices.

Still, that's interesting. What do you get for that price? Only coding, or also e.g. image generation?

isatty•1h ago

> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.

Power is not free.

What I’ve found is that you’re basically paying a premium for privacy, and that’s worth it for me.

enraged_camel•1h ago

>> Power is not free.

There's actually an interesting thought experiment here: if it takes you a full day to build something that AI would otherwise build in a day, do you end up using more power, or less? What is the break-even point, purely from a power consumption perspective?

axus•1h ago

What would you do for the rest of the day, power off your devices and go for a long bike ride?

enraged_camel•1h ago

Speaking personally: yes. That's literally what I'm planning to do this afternoon because it's noon and I'm already done with the coding tasks I had on my plate today.

dofm•1h ago

Luckily the future is absolutely going to be that star trek one where technological abundance means we are all wealthy and have free time to develop personally, and not the future where all the money bubbles up into the hands of a thin-skinned malignant narcissist who wants to play with launching rockets and provoking racial violence /s

dofm•1h ago

If an identical task takes a day on both sides, then the human route uses less energy, surely.

Brains are thousands or maybe even millions of times more fuel-efficient than computers and you are alive for the whole day either way, right? You probably eat about the same even.

The reason executives think AI is more efficient is that it is more space efficient than a human and doesn't demand to be paid or work only a set number of hours. Everything with computing is more efficient if you resent having to give money to other humans. If they could just not have you be alive when they don't need you, it'd possibly be different.

Even though I think at a typical British freelance rate and a truly unsubsidised token price, the AI is possibly more expensive than me. And as a freelancer, from their perspective I really am not alive until they need me. (This is what it often feels like)

The reality is the human and the AI aren't used to build the same things anyway so it's a comparison you can't really make.

warumdarum•1h ago

Actually, if you have solar, it kind of is... so private AI compute gets de facto cheaper during the day?

reactordev•1h ago

If you have solar, it is not, because you have battery and equipment degradation from cycle charging, c’mon man…

I would agree with you if you said it was vastly cheaper overall (with the initial equipment investment amortized over time) compared to The Power Company.

In many states, even if you are generating electricity and selling it back to the power company, they still gonna charge you normal rates of usage because greed.

If you go off grid, you have bigger things to worry about than how to power your AI cluster. It’s manageable enough if you have land but that’s in scarce supply.

dnautics•1h ago

> if you have solar, it is not, because you have battery and equipment degradation from cycle charging, c’mon man…

no, the rate of that is pretty independent of use. unless you live in a place where selling energy back rules are designed to screw the solar owner (California)

reactordev•1h ago

California, Arizona, Texas, most of the southern states…

dofm•1h ago

Luckily I needed a new laptop and I bought an M1 Max secondhand from a friend quite cheaply because it was fast enough to recompile something else I am interested in.

So for me, there is no additional hardware cost; it was acquired in replacement.

I run the AI models at home on this kit because I want to; I'll use openrouter if I need to.

I accept the economics of this article are right. But I feel so incredibly sad about this outcome that we're now just to be people caretaking machines that do the job we loved that actually I am not sure that exercising this nuance is going to matter in the long term.

It turns out it is a mistake I have made in my life — now really unfixable because I am a bit too old — to believe that I will always find enough fulfilment in my work to offset the absence of personal fulfilment elsewhere; I have always enjoyed being able to help people directly by doing a thing I love and I am good at, and that has kept away the sadness of finding it difficult to build a conventional family life to enjoy.

I assumed I would always find some new way to find that enjoyment, but even the slim enjoyment from being able to explore this stuff on my own kit in my own terms will not be enough if the pendulum does not swing back towards human effort.

It is a dismal world we have made for ourselves. Lately I have found myself dreading growing too much older in it.

rambojohnson•1h ago

work at a cafe.

dnautics•1h ago

> Power is not free.

it's ~free if you have home solar.

jrm4•1h ago

I'm in Florida and am already using AC, so if not "free", definitely "negligible."

throwaway219450•1h ago

Also, I would anticipate at least a 5 year lifespan for a current generation card. The 3090 is still respectable simply because it has 24GB of RAM which, for years, has been the limiting factor for ML at home. If you got a 6000, sure it’s going to cost 7-8k, but the resale value is likely to be very good. Even the 3090 is 50%+ of RRP still. And if you’re not doing LLMs, it’s an interesting value proposition for “classic” CNN vision model training. You can fit enormous batch sizes on 96 GB. The biggest reason to upgrade is perf/watt has about doubled (eg 4000 pro Blackwell is half the 3090 for similar).

People tend to assume the capex is thrown away but as we’ve seen with RAM, don’t be so sure you won’t be able to flip it if you need to.

OutOfHere•1h ago

Fixed-price monthly plans ought to be sufficient for most people who actually review their spec and code, for building production-grade software that stands the test of time. A careful spec+review+iteration takes time, resetting the usage quota. Granted, security audits use tokens too.

If you still need more tokens, odds are that you're vibecoding unmaintainable throwaway trash.

manfre•1h ago

With access to view usage for my org and conversations with developers, I think much of the high token usage is a result of people not knowing how to right size the model for the given task. The trend seems to be to pick the most powerful model and use it for everything. Based upon git metrics, I'm one of the top performing engineers at my org and I've yet to run into any overage or throttling on the $200/mo anthropic sub.

justinhj•14m ago

I had no idea git metrics could show your best performers

quickthoughts•1h ago

Ha just wrote a post[1] about a sort of 4th option - max out cheap compute to create more tangible things that can be used/run locally.

1: https://news.ycombinator.com/item?id=48519181

gaigalas•1h ago

> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.

In the good ol' days, we bought machines not only to run stuff, but to experiment.

I understand today experiments are limited. Inference is reasonable, fine-tuning is either niche or a stretch, and base training is impossible.

*That is bound to change*, and when it does, there will be an avalanche of hobbyists and amateurs poking at base training. They'll find optimizations no one found before, synthesize data no one ever imagined synthesizing, and when that happens we'll start getting libre models.

So, yeah. Right now, buying the machine doesn't pay off that well, unless you want to pioneer this stuff in severe adverse conditions (hardware prices inflated, etc). Eventually, it will.

esalman•1h ago

For me, investing in hardware seems to be the way to go.

I learned coding nearly 24 years ago and am still learning new stuff all the time. At no point did I have to rely on a subscription model to learn and do new stuff.

If LLMs and agents are the default tools for coding and building software, at least for the next few years, it seems like a no-brainer to invest $2000-3000 in hardware, like a Strix Halo PC.

iugtmkbdfil834•1h ago

Yes and no. Hardware does lock you in. Granted, I am happy with my 128gb of shared memory, but I am mildly concerned that it actually is more expensive now than when I bought mine. It does not bode well for the future; not when combined with recent WH admin moves on Anthropic and the reality that next batch of good models may require more than 128gb to run well.

edit: I am not dismissing local. I am one such user ( though I have subs too ), but one has to be clear eyed about the trade-offs.

CraigJPerry•1h ago

I wondered if there might be a no brainer "free" option on discarded hardware.

I have a GTX1080ti which i think is circa 2018, it's unused, more than paid for itself over the years, owes me nothing at this point so the hardware is free.

It runs Gemma e4b multimodal, qwen 3.5 8b or the qwen 4b embeddings models well enough (40+ t/s for the LLMs).

The machine consumes 350 watts at the wall when under load (3 watts when sleeping, 80 W at idle). Electricity costs me £0.035/kWh, which is cheap for the UK (load shifting via house battery).

144k output tokens for around 1 pence (and it takes an hour to do that, in theory).

It's only JUST cheaper to use than the far more capable deepseek v4 flash model despite the free hardware and ~10x cheaper than normal electricity.
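
The arithmetic in this comment can be checked with a short sketch. The wattage, token rate, and tariff are the figures quoted above; the function name and the £0.35/kWh "normal" tariff (ten times the quoted one, following the "~10x cheaper" remark) are assumptions for illustration.

```python
def electricity_cost_per_million_tokens(watts, tokens_per_second, price_per_kwh):
    """Electricity cost of generating one million tokens, in the tariff's currency."""
    kwh_per_hour = watts / 1000                 # energy drawn during one hour under load
    tokens_per_hour = tokens_per_second * 3600
    cost_per_hour = kwh_per_hour * price_per_kwh
    return cost_per_hour / tokens_per_hour * 1_000_000


# Figures quoted in the comment: 350 W under load, 40 tokens/s, £0.035 per kWh.
tokens_per_hour = 40 * 3600                     # 144,000 tokens
cost_per_hour_gbp = 350 / 1000 * 0.035          # about £0.012, roughly one penny
cheap_tariff = electricity_cost_per_million_tokens(350, 40, 0.035)

# Assumed "normal" tariff: ten times the quoted one.
normal_tariff = electricity_cost_per_million_tokens(350, 40, 0.35)

print(f"{tokens_per_hour} tokens/hour for £{cost_per_hour_gbp:.4f}")
print(f"£{cheap_tariff:.3f} per million tokens (quoted tariff)")
print(f"£{normal_tariff:.3f} per million tokens (assumed normal tariff)")
```

This counts output tokens at full load only and ignores idle draw and hardware cost, so it is best read as a lower bound on the real cost.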

throwatdem12311•1h ago

3k? Try 10

vadansky•1h ago

Can I run something comparable to Opus 4.6 locally yet? I keep hearing conflicting things. If I can spend 10k to do that I would cancel my subscription. The problem is I don’t wanna spend the money to find out myself.

grim_io•1h ago

10k will not get you anywhere near opus or sonnet. It's simply not possible for mere mortals currently.

als0•1h ago

> Can I run something comparable to Opus 4.6 locally yet?

Sadly, no. The best comparable thing you can get is about Sonnet 3.7

Catloafdev•1h ago

If you want frontier-level, the economically reasonable option is OpenRouter or a direct sub to frontier-of-your-choice.

The reality is that they do not offer configurations that would allow a consumer to run that much VRAM on a single setup to protect datacenter margins. Apple used to, and they stopped, those devices are going for ~$20k+ each on ebay now.

You can get very, very capable models on a 3090/4090/5090/6000 series card. But if you want 'frontier level' you are investing ~22k at a bare minimum if you go new. Used you can probably build your own server for much cheaper up-front cost but it's likely going to be 4-6x+ electricity usage.

daemonologist•1h ago

There are also significant economies of scale (namely: utilization and batching), which tend to make inference on a shared server more economical even after the operator takes a cut.

pianopatrick•1h ago

I think someone could find some way to use the smaller local models to write code. Some kind of framework or harness or language or something. But not too many people are working on that because the big models are pretty cheap and a lot better.

petra•1h ago

Maybe one possible path (to make weaker models highly capable) is making the job of the LLM as easy as possible.

I wonder if part of the solution is building/finding the right libraries, with the right documentation/language/API (one that plays well with LLMs) and maybe creating some synthetic data around them, to make it very easy for the LLM.

And maybe there could be a business model around creating those libraries.

pianopatrick•1h ago

I think as well there might be "algorithms" that can work with local LLMs. With local LLMs there is a small context window, but not that much cost per token. So perhaps there is a way to do lots of small prompts that work in a sequence to produce a result.

Like perhaps you could produce 5 versions of a piece of code, and then compare them to choose the best.

Also if the local LLMs can call tools, maybe you can use static analysis tools to catch errors and try again in a loop or process of some sort.

There also might be certain languages that work better because those languages have better static checks.
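
A minimal sketch of the two ideas in this comment: pick the best of several candidates, and retry in a loop when a static check fails. Everything here is an assumption for illustration: `generate` stands in for whatever local model call you use, and Python's own parser (`ast.parse`) stands in for a real static analysis tool.

```python
import ast
from typing import Callable, Iterable, Optional


def syntax_errors(source: str) -> int:
    """Return 0 if the source parses as Python, 1 otherwise."""
    try:
        ast.parse(source)
        return 0
    except SyntaxError:
        return 1


def pick_best(candidates: Iterable[str],
              score: Callable[[str], int] = syntax_errors) -> Optional[str]:
    """Return the lowest-scoring candidate (earliest wins ties), or None if empty."""
    candidates = list(candidates)
    if not candidates:
        return None
    return min(candidates, key=score)


def generate_with_retries(generate: Callable[[str], str], prompt: str,
                          max_attempts: int = 5) -> Optional[str]:
    """Call the hypothetical `generate` until its output parses, feeding errors back."""
    feedback = ""
    for _ in range(max_attempts):
        source = generate(prompt + feedback)
        try:
            ast.parse(source)
            return source
        except SyntaxError as exc:
            feedback = (f"\n\nThe previous attempt failed to parse: "
                        f"{exc.msg} (line {exc.lineno}). Try again.")
    return None


# Canned candidates in place of real model output.
candidates = ["def add(a, b) return a + b", "def add(a, b):\n    return a + b"]
best = pick_best(candidates)
print(best)
```

A syntax check is the weakest possible scorer; swapping in a type checker, a linter, or a test run as the `score` function follows the same shape.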

jrm4•1h ago

Yes. LITERALLY THIS. I do this! Not hypothetical.

I'll write a detailed prompt for a function, hand it off to 5 or so models (all of which are on my local machine), wait about 5 min and then compare.

jrm4

dempedempe•1h ago

Did you just copy-and-paste an AI response and post it on your blog?

impure•1h ago

I recently made an AI Agent and surprisingly coding with DeepSeek V4 Flash is quite cheap. It probably has to do with the aggressive prompt caching. I'm using OpenRouter with Novita AI as the preferred provider.

kagamino•1h ago

Same here, deepseek v4 flash on opencode go. It's cheap, fast and good enough to follow my instructions

2muchtime•1h ago

I’m using zen because I have a Claude subscription and just like dabbling with the other models and I was shocked at how little flash cost but it was noticeably not at the level I’d like my model to be.

For me MiniMax 3 has really hit the sweet spot of being very cheap, though more than flash, but it’s also very capable.

throwa356262•1h ago

Deepseek v4 via deepseek themselves is significantly cheaper.

Because (1) Huawei collab and (2) vLLM etc don't implement half of the inference optimisations deepseek proposed in their paper.

RomanPushkin•1h ago

AI coding at home literally costs $100/month. I'm wondering where $400 is coming from? $100 is more than enough for "coding at home", IMO. I rarely face the limits, and when I do it's just a time for a quick walk anyway.

chasd00•2m ago

Man I’m using the $20/month sub and it works just fine for me. Granted, I have a family and house and lots of obligations so by the time I hit the limits some other task is due before I can return to coding. If I hit the limits before I have something else to do then I just code by hand or review what has been generated until I can use the agent again. Reviewing agent code is a good way to learn too, agents have shown me different approaches than what I would have done and they’re definitely worth thinking about.

zuzululu•1h ago

Another update for codex users: they let you accumulate resets, which greatly adds to the mileage

I don't think it's feasible to have something comparable to these frontier models when they are increasing usage and lowering token costs

mwcampbell•1h ago

I invested about $4,000 in an NVIDIA DGX Spark several months ago. 128 GB of unified RAM, and the NVIDIA GB10 chip. With the RAM, the several CPU cores, and the 4 TB NVMe SSD, it's a very capable ARM64 Linux computer even without the GPU, and so far I've mostly been using it as such. But I wonder, what's the most capable model, specifically for coding, that can run well on that hardware?

Yoric•1h ago

https://www.canirun.ai/?status=tight might answer that question

morganastra•55m ago

Deepseek v4 flash is shockingly strong for its size and reportedly runs well on that hardware.

jacobgold•1h ago

"Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling."

I realize this text is just slop but it never stops being a "real bargain" at any point.

And it's more like $200/mo for $4000+/mo in tokens. You can also buy additional subscriptions.

There's no sense in running local models or doing anything else as long as VCs (and soon the public markets) are willing to pay your bill.

abc42•1h ago

Even if they were making a profit, their scale and expertise will obviously give you a cheaper product than what you can build.

jacobgold•1h ago

Maybe today but it's not a law of nature. It seems inevitable that AI models and coding agents will be fully commoditized eventually, just like computers, game engines, compilers, web servers, and so many other technologies have been.

At the end of the day, AI models are relatively small files that we run little CUDA programs on.

simonw•1h ago

SemiAnalysis pushed this to the limit and managed to get $8,000 of tokens from a $200/month Anthropic plan and $14,000 of tokens from a $200/month OpenAI plan: https://twitter.com/SemiAnalysis_/status/2064815044085318040

jacobgold•1h ago

Yeah, although that is pushing every rate limit and no one knows what happens if you do that consistently? I think $4,000/mo is probably a good estimate for an individual dev doing synchronous coding agent work.
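
Putting the figures quoted in this subthread side by side, as a rough sketch: the dollar amounts are the ones stated in the comments above, while the labels and the `leverage` helper are made up for illustration.

```python
def leverage(api_value_usd, plan_price_usd):
    """How many dollars of list-price API usage each subscription dollar buys."""
    return api_value_usd / plan_price_usd


# (label, claimed API value per month, plan price per month), as quoted in the thread.
CLAIMS = [
    ("article: ~$400 of plans", 2800, 400),
    ("jacobgold estimate", 4000, 200),
    ("SemiAnalysis, Anthropic plan", 8000, 200),
    ("SemiAnalysis, OpenAI plan", 14000, 200),
]

ratios = {label: leverage(value, price) for label, value, price in CLAIMS}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.0f}x")
```

The spread (7x up to 70x) mostly reflects how hard each source pushes the rate limits, which is the caveat raised in the comment above.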

abc42•1h ago

What kind of usage chews through Claude Max x20? I use several agents with max effort in parallel and usually end up with something like 50% weekly usage. Fable almost allowed me to get to 70% but then they started resetting the limits mid-week and of course now ended the whole thing.

tamimio•1h ago

You can use opencode and switch between multiple providers on the fly based on the task you are doing: normal tasks use deepseek, for example, and hard ones use gpt5 or opus4. Track the usage with something like codexbar or similar. Openrouter seems to charge extra on top of the API costs, same with zen ide, so keep that in mind.
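
The routing idea can be sketched in a few lines. This is not how opencode or codexbar work internally; the routing table, model labels, and `UsageLog` class are hypothetical, and prices are left as a parameter because they change often.

```python
from collections import defaultdict

# Hypothetical routing table: task difficulty -> model label.
ROUTES = {"normal": "deepseek", "hard": "opus4"}


def choose_model(difficulty, routes=ROUTES):
    """Pick a model label for a task, falling back to the cheap route."""
    return routes.get(difficulty, routes["normal"])


class UsageLog:
    """Accumulates token counts per model so spend can be reviewed later."""

    def __init__(self):
        self.tokens = defaultdict(int)

    def record(self, model, tokens):
        self.tokens[model] += tokens

    def cost(self, price_per_million):
        """Total cost given a {model: price per million tokens} mapping."""
        return sum(
            count / 1_000_000 * price_per_million[model]
            for model, count in self.tokens.items()
        )


log = UsageLog()
for difficulty, tokens in [("normal", 50_000), ("hard", 20_000), ("normal", 30_000)]:
    log.record(choose_model(difficulty), tokens)

print(dict(log.tokens))
```

Defaulting unknown difficulties to the cheap route keeps the expensive model opt-in, which matches the "normal tasks use deepseek" habit described above.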

MemoryHoleHQ•1h ago

I've been thinking a lot about this and my personal take right now is that at some near-medium future the models available to run at home and the hardware needed to use them will be enough.

My baseline is sonnet 4.6. I think it's good enough for most tasks, sincerely. So, from what I see, we are already at a point where we don't need frontier models for serious coding and debugging. Give it a couple of years and that level will fit 120B models.

At the same time, we saw the rise of direct access memory systems like DGX or Strix Halo that will allow running models of this size for "cheap" in the medium term.

That's what I'm betting on. That in 2 years I can buy a system for about $2500 that will run a model that's similar to Sonnet 4.6 locally.

I might be spectacularly wrong though. But I'm willing to wait and use subscriptions/API calls for now.

13415•1h ago

I use copy & paste with a pro subscription. I guess I'm a bit behind in terms of tool use but it works great for me.

TheSkyHasEyes•1h ago

Similar story. I did have a pro subscription as a trial. I'm finding the free tier is as good(for my purposes) as the paid model.

sesm•1h ago

> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.

As usual, an extraordinary claim without extraordinary evidence: https://stephen.bochinski.dev/apps/

tunesmith•1h ago

I feel like I must have plateaued and don't know what to do next to level up. I'm currently on the $100/month codex plan and it seems fine using 5.5-xhigh all the time. I think of what to do next, have a chat session to determine exactly what to ask for up to the point of being ready to implement, and then codex churns on a commit-sized task whereupon I briefly check it on my local dev server. If necessary I ask for a change. Then I ask it to commit and recommend the next step based off the spec. Oftentimes I have to "approve" an out-of-sandbox request anyway.

I haven't found anything that requires running all night. I could tell it to one-shot a big plan but given how often I realize I want an intermediary thing to be slightly different it seems like a waste of effort.

I'm guessing the next thing I should probably look into is some sort of machine vm I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac).

I don't understand what people are doing with their side projects that is leading them to churn through tokens so quickly, to the point of requiring two $200/month subscriptions and a bunch of token charges besides.

dnautics•1h ago

I have been on $100/mo claude and it has been churning out quite good software for months now. like i estimate what would have taken me three ish years, assuming i didn't burn out from failure (i would have). i only hit limits when i double fisted claude with my main project and my side project. just the other day i noticed i had been stuck on 4.5 because i failed to update the npm package.

sheremetyev•1h ago

> I don't want to give it "dangerous" access to my entire mac

I'm running Claude/Codex inside native macOS sandbox, configured with a simple script - https://github.com/sheremetyev/sandfence

always in "bypass permissions" mode - it works until the task is solved, sometimes 1 hour or more (which includes running tests etc)

jrm4•1h ago

Is spending (metered money) even worth it? Perhaps for most I mean "beyond like a 30 bucks a month," but for me I'm literally not spending more money beyond my very cheapo 16gb video card.

No clue what y'all are doing, perhaps because I'm hobbying, and also I'm old and can perhaps do more of this by hand.

But I'm basically just doing what I did before, plus ollama self hosted and sometimes gemini and I feel like I'm going lightspeed beyond what I've ever done.

And I suppose this is still very fine-grained. I have it make a draft, then just have them fix/change it step by step?

I tried one of the bigger boys that can one-shot apps, which I guess is cool, but I'm finding it's just as hard to modify as if I just grabbed someone else's repo on github.

WhiteOwlLion•1h ago

There are a lot of Xeon chips for $10 on eBay. Too bad there’s no drive for CPU-based inference. Data centers will need to swap out their older GPU clusters, so what does that do for hardware pricing on data center GPUs? H100s are cheap enough, but the power requirements make them a long-term net negative given how much you pay for power in California.

spgorbatiuk•1h ago

Hardware and provider juggling is a way to go, although I think it is also worth mentioning that the cost is not only the price-per-token, but first of all, the amount of tokens used.

Depending on what one builds, comprehensive documentation and applicable skills and memory tools often allow for a substantial reduction of tokens previously used by the agent to comprehend and remember what is being built

geophph•49m ago

> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.

What does this look like after 6-12 months? Like, how much code are you trying to write total?

Maybe it just doesn’t click in my mind, but sometimes I wonder about how much work people are trying to do and how they actually have enough to get done so quickly in such a short amount of time.

sublinear•25m ago

They prefer to work harder and not smarter. Forever hill climbing to nowhere.

I've never worked on a complicated codebase that started out that way until the rest of the business concerns and office politics came into effect. People may not like it, but the bureaucracy is far and away more valuable than the core functionality.

Mature codebases are years of people thinking of all the possible gotchas while solving their acute pain points. This is not fluff, but the living and breathing part of it. Without that code, it's just a machine barely doing stuff in the most obtuse ways possible that nobody wants to pay for.

I would argue that they're putting LLMs to work on that finer detail stuff, but AI is still far too dumb. No, what they're doing is playing with their skinner box.

hillj23•34m ago

I think this is only going to become more relevant. I'm personally a $200/mo Claude Maxer and I know that the usage I'm getting on Opus 4.8 Max and (until they yanked it out from under me) Fable 5 is way, way more than what I'm paying them. At some point, this will turn usage-based and I will be hammered on it and probably forced to look at self-hosting. I think while the caps are there, even at $200, it's honestly not too bad if you're coding value into the market, but as soon as those caps come off for retail AI users, we're all going to have some tough choices to make.

bachmeier•33m ago

> The upfront cost is steep and the models you can actually run at home are weaker than what the frontier labs ship, so this only pays off if you can keep the rig busy with long running tasks where a slower, cheaper model grinds away overnight. Most people can’t keep a home machine that loaded, and the hardware you buy today may look like a bad bet in a year.

Oh, so this is not a post about AI coding at home. It's about vibe coding at home.

There's a lot I disagree with in this post, but I'm posting this from a home computer with 64 GB of RAM and no GPU. I do lots of AI coding while spending very little money. I run Gemma 4 26b (mixture of experts) and Qwen 3 coder with Ollama. I use Github Copilot code completions. I use the Gemini and Mistral API free tiers. I have a Gemini paid API account. It's now prepaid, so you don't have to worry about an accidental $1000 bill. You can do a lot of things with Gemini Flash Lite 3.1.

None of this is burning through tokens to create an expensive blob of spaghetti code, but it does qualify as AI coding.
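
For readers who have not tried the Ollama route mentioned here, a minimal sketch of calling a locally running Ollama server over its HTTP API, using only the standard library. The model tag `qwen3-coder` is a placeholder assumption; substitute whatever tag `ollama list` shows on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt, model="qwen3-coder", url=OLLAMA_URL):
    """Build a non-streaming generate request. The model tag is a placeholder."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask(prompt, model="qwen3-coder"):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        return json.loads(response.read())["response"]
```

Calling `ask("Write a function that reverses a string.")` only works while the Ollama server is running and the model has been pulled; nothing leaves the machine, and there is no per-token charge beyond electricity.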

pshirshov•16m ago

> and the hardware you buy today may look like a bad bet in a year.

3090s and 7900s are going well so far.

Next year an Arc Pro B70 won't produce fewer tokens for you than it does today.

They aren't fast but if you have flows where you can make money with them - they are a bargain in terms of price per GB.

Kuyawa•8m ago

This month I've spent only 15 cents using DeepSeek API and my own coding agent. Three apps delivered to clients and currently working on a tournament management app for pickleball, padel and beach tennis. I love DeepSeek.

mikgp•5m ago

What are people doing at home? I have like 5 different apps I code on the $20/month Claude plan and like sure I can hit rate limits but - What are people doing to burn through $3k in tokens?
