Power is not free.
What I’ve found is that you’re basically paying a premium for privacy, and that’s worth it for me.
There's actually an interesting thought experiment here: if it takes you a full day to build something that AI would otherwise build in a day, do you end up using more power, or less? What is the break-even point, purely from a power consumption perspective?
Brains are thousands or maybe even millions of times more fuel-efficient than computers and you are alive for the whole day either way, right? You probably eat about the same even.
The reason executives think AI is more efficient is that it is more space efficient than a human and doesn't demand to be paid or to work only a set number of hours. Everything with computing is more efficient if you resent having to give money to other humans. If they could just not have you be alive when they don't need you, it'd possibly be different.
Even so, I think that at a typical British freelance rate and a truly unsubsidised token price, the AI is possibly more expensive than me. And as a freelancer, from their perspective I really am not alive until they need me. (This is what it often feels like.)
The reality is the human and the AI aren't used to build the same things anyway so it's a comparison you can't really make.
I would agree with you if you said it was vastly cheaper overall (with the initial equipment investment amortized over time) compared to The Power Company.
In many states, even if you are generating electricity and selling it back to the power company, they're still gonna charge you normal usage rates because greed.
If you go off grid, you have bigger things to worry about than how to power your AI cluster. It’s manageable enough if you have land but that’s in scarce supply.
no, the rate of that is pretty independent of use. unless you live in a place where selling energy back rules are designed to screw the solar owner (California)
So for me, there is no additional hardware cost; it was acquired in replacement.
I run the AI models at home on this kit because I want to; I'll use openrouter if I need to.
I accept the economics of this article are right. But I feel so incredibly sad about this outcome, that we are now just to be people caretaking machines that do the job we loved, that I am actually not sure exercising this nuance is going to matter in the long term.
It turns out it is a mistake I have made in my life — now really unfixable because I am a bit too old — to believe that I will always find enough fulfilment in my work to offset the absence of personal fulfilment elsewhere; I have always enjoyed being able to help people directly by doing a thing I love and I am good at, and that has kept away the sadness of finding it difficult to build a conventional family life to enjoy.
I assumed I would always find some new way to find that enjoyment, but even the slim enjoyment from being able to explore this stuff on my own kit on my own terms will not be enough if the pendulum does not swing back towards human effort.
It is a dismal world we have made for ourselves. Lately I have found myself dreading growing too much older in it.
It's ~free if you have home solar.
People tend to assume the capex is thrown away but as we’ve seen with RAM, don’t be so sure you won’t be able to flip it if you need to.
If you still need more tokens, odds are that you're vibecoding unmaintainable throwaway trash.
In the good ol' days, we bought machines not only to run stuff, but to experiment.
I understand today experiments are limited. Inference is reasonable, fine-tuning is either niche or a stretch, and base training is impossible.
*That is bound to change*, and when it does, there will be an avalanche of hobbyists and amateurs poking at base training. They'll find optimizations no one found before, synthesize data no one ever imagined synthesizing, and when that happens we'll start getting libre models.
So, yeah. Right now, buying the machine doesn't pay off that well, unless you want to pioneer this stuff in severe adverse conditions (hardware prices inflated, etc). Eventually, it will.
I learned coding nearly 24 years ago and am still learning new stuff all the time. At no point did I have to rely on a subscription model to learn and do new stuff.
If LLMs and agents are the default tools for coding and building software, at least for the next few years, it seems like a no-brainer to invest $2000-3000 in hardware, like a Strix Halo PC.
edit: I am not dismissing local. I am one such user (though I have subs too), but one has to be clear-eyed about the trade-offs.
I have a GTX 1080 Ti which I think is circa 2018. It's unused, has more than paid for itself over the years and owes me nothing at this point, so the hardware is free.
It runs Gemma e4b multimodal, qwen 3.5 8b or the qwen 4b embeddings models well enough (40+ t/s for the LLMs).
The machine consumes 350 watts at the wall when under load (3 watts when sleeping, 80 watts at idle). Electricity costs me £0.035/kWh, which is cheap for the UK (load shifting via house battery).
That works out to 144k output tokens for around 1 pence (and it takes an hour to do that, in theory).
It's only JUST cheaper to use than the far more capable deepseek v4 flash model despite the free hardware and ~10x cheaper than normal electricity.
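For anyone who wants to redo that arithmetic with their own numbers, here is a minimal sketch. It uses only the figures quoted above (350 W under load, £0.035/kWh, 40 tokens per second); the function names are made up for illustration, and hardware cost and idle draw are deliberately ignored.

```python
# Figures quoted in the comment above; swap in your own measurements.
WATTS_UNDER_LOAD = 350       # wall draw while generating
PRICE_GBP_PER_KWH = 0.035    # load-shifted tariff
TOKENS_PER_SECOND = 40       # "40+ t/s"


def tokens_per_hour(tokens_per_second):
    """Output tokens produced in one hour of continuous generation."""
    return tokens_per_second * 3600


def energy_cost_per_hour(watts, price_per_kwh):
    """Electricity cost of one hour at a constant power draw."""
    return watts / 1000 * price_per_kwh


def cost_per_million_tokens(watts, price_per_kwh, tokens_per_second):
    """Electricity cost per one million output tokens."""
    hourly_cost = energy_cost_per_hour(watts, price_per_kwh)
    return hourly_cost / tokens_per_hour(tokens_per_second) * 1_000_000


hourly = energy_cost_per_hour(WATTS_UNDER_LOAD, PRICE_GBP_PER_KWH)
per_million = cost_per_million_tokens(
    WATTS_UNDER_LOAD, PRICE_GBP_PER_KWH, TOKENS_PER_SECOND
)
print(f"{tokens_per_hour(TOKENS_PER_SECOND):,} tokens per hour")
print(f"£{hourly:.4f} per hour, £{per_million:.4f} per million tokens")
```

With these inputs that is roughly 1.2 pence per hour and about 8.5 pence per million output tokens, electricity only.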
Sadly, no. The best comparable thing you can get is about Sonnet 3.7
The reality is that, to protect datacenter margins, they do not offer configurations that would allow a consumer to run that much VRAM on a single setup. Apple used to, and they stopped; those devices are going for ~$20k+ each on eBay now.
You can get very, very capable models on a 3090/4090/5090/6000 series card. But if you want 'frontier level' you are investing ~22k at a bare minimum if you go new. Used you can probably build your own server for much cheaper up-front cost but it's likely going to be 4-6x+ electricity usage.
I wonder if part of the solution is building/finding the right libraries, with the right documentation/language/API (one that plays well with LLMs) and maybe creating some synthetic data around them, to make it very easy for the LLM.
And maybe there could be a business model around creating those libraries.
Like perhaps you could produce 5 versions of a piece of code, and then compare them to choose the best.
Also if the local LLMs can call tools, maybe you can use static analysis tools to catch errors and try again in a loop or process of some sort.
There also might be certain languages that work better because those languages have better static checks.
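A rough sketch of that check-and-retry loop, assuming Python as the target language and using the standard library's `ast.parse` as a stand-in for a real static analysis tool. The `generate` callable is hypothetical: it represents whatever function sends a prompt to your local model and returns source code.

```python
import ast
from typing import Callable, Optional


def static_errors(source: str) -> list[str]:
    """Return problems found without executing the code (syntax only here)."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]
    return []


def generate_until_clean(
    generate: Callable[[str], str],  # hypothetical: prompt in, source code out
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for code, feed static-check errors back, stop at the first clean result."""
    feedback = ""
    for _ in range(max_attempts):
        source = generate(prompt + feedback)
        errors = static_errors(source)
        if not errors:
            return source
        feedback = "\n\nThe previous attempt failed static checks:\n" + "\n".join(errors)
    return None  # gave up; a human should take a look


def make_fake_model() -> Callable[[str], str]:
    """Stand-in for a local LLM: first reply is broken, second is valid."""
    replies = iter([
        "def add(a, b) return a + b",
        "def add(a, b):\n    return a + b\n",
    ])
    return lambda prompt: next(replies)


result = generate_until_clean(make_fake_model(), "Write add(a, b).")
print(result)
```

A syntax check is the weakest possible gate; a type checker or linter could be slotted into `static_errors` in the same way, which is where languages with stricter static checks would have an edge.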
I'll write a detailed prompt for a function, hand it off to 5 or so models (all of which are on my local machine), wait about 5 min and then compare.
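One way to automate part of that comparison, sketched under assumptions: each model's answer is a string of Python source defining the same function, and you have a few input/output cases to score against. The model names, the `slugify` function and the candidate sources below are invented for illustration. Note that `exec` runs the generated code, so only do this with output you are willing to execute locally.

```python
def score_candidate(source, func_name, cases):
    """Count how many (args, expected) cases a candidate passes."""
    namespace = {}
    try:
        exec(source, namespace)  # runs the generated code; sandbox if unsure
        func = namespace[func_name]
    except Exception:
        return 0  # did not even load
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on this case counts as a failure
    return passed


def rank_candidates(candidates, func_name, cases):
    """Return (model_name, score) pairs, best first."""
    scores = {
        name: score_candidate(source, func_name, cases)
        for name, source in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical outputs from three local models for the same prompt.
candidates = {
    "model_a": "def slugify(s):\n    return s.lower().replace(' ', '-')\n",
    "model_b": "def slugify(s):\n    return s.replace(' ', '-')\n",
    "model_c": "def slugify(s) return s",
}
cases = [(("Hello World",), "hello-world"), (("abc",), "abc")]

ranking = rank_candidates(candidates, "slugify", cases)
print(ranking)
```

This only narrows the field; reading the top one or two candidates by hand is still the real comparison.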
For me MiniMax 3 has really hit the sweet spot of being very cheap, though more than flash, but it's also very capable.
Because (1) Huawei collab and (2) vLLM etc. don't implement half of the inference optimisations deepseek proposed in their paper.
I don't think it's feasible to have something comparable to these frontier models when they are increasing usage and lowering token costs.
I realize this text is just slop but it never stops being a "real bargain" at any point.
And it's more like $200/mo for $4000+/mo in tokens. You can also buy additional subscriptions.
There's no sense in running local models or doing anything else as long as VCs (and soon the public markets) are willing to pay your bill.
At the end of the day, AI models are relatively small files that we run little CUDA programs on.
My baseline is sonnet 4.6. I sincerely think it's good enough for most tasks. So, from what I see, we are already at a point where we don't need frontier models for serious coding and debugging. Give it a couple of years and that level will fit 120B models.
At the same time, we saw the rise of direct access memory systems like DGX or Strix Halo that will allow running models of this size for "cheap" in the medium term.
That's what I'm betting on: that in 2 years I can buy a system for about $2500 that will run a model that's similar to Sonnet 4.6 locally.
I might be spectacularly wrong though. But I'm willing to wait and use subscriptions/API calls for now.
As usual, an extraordinary claim without extraordinary evidence: https://stephen.bochinski.dev/apps/
I haven't found anything that requires running all night. I could tell it to one-shot a big plan but given how often I realize I want an intermediary thing to be slightly different it seems like a waste of effort.
I'm guessing the next thing I should probably look into is some sort of machine vm I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac).
I don't understand what people are doing with their side projects that is leading them to churn through tokens so quickly, to the point of requiring two $200/month subscriptions and a bunch of token charges besides.
I'm running Claude/Codex inside native macOS sandbox, configured with a simple script - https://github.com/sheremetyev/sandfence
Always in "bypass permissions" mode: it works until the task is solved, sometimes for 1 hour or more (which includes running tests etc.)
No clue what y'all are doing, perhaps because I'm hobbying, and also I'm old and can perhaps do more of this by hand.
But I'm basically just doing what I did before, plus ollama self hosted and sometimes gemini and I feel like I'm going lightspeed beyond what I've ever done.
And I suppose this is still very fine-grained. I have it make a draft, then just have them fix/change it step by step?
I tried one of the bigger boys that can one-shot apps, which I guess is cool, but I'm finding it's just as hard to modify as if I just grabbed someone else's repo on GitHub.
Depending on what one builds, comprehensive documentation and applicable skills and memory tools often allow for a substantial reduction of tokens previously used by the agent to comprehend and remember what is being built
What does this look like after 6-12 months? Like, how much code are you trying to write total?
Maybe it just doesn’t click in my mind, but sometimes I wonder about how much work people are trying to do and how they actually have enough to get done so quickly in such a short amount of time.
I've never worked on a complicated codebase that started out that way; it got complicated once the rest of the business concerns and office politics came into effect. People may not like it, but the bureaucracy is far and away more valuable than the core functionality.
Mature codebases are years of people thinking of all the possible gotchas while solving their acute pain points. This is not fluff, but the living and breathing part of it. Without that code, it's just a machine barely doing stuff in the most obtuse ways possible that nobody wants to pay for.
I would argue that they're putting LLMs to work on that finer detail stuff, but AI is still far too dumb. No, what they're doing is playing with their skinner box.
Oh, so this is not a post about AI coding at home. It's about vibe coding at home.
There's a lot I disagree with in this post, but I'm posting this from a home computer with 64 GB of RAM and no GPU. I do lots of AI coding while spending very little money. I run Gemma 4 26b (mixture of experts) and Qwen 3 coder with Ollama. I use GitHub Copilot code completions. I use the Gemini and Mistral API free tiers. I have a Gemini paid API account. It's now prepaid, so you don't have to worry about an accidental $1000 bill. You can do a lot of things with Gemini Flash Lite 3.1.
None of this is burning through tokens to create an expensive blob of spaghetti code, but it does qualify as AI coding.
3090s and 7900s are going well so far.
Next year an Arc Pro B70 won't produce fewer tokens for you than it does today.
They aren't fast, but if you have flows where you can make money with them, they are a bargain in terms of price per GB.
For comparison, a modern frontier model like Gemini 3.5 Pro consumes about 15kW -- so only about 1.5x the fully loaded human. In an 8h workday, that model would crank through ~80M tokens (~$5k at API prices). That's ~4 major refactors of a 10k LOC codebase, so probably not a very realistic comparison to a single human dev.
I think a more useful comparison, based on my experience, is that an engineer with AI support can get one 8h day's worth of unassisted work done in 1h. So, the 25 kWh consumed during collaboration (conservatively assuming I keep the GPU hot for the whole hour) frees up the remaining 70 kWh I'll draw down for the day to be spent in some other way.
Then, assume power costs 20 cents per kilowatt hour (US average). To match the human 3 cents per hour, you need an average draw of 150 watts. That's in the range of a budget graphics card, but not much past there.
However, if you sleep instead of sitting around, you can probably make AI cost competitive. Sleeping drops your metabolic rate by more, and lying down in bed (as opposed to sitting) also reduces calorie burn. Combined, you can reduce your burn by like 30 calories an hour. At the new 9 cents per hour human cost, you can afford to run a higher end graphics card at ~450 watts. That puts you in RTX 3090 range.
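The break-even wattage is just the hourly human cost divided by the electricity price. A minimal sketch, taking the 3 and 9 cents per hour and the 20 cents per kWh above as given; the function name is invented for illustration.

```python
def break_even_watts(human_cost_per_hour, price_per_kwh):
    """Constant power draw (in watts) whose hourly electricity cost equals the human's."""
    return human_cost_per_hour / price_per_kwh * 1000


PRICE_PER_KWH = 0.20  # dollars, the US average assumed above

for label, cost in [("sitting", 0.03), ("sleeping", 0.09)]:
    watts = break_even_watts(cost, PRICE_PER_KWH)
    print(f"{label}: {watts:.0f} W break-even")
```

Those two inputs give about 150 W and 450 W, matching the figures quoted above.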
But that feels like measuring productivity in lines of code. For what I'm doing, I'm not seeing the benefit in any subscription.
Sure, I can't one-prompt a whole new boring CRUD app, but oh well.
That position is not without its own risks, though. Maybe Opus 4.8 will run on a single chip by 2028... and maybe you won't be allowed to touch it.
And what if Xi makes a play for Taiwan? That would be stupid, but so was invading Ukraine with tanks from Temu, and it still happened.
But - good luck finding them. Apple discontinued the model a few months ago. And more recently, even the 256G model was discontinued. Big AI really really does not want people to get off their needle.
Which is to say, I might use AI to do an outline/organizational pass, but I'm prompting every chunk of code "one-by-one" (e.g. at about the "function" level), which still feels lightyears ahead of what I used to do.
I see people just completely wasting tokens with ridiculous setups, 100% hitting cache misses as well as dumping huge files into context all the time.
Just learn how these things work, or pay the price I guess.
Here’s an example https://m.youtube.com/watch?v=xc1296HY8Fw&ra=m
It’s completely different from a professional workflow (what you described). It’s a toy for consumers.
atreids•1h ago
I did explore self-hosting models but hardware right now is just too expensive.
Yoric•1h ago
Still, that's interesting. What do you get for that price? Only coding, or also e.g. image generation?