[1] https://www.youtube.com/live/khr-cIc7zjc?si=oI9Fj33JBeDlQEYG
It would be cheaper for your company to literally pay your salary while you do nothing.
A year has 2000 working hours, which is 24,000 5-minute intervals. That means the company is spending at least $240,000 on the Claude API (conservatively). So they would be better off paying you $100-200k to do nothing and hiring someone competent for that $240k.
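A quick sanity check of that arithmetic, as a minimal sketch (the ~$10-per-5-minute-interval rate is my assumption, backed out from the $240k figure):

```python
# Back-of-the-envelope version of the claim above.
working_hours_per_year = 2000
intervals = working_hours_per_year * 12   # 12 five-minute intervals per hour
cost_per_interval = 10                    # USD per interval, assumed

print(intervals)                          # 24000
print(intervals * cost_per_interval)      # 240000 -> $240k/year, i.e. $20k/mo
```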
However, as long as Microsoft is offering copilot at (presumably subsidized) $10/mo, I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful, and I doubt that.
You can try it for cheap via the normal pay-as-you-go route.
They're good about telling you how full your context is, and you can use /compact to shrink it down to the essentials.
But for those of us who aren't Mr. MoneyBags like you all, keeping an eye on context size is key to keeping costs low.
In contrast, I'm not interested in using cheaper, lesser services for my livelihood.
I use it for very targeted operations where it saves me several round trips to code examples, documentation, and Stack Overflow, not spamming it for every task I need to do. I spend about $1/day of focused feature development, and it feels like it saves me about 50% as many hours as I spend coding while using it.
AI coding saves me a lot of time writing high-quality code, as it takes care of the boilerplate and documentation/API lookups, while I still review every line, and vibe coding lets me quickly do small stuff I couldn't do before (e.g. write a whole app in React Native), but gets really brittle after a certain (small) codebase size.
I'm interested to hear whether Claude Code writes less brittle code, or how you use it/what your experience with it is.
I'm curious, what was the return? What did you do with the 1k?
(updated for better example)
It is not a challenging technical thing to do. I could have sat there for hours reading the conversion from v1 to v2 to v3 to v4. It is mostly just changing class names. But these changes are hard to do with :%s/x/x, so you need to do them manually, one by one, for hundreds of classes. I could have as easily shot myself in the head.
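(To be fair, the purely mechanical subset is scriptable; a rough sketch below, with an illustrative, made-up rename table, assuming Python. It's everything beyond straight one-to-one renames, where the right v4 class depends on the surrounding markup, that forces the manual pass.)

```python
import re
from pathlib import Path

# Illustrative v1 -> v4 renames only; a real table would have hundreds
# of entries, and many conversions are not 1:1 like this.
RENAMES = {
    "flex-grow": "grow",
    "flex-shrink": "shrink",
    "whitespace-no-wrap": "whitespace-nowrap",
}

# Match whole class tokens so "flex-grow" doesn't also hit "flex-grow-0".
keys = sorted(RENAMES, key=len, reverse=True)
pattern = re.compile(r"(?<![\w-])(" + "|".join(map(re.escape, keys)) + r")(?![\w-])")

for path in Path(".").rglob("*.html"):
    text = path.read_text()
    path.write_text(pattern.sub(lambda m: RENAMES[m.group(1)], text))
```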
> Could you anonymize and share your last 5-10 prompts?
The prompt was a simple "convert this site from tailwind v1 to v4". I use Neovim Copilot Chat to inject context and load URLs. I have found that prompts have no value; it is either something the LLM can do or not.
- https://gist.github.com/backnotprop/ca49f356bdd2ab7bb7a366ef...
- https://gist.github.com/backnotprop/d9f1d9f9b4379d6551ba967c...
- https://gist.github.com/backnotprop/e74b5b0f714e0429750ef6b0...
- https://gist.github.com/backnotprop/91f1a08d9c27698310d63e06...
- https://gist.github.com/backnotprop/7f7cb63aceb7560e51c02a9d...
- https://gist.github.com/backnotprop/94080dde34bfca3dd9c48f14...
- https://gist.github.com/backnotprop/ea3a5c3a31799236115abc76...
Taken from 2 recent systems. 90% of my interaction is assurance, debugging, and then having Claude manage the meta-management framework. We work hard to set the path for actual coding, so code output (even complex or highly integrated) usually ends up being fairly smooth and fast.
Could you explain why there is no punctuation?
Basically anything that isn't GPT-4o is premium, and I find GPT-4o near useless compared to Claude and Gemini in Copilot.
It's hit and miss, IMO.
I like it for C#/.NET, but it's completely useless for the rest of the stuff I do (mostly web frontend).
I'm not sure about my usage but if I hit those premium limits I'm probably going to cancel Copilot.
This might mean the $10/month plan is the best value. It depends entirely on how it works for you.
(Caps obviously impact the total benefit so I agree there.)
Just to give you one example: the last BigCo I worked for had a schematic for new projects which resulted in... 2k EUR per month in cloud costs for serving a single static HTML file.
At one point someone up top decided that Kubernetes was the way to go and scrambled together an impromptu schematic for new projects that could simply be described as a continental-class dreadnought of a Kubernetes cluster on AWS.
And it was signed off, and later followed like scripture.
A couple of stories lower, we're having a hard time arguing for a 50 EUR budget for weekly beer for the team, but the company is A-OK with paying 2k EUR for a landing page.
Limits are a given on any plan. It would be too easy for a vibe coder to hammer away 8 hours a day, 20 days a month, if there were nothing stopping them.
The real question is whether this is a better value than pay as you go for some people.
Your vibe coders are on a different dimension than mine.
I don't think this is the right way to look at it. If Copilot helps you earn an extra $100 a month (or saves you $100 worth of time), and this one is ~2x better, it still justifies the $100 price tag.
Additionally, when you’re in a compact distribution, being 5% better might be 100x more valuable to you.
Basically, this assumes that marginal value tracks cost. I don't think most things, economically, match that pattern. I will sometimes pay 10x the cost for a good meal that has fewer calories (nutritional value).
I am glad people like you exist, but I don’t think the proposition you suggest makes sense.
What worked for me was coming up with an extremely opinionated way to develop an application and then generating instructions (mini milestones) by combining it with the requirements.
These instructions end up being very explicit about the sequence of things it should do (write the tests first), how the code should be written, where to place it, etc. So the output ended up being very similar regardless of the coding agent used.
In the codebase I've tried modularity via monorepo, faux microservices with local APIs, monoliths filled with hooks, and all the other centralization tricks in the book, down to the very, very simple. Whatever I could do to bring down the context window needed.
Eventually... your returns diminish, and any time you saved is gone.
And by the time you've burned up a context window and you're ready to get out, you're expecting it to output a concise artifact to carry you to the next chat so you don't have to spend more context getting that thread up to speed.
Inevitably the context window, and the LLM's eagerness to touch shit it's not supposed to (the likelihood of which increases with context), always get in the way.
Anything with any kind of complexity ends up as a game of too much bloat, or the LLM removing pieces that break other pieces it wasn't aware of.
/VENT
Using Gemini 2.5 for generating instructions
This is the guide I use
https://github.com/bluedevilx/ai-driven-development/blob/mai...
You have to puppeteer it and build a meta context/tasking management system. I spend a lot of time setting Claude Code up for success. I usually start with Gemini for creating context, development plans, and project tasking outlines (I can feed large portions of the codebase to Gemini and rely on its strategy). I've even put entire library docsites in my repos for Claude Code to use, but today they announced web search.
They also have todos built in, which makes the above even more powerful.
The end result is insane productivity - I think the only metric I have is something like 15-20k lines of code for a recent distributed processing system from scratch over 5 days.
https://gist.github.com/backnotprop/4a07a7e8fdd76cbe054761b9...
The framework is basically the instructions plus my general guidance for updating and ensuring critical details get injected into context. Some of those prompts I commented here: https://news.ycombinator.com/item?id=43932858
> I spend a lot of time setting Claude code up for success.
Normally I wouldn't post this because it's not constructive, but this piece stuck out to me and had me wondering if it's worth the trade-off. Not to mention programmers have spent decades fighting against LoC as a metric, so let's not start using it now!
I've done just about everything across the full & distributed stack, so I'm down to jam on my code/systems and how I instruct and confidently rely on AI to help build them.
I don't think I've ever done this or worked with anyone who had this type of output.
I daily-drive Cursor and I have rules to limit comments. I get comments on complex lines, and that's it.
A lot of people seem to have these magic incantations that somehow make LLMs work really well, at the level marketing and investor hype says they do. However, I rarely see that in the real world. I'm not saying this is true for you, but absent vaguely replicable examples that aren't just basic webshit, I find it super hard to believe they're actually this capable.
For context, this is aider tracking aider's code written by an LLM. Of course there's still a human in the loop, but the stats look really cool. It's the first time I've seen such a product work on itself and tracking the results.
If you don't like what it suggests, undo the changes, tweak your prompt and start over. Don't chat with it to fix problems. It gets confused.
https://gist.github.com/rachtsingh/e3d2e2b495d631b736d24b56e...
Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
Example:
I'm wrapping up, right now, an updated fork of the PHP extension `phpredis`. Redis 8 was recently released with support for a new data type, vector sets, but the phpredis extension (which is far more performant than non-extension Redis libraries for PHP) doesn't support the new vector-related commands. I forked the extension repo, which is in C (I'm a PHP developer; I had to install CLion for the first time just to work along with CC), and fired up Claude Code with the initial prompt/task of analyzing the extension's code and documenting, in a CLAUDE.md file, the purpose, conventions, and anything that it (Claude) felt would benefit the bootstrapping of future sessions, so that whole files wouldn't need to be read in.
Depending on the size of the codebase, this initial pass could be "expensive". Being that this is merely a PHP extension and not a huge codebase, I was fine letting it just rip through the whole thing however it saw fit; were this a larger codebase I'd take a more measured approach to this initial "indexing".
This results in a file that Claude uses the way we'd use a README.
Next I end this session, start a new one, and tell it to review that CLAUDE.md file (I specifically tell it to do this at every single new session start, moving forward) and then generate a general overview/plan of what needs to be done to implement the new Vector Set-related commands so that I can use this custom phpredis extension in my PHP environments. I indicated that I wanted a suite of tests focused on ensuring each command works with all of its various required and optional parameters, and that I wanted to use Docker containers for the testing rather than mess up my local dev environment.
$22 in API costs and ~6 hours spent, and I have the extension working in my local environment with support for all of the commands I want/need to use. (There are still 5 commands that I don't intend to use that I haven't implemented.)
Not only would I certainly never have embarked upon extending a C PHP extension, I wouldn't have done so over the course of an evening and a morning.
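(For flavor, the vector set commands in question look roughly like this. This is a sketch using redis-py's generic execute_command rather than the phpredis extension itself, with made-up key and element names; check the Redis 8 docs for the exact argument order.)

```python
import redis

r = redis.Redis()

# VADD: add elements with 3-dimensional vectors to a vector set
# (Redis 8 syntax: VADD key VALUES num val ... element).
r.execute_command("VADD", "docs:vectors", "VALUES", "3", "0.10", "0.20", "0.30", "doc:1")
r.execute_command("VADD", "docs:vectors", "VALUES", "3", "0.11", "0.21", "0.29", "doc:2")

# VSIM: find the elements most similar to an existing element.
similar = r.execute_command("VSIM", "docs:vectors", "ELE", "doc:1",
                            "WITHSCORES", "COUNT", "5")
print(similar)
```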
Another example:
Before this Redis vector sets thing, I used CC to build a Python image and text embedding pipeline backed by Redis streams and Celery. It consumes tasks pushed to the stream by my Laravel application, which currently manages ~120 million unique strings and ~65 million unique images that I've been generating embeddings for. Prior to this I'd spent very little time with Python and zero with anything ML-related. Now I have a performant, portable Python service that I run from my MacBook (M2 Pro) or various GPU-having Windows machines in my home, generating the embeddings on an 'as available' basis and pushing the results back to a Redis stream that my Laravel app then consumes and processes.
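The rough shape of a pipeline like that, as a minimal sketch: the stream names, field layout, and placeholder embed() are all made up here, and the real service presumably uses consumer groups and an actual model.

```python
import redis
from celery import Celery

app = Celery("embeddings", broker="redis://localhost:6379/0")
r = redis.Redis()

def embed(text: str) -> list[float]:
    # Hypothetical placeholder; the real worker would call an
    # image/text embedding model here.
    return [0.0] * 768

@app.task
def embed_task(task_id: str, text: str) -> None:
    vector = embed(text)
    # Push the result onto a stream the Laravel app consumes.
    r.xadd("embeddings:results",
           {"task_id": task_id, "vector": ",".join(map(str, vector))})

def consume_pending() -> None:
    # Drain tasks the Laravel app pushed onto the work stream and fan
    # them out to Celery workers on whatever machine is available.
    for _stream, messages in r.xread({"embeddings:tasks": "0"}, count=100, block=5000):
        for msg_id, fields in messages:
            embed_task.delay(fields[b"task_id"].decode(), fields[b"text"].decode())
            r.xdel("embeddings:tasks", msg_id)
```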
The results of these embeddings and the similarity-related features that they've brought to the Laravel application are honestly staggering. And while I'm sure I could have spent months stumbling through all of this on my own, I wouldn't have; I don't have that much time for side-project curiosities.
Somewhat related - these similarity features have directly resulted in this side project becoming a service people now pay me to use.
Day to day, the effectiveness is a learned skill. You really need to learn how to work with it, in the same way you, as a layperson, wouldn't stroll up to a highly specialized piece of aviation technology and just infer how to use it optimally. I hate to keep parroting "skill issue", but it's just wild to me how effective these tools are and how many people don't seem to be able to find any use.
If it's burning through cash, you're not being focused enough with it. If it's writing code that's always slightly wrong, stop it and make adjustments. Those adjustments likely need to be documented in something like I described above: a long-running document used similarly to a prompt.
From my own experience: I watch the "/settings/logs" route on Anthropic's website while CC is working, once I know we're getting rather heavy with the context. Once it gets into the 50-60k token range, I either aim to wrap up the current task, or I accept that things are going to start getting a little wonky past 80k. It'll keep working up into the 120-140k tokens or more, but you're likely to end up with lots of "dumb" stuff happening; you really don't want to be there unless you're _sooooo close_ to getting done what you're trying to. When the context gets too high and you need/want to reset but you're mid-task, run /compact [add notes here about next steps] and it'll generate a summary that will be used to bootstrap the next session. (Don't do this more than once, really, as it starts losing a lot of context; just reset the session fully after the first /compact.)
If you're constantly running into huge contexts, you're not being focused enough. If you can't work on anything without reading files that are thousands of lines long, either break up those files somehow or be _really_ specific with the initial prompt and context, which I've done lots of. Say I have a model in a 10+ year old project that is 6000 lines long and I want to work on a specific method in it: I'll tell Claude in the initial message/prompt which lines that method starts and ends on, and how many lines from the start of the file to read (so it can get the namespace, class name, properties, etc.), and then let it do its thing. I'll tell it specifically not to read more than 50 lines of that file at a time when looking for or reviewing something, or even to stop and ask me to locate a method or its usages rather than reading whole files into context.
So, again, if it's burning through money, focus your efforts. If you think you can just fire it up and give it a generic task, you're going to burn money and get either complete junk or something that might technically work but is hideous, at least to you. But if you're disciplined and set up boundaries and systems for it to adhere to, it does, for the most part.
Whether it turns out to be cheaper depends on your usage.
I thought Claude Code was absurdly expensive and not at all more capable than something like ChatGPT combined with Copilot.
Do people really get that much value from these tools?
I use Github's Copilot for $10 and I'm somewhat happy for what I get... but paying 10x or 20x that just seems insane.
Also the world is much bigger than the US.
Tons of software developer jobs in the US for non-FAANG tier or unicorn startup companies are >$100k and easily hit $120-150k.
Also the fourth quintile mean was like $120k in the US in 2022. So you'd be in the top 30% of earners making that kind of money, not the top 10%.
https://taxpolicycenter.org/statistics/household-income-quin...
So still way below $240k, no?
> So you'd be in the top 30% of earners making that kind of money, not the top 10%.
Maybe you missed it but I actually wrote "10-20%".
Also, in 2024, earning $100k puts you in the top 20% of the US population.
https://dqydj.com/salary-percentile-calculator/
(which is already way above even the EU for dev salaries)
Also, I noticed where our sources diverged. I was looking at household income. My bad.
> which is already way above even the EU for dev salaries
Maybe they're underpaid.
Either way, I was responding to the idea that only a FAANG salary would cost an employer $20k/mo. For US software developer jobs, it can easily hit that without being in FAANG-tier or unicorn startup level companies. Tons of mid-sized low-key software companies you've never heard of pay $120k+ for software devs in the US.
The median software developer in Texas makes >$130k/yr. Think that's all just Facebook and Apple and Silicon Valley VC-funded startup software devs? Similar story in Ohio; is that a place loaded with unicorn software startups? Those median salaries in those markets probably cost their employers around $20k/mo.
https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...
https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...
Median salary for a Japanese dev is ~$60k. Same range for Europe (Switzerland at ~$100k and Italy at ~$30k for the extremes). Then you go down:
- Russia: ~$37,000
- Brazil: ~$31,500
- India: ~$30,000
- Indonesia: ~$13,500
- Morocco: ~$11,800
- Nigeria: ~$6,000
(I asked ChatGPT for the numbers further down the list; the JP and EU numbers are mostly correct though, as I have first-hand experience.)
I imagine a lot of people saw $20k/mo and thought the salary clearly had to be $200k+.
In the end, I was able to rescue the code part, rebuilding a 3-month, 10-person project in 2 weeks, with another 2 weeks to implement a follow-up series of requirements. The sheer amount of discussion and code creation would have been impossible without AI, and I used the full limits I was afforded.
So to answer your question, I got my money's worth in that specific use case. That said, the previous failing effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like pulling teeth, so I'm not sure we would have really had a better situation if we had started off with everyone having an OpenAI Pro account.
* Those who work in enterprise know intuitively what happened next.
It still doubles down on non-working solutions.
So maybe Anthropic setting this precedent will solve my problem!
ps - catch up for social Zoom beers?
I pinged what I think is the right ghuntley on LinkedIn; rizzler looks like the next feature I'm building for brokk :)
https://news.ycombinator.com/newsguidelines.html