A Formal Analysis of Apple's iMessage PQ3 Protocol [pdf]

https://www.usenix.org/system/files/conference/usenixsecurity25/sec25cycle1-prepub-595-linker.pdf
69•luu•2h ago•32 comments

Void: Open-source Cursor alternative

https://github.com/voideditor/void
639•sharjeelsayed•12h ago•269 comments

eBPF Mystery: When is IPv4 not IPv4? When it's pretending to be IPv6

https://blog.gripdev.xyz/2025/05/06/ebpf-mystery-when-is-ipv4-not-ipv4-when-its-ipv6/
16•tanelpoder•1h ago•0 comments

Starlink User Terminal Teardown

https://www.darknavy.org/blog/a_first_glimpse_of_the_starlink_user_ternimal/
47•walterbell•1h ago•3 comments

Hill or High Water

https://royalsociety.org/blog/2025/05/hill-or-high-water/
12•benbreen•1h ago•0 comments

Reservoir Sampling

https://samwho.dev/reservoir-sampling/
330•chrisdemarco•11h ago•71 comments

Fui: C library for interacting with the framebuffer in a TTY context

https://github.com/martinfama/fui
84•Bhulapi•6h ago•27 comments

A flat pricing subscription for Claude Code

https://support.anthropic.com/en/articles/11145838-using-claude-code-with-your-max-plan
113•namukang•7h ago•102 comments

Dead Reckoning

https://www.damninteresting.com/dead-reckoning/
5•repost_bot•1h ago•1 comment

USENIX to sunset USENIX ATC

https://www.usenix.org/blog/usenix-atc-announcement
14•eatbitseveryday•1h ago•1 comment

Finding a Bug in Chromium

https://bou.ke/blog/chromium-bug/
27•bouk•3d ago•1 comment

Progress toward fusion energy gain as measured against the Lawson criteria

https://www.fusionenergybase.com/articles/continuing-progress-toward-fusion-energy-breakeven-and-gain-as-measured-against-the-lawson-criteria
178•sam•13h ago•81 comments

Podfox: First Container-Aware Browser

https://val.packett.cool/blog/podfox/
50•pierremenard•6h ago•4 comments

From: Steve Jobs. "Great idea, thank you."

https://blog.hayman.net/2025/05/06/from-steve-jobs-great-idea.html
803•mattl•10h ago•220 comments

When Abandoned Mines Collapse

https://practical.engineering/blog/2025/5/6/when-abandoned-mines-collapse
158•impish9208•2d ago•44 comments

Phoenician culture spread mainly through cultural exchange

https://www.mpg.de/24574685/0422-evan-phoenician-culture-spread-mainly-through-cultural-exchange-150495-x
57•gmays•3d ago•18 comments

For better or for worse, the overload (2024)

https://consteval.ca/2024/07/25/overload/
4•HeliumHydride•1h ago•0 comments

Gorilla study reveals complex pros and cons of friendship

https://www.sciencedaily.com/releases/2025/05/250505170816.htm
40•lentoutcry•2d ago•28 comments

Cogentcore: Open-source framework for building multi-platform apps with Go

https://github.com/cogentcore/core
15•kristianp•4h ago•4 comments

Show HN: Using eBPF to see through encryption without a proxy

https://github.com/qpoint-io/qtap
225•tylerflint•12h ago•71 comments

Stability by Design

https://potetm.com/devtalk/stability-by-design.html
83•potetm•9h ago•22 comments

First American pope elected and will be known as Pope Leo XIV

https://www.cnn.com/world/live-news/new-pope-conclave-day-two-05-08-25
498•saikatsg•12h ago•771 comments

How to start a school with your friends

https://prigoose.substack.com/p/how-to-start-a-university
88•geverett•9h ago•39 comments

Prepare your apps for Google Play's 16 KB page size compatibility requirement

https://android-developers.googleblog.com/2025/05/prepare-play-apps-for-devices-with-16kb-page-size.html
40•ingve•7h ago•23 comments

Block Diffusion: Interpolating Autoregressive and Diffusion Language Models

https://m-arriola.com/bd3lms/
49•t55•10h ago•11 comments

Static as a Server

https://overreacted.io/static-as-a-server/
86•danabramov•11h ago•61 comments

Ciro (YC S22) is hiring a software engineer to build AI agents for sales

https://www.ycombinator.com/companies/ciro/jobs
1•dwiner•11h ago

Product Purgatory: When they love it but still don't buy

https://longform.asmartbear.com/purgatory/
26•doppp•3d ago•3 comments

How the US built 5k ships in WWII

https://www.construction-physics.com/p/how-the-us-built-5000-ships-in-wwii
81•rbanffy•8h ago•57 comments

How Obama’s BlackBerry got secured (2013)

https://www.electrospaces.net/2013/04/how-obamas-blackberry-got-secured.html
208•lastdong•3d ago•77 comments

A flat pricing subscription for Claude Code

https://support.anthropic.com/en/articles/11145838-using-claude-code-with-your-max-plan
112•namukang•7h ago

Comments

jbellis•6h ago
One of my problems developing Brokk (AI coding for large codebases, https://brokk.ai) is that everyone is used to Cursor-style pricing of $20ish a month, but Brokk is designed around long-form prompts; it's a lot closer to a leash for Claude Code than to Cursor or Copilot, and the bill is a lot closer to CC's too. (But of course Brokk is vendor neutral; you can absolutely mix o3 with GP2.5 with S3.7.)

So maybe Anthropic setting this precedent will solve my problem!

ghuntley•6h ago
nah, there are low-powered tools and there are high-powered tools. If people want $20/month happy meal toys in business, that business will get left behind. Ignore the consumer market; make Bugattis instead - https://ghuntley.com/redlining

ps - catchup for social zoom beers?

jbellis•6h ago
you're right about the context limits

i pinged what i think is the right ghuntley on linkedin, rizzler looks like the next feature i'm building for brokk :)

ghuntley•6h ago
This is me - speak soon. https://www.linkedin.com/in/geoffreyhuntley
owebmaster•6h ago
> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.

https://news.ycombinator.com/newsguidelines.html

infecto•3h ago
Agree with the other commenter. I have seen at least 2 other posts today where you plug your project.
ghuntley•6h ago
the new Claude Code “max plan” would last me all of 5 mins [1]… I don’t get why people are excited about this. High-powered tools aren’t cheap and aren’t for the consumer…

[1] https://www.youtube.com/live/khr-cIc7zjc?si=oI9Fj33JBeDlQEYG

iLoveOncall•6h ago
If that's the case you should stop using it, because there's no way you see any ROI when you spend that much to just do some coding stuff.

It would be cheaper for your company to literally pay your salary while you do nothing.

postalrat•6h ago
I'd love to see your math.
pclmulqdq•5h ago
It's pretty simple: that 5 minutes of usage is probably at least $10 worth of API credits (maybe $100).

A year has 2000 working hours, which is 24000 5-minute intervals. That means the company is spending at least $240,000 a year on the Claude API (conservatively). So they would be better off paying you $100-200k to do nothing and hiring someone competent with that $240k.
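
For reference, a quick sketch of that extrapolation (the $10-per-5-minutes burn rate and 2,000 working hours are the assumptions above, not measured figures):

  # Rough annualization of the burn rate claimed above.
  cost_per_interval = 10            # USD per 5 minutes (assumed)
  intervals_per_hour = 60 // 5      # 12
  working_hours_per_year = 2_000    # assumed

  annual_cost = cost_per_interval * intervals_per_hour * working_hours_per_year
  print(f"~${annual_cost:,} per year")  # ~$240,000 per year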

F7F7F7•5h ago
Claude Max is less than half a percentage point of a Jr. Dev's average salary. If you can't make that work then...
justanotheratom•6h ago
I am sure this is worth every dime, but my workflow is so used to Cursor now (cursor rules, model choice, tab complete, to be specific), that I can't be bothered to try this out.
s17n•6h ago
If you're using Cursor with Claude it's gonna be pretty much the same thing. Personally I use Claude Code because I hate the Cursor interface but if you like it I don't think you're missing much.
justanotheratom•6h ago
I don't enjoy the interface as such, rather the workflows that it enables.
tkzed49•6h ago
The problem is that this is $100/mo with limits. At work I use Cursor, which is pretty good (especially tab completion), and at home I use Copilot in vscode insiders build, which is catching up to Cursor IMO.

However, as long as Microsoft is offering Copilot at (presumably subsidized) $10/mo, I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful, and I doubt that.

tkzed49•6h ago
I'll add on to this: I don't really use agent modes a lot. In an existing codebase, they waste a lot of my time for mixed results. Maybe Claude Code is so much better at this that it enables a different paradigm of AI editing—but I'd need easy, cheap access to try it.
koakuma-chan•6h ago
> but I'd need easy, cheap access to try it.

You can try it for cheap with the normal pay-as-you-go way.

warp•6h ago
You don't need a Max subscription to use Claude Code. By default it uses your API credits, and I guess I'm not a heavy AI user yet (for my hobby projects), but I haven't spent more than $5/month on Claude Code in the past few months.
EnPissant•6h ago
I spent $5 in 10 minutes when I tried it.
christophilus•5h ago
For me, it was $10 in 2 hours. That’s super cheap if it saves me significant time. Jury’s out on that, though.
koakuma-chan•6h ago
The problem with it is that it uses a ~30k-token system prompt (albeit "cached"), and very quickly the usage goes up to a few million tokens. I can easily spend over $10 a day.
F7F7F7•5h ago
I burned $30 in Claude Code in just under an hour. I was equally frustrated and impressed. So much so I ended up a $200 MAX subscriber.
hombre_fatal•5h ago
The money starts adding up fast as your context fills up, since it's resending the whole accumulated context back through the API every time.

They're good about telling you how full your context is, and you can use /compact to shrink it down to the essentials.

But for those of us who aren't Mr. MoneyBags like you all, keeping an eye on context size is key to keeping costs low.
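
Since the full history is re-sent on every turn, input-token cost grows roughly quadratically with conversation length. A minimal sketch of that effect; the per-token price, system prompt size, and tokens-per-turn below are illustrative assumptions, not Anthropic's actual numbers:

  PRICE_PER_INPUT_TOKEN = 3 / 1_000_000   # assumed $3 per million input tokens
  SYSTEM_PROMPT = 30_000                  # assumed system/tooling overhead, tokens
  TOKENS_ADDED_PER_TURN = 2_000           # assumed new context per exchange

  def conversation_cost(turns: int) -> float:
      """Total input cost after `turns` exchanges, resending history each time."""
      total = 0.0
      context = SYSTEM_PROMPT
      for _ in range(turns):
          context += TOKENS_ADDED_PER_TURN
          total += context * PRICE_PER_INPUT_TOKEN  # whole history billed again
      return total

  print(f"10 turns: ${conversation_cost(10):.2f}")
  print(f"50 turns: ${conversation_cost(50):.2f}")  # ~10x the cost for 5x the turns

This is also why /compact helps: shrinking the accumulated context resets that growth curve.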

logankeenan•4h ago
I’ve been wanting to try Claude Code. What makes it such a difference maker compared to existing AI tools?
ramoz•5h ago
Doesn’t resonate with me because I’ve spent over $1,000 on Claude Code at this point and the return is worth it. The spend feels cheap compared to output.

In contrast - I’m not interested in using cheaper, less-than, services for my livelihood.

tkzed49•5h ago
hey, I'm open to that possibility. Maybe I'll grab $5 in API credit and give it a shot (for 5 minutes or a week depending on who you ask)
keerthiko•5h ago
i got $100 of credit at the start of the year, and have been using +1$ each month, starting at $2 in january using aider at the time. just switched to claude code this week, since it follows a similar UX. agentic CLI code assist really has been growing in usefulness for me as i get faster at reviewing its output.

i use it for very targeted operations where it saves me several roundtrips to code examples and documentation and stack overflow, not spamming it for every task i need to do, i spend about $1/day of focused feature development, and it feels like it saves me about 50% as many hours as i spend coding while using it.

stavros•4h ago
What do you prefer, between Aider and CC? I use Aider for when I want to vibe code (I just give the LLM a high-level description and then don't check the output, because it's so long), and Cursor when I want to AI code (I tell the AI to do low-level stuff and check every one of the five lines it gives me).

AI coding saves me a lot of time writing high-quality code, as it takes care of the boilerplate and documentation/API lookups, while I still review every line, and vibe coding lets me quickly do small stuff I couldn't do before (e.g. write a whole app in React Native), but gets really brittle after a certain (small) codebase size.

I'm interested to hear whether Claude Code writes less brittle code, or how you use it/what your experience with it is.

satvikpendem•4h ago
> the return is worth it

I'm curious, what was the return? What did you do with the 1k?

edoceo•3h ago
Produce working code faster => ship faster => get paid faster? That's the value prop, right? So, naturally, the $JOB will cover the bill.
bdangubic•3h ago
so you didn’t spend a penny? :)
ramoz•20m ago
something like that. Think "paid more" as well
hcnews•4h ago
Could you anonymize and share your last 5-10 prompts? Just wanna understand how people are using Claude Code.
winrid•3h ago
"Ensure all our crons publish to telegraf when they start and finish. Include the cron name and tenant id when applicable. For crons that query batch jobs, only publish and take a lock when there is work to do. look at <snip> as an example. Here is the complete list to migrate. Create a todo list and continue until done. <insert list of 40 file paths>"

(updated for better example)

winrid•1h ago
The thing I forgot is the command for it to get the next set of files to process. Otherwise it will migrate 30% of them and say "look dad, I'm done!"
csomar•53m ago
I used it yesterday to convert a website from tailwind v1 to v4. Gave it the files (html/scss/js), links to tailwind and it did the job. Needed some back and forth and some manual stuff but overall it was painless.

It is not a challenging technical thing to do. I could have sat there for hours reading the conversion from v1 to v2 to v3 to v4. It is mostly just changing class names. But these changes are hard to do with %s/x/x, so you need to do them manually. One by One. For hundreds of classes. I could have as easily shot myself in the head.

> Could you anonymize and share your last 5-10 prompts?

The prompt was a simple "convert this site from tailwind v1 to v4". I use neovim copilot chat to inject context and load URLs. I have found that prompts have no value, it is either something the LLM can do or not.

ramoz•51m ago
These aren't that fun but sure.

- https://gist.github.com/backnotprop/ca49f356bdd2ab7bb7a366ef...

- https://gist.github.com/backnotprop/d9f1d9f9b4379d6551ba967c...

- https://gist.github.com/backnotprop/e74b5b0f714e0429750ef6b0...

- https://gist.github.com/backnotprop/91f1a08d9c27698310d63e06...

- https://gist.github.com/backnotprop/7f7cb63aceb7560e51c02a9d...

- https://gist.github.com/backnotprop/94080dde34bfca3dd9c48f14...

- https://gist.github.com/backnotprop/ea3a5c3a31799236115abc76...

Taken from 2 recent systems. 90% of my interaction is assurance, debugging, and then having claude manage the meta management framework. We work hard to set the path for actual coding - thus code output (even complex or highly integrated) usually ends up being fairly smooth+fast.

smartbit•37m ago
Interesting. Thanks.

Could you explain why there is no punctuation?

ramoz•28m ago
Ah yea sorry that is an export error... I copied prompts directly out of Claude Code and when I do that it copies all of the ascii/tui parts that wrap the message... I used some random "strip special chars" site to remove those and was lazy about adding actual punctuation back in.
ramoz•26m ago
worth noting that some of the prompts are related to the project context management system i use: (obfuscated business details) https://gist.github.com/backnotprop/4a07a7e8fdd76cbe054761b9...
dahcryn•5h ago
Have I got bad news for you... Microsoft announced it is imposing limits on "premium" models from next week. You get 300 "free" requests a month. If you use agent mode, you easily consume about 3-4 requests per action; I estimate I'd burn through 300 in about 3-5 working days.

Basically anything that isn't gpt-4o is premium, and I find gpt-4o near useless compared to Claude and Gemini in Copilot.
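
The "3-5 working days" figure is easy to sanity-check; the requests-per-action and actions-per-day values here are assumptions, not measured Copilot telemetry:

  monthly_budget = 300          # premium requests per month
  requests_per_action = 3.5     # midpoint of the 3-4 claimed above
  actions_per_day = 25          # assumed for a day of agent-heavy use

  days = monthly_budget / (requests_per_action * actions_per_day)
  print(f"~{days:.1f} working days")  # ~3.4 days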

maven29•5h ago
Enforcement of Copilot premium request limits moved to June 4, 2025 https://github.blog/changelog/2025-05-07-enforcement-of-copi...
pier25•5h ago
> I find gpt4o near useless compared to Claude and Gemini in copilot.

It's hit and miss IMO.

I like it for C#/.NET, but it's completely useless for the rest of the stuff I do (mostly web frontend).

I'm not sure about my usage but if I hit those premium limits I'm probably going to cancel Copilot.

debian3•4h ago
The default unlimited model is now GPT-4.1 https://github.blog/changelog/2025-05-08-openai-gpt-4-1-is-n...
richardw•5h ago
Whoever is paying for your time should calculate how much time you’d save between the different products. The actual product price comparison isn’t as important as the impact on output quality and time taken. Could be $1000 a month and still pay for itself in a day, if it generated >$1000 extra value.

This might mean the $10/month is the best. Depends entirely on how it works for you.

(Caps obviously impact the total benefit so I agree there.)

zeroq•4h ago
Just today I had yet another conversation about how BigCo doesn't give a damn about cost.

Just to give you one example - last BigCo I worked for had a schematic for new projects which resulted in... 2k EUR per month cloud cost for serving a single static html file.

At one point someone up top decided that kubes is the way to go and scrambled an impromptu schematic for new projects which could be simply described as a continental class dreadnought of a kubernetes cluster on AWS.

And it was signed off, and later followed like a scripture.

A couple of stories lower we're having a hard time arguing for a 50 EUR budget for weekly beers for the team, but the company is fine with paying 2K EUR for a landing page.

Aurornis•4h ago
> The problem is that this is $100/mo with limits

Limits are a given on any plan. It would be too easy for a vibe coder to hammer away 8 hours a day for 20 days a week if there was nothing stopping them.

The real question is whether this is a better value than pay as you go for some people.

htrp•4h ago
> 8 hours a day for 20 days a week

Your vibe coders are on a different dimension than mine.

selcuka•2h ago
> I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful

I don't think this is the right way to look at it. If Copilot helps you earn an extra $100 a month (or saves you $100 worth of time), and this one is ~2x better, it still justifies the $100 price tag.

rafaelmn•2h ago
At $10-20 a month that calculation is trivial to make. At $100 I'm honestly not getting that much value out of AI, especially not every month, and especially not compared to the cheaper versions.
edmundsauto•1h ago
I think this thinking is flawed. First, it presupposes a linear value/cost relationship. That is not always true - a bag that costs 100x as much is not 100x more useful.

Additionally, when you’re in a compact distribution, being 5% better might be 100x more valuable to you.

Basically, this assumes that the marginal value is associated with cost. I don’t think most things, economically, seem to match that pattern. I will sometimes pay 10x the cost for a good meal that has fewer calories (nutritional value)

I am glad people like you exist, but I don’t think the proposition you suggest makes sense.

I_am_tiberius•6h ago
Do you still need a phone number to register with Claude?
abetaha•6h ago
I wonder how successful this pricing model ($100-$200 a month with limits) is going to be. It is very hard to justify when other tooling in the ~$20/month range offers unlimited usage and comparable quality.
jsheard•5h ago
Is any of the ~$20/month with unlimited usage tooling actually profitable though? It goes without saying that if all else is equal then the product sold at a greater loss will be more popular, but that only works until the vendor runs out of money to light on fire.
turnsout•4h ago
Cursor keeps raising money… I for one personally enjoy burning all those VC dollars. Consider it a very tiny version of wealth redistribution.
slrainka•6h ago
Agent mode without rails is like a boat without a rudder.

What worked for me was coming up with an extremely opinionated way to develop an application and then generating instructions (mini milestones) by combining it with the requirements.

These instructions end up being very explicit in the sequence of things it should do (write the tests first), how the code should be written and where to place it etc. So the output ended up being very similar regardless of the coding agent being used.

F7F7F7•5h ago
I've tried every variation of this very thing. Even managed to build a quick and dirty ticketing system that I could assign to the LLM of my choosing. WITH context. Talking graph codebase diagrams, mappings, tree structures of every possibility, simple documentation, complex documentation, a bunch of OSS to do this very thing automatically, etc etc etc.

In the codebase I've tried modularity via monorepo, or faux microservices with local apis, monoliths filled with hooks and all the other centralized tricks in the book. Down to the very very simple. Whatever I could do to bring down the context window needed.

Eventually... your returns diminish. And any time you saved is gone.

And by the time you've burned up a context window and you're ready to get out, you're expecting it to output a concise artifact to carry you to the next chat so you don't have to spend more context getting that thread up to speed.

Inevitably the context window and the LLM's eagerness to touch shit that it's not supposed to (the likelihood of which increases with context) always get in the way.

Anything with any kind of complexity ends up in a game of too much bloat, or the LLM removing pieces that kill other pieces it wasn't aware of.

/VENT

slrainka•5h ago
So, relying on a large context can be tricky. Instead I’ve tried to get to an ER model quickly, and from there build modules that don’t have tight dependencies.

Using Gemini 2.5 for generating instructions

This is the guide I use

https://github.com/bluedevilx/ai-driven-development/blob/mai...

energy123•16m ago
How many tokens (across the whole codebase) did it take for diminishing returns to kick in? What does the productivity-vs-tokens plot look like?
cye131•6h ago
I'm curious whether anyone's actually using Claude Code successfully. I tried it on release and found it to be negative value for tasks other than spinning up generic web projects. For existing codebases of even moderate size, it burns through cash to write code that is always slightly wrong and requires more tuning than writing it myself.
ramoz•6h ago
Yes. For small apps, as well as distributed systems.

You have to puppeteer it and build a meta context/tasking management system. I spend a lot of time setting Claude Code up for success. I usually start with Gemini for creating context, development plans, and project tasking outlines (I can feed large portions of the codebase to Gemini and rely on its strategy). I’ve even put entire library docsites in my repos for Claude Code to use - but today they announced web search.

They also have todos built in which make the above even more powerful.

The end result is insane productivity - I think the only metric I have is something like 15-20k lines of code for a recent distributed processing system from scratch over 5 days.

broof•5h ago
Can you share more about what you mean by a meta context/tasking management system? I’m always curious when I see people who have happily spent large amounts on api tokens.
conception•39m ago
So I use Roo, and you have the architecture mode draft out in as much detail as you want: plans, tech stack choices, todos, etc. Switch to orchestration mode to execute the plan, including verifying things are done correctly. It subtasks out the todos. Tell it to not bother you unless it has a question. Come back in thirty and see how it’s doing. You can have it commit to a branch per task if you want. Etc etc.
ramoz•38m ago
Here is some insight... I had Gemini obfuscate my business context, so if something sounds weird it is probably because of that.

https://gist.github.com/backnotprop/4a07a7e8fdd76cbe054761b9...

The framework is basically the instructions and my general guidance for updating and ensuring the details of critical information get injected into context. Some of those prompts I commented here: https://news.ycombinator.com/item?id=43932858

meesles•5h ago
Is that final number really that crazy? With a well-defined goal, you can put out 5-8k lines per day writing code the old-fashioned way. Also, I would love to see the code, since in my experience (I use Cursor as a daily driver), AI bloats code by 50% or more with unnecessary comments and whitespace, especially when making full classes/files.

> I spend a lot of time setting Claude code up for success.

Normally I wouldn't post this because it's not constructive, but this piece stuck out to me and had me wondering if it's worth the trade-off. Not to mention programmers have spent decades fighting against LoC as a metric, so let's not start using it now!

dttze•4h ago
You'll never see the code. They will just say how amazingly awesome it is, how it will fundamentally alter how coding is done, etc... and then nothing. Then if you look into who posts it, they work in some AI related startup and aren't even a coder.
ramoz•8m ago
Not open source, but depending on certain context I can show whoever. I'm not hard to find.

I've done just about everything across the full & distributed stack. So I'm down to jam on my code/systems and how I instruct & rely on (confidently) AI to help build them.

maccard•4h ago
5k lines of code a day is 10 lines of code a minute, solidly, for 8 hours straight. Whatever way you cut that with whitespace and bracket alignment, that’s a pretty serious amount of code to chunk out.
coverj•4h ago
Are people really committing 5k lines a day without AI assistance even once a month?

I don't think I've ever done this or worked with anyone who had this type of output.

bcrosby95•3h ago
It depends upon how well mapped out the problem is in your head. If it's an unfamiliar domain, no way.
infecto•3h ago
Nobody is writing 5k lines consistently on a daily basis. Sure, if it’s a bunch of boilerplate scaffolding, maybe.

I daily-drive Cursor and I have rules to limit comments. I get comments on complex lines and that’s it.

_bin_•4h ago
I'd be really interested in seeing the source for this, if it's an open-source project, along with the prompts and some examples. Or other source/prompt examples you know of.

A lot of people seem to have these magic incantations that somehow make LLMs work really well, at the level marketing and investor hype says they do. However, I rarely see that in the real world. I'm not saying this is true for you, but absent vaguely replicable examples that aren't just basic webshit, I find it super hard to believe they're actually this capable.

ramoz•46m ago
Not open source, but depending on certain context I can show you. I'm not hard to find.
senordevnyc•14m ago
Aider writes 70-80% of its own code: https://aider.chat/HISTORY.html
NitpickLawyer•8m ago
While not directly what you're asking for, I find this link extremely fascinating - https://aider.chat/HISTORY.html

For context, this is aider tracking aider's code written by an LLM. Of course there's still a human in the loop, but the stats look really cool. It's the first time I've seen such a product work on itself and tracking the results.

ed•5h ago
Yes. It costs me a few bucks per feature, which is an absolute no-brainer.

If you don't like what it suggests, undo the changes, tweak your prompt and start over. Don't chat with it to fix problems. It gets confused.

thegeomaster•5h ago
Absolutely stellar for 0-to-1-oriented frontend-related tasks, less so but still quite useful for isolated features in backends. For larger changes or smaller changes in large/more interconnected codebases, refactors, test-run-fix-loops, and similar, it has mostly provided negative value for me unfortunately. I keep wondering if it's a me problem. It would probably do much better if I wrote very lengthy prompts to micromanage little details, but I've found that to be a surprisingly draining activity, so I prefer to give it a shot with a more generic prompt and either let it run or give up, depending on which direction it takes.
singhrac•4h ago
Here's a very small piece of code I generated quickly (i.e. <5 min) for a small task (I generated some data and wanted to check the best way to compress it):

https://gist.github.com/rachtsingh/e3d2e2b495d631b736d24b56e...

Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.

I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).

Implicated•2h ago
Just to throw my experience in, it's been _wildly_ effective.

Example;

I'm wrapping up, right now, an updated fork of the PHP extension `phpredis`, because Redis 8 was recently released with support for a new data type, Vector Set, but the phpredis extension (which is far more performant than non-extension Redis libraries for PHP) doesn't support the new vector-related commands. I forked the extension repo, which is in C (I'm a PHP developer; I had to install CLion for the first time just to work along with CC), and fired up Claude Code with the initial prompt/task of analyzing the extension's code and documenting, in a CLAUDE.md file, the purpose, conventions, and anything that it (Claude) felt would benefit the bootstrapping of future sessions, so that whole files wouldn't need to be read.

This initially, depending on the size of the codebase, could be "expensive". Being that this is merely a PHP extension and isn't a huge codebase, I was fine letting it just rip through the whole thing however it saw fit - were this a larger codebase I'd take a more measured approach to this initial "indexing" of the codebase.

This results in a file that Claude uses like we do a README.

Next I end this session, start a new one, and tell it to review that CLAUDE.md file (I specifically tell it to do this at every single new session start moving forward) and then generate a general overview/plan of what needs to be done in order to implement the new Vector Set-related commands so that I can use this custom phpredis extension in my PHP environments. I indicated that I wanted to generate a suite of tests focused on ensuring each command works with all of its various required and optional parameters, and that I wanted to use Docker containers for the testing rather than mess up my local dev environment.

$22 in API costs and ~6 hours spent, and I have the extension working in my local environment with support for all of the commands I want/need to use. (There are still 5 commands that I don't intend to use that I haven't implemented.)

Not only would I have certainly never embarked upon trying to extend a C PHP extension, I wouldn't have done so over the course of an evening and morning.

Another example:

Before this Redis vector sets thing, I used CC to build a Python image and text embedding pipeline backed by Redis Streams and Celery that consumes tasks pushed to the stream by my Laravel application, which currently manages ~120 million unique strings and ~65 million unique images that I've been generating embeddings for. Prior to this I'd spent very little time with Python and zero with anything related to ML. Now I have a performant, portable Python service that I run from my MacBook (M2 Pro) or various GPU-having Windows machines in my home, which generate the embeddings on an 'as available' basis, pushing the results back to a Redis stream that my Laravel app then consumes and processes.
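
If you're curious what the Redis Streams side of a pipeline like that can look like, here's a minimal sketch of a worker loop. The stream names, consumer group, and embed() function are hypothetical placeholders, and the real setup described above also goes through Celery and a Laravel producer/consumer:

  import redis

  r = redis.Redis(host="localhost", port=6379, decode_responses=True)

  TASKS = "embedding:tasks"      # hypothetical stream the app pushes work onto
  RESULTS = "embedding:results"  # hypothetical stream the app reads results from
  GROUP = "embedders"

  def embed(text: str) -> list[float]:
      # Placeholder: swap in a real local embedding model here.
      return [0.0] * 384

  def ensure_group() -> None:
      try:
          r.xgroup_create(TASKS, GROUP, id="0", mkstream=True)
      except redis.ResponseError:
          pass  # group already exists

  def work(consumer: str = "worker-1") -> None:
      ensure_group()
      while True:
          # Block up to 5s waiting for new tasks assigned to this consumer.
          entries = r.xreadgroup(GROUP, consumer, {TASKS: ">"}, count=10, block=5000)
          for _, messages in entries or []:
              for msg_id, fields in messages:
                  vector = embed(fields["text"])
                  r.xadd(RESULTS, {"task_id": fields["task_id"],
                                   "vector": ",".join(map(str, vector))})
                  r.xack(TASKS, GROUP, msg_id)  # mark the task as processed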

The results of these embeddings and the similarity-related features that they've brought to the Laravel application are honestly staggering. And while I'm sure I could have spent months stumbling through all of this on my own - I wouldn't have, I don't have that much time for side project curiosities.

Somewhat related - these similarity features have directly resulted in this side project becoming a service people now pay me to use.

On a day-to-day basis, the effectiveness is a learned skill. You really need to learn how to work with it, in the same way you, as a layperson, wouldn't stroll up to a highly specialized piece of aviation technology and just infer how to use it optimally. I hate to keep parroting "skill issue" but - it's just wild to me how effective these tools are and how there are so many people who don't seem to be able to find any use.

If it's burning through cash, you're not being focused enough with it. If it's writing code that's always slightly wrong, stop it and make adjustments. Those adjustments likely/potentially need to be documented in something like what I described above: a long-running document used similarly to a prompt.

From my own experience, I watch the "/settings/logs" route on Anthropic's website while CC is working once I know that we're getting rather heavy with the context. Once it gets into the 50-60,000 token range I either aim to wrap up whatever the current task is, or I understand that things are going to start getting a little wonky in the 80k+ range. It'll keep on working up into 120-140k tokens or more - but you're likely going to end up with lots of "dumb" stuff happening. You really don't want to be here unless you're _sooooo close_ to getting done what you're trying to do. When the context gets too high and you need/want to reset but you're mid-task - /compact [add notes here about next steps] and it'll generate a summary that will then be used to bootstrap the next session. (Don't do this more than once, really, as it starts losing a lot of context - just reset the session fully after the first /compact.)

If you're constantly running into huge contexts, you're not being focused enough. If you can't even work on anything without reading files with thousands of lines - either break up those files somehow, or you're going to have to be _really_ specific with the initial prompt and context - which I've done lots of. Say I have a model that belongs to a 10+ year-old project and is 6000 lines long, and I want to work on a specific method in that model - I'll just tell Claude in the initial message/prompt which line that method starts on, which it ends on, and what number of lines from the start of the model it should read (so it can get the namespace, class name, properties, etc.), and then let it do its thing. I'll tell it specifically not to read more than 50 lines of that file at a time when looking for or reviewing something, or even to stop and ask me to locate a method/usages of things, etc., rather than reading whole files into context.

So, again, if it's burning through money - focus your efforts. If you think you can just fire it up and give it a generic task - you're going to burn money and get either complete junk, or something that might technically work but is hideous, at least to you. But, if you're disciplined and try to set or create boundaries and systems that it can adhere to - it does, for the most part.

light_hue_1•5h ago
This isn't flat pricing. It's exactly the same API credits but you prepay for the month and lose anything you don't use.

Whether it turns out to be cheaper depends on your usage.

I thought Claude Code was absurdly expensive and not at all more capable than something like ChatGPT combined with Copilot.

esha_manideep•5h ago
Claude's limits are so vague - it's not clear whether buying Claude Max is cheaper than just using the API. Has anyone benchmarked this?
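
Any benchmark would depend entirely on your own usage, but the comparison is easy to sketch for your own numbers; the per-token prices and daily token counts below are assumptions for illustration, not Anthropic's published rates or limits:

  PRICE_IN = 3 / 1_000_000    # assumed $ per input token
  PRICE_OUT = 15 / 1_000_000  # assumed $ per output token

  def monthly_api_cost(days: int, in_tokens: int, out_tokens: int) -> float:
      """Estimated pay-as-you-go spend for `days` working days per month."""
      return days * (in_tokens * PRICE_IN + out_tokens * PRICE_OUT)

  # e.g. 20 working days at 2M input / 100k output tokens per day (assumed)
  est = monthly_api_cost(20, 2_000_000, 100_000)
  print(f"~${est:.0f}/month vs $100 or $200 flat")  # ~$150/month in this example
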
varispeed•4h ago
Sounds like they are great fans of Numberwang.
dham•5h ago
Neither Anthropic nor OpenAI has a Linux desktop client (to use MCP), so yeah, I'll skip.
turnsout•4h ago
Claude Code runs in the terminal
varispeed•4h ago
They could make a $1,000-a-month version that runs on tape.
pier25•5h ago
$200/month?

Do people really get that much value from these tools?

I use GitHub Copilot for $10 and I'm somewhat happy with what I get... but paying 10x or 20x that just seems insane.

lkbm•4h ago
If your employer spends $20k a month on you (salary + everything else), $200 a month breaks even at around a 1% boost in productivity.
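
That break-even arithmetic, spelled out (the $20k/month fully loaded cost is the assumption from the comment above):

  monthly_cost_of_engineer = 20_000  # salary + overhead, assumed above (USD)
  tool_cost = 200                    # Claude Max, USD/month

  break_even_gain = tool_cost / monthly_cost_of_engineer
  print(f"{break_even_gain:.1%}")  # 1.0% productivity boost to break even
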
pier25•4h ago
Maybe if you're working in FAANG...
0x6c6f6c•4h ago
Lots of jobs where employers pay that much per head, not just FAANG. Honestly FAANG is probably spending double that for senior+ level engineers.
conradkay•4h ago
The average US salary for a SWE is $10-12k/month. Fully loaded cost (what they spend) is 1.5-2x salary, so that's not an unrealistic number.
pier25•3h ago
So you're arguing about the top 10-20% of earners in the US?

Also the world is much bigger than the US.

vel0city•3h ago
The point is you don't have to have FAANG salaries to hit $20k/mo in cost to your employer.

Tons of software developer jobs in the US for non-FAANG tier or unicorn startup companies are >$100k and easily hit $120-150k.

Also the fourth quintile mean was like $120k in the US in 2022. So you'd be in the top 30% of earners making that kind of money, not the top 10%.

https://taxpolicycenter.org/statistics/household-income-quin...

pier25•1h ago
> unicorn startup companies are >$100k and easily hit $120-150k.

So still way below $240k, no?

> So you'd be in the top 30% of earners making that kind of money, not the top 10%.

Maybe you missed it but I actually wrote "10-20%".

Also in 2024 earning $100k puts you in the top 20% of the US population.

https://dqydj.com/salary-percentile-calculator/

(which is already way above even the EU for dev salaries)

vel0city•1h ago
You dropped off the "non" part of that. It's the non-Unicorn software companies easily paying $120k for a seasoned software developer in the US.

Also, I noticed where our sources diverged. I was looking at household income. My bad.

> which is already way above even the EU for dev salaries

Maybe they're underpaid.

Either way, I was responding to the idea that only a FAANG salary would cost an employer $20k/mo. For US software developer jobs, it can easily hit that without being in FAANG-tier or unicorn startup level companies. Tons of mid-sized low-key software companies you've never heard of pay $120k+ for software devs in the US.

The median software developer in Texas makes >$130k/yr. Think that's all just Facebook and Apple and silicon valley VC funded startup software devs? Similar story in Ohio, is that a place loaded with unicorn software startups? Those median salaries in those markets probably cost their employer around $20k/mo.

https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...

https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...

drodgers•1h ago
Yes, this product mostly only targets the top 20% of US earners. That's a lot of people, and a lot of HN readers especially.
vel0city•3h ago
If you get an employer match on 401k/HSA, the employer pays the full healthcare premium, employer-sponsored life insurance benefits, unemployment insurance, employer-covered disability, payroll taxes, and all the other software costs, it wouldn't even take $200k in salary to cost $20k/mo. Someone could be making like $150k and still cost the company that much.
lucyjojo•3h ago
Gentle reminder that the majority of developers do not live in the United States.

The median salary for a Japanese dev is ~$60k. Same range for Europe (Switzerland at ~$100k, Italy at ~$30k for the extremes). Then you go down.

Russia ~$37,000, Brazil ~$31,500, Nigeria ~$6,000, Morocco ~$11,800, Indonesia ~$13,500, and India ~$30k USD.

(Asked ChatGPT for those last numbers; the JP and EU numbers are mostly correct though, as I have first-hand experience.)

vel0city•3h ago
Sure, but ~$150k isn't exactly FAANG US salaries for an experienced software dev. That's my point. Lots of people forget how much extra many employers pay for a salaried employee on top of just the take home salary. Labor is expensive in the US.

I imagine a lot of people saw $20k/mo and thought the salary clearly had to be $200k+.

jbm•1h ago
To rescue a flailing project that I took over when a senior hire ghosted a customer in the middle of a project, I got the $200 Pro package from OpenAI (which is much less usable than Claude for our purposes; there were other benefits related to my client's relationship w/ OpenAI).

In the end, I was able to rescue the code part, rebuilding a 3-month-long, 10-person project in 2 weeks, with another 2 weeks to implement a follow-up series of requirements. The sheer amount of discussion and code creation would have been impossible without AI, and I used the full limits I was afforded.

So to answer your question, I got my money's worth in that specific use case. That said, the previous failing effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like extracting teeth, so I'm not sure we would have really had a better situation if we had started off with everyone having an OpenAI Pro account.

* Those who work in enterprise know intuitively what happened next.

999900000999•4h ago
Worth it, but I’m chilling until the next major model release.

It still doubles down on non-working solutions.

oidar•2h ago
It would really be helpful if Anthropic let users know the usage limits, what has been used, and what is left, instead of these vague 5x/20x vs. Pro tiers.
bionhoward•2h ago
Yes, let’s all agree not to compete with something that competes with us. Galaxy brain