There's something a little better about the tool use loop, which is nice.
But Claude seems a little dumber and is aggressive about "getting things done", often ignoring common sense, explicit instructions, or design information.
If I tell it to make a test pass, it will sometimes change my database structure to avoid having to debug the test. At least twice it deleted protobufs from my project and replaced them with JSON because it struggled to immediately debug a proto issue.
It is sometimes acceptable for humans to use judgment and defer work; the machine doesn’t have judgment so it is not acceptable for it to do so.
Like just now it said "great, the tests are consistently passing!" So I ran the same test command and 4 of the 7 tests are so broken they don't even build.
My immediate and obvious response is "you broke them!" (at least to myself), but I do appreciate that it's trying to keep focused in some strange way. A simple "commit, fix failing tests" prompt will generally take care of it.
I've been working on my "/implement" command to do a better job of checking that the full test suite is all green before asking if I want to clear the task and merge the feature branch.
Then clear the context and move on to the next task. Context pollution is real and can hurt you.
We captured debug logs and described the issue in detail to Gemini 2.5 Flash, giving it the nginx logs from the one second before and after an example incident (about 10k log entries).
It came back with a clear verdict, saying
"The smoking gun is here: 2025/07/24 21:39:51 [debug] 32#32: *5902095 rport:443 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.233.100.128, server: grpc-ai-test.not-relevant.org, request: POST /org.not-relevant.cloud.api.grpc.CloudEventsService/startStreaming HTTP/2.0, upstream: grpc://10.233.75.54:50051, host: grpc-ai-test.not-relevant.org"
and gave me a detailed action plan.
I was thinking this is cool, don't need to use my head on this, until I realized that the log entry simply did not exist. It was entirely made up.
(And yes I admit, I should know better than to do lousy prompting on a cheap foundation model)
In that case, you have to put a stop to it and point out that it would already be done if it hadn’t decided to blow it all up in an effort to write a one-time-use codemod. Of course it agrees with that point, as it agrees with everything. It’s the epitome of strong opinions loosely held.
Interestingly, it’s the only LLM I’ve seen behave that way. Others simply acknowledge the failure and, after a few hints, eventually get everything working.
Claude just hopes I won’t notice its tricks. It makes me wonder what else it might try to hide when misalignment has more serious consequences.
So I guess the blog team also uses Claude
> "Instead of remembering complex Kubernetes commands, they ask Claude for the correct syntax, like "how to get all pods or deployment status," and receive the exact commands needed for their infrastructure work."
Duh, you can ask an LLM tech questions and stuff. What is the point of putting something like that on the tech blog of a company which is supposed to be working on bleeding-edge tech?
Even with people who do use it, they might be thinking about it narrowly. They use it for code generation, but might not think to use it for simplified man pages.
Of course there are people who are the exact opposite and use it for every last thing they do. And maybe from this they learn how to better approach their prompts.
This bullet point is funny:
> Treat it like a slot machine
> Save your state before letting Claude work, let it run for 30 minutes, then either accept the result or start fresh rather than trying to wrestle with corrections. Starting over often has a higher success rate than trying to fix Claude's mistakes.
That's easy to say when the employee is not personally paying the massive amount of compute running Claude Code for a half-hour.
Sorry boss, it looks like we need to hire more software engineers since the AI route still isn't mathing.
Well, Anthropic sure thinks that you should. Number go up!
Except the power and cooling demands of the current crop of GPUs mean you are not fitting full density in a rack. There is a real, material increase in fiber use because your equipment is now more distributed (and 800Gbps interconnects are NOT cheap).
You can't capitalize power costs: this is now a non-trivial cost to account for. And the more power you use for compute, the more power you have to use for cooling... (Power density is now so high that cooling with something other than air is looking not just attractive but like it is going to be a requirement.)
Meanwhile the cost of lending right now is high compared to recent decades...
The accounting side of things isn't as pretty as one would like it to be.
https://archive.nytimes.com/www.nytimes.com/books/97/05/18/r...?
Pros:
- Saved Time!
- Scalable!
- Big Bill?
Cons:
- Big Bill
- AI written code
Even better if you use an LLM tool with hook support: just have the hook run formatters on the file after each edit.
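Something like this works as the hook script (a minimal sketch in Node/TypeScript; the stdin JSON shape and field names are assumptions based on Claude Code's PostToolUse hooks, so check your tool's docs):

```typescript
// format-hook.ts: run a formatter on whatever file the agent just edited.
// ASSUMPTION: the hook receives JSON on stdin that includes the edited file's
// path, e.g. { "tool_input": { "file_path": "src/foo.ts" } }. Adjust as needed.
import { execFileSync } from "node:child_process";

let raw = "";
process.stdin.on("data", (chunk) => (raw += chunk));
process.stdin.on("end", () => {
  try {
    const payload = JSON.parse(raw);
    const file = payload?.tool_input?.file_path; // assumed field name
    if (typeof file !== "string") return;
    // Run Prettier in place; swap in gofmt, black, rustfmt, etc. for other stacks.
    execFileSync("npx", ["prettier", "--write", file], { stdio: "inherit" });
  } catch {
    // A formatting failure shouldn't block the agent, so swallow errors here.
  }
});
```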
Same here. It's either capable of working unsupervised or not. And if not, you have to start wondering what you're even doing if you're at your keyboard, running tools, editing code that you don't like, etc.
We're still working out the edge cases with these "Full" self-driving editors. It vastly diminishes the usefulness if it's going to spend 20 minutes (and $) on stupid simple things.
> We're still working out the edge cases
The difficult part is that, like with FSD, it's mostly edge cases. Sure, the air is 3-dimensional, but driving is too dynamic and volatile. Every single road is different, and you have to rely on heuristics meant for humans.
It's stupid easy for humans to tell what a yellow line is and what a stop sign looks like, but it's not so easy for computers. These are human tools - physical things we look at with our eyes, not easy to measure. Things in the air, by contrast, are quite easy to measure.
On top of the visual heuristics, everything changes all the time and very fast. You look away from the road and look back and you don't know what you're gonna see. It's why texting and driving is so dangerous.
First, I want to summon my car. Then, when leaving, if I’m in a dense area with lots of shopping, the roads can be a pain. You have to exit right, immediately get into the left lane, three lanes over, the second of the right turn only lanes, etc
My guess is that context = main thing + somewhat unrelated thing is too big a space for the models to perform well at this point in time.
The practical solution is to remove the need for the model to figure it out each time and instead explicitly tell it as much as possible beforehand in CLAUDE.md.
They take less than a second to run, can run on every save, and are free
You could say I've seen A LOT of poorly written human generated code.
Yet, I still trust it more. Why? Well, one of the big reasons is exactly what we're joking about. I can trust a human to iterate. Lack of iteration would be fine if everything were containerized and code operated in an unchanging environment[0]. But in the real world, code needs to be iterated on, constantly. Good code doesn't exist. If it does exist, it doesn't stay good for long.
Another major problem is that AI generates code that optimizes for human preference, not correctness. Even the terrible students who were just doing enough to scrape by weren't trying to mask mistakes[1], but were still optimizing for correctness, even if it was the bare minimum. I can still walk through that code with the human and we can figure out what went wrong. I can ask the human about the code and I can tell a lot by their explanation, even if they make mistakes[2]. I can't trust the AI to tell an accurate account of even its own code because it doesn't actually understand. Even the dumb human has a much larger context window. They can see all the code. They can actually talk to me and try to figure out the intent. They will challenge me if I'm wrong! And for the love of god, I'm going to throw them out if they are just constantly showering me with praise and telling me how much of a genius I am. I don't want to work with someone where I feel like at any moment they're going to start trying to sell me a used car.
There's a lot of reasons, more than I list here. Do I still prompt LLMs and use them while I write code? Of course. Do I trust it to write code? Fuck no. I know it isn't trivial to see that middle ground if all you do is vibe code or hate writing code so much you just want to outsource it, but there's a lot of room here between having some assistant and having AI write code. Like the OP suggests, someone has got to write that 10-20%. That doesn't mean I've saved 80% of my time, I maybe saved 20%. Pareto is a bitch.
[0] Ever hear of "code rot?"
[1] Well... I'd rightfully dock points if they wrote obfuscated code...
[2] A critical skill of an expert in any subject is the ability to identify other experts. https://xkcd.com/451/
What makes you think that agents can't iterate?
> I'm going to throw them out if they are just constantly showering me with praise and telling me how much of a genius I am
You can tell the agent to have the persona of an arrogant ass if you prefer it.
Plus, the entire session/task history goes into every LLM prompt, not just the last message. So for every turn of the loop the LLM has the entire context with everything that previously happened in it, along with added "memories" and instructions.
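In loop form that's roughly the following (a minimal sketch; `callModel`, `runTool`, and the message shapes are placeholders, not any vendor's actual SDK):

```typescript
// Sketch of an agent loop: the whole transcript is resent on every turn,
// so context (and cost) grows with each tool call and message.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Placeholders for whatever model API and tool runner you actually use.
declare function callModel(messages: Message[]): Promise<Message>;
declare function runTool(request: Message): Promise<Message>;

async function agentLoop(task: string, memories: string): Promise<Message[]> {
  const history: Message[] = [
    { role: "system", content: memories },           // CLAUDE.md-style instructions
    { role: "user", content: task },
  ];
  for (let turn = 0; turn < 50; turn++) {
    const reply = await callModel(history);           // gets the ENTIRE history every time
    history.push(reply);
    if (!reply.content.includes("TOOL_CALL")) break;  // crude stop condition for the sketch
    history.push(await runTool(reply));               // tool output also stays in context
  }
  return history;
}
```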
> What makes you think that agents can't iterate?
Please RTFA or RTF topmost comment in the thread. Can they? Yes. Will they reliably? If so, why would it be better to restart...
But the real answer to your question: personal experience
TFA says:
Engineers use Claude Code for rapid prototyping by enabling "auto-accept mode" (shift+tab) and setting up autonomous loops in which Claude writes code, runs tests, and iterates continuously.
The tool rapidly prototypes features and iterates on ideas without getting bogged down in implementation details
Not only can you, some providers recommend it and their tools provide it, like ChatGPT Codex (the web tool). Can’t find where I read it but I’m pretty sure Anthropic devs said early on that they kick off the same prompt to Claude Code in multiple simultaneous runs.
Personally, I’ve had decent success from this way of working.
you can do the same for $200/month
I have been pretty successful at using llms for code generation.
I have a simple rule that something is either >90% AI or none at all (excluding inline completions and very obvious text editing).
The model has an inherent understanding of some problems due to its training data (e.g. setting up a web server with little to no deps in golang), which it can do with almost 100% certainty, so it's really easy to blaze through those in a few minutes, and then I can set up the architecture for some very flat code flows. This can genuinely improve my output by 30%-50%.
10% is the time it works 100% of the time.
I have been using Cline in VSCode, and I've been enjoying it a lot.
Vibe coding in Python is seductive but ultimately you end up in a bad place with a big bill to show for it.
Vibe coding in Haskell is a "how much money am I willing to pour in per unit clean, correct, maintainable code" exercise. With GHC cranked up to `-Wall -Werror` and some nasty property tests? Watching Claude Code try to weasel out with a mock goes from infuriating to amusing: bam, unused parameter! Now why would the test suite be demanding that a property holds on an unused parameter...
And Haskell is just an example; TypeScript is in some ways even more powerful in its type system, so lots of projects have scope to dabble with what I'm calling "hyper modern vibe coding": just start putting a bunch of really nasty fastcheck and generic bounds on stuff and watch Claude Code try to cheat. Your move, Claude Code, I know you want to check off that line on the TODO list like I want to breathe, so what's it gonna be?
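For the TypeScript flavour, a rough sketch with fast-check (the function and property here are invented for illustration, and a Jest-style runner is assumed): a lazy mock that ignores an input can't satisfy a property that quantifies over that input.

```typescript
import fc from "fast-check";

// The behaviour we actually want: applying a discount never increases the price
// and scales with the rate. A lazy mock that returns a constant, or one that
// ignores `rate`, fails as soon as fast-check explores the input space.
function applyDiscount(price: number, rate: number): number {
  return price * (1 - rate);
}

describe("applyDiscount", () => {
  it("never increases the price and actually uses the rate", () => {
    fc.assert(
      fc.property(
        fc.double({ min: 0, max: 1e6, noNaN: true }),  // price
        fc.double({ min: 0, max: 1, noNaN: true }),    // rate
        (price, rate) => {
          const out = applyDiscount(price, rate);
          return out <= price && Math.abs(out - price * (1 - rate)) < 1e-9;
        }
      )
    );
  });
});
```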
I find it usually gives up and does the work you paid for.
A bigger issue here is that the random process is not a good engineering pattern. It's not repeatable, does not drive coherent architecture, and struggles with complex problems. In my experience, problem size correlates inversely with generated code quality. Engineering is a process of divide-and-conquer and there is a good reason people don't use bogo (random) sort in production.
More specifically, if you only look at the final code, you are either spending a lot of time reviewing the code or accepting the code with less review scrutiny. Carefully reviewing semi random diffs seems like a poor use of time... so I suspect the default is less review scrutiny and higher tech debt. Interestingly enough, higher tech debt might be an acceptable tradeoff if you believe that soon Code Assistants will be good enough to burn the tech debt down autonomously or with minimal oversight.
On the other hand, if the code you are writing is not allowed to fail, the stakes change and you can't pick the less review option. I never thought to codify it as a process, but here is what I do to guide the development process:
- Start by stating the problem and asking Claude Code to: analyze the existing code, restate the problem in a structured fashion, scan the codebase for existing patterns solving the problem, brainstorm alternative solutions. An enhancement here could be to have a map / list of the codebase to improve the search.
- Evaluate presented solutions and iterate on the list. Add problem details, provide insight, eliminate the solutions that would not work. A lot of times I have enough context to pick a winner here, but if not, I ask for more details about each solution and their relative pros and cons.
- Ask Claude to provide a detailed plan for the down-selected solution. Carefully review the plan (a significantly faster endeavor compared to reviewing the whole diff). Iterate on the plan as needed; after that, tell Claude to save the plan for comparison after the implementation and then to get cracking.
- Review Claude's report of what was implemented vs. what was initially planned. This step is crucial because Claude will try dumb things to get things working, and I've already done the legwork on making sure we're not doing anything dumb in the previous step. Make changes as needed.
- After implementation, I generally do a pass on the unit tests because Claude is extremely prolific with them. You generally need to let it write unit tests to make sure it is on the right track. Here, I ask it to scan all of the unit tests and identify similar or identical code. After that, I ask for refactor options that most importantly maximize clarity, secondly minimize lines of code, and thirdly minimize diffs. Pick the best ones.
Yes, I accept that the above process takes significantly longer for any single change; however, in my experience, it produces far superior results in a bounded amount of time.
P.S. if you got this far please leave some feedback on how I can improve the flow.
Recently, I realized that this applies not only to the first 70–80% of a project but sometimes also to the final 70-80%.
I couldn’t make progress with Claude on a major refactoring from scratch, so I started implementing it myself. Once I had shaped the idea clearly enough but in a very early state, I handed it back to Claude to finish and it worked flawlessly, down to the last CHANGELOG entry, without any further input from me.
I saw this as a form of extensive guardrails or prompting-by-example.
It'll create a massive bespoke class to do something that is already in the stdlib.
But if there's a pattern of already using stdlib functions, it can copy that easily.
Having worked with a number of people like the ones I’ve described above, the way I’ve worked with them has helped me get better results from LLMs for coding. The difference is that you can help a junior grow over time; LLMs forget once that context is gone (CLAUDE.md helps, but it's not perfect).
Should be the same party that is getting the rewards of the productivity gains.
> /undo
> /clear
> ↑ ↑ ↑ ⏎
- Custom accessibility solution for family members
- The team created prototype "phone tree" systems to help team members connect with the right lawyer at Anthropic
- Team coordination tools
- Rapid prototyping for solution validation
So, not legal
I use it at home via the $20/m subscription and am piloting it at work via AWS Bedrock. When used with the Bedrock APIs, at the end of every session it shows you the dollar amount spent, which is a bit disconcerting. I hope the fine-grained metering of inference is a temporary situation; otherwise I think it will have a chilling/discouraging effect on software developers, leading to less experimentation, fewer rewrites, and overall lower quality.
I imagine Anthropic gets to consume it unmetered internally, so they probably completely avoid this problem.
That's why you don't pay the yearly license for anything at this point in time. Pay monthly and evaluate before each bill if there's something better out already.
Nope.
More open models ship every day and are 80% cheaper for similar and sometimes better performance, depending on the task.
You can use Qwen-3 Coder (a 480 billion parameter model with 35 billion active per forward pass, i.e. 8 of 160 experts) for $0.302/M input tokens and $0.302/M output tokens via OpenRouter.
Claude 4 Sonnet is $3/M input tokens and $15/M output tokens.
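Back-of-the-envelope with those listed rates (a sketch only; the monthly token counts are made-up assumptions, and real agent sessions skew heavily toward input tokens because the whole history is resent every turn):

```typescript
// Rough cost comparison at the prices quoted above (USD per million tokens).
const qwen3Coder = { input: 0.302, output: 0.302 };
const claudeSonnet4 = { input: 3.0, output: 15.0 };

// Hypothetical month of agent use: 50M input tokens, 5M output tokens.
const usage = { inputM: 50, outputM: 5 };

function monthlyCost(p: { input: number; output: number }): number {
  return p.input * usage.inputM + p.output * usage.outputM;
}

console.log(monthlyCost(qwen3Coder).toFixed(2));    // ~16.61
console.log(monthlyCost(claudeSonnet4).toFixed(2)); // ~225.00
```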
Several utilities let you use Claude Code with these models at will.
You're like in the top 0.05% of earners in the software field.
Of course, if you save 10 hours per month, the math starts making more sense for others.
And this is assuming LLM prices are stable, which I very much doubt they are, since everyone is price dumping to get market share.
Nobody is investing half a trillion in a tech without expecting a 10x return.
And I'm fairly sure that soon those $20/month subscriptions will sell your data, shove ads everywhere, AND basically only let you have that junior dev for 30 minutes per day or 2 days a month.
And the $200/month will probably be $500-1000 with more limitations.
Still cheap, but AI can't run an entire project, can't deliver. So the human will be in the loop, as you said, so at least a partial cost on top.
You can use these models through Claude Code; I do it everyday.
Some developers are running smaller versions of these LLMs on their own hardware, paying no one.
So I don’t think Anthropic and the other companies can dramatically increase their prices without losing the customers that helped them go from $0 to $4 billion in revenue in 3 years.
Users can easily move between different AI platforms with no lock-in, which makes it harder to increase prices and proceed to enshitify their platforms.
$100k/mo on CloudWatch corresponds to a moderately large software business, assuming basic best practices are followed. Optimization projects can often run into major cost overruns where the people time exceeds the discounted future free cash flow savings from the optimization.
That being said, a team of 5 on a small revenue/infra spend racking up $100k/mo is excessive. Pedantically, CloudWatch/Datadog are SaaS vendors - $100k/mo on Prometheus would correspond to a 20-node SSD cluster in the cloud, which could easily handle several tens of millions of metrics per second from tens of thousands of metric producers. If you went to raw colocation facility costs, you'd have over a hundred dual-Xeon machines with multi-TB direct-attached SSD, supporting hundreds of thousands of servers producing hundreds of millions of data points per second.
Human time is really the main trade-off.
I’m legitimately surprised at your feeling on this. I might not want the granular cost put in my face constantly but I do like the ability to see how much my queries cost when I am experimenting with prompt setup for agents. Occasionally I find wording things one way or the other has a significantly cheaper cost.
Why do you think it will lead to a chilling effect instead of the normal effect of engineers ruthlessly innovating costs down now that there is a measurable target?
i know some swift so i checked on what it was doing. for a quick hack project it did all the work and easily updated things i saw issues with.
for a one-off like that, not bad at all. not too dissimilar from your example.
I can assure you that I don’t at all care about the MAYBE $10 charge my monster Claude Code session billed the company. They also clearly said “don’t worry about cost, just go figure out how to work with it”
By a more honest standard we are still a very long way away from AI suggesting new ANN architectures, new approaches to managing RLHF, better training data, new benchmarks, etc etc. LLMs are nowhere close to being able to improve themselves.
The documentation is good, but is kept relatively general and I have a feeling that the quality of Claude Code's output really depends on the specific setup and prompts you use.
[0] https://fortune.com/2025/02/04/anthropic-tells-job-candidate...
I’ve got much better at using Claude.md and plan files, etc., but it still goes off the rails so quickly when I try to get it to follow a normal TDD test/edit/build/commit workflow. It will report success, and then the unit tests will fail or the working copy is dirty or there are build errors, etc. An LLM may be great for figuring out what code to write and what weird tool incantation to run to debug something, but I am fed up enough I want to write my own agent, because all I want is a switch-statement state machine to manage workflow.
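That switch-statement state machine really is most of it; a minimal sketch (the `ask` helper and the npm/git commands are placeholders for whatever your project actually uses):

```typescript
import { execSync } from "node:child_process";

// Hypothetical helper that sends a prompt to the LLM and applies its edits.
declare function ask(prompt: string): Promise<void>;

type State = "WRITE_TEST" | "EDIT" | "BUILD" | "TEST" | "COMMIT" | "DONE";

// The exit code of a real command, not the model's self-report, decides the next state.
const ok = (cmd: string): boolean => {
  try { execSync(cmd, { stdio: "inherit" }); return true; } catch { return false; }
};

async function tddLoop(task: string): Promise<void> {
  let state: State = "WRITE_TEST";
  for (let step = 0; step < 100 && state !== "DONE"; step++) {
    switch (state) {
      case "WRITE_TEST": await ask(`Write a failing test for: ${task}`); state = "BUILD"; break;
      case "EDIT":       await ask("Make the failing tests pass without breaking the others."); state = "BUILD"; break;
      case "BUILD":      state = ok("npm run build") ? "TEST" : "EDIT"; break;
      case "TEST":       state = ok("npm test") ? "COMMIT" : "EDIT"; break;
      case "COMMIT":     ok(`git add -A && git commit -m "${task}"`); state = "DONE"; break;
    }
  }
}
```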
It definitely has a "we use AI enough that we've lost the ability to communicate coherently" vibe to it.
But, if they had an expert in networking build it in the first place, would they have not avoided the error entirely up front?
I can just talk to it like a person and explain the full context / history of things. Way faster than typing it all out.
https://apps.apple.com/us/app/voice-type-local-dictation/id6...
The developer is pretty cool too. I found a few bugs here and there and reported them. He responds pretty much immediately.
I highly recommend getting a good microphone, I use a Rode smartlav. It makes a huge difference.
I type a lot faster than I speak :D
Even if it gets slightly garbled, I often will add a note in my context that I'm using speech recognition. Then Claude will handle the potentially garbled or unclear sections perfectly or ask follow-up questions if it's unclear.
I often work on large, complicated projects that span the whole codebase and multiple micro services. So it's often a blend of engineering, architectural, and product priorities. I can end up talking for paragraphs or multiple pages to fully explain the context. Then Claude typically has follow-up questions, things that aren't clear, or issues that I didn't catch.
Honestly, I just get sick of typing out "dissertations" every time. It's easier just to have a conversation, save it to a file, and then use that as context to start a new thread and do the work.
The copy aspect was the main value prop for the app I chose: Voice Type. You can do ctrl-v to start recording, again to stop, and it pastes it in the active text box anywhere on your computer.
So I just avoid it and generally think the whole thing isn’t serious, because nobody seems to care enough about the safety implications of building AGI with legal terms which are logically impossible to satisfy to demonstrate appropriate attention to detail (aka, yall are noobs)
Maybe before boasting about how your internal teams use your product, add an option for external companies to pay for it!
Industry leading AI models but basic things like subscription management are unsolved…
The only reason why "team" plans exist is to have centralised billing and licensing.
This seems rather inefficient, and also surprising that Claude Code was even needed for this.
Is it really value add to my life that I know some detail on page A or have some API memorized?
I’d rather we be putting smart people in charge of using AI to build out great products.
It should make things 10000x more competitive. I for one am excited AF for what the future holds.
If people want to be purists and pat themselves on the back sure. I mean people have hobbies like arts.
AI mostly provides negative efficiency gains.
This will not change in the future.
yes, actually. Maybe not intimate details but knowing what's available in the API heavily helps with problem solving
Since I don't like it automatically making changes to files, I copy & paste the code from the terminal to the IDE. That seems slow at first, but it allows me to correct the bigger and smaller issues on the fly faster than prompting Claude toward my preferred solution. In my opinion, this makes more sense since I have more control and it is easier to spot problematic code. When fixing such issues, I point Claude to the changes afterwards so it can add them to its context.
For me Claude is like a very (over) confident junior developer. You have to keep an eye on them, and if it's faster to do it yourself, just do it and explain to them why you did it. (That might be a bad approach for juniors, but for Claude it works for me.)
Btw, can we talk about the fact that this blog post is written by the company that's trying to sell the tool? We should take it with a huge grain of salt. Like most of what these AI companies tell us, it should probably be ignored 90% of the time. They either want to raise money or to get bought by some other company in the end...
Unlike Gemini CLI which will just rush into implementation without hesitation :D
It is very interesting to me how the differences in our intelligences are physically manifested in text. It is one argument against hard takeoff: the bioneuron can encode information in a sweet spot that cannot be targeted by the perceptron, by the so-called neuralese, by any amount of distillation.
The most effective way I’ve found to use CC so far is this workflow:
Have a detailed and also compressed spec in an md file. It can be called anything, because you’re going to reference it explicitly in every prompt. (CC usually forgets about CLAUDE.md ime)
Start with the user story, and ask it to write a high-level staged implementation plan with atomic steps. Review this plan and have CC rewrite as necessary. (Another md file results.)
Then, based on this file, ask it to write a detailed implementation plan, also with atomic stages. Then review it together and ask if it’s ready to implement.
Then tell Claude to go ahead and implement it on a branch.
Remember the automated tests and functional testing.
Then merge.
I've written a little about some my findings and workflow in detail here: https://github.com/sutt/agro/blob/master/docs/case-studies/a...
Richard Stallman is rolling in his grave with this.
But in all seriousness, nice work, I think this _is_ where the industry is going, hopefully we don't have to rely on using proprietary models forever though.
You can set up a FOSS toolchain to do similar work, it’s just something I haven’t spent the time on. I probably should.
- there's a devlog showing all the prompts and accepted outputs: https://github.com/sutt/agro/blob/master/docs/dev-summary-v1...
- and you can look at the ai-generated tests (as is being discussed above) and see they aren't very well thought out for the behavior, but are syntactically impressive: https://github.com/sutt/agro/tree/master/tests
- check out the case-studies in the docs if you're interested in more ideas.
The downside is I don’t have as much of a grasp on what’s actually happening in my project, while with hand-written projects I’d know every detail.
Not a gotcha, I'm just extremely skeptical that AI is at a point where it can carry the level of responsibility you're describing and have it turn into good code long term.
I agree there are times like this where it doesn't work and makes the title worse, though. The submitter can edit it, and this reformatting (and others) won't be applied again.