One trick I have gotten some mileage out of is this: have Claude Code research slash commands, then make a slash command that turns the previous conversation into a slash command.
That was cool and great! But then, of course, you inevitably interrupt it and need to correct it, make a change, or say "not like that!" or "use this tool" or "think harder before you try that" or "think about the big picture"... So you do that. And then you ask it to make a command out of that, and it figures out you want a /improve-command command.
So now you have primitives to build on!
Here are my current iterations of these commands (not saying they are optimal!)
https://github.com/ctoth/slashcommands/blob/master/make-comm...
https://github.com/ctoth/slashcommands/blob/master/improve-c...
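(For anyone who hasn't tried this: a custom slash command is just a markdown file under .claude/commands/, and $ARGUMENTS is replaced with whatever you type after the command. A minimal, hypothetical sketch of such a file, not the actual files linked above:)

```markdown
<!-- .claude/commands/make-command.md — illustrative sketch only -->
Review the conversation so far and distill the workflow we just used into a
new reusable slash command. Write it as a markdown file under
.claude/commands/, named after the task: $ARGUMENTS

Include the goal, the tools to prefer, and the checks to run before finishing.
```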
A few small markdown documents and putting in the time to understand something interesting hardly seems a steep price!
When I point that out, they profusely apologize and say that of course the walls must be white and wonder why they even got the idea of making them pink in the first place.
Odd, but nice fellows otherwise. It feels like they’re 10x more productive than other contractors.
This actually happened to me Monday.
But sure, humans are deterministic clockwork mechanisms!
Are you now going to tell me how I got a bad contractor? Because that sure would sound a lot like "you're using the model wrong"
If you haven’t tried it, I can’t recommend it enough. It’s the first time it really does feel like working with a junior engineer to me.
And I reach for Claude quite a bit, because if it worked as well for me as everyone here says it does, that would be amazing.
But at best it'll get a bunch of boilerplate done after some manual debugging; at worst I spend an hour and a pile of tokens on a total dead end.
Clear instructions go a long way, asking it to review work, asking it to debug problems, etc. definitely helps.
Definitely - with ONE pretty big callout. This only works when a clear and quantifiable rubric for verification can be expressed. Case in point, I put Claude Code to work on a simple react website that needed a "Refresh button" and walked away. When I came back, the button was there, and it had used a combination of MCP playwright + screenshots to roughly verify it was working.
The problem was that it decided to "draw" a circular arrow refresh icon and the arrow at the end of the semicircle was facing towards the circle centroid. Anyone (even a layman) would take one look at it and realize it looked ridiculous, but Claude couldn't tell even when I took the time to manually paste a screenshot asking if it saw any issues.
While it would also be unreasonable to expect a junior engineer to hand-write the coordinates for a refresh icon in SVG, they would never even attempt that in the first place, realizing it would be far simpler to grab one from Lucide, Font Awesome, emojis, etc.
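To make the contrast concrete, here's a minimal sketch of the library route (assuming a React project with lucide-react installed; this particular button is my illustration, not the code from the story):

```tsx
// Illustrative sketch: pull the refresh icon from a library instead of
// hand-authoring SVG path coordinates. Assumes lucide-react is installed.
import { RefreshCw } from "lucide-react";

export function RefreshButton({ onRefresh }: { onRefresh: () => void }) {
  return (
    <button type="button" onClick={onRefresh} aria-label="Refresh">
      <RefreshCw size={16} />
    </button>
  );
}
```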
But for other tasks like generating reports, I ask it to write little tools to reformat data with a schema definition, perform calculations, or do other things that are fairly easy to then double-check with tests that produce errors that it can work with. Having it "do math in its head" is just begging for disaster. But, it can easily write a tool to do it correctly.
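As a hypothetical sketch of what I mean (not one of my actual report tools): instead of letting the model add numbers up in prose, have it emit something like this, which checks the data against a declared shape and fails loudly, so the model gets a concrete error it can work with.

```ts
// Hypothetical sketch of a tiny, checkable tool: validate rows against a
// declared shape, then compute a total. Bad input raises a descriptive error
// the model can read and fix, instead of silently wrong "mental math".
interface LineItem {
  sku: string;
  quantity: number;
  unitPrice: number;
}

function parseLineItem(raw: unknown, index: number): LineItem {
  const r = raw as Record<string, unknown>;
  if (typeof r?.sku !== "string") throw new Error(`row ${index}: missing sku`);
  if (typeof r?.quantity !== "number" || r.quantity < 0)
    throw new Error(`row ${index}: bad quantity`);
  if (typeof r?.unitPrice !== "number" || r.unitPrice < 0)
    throw new Error(`row ${index}: bad unitPrice`);
  return { sku: r.sku, quantity: r.quantity, unitPrice: r.unitPrice };
}

// Example: total(JSON.parse(inputText)) sums quantity * unitPrice over all rows.
function total(rows: unknown[]): number {
  return rows
    .map(parseLineItem)
    .reduce((sum, item) => sum + item.quantity * item.unitPrice, 0);
}
```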
That's exactly what I learned in the early 2000s, from three expensive failed development-outsourcing projects.
When it drops in something hacky, I use that to verify the functionality is correct and then prompt a refactor to make it follow better conventions.
I'd definitely watch Boris's intro video below [1]
[1] Boris's introduction: https://www.youtube.com/watch?v=6eBSHbLKuN0
[2] Summary of the above video: https://www.nibzard.com/claude-code/
I sympathize with both experiences and have had both. But I think we've reached the point where such posts (both positive and negative) are _completely useless_, unless they're accompanied with a careful summary of at least:
* what kind of codebase you were working on (language, tech stack, business domain, size, age, level of cleanliness, number of contributors)
* what exactly you were trying to do
* how much experience you have with the AI tool
* is your tool set up so it can get a feedback loop from changes, e.g. by running tests
* how much prompting you gave it; do you have CLAUDE.md files in your codebase
and so on.
As others pointed out, TFA also has the problem of not being specific about most of this.
We are still learning as an industry how to use these tools best. Yes, we know they work really well for some people and others have bad experiences. Let's try and move the discussion beyond that!
For context, I was using Claude Code on a large Ruby + TypeScript open source codebase (50M+ tokens). It had specs and e2e tests, so yes, I did have feedback when I was done with a feature: I could run the specs and Claude Code could form a loop. I would usually advise it to fix specs one by one, with --fail-fast to surface errors quickly.
Prior to Claude Code, I had been using Cursor for a year or so.
Sonnet is particularly good at Next.js and TypeScript stuff. I also ran this on a medium-sized Python codebase and some ML-related work too (ranging from LangChain to PyTorch, lol).
I don't do a lot of prompting, just enough to describe my problem clearly. I try my best to identify the relevant context or direct the model to find it fast.
I made new CLAUDE.md files.
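(For anyone unfamiliar: CLAUDE.md is just a markdown file of project instructions that Claude Code picks up automatically. A hypothetical minimal example, not my actual file:)

```markdown
<!-- CLAUDE.md — hypothetical minimal example -->
# Project notes for Claude

- Ruby backend, TypeScript frontend; run the spec suite with --fail-fast.
- Fix failing specs one at a time; never disable or delete tests to make them pass.
- Prefer small, focused diffs; ask before touching unrelated files.
```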
Your LLM (CC) doesn't have your whole codebase in context, so it can run off and make changes without considering that some remote area of the codebase is (subtly?) depending on the part Claude just changed. This can be mitigated to some degree depending on the language and the tests in place.
The LLM (CC) might identify a bug in the codebase, fix it, and then figure, "Well, my work here is done," and leave it at that without considering the ramifications, or that the same sort of bug might exist elsewhere.
I could go on, but my point is simply to validate the issues people are having, while also acknowledging those seeing the value of an LLM like CC. It does provide useful work (e.g. large tedious refactors, prototyping, tracking down a variety of bugs, and so on).
If your tests are good, Claude Code can run them and use them to check it hasn't broken any distant existing behavior.
I actually think it's more productive to just accept how people describe their experience, without demanding some extensive list of evidence to back it up. We don't do this for any other opinion, so why does it matter in this case?
> Let's try and move the discussion beyond that!
Sharing experiences using anecdotal evidence covers most of the discussion on forums. Maybe don't try to police it, and either engage with it, or move on.
I use Claude many times a day, and I ask it and Gemini to generate code most days. Yet I fall into the "I've never included a line of LLM-generated code in committed code" category. I haven't got a precise answer for why that is. All I can come up with is that the generated code lacks the depth of insight needed to write a succinct, fast, clear solution to the problem that someone can easily understand in two years' time.
Perhaps the best illustration of this is someone who proudly proclaimed to me that they committed 25k lines in a week, with the help of AI. In my world, this sounds like claiming to have a way of turning the sea into ginger beer. Gaining the depth of knowledge required to change 25k lines of well-written code would take me more than a week of reading; writing that much in a week is a fantasy. So I asked them to show me the diff.
To my surprise, a quick scan of the diff revealed what the change did. It took me about 15 minutes to understand most of it. That's the good news.
The bad news is that those 25k lines added 6 fields to a database. Two-thirds of it was unit tests, and perhaps two-thirds of the remainder (maybe more) was comments. The comments were glorious in their length and precision, littered with ASCII-art tables showing many rows of the table.
Comments in particular are a delicate art. They are rarely maintained, so after a few changes they can bit-rot into downright misleading babble. But the insight they provide into what the author was thinking, and in particular the invariants he had in mind, can save hours of divining it from the code. Ideally they concisely explain only the obscure bits you can't easily see from the code itself; anything more becomes technical debt.
Quoting Woodrow Wilson on the amount of time he spent preparing speeches [0]:
“That depends on the length of the speech,” answered the President. “If it is a ten-minute speech it takes me all of two weeks to prepare it; if it is a half-hour speech it takes me a week; if I can talk as long as I want to it requires no preparation at all. I am ready now.”
Which is a roundabout way of saying I suspect the usefulness of LLM-generated code depends more on how often a human is likely to read it than on any of the things you listed. If it is write-once, and the requirement is that it works for most people in the common cases, LLM-generated code is probably the way to go.

I used PayPal's KYC web interface the other day. It looked beautiful, completely in line with the rest of PayPal's styling. But sadly I could not complete it because of bugs. The server refused to accept one page; it just returned to the same page with no error messages. No biggie, I phoned support (several times, because they also could not get past the same bug), and after 4 hours on the phone the job was done.

I'm sure the bug will be fixed by a new contractor. He'll spend a few hours on it, getting an LLM to write a new version and throwing the old code away, just as his predecessor did. He will say the LLM provided a huge productivity boost, and PayPal will be happy because he cost them so little. It's the ideal application for an LLM: the job got done quickly, and no one will read the code again.
I later discovered there was a link on the page that allowed me to skip past the problematic page, so I could at least enter the rest of the information. It was in a thing that looked confusingly like a "menu bar" on the left, although there was no visual hint that any of the items in the menu were clickable. I clicked on most of them anyway, but they did nothing. While on hold for phone support, I started reading the HTML and found one was a link. It was a bit embarrassing to admit to the help person that I hadn't clicked that one. It sped the process up somewhat. As I said, the page did look very nice to the eye, probably partly because of the lack of clutter created by visual hints about what was clickable.
[0] https://quoteinvestigator.com/2012/04/28/shorter-letter/
When it creates a bunch of useless junk I feel free to discard it and either try again with clearer guidelines (or switch to Opus).
People have such widely varying experiences and I’m wondering why.
I'd think Win32 development would be something AIs are very strong at because it's so old, so well documented, and there's a ton of code out there for it to read. Yet it still struggles with the differences between Windows messages, control notification messages, and command messages.
It's also another in my growing list of data points towards my opinion that if an author posts meme pictures in their article, it's probably not an article I'm interested in reading.
The tools really do shine where they're good though. They're amazing. But the moment you try to do the more "serious" work with them, it falls apart rapidly.
I say this as someone that uses the tools every day. The only explanation that makes sense to me is that the "you don't get it, they're amazing at everything" people just aren't working on anything even remotely complicated. Or it's confirmation bias that they're only remembering the good results - as we saw with last week's study on the impact of these tools on open source development (perceived productivity was up, real productivity was down). Until we start seeing examples to the contrary, IMO it's not worth thinking that much about. Use them at what they're good at, don't use them for other tasks.
LLMs don't have to be "all or nothing". They absolutely are not good at everything, but that doesn't mean they aren't good at anything.
But I think we should expect the scope of LLM work to improve rapidly in the next few years.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
Sorry, but this is just not true.
I'm using agents with a totally idiosyncratic code base of Haskell + Bazel + Flutter. It's a stack that is so quirky and niche that even Google hasn't been able to make it work well despite all their developer talent and years of SWEs pushing for things like Haskell support internally.
With agents I'm easily 100x more productive than I would be otherwise.
I'm just starting on a C++ project, but I've already done at least 2 weeks worth of work in under a day.
I have mixed feelings, because this means there's really no business reason to ever hire a junior; but it also (I think) threatens the stability of senior-level jobs long term, especially as seniors slowly lose their knowledge and let Claude take care of things. The result basically comes down to: what year did you get into this field?
I'm actually almost afraid I need to start crunching LeetCode, learning other languages, and then applying to DoD-like jobs where Claude Code (or other code-security concerns) means they need actual honest programmers working without assistance.
However, the future is never certain, and nothing is ever inevitable.
Aren't these people your seniors in the coming years? It's healthy to model both an inflow and an outflow.
Much less context babysitting too. Claude code is really good at finding the things it needs and adding them to its context. I find Cursor’s agent mode ceases to be useful at a task time horizon of 3-5 minutes but Claude Code can chug away for 10+ minutes and make meaningful progress without getting stuck in loops.
Again, all very surprising given that I use sonnet 4 w/ cursor + sometimes Gemini 2.5 pro. Claude Code is just so good with tools and not getting stuck.
If Claude is so amazing, could Anthropic not make their own fully-featured yet super-performant IDE in like a week?
What I'd really want is a way to easily hide it, which I did quite frequently with Copilot as its own pane.
My workflow now boils down to 2 tools really - leap.new to go from 0 to 1 because it also generates the backend code w/ infra + deployment and then I pick it up in Zed/Claude Code and continue working on it.
2. When you are in a new area, but you don't want to dive deep and just want something quick, and it is not core to the app/service.
But if you are experienced, you can see how AI can mess things up pretty quickly, hence for me it has been best used to fill in clear and well-defined functionality piecemeal. Basically, it is best for small bites rather than large chunks.
Bunch of comments online also reflect how there's a lot of "butthurt" developers shutting things down with a closed mind - focusing only on the negatives, and not letting the positives go through.
I sound a bit philosophical but I hope I'm getting my point across.
This conversation is useless without knowing the author's skillset and use-case.
I mean, do we really want our code base to not follow a coding standard? Or network code that doesn't consider failure or transactional issues? I feel like all of these traits are hallmarks of good senior engineers. Really good ones learn to let go a little, but no senior is going to watch a dev, automated or otherwise, circumvent six layers of architecture by blasting in a static accessor or some such.
Craftsmanship, control issues, and perfectionism tend to exist for readability and to limit entropy and scope, so one can be more certain of the consequences of a chunk of code. So considering them a problem is a weird take to me.
You have to watch Claude Code like a hawk. Because it's inconsistent. It will cheat, give up, change directions, and not make it clear to you that is what it's doing.
So, while it's not "junior" in capabilities, it is definitely "junior" in terms of your need as a "senior" to thoroughly review everything it does.
Or you'll regret it later.
Edit: I see a sibling comment mention the Max plan. I wanna be clear that I am not talking about rate limits here but actual models being inaccessible - so not a rate limit issue. I hope Anthropic figures this out fast, because it is souring me on Claude Code a bit.
Here is an example of ChatGPT, followed by mostly Claude, that finally solved a backlight issue with my laptop.
If you aren't sure whether to pull the trigger on a subscription, I would put $5-$10 into an API console account and use CC with an API key.
For anything but the smallest things I use Claude Code...
And even then...
For the bigger things, I ask it to propose to me a solution (when adding new features).
It helps when you give proper guidance: do this, use that, avoid X, be concise, ask to refactor when needed.
All in all, it's like a slightly autistic junior dev, so you need to be really explicit, but once it knows what to do, it's incredible.
That being said, whenever I'm stuck on an issue, or it keeps going in circles, I tend to roll back, ask for a proper analysis based on the requirements, and fill in the details where necessary.
For the non-standard things (e.g. detect windows in a photo and determine their measurements in centimetres), you still have to provide a lot of guidance. However, once I told it to use xyz and ABC it just goes. I've never written more than a few lines of PHP in my life, but I have a full API server with an A100 running, thanks to Claude.
The accumulated hours saved are huge for me, especially front-end development, refactoring, or implementing new features to see if they make sense.
For me it's a big shift in my approach to work, and I'd be really sad if I had to go back to the pre-AI era.
Truth be told, I was a happy user of Cline and Gemini and spent hundreds of dollars on API calls per month. But it never gave me the feeling Claude Code gives me; the reliability of this thing is saving me 80% of my time.
I’ve mentored and managed juniors. They’re usually a net negative in productivity until they are no longer juniors.
Progress doesn't end here either. IMO, CC is more a mid-level engineer with a top-tier senior engineer's knowledge. I think we're getting to the point where we can begin to replace the majority of engineers (even seniors) with just a handful of senior engineers to prompt and review AI-produced code and PRs.
Not quite there yet, of course, but definitely feeling that shift starting now... There are going to be huge productivity boosts for tech companies towards the end of this year if we can get there.
Exciting times.
It should be capable of rebuilding VS Code but better, no?
The recent Kimi-K2 supposedly works great.
I generally get great one-shot results (one input and the final output after all tasks are done). I have moved past Claude Code, though: I am using the CLI itself with another model. My reason for switching isn't that Claude was a bad model; it's just that it was expensive and I have access to larger models for cheaper. The CLI is the real power, not the model itself per se. Opus does perform a little better than the others.
It has totally made it so I can do the code I like to do while it works on other things in the meantime. I have about 60-70 different agent streams going at a time at the moment. Codebase sizes vary; the largest one right now is about 200M tokens (React, TypeScript, Golang) in total, and it does a good job. I've only had to tell it twice to do something differently.