1) Don't ask for large / complex change. Ask for a plan but ask it to implement the plan in small steps and ask the model to test each step before starting the next.
2) For really complex steps, ask the model to write code to visualize the problem and solution.
3) If the model fails on a given step, ask it to add logging to the code, save the logs, run the tests and the review the logs to determine what went wrong. Do this repeatedly until the step works well.
4) Ask the model to look at your existing code and determine how it was designed to implement a task. Some times the model will put all of the changes in one file but your code has a cleaner design the model doesn't take into account.
I've seen other people blog about their tricks and tips. I do still see garbage results but not as high as 95%.
I've interviewed with three tier one AI labs and _no-one_ I talked to had any idea where the business value of their models came in.
Meanwhile Chinese labs are releasing open source models that do what you need. At this point I've build local agentic tools that are better than anything Claude and OAI have as paid offerings, including the $2,000 tier.
Of course they cost between a few dollars to a few hundred dollars per query so until hardware gets better they will stay happily behind corporate moats and be used by the people blessed to burn money like paper.
Doesn't specifically seem to jive with the claim Anthropic made where they were worried about Claude Code being their secret sauce, leaving them unsure whether to publicly release it. (I know some skeptical about that claim.)
This doesn't match the sentiment on hackernews and elsewhere that claude code is the superior agentic coding tool, as it's developed by one of the AI labs, instead of a developer tool company.
You don't see better ones from code tooling companies because the economics don't work out. No one is going to pay $1,000 for a two line change on a 500,000k line code base after waiting four hours.
LLMs today the equivalent of a 4bit ALU without memory being sold as a fully functional personal computer. And like ALUs today, you will need _thousands_ of LLMs to get anything useful done, also like ALUs in 1950 we're a long way off from a personal computer being possible.
Tried this on a developer I worked with once and he just scoffed at me and pushed to prod on a Friday.
that's the --yolo flag in cc :D
Most users will just give a vague tasks like: "write a clone of Steam" or "create a rocket" and then they blame Claude Code.
If you want AI to code for you, you have to decompose your problem like a product owner would do. You can get helped by AI as well, but you should have a plan and specifications.
Once your plan is ready, you have to decompose the problem into different modules, then make sure each modules are tested.
The issue is often with the user, not the tool, as they have to learn how to use the tool first.
This seems like half of HN with how much HN hates AI. Those who hate it or say it’s not useful to them seem to be fighting against it and not wanting to learn how to use it. I still haven’t seen good examples of it not working even with obscure languages or proprietary stuff.
The main difference is that with the current batch of genai tools, the AI's context resets after use, whereas a (good) intern truly learns from prior behavior.
Additionally, as you point out, the language and frameworks need to be part of the training set since AI isn't really "learning" it's just prepolulating a context window for its pre-existing knowledge (token prediction), so ymmv depending on hidden variables from the secret (to you, the consumers) training data and weights. I use Ruby primarily these days, which is solidly in the "boring tech" camp and most AIs fail to produce useful output that isn't rails boilerplate.
If I did all my IC contributions via directed intern commits I'd leave the industry out of frustration. Using only AI outputs for producing code changes would be akin to torture (personally.)
Edit: To clarify I'm not against AI use, I'm just stating that with the current generation of tools it is a pretty lackluster experience when it comes to net new code generation. It excells at one off throwaway scripts and making large tedious redactors less drudgerly. I wouldn't pivot to it being my primary method of code generation until some of the more blatant productiviy losses are addressed.
Now, it's not always useless. It's GREAT at adding debugging output and knowing which variables I just added and thus want to add to the debugging output. And that does save me time.
And it does surprise me sometimes with how well it picks up on my thinking and makes a good suggestion.
But I can honestly only accept maybe 15-20% of the suggestions it makes - the rest are often totally different from what I'm working on / trying to do.
And it's C++. But we have a very custom library to do user-space context switching, and everything is built on that.
One option is to write "Please implement this change in small steps?" more-or-less exactly
Another option is to figure out the steps and then ask it "Please figure this out in small steps. The first step is to add code to the parser so that it handles the first new XML element I'm interested in, please do this by making the change X, we'll get to Y and Z later"
I'm sure there's other options, too.
I give an outline of what I want to do, and give some breadcrumbs for any relevant existing files that are related in some way, ask it to figure out context for my change and to write up a summary of the full scope of the change we're making, including an index of file paths to all relevant files with a very concise blurb about what each file does/contains, and then also to produce a step-by-step plan at the end. I generally always have to tell it to NOT think about this like a traditional engineering team plan, this is a senior engineer and LLM code agent working together, think only about technical architecture, otherwise you get "phase 1 (1-2 weeks), phase 2 (2-4 weeks), step a (4-8 hours)" sort of nonsense timelines in your plan. Then I review the steps myself to make sure they are coherent and make sense, and I poke and prod the LLM to fix anything that seems weird, either fixing context or directions or whatever. Then I feed the entire document to another clean context window (or two or three) and ask it to "evaluate this plan for cohesiveness and coherency, tell me if it's ready for engineering or if there's anything underspecified or unclear" and iterate on that like 1-3 times until I run a fresh context window and it says "This plan looks great, it's well crafted, organized, etc...." and doesn't give feedback. Then I go to a fresh context window and tell it "Review the document @MY_PLAN.md thoroughly and begin implementation of step 1, stop after step 1 before doing step 2" and I start working through the steps with it.
As an engineer, especially as you get more experience, you can kind of visualize the plan for a change very quickly and flesh out the next step while implementing the current step
All you have really accomplished with the kind of process described is make the worlds least precise, most verbose programming language
I can say the right precise wording in my prompt to guide it to a good plan very quickly. As the other commenter mentioned, the entire above process only takes something like 30-120 minutes depending on scope, and then I can generate code in a few minutes that would take 2-6 weeks to write myself, working 8 hr days. Then, it takes something like 0.5-1.5 days to work out all the bugs and clean up the weird AI quirks and maybe have the LLM write some playwright tests or whatever testing framework you use for integration tests to verify it's own work.
So yes, it takes significant time to plan things well for good results, and yes the results are often sloppy in some parts and have weird quirks that no human engineer would make on purpose, but if you stick to working on prompt/context engineering and getting better and faster at the above process, the key unlock is not that it just does the same coding for you, with it generating the code instead. It's that you can work as a solo developer at the abstraction level of a small startup company. I can design and implement an enterprise grade SSO auth system over a weekend that integrates with Okta and passes security testing. I can take a library written in one language and fully re-implement it in another language in a matter of hours. I recently took the native libraries for Android and iOS for a fairly large, non-trivial SDK, and had Claude build me a React Native wrapper library with native modules that integrates both natives libraries and presents a clean, unified interface and typescript types to the react native layer. This took me about two days, plus one more for validation testing. I have never done this before. I have no idea how "Nitro Modules" works, or how to configure a react native library from scratch. But given the immense scaffolding abilities of LLMs, plus my debugging/hacking skills, I can get to a really confident place, really quickly and ship production code at work with this process, regularly.
So I'll say something like "evaluate the URL fetcher library for best practices, security, performance, and test coverage. Write this up in a markdown file. Add a design for single flighting and retry policy. Break this down into steps so simple even the dumbest LLM won't get confused.
Then I clear the context window and spawn workers to do the implementation.
They are the single closest thing we've ever had to objective evaluation on if an engineering practice is better or worse. Simply because just about every single engineering practice that I see that makes coding agents work well also makes humans work well.
And so many of these circular debates and other best practices (TDD, static typing, keeping todo lists, working in smaller pieces, testing independently before testing together, clearly defined codebase practices, ...) have all been settled in my mind.
The most controversial take, and the one I dislike but may reluctantly have to agree with is "Is it better for a business to use a popular language less suited for the task than a less popular language more suited for it." While obviously it's a sliding scale, coding agents clearly weight in on one side of this debate... as little as I like seeing it.
I’ve seen incredible improvements just by doing this and using precise prompting to get Claude to implement full services by itself, tests included. Of course it requires manual correction later but just telling Claude to check the development documentation before starting work on a feature prevents most hallucinations (that and telling it to use the Context7 MCP for external documentation), at least in my experience.
The downside to this is that 30% of your context window will be filled with documentation but hey, at least it won’t hallucinate API methods or completely forget that it shouldn’t reimplement something.
Just my 2 cents.
In theory. In practice, it's not a very secure sandbox and Claude will happily go around updating files if you insist / the prompt is bad / it goes off on a tangent.
I really should just set up a completely sandboxed VM for it so that I don't care if it goes rm-rf happy.
A sandboxed devcontainer is worth setting up though. Lets me run it with —dangerously-skip-permissions
Writing the code is the fast and easy part once you know what you want to do. I use AI as a rubber duck to shorten that cycle, then write it myself.
Choosing the battles to pick is part of the skill at the moment.
I use AI for a lot of boiler plate, tedious tasks I can’t quite do a vim recording for, small targeted scripts.
The boilerplate argument is becoming quite old.
Doesn't that sound ridiculous to you?
I just don't enjoy the work as much as I did when was younger. Now I want to get things done and then spend the day on other more enjoyable (to me) stuff.
The funny thing is - we need less. Less of everything. But an up-tick in quality.
This seems to happen with humans with everything - the gates get opened, enabling a flood of producers to come in. But this causes a mountain of slop to form, and overtime the tastes of folks get eroded away.
Engineers don't need to write more lines of code / faster - they need to get better at interfacing with other folks in the business organisation and get better at project selection and making better choices over how to allocate their time. Writing lines of code is a tiny part of what it takes to get great products to market and to grow/sustain market share etc.
But hey, good luck with that - ones thinking power is diminished overtime by interacing with LLMs etc.
Sometimes I reflect on how much more efficiently I can learn (and thus create) new things because of these technologies, then get anxiety when I project that to everyone else being similarly more capable.
Then I read comments like this and remember that most people don't even want to try.
Come back and post here when you have built something that has commercial success.
Show us all how it's done.
Until then go away - more noise doesn't help.
I'm at the point now where I have to yell at the AI once in a while, but I touch essentially zero code manually, and it's acceptable quality. Once I stopped and tried to fully refactor a commit that CC had created, but I was only able to make marginal improvements in return for an enormous time commitment. If I had spent that time improving my prompts and running refactoring/cleanup passes in CC, I suspect I would have come out ahead. So I'm deliberately trying not to do that.
I expect at some point on a Friday (last Friday was close) I will get frustrated and go build things manually. But for now it's a cognitive and effort reduction for similar quality. It helps to use the most standard libraries and languages possible, and great tests are a must.
Edit: Also, use the "thinking" commands. think / think hard / think harder / ultrathink are your best friend when attempting complicated changes (of course, if you're attempting complicated changes, don't.)
In order for it not to do useless stuff I need to expend more energy on prompting than writing stuff myself. I find myself getting paranoid about minutia in the prompt, turns of phrase, unintended associations in case it gives shit-tier code because my prompt looked too much like something off experts-exchange or whatever.
What I really want is something like a front-end framework but for LLM prompting, that takes away a lot of the fucking about with generalised stuff like prompt structure, default to best practices for finding something in code, or designing a new feature, or writing tests..
Right now it's not easy prompting claude code (for example) to keep fixing until a test suite passes. It always does some fixed amount of work until it feels it's most of the way there and stops. So I have to babysit to keep telling it that yes I really mean for it to make the tests pass.
1. I don't see the output of the compiler, as in, all I get is an executable blob. It could be inspected, but I don't think that I ever have in my 20+ year career. Maybe I lie and I've rocked up with a Hex editor once or twice, out of pure curiousity, but I've never got past looking for strings that I recognise.
2. When I use Claude, I am using it to do things that I can do, by hand, myself. I am reviewing the code as I go along, and I know what I want it to do because it's what I would be writing myself if I didn't have Claude (or Gemini for that matter).
So, no, I have never congratulated the compiler (or interpreter, linker, assembler, or even the CPU).
Finally, I view the AI as a pairing partner, sometimes it's better than me, sometimes it's not, and I have to be "in the game" in order to make sure I don't end up with a vibe coded mess.
edit: This is from yesterday (Claude had just fixed a bug for me - all I did was paste the block of code that the bug was in, and say "x behaviour but getting y behaviour instead)
perfect, thanks
Edit You're welcome! That was a tricky bug - using rowCount instead of colCount in the index calculation is the kind of subtle error that can be really hard to spot. It's especially sneaky because row 0 worked correctly by accident, making it seem like the logic was mostly right. Glad we got it sorted out! Your Gaps redeal should now work properly with all the 2s (and other correctly placed cards) staying in their proper positions across all rows.
Also, there may be selfish reasons to do this as well: (1) "Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance" https://arxiv.org/abs/2402.14531 (2) "Three Things to Know About Prompting LLMs" https://sloanreview.mit.edu/article/three-things-to-know-abo...
1) Summarize what I think my project currently does
2) Summarize what I think it should do
3) Give a couple of hints about how to do it
4) Watch it iterate a write-compile-test loop until it thinks it's ready
I haven't added any files or instructions anywhere, I just do that loop above. I know of people who put their Claude in YOLO mode on multiple sessions, but for the moment I'm just sitting there watching it.
Example:
"So at the moment, we're connecting to a websocket and subscribing to data, and it works fine, all the parsing tests are working, all good. But I want to connect over multiple sockets and just take whichever one receives the message first, and discard subsequent copies. Maybe you need a module that remembers what sequence number it has seen?"
Claude will then praise my insightful guidance and start making edits.
At some point, it will do something silly, and I will say:
"Why are you doing this with a bunch of Arc<RwLock> things? Let's share state by sharing messages!"
Claude will then apologize profusely and give reasons why I'm so wise, and then build the module in an async way.
I just keep an eye on what it tries, and it's completely changed how I code. For instance, I don't need to be fully concentrated anymore. I can be sitting in a meeting while I tell Claude what to do. Or I can be close to falling asleep, but still be productive.
I don't know if this is a question of the language or what but I just have no good luck with its consistency. And I did invest time into defining various CLAUDE.md files. To no avail.
Does it end in a forever loop for you? I used to have this problem with other models.
But yeah, strongly typed languages, test driven development, and good high quality compiler errors are real game changers for LLM performance. I use Rust for everything now.
Claude code is more user friendly than cursor with its CLI like interface. The file modifications are easy to view and it automatically runs psql, cd, ls , grep command. Output of the commands is shown in more user friendly fashion. Agents and MCPs are easy to organized and used.
Having said the above some level of AI spending is the new reality. Your workplace pays for internet right? Probably a really expensive fast corporate grade connection? Well they now also need to pay for an AI subscription. That's just the current reality.
Aider felt similar when I tried it in architect mode; my prompt would be very short and then I'd chew through thousands of tokens while it planned and thought and found relevant code snippets and etc.
It’s way easier to let the agent code the whole thing if your prompt is good enough than to give instructions bit by bit only because your colleagues cannot review a PR with 50 file changes.
The 50 file changes is most likely unsafe to deploy and unmaintainable.
"Ask the LLM" is a good enough solution to an absurd number of situations. Being open to questioning your approach - or even asking the LLM (with the right context) to question your approach has been valuable in my experience.
But from a more general POV, its something we'll have to spend the next decade figuring out. 'Agile'/scrum & friends is a sort of industry-wide standard approach, and all of that should be rethought - once a bit of the dust settles.
We're so early in the change that I haven't even seen anybody get it wrong, let alone right.
I am sorry, but this is so out of touch with reality. Maybe in the US most companies are willing to allocate you 1000 or 1500 USD/month/engineer, but I am sure that in many countries outside of the US not even a single line (or other type of) manager will allocate you such a budget.
I know for a fact that in countries like Japan you even need to present your arguments for a pizza party :D So that's all you need to know about AI adoption and what's driving it
We'll just keep getting submission after submission talking about how amazing Claude Code is with zero real world examples.
Here's what works for me:
- Detailed claude.md containing overall information about the project.
- Anytime Claude chooses a different route that's not my preferred route - ask my preference to be saved in global memory.
- Detailed planning documentation for each feature - Describe high-level functionality.
- As I develop the feature, add documentation with database schema, sample records, sample JSON responses, API endpoints used, test scripts.
- MCP, MCP, MCP! Playwright is a game changer
The more context you give upfront, the less back-and-forth you need. It's been absolutely transformative for my productivity.
Thank you Claude Code team!
Really simple workflow!
Claude code is amazing at producing code for this stack. It does excellent job at outputting ffmpeg, curl commands, linux shell script etc.
I have written detailed project plan and feature plan in MarkDown - and Claude has no trouble understanding the instructions.
I am curious - what is your usecase?
EDIT: I see, you're asking Claude to modify claude.md to track your preference there, right?
Ask Claude to update the preference and document the moment you realize that claude has deviated away from the path.
I fed Claude a copy of everything I've ever written on Hacker News. Then I asked it to generate an essay that sounds like me.
Out of five paragraphs I had to change one sentence. Everything else sounded exactly as I would have written it.
It was scary good.
I'm not comfortable using it to generate code for this project, but I can absolutely see using it to generate code for a project I'm familiar with in a language I know well.
Is this another case of someone using API keys and not knowing about the claude MAX plans? It's $100 or $200 a month, if you're not pure yolo brute-force vibe coding $100 plan works.
“The future of agentic coding with Claude Code”
I havn't put a huge effort into learning to write prompts but in short, it seems easier to write the code myself than determine prompts. If you don't know every detail ahead of time and ask a slightly off question, the entire result will be garbage.
Abstracting the boilerplate is how you make things easier for future you.
Giving it to an AI to generate just makes the boilerplate more of a problem when there's a change that needs to be made to _all_ the instances of it. Even worse if the boilerplate isn't consistent between copies in the codebase.
furyofantares•3h ago
I notice what worked and what didn't, what was good and what was garbage -- and also how my own opinion of what should be done changed. I have Claude Code help me update the initial prompt, help me update what should have been in the initial context, maybe add some of the bits that looked good to the initial context as well, and then write it all to a file.
Then I revert everything else and start with a totally blank context, except that file. In this session I care about the code, I review it, I am vigilant to not let any slop through. I've been trying for the second session to be the one that's gonna work -- but I'm open to another round or two of this iteration.
soperj•3h ago
bongodongobob•3h ago
shinecantbeseen•3h ago
sfjailbird•2h ago
OK I made up the statistic, but the core idea is true, and it's something that is rarely considered in this debate. At least with code you wrote, you can probably recognize it later when you need to maintain it or just figure out what it does.
adastra22•57m ago
furyofantares•30m ago
furyofantares•3h ago
I think I can also end up with a better result, and having learned more myself. It's just better in a whole host of directions all at once.
I don't end up intimately familiar with the solution however. Which I think is still a major cost.