Asking the agent to perform a code review on its own work is surprisingly fruitful.
I do this routinely with its suggestions, usually before I apply them. It is surprising how often Claude immediately dumps on its own last output, talking both of us out of it, and usually with good reasons. I'd like to automate this double-take. So I looked at the code more closely and it was using the React frontend and useEffect instead of a proper game engine. It's also not great at following hook rules and understanding their timing in advanced scenarios. So now I'm prompting it to use a proper tick-based game engine and rebuilding the game from there, doing code reviews as I go. It's going 'slower' now, but it's going much better.
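For anyone unfamiliar with the distinction: a tick-based engine advances the simulation on a fixed timestep and treats React purely as a renderer, instead of letting useEffect timing drive the game state. A minimal sketch of the idea (my illustration, not the commenter's code; all names are made up):

    // Fixed-timestep game loop; rendering is decoupled from simulation.
    type GameState = { x: number; vx: number };

    const TICK_MS = 1000 / 60; // 60 simulation ticks per second

    function tick(state: GameState): GameState {
      // Advance the world by exactly one tick, independent of frame rate.
      return { ...state, x: state.x + state.vx * (TICK_MS / 1000) };
    }

    function startLoop(initial: GameState, render: (s: GameState) => void): void {
      let state = initial;
      let last = performance.now();
      let acc = 0;

      const frame = (now: number) => {
        acc += now - last;
        last = now;
        while (acc >= TICK_MS) { // run however many ticks the elapsed time owes us
          state = tick(state);
          acc -= TICK_MS;
        }
        render(state); // React (or anything else) only draws; it never drives the simulation
        requestAnimationFrame(frame);
      };
      requestAnimationFrame(frame);
    }

The point is that hook timing no longer matters: the loop owns the clock, and the UI just subscribes to the latest state.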
My goal is to make a Show HN post when I have a good demo.
The system helps you build out a spec first, then uses a few subagents which are tuned for placing files, reviewing for best practice, etc.
I've been using it for about a week and about 70% of my Claude Code usage runs through /feature right now.
The nice thing is you can give it a _lot_ of requests and let it run for 10-15 minutes without interruption. Plus, it makes a set of planning documents before it implements, so you can see exactly what it thought it was changing.
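If I understand the setup, /feature is likely built on Claude Code's custom slash commands, which are markdown files under .claude/commands/. A hypothetical sketch of what such a command could look like (not the author's actual command; the subagent names are made up):

    # .claude/commands/feature.md (invoked here as /feature <description>)
    Build the feature described in: $ARGUMENTS

    1. Write a short spec and implementation plan to docs/plans/ before touching code.
    2. Ask the file-placement subagent where new files should live.
    3. Implement in small steps; have the best-practices review subagent check each step.
    4. Finish with a summary of what changed, linking the planning documents.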
And do it very step by step, in what would equate to a tiny PR that gradually rolls out the functionality. Too big and I find lots of ugly surprises and bugs and reorganizations that don’t make sense.
Some additional thoughts:
- I like to start with an ideation session with Claude in the web console. I explain the goals of the project, work through high level domain modeling, and break the project down into milestones with a target releasable goal in mind. For a small project, this might be a couple hours of back and forth. The output of this is the first version of CLAUDE.md.
- Then I start the project with Claude Code, have it read my global CLAUDE.md and the project CLAUDE.md and start going. Each session begins this way.
- I have Claude Code update the project CLAUDE.md as it goes, marking its progress through the plan. Usually, at the end of the session, I will have it rewrite a special section that contains its summary of the project, how it works, and how to navigate the code. I treat this like Claude's long-term memory, basically, and I have found it helps a lot. (A sketch of this layout follows after this list.)
- Even with good guidelines, Claude seems to have a tendency to get ahead of itself. I like to keep it focused and build little increments as I would myself if it is something I care about. If it's just some one-off or prototype, I let it go crazy and churn out whatever works.
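To make the CLAUDE.md idea above concrete, a hypothetical project-level layout along those lines (my sketch, not the commenter's file) might be:

    # CLAUDE.md (project)

    ## Goals and milestones
    - Milestone 1: playable prototype [done]
    - Milestone 2: save/load and settings [in progress]

    ## Working agreements
    - Build in small increments; one milestone step per session.
    - Update the progress markers and the summary section before ending a session.

    ## Project summary (long-term memory)
    - What the project does, how it works, and how to navigate the code.
    - Rewritten by Claude at the end of each session.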
I’m curious about the tool but I wonder if it requires more significant investment to be a daily driver.
If you're using an AI for the "architecture" / spec phase, play a few of the models off each other.
I will start with a conversation in Cursor (with appropriate context) and ask Gemini 2.5 Pro to ask clarifying questions and then propose a solution, and once I've got something, switch the model to O3 (or your preferred thinking model - GPT-5 now?). Add the line "please review the previous conversation and critique the design, ask clarifying questions, and propose alternatives if you think this is the wrong direction."
Do that a few times back and forth and with your own brain input, you should have a pretty robust conversation log and outline of a good solution.
Export that whole conversation into an .md doc, and use THAT in context with Claude Code to actually dive in and start writing code.
You'll still need to review everything and there will still be errors and bad decisions, but overall this has worked surprisingly well and efficiently for me so far.
One tip I picked up from a video recently to avoid sycophancy was to take the resulting spec and instead of telling the reviewing LLM "I wrote this spec", tell it "an engineer on my team wrote this spec". When it doesn't think it's insulting you, it tends to be a bit more critical.
If you happen to catch it and you're quick to "esc" and just tell it to find a simpler solution, it's surprisingly great at reconsidering, resolving the issue simply, and picking up where it left off before the blunder.
Claude Code power users, what would you say makes it superior to other agents?
I hated at first that it wasn’t like Cursor, sitting in the IDE. Then I realised I was using Cursor completely differently, using it often for small tasks where it’s only moderately helpful (refactoring, adding small functions, autocompleting)
With Claude I have to stop, think and plan before engaging with it, meaning it delivers much more impactful changes.
Put another way, it demands more from me, meaning I treat it with more respect and get more out of it.
However, there are a lot of Claude Code clones out there now that are basically the same (Gemini CLI, Codex, now Cursor CLI etc.). Claude still seems to lead the pack, I think? Perhaps it’s some combination of better coding performance due to the underlying LLM (usually Sonnet 4) being fine-tuned on the agent tool calls, plus Claude is just a little more mature in terms of configuration options etc.?
Gemini's been very quick to dive in and start changing things, even when I don't want it to. But those changes almost always fall short of what I'm after. They don't run or they leave failing tests, and when I ask it to fix the tests or the underlying issue, it churns without success. Claude is significantly slower and definitely not right all the time, but it seems to do a better job of stepping through a problem and resolving it well enough, while also improving results when I interject.
Meanwhile Gemini got itself stuck in a loop of compile/fail/try to fix/compile/fail again. Eventually it just gave up and said "I'm not able to figure this out". It does seem to have a kind of self-esteem problem in these scenarios, whereas Claude is more bullish on itself (maybe not always a good thing).
Claude seems to be the best at getting something that actually works. I do think Gemini will end up being tough competition, if nothing else because of the price, but Google really need a bit of a quality push on it. A free AI agent is worthless if it can't solve anything for me.
And I don't think we have a great eval benchmark that exactly measures this capability yet. SWE Bench seems to be pretty good, but there are already a lot of anecdotal comments that Claude is still better at coding than GPT-5, despite having similar scores on SWE Bench.
I started with an idea but no spec. I got it to a happy place I can deploy yesterday. Spent around $75 on tokens. It was starting to feel expensive towards the end.
I did wonder if I had started with a clearer specification could I have got there quicker and for less money.
The thing is though, looking back at the conversations I had with it, the back and forth (vibe coding I guess) helped me refine what I was actually after, so I'm in two minds about whether a proper tight specification upfront would have been the best thing.
At the end of each phase, I ask claude to update my implementation plan with new context for a new instance of claude to pick it up. This way it propagates context forward, and then I can clear the context window to start fresh on the next phase.
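As a rough illustration (my format, not the commenter's), the handoff block appended to the plan at the end of a phase might read:

    ## Handoff: end of Phase 2 (hypothetical example)
    - Done: auth flow and session storage; all tests passing.
    - Key decisions: tokens are stored server-side; see src/auth/.
    - Next: wire the profile page to the new session API.
    - Gotchas: run the seed script before the integration suite.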
I personally really like to use Claude Code together with Zen MCP https://github.com/BeehiveInnovations/zen-mcp-server to analyse existing and review fresh code with additional eyes from Gpt5 and Gemini.
aosaigh•1h ago
As mentioned in the article, the big trick is having clear specs. In my case I sat down for 2 hours and wrote a 12 step document on how I would implement this (along with background information). Claude went through step by step and wrote the code. I imagine this saved me probably 6-10 hours. I’m now reviewing and am going to test etc. and start adjusting and adding future functionality.
Its success was rooted in the fact I knew exactly how to do what it needed to do. I wrote out all the steps and it just followed my lead.
It makes it clear to me that mid and senior developers aren’t going anywhere.
That said, it was amazing to just see it go through the requirements and implement modules full of organised documented code that I didn’t have to write.
aosaigh•1h ago
A few times while writing the doc I had to go back and update the previous steps to add missing features.
Also I knew when to stop. It’s not fully finished yet. There are additional stages I need to implement. But as an experienced developer, I knew when I had enough for “core functionality” that was well defined.
What worries me is how do you become a good developer if AI is writing it all?
One of my strengths as a developer is understanding the problem and breaking it down into steps, creating requirements documents like I’ve discussed.
But that’s a hard-earned skill from years of client work where I wrote the code. I have a huge step up in getting the most from these agents now.
closewith•1h ago
The downside that Agile sought to remedy was inflexibility, which is an issue greatly ameliorated by coding agents.
bongodongobob•1h ago
Step 1: back and forth chat about the functionality we want. What do we want it to do? What are the inputs and outputs? Then generate a spec/requirements sheet.
Step 2: identify what language, technologies, frameworks to use to accomplish the goal. Generate a technical spec.
Step 3: architecture. Get a layout of the different files that need to be created and a general outline of what each will do.
Step 4: combine your docs and tell it to write the code.
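As a rough example of what gets fed into step 4 (my outline, not the commenter's), the combined document might be structured as:

    # Project brief (combined input for the coding session)
    ## Functional spec: inputs, outputs, user-visible behaviour (from step 1)
    ## Technical spec: language, frameworks, key libraries (from step 2)
    ## Architecture: file layout with a one-line purpose per file (from step 3)
    ## Instructions: implement the above file by file, smallest pieces first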
jamesponddotco•1h ago
1. I use the Socratic Coder[1] system prompt to have a back and forth conversation about the idea, which helps me hone the idea and improve it. This conversation forces me to think about several aspects of the idea and how to implement it.
2. I use the Brainstorm Specification[2] user prompt to turn that conversation into a specification.
3. I use the Brainstorm Critique[3] user prompt to critique that specification and find flaws in it which I might have missed.
4. I use a modified version of the Brainstorm Specification user prompt to refine the specification based on the critique and have a final version of the document, which I can either use on my own or feed to something like Claude Code for context.
Doing those things improved the quality of the code and work spit out by the LLMs I use by a significant amount, but more importantly, it helped me write much better code on my own because I now have something to guide me, while before I used to go in blind.
As a bonus, it also helped me decide if an idea was worth it or not; there are times I'm talking with the LLM and it asks me questions I don't feel like answering, which tells me I'm probably not into that idea as much as I initially thought; it was just my ADHD hyperfocusing on something.
[1]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
[2]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
[3]: https://github.com/jamesponddotco/llm-prompts/blob/trunk/dat...
miroljub•1h ago
Small side remark, but what is the added value of AI-generated documentation for AI-generated code? It's just a burden that increases context size whenever the AI needs to re-analyse or change the existing code. It's not like any human is ever going to read the code docs when they can just ask the AI what it's about.
weego•38m ago
I try to prompt-enforce no line-by-line documentation, but encourage function/class/module-level documentation that will help future developers and AI coding agents. Humans are generally better at this, but the AI sometimes needs help to stop it from misunderstanding a piece of code's context and just writing its own new function that does the same thing.
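As a small illustration of that convention (my example, not weego's): module/function-level context, no line-by-line narration.

    /**
     * slugify: turns a human-readable title into a URL-safe slug.
     * Slugs are stable identifiers used in routes and sitemaps, so changing
     * this function's output breaks existing URLs; extend it, don't rewrite it.
     */
    export function slugify(title: string): string {
      return title
        .toLowerCase()
        .trim()
        .replace(/[^a-z0-9]+/g, "-")
        .replace(/^-+|-+$/g, "");
    }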
aosaigh•1h ago
That said, I think that the differing UIs of Cursor (in the IDE) and Claude (in the CLI) fundamentally change how you approach problems with them.
Cursor is “too available”. It’s right there and you can be lazy and just ask it anything.
Claude nudges you to think more deeply and construct longer prompts before engaging with it.
That's my experience, anyway.
dewey•38m ago
It puts you in a different mind space to sit down and think about it first, instead of iterating too much and, in the end, feeling productive while not actually achieving much and going mostly in circles.
sillyfluke•25m ago
The parent wrote:
>I imagine this saved me probably 6-10 hours. I’m now reviewing and am going to test etc.
Guessing the time saved prior to reviewing and testing seems premature from my end.