Ask HN: Why are AI coding agents not working for me?

3•rich_sasha•3w ago
I'm really trying to use them with an open mind. I'm writing detailed specs. On failure, I adjust the initial spec, rather than go down the spiral of asking for many adjustments. I'm using Claude Opus 4.5 inside Cursor. My ambitions are also quite low. The latest was to split a mega Python file into a few submodules according to a pretty simple criterion. It's not even that it failed, it is more about the how. It was trying to action the refactor by writing some Python one-liners to edit the file, in an extremely clumsy way - in many cases failing to write syntactically correct Python.

I'm torn, as I don't want to be an old man luddite shouting at the clouds "LLMs are garbage", and plenty of reasonable people seem to do well with them. But my experience is rather poor. So, maybe I'm holding it wrong?

It's not only failures, to be fair. I found it fairly good at writing a lot of benign code, like tests, and simple tools I wouldn't otherwise bother with that save me a few mins here and there. But certainly nothing great. It's also good at general queries and design questions. But it's not actually doing my job of being a programmer.

Googling the topic mostly yields various grifters' exclusive online courses on no-code get-rich-quick agents packed with AdWords keywords, or hyper-optimised answers about having hundreds of stored prompts tuned for the latest agent, but I'm hoping for higher-quality answers here.

Comments

mahaekoh•3w ago
I'm in the same boat. I've been working with a more junior engineer who's ecstatic about AI coding, and I'm slowly settling into the position that, for those of us who have developed tons of opinions about how things should be done, trying to transfer all that experience to an AI through prompting is just not efficient. I've grown comfortable with saying it's easier for me to do it myself than to keep prompting and adjusting prompts. For a more junior engineer, though, it's a lot easier to accept what the AI did, and as long as it's functional, their opinions aren't strong enough to spark the urge to keep adjusting. There's just a different utility curve for different people.

Does that mean we’ll get worse (or less opinionated) code over time? Maybe. I used to tell my team that code should be written to be easily understood by maintainers, but if all the maintainers are AI and they don’t care, does it matter?

FWIW, I still reach for Claude once in a while, and I find its response useful maybe one out of ten times, particularly when dealing with code I don't feel the need to learn or maintain in the long run. But if reviewing Claude's code requires me to learn the code base properly, I often might as well write it myself.

seanmcdirmid•3w ago
I’m in the opposite boat, having trouble instructing my colleagues on how to get the same success with AI coding that I’ve realized. The issue is that you spend effort “working” the AI to get things done, but at the end of it your only artifact is a bunch of CLI commands executed and…how are you going to describe that?

AI instructions for AI coding really need to be their own code somehow, so programmers can more successfully share their experiences.

zahlman•3w ago
> but at the end of it your only artifact is a bunch of CLI commands executed

That sounds like a failure of process. Executing the commands is supposed to result in a Git commit history, and in principle nothing prevents logging the agent session. I'm told that some users even prompt the AI afterwards to summarize what happened in the session, write .md files to record "lessons learned", etc.

seanmcdirmid•3w ago
That isn't what I meant: you could save the entire CLI session and still not have something that can be shared easily. You need to document things like "try this or that", and even then it still isn't very shareable.

enobrev•3w ago
I haven't used Cursor, so I'm not sure I can be much help there. I've mostly been using Claude Code, with the IntelliJ IDEs for code reviews when necessary. Over the past year I've moved to almost entirely coding via agent. Maybe my input will be helpful.

One very important thing to keep in mind is context management. Every time your agent reads a file, searches documentation, answers a question, writes a file, or otherwise iterates on a problem, the context grows. The larger the context gets, the dumber the responses; it will basically start forgetting earlier parts of the conversation. To be explicit about this, I've disabled "auto-compact" in Claude Code, and when I see a warning that the context is getting too big, I cut things off, maybe ask the agent to commit or write a summary, and then /compact or /clear. It's important to figure out the context limits of the model you're using and stay comfortably within them.

Next, I generally treat the agent like a mid-level engineer who answers to me. That is to say, I do not try to convince it to code like I do, instead I treat it like a member on my team. When I'm on a team, we stick to standards and use tools like prettier, etc to keep the code in shape. My personal preferences go out the window, unless there's solid technical reason for others to follow them.

With that out of the way, the general loop is to plan with the agent, spec the work to be done, let the agent do the work, review, and repeat. To start, I converse with the agent directly. I'm not writing a spec, I'm discussing the problem with the agent and asking the agent to write the spec. We review, and discuss, and once our decisions are aligned and documented, I'll ask it to break down how it would implement the plan we've agreed upon.

From there I'll keep the context size in mind. If implementation is a multi-hour endeavor, I'll work with the agent to break down the problem into pieces that should ideally fit into the context window. Otherwise, by this point the agent will have asked me "would you like me to go ahead and get started?" and I'll let it get started.

Once it's done, I'll ask it to run lint, typechecks, automated testing, do a code review of what's in the current git workspace, compare the changes to the spec, do my own code reviews, run it myself, whatever is needed to make sure what was written solves the problem.
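
Something like this little script is what I mean by the check step. The specific tools here (ruff, mypy, pytest) and the script name are just placeholder assumptions; substitute whatever lint/typecheck/test commands your repo actually uses:

    # check.py - a rough sketch of the verification gate I ask the agent to run
    # after each chunk of work. ruff/mypy/pytest are assumptions about the
    # project's tooling; swap in whatever your repo actually uses.
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],   # lint
        ["mypy", "src"],          # type checks
        ["pytest", "-q"],         # automated tests
    ]

    def main() -> int:
        for cmd in CHECKS:
            print("$ " + " ".join(cmd))
            if subprocess.run(cmd).returncode != 0:
                print("FAILED: " + " ".join(cmd))
                return 1
        print("All checks passed.")
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Having one command that either passes or fails also gives the agent something unambiguous to iterate against.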

In general, I'd say it's a bad idea to just let the agent go off on its own with a giant task. It should be iterative and communicative. If the task is too big, it WILL take shortcuts. You can probably get an agent to rewrite your whole codebase with a big fancy prompt and a few markdown files. But if you're not part of the process, there's a good chance it'll create a serious mess.

For what you're doing, I would likely ask the agent to read the mega Python file and explain it to me. Then I would discuss what it missed or got wrong, add additional context, and explain what needs to be done. Then I would ask it if it has any suggestions for how we should break it into submodules. If the plan looks good, run with it. If not, explain what you're going for and then ask how it would go about extracting the first submodule. If the plan looks good, ask it to write tests, let it extract the submodule, let it run the tests, review the results, do your own code review, tweak the formatting, Goto 10.
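
To make that concrete, here's the kind of cheap safety-net test I'd ask for before the first extraction. The module name (megafile) and the symbol names are made-up stand-ins for whatever your real mega module exports; the point is that the public import surface shouldn't change when the file becomes a package:

    # test_import_surface.py - a minimal sketch; `megafile` and the names below
    # are hypothetical placeholders for your real mega module and its exports.
    import megafile

    def test_public_symbols_survive_the_split():
        # Everything importable from the old single file should still be
        # importable once it's split into submodules (e.g. via re-exports
        # in megafile/__init__.py).
        for name in ["parse_input", "build_report", "main"]:
            assert hasattr(megafile, name), f"megafile.{name} went missing in the refactor"
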

zahlman•3w ago
> Next, I generally treat the agent like a mid-level engineer who answers to me. That is to say, I do not try to convince it to code like I do, instead I treat it like a member on my team. When I'm on a team, we stick to standards and use tools like prettier, etc to keep the code in shape. My personal preferences go out the window, unless there's solid technical reason for others to follow them.

Do you suppose it would work to prompt a separate agent to infer coding style preferences from your commits and then refactor the first agent's work to bring it in line?

KellyCriterion•3w ago
For Claude, I can recommend putting/organizing the relevant source files of your app into the project context/container.

I'm also in the opposite boat: Claude is such a boon, it has allowed me to boost my productivity. Though I've mainly used it for single functions which I'm integrating into the code base. I'd say I have a hit rate of 90%+ on the first prompt. Just yesterday I re-prompted a data visualization component that was developed in the first iteration also with Claude (but Sonnet); I had to do 3 simple prompts to get some heavy optimization that wasn't done in the first iteration.

I also have to say I really like its ability to write useful comments.

rich_sasha•3w ago
Yeah. It's good for tinkering around the edges, IME. But that's a far cry from replacing software developers!
