Ask HN: Why are AI coding agents not working for me?

3•rich_sasha•3w ago
I'm really trying to use them with an open mind. I'm writing detailed specs. On failure, I adjust the initial spec, rather than go down the spiral of asking for many adjustments. I'm using Claude Opus 4.5 inside Cursor. My ambitions are also quite low. The latest was to split a mega Python file into a few submodules according to a pretty simple criterion. It's not even that it failed, it is more about the how. It was trying to action the refactor by writing some Python one-liners to edit the file, in an extremely clumsy way - in many cases failing to write syntactically correct Python.

I'm torn, as I don't want to be an old man luddite shouting at the clouds "LLMs are garbage", and plenty of reasonable people seem to do well with them. But my experience is rather poor. So, maybe I'm holding it wrong?

It's not only failures, to be fair. I found it fairly good at writing a lot of benign code, like tests, and simple tools I wouldn't otherwise bother with that save me a few mins here and there. But certainly nothing great. It's also good at general queries and design questions. But it's not actually doing my job of being a programmer.

Googling the topic mostly yields various grifters' exclusive online courses on no-code get-rich-quick agents packed with AdWords keywords, or hyper-optimised answers about having hundreds of stored prompts tuned for the latest agent, but I'm hoping for higher-quality answers here.

Comments

mahaekoh•3w ago
I'm in the same boat. I've been working with a more junior engineer who's ecstatic about AI coding, and I'm slowly settling into the position that, for those of us who have developed tons of opinions about how things should be done, trying to transfer all that experience to an AI through prompting is just not efficient. I've grown comfortable with saying it's easier for me to do it myself than to keep prompting and adjusting prompts. For a more junior engineer, though, it's a lot easier to accept what the AI did, and as long as it's functional, their opinions aren't strong enough to spark the urge to keep adjusting. There's just a different utility curve for different people.

Does that mean we’ll get worse (or less opinionated) code over time? Maybe. I used to tell my team that code should be written to be easily understood by maintainers, but if all the maintainers are AI and they don’t care, does it matter?

FWIW, I still reach for Claude once in a while, and I find its response useful maybe one out of ten times, particularly when dealing with code I don't feel the need to learn or maintain in the long run. But if reviewing Claude's code requires me to learn the code base properly, I often might as well write it myself.

seanmcdirmid•3w ago
I’m in the opposite boat, having trouble instructing my colleagues on how to get the same success with AI coding that I’ve realized. The issue is that you spend effort “working” the AI to get things done, but at the end of it your only artifact is a bunch of CLI commands executed and…how are you going to describe that?

AI instructions for AI coding really need to be their own code somehow, so programmers can more successfully share their experiences.

zahlman•3w ago
> but at the end of it your only artifact is a bunch of CLI commands executed

That sounds like a failure of process. Executing the commands is supposed to result in a Git commit history, and in principle nothing prevents logging the agent session. I'm told that some users even prompt the AI afterwards to summarize what happened in the session, write .md files to record "lessons learned", etc.

seanmcdirmid•3w ago
That isn't what I meant: you could save the entire CLI session and still not have something that can be shared easily. You need to document things like "try this or that", and even then it still isn't very shareable.

enobrev•3w ago
I haven't used Cursor, so I'm not sure I can be much help there. I've mostly been using Claude Code, with the IntelliJ IDEs for code reviews when necessary. Over the past year I've moved to almost entirely coding via agent. Maybe my input will be helpful.

One very important thing to keep in mind is context management. Every time your agent reads a file, searches documentation, answers a question, writes a file, or otherwise iterates on a problem, the context grows. The larger the context gets, the dumber the responses; it will basically start forgetting earlier parts of the conversation. To be explicit about this, I've disabled "auto-compact" in Claude Code, and when I see a warning that the context is getting too big, I cut things off, maybe ask the agent to commit or write a summary, and then /compact or /clear. It's important to figure out the context limits of the model you're using and stay comfortably within them.

Next, I generally treat the agent like a mid-level engineer who answers to me. That is to say, I do not try to convince it to code like I do, instead I treat it like a member on my team. When I'm on a team, we stick to standards and use tools like prettier, etc to keep the code in shape. My personal preferences go out the window, unless there's solid technical reason for others to follow them.

With that out of the way, the general loop is to plan with the agent, spec the work to be done, let the agent do the work, review, and repeat. To start, I converse with the agent directly. I'm not writing a spec, I'm discussing the problem with the agent and asking the agent to write the spec. We review, and discuss, and once our decisions are aligned and documented, I'll ask it to break down how it would implement the plan we've agreed upon.

From there I'll keep the context size in mind. If implementation is a multi-hour endeavor, I'll work with the agent to break down the problem into pieces that should ideally fit into the context window. Otherwise, by this point the agent will have asked me "would you like me to go ahead and get started?" and I'll let it get started.

Once it's done, I'll ask it to run lint, typechecks, automated testing, do a code review of what's in the current git workspace, compare the changes to the spec, do my own code reviews, run it myself, whatever is needed to make sure what was written solves the problem.
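
Something like this little script is what I mean by the check step. The specific tools here (ruff, mypy, pytest) and the script name are just placeholder assumptions; substitute whatever lint/typecheck/test commands your repo actually uses:

    # check.py - a rough sketch of the verification gate I ask the agent to run
    # after each chunk of work. ruff/mypy/pytest are assumptions about the
    # project's tooling; swap in whatever your repo actually uses.
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],   # lint
        ["mypy", "src"],          # type checks
        ["pytest", "-q"],         # automated tests
    ]

    def main() -> int:
        for cmd in CHECKS:
            print("$ " + " ".join(cmd))
            if subprocess.run(cmd).returncode != 0:
                print("FAILED: " + " ".join(cmd))
                return 1
        print("All checks passed.")
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Having one command that either passes or fails also gives the agent something unambiguous to iterate against.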

In general, I'd say it's a bad idea to just let the agent go off on its own with a giant task. It should be iterative and communicative. If the task is too big, it WILL take shortcuts. You can probably get an agent to rewrite your whole codebase with a big fancy prompt and a few markdown files. But if you're not part of the process, there's a good chance it'll create a serious mess.

For what you're doing, I would likely ask the agent to read the mega Python file and explain it to me. Then I would discuss what it missed or got wrong, add additional context, and explain what needs to be done. Then I would ask it if it has any suggestions for how we should break it into submodules. If the plan looks good, run with it. If not, explain what you're going for and then ask how it would go about extracting the first submodule. If the plan looks good, ask it to write tests, let it extract the submodule, let it run the tests, review the results, do your own code review, tweak the formatting, Goto 10.
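
To make that concrete, here's the kind of cheap safety-net test I'd ask for before the first extraction. The module name (megafile) and the symbol names are made-up stand-ins for whatever your real mega module exports; the point is that the public import surface shouldn't change when the file becomes a package:

    # test_import_surface.py - a minimal sketch; `megafile` and the names below
    # are hypothetical placeholders for your real mega module and its exports.
    import megafile

    def test_public_symbols_survive_the_split():
        # Everything importable from the old single file should still be
        # importable once it's split into submodules (e.g. via re-exports
        # in megafile/__init__.py).
        for name in ["parse_input", "build_report", "main"]:
            assert hasattr(megafile, name), f"megafile.{name} went missing in the refactor"
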

zahlman•3w ago
> Next, I generally treat the agent like a mid-level engineer who answers to me. That is to say, I do not try to convince it to code like I do, instead I treat it like a member on my team. When I'm on a team, we stick to standards and use tools like prettier, etc to keep the code in shape. My personal preferences go out the window, unless there's solid technical reason for others to follow them.

Do you suppose it would work to prompt a separate agent to infer coding style preferences from your commits and then refactor the first agent's work to bring it in line?

KellyCriterion•3w ago
For Claude, I can recommend putting/organizing the relevant source files of your app into the project context/container.

I'm also in the opposite boat: Claude is such a boon, it has allowed me to boost my productivity. Though I've mainly used it for single functions which I'm integrating into the code base. I'd say I have a hit rate of 90%+ on the first prompt. Just yesterday I re-prompted a data visualization component that was developed in the first iteration also with Claude (but Sonnet); I had to do 3 simple prompts to get some heavy optimization that wasn't done in the first iteration.

I also have to say I really like its ability to write useful comments.

rich_sasha•3w ago
Yeah. It's good for tinkering around the edges, IME. But that's a far cry from replacing software developers!
