
Ask HN: Do you have any evidence that agentic coding works?

61•terabytest•3h ago
I've been trying to get agentic coding to work, but the dissonance between what I'm seeing online and what I'm able to achieve is doing my head in.

Is there real evidence, beyond hype, that agentic coding produces net-positive results? If any of you have actually got it to work, could you share (in detail) how you did it?

By "getting it to work" I mean:
* creating more value than technical debt, and
* producing code that’s structurally sound enough for someone responsible for the architecture to sign off on.

Lately I’ve seen a push toward minimal or nonexistent code review, with the claim that we should move from “validating architecture” to “validating behavior.” In practice, this seems to mean: don’t look at the code; if tests and CI pass, ship it. I can’t see how this holds up long-term. My expectation is that you end up with "spaghetti" code that works on the happy path but accumulates subtle, hard-to-debug failures over time.

When I tried using Codex on my existing codebases, with or without guardrails, half of my time went into fixing the subtle mistakes it made or the duplication it introduced.

Last weekend I tried building an iOS app for pet feeding reminders from scratch. I instructed Codex to research and propose an architectural blueprint for SwiftUI first. Then, I worked with it to write a spec describing what should be implemented and how.

The first implementation pass was surprisingly good, although it had a number of bugs. Things went downhill fast, however. I spent the rest of my weekend getting Codex to make things work, fix bugs without introducing new ones, and research best practices instead of making stuff up. Although I made it record new guidelines and guardrails as I found them, things didn't improve. In the end I just gave up.

I personally can't accept shipping unreviewed code. It feels wrong. The product has to work, but the code must also be high-quality.

Comments

damnitbuilds•3h ago
You are asking two very different questions here.

That is, you are asking whether using agents to write code is net-positive, and then you go on about not reviewing the code agents produce.

I suspect agents are often net-positive AND one has to review their code. Just like most people's code.

spolitry•1h ago
It seems that people feel code review is a cost, but time spent writing code is not a cost because it feels productive.
swiftcoder•38m ago
I don't think that's quite it - review is a recurring cost which you pay on every new PR, whereas writing code is a cost you pay once.

If you are continually accumulating technical debt due to an over-enthusiastic junior developer (or agent) churning out a lot of poorly conceived code, then the recurring costs will sink you in the long run.

proc0•2h ago
My experience is the same. In short, agents cannot plan ahead, or plan at a high level. This means they have a blind spot for design. Since they cannot design properly, the kinds of projects that are viable are limited to smaller scopes (not sure exactly how small, but in my experience extremely small and simple). Anything that exceeds this abstract threshold has a good chance of being a net negative, with most of the code being unmaintainable, unextensible, and unreliable.

Anyone who claims AI is great is not building a large or complex enough app, and when it works for their small project, they extrapolate to all possibilities. So because their example was generated from a prompt, it's incorrectly assumed that any prompt will also work. That doesn't necessarily follow.

The reality is that programming is widely underestimated. The perception is that it's just syntax in a text file, but it's really more like a giant abstract machine with moving parts. If you don't see the giant machine with moving parts, chances are you are not going to build good software. For AI to do this, it would require strong reasoning capabilities that let it derive logical structures, along with long-term planning and simulation of this abstract machine. I predict that if AI can do this, then it will be able to do every single other job, including physical jobs, as it would be able to reason within a robotic body in the physical world.

To summarize, people are underestimating programming, using their simple projects to incorrectly extrapolate to any possible prompt, and missing the hard part of programming which involves building abstract machines that work on first principles and mathematical logic.

linsomniac•1h ago
>Anyone who claims AI is great is not building a large or complex enough app

I can't speak for everyone, but lots of us fully understand that the AI tooling has limitations and realize there's a LOT of work that can be done within those limitations. Also, those limitations are expanding, so it's good to experiment to find out where they are.

Conversely, it seems like a lot of people are saying that AI is worthless because it can't build arbitrarily large apps.

I've recently used the AI tooling to make a DocuSign-like service and it did a fairly good job of it, requiring about a day's worth of my attention. That's not an amazingly complex app, but it's not nothing either. Ditto for a calorie-tracking web app. Not the most complex app, but companies are making legit money off them, if you want a tangible measure of "worth".

antonvs•55m ago
> Anyone who claims AI is great is not building a large or complex enough app

That might be true for agentic coding (caveat below), but AI in the hands of expert users can be very useful - "great" - in building large and complex apps. It's just that it has to be guided and reviewed by the human expert.

As for agentic coding, it may depend on the app. For example, Steve Yegge's "beads" system is over a quarter million lines of allegedly vibe-coded Go code. But developing a CLI like that may be a sweet spot for LLMs; it doesn't have all the messiness of typical business system requirements.

znsksjjs•7m ago
> For example, Steve Yegge's "beads" system is over a quarter million lines of allegedly vibe-coded Go code. But developing a CLI like that may be a sweet spot

Is that really a success? I was just reading an article talking about how sloppy and poorly implemented it is: https://lucumr.pocoo.org/2026/1/18/agent-psychosis/

I guess it depends on what you’re looking to get out of it.

edude03•1h ago
I have the same experience despite using Claude every day. As a funny anecdote:

Someone I know wrote the code and the unit tests for a new feature with an agent. The code was subtly wrong. Fine, it happens. But worse, the 30 or so tests they added added 10 minutes to the test run time, and they all essentially amounted to `expect(true).to.be(true)` because the LLM had worked around the code not working in the tests.
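A minimal Python sketch of why such tests are worthless (the `apply_discount` function and its bug are hypothetical, not taken from the anecdote): a test that only asserts a constant passes whatever the code does, while a test that pins down one input/output pair fails on the bug.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical feature under test, with a deliberate bug."""
    return price - percent  # bug: should be price * (1 - percent / 100)


def test_vacuous() -> bool:
    # Equivalent of `expect(true).to.be(true)`: never calls the code under test.
    return True


def test_behaviour() -> bool:
    # Pins down one concrete input/output pair, so the bug above makes it fail.
    return abs(apply_discount(200.0, 10.0) - 180.0) < 1e-9


print(test_vacuous())    # True
print(test_behaviour())  # False
```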

antonvs•1h ago
> they all essentially amounted to `expect(true).to.be(true)` because the LLM had worked around the code not working in the tests

A very human solution

monooso•48m ago
There was an article on HN last week (?) which described this exact behaviour in the newer models.

Older, less "capable", models would fail to accomplish a task. Newer models would cheat, and provide a worthless but apparently functional solution.

Hopefully someone with a larger context window than myself can recall the article in question.

SatvikBeri•15m ago
I think that article was basically wrong. They asked the agent not to provide any commentary, then gave an unsolvable task, and wanted the agent to state that the task was impossible. So they were basically testing which instructions the agent would refuse to follow.

Purely anecdotally, I've found agents have gotten much better at asking clarifying questions, stating that two requirements are incompatible and asking which one to change, and so on.

https://spectrum.ieee.org/ai-coding-degrades

jermaustin1•41m ago
This happens with me every time I try to get Claude to write tests. I've given up on it. Instead I will write the tests if I really care enough to have tests.
sReinwald•13m ago
From my experience: TDD helps here - write (or have AI write) tests first, review them as the spec, then let it implement.

But when I use Claude Code, I also supervise it somewhat closely. I don't let it go wild, and if it starts to make changes to existing tests it better have a damn good reason or it gets the hose again.

The failure mode here is letting the AI manage both the implementation and the testing. May as well ask high schoolers to grade their own exams. Everyone got an A+, how surprising!
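One way to enforce the "don't touch existing tests" rule mechanically, sketched in Python under the assumption that tests live in `test_*.py` files (the helper names are hypothetical and not part of Claude Code): fingerprint the reviewed test files before the agent runs, then list any that were modified or deleted afterwards.

```python
import hashlib
from pathlib import Path


def fingerprint_tests(root: Path, pattern: str = "test_*.py") -> dict[str, str]:
    """Map each test file (path relative to root) to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob(pattern))
    }


def changed_tests(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return existing test files that were modified or deleted since `before`.

    Newly added test files are not flagged; only edits to reviewed tests are.
    """
    return sorted(name for name, digest in before.items() if after.get(name) != digest)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "test_spec.py").write_text("assert add(2, 2) == 4\n")
        before = fingerprint_tests(root)
        # Simulate an agent "fixing" the test instead of the code.
        (root / "test_spec.py").write_text("assert True\n")
        print(changed_tests(before, fingerprint_tests(root)))  # ['test_spec.py']
```

A non-empty result is the signal to stop and ask the agent for its "damn good reason" before accepting the change.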

linesofcode•1h ago
When you first began learning how to program were you building and shipping apps the next day? No.

Agentic programming is a skill-set and a muscle you need to develop just like you did with coding in the past.

Things didn’t just suddenly go downhill after an arbitrary tipping point - what happened is you hit a knowledge gap in the tooling and gave up.

Reflect on what went wrong and use that knowledge next time you work with the agent.

For example, investing the time in building a strong test suite and testing strategy ahead of time which both you and the agent can rely on.

Being able to manage the agent and getting quality results on a large, complex codebase is a skill in itself; it won’t happen overnight.

It takes practice and repetition with these tools to level up, just like anything else.

terabytest•51m ago
Your point is fair, but it rests on a major assumption I'd question: that the only limit lies with the user, and the tooling itself has none. What if it’s more like “you can’t squeeze blood from a stone”? That is, agentic coding may simply have no greater potential than what I've already tried. To be fair I haven't gone all the way in trying to make it work but, even if some minor workarounds exist, the full promise being hyped might not be realistically attainable.
linesofcode•40m ago
How can one judge potential without fully understanding or having used it to its full potential?

I don’t think agentic programming is some promised land of instant code without bugs.

It’s just a force multiplier for what you can do.

koakuma-chan•1h ago
> The product has to work, but the code must also be high-quality.

I think in most cases the speed at which AI can produce code outweighs technical debt, etc.

Gazoche•1h ago
But the thing with debt is that it has to be paid eventually.
cdelsolar•1h ago
not if you get acquired
Boxxed•57m ago
Is your argument that it's now someone else's problem? That it must be paid, just by someone else? Thanks, I hate it.
koakuma-chan•36m ago
You will probably be able to just keep throwing AI at it in the coming years, as memory systems improve, if not already.
pzo•41m ago
Projects also have to pay off financially. We have been here before: startups used to move fast and break things, and once the MVP was validated they would slow down and fix things, or even rewrite on a new tech stack/architecture. Now you can validate an idea even faster with AI. And there is probably a lot of code that you write once or throw away, internal tools, etc.
stavros•1h ago
Yes, agentic coding works and has massive value. No, you can't just deploy code unreviewed.

Still takes much less time for me to review the plan and output than write the code myself.

znsksjjs•11m ago
> much less time for me to review the plan and output

So typing was a bottleneck for you? I’ve only found this true when I’m a novice in an area. Once I’m experienced, typing is an inconsequential amount of time. Understanding the theory of mind that composes the system is easily the largest time sink in my day to day.

sirwhinesalot•1h ago
The only approach I've tried that seems to work reasonably well, and consistently, was the following:

Make a commit.

Give Claude a task that's not particularly open-ended; the closer to pure "monkey work" boilerplate nonsense the task is, the better (which is also the sort of code I don't want to deal with myself).

Preferably it should be something that only touches a file or two in the codebase, unless it is a trivial refactor (like changing the same method call all over the place).

Make sure it is set to planning mode and let it come up with a plan.

Review the plan.

Let it implement the plan.

If it works, great, move on to review. I've seen it one-shot some pretty annoying tasks like porting code from one platform to another.

If there are obvious mistakes (program doesn't build, tests don't pass, etc.) then a few more iterations usually fix the issue.

If there are subtle mistakes, make a branch and have it try again. If it fails, then this is beyond what it can do, abort the branch and solve the issue myself.

Review and clean up the code it wrote; it's usually a lot messier than it needs to be. This also allows me to take ownership of the code. I now know what it does and how it works.
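The decision points in the workflow above can be written as a small function. This is a hypothetical Python sketch: the three outcome categories come from the steps above, but the retry limits are assumed knobs, and treating obvious failures that survive several rounds like subtle ones is an assumption the workflow doesn't state.

```python
from enum import Enum, auto


class Outcome(Enum):
    WORKS = auto()            # builds, tests pass, behaves as intended
    OBVIOUS_FAILURE = auto()  # doesn't build, tests fail, etc.
    SUBTLE_FAILURE = auto()   # runs, but something is quietly wrong


def next_action(outcome: Outcome, fix_rounds: int, branch_retries: int,
                max_fix_rounds: int = 3, max_branch_retries: int = 1) -> str:
    """Return the next step after an agent's implementation pass.

    Assumes a commit was made before the task started, so a failed attempt can
    always be thrown away. The default limits are guesses, not part of the workflow.
    """
    if outcome is Outcome.WORKS:
        return "review and clean up"
    if outcome is Outcome.OBVIOUS_FAILURE and fix_rounds < max_fix_rounds:
        return "iterate with the agent"
    # Subtle mistakes (and, by assumption, obvious ones that survive several rounds).
    if branch_retries < max_branch_retries:
        return "retry on a fresh branch"
    return "abort the branch and do it by hand"
```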

I don't bother giving it guidelines or guardrails or anything of the sort, it can't follow them reliably. Even something as simple as "This project uses CMake, build it like this" was repeatedly ignored as it kept trying to invoke the makefile directly and in the wrong folder.

This doesn't save me all that much time, since the review and cleanup can take long, but it serves as a great unblocker.

I also use it as a rubber duck that can talk back, and as a documentation source. It's pretty good for that.

This idea of having an army of agents all working together on the codebase is hilarious to me. Replace "agents" with "juniors I hired on Fiverr with anterograde amnesia" and it's about how well it goes.

laylower•18m ago
That's the way.
lostmsu•1h ago
It works in the sense that there are lots of professional (as in they earn money from software engineering) developers out there who do work of exactly the same quality. I would even bet they are the majority (or at least were prior to late 2024).
afavour•1h ago
I’ve heard coding agents best described as a fleet of junior developers available to you 24/7 and I think that’s about right. With the added downside that they don’t really learn as they go so they will forever be junior developers (until models get better).

There are projects where throwing a dozen junior developers at the problem can work but they’re very basic CRUD type things.

PlatoIsADisease•1h ago
Since we are on this topic, how would I make an agent that does this job:

I am writing automation software that interfaces with a legacy Windows CAD program. Depending on the automation, I just need a picture of the part. Sometimes I need part thickness. Sometimes I need to delete parts. Etc... It's very much interacting with the CAD system and checking the CAD file or output for desired results.

I was considering something that would take screenshots and send it back for checks. Not sure what platforms can do this. I am stumped how Visual Studio works with this, there are a bunch of pieces like servers, agents, etc...

Even a how-to link would work for me. I imagine this would be extremely custom.

WillAdams•59m ago
What controls the legacy CAD app? Are you using AutoLISP? or VB scripting? Or something else?
recroad•1h ago
Works pretty great for me, especially spec-driven development using OpenSpec:

- Cleaner code
- Easily 5x speed minimum
- Better docs, designs
- Focus more on the product than the mechanics
- More time for family

kitd•17m ago
Really interested in your workflow using OpenSpec. How do you start off a project with it? And what does a typical code change look like?
jorgeleo•1h ago
I did the same experiment as you, and this is what I learned:

https://www.linkedin.com/pulse/concrete-vibe-coding-jorge-va...

The bottom line is this:

* The developer stops being a developer and becomes a product designer with high technical skills.

  * This is a different set of skills than a developer or a product owner currently has. It is a mix of both, and the expectations of how agentic development works need to be adjusted.
* Agents will behave like junior developers: they can type very fast and produce something that has a high probability of working. Their priority will be to make it work, not maintainability, scalability, etc. Agents can achieve those if you detail how to produce them.

  * Working with an agent feels more like mentoring the AI than asking and receiving.
* When I start to work on a product that will be vibe coded, I need to have all the user stories, the code architecture, and the whole system clear in my head; then I can start to tell the agent what to build, and correct and annotate the code-quality decisions in the md files so it remembers them.

* Use TDD: ask the agent to create the tests, and then code to the tests. Don't correct the bugs yourself; make the agent correct them and explain why each one is a bug, especially with code design decisions. Store those in an AGENTS.md file at the root of the project.

There are more things that can be done to guide the agent, but I need to be able to articulate the direction of the coding clearly. On the other hand, I don't worry about implementation details like how to use libraries and APIs that I am not familiar with; the agent just writes and I test.

Currently I am working on a product and I can tell you, working no more than 10 hours a week (2 hours here, 3 there, leaving the agent working while I am having dinner with family), I am progressing, I would say, 5 to 10 times faster than without it. So, yeah, it works, but I had to adjust how I do my job.

lukebechtel•1h ago
1. Start with a plan. Get AI to help you make it, and edit.

2. Part of the plan should be automated tests. AI can make these for you too, but you should spot check for reasonable behavior.

3. Use Claude 4.5 Opus

4. Use Git, get the AI to check in its work in meaningful chunks, on its own git branch.

5. Ask the AI to keep an append-only developer log as a markdown file, and to update it whenever its state significantly changes, or it makes a large discovery, or it is "surprised" by anything.
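A minimal Python sketch of step 5, assuming a markdown file such as `DEVLOG.md` (a hypothetical name) and one bullet per entry. Opening the file in append mode is what makes the log append-only from the tool's side; the agent still has to be told not to rewrite it by other means.

```python
from datetime import datetime, timezone
from pathlib import Path

# The three triggers from the list above; the tag names themselves are illustrative.
ENTRY_KINDS = {"state change", "discovery", "surprise"}


def append_log_entry(log_path: Path, kind: str, message: str) -> None:
    """Append one timestamped bullet to a markdown developer log."""
    if kind not in ENTRY_KINDS:
        raise ValueError(f"unknown entry kind: {kind!r}")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    # Mode "a" only ever appends, so earlier entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} [{kind}] {message}\n")
```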

baal80spam•55m ago
> Use Claude 4.5 Opus

In my org we are experimenting with agentic flows, and we've noticed that model choice matters especially for autonomy.

GPT-5.2 performed much better for long-running tasks. It stayed focused, followed instructions, and completed work more reliably.

Opus 4.5 tended to stop earlier and take shortcuts to hand control back sooner.

vessenes•59m ago
Yep, it works. Like anything, getting the most out of these tools is its own (human) skill.

With that in mind, a couple of comments - think of the coding agents as personalities with blind spots. A code review by all of them and a synthesis step is a good idea. In fact currently popular is the “rule of 5” which suggests you need the LLM to review five times, and to vary the level of review, e.g. bugs, architecture, structure, etc. Anecdotally, I find this is extremely effective.
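A rough Python sketch of multiple review passes followed by a synthesis step. `ask` is a hypothetical stand-in for whatever sends a prompt to a model and returns its findings; only bugs, architecture and structure come from the comment, the other two focus areas are illustrative, and the "synthesis" here is reduced to de-duplication.

```python
from typing import Callable

# Five passes with different focus; the last two entries are assumptions.
REVIEW_FOCUSES = ["bugs", "architecture", "structure", "tests", "readability"]


def multi_pass_review(diff: str, ask: Callable[[str], list[str]]) -> list[str]:
    """Run one review pass per focus area, then merge the findings."""
    findings: list[str] = []
    for focus in REVIEW_FOCUSES:
        prompt = f"Review this change strictly for {focus}. List concrete issues.\n\n{diff}"
        findings.extend(ask(prompt))
    # Simplified synthesis: drop blanks and duplicates, keeping first-seen order.
    return list(dict.fromkeys(f.strip() for f in findings if f.strip()))
```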

Right now, Claude is in my opinion the best coding agent out there. With Claude Code, the best harnesses are starting to automate the review/PR process a bit, but the hand-holding around bugs is real.

I also really like Yegge’s beads for LLMs keeping state and track of what they’re doing — upshot, I suggest you install beads, load Claude, run ‘!bd prime’ and say “Give me a full, thorough code review for all sorts of bugs, architecture, incorrect tests, specification, usability, code bugs, plus anything else you see, and write out beads based on your findings.” Then you could have Claude (or codex) work through them. But you’ll probably find a fresh eye will save time, e.g. give Claude a try for a day.

Your ‘duplicated code’ complaint is likely an artifact of how codex interacts with your codebase - codex in particular likes to load smaller chunks of code in to do work, and sometimes it can get too little context. You can always just cat the relevant files right into the context, which can be helpful.

Finally, iOS is a tough target — I’d expect a few more bumps. The vast bulk of iOS apps are not up on GitHub, so there’s less facility in the coding models.

And any front-end work doesn’t really have good native visual harnesses set up (although Claude has the Claude Chrome extension for web UIs). So there’s going to be more back and forth.

Anyway - if you’re a career engineer, I’d tell you - learn this stuff. It’s going to be how you work in very short order. If you’re a hobbyist, have a good time and do whatever you want.

CjHuber•51m ago
I still don't get what beads needs a daemon or a db for. After a while of using 'bd --no-daemon --no-db' I was sick of it and switched to beans, and my agents seem to be able to make use of it much better: on the one hand it's directly editable by them, as it's just markdown; on the other hand, the CLI still gives them structure and makes the thing queryable.
traceroute66•57m ago
I am in the same boat as you.

The only positive agentic coding experience I had was using it as a "translator" from some old unmaintained shell + C code to Go.

I gave it the old code, told it to translate to Go. I pre-installed a compiled C binary and told it to validate its work using interop tests.

It took about four hours of what the vibecoding lovers call "prompt engineering" but at the end I have to admit it did give me a pretty decent "translation".
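The interop-test idea is differential testing: feed the same inputs to the original program and the port, and flag any disagreement. A Python sketch under assumptions: both programs are modeled as `str -> str` functions, `wrap_binary` shows how an external binary could be adapted via `subprocess.run`, and the word-count pair is a made-up stand-in for the real C code and Go translation.

```python
import random
import subprocess
from typing import Callable

Program = Callable[[str], str]


def wrap_binary(path: str) -> Program:
    """Adapt an external executable (e.g. the original C build) to a str -> str function."""
    def run(stdin_text: str) -> str:
        result = subprocess.run([path], input=stdin_text, capture_output=True,
                                text=True, check=True)
        return result.stdout
    return run


def random_inputs(n: int, alphabet: str = "ab ", max_len: int = 8, seed: int = 0) -> list[str]:
    """Seeded random strings, so a failing case can be reproduced."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
            for _ in range(n)]


def differential_test(reference: Program, candidate: Program, inputs: list[str]) -> list[str]:
    """Return every input on which the candidate disagrees with the reference."""
    return [x for x in inputs if reference(x) != candidate(x)]


# Hypothetical stand-ins for the original program and its translation.
def reference_wc(text: str) -> str:
    return str(len(text.split()))


def buggy_port_wc(text: str) -> str:
    return str(len(text.split(" ")))  # bug: repeated spaces produce empty "words"


print(differential_test(reference_wc, buggy_port_wc, ["a b", "a  b"]))  # ['a  b']
```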

However for everything else I have tried (and yes, vibecoders, "tried" means very tightly defined tasks) all I have ever got is over-engineered vibecoding slop.

The worst part of it is that because the typical cut-off window is anywhere between 6–18 months prior, you get slop that is full of deprecated code, because there is almost always a newer/more efficient way to do things. Even in languages like Go. The difference between an AI-slop answer for Go 1.20 and a human-coded Go 1.24/1.25 one can be substantial.

3vidence•56m ago
Googler opinions are my own.

If agentic coding worked as well as people claimed on large codebases, I would be seeing a massive shift at my job... I'm really not seeing it.

We have access to pretty much all the latest and greatest internally at no cost and it still seems the majority of code is still written and reviewed by people.

AI assisted coding has been a huge help to everyone but straight up agentic coding seems like it does not scale to these very large codebases. You need to keep it on the rails ALL THE TIME.

strange_quark•4m ago
Yup, same experience here at a much smaller company. Despite management pushing AI coding really hard for at least 6 months and having unlimited access to every popular model and tool, most code still seems to be produced and reviewed by humans.

I still mostly write my own code, and from what I’ve seen of our Claude Code usage, just asking it questions and generating occasional boilerplate and one-off scripts puts me in the top quartile of users. There are some people who are all in and have it write everything for them, but there doesn’t seem to be any evidence they’re more productive.

saikatsg•56m ago
> Scaling long-running autonomous coding https://news.ycombinator.com/item?id=46624541
rzmmm•47m ago
I'm not sure OP is looking for evidence like this. There are many optimistic articles from people or organizations who are selling AI products, AI courses, or AI newsletters.
terabytest•41m ago
This is exactly the issue I have with what I'm seeing around: lots of "here's something impressive we did" but nearly nothing in terms of how it was actually achieved in clear, reproducible detail.
PaulHoule•53m ago
Treat it as a pair programmer. Ask it questions like "How do I?", "When I do X, Y happens, why is that?", "I think Z, prove me wrong" or "I want to do P, how do you think we should do it?"

Feed it little tasks (30 s-5 min) and if you don't like this or that about the code it gives you either tell it something like

   Rewrite the selection so it uses const, ? and :
or edit something yourself and say

   I edited what you wrote to make it my own,  what do you think about my changes?
If you want to use it as a junior dev who gets sent off to do tickets and comes back three days later with a patch that will fail code review, be my guest, but I greatly enjoy working with a tight feedback loop.
highspeedbus•53m ago
Honestly, I only use coding agents when I feel too lazy to type lots of boilerplate code.

As in "Please write just this one for me". Even still, I take care to review each line produced. The key is making small changes at a time.

Otherwise, I type out and think about everything being done when in ‘Flow State’. I don't like the feeling of vibe coding for long periods. It completely changes the way work is done, it takes away agency.

On a bit of a tangent, I can't get in Flow State when using agents. At least not as we usually define it.

fotcorn•50m ago
I used Claude Opus 4.5 inside Cursor to write RISC-V Vector/SIMD code. Specifically Depthwise Convolution and normal Convolution layers for a CNN.

I started out by letting it write a naive C version without intrinsics, and validated it against the PyTorch version.

Then I asked it (and two other models, Gemini 3.0 and GPT 5.1) to come up with some ideas on how to make it faster using SIMD vector instructions and write those down as markdown files.

Finally, I started the agent loop by giving Cursor those three markdown files, the naive C code and some more information on how to compile the code, and also an SSH command where it can upload the program and test it.

It then tested a few different variants, ran them on the target (RISC-V SBC, OrangePI RV2) to check whether they improved runtime, and then continued from there. It did this 10 times, until it arrived at the final version.

The final code is very readable, and faster than any other library or compiler that I have found so far. I think the clear guardrails (output has to match the reference output from PyTorch exactly, performance must be better than before) make this work very well.
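The two guardrails (exact match against the reference, and strictly better runtime than the previous best) amount to an acceptance check like the following. This is a Python sketch of the idea only, not the C/RISC-V setup from the comment; candidate kernels are plain functions and `accept_candidates` is a hypothetical name.

```python
import time
from typing import Callable, Optional, Sequence

Kernel = Callable[[list], list]


def best_time(fn: Kernel, data: list, repeats: int) -> float:
    """Minimum wall-clock time over several runs, to reduce timing noise."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(data)
        times.append(time.perf_counter() - t0)
    return min(times)


def accept_candidates(candidates: Sequence[Kernel], reference_out: list, data: list,
                      baseline_seconds: float, repeats: int = 5) -> tuple[Optional[Kernel], float]:
    """Return the fastest candidate that passes both guardrails (or None) and its time."""
    best, best_seconds = None, baseline_seconds
    for cand in candidates:
        if cand(data) != reference_out:  # guardrail 1: output must match exactly
            continue
        seconds = best_time(cand, data, repeats)
        if seconds < best_seconds:       # guardrail 2: must beat the current best
            best, best_seconds = cand, seconds
    return best, best_seconds
```

Because a candidate that fails either check is simply discarded, the agent cannot "win" by producing fast but wrong output.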

sifar•13m ago
I am really surprised by this. While I know it can generate correct SIMD code, getting a performant version is non-trivial, especially for RVV, where the instruction choices and the underlying microarchitecture would significantly impact the performance.

IIRC, depthwise is memory bound, so the bar might be lower. Perhaps you can try something with higher compute intensity, like a matrix multiply. I have observed that it trips up with the columnar accesses for SIMD.
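A back-of-the-envelope way to see the memory-bound point is arithmetic intensity (FLOPs per byte moved). The Python sketch below assumes an idealized model: stride 1, same padding, fp32, and every input, weight and output moved to or from memory exactly once. Real kernels move more data, so these figures are upper bounds, and the layer sizes are arbitrary examples.

```python
def depthwise_conv_intensity(h: int, w: int, c: int, k: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for a k x k depthwise convolution (stride 1, 'same' padding)."""
    flops = 2 * h * w * c * k * k  # one multiply-add per kernel tap per output element
    elems_moved = h * w * c + h * w * c + c * k * k  # input + output + weights, each once
    return flops / (bytes_per_elem * elems_moved)


def matmul_intensity(n: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for an (n x n) by (n x n) matrix multiply."""
    flops = 2 * n ** 3
    elems_moved = 3 * n * n  # read A and B, write C, each once
    return flops / (bytes_per_elem * elems_moved)


print(f"{depthwise_conv_intensity(56, 56, 128, 3):.2f}")  # 2.25
print(f"{matmul_intensity(512):.1f}")                      # 85.3
```

In this model depthwise intensity stays near k²/4 no matter how large the layer is, while matmul's grows as n/6, which is why a matrix multiply is the harder test of whether the generated SIMD code actually uses the compute units well.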

camel-cdr•6m ago
can you share the code?
DustinBrett•46m ago
I've been having good results lately/finally with Opus 4.5 in Cursor. It still isn't one-shotting my entire task, but the 90% of the way it gets me is pretty close to what I wanted, which is better than in the past. I feel more confident in telling it to change things without it making it worse. I only use it at work so I can't share anything, but I can say I write less code by hand now that it's producing something acceptable.

For sysops stuff I have found it extremely useful: once it has MCPs into all relevant services, I use it as the first place I go to ask what is happening with something specific on the backend.

nathan_compton•43m ago
I think of coding agents more like "typing assistants" than programmers. If you know exactly what and how to do what you want, you can ask them to do it with clear instructions and save yourself the trouble of typing the code out.

Otherwise, they are bad.

tacone•38m ago
The way I see it, for non-trivial things you have to build your method piece by piece. Then things start to improve. It's a process of... developing a process.

Write a good AGENTS.md (or CLAUDE.md) and you'll see that code is more idiomatic. Ask it to keep a changelog. Have the LLM write a plan before starting code. Ask it to ask you questions. Write abstraction layers it (along with the fellow humans of course) can use without messing with the low-level detail every time.

In a way you have to develop a framework to guide the LLM behavior. It takes time.

devalexwells•35m ago
I have had similar questions, and am still evaluating here. However, I've been increasingly frustrated with the sheer volume of anecdotal evidence from yay and naysayers of LLM-assisted coding. I have personally felt increased productivity at times with it, and frustrations at others.

In order to research this better, I built (ironically, mostly vibe coded) a tool to run structured "self-experiments" on my own usage of AI. The idea is that I've set up a bunch of hypotheses I have around my own productivity/fulfillment/results with AI-assisted coding. The tool lets me establish those, then run "blocks" where I test a particular strategy for a time period (default 2 weeks). So, for example, I might have a "no AI" block followed by a "some AI" block followed by a "full agent all-in AI" block.

The tool is there to make doing check-ins easier, basically a tiny CLI wrapper around journaling that stays out of my way. It also does some static analysis on commit frequency, code produced, etc. but I haven't fleshed out that part of it much and have been doing manual analysis at the end of blocks.
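A tiny Python sketch of the block idea, assuming check-ins are `(date, score)` pairs on some self-chosen scale. The data model is hypothetical and not how the linked devex tool actually stores anything; it only shows back-to-back two-week blocks and per-block averages.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Block:
    name: str
    start: date
    days: int = 14  # two-week default, matching the description above

    @property
    def end(self) -> date:
        """First day after the block (exclusive end)."""
        return self.start + timedelta(days=self.days)


def schedule(names: list[str], first_day: date, days: int = 14) -> list[Block]:
    """Lay out back-to-back blocks, e.g. "no AI", then "some AI", then "all-in agent"."""
    blocks: list[Block] = []
    start = first_day
    for name in names:
        blocks.append(Block(name, start, days))
        start = blocks[-1].end
    return blocks


def block_for(day: date, blocks: list[Block]) -> Optional[str]:
    """Name of the block containing `day`, or None if it falls outside all blocks."""
    for b in blocks:
        if b.start <= day < b.end:
            return b.name
    return None


def mean_score_by_block(checkins: list[tuple[date, float]], blocks: list[Block]) -> dict[str, float]:
    """Average self-reported check-in scores per block; out-of-block check-ins are ignored."""
    grouped: dict[str, list[float]] = {}
    for day, score in checkins:
        name = block_for(day, blocks)
        if name is not None:
            grouped.setdefault(name, []).append(score)
    return {name: mean(scores) for name, scores in grouped.items()}
```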

For me this kind of self-tracking has been more helpful than hearsay, since I can directly point to periods where it was working well and try to figure out why or what I was working on. It's not fool-proof, obviously, but for me the intentionality has helped me get clearer answers.

Whether those results translate beyond a single engineer isn't a question I'm interested in answering and feels like a variant of developer metrics-black-hole, but maybe we'll get more rigorous experiments in time.

The tool is open source here (there may be bugs; I've only been using it a few weeks): https://github.com/wellwright-labs/devex

dpcan•33m ago
Yes, constantly.

I don’t know what I do differently, but I can get Cursor to do exactly what I want all the time.

Maybe it’s because it takes more time and effort, and I don’t connect to GitHub or actual databases, nor do I allow it to run terminal commands 99% of the time.

I have instructions for it to write up readme files of everything I need to know about what it has done. I’ve provided instructions and created an allow list of commands so it creates local backups of files before it touches them, and I always proceed through a plan process for any task that is slightly more complicated, followed by plan cleanup, and execution. I’m super specific about my tech stack and coding expectations too. Tests can be hard to prompt, I’ll sometimes just write those up by hand.
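A rough Python illustration of the two safeguards (a command allow list and a backup before each edit). The allow list and helper names are hypothetical, and this is not Cursor's built-in allow list, which is configured in its own settings. The operator check is deliberately crude and will also reject harmless commands that contain those characters inside quotes.

```python
import shlex
import shutil
import time
from pathlib import Path

# Hypothetical allow list; allowing a program allows all of its subcommands too.
ALLOWED_PROGRAMS = {"ls", "cat", "git", "pytest"}

# Crude guard against chaining, piping, substitution and redirection.
SHELL_OPERATORS = (";", "&", "|", "`", "$(", ">", "<", "\n")


def is_allowed(command_line: str) -> bool:
    """True only if the program is on the allow list and nothing is chained onto it."""
    if any(op in command_line for op in SHELL_OPERATORS):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:  # e.g. an unterminated quote
        return False
    return bool(argv) and argv[0] in ALLOWED_PROGRAMS


def backup_before_edit(path: Path, backup_dir: Path) -> Path:
    """Copy `path` into `backup_dir` before it is modified and return the copy's path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A nanosecond timestamp keeps repeated backups of one file from overwriting each other.
    dest = backup_dir / f"{path.name}.{time.time_ns()}.bak"
    shutil.copy2(path, dest)
    return dest
```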

Also, I’ve never had to pay over my $60 a month pro plan price tag. I can’t figure out how others are even doing this.

At any rate, I think the problem appears to be the blind commands of “make this thing, make it good, no bugs” and “this broke. Fix!” I kid you not, I see this all the time with devs. Not at all saying this is what you do, just saying it’s out there.

And “high quality code” doesn’t actually mean anything. You have to define what that means to you. Good code to me may be slop to you, but who knows unless it is defined.

cwoolfe•31m ago
Hang in there. Yes it is possible; I do it every day. I also do iOS and my current setup is: Cursor + Claude Opus 4.5.

You still need to think about how you would solve the problem as an engineer and break down the task into a right-sized chunk of work. i.e. If 4 things need to change, start with the most fundamental change which has no other dependencies.
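Picking "the most fundamental change which has no other dependencies" first is a topological sort. A Python sketch using the standard library's `graphlib` (Python 3.9+), with hypothetical change names for an app like the one in the original post; each change maps to the set of changes it depends on.

```python
from graphlib import TopologicalSorter


def change_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order changes so each one comes after everything it depends on.

    Raises graphlib.CycleError if two changes depend on each other.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical breakdown of one feature into four changes.
changes = {
    "data model": set(),
    "persistence": {"data model"},
    "reminder scheduling": {"data model", "persistence"},
    "SwiftUI screen": {"reminder scheduling"},
}
print(change_order(changes))
# ['data model', 'persistence', 'reminder scheduling', 'SwiftUI screen']
```

Each item in the resulting order is a candidate for one right-sized agent task, handed over only once the changes before it have been reviewed.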

Also, it is important to manage the context window. For a new task, start a new "chat" (new agent). Stay on topic. You'll be limited to about five back-and-forths before performance starts to suffer. (Cursor shows a visual indicator of this in the form of the circle/wheel icon.)

For larger tasks, tap the Plan button first, and guide it to the correct architecture you are looking for. Then hit build. Review what it did. If a section of code isn't high-quality, tell Claude how to change it. If it fails, then reject the change.

It's a tool that can make you 2-10x more productive if you learn to use it well.

allisdust•30m ago
You need to perturb the token distribution by overlaying multiple passes. Any strategy that does this would work.

jaxn•23m ago
I have a small-ish vertical SaaS that is used by ~700. I have enabled our customer success team to fix bugs. I approve the PRs, but they have fixed a surprising number of issues.

SatvikBeri•18m ago
Sure, here are my own examples:

* I came up with a list of 9 performance improvement ideas for an expensive pipeline. Most of these were really boring and tedious to implement (basically a lot of special cases), and I wasn't sure which would work, so I had Claude try them all. It made prototypes with bad code quality, but they tested the core ideas. One approach cut the run time by 50%. I rewrote it with better code, and it has saved my company about $6,000/month.

* My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier.

* When AWS RDS costs spiked one month, I set Claude Code to investigate, and it found that the cause was a misconfigured backup setting.

* I'll use Claude to throw together a bunch of visualizations for some data to help me investigate.

* I'll often give Claude the type signature for a function and ask it to write the function. It generally gets this about 85% right.
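
The first bullet's approach is to let the agent prototype every idea and keep only what measurably wins. Before you trust a prototype's timing, you need to confirm it still gives the same answer. Below is a minimal Python harness for that. It assumes a pure function you can call on sample data. `baseline`, `candidate_generator`, and the sum-of-squares workload are hypothetical stand-ins for the real pipeline.

```python
import timeit
from typing import Callable


def baseline(xs: list[int]) -> int:
    """Hypothetical reference implementation: sum of squares of the even numbers."""
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x * x
    return total


def candidate_generator(xs: list[int]) -> int:
    """Hypothetical prototype of the same computation."""
    return sum(x * x for x in xs if x % 2 == 0)


def compare(candidates: dict[str, Callable[[list[int]], int]],
            data: list[int], number: int = 20) -> dict[str, float]:
    """Reject any candidate whose output differs from baseline, then time the rest."""
    expected = baseline(data)
    results: dict[str, float] = {}
    for name, fn in candidates.items():
        if fn(data) != expected:
            raise ValueError(f"{name} changed the result")
        # Total seconds for `number` calls; the lambda runs immediately, so `fn` is current.
        results[name] = timeit.timeit(lambda: fn(data), number=number)
    return results
```

Checking correctness first matters most for the prototypes described in the bullet. Their code quality was admittedly bad, so a fast but wrong candidate should fail loudly rather than win the benchmark.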

dlandis•14m ago
> Last weekend I tried building an iOS app for pet feeding reminders from scratch.

Just start smaller. I'm not sure why people try to jump immediately to creating an entire app when they haven't even gotten any net-positive results at all yet. Just start using it for small time saving activities and then you will naturally figure out how to gradually expand the scope of what you can use it for.

nickphx•10m ago
lol no. it is all fomo clown marketing. they make outlandish claims and all fall short of producing anything more than noise.

cat_plus_plus•9m ago
Don't blame the tools. Given a clear description (overall design, various methods to add, inputs, outputs), Google Antigravity often writes better zero-shot code than an average human engineer: consistent checks for special cases, local optimizations, extensive comments, thorough test coverage. As for reviews, the real focus should be reviewing your own code, whichever tool you used to write it (vi or an agentic AI IDE), rather than having someone else review it. The latter is a safety/mentorship tool in the best circumstances, and all too often just an excuse for senior architects to assert their dominance and justify their own existence, at the cost of unnecessary stress and delayed shipping.

As for using AI, the key is to view yourself as a technical lead, not a people manager. You don't stop coding completely or treat the underlying frameworks as a black box; you just do less of it. But at some point, fixing a bug yourself is faster than writing a page of text explaining exactly how you want it fixed. And when you don't know the programming language, giving pseudocode or sample code in another language can be super handy.


Ask HN: Do you have any evidence that agentic coding works?

61•terabytest•3h ago•60 comments
