A human expert needs to identify the need for software, decide what the software should do, figure out what's feasible to deliver, build the first version (AI can help a bunch here), evaluate what they've built, show it to users, talk to them about whether it's fit for purpose, iterate based on their feedback, deploy and communicate the value of the software, and manage its existence and continued evolution in the future.
Some of that stuff can be handled by non-developer humans working with LLMs, but a human expert who understands code will be able to do this stuff a whole lot more effectively.
I guess the big question is whether experienced product management types can pick up enough coding literacy to work like this without programmers, or whether programmers can pick up enough PM skills to work without PMs.
My money is on both roles continuing to exist and benefit from each other, in a partnership that produces results a lot faster because the previously slow "writing the code" part is a lot faster than it used to be.
AI video is an incredible tool, but it can't make movies.
It's almost as if all of these models are an exoskeleton for people that already know what they're doing. But you still need an expert in the loop.
To me this appears to be a very time-dependent assertion. 5 years ago, AI couldn't generate a good movie frame. 2 years ago, AI couldn't generate a good shot, but now in 2025, AI can generate a not-too-shabby scene. If capabilities continue improving at this rate (e.g. as they have with AI being able to generate full musical albums), I wouldn't bet against AI being able to generate a decent feature film in the next decade. It might take longer until it's the sort of thing we'd present at festivals, but I just don't see a clear barrier any more.
Looking at it from another perspective, if an AI driven task currently requires "an expert in the loop" to navigate things by offering the appropriate prompts, evaluating and iterating on the AI generated content, then there's nothing clear to stop us from training the next generation of AI to include that expert's competency.
Taking it into full extrapolation mode, the thing that current-generation AIs really don't have is the human experience that leads to a creative drive, but once we have robotic agents among us, these would arguably be able to start gathering "experiences" that they could then mine to write and produce "their own" stories.
Humans are sharply declining in this ability at the same time. Most of what Hollywood churns out now is superhero slop, forced-diversity spin-offs, awful remakes of classics, and awkward comebacks for yesteryear's leading men.
I know it's not a movie but I could've happily watched "Nothing, Forever" for the rest of my life. That was creative, chaotic, hilarious, and wildly entertaining.
Meanwhile I watched the human-created War Of The Worlds (2025) last weekend... The less said, the better.
I'd argue that they can't, at least on a short timeframe. Not because LLMs can't generate a program or product that works, but that there needs to be enough understanding of how the implementation works to fix any complex issues that come up.
One experience I had is that I had tried to generate a MITM HTTPS proxy that uses Netty using Claude, and while it generated a pile of code that looked good on the surface, it didn't actually work. Not knowing enough about Netty, I wasn't able to debug why it didn't work and trying to fix it with the LLM didn't help either.
Maybe PMs can pick up enough knowledge over time to be able to implement products that can scale, but by that time they'd effectively be a software engineer, minus the writing code part.
If all juniors are using AI, or even worse, no juniors are ever hired, I'm not sure how we can produce those seniors at the scale we currently do. Which isn't even that large a scale.
Just this past weekend, I designed and wrote code (in TypeScript) that I don't think LLMs can even come close to writing in years. I have a subscription to a frontier LLM, but lately I find myself using it like 25% of the time.
At a certain level the software architecture problems I'm solving, drawing upon decades of understanding about maintainable, performant, and verifiable design of data structures and types and algorithms, are things LLMs cannot even begin to grasp.
At that point, I find that attempting to use an LLM to even draft an initial solution is a waste of time. At best I can use it for initial brainstorming.
The people saying LLMs can code are hard for me to understand. They are good for simple bash scripts and complex refactoring and drafting basic code idioms and that's about it.
And even for these tasks the amount of hand-holding I need to do is substantial. At least Gemini Pro/CLI seems good at one-shot performance, before its context gets poisoned.
"Take X and Y I've written before, some documentation for Z, an example W from that repo, now smash them together and build the thing I need"
Just today, I spent an hour documenting a function that performs a set of complex scientific simulations. Defined the function input structure, the outputs, and put a bunch of references in the body to function calls it would use.
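To make that concrete, the kind of scaffolding described here might look roughly like this (a hypothetical TypeScript sketch, not the actual code; every name below is invented):

    /**
     * Run a batch of diffusion simulations on a 2D grid and report
     * summary statistics per time step.
     *
     * Requirements for the model:
     * - Must be deterministic for a given `seed`.
     * - Grid setup, the per-step update, and the statistics must live in
     *   separate functions (initializeGrid, stepGrid, summarizeGrid)
     *   so each can be unit tested on its own.
     */
    interface SimulationInput {
      gridSize: { width: number; height: number };
      timeSteps: number;
      diffusionCoefficient: number;
      seed: number;
    }

    interface SimulationOutput {
      perStepMeans: number[]; // mean concentration at each time step
      finalGrid: number[][];  // grid state after the final step
    }

    function runSimulation(input: SimulationInput): SimulationOutput {
      // TODO (for the model): build the grid with initializeGrid(input),
      // advance it with stepGrid(...) for input.timeSteps iterations,
      // and collect statistics with summarizeGrid(...).
      throw new Error("not yet implemented");
    }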
I then spent 15 minutes explaining to the free version of ChatGPT what the function needs to do both in scientific terms and in computer architecture terms (e.g. what needed to be separated out for unit tests). Then it asked me to answer ~15 questions it had (most were yes/no, it took about 5 min), then it output around 700 lines of code.
It took me about 5 minutes to get it working, since it had a few typos. It ran.
Then I spent another 15 minutes laying out all the categories of unit tests and sanity tests I wanted it to write. It produced ~1500 lines of tests. It took me half an hour to read through them all, adjusting some edge cases that didn't make sense to me and adjusting the code accordingly. And a couple cases where it was testing the right part of the code, but had made valiant but wrong guesses as to what the scientifically correct answer would be. All the tests then passed.
All in all, a little over two hours. And it ran perfectly. In contrast, writing the code and tests myself entirely by hand would have taken at least a couple of entire days.
So when you say they're good for those simple things you list and "that's about it", I couldn't disagree more. In fact, I find myself relying on them more and more for the hardest scientific and algorithmic programming, when I provide the design and the code is relatively self-contained and tests can ensure correctness. I do the thinking, it does the coding.
So that's... math. A very well defined problem, defined very well. Any decent programmer should be able to produce working software from that, and it's great that ChatGPT was able to help you get it done much faster than you could have done it yourself. That's also the kind of project that's very well suited for unit testing, because again: math. Functions with well defined inputs, outputs, and no side-effects.
Only a tiny subset of software development projects are like that though.
Right: the majority of software development is things like "build a REST API for these three database tables" or "build a contact form with these four fields" or "write unit tests for this new function" or "update my YAML CI configuration to run this extra command".
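To put a size on the first item in that list, a minimal sketch of that kind of endpoint (assuming Express and better-sqlite3, with an invented contacts table) is only a few dozen lines:

    import express from "express";
    import Database from "better-sqlite3";

    // Hypothetical example: one table of an imagined three-table schema.
    const db = new Database("app.db");
    const app = express();
    app.use(express.json());

    // List all contacts.
    app.get("/contacts", (_req, res) => {
      res.json(db.prepare("SELECT * FROM contacts").all());
    });

    // Create a contact and return it with its new id.
    app.post("/contacts", (req, res) => {
      const { name, email } = req.body;
      const info = db
        .prepare("INSERT INTO contacts (name, email) VALUES (?, ?)")
        .run(name, email);
      res.status(201).json({ id: info.lastInsertRowid, name, email });
    });

    app.listen(3000);

Repeat for the other two tables and you have the whole task; nothing in it requires deep design judgment.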
The example you gave sounds like the problem is deterministic, even if composed of many moving parts. That's one way of looking at complexity.
When I talk about complex problems I'm not just talking about intricate problems. I'm talking about problems where the "problem" is design, not just implementing a design, and that is where LLMs struggle a lot.
Example: I want to design a strongly typed fluent API interface to some functionality. Even knowing how to shape the fluent interface so that it is powerful, intuitive, well/strongly typed, and maintainable is a deep art.
The intuitive design constraints that I'm designing under would be hard to even explain to an LLM.
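To give a toy illustration of the kind of thing meant (a hedged TypeScript sketch with invented names; the real design problem is far richer than this): encode the builder's state in the types so that only valid call orders compile.

    // Typestate-style fluent builder: the phantom `_stage` field lets the
    // compiler distinguish Query<"empty"> from Query<"ready">, so .run()
    // is only callable once a table and at least one column are chosen.
    type Stage = "empty" | "hasTable" | "ready";

    class Query<S extends Stage> {
      private readonly _stage?: S; // phantom marker, never assigned

      private constructor(
        private readonly table: string | undefined,
        private readonly columns: readonly string[],
      ) {}

      static create(): Query<"empty"> {
        return new Query(undefined, []) as Query<"empty">;
      }

      from(this: Query<"empty">, table: string): Query<"hasTable"> {
        return new Query(table, this.columns) as Query<"hasTable">;
      }

      select(this: Query<"hasTable"> | Query<"ready">, column: string): Query<"ready"> {
        return new Query(this.table, [...this.columns, column]) as Query<"ready">;
      }

      run(this: Query<"ready">): string {
        return `SELECT ${this.columns.join(", ")} FROM ${this.table}`;
      }
    }

    // Compiles: Query.create().from("users").select("id").select("name").run()
    // Compiler errors: Query.create().select("id"), or calling .run() too early

Getting even this toy right takes judgment about variance, inference, and error messages; scaling it to a real API surface is where the deep art comes in.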
It is a lot faster at typing than I am.
In my experience implementing algorithms from a good comprehensive description and keeping track of data models is where they shine the most.
Anything less is setting it up for failure...
One reason I know an LLM can't come close to my design is this: I've written something that works (what a typical senior engineer might write), but this is not enough. I have evaluated it critically (drawing on my experience with long-lived software), rewritten it again to better meet the targets above, and repeated this process several times. I don't know what would make an LLM go: now that kind of works, but is this the most intuitive, well-typed, and maintainable design there could be?
My previous design required looping through all known resources asking "can actor X perform action Y on this?". The new design gets to generate a very complex but thoroughly tested SQL query instead.
Applying that new design and updating the hundreds of related tests would have taken me weeks. I got it done in two days.
Here's a diff that captures most of the work: https://github.com/simonw/datasette/compare/e951f7e81f038e43...
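For readers who don't want to dig through the diff, the general shape of that shift (with entirely invented table and column names, not the actual Datasette schema) is roughly this:

    // Before: one permission check per resource.
    //   for (const resource of allResources) {
    //     if (await canPerform(actor, "view", resource)) visible.push(resource);
    //   }
    //
    // After: a single (complex, but heavily tested) query that returns only
    // the resources the actor is allowed to act on.
    const visibleResourcesSql = `
      SELECT r.id, r.name
      FROM resources AS r
      LEFT JOIN permission_rules AS p
        ON p.resource_id = r.id
       AND p.actor_id    = :actor_id
       AND p.action      = :action
      WHERE COALESCE(p.allow, r.default_allow) = 1
    `;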
E.g. just updating Bootstrap to Angular Bootstrap: it didn't carry over how I placed the dropdowns (basically using dropdown-end), so everything was out of view on desktop and mobile.
It forgot the Transloco I used everywhere and just used default English (happens a lot).
It suggested code that fixed one bug (expression property recursion), but then LINQ to SQL was broken.
Upgrading to Angular 17 in an ASP.NET Core app: I knew it used Vite now, but it also required a browser folder to deploy. 20 changes down the road, I noticed something in my UI wasn't updated in dev (fast commits for my side project, I don't build locally); it wasn't deploying anything related to Angular anymore...
I had two files named ApplicationDbContext and it took the one from the wrong monolith module.
It sometimes adds files in the wrong directory, e.g. some modules were made with feature folders.
It sometimes forgets to update my Ocelot gateway, or updates the compressed version. ...
Note: I documented my architecture in e.g. Cline, but I use multiple agents to experiment with.
Tldr: it's an expert beginner programmer.
I'm beginning to suspect a lot of my great experiences with coding agents come from the fact that they can run tests to confirm they haven't broken anything.
that's like, 90% of the code people are writing
This works well for humans too, but custom analysers are abstract and not many devs know how to write them, so they are mostly provided by library authors. However, being able to generate them via LLMs makes them so much more accessible, and IMHO is a game changer for enforcing an architecture.
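As a concrete (if language-swapped) illustration: in the TypeScript ecosystem the closest analogue to a custom analyzer is a custom ESLint rule, and an architecture-enforcing one can be tiny. A hedged sketch (hypothetical layering rule, invented paths):

    import type { Rule } from "eslint";

    // Forbid files under src/ui/ from importing anything under src/db/,
    // i.e. enforce "the UI layer must not touch persistence directly".
    const noDbFromUi: Rule.RuleModule = {
      meta: {
        type: "problem",
        messages: {
          noDbFromUi: "UI code must not import the persistence layer directly.",
        },
        schema: [],
      },
      create(context) {
        return {
          ImportDeclaration(node) {
            const importer = context.getFilename();     // file doing the import
            const imported = String(node.source.value); // module being imported
            if (importer.includes("/src/ui/") && imported.includes("/db/")) {
              context.report({ node, messageId: "noDbFromUi" });
            }
          },
        };
      },
    };

    export default noDbFromUi;

Once a rule like this exists, every agent run (and every human PR) gets the architecture check for free at lint time.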
I've been exploring this direction a lot lately, and it feels very promising.
I also want C# semantics even more closely integrated with the LLM. I'm imagining a stronger version of Structured Model Outputs that knows all the valid tokens that could be generated following a "." (including instance methods, extension properties, etc.) and prevents invalid code from even being generated in the first place, rather than needing a roundtrip through a Roslyn analyzer or the compiler to feed more text back to the model. (Perhaps there's some leeway to allow calls to not-yet-written methods to be generated.) Or maybe this idea is just a crutch I'm inventing for current frontier models and future models will be smart enough that they don't need it?
I have written many program analyses (though never any for C#; I’ll have to check it out), and my experience is that they are quite challenging to write. Many are research-level CS, so well outside the skill set of your average vibe coder. I’m wondering if you have some insight about LLM generated code that has not occurred to me…
I have a strong opinion that AI will boost the importance of people with “special knowledge” more than anyone else regardless of role. So engineers with deep knowledge of a system or PMs with deep knowledge of a domain.
In a lot of ways I think that will lead to stronger delivery teams. As a designer—the best performing teams I've been on have individuals with a core competency, but a lot of overlap in other areas. Product managers with strong engineering instincts, engineers with strong design instincts, etc. When there is less ambiguity in communication, teams deliver better software.
Longer-term I'm unsure. Maybe there is some sort of fusion into all-purpose product people able to do everything?
I have a few scattered thoughts here but I think you’re caught up on how things are done now.
A human expert in a field is the customer.
Do you think, say, gpt5 pro can’t talk to them about a problem and what’s reasonable to try and build in software?
It can build a thing, with tests, run stuff and return to a user.
It can take feedback (talking to people is one of the major things LLMs have solved).
They can iterate (see: Codex), deploy, and they can absolutely write copy.
What do you really think in this list they can’t do?
For simplicity reduce it to a relatively basic crud app. We know that they can make these over several steps. We know they can manage the ui pretty well, do incremental work etc. What’s missing?
I think something huge here is that some of the software engineering roles and management become exceptionally fast and cheap. That means you don’t need to have as many users to be worthwhile writing code to solve a problem. Entirely personal software becomes economically viable. I don’t need to communicate value for the problem my app has solved because it’s solved it for me.
Frankly most of the “AI can’t ever do my thing” comments come across as the same as “nobody can estimate my tasks they’re so unique” we see every time something comes up about planning. Most business relevant SE isn’t complex logically, interestingly unique or frankly hard. It’s just a different language to speak.
Disclaimer: a client of mine is working on making software simpler to build and I’m looking at the AI side, but I have these views regardless.
You'll get the occasional high agency non-technical customer who decides to learn how to get these things done with LLMs but they'll be a pretty rare breed.
I know that right now few want to sit in front of claude code, but it's just not that big of a leap to move this up a layer. Workflows do this even without the models getting better.
The one key point is that I am keenly aware of what I can and cannot do. With these new superpowers, I often catch myself doing too much, and I end up doing a lot more rewrites than a real engineer would. But I can see Dunning Kruger playing out everywhere when people say they can vibe code an entire product.
I’m sure it’ll improve over time but it won’t be nearly as easy as making ai good at coding.
A while ago I discovered that Claude, left to its own devices, has been doing the LLM equivalent of Ctrl-C/Ctrl-V for almost every component it's created in an ever growing .NET/React/Typescript side project for months on end.
It was legitimately baffling seeing the degree to which it had avoided reusing literally any shared code in favor of updating the exact same thing in 19 places every time a color needed to be tweaked or something. The craziest example was a pretty central dashboard view with navigation tabs in a sidebar where it had been maintaining two almost identical implementations just to display a slightly different tab structure for logged in vs logged out users.
I've now been directing it to de-spaghetti things when I spot good opportunities and added more best practices to CLAUDE.md (with mixed results) so things are gradually getting more manageable, but it really shook my confidence in its ability to architect, well, anything on its own without micromanagement.
Yes, they're bad now, but they'll get better in a year.
If the generative ability is good enough for small snippets of code, it's good enough for larger software that's better organized. Maybe the models don't have enough of the right kind of training data, or the agents don't have the right reasoning algorithms. But it is there.
If we’re simply measuring model benchmarks, I don’t know if they’re much better than a few years ago… but if we’re looking at how applicable the tools are, I would say we’re leaps and bounds beyond where we were.
Candidly, it's awful. There are countless situations where it would be faster for me to edit the file directly (CSS, I'm looking at you!).
With that said, I've been surprised at how far the coding agents are able to go[0], and a lot less surprised about where I need to step in.
Things that seem to help:
1. Always create a plan/debug markdown file
2. Prompt the agent to ask questions/present multiple solutions
3. Use git more than normal (squash ugly commits on merge)
Planning is key to avoid half-brained solutions, but having "specs" for debug is almost more important. The LLM will happily dive down a path of editing as few files as possible to fix the bug/error/etc. This, unchecked, can often lead to very messy code.
Prompting the agent to ask questions/present multiple solutions allows me to stay "in control" over the how something is built.
I now basically commit every time a plan or debug step is complete. I've tried having the LLM control git, but I feel that it eats into the context a bit too much. Ideally a 3rd party "agent" would handle this.
The last thing I'll mention is that Claude Code (Sonnet 4.5) is still very token-happy, in that it eagerly goes above and beyond when not always necessary. Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault. For both cases, this is where planning up-front is super useful.
[0]Caveat: the projects are either Typescript web apps or Rust utilities, can't speak to performance on other languages/domains.
Noting your caveat but I’m doing this with Python and your experience is very different from mine.
The "it's awful" admission is due to the "don't look at code" aspect of this exercise.
For real work, my split is more like 80% LLM/20% non-LLM, and I read all the code. It's much faster!
Always create a plan/debug markdown file
Very much necessary, especially with Claude I find. It auto-compacts so often (Sonnet 4.5) and it instantly goes AWOL-stupid after that. I then make it re-read the markdown file, so we can actually continue without it forgetting about 90% of what we just did/talked about.
Prompt the agent to ask questions/present multiple solutions
I find that only helps marginally. They all output so much text it's not even funny, and that's with one "solution". I don't get how people can stand reading all that nonsense they spew, especially Claude. Everything is insta-ready to deploy, problem solved, root cause found, go hit the big red button that might destroy the earth in a mushroom cloud. I learned real fast to only skim what it says and ignore all that crap. (I never tried to "change its personality" for real. I did try to tell it to always use the scientific method and prove its assumptions, but just like a junior dev it never does, and it just tells me stupid things it believes to be true that I then have to question.) Again, just like a junior dev, but it's my junior dev that's always on and available when I have time, and it does things while I do other stuff. And instead of me having to ask the junior after an hour or two what rabbit hole it went down and get them out of there, Claude and Codex usually visually ping the terminal before I even have time to notice. That's for when I don't have full-time focus on what I'm trying to do with the agents, which is why I do like using them.
The times when I am fully attentive, they're just soooo slow. And many many times I could do what they're doing faster or just as fast but without spending extra money and "environment". I've been trying to "only use AI agents for coding" for like a month or two now to see its positives and limitations and form my own opinion(s).
Prompting the agent to ask questions/present multiple solutions allows me to stay "in control" over the how something is built.
I find Claude's "Plan mode" is actually ideal. I just enable it and I don't have to tell it anything. While Codex "breaks out" from time to time and just starts coding even when I just ask it a question. If these machines ever take over, there's probably some record of me swearing at them and I will get a hitman on me. Unlike junior devs, I have no qualms about telling a model that it again ignored everything I told it. Ideally a 3rd party "agent" would handle this.
With sub-agents you can. Simple git interactions are perfect for sub-agents because not much can get lost in translation in the interface between the main agent and the sub-agent. Then again, I'm not sure how you lose that much context. I'd rather use a sub-agent for things like running the tests and linter on the whole project in the final steps, which spew a lot of unnecessary output.
Personally, I had a rather bad set of experiences with it controlling git without oversight, so I do that myself, since doing it myself is less taxing than approving everything it wants to do (I automatically allow Claude certain commands that are read-only, for investigations and reviewing things).
I hear this so much. It's almost like people think code quality is unrelated to how well the product works. As though you can have 1 without the other.
If your code quality is bad, your product will be bad. It may be good enough for a demo right now, but that doesn't mean it really "works".
Why? Modern hardware is so powerful that it allows for extremely inefficient code; even if some code runs a thousand times slower because it's badly programmed, it will still be so fast that it seems instant.
For the rest of the stuff, it has no relevance to the user of the software what the code is doing inside the chip, as long as the inputs and outputs function as they should. The user wants to give input and receive output; nothing else has any significance at all for her.
But that's just a small piece of the puzzle. I agree that the user only cares about what the product does and not how the product works, but the what is always related to how, even if that relationship is imperceptible to the user. A product with terrible code quality will have more frequent and longer outages (because debugging is harder), and it will take longer for new features to be added (because adding things is harder). The user will care about these things.
Could be because programming involves:
1. Long chains of logical reasoning, and
2. Applying abstract principles in practice (in this case, "best practices" of software engineering).
I think LLMs are currently bad at both of these things. They may well be among the things LLMs are worst at atm.
Also, there should be a big asterisk next to "can write code". LLMs do often produce correct code of some size and of certain kinds, but they can also fail at that too frequently.
I've generally found the quality of the .NET code to be quite good. It trips up sometimes when linters ping it for rules not normally enforced, but it does the job reasonably well.
The front-end JavaScript, though? It's both an absolute genius and a complete menace at the same time. It'll write reams of code to get things just right, but with no regard for human maintainability.
I lost an entire session to the fact that it cheerfully did:
npm install fabric
npm install -D @types/fabric
Now that might look fine, but a human would have realised that the typings package describes a completely different, outdated API; the package was last updated 6 years ago. Claude, however, didn't realise this, and wrote a ton of code that would pass unit tests but fail the type check. It'd run the type checker, rewrite it all to pass the type checker, only for it to now fail the unit tests.
Eventually it semi-gave up typing and did loads of (fabric as any) all over the place, so now it just gave runtime exceptions instead.
I intervened when I realised what it was doing, and found the root cause of its problems.
It was a complete blindspot because it just trusted both the library and the typechecker.
So yeah, if you want to snipe a vibe coder, suggest installing fabricjs with typings!
Improving this is what everyone's looking into now. Even larger models, context windows, adding reasoning, or something else might improve this one day.
For example, you can pull the library code to your working environment and install the coding agent there as well. Then you can ask them to read specific files, or even all files in the library. I believe (according to my personal experience) this would significantly decrease the possibility of hallucinating.
The human brain learns through mistakes, repetition, breaking down complex problems into simpler parts, and reimagining ideas. The hippocampus naturally discards memories that aren't strongly reinforced, so if you rely solely on AI, you're simply not going to remember much.
Any company claiming they've replaced engineers with AI has done so in an attempt to cover up the real reasons they've gotten rid of a few engineers. "AI automating our work" sounds much better to investors than "We overhired and have to downsize".
In some ways, this seems backwards. Once you have a demo that does the right thing, you have a spec, of sorts, for what's supposed to happen. Automated tooling that takes you from demo to production ready ought to be possible. That's a well-understood task. In restricted domains, such as CRUD apps, it might be automated without "AI".
To really get the most out of it though, you still need to have solid knowledge in your own field.
The difference is what we used to call the "ilities": Reliability, habitability, understandability, maintainability, securability, scalability, etc.
None of these things are about the primary function of the code, i.e. "it seems to work." In coding, "it seems to work" is good enough. In software engineering, it isn't.
The next step would be to have a model running continuously on a project with inputs from monitoring services, test coverage, product analytics, etc. Such an agent, powered by a sufficient model, could be considered an effective software engineer.
We’re not there today, but it doesn’t seem that far off.
I wonder if that same non-technical person that built the MVP with GenAI and requires a (human) technical assistance today, will need it tomorrow as well. Will the tooling be mature enough and lower the barrier enough for anyone to have a complete understanding about software engineering (monitoring services, test coverage, product analytics)?
I've played around with agent-only code bases (where I don't code at all), and had an agent hooked up to server logs, which would create an issue when it encountered errors; then an agent would fix the tickets, push to prod, check deployment statuses, etc. It worked well enough to see that this could easily become the future. (I also had Claude/Codex code that whole setup.)
Just for semantic nitpicking, I've zero-shotted heaps of small "software" projects that I use and then throw away. It doesn't count as a SaaS product, but I would still call it software.
An inevitable comment: "But I've seen AI code! So it must be able to build software"
An automated system that determines whether a system is correct (whatever that means) is harder to build than the coding agents themselves.
What time frame counts as "not that far off" to you?
If you tried to bet me that the market for talented software engineers would collapse within the next 10 years, I'd take it no question. 25 years, I think my odds are still better than yours. 50 years, I might not take the bet.