
When AI writes the software, who verifies it?

https://leodemoura.github.io/blog/2026/02/28/when-ai-writes-the-worlds-software.html
55•todsacerdoti•2h ago

Comments

rademaker•2h ago
In his latest essay, Leonardo de Moura makes a compelling case that if AI is going to write a significant portion of the world’s software, then verification must scale alongside generation. Testing and code review were never sufficient guarantees, even for human-written systems; with AI accelerating output, they become fundamentally inadequate. Leo argues that the only sustainable path forward is machine-checked formal verification — shifting effort from debugging to precise specification, and from informal reasoning to mathematical proof checked by a small, auditable kernel. This is precisely the vision behind Lean: a platform where programs and proofs coexist, enabling AI not just to generate code, but to generate code with correctness guarantees. Rather than slowing development, Lean-style verification enables trustworthy automation at scale.
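The kind of machine-checked verification described above can be illustrated with a toy Lean 4 snippet (my own sketch, not from the essay): a small program and a proof about it live side by side, and the proof is checked by Lean's small kernel rather than trusted on faith.

```lean
-- A program: list reversal, written as ordinary functional code.
def rev : List α → List α
  | []      => []
  | x :: xs => rev xs ++ [x]

-- A proof about that program, checked mechanically by Lean's kernel:
-- reversing a list preserves its length.
theorem rev_length (xs : List α) : (rev xs).length = xs.length := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [rev, ih]
```

The point of the essay's argument is that an AI could emit both halves, and a human only needs to trust the kernel and audit the statement of the theorem, not the generated code.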
righthand•1h ago
No one really. Code is for humans to read and for machines to compile and execute. LLMs are enabling people to just write the code and not have anyone read it. It's solving a problem that didn't really exist (we already had code generators before LLMs).

It's such an intoxicating copyright-abuse slot machine that a buddy who is building an OCaml+htmx tree editor told me "I always get stuck and end up going to the LLM to generate code. Usually when I get to the HTML part." I asked if he'd tried a debugger before that; he said "that's a good idea".

galbar•1h ago
This is something I've been wondering about...

If boilerplate was such a big issue, we should have worked on improving code generation. In fact, many tools and frameworks already did this:

* Rails has fantastic code generation for CRUD use cases
* IntelliJ IDEs have long been able to do many types of refactors and class generation that cover some of the boilerplate

I haven't reached a conclusion on this train of thought yet, though.

acedTrex•1h ago
No one does currently, and it's going to take a few very painful and high-profile failures of vital systems for this industry to RELEARN its lesson about the price of speed.

In fact it will probably need to happen a few times PER org for the dust to settle. It will take several years.

arscan•1h ago
Sure but industry cares about value (= benefit - price), not just price. Price could be astronomical, but that doesn’t matter if benefit is larger.
jcgrillo•57m ago
I feel like people used to talk about nines of uptime more. As in more than one. These days we've lost that: https://bsky.app/profile/jkachmar.com/post/3mg4u3e6nak2p

I recall a time, maybe around 2013-2017, when people were talking about 4 or 5 nines. But sometime around then the goalposts shifted, and instead of trying to make things as reliable as possible, it became more about seeing how unreliable things can get before anyone notices or cares. It turns out people will suffer through a lot if there's some marginal benefit. Remember what personal computers were like in the 1990s before memory protection? Vibe coding is just another chapter in that user-hostile epic. Convenient reliability, like this author describes (if it can be achieved), might actually make things better? But my money isn't on that.
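For readers who haven't done the arithmetic behind "4 or 5 nines": each extra nine cuts the allowed downtime by a factor of ten. A quick sketch:

```python
# Downtime budget per year implied by "N nines" of availability.
# Each additional nine divides the allowed outage time by 10.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes(nines: int) -> float:
    """Allowed downtime per year at an availability of N nines
    (e.g. 3 nines = 99.9%)."""
    return MINUTES_PER_YEAR / 10 ** nines

for n in range(1, 6):
    print(f"{n} nines: {downtime_minutes(n):,.2f} min/year")
# 4 nines allows roughly 53 minutes of downtime per year;
# 5 nines allows roughly 5 minutes.
```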

foolfoolz•1h ago
No one wants to believe this, but there will soon be a point when an AI code review meets your compliance requirements to go to production. Is that 2026? No. But it will come.
righthand•1h ago
We already have specifications though, so that's not different. What happens when the AI is wrong and won't let anyone deploy to production?
oakpond•1h ago
You do. Even the latest models still frequently write really weird code. The problem is some developers now just submit code for review that they didn't bother to read. You can tell. Code review is more important than ever imho.
MrDarcy•1h ago
It is remarkably effective to have Claude Code do the code review and assign a quality score (call it a grade) to the contribution, derived from your own expectations of quality.

Then don’t even bother looking at C work or below.
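The gate described above can be sketched in a few lines. This is a hypothetical illustration, not Claude Code's API: it assumes the review step hands back a letter grade, and the team only spends human attention above a threshold.

```python
# Hypothetical grade-gated review: an AI reviewer assigns a letter
# grade, and contributions below the bar are sent back without
# consuming human review time.

GRADE_ORDER = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def worth_human_review(grade: str, minimum: str = "B") -> bool:
    """True if the AI-assigned grade meets the bar for a human look."""
    return GRADE_ORDER[grade] >= GRADE_ORDER[minimum]

assert worth_human_review("A")
assert not worth_human_review("C")  # "don't even bother looking at C work"
```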

NitpickLawyer•1h ago
IME it works even better if you use another model for review. We've seen code written by Claude Code and reviewed by GPT 5.2/5.3 work very well.

Also works with planning before any coding sessions. Gemini + Opus + GPT-xhigh works to get a lot of questions answered before coding starts.

sausagefeet•1h ago
I agree with you. But I have to say, it is an uphill battle and all the incentives are against you.

1. AI is meant to make us go faster, reviews are slow, the AI is smart, let it go.

2. There are plenty of AI maximizers who think we should only be writing design docs and letting the AI go to town on them.

Maybe this is a great time to start a company: maximize the benefits of AI while you can, without someone who has never written a line of code telling you that your job is going to disappear in 12 months.

All the incentives are against someone who wants to use AI in a reasonable way, right now.

redhed•57m ago
I actually agree it's a good time to start a company. Lots of available software engineers who can actually understand code, AI at a level that can actually speed up development, and so many startups focusing on AI-wrapper slop that you can make a genuinely useful product and separate yourself from the herd.

Or you can be a grifter and make some AI wrapper yourself and cash out with some VC investment. So good time for a new company either way.

xienze•57m ago
> The problem is some developers now just submit code for review that they didn't bother to read.

Can you blame them? All the AI companies are saying “this does a better job than you ever could”, every discussion topic on AI includes at least one (totally organic, I’m sure) comment along the lines of “I’ve been developing software for over twenty years and these tools are going to replace me in six months. I’m learning how to be a plumber before I’m permanently unemployed.” So when Claude spits out something that seems to work with a short smoke test, how can you blame developers for thinking “damn the hype is real. LGTM”?

jf22•34m ago
I'm a 99% organic person (I suppose I have tooth fillings) and the new models write code better than I do.

I've been using LLMs for 14+ months now and they've exceeded my expectations.

xienze•5m ago
So are you learning a trade? Or do you somehow think you’ll be one of the developers “good enough” to remain employed?
bluefirebrand•18m ago
> Can you blame them?

Yes I absolutely can and do blame them

bradleykingz•33m ago
But it's so BORING. AI gets to do the fun part (writing code) and I'm stuck with the lame bits.

It's like watching someone else solve a puzzle, or watching someone else play a game vs playing it yourself (at least that's half as interesting as playing it through)

lukan•25m ago
For me the most fun part is getting something that works: designing the goal without micromanaging and getting lost in the details. I love AI for that, but it is hard to really own code this way. (At least I manually approve every change, or most of them, but still, verifying is hard.)
bitwize•6m ago
AI has really sharpened the line between the Master Builders of the world and the Lord Businesses along this question: What, exactly, is the "fun part" of programming? Is it simply having something that works? Or is it the process of going from not having it to having it through your own efforts and the sum total of decisions you made along the way?
_pdp_•1h ago
I think the issue goes even deeper than verification. Verification is technically possible. You could, in theory, build a C compiler or a browser and use existing tests to confirm it works.

The harder problem is discovery: how do you build something entirely new, something that has no existing test suite to validate against?

Verification works because someone has already defined what "correct" looks like. Perhaps there is a spec, or a reference implementation, or a set of expected behaviours. The system just has to match them.

But truly novel creation does not have ground truth to compare against and no predefined finish line. You are not just solving a problem. You are figuring out what the problem even is.
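The asymmetry described above is easy to see in miniature. When a ground truth exists, verification can be as simple as differential testing against it; the sketch below (my illustration, with stand-in names) checks a candidate implementation against a trusted reference. For a genuinely novel problem, there is no `reference` to hand the function.

```python
# Differential testing: verification is easy when "correct" is already
# defined by a trusted reference implementation.
import random

def candidate_sort(xs):
    """Stand-in for generated code under test."""
    return sorted(xs)

def verify_against_reference(impl, reference, trials=100):
    """Check that impl agrees with reference on random inputs."""
    for _ in range(trials):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        if impl(xs) != reference(xs):
            return False
    return True

assert verify_against_reference(candidate_sort, sorted)
```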

Avshalom•56m ago
Well that's a problem the software industry has been building for itself for decades.

Software has, since at least the adoption of "agile", created an industry culture of not just refusing to build to specs, but insisting that specs are impossible to get from a customer.

daveguy•23m ago
Agile hasn't been insisting that specs are impossible to get from a customer. They have been insisting that getting specs from a customer is best performed as a dynamic process. In my opinion, that's one of agile's most significant contributions. It lines up with a learning process that doesn't assume the programmer or the customer knows the best course ahead of time.
holtkam2•1h ago
At the end of the day you need humans who understand the business critical (or safety critical) systems that underpin the enterprise.

Someone needs to be held accountable when things go wrong. Someone needs to be able to explain to the CEO why this or that is impossible.

If you want to have AI generate all the code for your business critical software, fine, but you better make sure you understand it well. Sometimes the fastest path to deep understanding is just coding things out yourself - so be it.

This is why the truly critical software doesn’t get developed much faster when AI tools are introduced. The bottleneck isn’t how fast the code can be created, it’s how fast humans can construct their understanding before they put their careers on the line by deploying it.

Ofc… this doesn’t apply to prototypes, hackathons, POCs, etc. for those “low stakes” projects, vibe code away, if you wish.

lgl•1h ago
I'm in the process of building v2.0 of my app using opus 4.6 and largely agree with this.

It's pretty awesome but still does a lot of basic idiotic stuff. I was implementing a feature that required a global keyboard shortcut and asked Opus to define it, taking into account not to clash with common shortcuts. It built a field where only one modifier key was required. After I pointed out that this was not safe, since users could just define CTRL+C as the shortcut, and that we need more safeguards and at least two modifier keys, I got the usual "you're absolutely right" and it proceeded to require two modifier keys. But then it also created a huge blacklist of common shortcuts like copy, cut, paste, print, select all, etc., basically a bunch of single-modifier-key shortcuts. Once I mentioned that since we're already forcing two modifier keys that list is useless, it said I was right again and fixed it.

The counterpoint to this idiocy is that it's very good overall at a lot of what is (in my mind) much more complicated stuff. It's a .NET app, and stuff like creating models, viewmodels, and usercontrols, and setting up the entire hosting DI with pretty much all the best practices for .NET, it does pretty awesomely.

tl;dr is that training wheels are still mandatory imho

indymike•57m ago
Because of the scale of generated code, it is often the AI verifying the AI's work.
tartoran•54m ago
So who's verifying the AI doing the verifying or is it yet another AI layer doing that? If something goes wrong who's liable, the AI?
simonw•40m ago
The "Nearly half of AI-generated code fails basic security tests" link provided in this piece is not credible in my opinion. It's a very thinly backed vendor report from a company selling security scanning software.
muraiki•28m ago
The article says that AWS's Cedar authorization policy engine is written in Lean, but it's actually written in Dafny. Writing Dafny is a lot closer to writing "normal" code rather than the proofs you see in Lean. As a non-mathematician I gave up pretty early in the Lean tutorial, while in a recent prototype I learned enough Dafny to be semi-confident in reviewing Claude's Dafny code in about half a day.

The Dafny code formed a security kernel at the core of a service, enforcing invariants like that an audit log must always be written to prior to a mutating operation being performed. Of course I still had bugs, usually from specification problems (poor spec / design) or Claude not taking the proof far enough (proving only for one of a number of related types, which could also have been a specification problem on my part).

In the end I realized I'm writing a bunch of I/O-bound glue code, and plain ol' test-driven development was fine enough for my threat model. I can review Python code more quickly and accurately than Dafny (or the Go code it eventually had to link to), so I'm back to optimizing for humans again...
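The audit-log invariant described above can also be approximated at runtime rather than proved statically. A toy sketch (my illustration, not the commenter's Dafny code): a decorator that appends to the audit log before any mutating operation runs, so the invariant holds by construction, though unlike Dafny nothing checks that every mutation is actually decorated.

```python
# Runtime enforcement of "audit log is written before any mutation".
# A proof assistant like Dafny would guarantee this statically; here
# the decorator merely makes it hard to violate by construction.

audit_log: list = []

def audited(op):
    """Log the operation before running it, so the log entry always
    precedes the mutation."""
    def wrapper(*args, **kwargs):
        audit_log.append(f"{op.__name__}{args}")
        return op(*args, **kwargs)
    return wrapper

store: dict = {}

@audited
def put(key, value):
    store[key] = value

put("user", "alice")
assert audit_log and store["user"] == "alice"
```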

yoaviram•14m ago
I just finished writing a post about exactly this. Software development, as the act of manually producing code, is dying. A new discipline is being born. It is much closer to proper engineering.

Like an engineer overseeing the construction of a bridge, the job is not to lay bricks. It is to ensure the structure does not collapse.

The marginal cost of code is collapsing. That single fact changes everything.

https://nonstructured.com/zen-of-ai-coding/

MattDaEskimo•13m ago
Accountability then
yoaviram•11m ago
Anticipating modes of failure
bitwize•10m ago
Also AI.

I'm reluctant to verify my identity or age for any online services

https://neilzone.co.uk/2026/03/im-struggling-to-think-of-any-online-services-for-which-id-be-will...
642•speckx•5h ago•396 comments

Intel's make-or-break 18A process node debuts for data center with 288-core Xeon

https://www.tomshardware.com/pc-components/cpus/intels-make-or-break-18a-process-node-debuts-for-...
48•vanburen•37m ago•11 comments

GitHub Is Having Issues

https://www.githubstatus.com/incidents/n07yy1bk6kc4
63•Simpliplant•29m ago•27 comments

MacBook Pro with new M5 Pro and M5 Max

https://www.apple.com/newsroom/2026/03/apple-introduces-macbook-pro-with-all-new-m5-pro-and-m5-max/
472•scrlk•5h ago•465 comments

GPT‑5.3 Instant

https://openai.com/index/gpt-5-3-instant/
78•meetpateltech•1h ago•28 comments

Physics Girl: Super-Kamiokande – Imaging the sun by detecting neutrinos [video]

https://www.youtube.com/watch?v=B3m3AMRlYfc
271•pcdavid•4h ago•41 comments

Iran War Cost Tracker

https://iran-cost-ticker.com
26•TSiege•29m ago•23 comments

MacBook Air with M5

https://www.apple.com/newsroom/2026/03/apple-introduces-the-new-macbook-air-with-m5/
261•Garbage•5h ago•254 comments

Don't become an engineering manager

https://newsletter.manager.dev/p/dont-become-an-engineering-manager
189•flail•5h ago•130 comments

Claude's Cycles [pdf]

https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf
295•fs123•8h ago•143 comments

The Xkcd thing, now interactive

https://editor.p5js.org/isohedral/full/vJa5RiZWs
909•memalign•8h ago•127 comments

Why payment fees matter more than you think

https://cuencahighlife.com/why-payment-fees-matter-more-than-you-think/
16•dxs•1h ago•3 comments

Apple Studio Display and Studio Display XDR

https://www.apple.com/newsroom/2026/03/apple-unveils-new-studio-display-and-all-new-studio-displa...
150•victorbjorklund•5h ago•163 comments


Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents

41•atarus•5h ago•8 comments

Tell HN: GitHub Having Issues

38•Sytten•32m ago•19 comments

TorchLean: Formalizing Neural Networks in Lean

https://leandojo.org/torchlean.html
26•matt_d•2d ago•3 comments

Arm's Cortex X925: Reaching Desktop Performance

https://chipsandcheese.com/p/arms-cortex-x925-reaching-desktop
230•ingve•11h ago•126 comments

I'm losing the SEO battle for my own open source project

https://twitter.com/Gavriel_Cohen/status/2028821432759717930
319•devinitely•5h ago•166 comments

Show HN: Explain Curl Commands

https://github.com/akgitrepos/explain-my-curl
7•akgitrepos•2d ago•0 comments

Disable Your SSH access accidentally with scp

https://sny.sh/hypha/blog/scp
56•zdw•3d ago•17 comments

Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act

16•systima•9h ago•0 comments

Simplifying Application Architecture with Modular Design and MIM

https://codingfox.net.pl/posts/mim/
19•codingfox•10h ago•0 comments

The beauty and terror of modding Windows

https://windowsread.me/p/windhawk-explained
85•wild_pointer•8h ago•74 comments

Points on a ring: An interactive walkthrough of a popular math problem

https://growingswe.com/blog/points-on-ring
29•evakhoury•1d ago•5 comments

Most-read tech publications have lost over half their Google traffic since 2024

https://growtika.com/blog/tech-media-collapse
173•Growtika•5h ago•118 comments

Pass-Through of Tariffs: Evidence from European Wine Imports

https://www.nber.org/202603/digest/pass-through-tariffs-evidence-european-wine-imports
59•neehao•2h ago•64 comments

Show HN: Effective Git

https://github.com/nolasoft/okgit
13•nola-a•2d ago•1 comments

Meta’s AI smart glasses and data privacy concerns

https://www.svd.se/a/K8nrV4/metas-ai-smart-glasses-and-data-privacy-concerns-workers-say-we-see-e...
1323•sandbach•20h ago•743 comments

Simple screw counter

https://mitxela.com/projects/screwcounter
245•jk_tech•2d ago•66 comments