When AI Writes the Software, Who Verifies It?

https://leodemoura.github.io/blog/2026/02/28/when-ai-writes-the-worlds-software.html

32•todsacerdoti•2h ago

Comments

rademaker•1h ago

In his latest essay, Leonardo de Moura makes a compelling case that if AI is going to write a significant portion of the world’s software, then verification must scale alongside generation. Testing and code review were never sufficient guarantees, even for human-written systems; with AI accelerating output, they become fundamentally inadequate. Leo argues that the only sustainable path forward is machine-checked formal verification — shifting effort from debugging to precise specification, and from informal reasoning to mathematical proof checked by a small, auditable kernel. This is precisely the vision behind Lean: a platform where programs and proofs coexist, enabling AI not just to generate code, but to generate code with correctness guarantees. Rather than slowing development, Lean-style verification enables trustworthy automation at scale.
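To make "programs and proofs coexist" concrete, here is a minimal Lean 4 sketch. It is not taken from the essay: the names `double` and `double_eq_two_mul` are made up for illustration, and it assumes a recent Lean 4 toolchain where the `omega` tactic is available.

```lean
-- A tiny program: the function we want to trust.
def double (n : Nat) : Nat := n + n

-- A specification of that program, stated as a theorem.
-- Lean's kernel checks the proof; nobody has to re-read it by hand.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

If someone (human or AI) later changed `double` to, say, `n + n + 1`, the theorem would stop checking and the build would fail. In this style the reviewer's job shifts to reading the statement of the theorem rather than the implementation.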

righthand•54m ago

No one, really. Code is for humans to read and for machines to compile and execute. LLMs are enabling people to just write the code and not have anyone read it. It’s solving a problem that didn’t really exist (we already had code generators before LLMs).

It’s such an intoxicating copyright-abuse slot machine that a buddy who is building an OCaml+htmx tree editor told me “I always get stuck and end up going to the LLM to generate code. Usually when I get to the HTML part.” I asked if he had tried a debugger before that, and he said “that’s a good idea”.

galbar•21m ago

This is something I've been wondering about...

If boilerplate was such a big issue, we should have worked on improving code generation. In fact, many tools and frameworks exist that did this already:

* Rails has fantastic code generation for CRUD use cases
* IntelliJ IDEs have been able to do many types of refactors and class generation that included some of the boilerplate

I haven't reached a conclusion on this train of thought yet, though.

acedTrex•42m ago

No one does currently, and it's going to take a few very painful and high-profile failures of vital systems for this industry to RELEARN its lesson about the price of speed.

In fact it will probably need to happen a few times PER org for the dust to settle. It will take several years.

arscan•20m ago

Sure, but industry cares about value (= benefit - price), not just price. The price could be astronomical, but that doesn’t matter if the benefit is larger.

foolfoolz•40m ago

no one wants to believe this but there will be a point soon when an ai code review meets your compliance requirements to go to production. is that 2026? no. but it will come

righthand•31m ago

We already have specifications though, so that’s not different. What happens when the AI is wrong and won’t let anyone deploy to production?

oakpond•31m ago

You do. Even the latest models still frequently write really weird code. The problem is some developers now just submit code for review that they didn't bother to read. You can tell. Code review is more important than ever imho.

MrDarcy•27m ago

It is remarkably effective to have Claude Code do the code review and assign a quality score, call it a grade, to the contribution derived from your own expectations of quality.

Then don’t even bother looking at C work or below.

NitpickLawyer•22m ago

IME it works even better if you use another model for review. We've seen code by cc and review by gpt5.2/3 work very well.

Also works with planning before any coding sessions. Gemini + Opus + GPT-xhigh works to get a lot of questions answered before coding starts.

sausagefeet•20m ago

I agree with you. But I have to say, it is an uphill battle and all the incentives are against you.

1. AI is meant to make us go faster, reviews are slow, the AI is smart, let it go.

2. There are plenty of AI maximizers who only think we should be writing design docs and letting the AI go to town on it.

Maybe this is a great time to start a company. Maximize the benefits of AI while you can without someone who has never written a line of code telling you that your job is going to disappear in 12 months.

All the incentives are against someone who wants to use AI in a reasonable way, right now.

_pdp_•21m ago

I think the issue goes even deeper than verification. Verification is technically possible. You could, in theory, build a C compiler or a browser and use existing tests to confirm it works.

The harder problem is discovery: how do you build something entirely new, something that has no existing test suite to validate against?

Verification works because someone has already defined what "correct" looks like. There is possibly a spec, or a reference implementation, or a set of expected behaviours. The system just has to match them.

But truly novel creation does not have ground truth to compare against and no predefined finish line. You are not just solving a problem. You are figuring out what the problem even is.

holtkam2•14m ago

At the end of the day you need humans who understand the business critical (or safety critical) systems that underpin the enterprise.

Someone needs to be held accountable when things go wrong. Someone needs to be able to explain to the CEO why this or that is impossible.

If you want to have AI generate all the code for your business critical software, fine, but you better make sure you understand it well. Sometimes the fastest path to deep understanding is just coding things out yourself - so be it.

This is why the truly critical software doesn’t get developed much faster when AI tools are introduced. The bottleneck isn’t how fast the code can be created, it’s how fast humans can construct their understanding before they put their careers on the line by deploying it.

Ofc… this doesn’t apply to prototypes, hackathons, POCs, etc. For those “low stakes” projects, vibe code away, if you wish.

lgl•10m ago

I'm in the process of building v2.0 of my app using opus 4.6 and largely agree with this.

It's pretty awesome but still does a lot of basic idiotic stuff. I was implementing a feature that required a global keyboard shortcut and asked opus to define it, taking care not to clash with common shortcuts. It built a field where only one modifier key was required. I pointed out that this was not safe, since users could just define CTRL+C as the shortcut, and that we needed more safeguards, requiring at least two modifier keys. I got the usual "you're absolutely right" and it proceeded to require two modifier keys. But then it also added a huge blacklist of common shortcuts like copy, cut, paste, print, select all, etc., basically a bunch of single-modifier shortcuts. Once I mentioned that this was useless since we were already forcing two modifier keys, it said I was right again and fixed it.
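For illustration, the rule I ended up with can be sketched roughly like this. This is not the app's actual code (that is .NET); the names `MODIFIERS`, `parse_shortcut` and `is_safe_global_shortcut` are hypothetical, and the `"Ctrl+Shift+K"` string format is an assumption.

```python
# Hypothetical sketch of the "at least two modifier keys" rule.
MODIFIERS = frozenset({"ctrl", "alt", "shift", "meta"})


def parse_shortcut(shortcut: str) -> tuple[frozenset[str], list[str]]:
    """Split a string like 'Ctrl+Shift+K' into (modifiers, other keys)."""
    parts = [p.strip().lower() for p in shortcut.split("+") if p.strip()]
    mods = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    return mods, keys


def is_safe_global_shortcut(shortcut: str, min_modifiers: int = 2) -> bool:
    """Accept only shortcuts with enough distinct modifiers and exactly one key."""
    mods, keys = parse_shortcut(shortcut)
    return len(mods) >= min_modifiers and len(keys) == 1


print(is_safe_global_shortcut("Ctrl+C"))        # single modifier: rejected
print(is_safe_global_shortcut("Ctrl+Shift+K"))  # two modifiers: accepted
```

Because every single-modifier combination is already rejected by the modifier count, a separate blacklist of copy/cut/paste-style shortcuts never gets a chance to match anything.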

The counterpoint to this idiocy is that it's very good overall at a lot of what is (in my mind) much more complicated stuff. It's a .NET app, and it handles things like creating models, viewmodels, and usercontrols, or setting up the entire hosting DI with pretty much all .NET best practices, pretty awesomely.

tl;dr is that training wheels are still mandatory imho
