frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

AI will make formal verification go mainstream

https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html
6•mau•1mo ago

Comments

mvr123456•1mo ago
This is going to happen and is real stuff we could be working towards with the tools that we already have. No need for AGI vaporware, no waiting around for the perfect agentic playground. Not even necessarily a big requirement for excellent reasoning. Just using LLMs for what they are actually good at, i.e. fuzzy translators, stylistic filters, and compilers.

Gradual-typing was practice and hints. Like gradual typing, gradual spec'ing could be an iterative and kind of parallel annotated representation, ignored by the main runtime unless called for, and ignored by developers that aren't interested in it. But when there's enough of it to hit some kind of critical mass, then it's suddenly very powerful and lots of very interesting stuff is possible

mutkach•1mo ago
I certainly hope so.

I wonder, what is the actual blocker right now? I'd assume that LLMs are still not very good with specifications and verifcation languages? Anyone tried Datalog, TLA+, etc. with LLMs? I suppose that Gemini was specifically trained on Lean. Or at least some IMO-finetuned fork of it. Anyhow, there's probably a large Lean dataset collected somewhere in Deepmind servers, but that's not certification applicable necessarily, I think?

> AI also creates a need to formally verify more software: rather than having humans review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is correct.

At RL stage LLMs could game the training*, proving easier invariants then actually expected (the proof is correct and possibly short - means positive reward). It would take additional care it to set it up right.

* I mean, if you set it to generate a code AND a proof to it.

lukeasrodgers•1mo ago
I would like this to be true, but am a bit skeptical.

I am what the article calls an "industrial software engineer" and I work on "low- to medium-assurance" projects, but have used various formal methods (alloy and TLA+) in my work to prevent and discover bugs.

I've experimented with using LLMs to generate both Alloy and TLA+ a couple times over the past years, and the problems I see are:

- They have gotten better over the last few years, but still can only produce useful results in the hands of someone who is moderately competent. Becoming moderately competent requires many hours of investment in these tools, and you will lose much of this competence if you don't keep it up. For example, I can still read TLA+ and Pluscal but can't write them without lots of referring to the docs because I only write them like once or twice a year.

- They suffer even more from GIGO than other aspects of software development. If you can't really rigorously define your problem you will get a bad model/output that only gives you false confidence. A large part of the value of doing formal methods is building the muscle for thinking rigorously. Hillel Wayne says this in several places, that doing enough TLA+ (e.g.) work gives you a much better innate sense for where there will be race conditions.

- There will still be a cultural and technical problems with integrating formal methods, and their artifacts, into the rest of your codebase and team. For example, how do you prevent drift? Will you have a CI automation that uses an LLM to detect when the spec has diverged from the code?

I'm not saying it is impossible that this will happen, and I would love to be wrong, but the general tendency I see with LLM use is to make software developers less intimately familiar with their tools, and less invested in deeply understanding their code. That bodes ill for formal methods even more than regular programming.

mutkach•1mo ago
What would you suggest as a reference problem (a benchmark of sorts) to try to play with formal methods for someone with just a bit of formal verification background but not in the field of software verification? Can you suggest some helpful materials?

I've come across TLA+ multiple times, but it seems it was more targeted towards distributed systems (Lamport being the creator, that makes sense). Is it correct, that it would be useless in other domains?

System time, clocks, and their syncing in macOS

https://eclecticlight.co/2025/05/21/system-time-clocks-and-their-syncing-in-macos/
1•fanf2•38s ago•0 comments

McCLIM and 7GUIs – Part 1: The Counter

https://turtleware.eu/posts/McCLIM-and-7GUIs---Part-1-The-Counter.html
1•ramenbytes•3m ago•0 comments

So whats the next word, then? Almost-no-math intro to transformer models

https://matthias-kainer.de/blog/posts/so-whats-the-next-word-then-/
1•oesimania•4m ago•0 comments

Ed Zitron: The Hater's Guide to Microsoft

https://bsky.app/profile/edzitron.com/post/3me7ibeym2c2n
2•vintagedave•7m ago•1 comments

UK infants ill after drinking contaminated baby formula of Nestle and Danone

https://www.bbc.com/news/articles/c931rxnwn3lo
1•__natty__•8m ago•0 comments

Show HN: Android-based audio player for seniors – Homer Audio Player

https://homeraudioplayer.app
1•cinusek•8m ago•0 comments

Starter Template for Ory Kratos

https://github.com/Samuelk0nrad/docker-ory
1•samuel_0xK•10m ago•0 comments

LLMs are powerful, but enterprises are deterministic by nature

1•prateekdalal•13m ago•0 comments

Make your iPad 3 a touchscreen for your computer

https://github.com/lemonjesus/ipad-touch-screen
2•0y•18m ago•1 comments

Internationalization and Localization in the Age of Agents

https://myblog.ru/internationalization-and-localization-in-the-age-of-agents
1•xenator•18m ago•0 comments

Building a Custom Clawdbot Workflow to Automate Website Creation

https://seedance2api.org/
1•pekingzcc•21m ago•1 comments

Why the "Taiwan Dome" won't survive a Chinese attack

https://www.lowyinstitute.org/the-interpreter/why-taiwan-dome-won-t-survive-chinese-attack
1•ryan_j_naughton•21m ago•0 comments

Xkcd: Game AIs

https://xkcd.com/1002/
1•ravenical•23m ago•0 comments

Windows 11 is finally killing off legacy printer drivers in 2026

https://www.windowscentral.com/microsoft/windows-11/windows-11-finally-pulls-the-plug-on-legacy-p...
1•ValdikSS•23m ago•0 comments

From Offloading to Engagement (Study on Generative AI)

https://www.mdpi.com/2306-5729/10/11/172
1•boshomi•25m ago•1 comments

AI for People

https://justsitandgrin.im/posts/ai-for-people/
1•dive•26m ago•0 comments

Rome is studded with cannon balls (2022)

https://essenceofrome.com/rome-is-studded-with-cannon-balls
1•thomassmith65•32m ago•0 comments

8-piece tablebase development on Lichess (op1 partial)

https://lichess.org/@/Lichess/blog/op1-partial-8-piece-tablebase-available/1ptPBDpC
2•somethingp•33m ago•0 comments

US to bankroll far-right think tanks in Europe against digital laws

https://www.brusselstimes.com/1957195/us-to-fund-far-right-forces-in-europe-tbtb
3•saubeidl•34m ago•0 comments

Ask HN: Have AI companies replaced their own SaaS usage with agents?

1•tuxpenguine•37m ago•0 comments

pi-nes

https://twitter.com/thomasmustier/status/2018362041506132205
1•tosh•39m ago•0 comments

Show HN: Crew – Multi-agent orchestration tool for AI-assisted development

https://github.com/garnetliu/crew
1•gl2334•39m ago•0 comments

New hire fixed a problem so fast, their boss left to become a yoga instructor

https://www.theregister.com/2026/02/06/on_call/
1•Brajeshwar•41m ago•0 comments

Four horsemen of the AI-pocalypse line up capex bigger than Israel's GDP

https://www.theregister.com/2026/02/06/ai_capex_plans/
1•Brajeshwar•41m ago•0 comments

A free Dynamic QR Code generator (no expiring links)

https://free-dynamic-qr-generator.com/
1•nookeshkarri7•42m ago•1 comments

nextTick but for React.js

https://suhaotian.github.io/use-next-tick/
1•jeremy_su•44m ago•0 comments

Show HN: I Built an AI-Powered Pull Request Review Tool

https://github.com/HighGarden-Studio/HighReview
1•highgarden•44m ago•0 comments

Git-am applies commit message diffs

https://lore.kernel.org/git/bcqvh7ahjjgzpgxwnr4kh3hfkksfruf54refyry3ha7qk7dldf@fij5calmscvm/
1•rkta•47m ago•0 comments

ClawEmail: 1min setup for OpenClaw agents with Gmail, Docs

https://clawemail.com
1•aleks5678•53m ago•1 comments

UnAutomating the Economy: More Labor but at What Cost?

https://www.greshm.org/blog/unautomating-the-economy/
1•Suncho•1h ago•1 comments