frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Test, don't (just) verify

https://alperenkeles.com/posts/test-dont-verify/
57•alpaylan•2h ago

Comments

esafak•1h ago
Alperen,

Thanks for the article. Perhaps you could write a follow-up article or tutorial on your favored approach, Verification-Guided Development? This is new to most people, including myself, and you only briefly touch on it after spending most of the article on what you don't like.

Good luck with your degree!

P.S. Some links in your Research page are placeholders or broken.

alpaylan•1h ago
I'll add some links for the original VGD paper and related articles, that should help in short term. Thank you! I'll look into writing something on VGD itself in the next few weeks.
getregistered•1h ago
> AI-assisted programming pushes the limits of programming from what you can implement to what you can specify and what you can verify.

This really resonates. We can write code a lot faster than we can safely deploy it at the moment.

marcosdumay•1h ago
> We can write code a lot faster than we can safely deploy it at the moment.

We always could. That has been true since the days we programmed computers by plugging jumper wires into a panel.

acedTrex•31m ago
> We can write code a lot faster than we can safely deploy it at the moment.

This has always been the case?

badgersnake•1h ago
> AI is making formal verification go mainstream.

This nonsense again. No. No it isn’t.

I’m sure the people selling it wish it was, but that doesn’t make it true.

AnimalMuppet•1h ago
LLM-style AI isn't great for formal verification, not so far as I understand. And the recent advances in AI didn't do much for the kind of AI that is useful for formal verification.
otterley•15m ago
You don’t use AI to perform formal verification. You give the agent access to verification tools whose output can then be fed back to the model.

It’s the same design as giving LLMs the current time, since they can’t tell time themselves, either.

baq•32m ago
You haven't been paying attention.

The fact that we're reading about it here today and have read about it in the past weeks is one piece of evidence. Another is that we hadn't been reading about it in the past months before November. Opus 4.5 and GPT 5.2 have crossed an usefulness frontier.

Anecdotally, I've been having some success (guiding LLMs) writing Alloy models in the past month and ensuring conformance with code. Making these would've been unjustifiable from ROI perspective fairy tales just this summer. The landscape has changed qualitatively.

andai•1h ago
Related discussion from last week:

AI will make formal verification go mainstream

https://news.ycombinator.com/item?id=46294574

zipy124•1h ago
I think this misses a lot of reasons why learning verification is important. For instance learning the concept of invariants and their types such as loop invariants. They make reasoning about code in general easier, even if you never formally do any verification, it makes it easier to write tests or asserts(). A substantial amount of bugs are due to the program having a different state to that assumed by the programmer, and there are other tools that help with this. For example a statically typed language is a type of verification since it verifies a variable has a specific type and thus operations that can be performed on it, and limits the valid input and output range of any function. Languages like Rust are also verification in terms of memory correctness, and are also extremely useful tools.
CuriouslyC•1h ago
Formal verification is a nice idea but it's a big hill to climb from where we're at. Most people can't even get agents to robustly E2E QA code, which is a much smaller hill to climb for (probably) larger benefits. I'm sure this area will improve over time though, since it is an eventual unlock for fully autonomous engineering.
__MatrixMan__•1h ago
I think for most complex systems, robust E2E QA is a waste of money. A small handful of E2E smoke tests and thoughtful application of smaller tests is usually enough. Though to be fair, agent aren't good at that either.
smccabe0•56m ago
I think the section on AI from Zero to QED (a proofs in Lean/lang guide) gives a sober path forward from the perspective of market-makers and trading:

"Imagine market infrastructure where agents must prove, before executing, that their actions satisfy regulatory constraints, risk limits, fairness properties, and eventually machine-checkable proofs of Pareto efficiency of market mechanisms. This is a big, hairy, ambitious goal. Not “we reviewed the code” but “the system verified the proof.” The agent that cannot demonstrate compliance cannot act."

https://sdiehl.github.io/zero-to-qed/20_artificial_intellige...

TeodorDyakov•9m ago
I dream of a future where before any software is released we can predict 100 years into the future what effect it will have on every living thing and not release it if unhappiness delta for some living thing falls below a certain threshold.
ecocentrik•58m ago
Doesn't this run into the same bottleneck as developing AI first languages? AI need tons of training material for how to write good formal verification code or code in new AI first languages that doesn't exist. The only solution is large scale synthetic generation which is hard to do if you humans, on some level, can't verify that the synthetic data is any good.
omgJustTest•49m ago
my user should get upvotes for this :)
andrewmutz•41m ago
I agree completely with the author that AI assisted coding pushes the bottleneck to verification of the code.

But you don't really need complete formal verification to get these benefits. TDD gets you a lot of them as well. Perhaps your verification is less certain, but it's much easier to get high automated test coverage than it is to get a formally verifiable codebase.

I think AI assisted coding is going to cause a resurgence of interest in XP (https://en.wikipedia.org/wiki/Extreme_programming) since AI is a great fit for two big parts of XP. AI makes it easy to write well-tested code. The "pairing" method of writing code is also a great model for interacting with an AI assistant (much better than the vibe-coding model).

baq•28m ago
We won't be formally verifying millions of LOC anytime soon, don't get your hopes that high up.

...but we will be modelling those 5-10kLOC modules across multiple services doing critical business logic or distributed transactions. This has been unthinkable a couple months ago and today is a read-only-Friday experiment away (try it with a frontier model and you'll be surprised).

tgtweak•12m ago
I think more salient here (at term certainly) is setting up adversarial agents for testing/verification - that has been a big win for me in multi-agent workflows - when claude first released "computer use" that was a very big step in closing this loop and avoiding the manual babysitting involved in larger projects. PSA that it's not a silver bullet as the "analyzer" can still get tripped up and falsely declare something as broken (or functional), but it greatly reduces the "Hey I've done the task" when the task is not done or the output is broken.

Test, don't (just) verify

https://alperenkeles.com/posts/test-dont-verify/
58•alpaylan•2h ago•20 comments

Adobe Photoshop 1.0 Source Code (1990)

https://computerhistory.org/blog/adobe-photoshop-source-code/
283•tosh•4d ago•61 comments

Instant database clones with PostgreSQL 18

https://boringsql.com/posts/instant-database-clones/
194•radimm•7h ago•59 comments

Executorch: On-device AI across mobile, embedded and edge for PyTorch

https://github.com/pytorch/executorch
44•klaussilveira•5d ago•3 comments

Font with Built-In Syntax Highlighting (2024)

https://blog.glyphdrawing.club/font-with-built-in-syntax-highlighting/
62•california-og•5h ago•9 comments

Ryanair fined €256M over ‘abusive strategy’ to limit ticket sales by OTAs

https://www.theguardian.com/business/2025/dec/23/ryanair-fined-limit-online-travel-agencies-ticke...
63•aquir•4h ago•88 comments

Carnap – A formal logic framework for Haskell

https://carnap.io/
65•ravenical•6h ago•11 comments

10 years bootstrapped: €6.5M revenue with a team of 13

https://www.datocms.com/blog/a-look-back-at-2025
147•steffoz•7h ago•40 comments

Show HN: CineCLI – Browse and torrent movies directly from your terminal

https://github.com/eyeblech/cinecli
212•samsep10l•10h ago•77 comments

Snitch – A friendlier ss/netstat

https://github.com/karol-broda/snitch
246•karol-broda•14h ago•68 comments

It's Always TCP_NODELAY

https://brooker.co.za/blog/2024/05/09/nagle.html
389•eieio•18h ago•128 comments

Ask HN: What are the best engineering blogs with real-world depth?

174•nishilpatel•5h ago•57 comments

The Illustrated Transformer

https://jalammar.github.io/illustrated-transformer/
429•auraham•20h ago•79 comments

Ultrasound Cancer Treatment: Sound Waves Fight Tumors

https://spectrum.ieee.org/ultrasound-cancer-treatment
299•rbanffy•19h ago•87 comments

Partial inlining

https://xania.org/202512/18-partial-inlining
29•hasheddan•5d ago•1 comments

GLM-4.7: Advancing the Coding Capability

https://z.ai/blog/glm-4.7
379•pretext•20h ago•202 comments

NIST was 5 μs off UTC after last week's power cut

https://www.jeffgeerling.com/blog/2025/nist-was-5-μs-utc-after-last-weeks-power-cut
307•jtokoph•22h ago•131 comments

The Polyglot NixOS

https://x86.lol/generic/2025/12/19/polyglot.html
93•todsacerdoti•3d ago•28 comments

Claude Code gets native LSP support

https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md
473•JamesSwift•23h ago•280 comments

Our New Sam Audio Model Transforms Audio Editing

https://about.fb.com/news/2025/12/our-new-sam-audio-model-transforms-audio-editing/
133•ushakov•6d ago•47 comments

Cecot – 60 Minutes

https://archive.org/details/insidececot
936•lawlessone•14h ago•153 comments

Debian adds LoongArch as officially supported architecture

https://lists.debian.org/debian-devel-announce/2025/12/msg00004.html
111•cbmuser•3d ago•27 comments

Classical billiards can compute (2d billiard systems are Turing complete)

https://arxiv.org/abs/2512.19156
14•nabla9•5h ago•2 comments

The Garbage Collection Handbook

https://gchandbook.org/index.html
244•andsoitis•20h ago•29 comments

Solving the Problems of HBM-on-Logic

https://morethanmoore.substack.com/p/solving-the-problems-of-hbm-on-logic
13•zdw•5d ago•0 comments

CBS 60 minutes on CECOT only aired in Canada

https://archive.org/details/cecot-60mins
4•clacker-o-matic•6m ago•0 comments

FPGAs Need a New Future

https://www.allaboutcircuits.com/industry-articles/fpgas-need-a-new-future/
211•thawawaycold•4d ago•131 comments

The Duodecimal Bulletin, Vol. 55, No. 1, Year 1209 [pdf]

https://dozenal.org/drupal/sites_bck/default/files/DuodecimalBulletinIssue551.pdf
55•susam•13h ago•24 comments

Flock Exposed Its AI-Powered Cameras to the Internet. We Tracked Ourselves

https://www.404media.co/flock-exposed-its-ai-powered-cameras-to-the-internet-we-tracked-ourselves/
698•chaps•23h ago•427 comments

Are We Loong Yet?

https://areweloongyet.com/en/
12•todsacerdoti•5h ago•4 comments