I haven't gone looking, but verifier tools compatible with languages people already use (TypeScript/Rust/Go/whatever is the flavour of the month) feel like the way to go
I'm exploring it now as a way to ease colleagues into SPARK. A lot of the material appears to transfer over, and the book Program Proofs seems better to me than what I found for SPARK. I probably wouldn't have colleagues work through the book themselves so much as run a series of tutorials. We've done this often in the past when trying to bring everyone up to speed on some new skillset or tooling: if someone already knows it (or has the initiative to learn ahead of time), they run tutorial sessions every week or so for the team.
The only* practical way to approach this is exactly what Dafny does: start with a (probably) consistent core language with well-understood semantics, build a surface language on top of it with syntax features that make it more pleasant to use, prove that the surface language translates to the core language so its semantics stay clear, and then generate the desired final language once verification has succeeded.
Dafny's about the best of the bunch for this too, for the set of target languages it supports.
(It's fine and normal for pragmatic languages to be inconsistent: all that means here is that you can prove false is true, which lets you prove anything and everything trivially, but it also means you can tell the damn compiler to accept code you're sure is right even when it insists it isn't. In practice this looks like type casts, "unsafe", and the like.)
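In Lean terms (one of the systems mentioned elsewhere in the thread), "prove false, prove anything" is just ex falso, and the pragmatic escape hatch is `sorry` — a minimal sketch:

```lean
-- Ex falso: from a proof of False, any proposition follows.
example (h : False) : 1 = 2 :=
  h.elim

-- Pragmatic escape hatches (`sorry` here; casts or `unsafe`
-- elsewhere) do the same thing on demand: the checker accepts
-- the claim, and soundness is on you.
example : 1 = 2 := sorry
```

The checker flags `sorry` with a warning rather than an error, which is exactly the "accept my code, I'm sure it's right" trade being described.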
* one could alternatively put time and effort into making consistent and semantically clear languages compile efficiently, such that they're worth using directly.
You can get pretty far in Haskell (with various extensions) and Scala. But for Go/TypeScript/etc, forget about it.
Truth is that everything involves a tradeoff, and some systems are better than others at different things. Dafny explores a particular design space. Hoare-style invariants are easier to use than dependent types (as long as your SMT solver is happy, anyway); F* has those too, but it additionally lets you reach for dependent types when automatic refinement proofs become inadequate. And F* and ATS can target low-level code, more so than Dafny.
Probably I would not use ATS for anything, but between F* and Dafny the choice isn't so clear-cut (I'd most likely use F*).
And if I didn't need (relatively) low-level code, I wouldn't use either.
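The refinement-vs-dependent-types contrast can be sketched in Lean (the names `Pos` and `Vec` are mine, for illustration only):

```lean
-- Refinement style: a plain value packaged with a proof about it.
-- This is roughly what an SMT solver discharges automatically in
-- Dafny or F*.
def Pos := { n : Nat // n > 0 }

example : Pos := ⟨3, by decide⟩

-- Dependent-type style: the invariant lives in the type's index.
inductive Vec (α : Type) : Nat → Type
  | nil  : Vec α 0
  | cons : {n : Nat} → α → Vec α n → Vec α (n + 1)

-- `head` is total: the index rules out empty vectors, so no
-- runtime check (and no solver query) is needed.
def Vec.head {α : Type} {n : Nat} : Vec α (n + 1) → α
  | .cons a _ => a
```

The refinement style is more automatable; the indexed style carries the invariant structurally, which is the fallback F* offers when the solver gives up.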
I'm currently using Lean myself, but Agda's probably going to win out longer term for me simply because it compiles to GHC Haskell.
Let alone something more complex in formal verification.
This procedure is unsurprisingly called sequentialization and (somewhat less unsurprisingly) is also a pretty good approach when applied to other techniques, such as bounded model checking [2].
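A toy illustration of the idea: treat each thread as a list of atomic steps and check every bounded interleaving from a purely sequential harness. (Real sequentialization, e.g. for a bounded model checker, encodes the schedule symbolically rather than enumerating it; `interleavings` and `make_thread` are hypothetical names for this sketch.)

```python
# Toy model: each thread is a list of atomic steps (State -> State),
# and a sequential harness explores every bounded interleaving.

def interleavings(a, b):
    """Yield every merge of step lists a and b preserving each list's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def make_thread(tid):
    # A non-atomic increment: read the shared counter, then write it back + 1.
    return [
        lambda s, t=tid: {**s, f"tmp{t}": s["c"]},
        lambda s, t=tid: {**s, "c": s[f"tmp{t}"] + 1},
    ]

violations = []
for schedule in interleavings(make_thread(0), make_thread(1)):
    state = {"c": 0}
    for step in schedule:
        state = step(state)
    if state["c"] != 2:
        violations.append(state["c"])

# 4 of the 6 interleavings lose an update, leaving c == 1.
print(violations)
```

The harness itself is ordinary sequential code, which is the point: once schedules are made explicit, any sequential analysis technique applies.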
Currently it has translation backends for C#, Java, JavaScript, Go and Python.
Lean/Agda are real programming languages, while Coq (Rocq), F*, ATS, and Isabelle/HOL all extract to various other programming languages.
Frankly, it's TLA+ that is the odd one here.
I'm more familiar with Alloy, which is a great tool for exploring a specification and looking for counter-examples that violate your specification.
AFAIK, none of the languages you listed above work well in the conceptualization phase. Are any of them capable of synthesizing counter-examples out of the box? (Aside: I feel like Lean's meta capabilities could be leveraged to do this.)
Never looked into Alloy; I guess I'll have to get an understanding of it.
How can you validate that the beautiful design phase actually maps to e.g. C code, writing data via ODBC to a SQL database, with stored procedures written in PL/SQL?
Neither the semantics of the toolchains nor the semantics of the commercial products are part of the TLA+ model as such.
Additionally, it requires someone to meticulously compare the mathematical model with the implementation code, to validate that what was written actually maps to what was designed.
Although it wouldn't work for my contrived example, at least tools like Dafny have more viability by going through "formal model => generate library for consumption", so that we get automated model-to-code validation without human intervention.
This is a deficiency in TLA+ (and many other systems), but it's not a good enough reason to discard or dismiss it. The industry alternative to TLA+ is not something that traces fully and easily from spec to code, but mostly informal, largely prose documents as the specification (if anything is specified at all). TLA+ is a massive improvement on that even if it's not a perfect tool. Same for Alloy and other systems. It's better if people model and specify at least portions of their system formally, even if it still takes effort to verify that the code correctly implements the specification; that's effort they have to expend anyway, and with greater difficulty when lacking anything approaching a formal specification.
I'd be ecstatic if LLMs could write those tactic functions. Unfortunately, they're not great at the reasoning and preciseness required (plus their training data isn't exactly overflowing with Dafny proofs).
Jtsummers•1d ago
This only had the one previous submission, but I found it interesting. The mentioned book, Program Proofs, is worth checking out if the topic and language interest you.