> For example, the formal proofs of the seL4 kernel establish amongst other things that the implementation C code precisely matches an abstract model of the kernel. This is a really powerful and impressive result, but to understand it, you have to study the formal model in detail.
No, not the complexity of understanding the abstract model: the fact that you have to agree with the abstract model, the fact that the abstract model has to be a good abstract model of your system. This is what everyone says about TLA+: sure, you can prove a Raft spec has no deadlock, but you can't extract an implementation from the same spec. Yes, I know some proof systems do actually let you extract (F#), but they are few and far between, because proof languages are to programming languages as the NBA is to college ball: a whole different league of precision/specification (so lots of things cannot be translated).
I don't have an answer/solution, and I'm not claiming to. The article is good though; it reads like any basic sales journey (which isn't a bad thing).
Most developers use some kind of formal method (types); and probably correctly conclude that more formality doesn’t aid their case. That’s okay — but I think we do everyone a disservice by posing these as binaries or different leagues, when it’s a spectrum with lots of different and partial applications (eg, there’s utility in proving properties of your math even without a full formal proof).
That false binary is what slows adoption more than anything — at least in my experience.
I didn't say you couldn't write code in proof languages, I just said (implied) it would be extremely tedious/onerous.
I think this might be the real value of formal methods: not correctness, but speed of iteration. That does affect which kinds of formal methods do and don't help, which really matches what the article said about delivering value with just a small amount of effort.
I'm mainly thinking of the gnarly template errors you'll see in C++, or trait errors in Rust, caused by heavy use of type-based logic. Poor error formatting can exacerbate this, but ultimately you still need an understanding of all the type constraints to do anything, and documentation is often unhelpful since it's diced up into a dozen little pieces.
That is to say, the local optimum of "making changes to code much faster" can be reached pretty early in the process of increasing specification, and many existing projects (IMHO) have overshot that mark to little benefit. So I'm inclined to be skeptical about FM's value proposition, unless it can be applied in a very progressive manner in places where extra specification makes the most sense.
I think that FM (beyond basic typing) is best used sparingly to address particularly sensitive cases — eg, floating point stability in modeling.
But our discussion usually doesn’t include incremental or partial usage, unfortunately.
This means the developer gets to decide what is important enough to try to assert. It's likely the developer will need to add sub-lemmas to get the assert proven, and I suspect some of these will need to be in the form of 'assumptions / requirements' at the start of a function, which then need to be confirmed at all call sites.
A big downside here is that some things you'd like to prove often don't fit into boolean expressions in the underlying language. For example, universal / existential quantifiers aren't available (though they are implicit in the formal prover, so they can be hacked in). You also don't have predicates for 'this address has been allocated / initialized', nor any predicates that let you say a value is constant. Similarly, there are no temporal quantifiers available.
I'm hoping that the above is not too limiting in practice. The fact that developers already use 'assert' statements does suggest that normal boolean expressions are already valuable.
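A minimal Python sketch of that style (the function and the bounded-quantifier trick are my own illustration, not from any particular prover): a requirement asserted at function entry, and a sub-lemma asserted before the return.

```python
def binary_search(xs, target):
    # 'Requirement' asserted at function entry; every call site is
    # responsible for establishing it. The bounded "forall xs is sorted"
    # is faked with all(), the way a quantifier gets hacked into a
    # language's boolean fragment.
    assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)), "xs must be sorted"

    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    # Sub-lemma the developer chose to assert: lo is a valid insertion point.
    assert 0 <= lo <= len(xs)
    return lo if lo < len(xs) and xs[lo] == target else -1
```

A static prover would try to discharge both asserts at compile time; at worst they remain runtime checks, which is still more than nothing.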
I think it's borderline criminal that programming languages haven't added units of measure to their type systems yet, and then required every numerical type to be associated with a unit. We've known how to do this for decades now. All numerical code has latent units: either a specific unit, or polymorphic over the unit (like scalar transforms). Even memory copies and moves operate on byte units.
Currently, apparently, this is available in a C++ library: https://mpusz.github.io/mp-units/HEAD/
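To illustrate the idea, here is a toy runtime version in Python: a quantity type that tracks exponents of base units and rejects unit-mismatched addition. (A real units-of-measure system, like F#'s, does this statically; all names here are made up.)

```python
class Qty:
    """Toy units-of-measure value: a magnitude plus exponents of (m, s, kg)."""
    def __init__(self, value, m=0, s=0, kg=0):
        self.value, self.dims = value, (m, s, kg)

    def __add__(self, other):
        # Addition only makes sense for identical units; this is the check
        # a units-aware type system would perform at compile time.
        if self.dims != other.dims:
            raise TypeError(f"unit mismatch: {self.dims} vs {other.dims}")
        return Qty(self.value + other.value, *self.dims)

    def __mul__(self, other):
        # Multiplication adds unit exponents: m * m = m^2, etc.
        return Qty(self.value * other.value,
                   *(a + b for a, b in zip(self.dims, other.dims)))

    def __truediv__(self, other):
        # Division subtracts unit exponents: m / s = m^1 s^-1.
        return Qty(self.value / other.value,
                   *(a - b for a, b in zip(self.dims, other.dims)))

def meters(v):  return Qty(v, m=1)
def seconds(v): return Qty(v, s=1)

speed = meters(100) / seconds(10)   # dims (1, -1, 0): meters per second
```

Unit-polymorphic code (like a scalar transform) falls out naturally, since it only manipulates `value` and `dims` without fixing them.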
> the fact that you have agree with the abstract model, the fact that the abstract model has to be a good abstract model of your system
Regarding seL4 in particular, they've plugged many of those sorts of holes.
https://trustworthy.systems/publications/nicta_full_text/737...
Even with that asterisk, that makes Rust much more potent in the formal methods space. This has been noted by Graydon as well. I can dig up the reference if someone cares.
However, last time I checked, I think I saw that Prusti does not currently handle UnsafeCell at all. I don't think that's a fundamental limitation, just a matter of prioritizing development time.
Well, have you tried only asking 4 times the money for 4 times the time? :)
I can see the intellectual appeal and even the argument for formalism in some high-risk areas. I guess I am most burnt by the advent of hyperlinted, theatrically "typechecked" enterprise Python. I don't think people pushing this stuff would have the balls to introduce an actually formal environment, since even Java is there, not to mention ML-style languages, Rust etc. Maybe there's fear about the staffing and scary appearance, which the article also mentions. (I still sincerely wait for a bigger niche for functional and lispy shops, maybe when the industry transforms with de-monopolization.)
But there is a common thread in the bad kind of arguing for formalisms, where the Scotsmen are never true and your lazy human mind isn't Godlike enough to reason about the program correctly from the start. I actually like how TFA moves toward communicating better so maybe conversations could be more productive in these aspects.
For example, deployment software. There's probably some way to formally model deployment. But most deployment software in the wild is written like, "here are 100 things that can go wrong while you're deploying, and 100 corresponding business rules for what to do."
I do sometimes fantasize about a more pure-Computer-Science world where all text formats are a formally-defined grammar, all systems are state machines, and everything has a spec ;-)
The code running the deploy process is messy and complex. But a spec doesn't have to be. You can represent as much or as little detail as you need.
Let me explain this in formal methods terms.
As programs grow, their state space naturally expands exponentially. One of the major aims of formal methods is to contain that growth, but even then it's exponential by nature. Unconstrained programs written normally have vast, vast state spaces. It takes very little before human beings fundamentally cannot know everything that is lurking in the state space of a program.
A formal methods advocate sees that exponential state space and is well aware of the dragons lurking within the vast majority of programs. It is hard to watch everyone else seem ignorant, even willingly ignorant, of these horrible dragons.
But how then does the world function, if effectively every program we have has monsters hiding in its exponential state space?
The answer is that generally the number of paths the program will actually take through the state space is polynomial. Those exponential spaces are available, but they are far from equally available; they are not uniformly randomly sampled. Hence, it is not terribly difficult to end up with programs that are nominally disasters, where literally 99.99999999999999999999...% of the possible state space of the program is "wrong", yet they happily hum along for years at a time, because the 0.00...001% of the state space they ever get into is correct.
A simple sort of thing where this can happen is a threaded program that doesn't use concurrency primitives correctly, yet, requests come in slowly, perhaps even regularly, and tend to run to completion before the next one comes in. A theoretical disaster of a program can run for long periods of time quite correctly.
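A contrived Python sketch of exactly that situation: the handler below has a textbook lost-update race, yet as long as the workload serializes the requests, the racy interleaving is never exercised and the program behaves perfectly.

```python
import threading
import time

counter = 0  # shared state, deliberately unprotected by any lock

def handle_request():
    # Textbook lost-update race: read, yield, write.
    global counter
    local = counter
    time.sleep(0)        # window in which another handler could interleave
    counter = local + 1

# Requests arriving strictly one at a time: the bad interleaving is
# unreachable in practice, so the "disaster" program runs correctly.
for _ in range(100):
    t = threading.Thread(target=handle_request)
    t.start()
    t.join()             # serialization comes from the workload, not the code
```

After this loop `counter` is exactly 100, but only because of the arrival pattern; two overlapping handlers could still lose an update at any time.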
I've seen several code bases like this. I think of them as the code bases where every time something went wrong, the programmers basically went in and started hitting the code base with a stick. They'd fix the bug presenting itself, but they'd probably introduce 5 more, but each with a lesser probability. Then a couple of those come up, and they hit those with a stick, and introduce more bugs in principle, but fix the two from the original batch that came up. Maybe a couple of years later the third bug from that third batch finally comes up and they patch around it. I just spent the last 1.5 years rewriting such a code base. It's a disaster. It's a seething mass of bugs, both the ones they gave up trying to fix because they couldn't hit it hard enough, and the ones that are potentially there from all the stick-hitting that has been done. No structure. No coherence. Formal methods? Ha! It doesn't even have any testing in it and basically can't by its nature. But if you keep running around the loop, shove enough input through it, and keep hitting it with a stick, it will, eventually, basically work. Yea, though I walk through the Valley of Bugs, I will fear no errors, because all the evil has already befallen me.
The mismatch between formal advocates and us normies (which, though I am very sympathetic to formal methods, I've never actually gotten to use them for anything non-trivial myself, so I include myself in the latter set) is that formal advocates believe the rest of the world either does or should desire to remove all the dragons from the full exponential state space, whereas the rest of the world is really only concerned about the state spaces they will actually hit, which is exponentially less than the possible parts of the state space.
The reason you see someone like Amazon use formal methods to prove properties of things like S3 is that they are hit so hard and in so many circumstances that they don't get to hide. In theory even in such a situation the full state space may not be covered, but the coverage is so intense and extreme that de facto the easiest thing to do is just pretend absolutely anything can happen, that any conceivable bug will be hit, and therefore it is worth it to go to a full formal methodology to prove that there are no dragons anywhere, or at the very least, fully characterize anything you couldn't dispose of. Kernels have a similar situation. But honestly, very little code is hit that hard, so it's not too surprising that people often don't seek formal methods out. Plus, the consequences need to be severe for anyone to care; e.g., office suites are probably nightmares in their full state space, but if one crashes, well, generally, not much bad stuff happens to anyone.
I'd say a formal methods salesman needs a clear explanation as to how they can help with the polynomial common path. They can't expect to scare people with the dragons lying in exponential space because most people can't see them, even if you point at them, and that's when you sound like a kook rather than someone helpful. And there you run into the fact that, as the post explains in a completely different way, a lot of times the answer is that formal methods don't have as much to offer as it may seem like, because we non-formal programmers do have some tooling that is broadly, if not perfectly, adequate to the task of handling the polynomial-type state spaces, by testing them, constraining them, etc.
(In this post I have somewhat sloppily used the term "polynomial", so don't worry about it too much. It's a sort of amortized, limited sort of term, based on the real consequences of the state space on the code. For example, a text editor is plainly "exponential" in state space because the set of all possible text files is obviously exponential... but in practice, you don't generally have problems in a text editor where the editor crashes because the 1024th character was an "a" instead of a "b". (Such things happen, but they are clearly exceptional, generally not hard fixes, and generally ultimately due to some other combination of bugs and not "really" because the character was "wrong".) So in practice, despite the theoretically-exponential nature of text files, the complexity actually experienced by the programmer is nowhere near that full exponential space. What you end up with instead is generally a relatively small set of issues: excessively long lines, characters you didn't expect to see, handling emoji wrong, many issues, but text editor programmers don't experience specific 25-character-long strings crashing the program, and don't find themselves dedicating months upon months to finding all the other such strings that specifically crash the program. The experience of the exponential state space is not exponential in practice, and the 'real paths' through the program can be characterized, through the elimination of information, as something much simpler than the true exponential space. If it weren't so, no non-trivial program would work, ever. Technically it's probably still exponential, but compared to the theoretical exponentiality, so much vaster, it's still effectively polynomial by comparison.)
I would have loved to have a formal specification of this, not just to have my code checked, but also to make sure the specification is coherent.
The problems with that were:
- The cost of creating that specification would have been very substantial, both in money and in time.
- The end-to-end functionality of the system was verified anyway and could not have been replaced by a formal correctness proof. Creating a few more test cases is cheap; creating a formal verification is extremely hard. The regulator expects that verification; it does not expect a formal proof. This is actually a project risk.
- The people writing the specification do not think formally; they concern themselves with overall functionality, which they then translated into a more detailed description. There is a serious communication problem here if you want formal verification.
I really like the concept, but to apply it you need to integrate it from ground up and rethink how you write software.
Interestingly, we also had bugs in the specifications, where certain contradictory scenarios were specified; only very careful reading of the code and specification led me to discover their existence. A formal specification would of course have made the design flaw obvious.
You mention test cases, but there actually is another, very much related, application: actually understanding what the specification does. E.g., given a certain operating scenario, what is the expected outcome? That can be very difficult to determine from lengthy, informal specifications.
IMO, I see FMs converging to become part of the type system, checked by the compiler automatically during CI/CD.
If it’s not, its adoption cost is too high 90% of the time.
The only things that get widely adopted follow that idea: types, unit tests, properties, fuzzing, integration tests, simulation, packaging in a deployment pipeline, etc.
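The "properties" entry in that list can be as lightweight as a hand-rolled randomized check, no framework required; for instance (example is mine, not from the thread):

```python
import random

# Property under test: reversing a list twice is the identity.
def prop_reverse_involutive(xs):
    return list(reversed(list(reversed(xs)))) == xs

# Throw randomized inputs at the property, fuzz-style.
random.seed(0)  # deterministic run for reproducibility
for _ in range(1000):
    xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
    assert prop_reverse_involutive(xs), f"property failed for {xs}"
```

Dedicated tools (Hypothesis, QuickCheck) add input shrinking and smarter generation, but the adoption cost of even this bare version is near zero, which is exactly why the technique spreads.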
Finally, knowing that incompleteness theorems are a thing, formal methods will always fall short in some fundamental way.
agentultra•1d ago
Model checking, for example, is not as hard as it sounds and not nearly as expensive or time consuming as building the wrong thing. Most programmers I’ve taught don’t seem to have too much difficulty learning TLA+ or Alloy. The concepts themselves are fundamentally rather simple.
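The core idea really is that simple: enumerate every reachable state and check an invariant in each one. Here is a toy explicit-state checker in Python, in the spirit of TLC (the two-process model and all names are invented for illustration):

```python
from collections import deque

# Model: two processes, each 'idle' or 'critical'. This (deliberately
# broken) protocol lets either process enter the critical section at will.
INIT = ('idle', 'idle')

def next_states(state):
    # Each process may flip between its two locations at any step.
    for i in range(2):
        for loc in ('idle', 'critical'):
            if loc != state[i]:
                yield state[:i] + (loc,) + state[i + 1:]

def mutual_exclusion(state):
    # Invariant: both processes are never in the critical section at once.
    return state != ('critical', 'critical')

def check(init, step, invariant):
    # Breadth-first search over every reachable state.
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s                 # counterexample state found
        for t in step(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None                      # invariant holds in all reachable states

violation = check(INIT, next_states, mutual_exclusion)
# violation == ('critical', 'critical'): the checker finds the bad state
```

Real checkers add fairness, temporal properties, counterexample traces, and clever state compression, but the conceptual kernel is this BFS.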
Theorem proving does take some expertise. But it’s also not an insurmountable obstacle.
However, I agree that there’s a spectrum of cost: the further you go down the assurance chain, the more effort and time it will take. Often we do price the cost of errors into the holistic process of software (eg: SRE-style error budgets, iterative development, etc).
One angle that works for me when deciding whether to use a more substantial FM is the cost of getting it wrong (or conversely, how important is it that we get some key thing right?).
If I’m building a financial database can my business afford to lose customers’ money? Do we have that liability built into our insurance?
Sometimes you need to go slow at first in order to go fast later.
And some properties are simply too hard to trust to some boxes and arrows and a prayer.
almostgotcaught•1d ago
how are you measuring cost? the cost of anything in this space (or any other rarefied space) is the cost of a "principal/staff" (maybe 1 person-year) to build it and then a team of very very strong seniors to maintain it. who can afford that?
> Most programmers I’ve taught don’t seem to have too much difficulty learning TLA+ or Alloy. The concepts themselves are fundamentally rather simple.
this is like saying building a prod LLM pipeline is no big deal because learning the basics of pytorch is easy.
more relevantly: what are "most programmers" going to do when they hit an undecidable clause?
here's a razor for highly technical projects (a kind of efficient market hypothesis for software if you will): if it's easy to do then it's worthless (highly commoditized, marginal value, etc.).
bravesoul2•1d ago
Formal methods are just another tool. Do you need a principal engineer with 15 YOE for Haskell or Rust code?
In today's AI and low-headcount world you want someone who is really good at that thing (regardless of experience).
Needing a principal who can go deep on Kafka queues in a design interview isn't what is required.
Most organisations are measuring impact narratives, not skill, to get up the org chart.
YetAnotherNick•1d ago
bravesoul2•1d ago
Formal systems may be easier than React to understand (assuming decent effort put into both), as there is more of a disciplined teaching culture around them. useEffect is a feature of React, a framework made by a tech company, with coders taking up the mantle of teaching it to others.
I'm not joking.
creata•1d ago
Some specific formal systems, maybe. I feel like a lot of people could get some mileage out of learning Dafny, and it'd definitely be easier than learning React imo.
bubblyworld•1d ago
creata•1d ago
bubblyworld•1d ago
I think you've changed my mind, actually, thank you. I still maintain that this requires more mathematical maturity than diving into react, but I may be saying that out of ignorance of react's internals =P
bravesoul2•10h ago
bubblyworld•9h ago
YetAnotherNick•23h ago
ted_dunning•1d ago
You are hiring the wrong ones.
almostgotcaught•1d ago
You're just repeating the same claim - yes I agree interns can also understand pytorch. Does that mean they can build prod LLM pipelines?
bravesoul2•1d ago
almostgotcaught•1d ago
o.O ...
No man I'm saying a very simple and non-controversial thing: just because someone does a tut, has an elementary understanding, knows the basics, doesn't mean I'm going to let them build/maintain/own mission critical systems.
It's not that deep.
bravesoul2•1d ago
But the question really is: what solution minimises the number of those hours? Formal or informal?
simiones•1d ago
AnimalMuppet•1d ago
simiones•23h ago
theamk•12h ago
anonzzzies•1d ago
simiones•1d ago
anonzzzies•1d ago
naasking•1d ago
g9yuayon•1d ago
``` []((Q & <>R) -> ((!P & !R) U (R | ((P & !R) U (R | ((!P & !R) U (R | ((P & !R) U (R | (!P U R)))))))))) ```
Davidbrcz•1d ago
g9yuayon•18h ago
somethingsome•1d ago
AnimalMuppet•1d ago
pjmlp•1d ago
And then there are the code changes, which devs most likely aren't validating against the formal model before applying the new set of changes.
dataflow•1d ago
pjmlp•23h ago
bbkane•22h ago
They said it checks logs emitted from the running application against the formal model. Sounds really useful for this exact problem; hopefully one day they'll open source it.
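As a hedged sketch of that idea (the model, state names, and log format here are entirely made up, not from the tool they described): replay logged states through the model's transition relation and flag the first step the model forbids.

```python
# Allowed-transition relation extracted from a hypothetical formal model.
ALLOWED = {
    ('idle', 'acquiring'),
    ('acquiring', 'held'),
    ('held', 'idle'),
}

def check_trace(states):
    """Return the first forbidden transition as (index, from, to), or None."""
    for i, (a, b) in enumerate(zip(states, states[1:])):
        if (a, b) not in ALLOWED:
            return (i, a, b)
    return None

good = ['idle', 'acquiring', 'held', 'idle']
bad  = ['idle', 'held']   # skipped 'acquiring'; the model forbids this jump
```

Because the check runs against production logs, drift between the code and the model shows up as a concrete forbidden transition rather than staying invisible.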
pjmlp•22h ago
nickpsecurity•1d ago
Let's say I want to teach them non-formal, high assurance. That will involve safe subsets of their language, functional style of programming with inputs to outputs, isolation of side effects, and code reviews. They can do this without any extra or unusual knowledge.
The basics of testing would likewise build on their existing knowledge. If their language supports it, static analyzers might work with the push of a button. Basic contracts might knock out many type or range errors without much thought. Basic tests go from their intuition to specific code implementing it.
OK, now let's build models using relationships expressed in Alloy or TLA+. Now, they have to learn to think in a totally different way. They'll need new notations, new techniques. They'll have to put time into these not knowing the payoff. They'll also have far less or zero resources like StackOverflow or GeeksForGeeks that can solve the whole problem for them.
Moving on to theorem provers, they now have to know math or logic. They have to think formally. They have to specify the problem that way, keep it in sync with the English, do lots of work that might just be about limitations of the formal tool, and eventually they succeed (or don't) at the proof. It's also a very slow process vs informal methods of correctness checking.
So, it's quite difficult to learn these things vs alternative methods. They also might have a low or slow payoff, while the others pay off quickly and often. That's why I push using the alternatives first, then learning the others over time to gradually add them to your toolbox for when it makes sense: data structures, interface specification, highly critical routines, etc.
agentultra•23h ago
There’s a spectrum of problems one can apply FMs to. Galois is probably selecting for the harder end of projects that do require PhDs with scary sounding degrees.
In industry there are a lot of problems that don’t require that level of skill which can benefit a lot from application of FM’s.
nickpsecurity•22h ago
The biggest problem I see with that is how to specify those specs, especially contracts. I'd like to see a free collection of specs for common properties. Examples include ranges, ordering, not a value, invariants, and confidentiality. If searchable, then that might already be enough for many situations.
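As a sketch of what such a reusable collection could look like in Python (all names hypothetical), common properties like "non-empty input" or "sorted output" become decorators pulled off a shelf:

```python
import functools

def requires(pred, msg):
    """Reusable precondition spec, checked on every call."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            assert pred(*args, **kwargs), f"{fn.__name__}: {msg}"
            return fn(*args, **kwargs)
        return wrapper
    return deco

def ensures(pred, msg):
    """Reusable postcondition spec on the return value."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            assert pred(result), f"{fn.__name__}: {msg}"
            return result
        return wrapper
    return deco

# 'Range' and 'ordering' properties drawn from such a collection:
@requires(lambda xs: len(xs) > 0, "input must be non-empty")
@ensures(lambda r: all(r[i] <= r[i + 1] for i in range(len(r) - 1)),
         "output must be sorted")
def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out = []
    while left and right:
        out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return out + left + right
```

A searchable library of such predicates (ranges, ordering, not-a-value, invariants) gets developers most of the way to specs without having to invent the formalism themselves, which is the low-work on-ramp being argued for.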
From there, we can develop tools to generate them for people. The tools might parse the code, ask some questions, and generate the specs. Alternatively, LLM's trained on such things might generate specs. They might review them and suggest things. Over time, we'll also have more components that are already spec'd which specialists might build on.
We just really need ways for developers to go from their situation to specs with no work or little work. If it's as easy as code and comments, we'll see more adoption.