> For example, the formal proofs of the seL4 kernel establish amongst other things that the implementation C code precisely matches an abstract model of the kernel. This is a really powerful and impressive result, but to understand it, you have to study the formal model in detail.
No, not the complexity of understanding the abstract model: the fact that you have to agree with the abstract model, the fact that the abstract model has to be a good abstract model of your system. This is what everyone says about TLA+: sure, you can prove a Raft spec has no deadlock, but you can't extract an implementation from the same spec. Yes, I know some proof systems do actually let you extract (F#), but they are few and far between, because proof languages are to programming languages as the NBA is to college ball: a whole different league of precision/specification (so lots of things cannot be translated).
I don't have an answer/solution, and I'm not claiming to. The article is good though; it reads like any basic sales journey (which isn't a bad thing).
Most developers use some kind of formal method (types); and probably correctly conclude that more formality doesn’t aid their case. That’s okay — but I think we do everyone a disservice by posing these as binaries or different leagues, when it’s a spectrum with lots of different and partial applications (eg, there’s utility in proving properties of your math even without a full formal proof).
That false binary is what slows adoption more than anything — at least in my experience.
I didn't say you couldn't write code in proof languages, I just said (implied) it would be extremely tedious/onerous.
I think this might be the real value of formal methods: not correctness, but speed of iteration. That does affect which kinds of formal methods do and don't help, which really matches what the article said about delivering value with just a small amount of effort.
I'm mainly thinking of the gnarly template errors you'll see in C++, or trait errors in Rust, caused by heavy use of type-based logic. Poor error formatting can exacerbate this, but ultimately you still need an understanding of all the type constraints to do anything, and documentation is often unhelpful since it's diced up into a dozen little pieces.
That is to say, the local optimum of "making changes to code much faster" can be reached pretty early in the process of increasing specification, and many existing projects (IMHO) have overshot that mark to little benefit. So I'm inclined to be skeptical about FM's value proposition, unless it can be applied in a very progressive manner in places where extra specification makes the most sense.
I think that FM (beyond basic typing) is best used sparingly to address particularly sensitive cases — eg, floating point stability in modeling.
But our discussion usually doesn’t include incremental or partial usage, unfortunately.
This means the developer gets to decide what is important enough to try to assert. It's likely the developer will need to add sub-lemmas to get the assert proven, and I suspect some of these will need to be in the form of 'assumptions / requirements' at the start of a function, which then need to be confirmed at all call sites.
A big downside here is that some things you'd like to prove often don't fit into boolean expressions in the underlying language. For example, universal / existential quantifiers aren't available (though they are implicit in the formal prover, so they can be hacked in). You also don't have predicates for 'this address has been allocated / initialized', nor any predicates that let you say a value is constant. Similarly, there are no temporal quantifiers available.
I'm hoping that the above is not too limiting in practice. The fact that developers already use 'assert' statements does suggest that normal boolean expressions are already valuable.
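A minimal Python sketch of that style (the function and the bounded-quantifier trick are my own illustration, not from any particular prover): a requirement asserted at function entry, and a sub-lemma asserted before the return.

```python
def binary_search(xs, target):
    # 'Requirement' asserted at function entry; every call site is
    # responsible for establishing it. The bounded "forall xs is sorted"
    # is faked with all(), the way a quantifier gets hacked into a
    # language's boolean fragment.
    assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)), "xs must be sorted"

    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    # Sub-lemma the developer chose to assert: lo is a valid insertion point.
    assert 0 <= lo <= len(xs)
    return lo if lo < len(xs) and xs[lo] == target else -1
```

A static prover would try to discharge both asserts at compile time; at worst they remain runtime checks, which is still more than nothing.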
I think it's borderline criminal that programming languages haven't added units of measure to their type systems yet, and then required every numerical type to be associated with a unit. We've known how to do this for decades now. All numerical code has latent units: either a specific unit, or polymorphic over the unit (like scalar transforms). Even memory copies and moves operate on byte units.
Currently, apparently, this is available in a C++ library: https://mpusz.github.io/mp-units/HEAD/
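To illustrate the idea, here is a toy runtime version in Python: a quantity type that tracks exponents of base units and rejects unit-mismatched addition. (A real units-of-measure system, like F#'s, does this statically; all names here are made up.)

```python
class Qty:
    """Toy units-of-measure value: a magnitude plus exponents of (m, s, kg)."""
    def __init__(self, value, m=0, s=0, kg=0):
        self.value, self.dims = value, (m, s, kg)

    def __add__(self, other):
        # Addition only makes sense for identical units; this is the check
        # a units-aware type system would perform at compile time.
        if self.dims != other.dims:
            raise TypeError(f"unit mismatch: {self.dims} vs {other.dims}")
        return Qty(self.value + other.value, *self.dims)

    def __mul__(self, other):
        # Multiplication adds unit exponents: m * m = m^2, etc.
        return Qty(self.value * other.value,
                   *(a + b for a, b in zip(self.dims, other.dims)))

    def __truediv__(self, other):
        # Division subtracts unit exponents: m / s = m^1 s^-1.
        return Qty(self.value / other.value,
                   *(a - b for a, b in zip(self.dims, other.dims)))

def meters(v):  return Qty(v, m=1)
def seconds(v): return Qty(v, s=1)

speed = meters(100) / seconds(10)   # dims (1, -1, 0): meters per second
```

Unit-polymorphic code (like a scalar transform) falls out naturally, since it only manipulates `value` and `dims` without fixing them.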
> the fact that you have agree with the abstract model, the fact that the abstract model has to be a good abstract model of your system
Regarding seL4 in particular, they've plugged many of those sorts of holes.
https://trustworthy.systems/publications/nicta_full_text/737...
Even with that asterisk, that makes Rust much more potent in the formal methods space. This has been noted by Graydon as well. I can dig up the reference if someone cares.
However, last time I checked, I think I saw that Prusti does not currently handle UnsafeCell at all. I don't think that's a fundamental limitation, just a matter of prioritizing development time.
Well, have you tried only asking 4 times the money for 4 times the time? :)
I can see the intellectual appeal and even the argument for formalism in some high-risk areas. I guess I am most burnt by the advent of hyperlinted, theatrically "typechecked" enterprise Python. I don't think people pushing this stuff would have the balls to introduce an actually formal environment, since even Java is there, not to mention ML-style languages, Rust etc. Maybe there's fear about the staffing and scary appearance, which the article also mentions. (I still sincerely wait for a bigger niche for functional and lispy shops, maybe when the industry transforms with de-monopolization.)
But there is a common thread in the bad kind of arguing for formalisms, where the Scotsmen are never true and your lazy human mind isn't Godlike enough to reason about the program correctly from the start. I actually like how TFA moves toward communicating better so maybe conversations could be more productive in these aspects.
For example, deployment software. There's probably some way to formally model deployment. But most deployment software in the wild is written like, "here are 100 things that can go wrong while you're deploying, and 100 corresponding business rules for what to do."
I do sometimes fantasize about a more pure-Computer-Science world where all text formats are a formally-defined grammar, all systems are state machines, and everything has a spec ;-)
The code running the deploy process is messy and complex. But a spec doesn't have to be. You can represent as much or as little detail as you need.
Let me explain this in formal methods terms.
As programs grow, their state space naturally expands exponentially. One of the major aims of formal methods is to contain that growth, but even then it's exponential by nature. Unconstrained programs written normally have vast, vast state spaces. It takes very little before human beings fundamentally cannot know everything that is lurking in the state space of a program.
A formal methods advocate sees that exponential state space and is well aware of the dragons lurking within the vast majority of programs. It is hard to watch everyone else seem ignorant, even willingly ignorant, of these horrible dragons.
But how then does the world function, if effectively every program we have has monsters hiding in its exponential state space?
The answer is that generally the number of paths the program will actually take through the state space is polynomial. Those exponential spaces are available, but they are far from equally available; they are not uniformly randomly sampled. Hence, it is not terribly difficult to end up with programs that are nominally disasters, where literally 99.99999999999999999999...% of the possible state space of the program is "wrong", yet they happily hum along for years at a time, because the 0.00...001% of the state space they ever get into is correct.
A simple sort of thing where this can happen is a threaded program that doesn't use concurrency primitives correctly, yet, requests come in slowly, perhaps even regularly, and tend to run to completion before the next one comes in. A theoretical disaster of a program can run for long periods of time quite correctly.
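A contrived Python sketch of exactly that situation: the handler below has a textbook lost-update race, yet as long as the workload serializes the requests, the racy interleaving is never exercised and the program behaves perfectly.

```python
import threading
import time

counter = 0  # shared state, deliberately unprotected by any lock

def handle_request():
    # Textbook lost-update race: read, yield, write.
    global counter
    local = counter
    time.sleep(0)        # window in which another handler could interleave
    counter = local + 1

# Requests arriving strictly one at a time: the bad interleaving is
# unreachable in practice, so the "disaster" program runs correctly.
for _ in range(100):
    t = threading.Thread(target=handle_request)
    t.start()
    t.join()             # serialization comes from the workload, not the code
```

After this loop `counter` is exactly 100, but only because of the arrival pattern; two overlapping handlers could still lose an update at any time.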
I've seen several code bases like this. I think of them as the code bases where every time something went wrong, the programmers basically went in and started hitting the code base with a stick. They'd fix the bug presenting itself, but they'd probably introduce 5 more, but each with a lesser probability. Then a couple of those come up, and they hit those with a stick, and introduce more bugs in principle, but fix the two from the original batch that came up. Maybe a couple of years later the third bug from that third batch finally comes up and they patch around it. I just spent the last 1.5 years rewriting such a code base. It's a disaster. It's a seething mass of bugs, both the ones they gave up trying to fix because they couldn't hit it hard enough, and the ones that are potentially there from all the stick-hitting that has been done. No structure. No coherence. Formal methods? Ha! It doesn't even have any testing in it and basically can't by its nature. But if you keep running around the loop, shove enough input through it, and keep hitting it with a stick, it will, eventually, basically work. Yea, though I walk through the Valley of Bugs, I will fear no errors, because all the evil has already befallen me.
The mismatch between formal advocates and us normies (which, though I am very sympathetic to formal methods, I've never actually gotten to use them for anything non-trivial myself, so I include myself in the latter set) is that formal advocates believe the rest of the world either does or should desire to remove all the dragons from the full exponential state space, whereas the rest of the world is really only concerned about the state spaces they will actually hit, which is exponentially less than the possible parts of the state space.
The reason you see someone like Amazon use formal methods to prove properties of things like S3 is that they are hit so hard and in so many circumstances that they don't get to hide. In theory even in such a situation the full state space may not be covered, but the coverage is so intense and extreme that de facto the easiest thing to do is just pretend absolutely anything can happen, that any conceivable bug will be hit, and therefore it is worth it to go to a full formal methodology to prove that there are no dragons anywhere, or at the very least, fully characterize anything you couldn't dispose of. Kernels have a similar situation. But honestly, very little code is hit that hard, so it's not too surprising that people often don't seek formal methods out. Plus, the consequences need to be severe for anyone to care; e.g., office suites are probably nightmares in their full state space, but if one crashes, well, generally, not much bad stuff happens to anyone.
I'd say a formal methods salesman needs a clear explanation as to how they can help with the polynomial common path. They can't expect to scare people with the dragons lying in exponential space because most people can't see them, even if you point at them, and that's when you sound like a kook rather than someone helpful. And there you run into the fact that, as the post explains in a completely different way, a lot of times the answer is that formal methods don't have as much to offer as it may seem like, because we non-formal programmers do have some tooling that is broadly, if not perfectly, adequate to the task of handling the polynomial-type state spaces, by testing them, constraining them, etc.
(In this post I have somewhat sloppily used the term "polynomial", so don't worry about it too much. It's a sort of amortized, limited sort of term, based on the real consequences of the state space on the code. For example, a text editor is plainly "exponential" in state space because the set of all possible text files is obviously exponential... but in practice, you don't generally have problems in a text editor where the editor crashes because the 1024th character was an "a" instead of a "b". (Such things happen, but they are clearly exceptional, generally not hard fixes, and generally ultimately due to some other combination of bugs and not "really" because the character was "wrong".) So in practice, despite the theoretically-exponential nature of text files, the complexity actually experienced by the programmer is nowhere near that full exponential space. What you end up with instead is generally a relatively small set of issues: excessively long lines, characters you didn't expect to see, handling emoji wrong, many issues, but text editor programmers don't experience specific 25-character-long strings crashing the program, and don't find themselves dedicating months upon months to finding all the other such strings that specifically crash the program. The experience of the exponential state space is not exponential in practice, and the 'real paths' through the program can be characterized, through the elimination of information, as something much simpler than the true exponential space. If it weren't so, no non-trivial program would work, ever. Technically it's probably still exponential, but compared to the theoretical exponentiality, so much vaster, it's still effectively polynomial by comparison.)
I would have loved to have a formal specification of this, not just to have my code checked, but also to make sure the specification is coherent.
The problems with that were:
- The cost of creating that specification would have been very substantial, both in money and in time.
- The end-to-end functionality of the system was verified anyway and could not have been replaced by a formal correctness proof. Creating a few more test cases is cheap; creating a formal verification is extremely hard. The regulator expects that verification; it does not expect a formal proof. This is actually a project risk.
- The people writing the specification do not think formally; they concern themselves with overall functionality, which they then translated into a more detailed description. There is a serious communication problem here if you want formal verification.
I really like the concept, but to apply it you need to integrate it from ground up and rethink how you write software.
Interestingly, we also had bugs in the specifications, where certain contradictory scenarios were specified; only very careful reading of the code and specification led me to discover their existence. A formal specification would of course have made the design flaw obvious.
You mention test cases, but there actually is another, very much related, application: actually understanding what the specification does. E.g., given a certain operating scenario, what is the expected outcome? That can be very difficult to determine from lengthy, informal specifications.
IMO, I see FMs converging to become part of the type system, checked by the compiler automatically during CI/CD.
If it’s not, its adoption cost is too high 90% of the time.
The only things that get widely adopted follow that idea: types, unit tests, properties, fuzzing, integration tests, simulation, packaging in a deployment pipeline, etc.
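The "properties" entry in that list can be as lightweight as a hand-rolled randomized check, no framework required; for instance (example is mine, not from the thread):

```python
import random

# Property under test: reversing a list twice is the identity.
def prop_reverse_involutive(xs):
    return list(reversed(list(reversed(xs)))) == xs

# Throw randomized inputs at the property, fuzz-style.
random.seed(0)  # deterministic run for reproducibility
for _ in range(1000):
    xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
    assert prop_reverse_involutive(xs), f"property failed for {xs}"
```

Dedicated tools (Hypothesis, QuickCheck) add input shrinking and smarter generation, but the adoption cost of even this bare version is near zero, which is exactly why the technique spreads.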
Finally, knowing that incompleteness theorems are a thing, formal methods will always fall short in some fundamental way.
agentultra•1d ago
Model checking, for example, is not as hard as it sounds and not nearly as expensive or time consuming as building the wrong thing. Most programmers I’ve taught don’t seem to have too much difficulty learning TLA+ or Alloy. The concepts themselves are fundamentally rather simple.
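The core idea really is that simple: enumerate every reachable state and check an invariant in each one. Here is a toy explicit-state checker in Python, in the spirit of TLC (the two-process model and all names are invented for illustration):

```python
from collections import deque

# Model: two processes, each 'idle' or 'critical'. This (deliberately
# broken) protocol lets either process enter the critical section at will.
INIT = ('idle', 'idle')

def next_states(state):
    # Each process may flip between its two locations at any step.
    for i in range(2):
        for loc in ('idle', 'critical'):
            if loc != state[i]:
                yield state[:i] + (loc,) + state[i + 1:]

def mutual_exclusion(state):
    # Invariant: both processes are never in the critical section at once.
    return state != ('critical', 'critical')

def check(init, step, invariant):
    # Breadth-first search over every reachable state.
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s                 # counterexample state found
        for t in step(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None                      # invariant holds in all reachable states

violation = check(INIT, next_states, mutual_exclusion)
# violation == ('critical', 'critical'): the checker finds the bad state
```

Real checkers add fairness, temporal properties, counterexample traces, and clever state compression, but the conceptual kernel is this BFS.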
Theorem proving does take some expertise. But it’s also not an insurmountable obstacle.
However, I agree that there’s a spectrum of cost: the further you go down the assurance chain, the more effort and time it will take. Often we do price the cost of errors into the holistic process of software (eg: SRE-style error budgets, iterative development, etc).
One angle that works for me when deciding whether to use a more substantial FM is the cost of getting it wrong (or conversely, how important is it that we get some key thing right?).
If I’m building a financial database can my business afford to lose customers’ money? Do we have that liability built into our insurance?
Sometimes you need to go slow at first in order to go fast later.
And some properties are simply too hard to trust to some boxes and arrows and a prayer.
almostgotcaught•1d ago
how are you measuring cost? the cost of anything in this space (or any other rarefied space) is the cost of a "principal/staff" (maybe 1 person-year) to build it and then a team of very very strong seniors to maintain it. who can afford that?
> Most programmers I’ve taught don’t seem to have too much difficulty learning TLA+ or Alloy. The concepts themselves are fundamentally rather simple.
this is like saying building a prod LLM pipeline is no big deal because learning the basics of pytorch is easy.
more relevantly: what are "most programmers" going to do when they hit an undecidable clause?
here's a razor for highly technical projects (a kind of efficient market hypothesis for software if you will): if it's easy to do then it's worthless (highly commoditized, marginal value, etc.).
bravesoul2•1d ago
Formal methods are just another tool. Do you need a principal engineer with 15 YOE for Haskell or Rust code?
In today's AI and low-headcount world you want someone who is really good at that thing (regardless of experience).
Needing a principal who can go deep on Kafka queues in a design interview isn't what is required.
Most organisations are measuring impact narratives, not skill, to get up the org chart.
YetAnotherNick•1d ago
bravesoul2•1d ago
Formal systems may be easier than React to understand (assuming decent effort put into both), as there is more of a disciplined teaching culture around them. useEffect is a feature of React, a framework made by a tech company, with coders taking up the mantle of teaching it to others.
I'm not joking.
creata•1d ago
Some specific formal systems, maybe. I feel like a lot of people could get some mileage out of learning Dafny, and it'd definitely be easier than learning React imo.
bubblyworld•1d ago
creata•1d ago
bubblyworld•1d ago
I think you've changed my mind, actually, thank you. I still maintain that this requires more mathematical maturity than diving into react, but I may be saying that out of ignorance of react's internals =P
bravesoul2•10h ago
bubblyworld•9h ago
YetAnotherNick•23h ago
ted_dunning•1d ago
You are hiring the wrong ones.
almostgotcaught•1d ago
You're just repeating the same claim - yes I agree interns can also understand pytorch. Does that mean they can build prod LLM pipelines?
bravesoul2•1d ago
almostgotcaught•1d ago
o.O ...
No man I'm saying a very simple and non-controversial thing: just because someone does a tut, has an elementary understanding, knows the basics, doesn't mean I'm going to let them build/maintain/own mission critical systems.
It's not that deep.
bravesoul2•1d ago
But the question really is: what solution minimises the number of those hours? Formal or informal?
simiones•1d ago
AnimalMuppet•1d ago
simiones•23h ago
theamk•12h ago
anonzzzies•1d ago
simiones•1d ago
anonzzzies•1d ago
naasking•1d ago
g9yuayon•1d ago
``` []((Q & <>R) -> ((!P & !R) U (R | ((P & !R) U (R | ((!P & !R) U (R | ((P & !R) U (R | (!P U R)))))))))) ```
Davidbrcz•1d ago
g9yuayon•18h ago
somethingsome•1d ago
AnimalMuppet•1d ago
pjmlp•1d ago
And then there are the code changes, which devs most likely aren't validating against the formal model before applying the new set of changes.
dataflow•1d ago
pjmlp•23h ago
bbkane•22h ago
They said it checks logs emitted from the running application against the formal model. Sounds really useful for this exact problem; hopefully one day they'll open source it.
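As a hedged sketch of that idea (the model, state names, and log format here are entirely made up, not from the tool they described): replay logged states through the model's transition relation and flag the first step the model forbids.

```python
# Allowed-transition relation extracted from a hypothetical formal model.
ALLOWED = {
    ('idle', 'acquiring'),
    ('acquiring', 'held'),
    ('held', 'idle'),
}

def check_trace(states):
    """Return the first forbidden transition as (index, from, to), or None."""
    for i, (a, b) in enumerate(zip(states, states[1:])):
        if (a, b) not in ALLOWED:
            return (i, a, b)
    return None

good = ['idle', 'acquiring', 'held', 'idle']
bad  = ['idle', 'held']   # skipped 'acquiring'; the model forbids this jump
```

Because the check runs against production logs, drift between the code and the model shows up as a concrete forbidden transition rather than staying invisible.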
pjmlp•22h ago
nickpsecurity•1d ago
Let's say I want to teach them non-formal, high assurance. That will involve safe subsets of their language, functional style of programming with inputs to outputs, isolation of side effects, and code reviews. They can do this without any extra or unusual knowledge.
The basics of testing would likewise build on their existing knowledge. If their language supports it, static analyzers might work with the push of a button. Basic contracts might knock out many type or range errors without much thought. Basic tests go from their intuition to specific code implementing it.
OK, now let's build models using relationships expressed in Alloy or TLA+. Now, they have to learn to think in a totally different way. They'll need new notations, new techniques. They'll have to put time into these not knowing the payoff. They'll also have far less or zero resources like StackOverflow or GeeksForGeeks that can solve the whole problem for them.
Moving on to theorem provers, they now have to know math or logic. They have to think formally. They have to specify the problem that way, keep it in sync with the English, do lots of work that might just be about limitations of the formal tool, and eventually they succeed (or don't) at the proof. It's also a very slow process vs informal methods of correctness checking.
So, it's quite difficult to learn these things vs alternative methods. They also might have a low or slow payoff, while the others pay off quickly and often. That's why I push using the alternatives first, then learning the others over time to gradually add them to your toolbox for when it makes sense: data structures, interface specification, highly critical routines, etc.
agentultra•23h ago
There’s a spectrum of problems one can apply FMs to. Galois is probably selecting for the harder end of projects that do require PhDs with scary sounding degrees.
In industry there are a lot of problems that don’t require that level of skill which can benefit a lot from application of FM’s.
nickpsecurity•22h ago
The biggest problem I see with that is how to specify those specs, especially contracts. I'd like to see a free collection of specs for common properties. Examples include ranges, ordering, not a value, invariants, and confidentiality. If searchable, then that might already be enough for many situations.
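As a sketch of what such a reusable collection could look like in Python (all names hypothetical), common properties like "non-empty input" or "sorted output" become decorators pulled off a shelf:

```python
import functools

def requires(pred, msg):
    """Reusable precondition spec, checked on every call."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            assert pred(*args, **kwargs), f"{fn.__name__}: {msg}"
            return fn(*args, **kwargs)
        return wrapper
    return deco

def ensures(pred, msg):
    """Reusable postcondition spec on the return value."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            assert pred(result), f"{fn.__name__}: {msg}"
            return result
        return wrapper
    return deco

# 'Range' and 'ordering' properties drawn from such a collection:
@requires(lambda xs: len(xs) > 0, "input must be non-empty")
@ensures(lambda r: all(r[i] <= r[i + 1] for i in range(len(r) - 1)),
         "output must be sorted")
def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out = []
    while left and right:
        out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return out + left + right
```

A searchable library of such predicates (ranges, ordering, not-a-value, invariants) gets developers most of the way to specs without having to invent the formalism themselves, which is the low-work on-ramp being argued for.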
From there, we can develop tools to generate them for people. The tools might parse the code, ask some questions, and generate the specs. Alternatively, LLM's trained on such things might generate specs. They might review them and suggest things. Over time, we'll also have more components that are already spec'd which specialists might build on.
We just really need ways for developers to go from their situation to specs with no work or little work. If it's as easy as code and comments, we'll see more adoption.