frontpage.

OpenAI to buy AI startup from Jony Ive

https://www.bloomberg.com/news/articles/2025-05-21/openai-to-buy-apple-veteran-jony-ive-s-ai-device-startup-in-6-5-billion-deal
393•minimaxir•3h ago•544 comments

For Algorithms, a Little Memory Outweighs a Lot of Time

https://www.quantamagazine.org/for-algorithms-a-little-memory-outweighs-a-lot-of-time-20250521/
17•makira•45m ago•0 comments

Collaborative Text Editing Without CRDTs or OT

https://mattweidner.com/2025/05/21/text-without-crdts.html
112•samwillis•3h ago•23 comments

Animated Factorization

http://www.datapointed.net/visualizations/math/factorization/animated-diagrams/
194•miniBill•5h ago•38 comments

Introducing the Llama Startup Program

https://ai.meta.com/blog/llama-startup-program/?_fb_noscript=1
111•mayalilpony10•4h ago•34 comments

Storefront Web Components

https://shopify.dev/docs/api/storefront-web-components
68•maltenuhn•3h ago•19 comments

Devstral

https://mistral.ai/news/devstral
205•mfiguiere•5h ago•41 comments

LLM function calls don't scale; code orchestration is simpler, more effective

https://jngiam.bearblog.dev/mcp-large-data/
84•jngiam1•3h ago•20 comments

The curious tale of Bhutan's playable record postage stamps (2015)

https://thevinylfactory.com/features/the-curious-tale-of-bhutans-playable-record-postage-stamps/
19•ohjeez•1h ago•0 comments

Harnessing the Universal Geometry of Embeddings

https://arxiv.org/abs/2505.12540
49•jxmorris12•2h ago•14 comments

Understanding the Go Scheduler

https://nghiant3223.github.io/2025/04/15/go-scheduler.html
54•gnabgib•3d ago•6 comments

An upgraded dev experience in Google AI Studio

https://developers.googleblog.com/en/google-ai-studio-native-code-generation-agentic-tools-upgrade/
22•meetpateltech•2h ago•8 comments

Harper (YC W25) Is Hiring Applied AI / AI Context Engineers and Data Scientist

https://www.ycombinator.com/companies/harper/jobs
1•nairtu•3h ago

New dwarf planet found in our solar system

https://www.minorplanetcenter.net/mpec/K25/K25K47.html
26•ddahlen•1h ago•17 comments

The US has a new most powerful laser hitting 2 petawatts

https://news.engin.umich.edu/2025/05/the-us-has-a-new-most-powerful-laser/
63•voxadam•5h ago•64 comments

By Default, Signal Doesn't Recall

https://signal.org/blog/signal-doesnt-recall/
229•feross•3h ago•137 comments

Launch HN: SIM Studio (YC X25) – Figma-Like Canvas for Agent Workflows

39•waleedlatif1•4h ago•29 comments

'Turbocharged' Mitochondria Power Birds' Epic Migratory Journeys

https://www.quantamagazine.org/turbocharged-mitochondria-power-birds-epic-migratory-journeys-20250519/
70•rbanffy•6h ago•46 comments

Show HN: I've built an online video editor

https://clipjs.vercel.app/
5•mohyware•28m ago•5 comments

Show HN: Appwrite Sites – the open-source Vercel alternative

https://appwrite.io/blog/post/announcing-appwrite-sites
34•eldad_fux•2d ago•22 comments

What Is the Difference Between a Block, a Proc, and a Lambda in Ruby? (2013)

https://blog.awaxman.com/what-is-the-difference-between-a-block-a-proc-and-a-lambda-in-ruby
40•Tomte•3d ago•5 comments

Show HN: Evolved.lua – An Evolved Entity Component System for Lua

https://github.com/BlackMATov/evolved.lua
37•blackmat•4h ago•7 comments

Building my own solar power system

https://medium.com/@joe_5312/pg-e-sucks-or-how-i-learned-to-stop-worrying-and-love-building-my-own-solar-system-acf0c9f03f3b
369•JKCalhoun•3d ago•275 comments

Lune: Standalone Luau Runtime

https://github.com/lune-org/lune
54•erlend_sh•5h ago•32 comments

All That Glitters

https://magazine.atavist.com/all-that-glitters-jona-rechnitz-lawsuit-jadelle-jewelry-coba-ethereummax-mayweather/
10•gmays•1h ago•0 comments

Ratatoi is a C library that wraps stdlib's strtol (as atoi does), but it's evil.

https://github.com/rept0id/ratatoi
5•rept0id-2•1h ago•4 comments

Python Tooling at Scale: LlamaIndex’s Monorepo Overhaul

https://www.llamaindex.ai/blog/python-tooling-at-scale-llamaindex-s-monorepo-overhaul
20•cheesyFish•2h ago•8 comments

Show HN: Representing Agents as MCP Servers

https://github.com/lastmile-ai/mcp-agent/tree/main/examples/mcp_agent_server
20•saqadri•2h ago•4 comments

Visualizing entire Chromium include graph

https://blog.bkryza.com/posts/visualizing-chromium-include-graph/
24•bkryza•5h ago•0 comments

Show HN: Trendly AI – Trend detection across 42 languages

https://trendlyai.com/
24•bhuwanaryal1404•5h ago•10 comments

Why Property Testing Finds Bugs Unit Testing Does Not (2021)

https://buttondown.com/hillelwayne/archive/why-property-testing-finds-bugs-unit-testing-does/
53•Tomte•8h ago

Comments

arnsholt•7h ago
I haven't used PBT much, but I did once get a lot of mileage out of it, which was when I was implementing a fairly gnarly edit-distance algorithm. In that case, I used PBT to check that the required properties of metric functions (d(x,x)=0, d(x,y)>0 for x≠y, d(x,y)=d(y,x), d(x,z) <= d(x,y)+d(y,z)) held for my implementation, which helped shake out a fair few stupid implementation mistakes.
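
A minimal sketch of such a test in Python with Hypothesis (edit_distance and its module are hypothetical stand-ins for the implementation under test):

    from hypothesis import given, strategies as st
    from mymodule import edit_distance  # hypothetical implementation under test

    @given(st.text(), st.text(), st.text())
    def test_metric_axioms(x, y, z):
        assert edit_distance(x, x) == 0                    # identity
        if x != y:
            assert edit_distance(x, y) > 0                 # positivity
        assert edit_distance(x, y) == edit_distance(y, x)  # symmetry
        # triangle inequality
        assert edit_distance(x, z) <= edit_distance(x, y) + edit_distance(y, z)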
frogulis•6h ago
Aw man, I was nodding along with the "most examples suck" section and then... it ended :(
pfdietz•6h ago
I'll give you the good example I've been doing for the last two decades: testing a compiler.

The complexity here is the complete opposite of the simple toy examples. What are the edge cases of an optimizing compiler? How do you even approach them, if they're buried deep in a chain of transformations?

The properties are simple things like "the compiler shouldn't crash, the compiled code shouldn't crash, and code compiled with different optimization levels should do the same thing." This assumes the randomly generated code doesn't touch undefined behavior in the language spec.

Here's a recent example of a bug found by this approach. The Common Lisp code stimulating the bug has been automatically minimized: https://bugs.launchpad.net/sbcl/+bug/2109837 with the bug fix https://sourceforge.net/p/sbcl/sbcl/ci/1abebf7addda1a43d6d24...

chipsrafferty•5h ago
Unfortunately I have no idea what I'm looking at here.
pfdietz•3h ago
The function constructs two lambda expressions (source code for anonymous functions) that should be equivalent. One has some extra declarations. It then compiles the two lambda expressions and calls the compiled code on the same arguments, and gets different values (which is the bug).
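
A toy version of that differential loop can be sketched in Python, using compile()'s optimize levels as a stand-in for a real compiler's optimization levels (random_expr is a made-up generator; real harnesses like the one above generate far richer programs):

    import random

    def random_expr(depth=3):
        # toy generator for integer arithmetic expressions
        if depth == 0:
            return str(random.randint(-5, 5))
        op = random.choice(["+", "-", "*"])
        return f"({random_expr(depth - 1)} {op} {random_expr(depth - 1)})"

    for _ in range(1000):
        src = random_expr()
        plain = eval(compile(src, "<gen>", "eval", optimize=0))
        opted = eval(compile(src, "<gen>", "eval", optimize=2))
        assert plain == opted, src  # a mismatch would signal a compiler bug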
hikarudo•5h ago
Here's a great talk by John Hughes, one of the authors of QuickCheck, with real-life examples:

https://www.youtube.com/watch?v=zi0rHwfiX1Q

nickpsecurity•5h ago
They should put some good ones in the article.
djoldman•6h ago
Isn't unit testing a subset of property testing?

Seems like a unit test tests a specific input, while property testing tests more than one input.

skybrian•6h ago
Yes, but converting a test to be data-driven adds some complexity (how do you debug just the failures?) and generating the inputs randomly makes it harder to know what properties to assert.

Also, if you never look at the test data, it might give you false confidence about how thoroughly your code is being tested.

stonemetal12•5h ago
Yes, property testing is a generalization of unit testing.
andreareina•5h ago
They’re different, complementary approaches. Example based testing has a known input and output. Unless you have an oracle (not usually the case) you won’t know the exact output you should get from a generated input. So instead you depend on something that should be true no matter the input. e.g. if you have a serialize/deserialize pair then `deserialize(serialize(x))` better equal `x`.
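
The round-trip idea translates directly to, say, Python's json module; a sketch with Hypothesis (floats left out deliberately, since NaN != NaN would fail the property for the wrong reason):

    import json
    from hypothesis import given, strategies as st

    # JSON-representable values: scalars, plus nested lists and dicts of them
    json_values = st.recursive(
        st.none() | st.booleans() | st.integers() | st.text(),
        lambda inner: st.lists(inner) | st.dictionaries(st.text(), inner),
    )

    @given(json_values)
    def test_round_trip(x):
        assert json.loads(json.dumps(x)) == x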
Jtsummers•4h ago
Sort of. But you can also use PBT for integration and end-to-end testing. A user tries to schedule something and it overlaps with an existing event: what happens? A user adds 10 events with their various requirements (none overlapping): do they all schedule successfully?

Set the expectations and model the expected behavior, verify that the system matches that expectation. The approach works at all testing scales.

chriswarbo•3h ago
Yes, if you've got a mixture of unit tests and property tests then you can write them all using a single PBT framework, rather than needing two test frameworks.
esafak•6h ago
I independently discovered PBT when I was a junior, and suggested we use it. My coworkers rejected it because tests should be predictable, and it's the programmer's job to pick the edge cases.
skybrian•6h ago
You do need to be able to reproduce a test failure, which can be done by printing the inputs or the random seed used in a way that makes it trivial to rerun it.
alkonaut•5h ago
I think it goes without saying. PBT shouldn't be truly random (e.g. seeded from a timestamp or a cryptographic source); it should be deterministically pseudorandom if it uses random values. You shouldn't be able to run it 10 times and get 9 passes and one failure. It's either 10 passes or 10 failures.
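
Hypothesis, for example, supports both styles of determinism out of the box:

    from hypothesis import given, seed, settings, strategies as st

    @seed(12345)                  # fixed seed: the same inputs every run
    @given(st.lists(st.integers()))
    def test_sort_is_idempotent(xs):
        assert sorted(sorted(xs)) == sorted(xs)

    # Alternatively, @settings(derandomize=True) derives the randomness
    # from the test itself, so every machine and every run agrees.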
Akronymus•5h ago
> You shouldn't be able to run it 10 times and get 9 passes and one failure. It's either 10 passes or 10 failures.

With property based testing, it actually CAN be 9 passes and 1 failure, because that one single failure can be hitting an edge case the others just aren't. In fact, a few failures are more likely than all of them failing.

bluGill•5h ago
That is the one thing about PBT that worries me. I can write code and all tests pass; then next week the edge case I missed is randomly hit by a coworker, who now has to figure out why their change broke my code (it didn't).

I can tell you from experience that random failures cause loss of trust. People learn to ignore failures and just keep hitting rebuild until the tests pass. People will not investigate test failures in code that they don't understand.

Akronymus•5h ago
That's a fair concern. I can only really suggest upping the number of test cases that are run when merging, so that you get a much more extensive run at that point, and dialing it back later. Along with including the seed in the failure report, so that you can bisect to check what actually broke the test. Also, add a standard test alongside the property-based one for every bug you encounter over time (basically "hard coding" one of the failure cases).

But yeah, probabilistic testing isn't perfect.

aswerty•4h ago
Potentially using the git hash as a seed would make sense, so for a given snapshot of the code it is always going to be deterministic. When the git hash changes (i.e. your code changes), that would result in a different set of test inputs running.

Allowing reproducibility for a given change set.
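
A sketch of that idea with Hypothesis (the commit hash is hex, so it converts straight to an integer seed):

    import subprocess
    from hypothesis import given, seed, strategies as st

    commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()

    @seed(int(commit, 16))        # deterministic per snapshot of the code
    @given(st.lists(st.integers()))
    def test_reverse_is_involutive(xs):
        assert list(reversed(list(reversed(xs)))) == xs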

Akronymus•4h ago
That's a pretty good idea, actually. Using the git hash as a seed to seed the rng for the different runs of the PB tests.

Didn't even enter my mind.

chriswarbo•4h ago
Using a git hash still has the problem of a co-worker's changes (which alter the git commit) causing an unrelated property to fail.

Hypothesis has a nice option, to pick the seed for each property by hashing that property's code. It's a nice idea, but relies on Python's highly dynamic nature; so may not be easy/possible in other languages (especially compiled ones).

jononor•5h ago
So lock the seed in that case?
bluGill•4h ago
If you lock your seed you are worse off than with unit tests - odds are you are never testing some interesting cases. The whole point of PBT is that there are some properties that hold in all cases (well, technically in the domain of the function inputs), so we try random examples to see if something was missed. Generally the domain of all possible function inputs is such a large set that exhaustive testing is impossible. The more different random seeds we try, the better the odds we eventually catch some case that we didn't handle correctly.
IanCal•4h ago
This is fair, but also a tooling or process problem.

The co-worker should add the test to a new branch / test it on main. If it fails that's a new ticket (with the great side effect of having a failing test). If that passes it's a problem in their branch. If not it's the same as having a broken main which happens anyway and you deal with that as you usually do.

zelphirkalt•2h ago
Usually though the failure case inputs should be stored, so that once the test fails, it will fail again, no matter how often you hit the run or rebuild button.
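
Hypothesis does a version of this automatically: failing examples are saved to a local database and replayed first on subsequent runs. Known regressions can also be pinned explicitly, e.g.:

    from hypothesis import example, given, strategies as st

    @given(st.text())
    @example("")   # a past failure, now re-checked on every run
    def test_utf8_round_trip(s):
        assert s.encode("utf-8").decode("utf-8") == s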
alkonaut•2h ago
When would it be preferable to have nondeterministic randomness?
Akronymus•2h ago
I don't think I have said anything about it being nondeterministic.

You can have a prng seeded from something like the commit hash and have prngs generate the test cases. That still can fail on 1/10 tests for a particular run.

Jtsummers•2h ago
With PBT you generate an arbitrary number of inputs, ideally they all pass. However it's entirely possible that you have an error in your program and the property only holds for some inputs, but not all. In that case, since each execution starts with a different seed (unless you provide a specific seed, in which case the generated inputs should be exactly the same each time), you may have some executions that always pass, some that always fail, and others that are in between (have a mix of passes and failures). This is expected.
aswerty•5h ago
Just as an anecdotal experience. It doesn't necessarily go without saying.

The most memorable discussion I had around PBT was with a colleague (a skip report) who saw "true" randomness as a net benefit and reproducibility as not a critical characteristic of the test suite (I guess the reasoning was that it could then catch things at a later date?). To be honest, it scared the hell out of me and I pushed back pretty hard on them and the broader team.

I have no issue with a pseudo-random set of test cases that are declaratively generated. That makes sense if that is what is meant by PBT, since it is just a more efficient way of testing (and you would assume this would allow you to cast a wider net).

senderista•4h ago
This dilemma is of course trivially solvable by persisting a (presumably randomly generated) RNG seed with each test run. You just have to ensure that your RNG is configured once with the seed at the beginning of each test run.
IanCal•4h ago
What's the issue you have?

The idea is you have random testing and the test failures are added as explicit tests that then always get run.

Is that so different from someone else testing?

The main issue is you stumble across a new issue in an unrelated branch, but it's not wildly different from doing that while using your application.

crabbone•5h ago
Nah. That's not what this is about. In any system worth testing, with property-based testing, the system will take so many steps before it encounters an error that even knowing what the steps were isn't going to be very helpful in reproducing the error.

I've been there. It takes many hours to try to guess where the system went wrong to produce the undesirable result, and then you still might not be sure if you are looking at the right place, and then there are always environment issues you aren't sure of. So, you don't know who to blame.

Only very simple systems will reproduce pathological results 100% of the time given some initial conditions. The bane of complex systems is the timeouts. They are usually very hard to justify and are easy to blame for undesirable behavior.

jononor•5h ago
Property based testing is incredibly useful for state-free systems/modules. Especially those that have a wide/complex input space. Simple examples would be a general (de)serialization library for something like JSON.
skybrian•3h ago
Yes, timeouts are bad, but that’s true even for regular tests with deterministic input. You’ll get flakes running on different machines, depending on how much load is on them.

To test an entire system with reproducible failures, you probably need something more heavyweight like Antithesis. Property tests are more useful for unit tests.

nottorp•5h ago
It's unrelated to this article, but I suppose this motivation is why no one has a Gremlins-like feature any more.

It seems to be almost totally forgotten, since the only link I could find is an excerpt from a PalmOS programming book:

https://www.oreilly.com/library/view/palm-programming-the/15...

chriswarbo•3h ago
That sounds like fuzz testing, which is similar to PBT, but (a) usually checks a single property ("the program doesn't crash") and (b) sends data via the program/system's ordinary input channels (whereas PBT has "white box" access to internals, like unit tests do).
mrguyorama•11m ago
Good fuzzing workflows and tools use the flow of code and which branches get taken to help find correlations between inputs and outcomes, and use that extra context to more efficiently fuzz inputs.

In bigger programs this is an outright necessity because pure random fuzzing would basically be a lottery.

I've always felt that unit testing frameworks and libraries and even parameterized testing were missing this kind of functionality.

IntelliJ is able to run my tests and figure out the code coverage, but why isn't it closing the loop and auto-fuzzing/auto-discovering how to mutate tests to cover more?

And don't point me at AI, none of this requires AI and nothing should have to "think" to do this.

It's crazy to me that the vast majority of code running all the time is not exhaustively tested through almost all of its possible state space with most of its possible input space. It's not like we are lacking the CPU bandwidth to do it.

Why can't I write a new function and have something tell me within ten minutes "this input param causes an exception" without any effort from me? Instead all those extra cores in my CPU just run javascript trash and crowdstrike scanners

sundarurfriend•1h ago
The gremlins turned into monkeys (as they quickly did in the page you shared as well):

https://developer.android.com/studio/test/other-testing-tool...

nottorp•1h ago
Ohh nice. Thanks!
pfdietz•5h ago
Tests should be optimized to find bugs. Tests that you have already run have a lower chance of doing that (they only find regressions); tests with novel inputs are preferred. And since writing tests manually is so expensive, this means automatic test input generation. How do you determine if such tests pass? Properties.

In practice, property based testing fails because the organization is not actually interested in delivering correct code. "This bug will never happen in practice so we won't fix it." "If we fix this, we may change some incorrect behavior some customer is depending on." And once that happens, PBT is useless, because it will keep finding that "don't fix" bug over and over.

chriswarbo•3h ago
Your last paragraph is using terms like "correct", "fix" and "bug" as if they're absolute, when they're actually relative to some sort of spec (whether formal or informal, written or vibes-based, etc.). If the organisation controls the spec, then it can be perfectly reasonable for them to update that spec to e.g. allow certain behaviours that previously would have been considered bugs.

In that case, we update the properties to reflect the new spec.

pfdietz•2h ago
The spec becomes "don't break important customers' code". How could one possibly formalize that?
pfdietz•6h ago
One can use property testing to automatically generate unit test inputs. The property becomes "does this do something new that I wanted to test for but wasn't?" (or, rather, the negation; inputs "pass" and can be ignored if they do nothing new.) This could be code coverage, or it could be killing code mutants.
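
A toy sketch of that selection loop, keeping only inputs that exercise something new (the made-up classify() stands in for coverage or mutant-kill feedback):

    import random

    def classify(x):
        # stand-in for the code under test, reporting which branch it took
        if x < 0:
            return "negative"
        if x == 0:
            return "zero"
        return "odd" if x % 2 else "even"

    seen, corpus = set(), []
    for _ in range(10_000):
        x = random.randint(-1000, 1000)
        branch = classify(x)
        if branch not in seen:    # "does this do something new?"
            seen.add(branch)
            corpus.append(x)      # keep as a future unit-test input
    print(corpus)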
bluGill•5h ago
Unfortunately it ends before it gets to the good stuff. It has me interested that maybe PBT can find some bugs that unit testing wouldn't - however I'm not sure how to write a PBT that would catch those bugs. The obvious tests that drive PBT advocates to drink are not interesting - unit tests will catch all the errors, and because there is no randomness they will catch the errors faster in general. However, how to write a property for non-trivial functions is not clear at all, and the article stops.
drowsspa•5h ago
Yeah, for simple nearly mathematical functions it's quite clear how to do it. I find it hard to extend this to more business-focused inputs
zelos•5h ago
I remember using something similar a long time ago - basically fuzz testing. I suppose you could call it property-based testing where the property is "whatever edits the user does, we should be able to save and reopen the document without crashing".

It found so many bugs: file corruption, crashes, memory leaks, pathological performance issues. The kind of issues that standard unit testing doesn't find.

bpshaver•5h ago
Does the article he links to towards the end address your concerns?

> Without complex input spaces, there's no explosion of edge cases, which minimizes the actual benefit of PBT. The real benefits come when you have complex input spaces. Unfortunately, you need to be good at PBT to write complex input strategies. I wrote a bit about it here...

Here's the link: https://www.hillelwayne.com/post/property-testing-complex-in...

bluGill•3h ago
It is a start, but I still feel like I'm not sure how I'd apply that to my own domain.
chipsrafferty•5h ago
Indeed! The article says PBT is useful, but can't provide any examples of how :/
IanCal•5h ago
I used it for running a series of "execute this API call" steps.

If it's valid for a user to do, you can make a list and have it do those in sequence.

I had this for a UI library. It could call the functions to add and create the library and then afterwards would move through it. It was for the BBC so on TVs and could move u/d/l/r - the logic was regardless of the UI if you moved right and the focus changed then moving left should bring you back to where you were (u/d the same, etc).

That's tricky, yet being able to write

FOR ANY ui a person can construct

FOR ANY path a user takes through it

WHEN a user presses R

AND the focus changes

THEN when the user presses L

THEN the user is on the item they were before

Was actually quite easy to write and yet insanely powerful.

The one that really convinced me on PBT was one in this library where it found the bug, and the bug had an explicit test for it and it was explicitly in the spec but the spec was inconsistent! The spec was broken, and nobody had noticed.

Another that drove out a lot of bugs was similar but was that regardless of how many ui changes we made and how many movements the user made, we always had something in focus.

Anyway, the big thing here I want to stress is a series of API calls and asserting something at the end or all the way through.

Side note - oh my this is so long ago, 15 years ago building a new PBT tool in actionscript

senderista•4h ago
You can do "white-box" PBT by just asserting all the nontrivial invariants you can think of in your code, and then counting on the generator to force evaluation of those invariants on a representative sample of inputs.
ngruhn•4h ago
If you have a pair of functions for encoding/decoding "something" you can do a round-trip and test that you get the original input back out, e.g.:

    JSON.parse(JSON.stringify(randomObject)) === randomObject
That often works. What also often works is generating the expected output and constructing the input from it. For example, a `stripPrefix` function that removes a known prefix from a string, e.g. `stripPrefix("foo", "foobar") === "bar"`. Property test:

    stripPrefix(randomPrefix, randomPrefix + randomSuffix) === randomSuffix
Note, we "go backwards" and generated the expected output `randomSuffix` directly and then construct the input from it `randomPrefix + randomSuffix`.

Reference implementation based properties also work very often. For example, we've been developing a JavaScript rich text editor. That requires a bunch of utility functions on DOM trees that are analogous to standard string functions. For example, on a standard string you can get a char at an index with `"foo bar".charAt(3)` and on a rich text DOM tree we would need something like `treeCharAt(<U><B>foo</B> bar</U>, 3)`. The string functions can serve as a reference implementation for the more complex tree functions:

    treeCharAt(randomTree, randomIndex) === extractStringContent(randomTree).charAt(randomIndex)
The same can be done with all string functions like `slice`, `indexOf`, `trim`, ...
perrygeo•4h ago
The base property you can test for on every function is "does this crash or return?"
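
In Hypothesis terms that's a one-liner (parse being whatever hypothetical function you want to smoke-test):

    from hypothesis import given, strategies as st

    @given(st.text())
    def test_never_crashes(s):
        parse(s)   # any uncaught exception fails the test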
chriswarbo•4h ago
For code that's more "business logic" rather than "algorithmic", I find the following helpful:

- Despite the terrible tutorial examples, PBT isn't about running one function on an arbitrary input, then trying to think of assertions about the result. Instead, focus on ways that different parts of your production code fit together, what assumptions are being made at each point, etc.

- You don't need to plug random inputs directly into the code you're testing. There are usually very few things to say regarding truly arbitrary inputs, like `forAll(x) { foo(x) }`; but lots more to say about e.g. "inputs which don't contain Y" (so run the input through a filter first), or "inputs which don't overlap" (so remove any overlapping region first), and so on.

- Don't focus on the random inputs; the whole idea is that they're irrelevant to the statement you're asserting (it's meant to hold regardless of their value). Likewise, if your unit test contains some irrelevant details, use PBT to generate those parts instead.

- It's often useful in business-type software to think of a "sequence of actions" (which could be method calls, REST endpoints, DB queries, or whatever). For example, "any actions taken as User A will not affect the data for User B". Come up with a simple datatype to represent the actions you care about, and write a function which "interprets" those actions (i.e. a `switch` to actually call the method, or trigger the endpoint, or submit the query, or whatever). Then we can write properties which take a list of actions as input. Remember, we don't need to run truly arbitrary lists: a property might filter certain things out of the list, prepend/append some particular actions, etc.

- Once we have some assertion, look for ways to generalise it; for example by looking for places to stick extra things which should be irrelevant.

As a simple example, say we have a function like `store(key, value)`; it's hard to say much about the result of that on its own, but we can instead say how it relates to other functions, like `lookup(key)`:

    forAll(key, value) {
      store(key, value);
      assertEqual(lookup(key), Some(value))
    }
Yet we don't really care about lookups happening immediately after stores; we want to make a more general statement about values being persisted:

    forAll(key, value, pre, suf) {
      runActions(pre)  # Storing shouldn't be affected by anything before it
      store(key, value)
      runActions(suf.filter(notIsStore(key)))  # Do anything except storing the same key
      assertEqual(lookup(key), Some(value))
    }
Jtsummers•3h ago
https://hypothesis.readthedocs.io/en/latest/stateful.html

That last test style you describe can be done with Hypothesis. I've had some good success testing both Python programs and programs written in other languages that could be driven from Python with it. Like a server using gRPC (or CORBA once) as an interface, driven by tests written in Python imitating client behavior.
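
For reference, a minimal sketch of that stateful style; here a plain dict stands in for the real system under test, checked against an equally plain model:

    from hypothesis import strategies as st
    from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

    class StoreMachine(RuleBasedStateMachine):
        def __init__(self):
            super().__init__()
            self.real = {}     # would be the actual system under test
            self.model = {}    # trusted model of expected behaviour

        @rule(key=st.text(), value=st.integers())
        def store(self, key, value):
            self.real[key] = value
            self.model[key] = value

        @invariant()
        def agrees_with_model(self):
            assert self.real == self.model

    TestStore = StoreMachine.TestCase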

hwayne•56m ago
A couple of useful general approaches:

- "Metamorphic testing" is where analyze how code changes with changing inputs. For example, adding more filters to a query should return a strict subset of the results, or if a computer vision system recognizes a person, it should recognize the same person if you tilt the image.

- Creating a simplified model of the code, and then comparing the code implementation to the model, a la https://matklad.github.io/2024/07/05/properly-testing-concur... or https://johanneslink.net/model-based-testing

There's also this paper, which I haven't read yet but seems intriguing: https://andrewhead.info/assets/pdf/pbt-in-practice.pdf
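
A minimal sketch of that filter relation, with a hypothetical run_query(filters) that returns the set of matching rows:

    from hypothesis import given, strategies as st

    FLAGS = ["active", "paid", "admin"]

    @given(st.lists(st.sampled_from(FLAGS), unique=True), st.sampled_from(FLAGS))
    def test_extra_filter_narrows(filters, extra):
        # adding a filter may only remove rows, never add them
        assert run_query(filters + [extra]) <= run_query(filters)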

moi2388•5h ago
Why are we even testing to begin with, and not using theorem provers like lean to prove without any doubt that our commutative function is indeed commutative?
klysm•5h ago
I agree with the sentiment, but formal methods are tricky
pfdietz•5h ago
It's an example of Alfred North Whitehead's quote:

“Civilization advances by extending the number of important operations which we can perform without thinking about them.”

We can do property based testing with less thinking than if we try to prove correctness. We are exploiting the ability of the computer to run millions or even billions of tests. It's the enormous power of today's computers that enables this to work.

The theorem prover approach would work only if it could be automatic: press a button, get the proof (after some acceptable delay). Otherwise, look at all the expensive manual effort you just signed up for.

You might think that as computers get faster, theorem proving becomes easier. But testing becomes easier also. It's not clear testing will ever lose this race.

recroad•5h ago
Can you please explain more and maybe give some examples that resonate with people who don't have the understanding that you do?
moi2388•2h ago
An example would be critical systems such as defence and aerospace, where they use, for example, Ada SPARK to formally prove that certain bugs cannot occur.
bluGill•5h ago
Multiple problems.

Proving code is only as good as the requirements, which are often garbage - the customer often doesn't know what they even want. Even if you put in effort, requirements in the form a proof needs are often very abstracted from the customer requirements, and so your program can be proved correct but still be wrong because it doesn't do what the customer really wanted. In any complex program it is reasonable to state that several requirements are wrong, and thus even if you prove your code correct it will be wrong. Often the problem itself cannot even be formally defined - a spell checker cannot be proved correct because human languages are not formally defined; not that you can't prove one, just that whatever you prove will be wrong.

Many systems are very complex. You can (should!) prove simple algorithms, but put everything together and a proof is not something we can do at all. There are too many halting-problem-like things in large programs.

Tests solve some of the above problems: they can (do not confuse this with what they do!) be a simple example of "yes, when inputs are exactly x,y,z then I expect that result". A bunch of simple examples that make sense can often be close enough.

We do a lot more theorem proving than most people realize. Types, which many languages have, are a form of formal proof. They don't cover everything, but even in C++ they cover a lot of issues.

I think the best answer is a combination: prove the things we know how to prove, and test the rest and hope that between the two we have covered enough to prevent bugs.

jmull•5h ago
For purely mathematical properties, a purely mathematical technique is probably best.

But "is commutative" is just an example here (one of the topics of this post is how simplistic examples can mislead people as to the usefulness of a given verification technique).

The general point of software verification is to ensure the software "does what I want". But in a very large proportion of cases, people aren't clear on precisely what they want. They could not use a formal method because they could not write a formal specification. A nice thing about unit tests is that you can work through your expectations iteratively and incrementally, broadening and deepening your understanding of exactly what the software should do, capturing each insight along the way in a reusable way.

kylereeve•4h ago
Assuming you're not being facetious: one of the best parts about PBT is that it gets you a good percentage of the value of formal proof with a lot less work. PBT at least lets you demonstrate that a property is ~probably~ true, whereas traditional unit testing doesn't usually explicitly state properties.
nickpsecurity•4h ago
High-assurance systems have always required a combo of methods, since each can catch what the others missed. Also, they have different cost-benefit ratios. A quick glance at the code or some tests catches many problems quickly, while formal verification takes roughly forever in real-world project time.

Here's a few reasons to use testing strategies:

1. Your developers might not be mathematicians.

2. Your system or its properties might be hard to specify mathematically. One can often design a test for such properties.

3. Your functions that are easy to model mathematically might also have side effects or environmental dependencies due to other requirements (eg performance, legacy).

4. Your specifications and code might get out of sync at some point. If it does, people will think the code has properties that it doesn't. That can poison the verification all the way up the proof chain.

5. Mathematical modeling or proof might take much, much longer to find the bug than a code review or testing. That is, it's a waste of money.

6. Your mathematical tools might have errors that cause a false claim of correctness. Diverse assurance methods catch errors like this. Also, testing often uses the most widely-used parts of a programming language. Those constructs are highly likely to be compiled correctly, vs esoteric methods or tools in formally-proven systems.

7. Automated testing that, in some way, searches through your execution paths can find problems your team never thought of. Fuzzing is the most common technique. However, there's many methods of automated, test generation.

Your best bet is to use code reviews, Design-by-Contract, static analyzers for common problems, contract/property-based generation of tests, fuzzing with contracts as runtime checks, and manual tests for anything hard to specify.

Don't waste time on formal verification at all unless it's a high-value asset that's worth it. If you do, first attempt it with tools like SPARK Ada and Frama-C, which have high automation. Also, if you fail on full correctness, you might still prove the absence of runtime errors in certain categories.

tikhonj•5h ago
I wrote some property-based tests for a parsing library. Eventually one of the tests for case-insensitive parsing failed... because it hit the letter "DŽ", which has upper-, lower- and title-case forms.

The funny thing is that the parsing library was correct and it was the test property that was wrong—but I still learned about an edge case I had never considered!

This has been a common pattern for "simpler" property-based tests I've written: I write a test, it fails right away, and it turns out that my code is fine, but my property was wrong. And this is almost always useful; if I write an incorrect property, it means that some assumption I had about my code is incorrect, so the exercise directly improves my conceptual model of whatever I'm doing.
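
The three forms are easy to poke at from Python (values per the Unicode data for U+01C4..U+01C6):

    dz_upper, dz_title, dz_lower = "\u01C4", "\u01C5", "\u01C6"  # DŽ, Dž, dž
    assert dz_upper.lower() == dz_lower    # DŽ -> dž
    assert dz_lower.upper() == dz_upper    # dž -> DŽ
    assert dz_lower.title() == dz_title    # title-case is a distinct third form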

nottorp•5h ago
That's Unicode, which has a whole section of hell dedicated to it.
riwsky•5h ago
Update: It’s actually been divided into multiple sections; if your code assumes a single section, it will break as of the addition of multibyte characters in Unicode v7.1
atmavatar•3h ago
When the robots rise up and exterminate the human race, Unicode will be one of the reasons we deserve it.
rented_mule•3h ago
Is the root of the problem Unicode? Or is the root of the problem the complexity of the union of written human languages? To the extent that it's the latter, Unicode is just the messenger.
nottorp•2h ago
I would shed a tear but then I remember that they have not one, not two but four canonical forms...

And what do they do about it? They add more emoji.

Besides, even if it's justified, it's still sections 7.1-A to 7.3-D of hell.

testthetest•5h ago
Yes! We use a version of this in our end-to-end Playwright scripts as well, because we want our tests to be both:

1) lightweight, because most of our test suites run on production infrastructure and we can't afford to run them constantly

2) "creative", to find bugs we hadn’t considered before

Probabilistic test scenarios allow us to increase the surface we're testing without needing to exhaustively test every scenario.

chriswarbo•5h ago
Indeed, I've had a property fail due to `Char.isWhitespace` disagreeing with a `\s` regex over whether Mongolian Vowel Separator counts as whitespace (it's not any more, but was prior to Unicode 6.3, according to https://unicode-explorer.com/c/180E )

It's been said (I think by Hughes?) that the causes of property failures tend to be spread equally between buggy code, buggy property and buggy generator.

senderista•4h ago
Yup, and many have discovered this is an unexpected benefit of formal methods in general: writing a formal spec forces you to think precisely about requirements, assumptions, and edge cases, independently of whatever benefit formal verification may provide.
sn9•2h ago
There are techniques to discover properties of your function as written, which can then tell you if you've written the function you intended to write: https://www.fuzzingbook.org/html/DynamicInvariants.html (also at https://www.debuggingbook.org/html/DynamicInvariants.html)
metalrain•5h ago
I would love to use PBT more, but many tests I write have only one answer per input. Think sum-like aggregations.

For those it's not clear how one would derive the answer from the generated inputs; that is what the code is for.

But PBT can be great for pruning out crashes you don't expect while parsing.

chriswarbo•5h ago
> I would love to use PBT more, but many tests I write have only one answer per input. Think sum-like aggregations.

Not quite sure what you mean by "only one answer per input" (that it's a function, i.e. a 1:1 mapping?), but there are lots of properties that aggregations might typically need to satisfy, e.g. off the top of my head:

    # Identity element
    forAll(pre, post) {
      assertEqual(
        agg(pre ++ [agg([])] ++ post),
        agg(pre ++ post)
      )
    }

    # Invariant to order
    forAll(elems, seed) {
      assertEqual(
        agg(elems),
        agg(permute(elems, seed))
      )
    }

    # Left-associative
    forAll(xs, ys, zs) {
      assertEqual(
        agg([agg(xs ++ ys)] ++ zs),
        agg(xs ++ ys ++ zs)
      )
    }

    # Right-associative
    forAll(xs, ys, zs) {
      assertEqual(
        agg(xs ++ [agg(ys ++ zs)]),
        agg(xs ++ ys ++ zs)
      )
    }
(FYI these are the typical properties of a (commutative) monoid, an algebraic structure that describes many "aggregation-like" operations.)
jononor•5h ago
Numerical aggregates often have the property that their output is in the range min and max of the input.

An aggregate on discrete values may have the property that the output is one of the elements in the input.

It may also have a no-NaN property, or maybe no-NaN unless NaN in input.

Sukera•5h ago
For aggregation-like things, the interesting properties are often about the properties of the accumulation function, and not the entire aggregation, which should then be correct by extension. So for your `sum` example, you'd use PBT to test that your `+` works first, and only then come up with things that should hold on top of that when repeatedly applying your operation. For example, once you have a list of numbers and know its sum, adding one additional number to the list and taking the sum again should lead to the same result as if you had added the number to the sum directly (barring non-associative shenanigans like with floating point - but that should have already been found in the addition step ;) ).

There's a bunch of these kinds of patterns (the above was inspired by [0]) that are useful in practice, but unfortunately rarely talked about. I suppose that's because most people end up replicating their TDD workflows and just throwing more randomness at it, instead of throwing more properties at their code.

[0] https://fsharpforfunandprofit.com/posts/property-based-testi...
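
The incremental pattern from [0] in Hypothesis form (integers used here to sidestep the floating-point caveat above):

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()), st.integers())
    def test_sum_incremental(xs, x):
        # appending one element shifts the aggregate by exactly that element
        assert sum(xs + [x]) == sum(xs) + x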

tikhonj•4h ago
Good example for sums: you can write a property that checks whether the sum of a list is the same before and after you randomly shuffle the elements.

In practice sum is a sufficiently well-understood function that this property will only catch the edge cases people know about up-front (integer overflow, floating point issues...). But for more complex cases, this kind of property will catch problems you didn't think about. And even if you decide that the bug is not important (sometimes we have no real choice but to live with these edge cases), at least you'll know about them explicitly and be able to document them.

recroad•5h ago
> The majority of errors you find with testing are either issues with an entire "partition" of inputs or "boundary" inputs, like INT_MIN.

I don't find this to be true at all. Most bugs I find are business scenarios that I didn't consider or mismatches in API expectations etc. Rarely is a bug for me coming from not considering edge cases of min and maximum values for integers, floats etc.