> Property-based testing vs. Antithesis
> Property-based testing (PBT) uses random inputs to check individual data structures, procedures, or occasionally whole programs for high-level invariants or properties. Property-based testing has much in common with fuzzing—the main differences are heritage (PBT comes from the functional programming world, while fuzzing comes from the security/systems programming world) and focus (program functionality vs. security issues). Like fuzzing, PBT is generally only applicable to self-contained libraries and processes.
> Antithesis is analogous to applying PBT to an entire interacting software system—including systems that are concurrent, stateful, and interactive. Antithesis can randomly vary the inputs to a software program, and also the environment within which it runs. Like a PBT system, Antithesis is designed to check high-level properties and invariants of the system under test, but it can do so with many more types of software.
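For anyone who hasn't seen PBT before, here is a minimal stdlib-only sketch of the idea in Python. The run-length codec and the `check_roundtrip` helper are hypothetical examples of mine, not anything from Antithesis or from a real PBT library; a real project would more likely use a library such as Hypothesis, which also shrinks failing inputs.

```python
import random


def rle_encode(text):
    """Run-length encode a string into (char, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def rle_decode(pairs):
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_roundtrip(trials=500, seed=0):
    """Check two properties on random inputs instead of hand-picked examples."""
    rng = random.Random(seed)  # seeded so a failure can be replayed
    for _ in range(trials):
        length = rng.randrange(0, 30)
        text = "".join(rng.choice("ab") for _ in range(length))
        encoded = rle_encode(text)
        # Property 1: decoding an encoding gives back the input.
        assert rle_decode(encoded) == text, text
        # Property 2: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:])), text
    return trials
```

The small alphabet ("ab") is deliberate: it makes long runs likely, so the random inputs actually exercise the merging branch.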
I've scrubbed through the video, and it seems to be 100% talking-head filler except for an outro still image—no actual video information content at all unless you want to analyze Wilson's facial expressions or think he's hot.
Regular reminder that yt-dlp (--write-sub --write-auto-sub --sub-lang en) can download subtitles that you can read, grep, and excerpt, so you don't have to watch videos like this unless you like to.
How did you get yt-dlp to work? It used to work for me, but I did a fresh install a week ago and now YouTube is giving me auto/cookie/sign-in errors (captcha, I presume?) that it didn't before.
(I haven't tried it today so can't speak to whether this is a complete solution in this particular case.)
Edit: a few minutes later it worked, although I didn't let it download the whole video, because it was huge. The subtitle file is 12631 words after processing with http://canonical.org/~kragen/sw/dev3/devtt.py. That's about 38 minutes of reading.
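If you don't want to fetch that script, the general idea of turning a WebVTT subtitle file into readable text can be sketched as below. This is my own rough approximation, not the contents of devtt.py: it assumes the file only needs its header lines, cue timings, and inline tags stripped, and that consecutive repeated lines (common in auto-generated captions) can be collapsed. The 330 words-per-minute default is just the rate implied by "12631 words, about 38 minutes".

```python
import re

# Cue timing lines look like "00:00:01.000 --> 00:00:03.000 ...".
TIMING = re.compile(r"\d{2}:\d{2}(?::\d{2})?\.\d{3}\s+-->")
# Inline markup such as <c> or <00:00:01.500>.
TAG = re.compile(r"<[^>]+>")
HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:", "NOTE")


def vtt_to_text(vtt):
    """Strip WebVTT scaffolding and collapse consecutive duplicate lines."""
    lines = []
    for raw in vtt.splitlines():
        line = TAG.sub("", raw).strip()
        if not line or line.startswith(HEADER_PREFIXES) or TIMING.search(line):
            continue
        if lines and lines[-1] == line:
            continue  # rolling captions repeat the previous line
        lines.append(line)
    return " ".join(lines)


def reading_minutes(word_count, wpm=330):
    """Rough reading time; wpm is an assumed reading speed."""
    return word_count / wpm
```

Usage would be something like `text = vtt_to_text(open("video.en.vtt").read())` followed by `reading_minutes(len(text.split()))`, where the filename is whatever yt-dlp wrote for you.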
One drawback of the transcript in this case is that it doesn't identify the speaker. It doesn't seem to contain many errors.
The key point seems to be this one (18'06"):
> But what you what you what you want to do is use guidance and use feedback from the system under test to optimize the search and notice when things have interesting things have happened, things that aren't necessarily bugs, but that are rare behavior or special behavior or unusual behavior. And so the test system can see that something interesting has happened and follow up opportunistically on that discovery. And that gives you a massive lift in the speed of finding issues.
> And the way that we're able to do that is with this sort of magical hypervisor that we've developed which allows us to deterministically and perfectly recreate any past system state.
> So people generally think of the value of that hypervisor as like any issue we find is reproducible. Nothing is nothing is works on my machine. If we find it once we can repro it for you ad infinitum.
Including reproducibility of phenomena that aren't, strictly speaking, computational:
> like all of the like very low-level decisions about when threads get scheduled or how long particular operations take or you know exactly how long a packet takes to get from node A to node B will reproduce 100% perfectly from run to run.
But, interestingly, they're not targeting things like multicore race conditions, even though their approach is the only way you could make them reproducible; instead they just always do some kind of thread interleaving (though they do change the thread interleaving order):
> If you did it that way, you could like a cycle accurate CPU simulator, you could find all kinds of like weird bugs that required like true multicore parallelism or like you know weird me atomic memory operations, stuff like that. Yeah. Um, we are not trying to find those bugs because 99.999% of developers can never even think about those bugs, right? Like we're trying to we're trying to find we're trying to find more more everyday type stuff.
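To make the "reproducible thread interleaving" idea concrete, here is a toy Python sketch. It has nothing to do with how Antithesis's hypervisor actually works; the generator-based "threads" and the `run_interleaving` function are hypothetical. Two workers do a non-atomic increment (read, then write), and a scheduler driven only by a seed decides who runs next, so the same seed always replays the same schedule and the same lost updates.

```python
import random


def run_interleaving(seed, increments=3):
    """Run two cooperative 'threads' under a scheduler driven only by `seed`."""
    rng = random.Random(seed)
    state = {"counter": 0}

    def worker():
        for _ in range(increments):
            local = state["counter"]      # read the shared counter
            yield                         # the scheduler may switch here
            state["counter"] = local + 1  # write back a possibly stale value
            yield

    runnable = [("t0", worker()), ("t1", worker())]
    schedule = []
    while runnable:
        name, thread = rng.choice(runnable)  # the only source of nondeterminism
        try:
            next(thread)
            schedule.append(name)
        except StopIteration:
            runnable.remove((name, thread))
    return state["counter"], schedule
```

Calling `run_interleaving(7)` twice returns identical results, and sweeping seeds changes the interleaving order. A final counter below `2 * increments` means an update was lost under that schedule, and rerunning with the same seed reproduces it.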
Also:
> 99% of your CPU instructions are just executing on the host CPU and it's very fast. Um and so that that means there's not much performance overhead at all to doing this which is which is I think really important to making it actually practical.
I'm guessing this means they're using the hardware virtualization instruction set extensions on amd64 CPUs (VT-x/AMD-V), just like Xen or whatever.
I found amusing the analogy of deterministic-replay-based time-travel fuzzing (like American Fuzzy Lop does) to save-scumming:
> But the crazy thing is once I have a time machine, once I have a hypervisor, I can run until I make event A happen. And then if I notice that event A has happened, I can say this is interesting. I want to now just focus on worlds where event A has happened. I don't need to refind event A every single time. I can just lock it in, right? It's like if you play computer games, it's like save scumming, right? It's like I can I can just save my state when I got the boss down to half health and now always reload from that point.
> And so it takes me a thousand trials to get event A to happen and now just another thousand to get B to happen instead of it taking a million trials if I always have to start from the start.
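The arithmetic behind that: if A and B each show up in about 1 trial in 1000, needing both in the same fresh run costs about 1000 × 1000 trials, while checkpointing after A costs about 1000 + 1000. A small simulation sketch in Python follows. The function names and the model (independent events with probability `p`) are my own simplifying assumptions, not a description of Antithesis's search.

```python
import random


def trials_from_scratch(p, rng):
    """Every trial restarts from the beginning and needs A and then B."""
    trials = 0
    while True:
        trials += 1
        if rng.random() < p and rng.random() < p:
            return trials


def trials_with_checkpoint(p, rng):
    """Search for A, snapshot, then search for B from the snapshot."""
    trials = 0
    while True:  # phase 1: find event A from the initial state
        trials += 1
        if rng.random() < p:
            break
    while True:  # phase 2: always reload the post-A snapshot
        trials += 1
        if rng.random() < p:
            return trials


def mean_trials(search, p, runs, seed):
    """Average number of trials over several independent searches."""
    rng = random.Random(seed)
    return sum(search(p, rng) for _ in range(runs)) / runs
```

With `p = 1/30` the expected costs are about 900 trials from scratch versus about 60 with the checkpoint; the gap widens quadratically as the events get rarer.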
A lot of the content of the interview is not going to be novel if you're familiar with things like afl-fuzz, data prevalence, or time-travel debugging, but it's pretty interesting to read about what their experiences are.
As far as I know, though, this is novel:
> when we actually do find a bug we can then go back and and ask when did the bug become inevitable right this is this is kind of kind of crazy
> how how
> right we can we can we can go back to the previous time that we reached in and changed the future and we can try changing it to like a hundred different things and see if they all still hit the bug. And if they do, it means the bug was already baked in. And then we can go back to the next one before that and do the same thing.
> Yeah. Yeah.
> And we can sort of bisect backwards and then we can find the exact moment when the bug went from really unlikely to really likely. And then we can do things like look at which lines of code were running then, you know, look at, you know, look at all all you know what what log messages were being printed then. And often that is actually enough to root cause the bug too.
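Here is how I imagine that backward search, as a toy Python sketch. Everything in it is hypothetical: the "system" is just a list of 20 random decisions that fails if any decision was 0, and I walk backward one decision at a time rather than doing a true bisection. The point is only the shape of the procedure: replay a prefix of the recorded run, sample many different futures from there, and see whether they all still hit the bug.

```python
import random

STEPS = 20  # length of a full run, in decision points


def has_bug(decisions):
    """Toy system: the run fails iff some decision was 0 (latent corruption)."""
    return 0 in decisions


def bug_rate(prefix, samples, rng):
    """Replay `prefix`, then try `samples` random futures; return failure rate."""
    hits = 0
    for _ in range(samples):
        tail = [rng.randrange(100) for _ in range(STEPS - len(prefix))]
        hits += has_bug(prefix + tail)
    return hits / samples


def inevitable_from(history, samples=100, seed=0):
    """Smallest prefix length after which every sampled future hits the bug."""
    rng = random.Random(seed)
    k = len(history)
    # Step back one decision at a time while the bug is still baked in.
    while k > 0 and bug_rate(history[: k - 1], samples, rng) == 1.0:
        k -= 1
    return k
```

For a recorded failing run like `[5] * 12 + [0] + [5] * 7`, this returns 13: the bug became inevitable at the 13th decision (index 12), and that is where you would go look at the running code and log messages.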
also not sure about antithesis biz practices. you pay them for integration, you spend time educating them and improving their product, and in the end you get vendor lock-in on their compute with arbitrary, non-transparent pricing.
if you are not in rust, sure, it can be price efficient.
There’s a reason you should still be writing unit tests and hypothesis/property tests in Rust: to catch the runtime issues the compiler can’t catch, which is a huge surface area.
in rust i just have more time for other things you mentioned.
also it is clear you misunderstand rust. rust's type and macro systems let me write ad hoc partial proofs of things in my code with no extra tooling. those are the easy bits rust adds on top of thread and memory safety.
and i definitely do not need to run to rocq for help right away; the rust ecosystem has a lot of options.
also, not only the language itself matters, but also cargo, which comes along with it.
i argue, overall, that antithesis is less likely to be adopted in rust because of the language itself (in a sense extended to type-level patterns and macro simulations) and its ecosystem (by ecosystem i mean the available libraries, tools, and integrations, which cover a lot of the antithesis agenda). i did not expand on the ecosystem argument, because there was no objection to it yet.
You're making the claim that Antithesis isn't necessary because compile-time type checking solves the problems that Antithesis is targeting. That's strictly not true; the kinds of bugs that Antithesis is targeting are not solved via type checking and have never been something Rust has aimed to solve, through its ecosystem or otherwise. See my example about trying to implement a distributed consensus algorithm.
sorry, i never claimed what you stated above.
you have taken only part of my initial statement and made total nonsense of it by stating it is the only one thing i said.
i said:
1. rust needs antithesis less than others
2. rust has equivalent tooling for free
3. biz practices of antithesis will harm its adoption, in rust eco
It would be irresponsible to suggest that Rust eliminates a large enough proportion of common errors that you can YOLO anything that compiles and achieve an acceptable defect rate... but it does happen to be true in my experience.
tests test what? logic. logic = types (proven). so the stronger the type system, the fewer tests need to be written.
moreover, if a proc macro or build.rs can execute logic based on parsed """types""" (partial info only), we can extend the type system with custom logic (and panic at compile time and/or startup time if a usage violation is detected).
on top of that, add fail-fast (fail at compile time, build time, or startup time), the newtype and errors-as-part-of-the-api culture, and the lack of late binding (dyn has very limited use, no runtime reflection), and we get even fewer reasons to write tests.
some examples of industrial """typing""" (eDSLs, construction time) solutions in rust:
- https://github.com/elastio/bon
- https://github.com/contextgeneric/cgp
- https://github.com/paritytech/orchestra
- https://git.pipapo.org/cehteh/linear_type.git
sure, we need to write tests, and tests like the ones antithesis helps with.
but the list of tools that help with tests exactly as antithesis does (and more besides) is huge. it is built on top of very strong supply chain auditing, quality, and security. there are even """levels""" of determinism tooling, so you can decide how much to pay for cloud compute.
wdym by "no tests"?
seems you have not read my response at all...
> sure, we need to write tests, and tests like the ones antithesis helps with.
also, imho both of lamport's (bft and paxos) are better than raft https://www.hytradboi.com/2025/2016d6c4-b08d-40b3-af2f-67926...
2. i mentioned tooling, also in other thread i mentioned ecosystem
- https://github.com/BurtonQin/Awesome-Rust-Checker
- add links here to a dozen tools for cfg feature (product line) engineering (compile-time conditional composability) and const expressions
- add links to a dozen tools to audit deps and decide on panic safety, plus first-class crash hook crates
- mix with nix, which many rust projects use (far more than go, kotlin, or cpp projects), and sri in cargo.lock out of the box
- and qemu/mozilla/no rr tools (no determinism to get seed, and run slower hosts things w that seed)
- and add here clusterfuzz eco (supports all oss fuzzers, and these do proper search https://camshaft.github.io/bolero/features/unified-interface..., and rust adapters allow for manual macro based fuzz narrowing in rust)
- https://quint-lang.org/docs/model-checkers
- several ongoing projects formalizing rust std and lang
SO, why do i need antithesis?
To get Raft, will do sans-io with full typed rust covered by proofs for state(machine) and macro generator for glue(from state machine).
I need only prop/fuzz test to test small integration sans-io part. I do not need antithesis.
> To get Raft, will do sans-io with full typed rust covered by proofs for state(machine) and macro generator for glue(from state machine). I need only prop/fuzz test to test small integration sans-io part. I do not need antithesis.
Again, I think you're missing the point entirely. With hand-written property tests or fuzz tests, you have to manually and correctly implement all the failure conditions that can happen "sans-io". Notable ones would be packet reordering (if your I/O doesn't guarantee ordering) and network partitions. The point of Antithesis is that it will transparently do all of that fault injection and state checking for you. It's not that you wouldn't get there eventually, but Antithesis will be an easier system and you'll move faster. Whether or not that's valuable is a decision everyone makes for themselves.
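To illustrate what "manually implement the failure conditions" means for a sans-io design, here is a small Python sketch. The `Reassembler` state machine and the `faulty_network` harness are hypothetical examples of mine, not Raft and not anything Antithesis provides. Note that the harness only models the faults I thought to write (reordering and duplication); partitions, delays, and crashes would each need more hand-written code, which is the cost being discussed.

```python
import random


class Reassembler:
    """Sans-io receiver: takes (seq, payload) packets in any order and
    delivers payloads strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}
        self.delivered = []

    def on_packet(self, seq, payload):
        if seq < self.next_seq:
            return  # late duplicate of something already delivered
        self.pending[seq] = payload
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def faulty_network(packets, rng):
    """Hand-written fault model: duplicate some packets, then reorder all."""
    wire = list(packets) + [p for p in packets if rng.random() < 0.3]
    rng.shuffle(wire)
    return wire


def check_in_order_delivery(trials=200, seed=0):
    """Property: whatever the network does, delivery order equals send order."""
    rng = random.Random(seed)
    for _ in range(trials):
        payloads = [rng.randrange(1000) for _ in range(rng.randrange(1, 20))]
        receiver = Reassembler()
        for seq, payload in faulty_network(list(enumerate(payloads)), rng):
            receiver.on_packet(seq, payload)
        assert receiver.delivered == payloads, payloads
    return trials
```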
You generate the workload by defining your test case the same as property tests or traditional example tests. You cannot call arbitrary network services.
Or is it like a superset of mutation testing?
Antithesis is more like property testing - it tests your code under random scenarios and sees if the tests still pass. Unlike property testing, rather than just random inputs, it also randomly reorders events at a very deep level to make sure your distributed system behaves correctly. It can even be used for simple things like helping you deterministically reproduce a flaky test in a non-distributed system.
fitzn•4mo ago
https://reflect.run/mobile-testing/
disclaimer: I co-founded Reflect.