From unit tests to whole universe tests (with Will Wilson of Antithesis) [video]

https://www.youtube.com/watch?v=_xJ4maWhSNU

36•zdw•4mo ago

Comments

narsa123•4mo ago

Are there any tools we could use for automated mobile app testing using AI (like MCPs for mobile app testing)?

fitzn•4mo ago

Reflect tests mobile apps by converting plain-text instructions into Appium commands at runtime using AI. Your tests are just the text steps.

https://reflect.run/mobile-testing/

disclaimer: I co-founded Reflect.

narsa123•4mo ago

Thank you so much. This is a nice tool; I will recommend it to my clients.

kragen•4mo ago

I found helpful this explanation of what Antithesis isn't:

> Property-based testing vs. Antithesis

> Property-based testing (PBT) uses random inputs to check individual data structures, procedures, or occasionally whole programs for high-level invariants or properties. Property-based testing has much in common with fuzzing—the main differences are heritage (PBT comes from the functional programming world, while fuzzing comes from the security/systems programming world) and focus (program functionality vs. security issues). Like fuzzing, PBT is generally only applicable to self-contained libraries and processes.

> Antithesis is analogous to applying PBT to an entire interacting software system—including systems that are concurrent, stateful, and interactive. Antithesis can randomly vary the inputs to a software program, and also the environment within which it runs. Like a PBT system, Antithesis is designed to check high-level properties and invariants of the system under test, but it can do so with many more types of software.
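For anyone who hasn't seen PBT in the small, here is a minimal sketch using only the Python standard library rather than a real PBT framework. The run-length encoder and the names `rle_encode`, `rle_decode`, and `check_roundtrip` are hypothetical and chosen only for illustration; the point is that the test states invariants and lets random inputs look for counterexamples.

```python
import random
from itertools import groupby


def rle_encode(s):
    """Run-length encode a string into (char, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(pairs):
    """Invert rle_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_roundtrip(trials=1000, seed=0):
    """Check two properties on random inputs instead of hand-picked examples."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randrange(0, 30)))
        encoded = rle_encode(s)
        # Property 1: decoding the encoding gives back the input.
        assert rle_decode(encoded) == s
        # Property 2: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))
    return trials
```

Calling `check_roundtrip()` exercises a thousand random strings; a real framework would add shrinking of failing inputs, which this sketch omits.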

I've scrubbed through the video, and it seems to be 100% talking-head filler except for an outro still image—no actual video information content at all unless you want to analyze Wilson's facial expressions or think he's hot.

Regular reminder that yt-dlp (--write-sub --write-auto-sub --sub-lang en) can download subtitles that you can read, grep, and excerpt, so you don't have to watch videos like this unless you like to.
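Once you have the subtitle file, something like the following turns it into greppable plain text. This is only a sketch of one way to do it, assuming the subtitles are in WebVTT format with the inline timing tags that auto-generated captions often carry; `vtt_to_text` is a hypothetical helper, not part of yt-dlp, and it does not handle numeric cue identifiers.

```python
import re


def vtt_to_text(vtt: str) -> str:
    """Flatten WebVTT subtitle text into plain prose (rough sketch)."""
    lines = []
    for raw in vtt.splitlines():
        # Drop inline tags such as <00:00:00.500> or <c>...</c>.
        line = re.sub(r"<[^>]+>", "", raw).strip()
        # Skip blanks, the file header, and cue timing lines.
        if not line or line == "WEBVTT" or "-->" in line:
            continue
        # Skip header metadata and comment blocks.
        if line.startswith(("Kind:", "Language:", "NOTE")):
            continue
        # Rolling captions tend to repeat the previous line; keep one copy.
        if lines and lines[-1] == line:
            continue
        lines.append(line)
    return " ".join(lines)
```

Reading the downloaded file and passing its contents to `vtt_to_text` gives one long string you can search or excerpt.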

stogot•4mo ago

Thanks for the auto-sub tip; I didn’t know that was a feature.

How did you get yt-dlp to work? It used to work for me, but I did a fresh install a week ago and now YouTube is giving me auth/cookie/sign-in errors (captcha, I presume?) that it didn’t before.

tczMUFlmoNk•4mo ago

As a general rule, you should update yt-dlp before using it. They release new versions very frequently to work around new walls on YouTube and other platforms. An update usually solves this kind of issue for me, even if I've updated just a few days ago.

(I haven't tried it today so can't speak to whether this is a complete solution in this particular case.)

kragen•4mo ago

At the moment I'm getting "HTTP Error 429: Too Many Requests" (with yt-dlp-2025.9.5 installed in a virtualenv via pip), which has been happening more often recently. I got it when downloading the Spanish subtitles file after successfully downloading the English one, so yt-dlp didn't continue on to try to download the video. But YouTube has also been working unreliably for me in the browser.

Edit: a few minutes later it worked, although I didn't let it download the whole video, because it was huge. The subtitle file is 12631 words processed with http://canonical.org/~kragen/sw/dev3/devtt.py. That's about 38 minutes of reading.

One drawback of the transcript in this case is that it doesn't identify the speaker. It doesn't seem to contain many errors.

The key point seems to be this one (18'06"):

> But what you what you what you want to do is use guidance and use feedback from the system under test to optimize the search and notice when things have interesting things have happened, things that aren't necessarily bugs, but that are rare behavior or special behavior or unusual behavior. And so the test system can see that something interesting has happened and follow up opportunistically on that discovery. And that gives you a massive lift in the speed of finding issues.

> And the way that we're able to do that is with this sort of magical hypervisor that we've developed which allows us to deterministically and perfectly recreate any past system state.

> So people generally think of the value of that hypervisor as like any issue we find is reproducible. Nothing is nothing is works on my machine. If we find it once we can repro it for you ad infinitum.

Including reproducibility of phenomena that aren't, strictly speaking, computational:

> like all of the like very low-level decisions about when threads get scheduled or how long particular operations take or you know exactly how long a packet takes to get from node A to node B will reproduce 100% perfectly from run to run.

But, interestingly, they're not targeting things like multicore race conditions, even though their approach is the only way you could make them reproducible; instead they just always do some kind of thread interleaving (though they do change the thread interleaving order):

> If you did it that way, you could like a cycle accurate CPU simulator, you could find all kinds of like weird bugs that required like true multicore parallelism or like you know weird me atomic memory operations, stuff like that. Yeah. Um, we are not trying to find those bugs because 99.999% of developers can never even think about those bugs, right? Like we're trying to we're trying to find we're trying to find more more everyday type stuff.

Also:

> 99% of your CPU instructions are just executing on the host CPU and it's very fast. Um and so that that means there's not much performance overhead at all to doing this which is which is I think really important to making it actually practical.

I'm guessing this means they're using the hardware virtualization instruction set extensions on amd64 CPUs (Intel VT-x or AMD-V), just like Xen or whatever.

I found amusing the analogy of deterministic-replay-based time-travel fuzzing (like American Fuzzy Lop does) to save-scumming:

> But the crazy thing is once I have a time machine, once I have a hypervisor, I can run until I make event A happen. And then if I notice that event A has happened, I can say this is interesting. I want to now just focus on worlds where event A has happened. I don't need to refind event A every single time. I can just lock it in, right? It's like if you play computer games, it's like save scumming, right? It's like I can I can just save my state when I got the boss down to half health and now always reload from that point.

> And so it takes me a thousand trials to get event A to happen and now just another thousand to get B to happen instead of it taking a million trials if I always have to start from the start.
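The arithmetic behind those figures: if each rare event has an independent per-trial chance p, restarting from scratch needs both events in one trial, so about 1/p² trials on average, while checkpointing after A needs about 1/p + 1/p. With p = 1/1000 that is 1,000,000 versus 2,000. Here is a small simulation of the same effect, as a sketch under the assumption of independent events; the value of `P` and the function names are hypothetical, and `P` is made larger so it runs quickly.

```python
import random

P = 0.1  # hypothetical per-trial chance of each rare event


def trials_from_scratch(rng):
    """Restart every trial from the beginning: A and B must both occur in one trial."""
    n = 0
    while True:
        n += 1
        if rng.random() < P and rng.random() < P:
            return n


def trials_with_checkpoint(rng):
    """Save state once A happens, then only retry B from that snapshot."""
    n = 0
    while True:  # phase 1: search for A
        n += 1
        if rng.random() < P:
            break
    while True:  # phase 2: reload the snapshot and search for B
        n += 1
        if rng.random() < P:
            return n


def average(fn, runs=2000, seed=0):
    """Mean number of trials over many seeded repetitions."""
    rng = random.Random(seed)
    return sum(fn(rng) for _ in range(runs)) / runs
```

With `P = 0.1` the expected values are about 100 trials from scratch and about 20 with the checkpoint, and the gap widens quadratically as the events get rarer.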

A lot of the content of the interview is not going to be novel if you're familiar with things like afl_fuzz, data prevalence, or time-travel debugging, but it's pretty interesting to read about what their experiences are.

As far as I know, though, this is novel:

> when we actually do find a bug we can then go back and and ask when did the bug become inevitable right this is this is kind of kind of crazy

> how how

> right we can we can we can go back to the previous time that we reached in and changed the future and we can try changing it to like a hundred different things and see if they all still hit the bug. And if they do, it means the bug was already baked in. And then we can go back to the next one before that and do the same thing.

> Yeah. Yeah.

> And we can sort of bisect backwards and then we can find the exact moment when the bug went from really unlikely to really likely. And then we can do things like look at which lines of code were running then, you know, look at, you know, look at all all you know what what log messages were being printed then. And often that is actually enough to root cause the bug too.
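To make that concrete for myself, here is a toy sketch of the idea, not Antithesis's implementation. It assumes a hypothetical system whose state is one `corrupted` flag, a recorded `HISTORY` of random decisions, and a made-up bug (decision 0 corrupts the state for good). It scans backwards linearly rather than doing a true bisection, re-running many alternative futures from each checkpoint.

```python
import random

STEPS = 20
# Hypothetical recorded run: the corrupting decision (0) happens at index 12.
HISTORY = [7] * 12 + [0] + [7] * 7


def step(corrupted, decision):
    """Toy system: decision 0 corrupts the state permanently (the 'bug')."""
    return corrupted or decision == 0


def hits_bug(corrupted, steps_left, seed):
    """Replay forward from a checkpoint with a different random future."""
    rng = random.Random(seed)
    for _ in range(steps_left):
        corrupted = step(corrupted, rng.randrange(100))
    return corrupted


def bug_rate(history, checkpoint, futures=100):
    """Fraction of alternative futures from this checkpoint that hit the bug."""
    corrupted = False
    for decision in history[:checkpoint]:  # deterministic replay to the checkpoint
        corrupted = step(corrupted, decision)
    hits = sum(hits_bug(corrupted, STEPS - checkpoint, seed) for seed in range(futures))
    return hits / futures


def first_inevitable(history):
    """Walk backwards while every sampled future still hits the bug."""
    checkpoint = len(history)
    while checkpoint > 0 and bug_rate(history, checkpoint - 1) == 1.0:
        checkpoint -= 1
    return checkpoint
```

Here `first_inevitable(HISTORY)` lands on the checkpoint just after the corrupting decision, which is the point where you would go look at the code and logs.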

lioeters•4mo ago

Great tip about downloading subtitles, useful!

dlahoda•4mo ago

do they have at least one sustained, prominent rust-first, rust-core customer? i doubt, rust has a lot of tooling and catches at compile time what their product does in runtime.

also not sure about antithesis biz practices. you pay them for integration, you spend time educating them and improving their product. and in the end you get vendor lock-in on their compute with arbitrary, non-transparent pricing.

if you are not in rust - sure, it can be price efficient.

vlovich123•4mo ago

I think you are misunderstanding. Rust does not solve or prevent distributed systems bugs, just memory safety and certain kinds of thread safety problems. For that you’d need to use a formal proof system like Coq.

There’s a reason you should still be writing unit tests and hypothesis/property tests in Rust, to catch issues the compiler can’t catch at runtime which is a huge surface area.

dlahoda•4mo ago

i guess most of the issues antithesis finds are preventable by simple or more evolved rust (patterns).

in rust i just have more time for other things you mentioned.

also it is clear you misunderstand rust. the rust type and macro system allows me to write ad hoc partial proofs of things in my code with no extra tooling. those are the easy bits that rust adds on top of thread and mem safety.

and i definitely do not need to run to rocq for help right away, the rust ecosystem has a lot of options.

also not only lang itself matters, but also cargo which goes along.

Smaug123•4mo ago

You still seem to be completely misunderstanding, as is evident from the fact that your argument "proves" that in Rust you don't even need to write any tests. Again, Antithesis is designed to test distributed systems, deterministically.

dlahoda•4mo ago

sorry, where exactly i stated no need to write tests?

i argue, overall, that antithesis is less likely to be adopted in rust because of the language itself (in a sense extended to type-level patterns and macro simulations) and its ecosystem (by ecosystem i mean the available libraries, tools and integrations, which cover a lot of the antithesis agenda). i did not expand on the ecosystem argument because there was no objection to it yet.

vlovich123•4mo ago

> i doubt, rust has a lot of tooling and catches at compile time what their product does in runtime.

You're making the claim that Antithesis isn't necessary because compile time type-checking solves problems that Antithesis is targeting. That's strictly not true; the kinds of bugs that Antithesis is targeting are not solved via type checking and have never been something Rust has aimed to solve, through ecosystem or otherwise. See my example about trying to implement a distributed consensus algorithm.

dlahoda•4mo ago

> You're making the claim that Antithesis isn't necessary because compile time type-checking solves problems that Antithesis is targeting

sorry, i never claimed what you stated above.

you have taken only part of my initial statement and made total nonsense of it by stating it is the only one thing i said.

i said:

1. rust needs antithesis less than others

2. rust has equivalent tooling for free

3. biz practices of antithesis will harm its adoption, in rust eco

Smaug123•4mo ago

Can you point to any of Rust's supposed "equivalent tooling"?

lmm•4mo ago

> There’s a reason you should still be writing unit tests and hypothesis/property tests in Rust, to catch issues the compiler can’t catch at runtime which is a huge surface area.

It would be irresponsible to suggest that Rust eliminates a large enough proportion of common errors that you can YOLO anything that compiles and achieve an acceptable defect rate... but it does happen to be true in my experience.

dlahoda•4mo ago

yes.

tests test what? logic. logic = types (proven). so the stronger the type system, the fewer tests need to be written.

moreover, if a proc macro or build.rs can execute logic based on parsed """types""" (partial info only), we can extend the type system with custom logic (and panic at compile time and/or startup time if a usage violation is detected).

on top of that, add fail fast (fail at compile time, build time or startup time), newtype and errors-as-part-of-the-api culture, and the lack of late binding (dyn is of very limited use, no runtime reflection), and we get even fewer reasons to write tests.

some examples of industrial """typing""" (eDSLs, construction time) solutions in rust:

- https://github.com/elastio/bon

- https://github.com/contextgeneric/cgp

- https://github.com/paritytech/orchestra

- https://git.pipapo.org/cehteh/linear_type.git

sure we need write tests, and tests like antithesis helps with.

but the list of tools that help with tests exactly as antithesis does (and more besides) is huge. that is built on top of absolutely strong supply chain audit, quality and security. there are even """levels""" of determinism tooling to decide how much to pay for cloud compute.

vlovich123•4mo ago

Ok. Please write me an implementation of RAFT using no tests and have the Rust type checker prove correctness. I admit complete ignorance into how to get Rust to even go about partially proving that.

dlahoda•4mo ago

you cannot prove RAFT correctness using Rust type checker.

wdym by "no tests"?

seems you have not read my response at all...

> sure we need write tests, and tests like antithesis helps with.

also, imho both of lamport's (bft and paxos) are better than raft https://www.hytradboi.com/2025/2016d6c4-b08d-40b3-af2f-67926...

vlovich123•4mo ago

Regardless of the algorithm, antithesis specifically can make sure your implementation is correct. You started this thread by claiming that that’s needed less in Rust because the type system helps you avoid bugs. I’m highlighting classes of problems that Antithesis is targeting to help you test and Rust’s type system cannot help you with. Raft vs Paxos doesn’t matter

dlahoda•4mo ago

1. many things antithesis finds in go, kotlin, cpp just do not exist in rust. that is the less-needed case. these people like antithesis.

2. i mentioned tooling, also in other thread i mentioned ecosystem

- https://github.com/BurtonQin/Awesome-Rust-Checker

- add link for dozen of tools for cfg feature(product lines) eng here (compile time conditional composeability) and const expressions

- add link to dozen of tools to audit deps and decide on panic safety and first class crash hooks crates

- mix with nix which many (way many compared to go, kotlin, cpp) rust projects use (and sri cargo.lock out box)

- and qemu/mozilla/no rr tools (no determinism to get seed, and run slower hosts things w that seed)

- and add here clusterfuzz eco (supports all oss fuzzers, and these do proper search https://camshaft.github.io/bolero/features/unified-interface..., and rust adapters allow for manual macro based fuzz narrowing in rust)

- https://quint-lang.org/docs/model-checkers

- several ongoing projects formalizing rust std and lang

SO, why i need antithesis?

To get Raft, will do sans-io with full typed rust covered by proofs for state(machine) and macro generator for glue(from state machine).

I need only prop/fuzz test to test small integration sans-io part. I do not need antithesis.

vlovich123•4mo ago

Literally none of the things you mention help with verifying distributed systems. I think you just fundamentally aren't understanding what it is and this isn't the right forum for me to bridge that gap for you. You highlight this lack of understanding by dismissing a custom hypervisor as a collection of bash scripts and focusing on things Rust has or does well that are completely irrelevant for the classes of bugs that Antithesis is trying to uncover.

> To get Raft, will do sans-io with full typed rust covered by proofs for state(machine) and macro generator for glue(from state machine). I need only prop/fuzz test to test small integration sans-io part. I do not need antithesis.

Again, I think you're missing the point entirely. With hand written property test/fuzz test, you have to manually and correctly implement all the failure conditions that can happen "sans-io". A notable one would be packet reordering (if your I/O doesn't guarantee ordering) or network partitions. The point of Antithesis is that it will transparently do all of that state checking for you. It's not that you'd not get there eventually, but Antithesis will be an easier system and you'll move faster. Whether or not that's valuable is a decision everyone makes for themselves.
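To illustrate what "manually implement the failure conditions" means, here is a minimal sketch of a hand-written property test for a sans-io component. Everything in it is hypothetical: `reassemble` stands in for the protocol logic, and `faulty_network` is the fault model you have to write and get right yourself. It covers only duplication and reordering; drops and partitions would each need more code.

```python
import random


def reassemble(packets):
    """Sans-io receiver: accepts (seq, payload) in any order and ignores duplicates."""
    seen = {}
    for seq, payload in packets:
        seen.setdefault(seq, payload)  # first copy of each sequence number wins
    return [seen[seq] for seq in sorted(seen)]


def faulty_network(packets, rng):
    """Hand-written fault model: duplicate some packets, then shuffle delivery order."""
    delivered = list(packets)
    delivered += [p for p in packets if rng.random() < 0.3]
    rng.shuffle(delivered)
    return delivered


def check_property(trials=500, seed=0):
    """Property: the reassembled message equals the sent one under any such faults."""
    rng = random.Random(seed)
    for _ in range(trials):
        message = [rng.randrange(256) for _ in range(rng.randrange(1, 20))]
        packets = list(enumerate(message))
        assert reassemble(faulty_network(packets, rng)) == message
    return True
```

The test is only as good as the faults `faulty_network` knows how to inject, which is the manual work being described.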

dlahoda•4mo ago

check this https://news.ycombinator.com/item?id=45264789 , antithesis feels like a set of bash scripts tbh

lmm•4mo ago

Encode the relevant parts of the specification into the type system. E.g. if the specification says that you must only call A after you've called B and C, make B and C return some unique token type (with zero size at runtime) and make A require a parameter of that token type.

jatins•4mo ago

Is there a demo of what Antithesis does? I have seen it on HN a few times and I like the idea of monkey typing a system. But how does it work in practice? Does it call my APIs, does it introduce memory corruptions, does it bring down my containers...what does it do?

vlovich123•4mo ago

It arbitrarily reorders events across the entire “universe” and injects reasonable kinds of faults (e.g. dropping or reordering packets). It does so by running all events for all threads across all machines in a deterministic “random” order, serialized on a single thread, with the randomness initialized from the seed for that run. It also runs the universe faster than real time, since there’s no actual network delay or time elapsing (that too is simulated).
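A toy in-process analogue of that idea, to show the shape of it. This is a sketch and not how the Antithesis hypervisor works: the "threads" here are Python generators, and the `increment` task and `run` scheduler are hypothetical. A seeded RNG picks which task advances next, all on one real thread, so any interleaving can be replayed from its seed.

```python
import random


def increment(shared):
    """A 'thread' doing a non-atomic read-modify-write, with a yield between the two."""
    local = shared["counter"]       # read
    yield                           # point where the scheduler may switch tasks
    shared["counter"] = local + 1   # write


def run(seed):
    """Run both tasks on a single thread in a seeded pseudo-random order."""
    rng = random.Random(seed)
    shared = {"counter": 0}
    tasks = {"A": increment(shared), "B": increment(shared)}
    trace = []
    while tasks:
        name = rng.choice(sorted(tasks))  # sorted() keeps the choice order stable
        trace.append(name)
        try:
            next(tasks[name])
        except StopIteration:
            del tasks[name]               # task finished
    return shared["counter"], trace
```

Some seeds produce a final counter of 2 and others produce the lost-update result of 1; in both cases `run(seed)` returns the same counter and trace every time it is called with that seed.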

You generate the workload by defining your test case the same as property tests or traditional example tests. You cannot call arbitrary network services.

typpilol•4mo ago

Is it like Stryker basically? Mutation testing?

Or is it like a superset of mutation testing?

vlovich123•4mo ago

Mutation testing is poorly named in some sense. Mutation testing is a probabilistic measurement of code coverage. It doesn’t test anything; instead it tells you how well your existing test suite covers bugs that don’t exist yet. It changes your code to generate a “mutant” (e.g. changes an addition to a subtraction) and checks whether your test suite still passes; if it does, that counts as a failure. Traditional alternatives are things like codecov that measure line coverage or branch coverage, which famously don’t give you an accurate estimate of quality, whereas mutation testing does a somewhat better job. However, none of this actually tests anything; it is more a meta-metric of how good your test coverage is.
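A minimal sketch of that addition-to-subtraction mutant, using Python's standard `ast` module rather than a real mutation testing tool. The `SOURCE` snippet, the two test suites, and the helper names are hypothetical and only there to show a surviving mutant versus a killed one.

```python
import ast

SOURCE = "def add(a, b):\n    return a + b\n"


class AddToSub(ast.NodeTransformer):
    """Mutation operator: turn every binary + into -."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(source, mutate=False):
    """Compile the source (optionally mutated) and return its add function."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(AddToSub().visit(tree))
    namespace = {}
    exec(compile(tree, "<sketch>", "exec"), namespace)
    return namespace["add"]


def weak_suite(add):
    return add(0, 0) == 0  # cannot tell + from -


def strong_suite(add):
    return add(0, 0) == 0 and add(2, 3) == 5


def mutant_survives(suite):
    """True means the suite failed to notice the mutant."""
    assert suite(load(SOURCE))  # the suite must pass on the original code
    return suite(load(SOURCE, mutate=True))
```

`mutant_survives(weak_suite)` is true and `mutant_survives(strong_suite)` is false, so the score says something about the suites and nothing about whether `add` itself is correct.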

Antithesis is more like property testing - it tests your code under random scenarios and sees if the tests still pass. Unlike property testing, rather than just random inputs, it also randomly reorders events at a very deep level to make sure your distributed system behaves correctly. It can even be used for simple things like helping you deterministically reproduce a flaky test in a non-distributed system.
