Show HN: Git bayesect – Bayesian Git bisection for non-deterministic bugs

https://github.com/hauntsaninja/git_bayesect

85•hauntsaninja•4d ago

Comments

hauntsaninja•4d ago

git bisect works great for tracking down regressions, but it relies on the bug presenting deterministically. What if the bug is non-deterministic? Or worse, what if the behaviour was always non-deterministic, but something changed, e.g. your tests went from somewhat flaky to very flaky?

In addition to the repo linked in the title, I also wrote up a little bit of the math behind it here: https://hauntsaninja.github.io/git_bayesect.html
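To make the idea concrete, here is a minimal sketch (not git_bayesect's actual code) of the core Bayesian update. It assumes the old and new failure rates p_old and p_new are known; the linked write-up describes the real model, which the thread suggests puts Beta distributions over the rates. The posterior is a probability for each commit being the first one with the new behaviour, and every pass or fail reweights it. With p_old = 0 and p_new = 1 it reduces to ordinary git bisect: a failure rules out every later candidate and a pass rules out every earlier one.

```python
def update_posterior(posterior, commit, failed, p_old, p_new):
    """One Bayes update of P(first changed commit = k) after a test run.

    Commits are indexed 0..n-1, oldest to newest. Assumption: if the first
    changed commit is k, commit c fails with rate p_new when c >= k and with
    rate p_old otherwise, and both rates are known.
    """
    weights = []
    for k, prior in enumerate(posterior):
        p_fail = p_new if commit >= k else p_old
        likelihood = p_fail if failed else 1.0 - p_fail
        weights.append(prior * likelihood)
    total = sum(weights)
    if total == 0:
        raise ValueError("observation is impossible under the current posterior")
    return [w / total for w in weights]


if __name__ == "__main__":
    # 8 candidate commits, uniform prior; old code fails 10% of runs, new code 50%.
    posterior = [1 / 8] * 8
    for commit, failed in [(4, True), (4, False), (6, True), (2, False), (5, True)]:
        posterior = update_posterior(posterior, commit, failed, p_old=0.1, p_new=0.5)
    best = max(range(len(posterior)), key=posterior.__getitem__)
    print(f"most likely first changed commit: {best} (p={posterior[best]:.2f})")
```

Unlike git bisect, no single noisy observation eliminates a candidate outright; probability mass just shifts until one commit dominates.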

Myrmornis•1h ago

This is really cool! Is there an alternative way of thinking about it involving a hidden Markov model, looking for a change in the value of an unknown latent P(fail)? Or does your approach end up being similar to whatever the appropriate Bayesian approach to the HMM would be?

supermdguy•3d ago

Okay this is really fun and mathematically satisfying. Could even be useful for tough bugs that are technically deterministic, but you might not have precise reproduction steps.

Does it support running a test multiple times to get a probability for a single commit instead of just pass/fail? I guess you’d also need to take into account the number of trials to update the Beta properly.

hauntsaninja•3d ago

Yay, I had fun with it too!

IIUC, the way you'd do that right now is to repeatedly record individual observations on a single commit, which effectively gives it a probability plus the number of trials for the Beta update. I don't yet have a CLI entrypoint to record a batch observation of (probability, num_trials), but it would be easy to add one.
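The equivalence described here can be sketched with a Beta posterior over a single commit's failure rate: recording k failures out of n runs one at a time lands on the same Beta(alpha + k, beta + n - k) as a single batch update. The Beta class and its observe_batch method are hypothetical illustrations, not git_bayesect's API; the batch takes a failure count rather than a (probability, num_trials) pair so the update stays in whole runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) posterior over one commit's failure rate (hypothetical)."""
    alpha: float
    beta: float

    def observe(self, failed: bool) -> "Beta":
        # A single run: a failure adds 1 to alpha, a pass adds 1 to beta.
        if failed:
            return Beta(self.alpha + 1, self.beta)
        return Beta(self.alpha, self.beta + 1)

    def observe_batch(self, failures: int, trials: int) -> "Beta":
        # Hypothetical batch record: same result as `trials` single observations.
        if not 0 <= failures <= trials:
            raise ValueError("need 0 <= failures <= trials")
        return Beta(self.alpha + failures, self.beta + trials - failures)

    def mean(self) -> float:
        # Posterior mean estimate of the failure rate.
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    one_at_a_time = Beta(1, 1)
    for failed in [True, False, True, True, False]:
        one_at_a_time = one_at_a_time.observe(failed)
    batched = Beta(1, 1).observe_batch(failures=3, trials=5)
    print(one_at_a_time == batched, batched.mean())
```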

But ofc, part of the magic is that git_bayesect's commit selection tells you how to be maximally sample-efficient, so you'd only want to do a batch record if your test has a high constant overhead.
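One common way to make commit selection sample-efficient is to pick the commit whose next outcome is expected to reduce the posterior's entropy the most (greedy expected information gain). The sketch below assumes that criterion and known fail rates p_old and p_new; the thread doesn't spell out git_bayesect's exact selection rule, so the author's write-up is the place to check. The function names are illustrative only.

```python
import math


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def likelihood(commit, k, failed, p_old, p_new):
    # Assumption: commits at or after the first changed commit k fail at rate p_new.
    p_fail = p_new if commit >= k else p_old
    return p_fail if failed else 1.0 - p_fail


def expected_info_gain(posterior, commit, p_old, p_new):
    """Expected drop in entropy of P(first changed commit) from one run at `commit`."""
    gain = entropy(posterior)
    for failed in (True, False):
        weights = [prior * likelihood(commit, k, failed, p_old, p_new)
                   for k, prior in enumerate(posterior)]
        p_outcome = sum(weights)
        if p_outcome > 0:
            gain -= p_outcome * entropy([w / p_outcome for w in weights])
    return gain


def best_commit(posterior, p_old, p_new):
    """Greedy choice: the commit whose next test run is most informative."""
    return max(range(len(posterior)),
               key=lambda c: expected_info_gain(posterior, c, p_old, p_new))
```

In the deterministic case, best_commit([1 / 8] * 8, 0.0, 1.0) picks commit 3, the midpoint of 0..7, and each run is worth a full bit, just like git bisect. With flaky rates each run yields less than a bit, which is why a noisy bug needs more runs.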

__s•19m ago

Recompiling can be a high constant overhead.

Retr0id•1h ago

Super cool!

A related situation I was in recently was where I was trying to bisect a perf regression, but the benchmarks themselves were quite noisy, making it hard to tell whether I was looking at a "good" vs "bad" commit without repeated trials (in practice I just did repeats).

I could pick a threshold and use bayesect as described, but that involves throwing away information. How hard would it be to generalize this to let me plug in a raw benchmark score at each step?
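A hypothetical sketch of what plugging in a raw benchmark score could look like: swap the pass/fail likelihood for a Gaussian one, assuming scores before the change follow N(mu_old, sigma) and scores after it follow N(mu_new, sigma), with all three parameters known. That is a strong assumption (a fuller version would put priors on them, much as the Beta update does for failure rates), and none of these names come from git_bayesect. The update runs in log space so that many sharp observations don't underflow.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def update_with_score(log_posterior, commit, score, mu_old, mu_new, sigma):
    """Reweight log P(first slow commit = k) after benchmarking `commit`.

    Assumption: commit c scores ~ N(mu_new, sigma) if c >= k, else N(mu_old, sigma).
    """
    updated = []
    for k, lp in enumerate(log_posterior):
        mu = mu_new if commit >= k else mu_old
        updated.append(lp + gaussian_logpdf(score, mu, sigma))
    # Normalize with log-sum-exp to stay numerically stable.
    m = max(updated)
    log_z = m + math.log(sum(math.exp(u - m) for u in updated))
    return [u - log_z for u in updated]


if __name__ == "__main__":
    n = 6
    log_post = [math.log(1 / n)] * n
    # Hypothetical timings in ms: old code ~100, regressed code ~110, noise sigma 8.
    for commit, score in [(3, 112.0), (1, 97.5), (2, 108.0), (2, 104.0)]:
        log_post = update_with_score(log_post, commit, score, 100.0, 110.0, 8.0)
    print([round(math.exp(lp), 3) for lp in log_post])
```

A score halfway between the two means barely moves the posterior, while a score near one mean moves it a lot, so no information is thrown away by thresholding.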

davidkunz•1h ago

Useful for tests with LLM interactions.

SugarReflex•2m ago

I hope this comment is not out of place, but I am wondering what the application for all this is? How can this help us or what does it teach us or help us prove? I am asking out of genuine curiosity as I barely understand it but I believe it has something to do with probability.
