Hey HN! I'm Rodrigo, I run distributed systems across a few countries. I built Openfuse because of something that kept bugging me about how we all do circuit breakers.
If you're running 20 instances of a service and Stripe starts returning 500s, each instance discovers that independently. Instance 1 trips its breaker after 5 failures. Instance 14 just got recycled and hasn't seen any yet. Instance 7 is in half-open, probing a service you already know is dead. For some window of time, part of your fleet is protecting itself and part of it is still hammering a dead dependency and timing out, and all you can do is watch.
Libraries can't fix this. Opossum, Resilience4j, Polly are great at the pattern, but they make per-instance decisions with per-instance state. Your circuit breakers don't talk to each other.
Openfuse is a centralized control plane. It aggregates failure metrics from every instance in your fleet and makes the trip decision based on the full picture. When the breaker opens, every instance knows at the same time.
It's a few lines of code:
const result = await openfuse.breaker('stripe').protect(
() => chargeCustomer(payload)
);
The SDK is open source, anyone can see exactly what runs inside their services.The other thing I couldn't let go of: when you get paged at 3am, you shouldn't have to find logs across 15 services to figure out what's broken. Openfuse gives you one dashboard showing every breaker state across your fleet: what's healthy, what's degraded, what tripped and when. And, you shouldn't need a deploy to act. You can open a breaker from the dashboard and every instance stops calling that dependency immediately. Planned maintenance window at 3am? Open beforehand. Fix confirmed? Close it instantly. Thresholds need adjusting? Change them in the dashboard, takes effect across your fleet in seconds. No PRs, no CI, no config files.
It has a decent free tier for trying it out, then $99/mo for most teams, $399/mo with higher throughput and some enterprise features. Solo founder, early stage, being upfront.
Would love to hear from people who've fought cascading failures in production. What am I missing?