Author here. I built this after shipping a multi-model voting
ensemble that, when audited in production, was returning the same
class on 90%+ of decisions on inputs where the underlying voters
were balanced. The voters were individually correct. The aggregator
was mathematically correct. The bias only lived in the joint
distribution of the voting function — exactly the place unit tests
don't reach but property-based tests do.
The library is six property checks that nearly every sensible
aggregator should satisfy, packaged so you can audit any voting
function in two lines. Pure stdlib, zero runtime deps.
Two things I learned while building it that surprised me:
1. A naive Counter.most_common()[0][0] inherits its tie-break from
dict insertion order, which silently leaks the order voters
arrived in — caught by the permutation-invariance test.
2. With three or more classes, there is no fully unbiased
deterministic tie-break. Every choice (alphabetical, hash-based,
fall-back-to-neutral) introduces a measurable asymmetry. The
library catches all of them in turn.
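To make point 1 concrete, here is a minimal stdlib sketch of a permutation-invariance check. It is not the library's API: `naive_majority` and `outcomes_over_orderings` are hypothetical names invented for this illustration, and the check brute-forces every ordering, which is only practical for small vote lists.

```python
from collections import Counter
from itertools import permutations


def naive_majority(votes):
    """Return the most common label; ties go to whichever label was seen first."""
    return Counter(votes).most_common()[0][0]


def outcomes_over_orderings(aggregate, votes):
    """Collect the distinct results an aggregator gives over all orderings of the same votes."""
    return {aggregate(list(ordering)) for ordering in permutations(votes)}


# A tied ballot: the winner depends on which voter arrived first.
tied_outcomes = outcomes_over_orderings(naive_majority, ["cat", "dog"])

# A clear majority: the ordering makes no difference.
clear_outcomes = outcomes_over_orderings(naive_majority, ["cat", "cat", "dog"])

print("tied ballot outcomes:", tied_outcomes)
print("clear ballot outcomes:", clear_outcomes)
```

A permutation-invariant aggregator would give a single outcome for each ballot. The tied ballot gives two here, which is the arrival-order leak described above.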
The examples directory has a side-by-side audit of three "innocent
looking" 3-class aggregators with three different failure modes —
that's the part I found most pedagogically interesting.
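The three tie-breaks named in point 2 can be compared with a small label-symmetry check. This is a hypothetical sketch, not the audit in the examples directory. The class names, the three aggregators, and `neutrality_violations` are all assumptions made for illustration. The check relabels the classes in every possible way and counts the relabelings under which the aggregator's answer does not follow the relabeling.

```python
import hashlib
from collections import Counter
from itertools import permutations

CLASSES = ("neg", "neutral", "pos")  # assumed label set for this sketch


def tied_leaders(votes):
    """Return the labels that share the highest vote count, sorted alphabetically."""
    counts = Counter(votes)
    top = max(counts.values())
    return sorted(label for label, n in counts.items() if n == top)


def alphabetical(votes):
    """Break ties by picking the alphabetically first leader."""
    return tied_leaders(votes)[0]


def hash_based(votes):
    """Break ties by picking the leader with the smallest SHA-256 digest."""
    return min(tied_leaders(votes),
               key=lambda label: hashlib.sha256(label.encode()).hexdigest())


def fallback_neutral(votes):
    """Return the unique leader, or "neutral" whenever there is a tie."""
    leaders = tied_leaders(votes)
    return leaders[0] if len(leaders) == 1 else "neutral"


def neutrality_violations(aggregate, votes):
    """List relabelings sigma where aggregate(sigma(votes)) != sigma(aggregate(votes))."""
    violations = []
    for perm in permutations(CLASSES):
        sigma = dict(zip(CLASSES, perm))
        relabeled = [sigma[v] for v in votes]
        if aggregate(relabeled) != sigma[aggregate(votes)]:
            violations.append(sigma)
    return violations


tied_votes = ["neg", "pos"]
report = {
    f.__name__: len(neutrality_violations(f, tied_votes))
    for f in (alphabetical, hash_based, fallback_neutral)
}
print(report)
```

On this tied ballot, each of the three tie-breaks fails under at least one relabeling. The two order-based rules favor whichever label sorts first, and the fallback rule favors the class named "neutral".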
killersamurai•39m ago
Feedback and adversarial cases welcome.
https://github.com/fuentesamurai/ensemble-bias-detector