
Author here. I built this after shipping a multi-model voting ensemble that, when audited in production, returned the same class on 90%+ of decisions, even on inputs where the underlying voters were balanced. The voters were individually correct. The aggregator was mathematically correct. The bias lived only in the joint distribution of the voting function, which is exactly the place unit tests don't reach but property-based tests do.

The library is six checks for properties that nearly every sensible aggregator should satisfy, packaged so you can audit any voting function in two lines. Pure stdlib, zero runtime deps.

Two things I learned while building it that surprised me:

  1. A naive Counter.most_common()[0][0] inherits its tie-break from
     dict insertion order, so a tied result silently depends on the
     order the votes arrived in. The permutation-invariance test
     catches this.

  2. With three or more classes, there is no fully unbiased
     deterministic tie-break. Every choice (alphabetical, hash-based,
     fall-back-to-neutral) introduces a measurable asymmetry. The
     library catches all of them in turn.
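
A minimal sketch of both points, written for this post and not taken from the library. The names naive_vote, alphabetical_vote, is_permutation_invariant and winner_histogram are made up for illustration, and the brute-force enumeration only works for tiny inputs:

```python
from collections import Counter
from itertools import permutations, product


def naive_vote(votes):
    # Ties go to whichever label was inserted into the Counter first,
    # i.e. whichever voter happened to arrive first.
    return Counter(votes).most_common()[0][0]


def alphabetical_vote(votes):
    # Deterministic tie-break: the smallest label among the top-scoring ones.
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def is_permutation_invariant(vote_fn, votes):
    # Hypothetical helper: True if every ordering of the same votes
    # produces the same winner.
    return len({vote_fn(list(p)) for p in permutations(votes)}) == 1


def winner_histogram(vote_fn, classes, n_voters):
    # Hypothetical helper: enumerate every possible vote profile and
    # count how often each class wins.
    return Counter(
        vote_fn(list(profile)) for profile in product(classes, repeat=n_voters)
    )


tied = ["cat", "dog"]
print(is_permutation_invariant(naive_vote, tied))         # False
print(is_permutation_invariant(alphabetical_vote, tied))  # True
print(winner_histogram(alphabetical_vote, "abc", 3))
```

The alphabetical tie-break fixes the ordering leak but not the bias: over all 27 profiles of three voters and three classes, "a" wins 13 times while "b" and "c" win 7 each, because all six three-way ties go to "a".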

The examples directory has a side-by-side audit of three innocent-looking 3-class aggregators, each with a different failure mode. That's the part I found most pedagogically interesting.

Feedback and adversarial cases welcome.

https://github.com/fuentesamurai/ensemble-bias-detector