This website showcases our Claude-based agent, which autonomously infers properties of code and tests them using Hypothesis. The agent found hundreds of bugs across popular Python libraries, some of which we have since reported and patched! You can browse all of the bugs it found on this website, and the linked paper and code have more information.
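For readers who haven't used Hypothesis, here is a minimal sketch of a property-based test of the kind described above. The `rle_encode` and `rle_decode` functions are hypothetical, made up for illustration; they are not from the paper or from any library the agent tested. The property is that decoding an encoded string gives back the original string.

```python
from hypothesis import given, strategies as st


def rle_encode(text):
    """Hypothetical example function: run-length encode a string."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            # Same character as the previous run: extend that run.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # New character: start a new run of length 1.
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    """Hypothetical inverse of rle_encode."""
    return "".join(ch * count for ch, count in runs)


# The property: decoding an encoding returns the original text.
# Hypothesis generates many strings and shrinks any failing input.
@given(st.text())
def test_round_trip(text):
    assert rle_decode(rle_encode(text)) == text


if __name__ == "__main__":
    test_round_trip()  # calling the decorated function runs the property check
```

If the property fails, Hypothesis reports a minimized counterexample, which is what makes this style of test useful for bug reports.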
mmaaz•1h ago