saidnooneever•52m ago
Interesting data and results. It kind of looks like they are good at plain coding (no buffer overflows or such things), but as noted, application logic is the tricky part. Things that are difficult for humans (authentication, parallelism) seem to be tricky for them too.
I’d wonder whether they are badly putting together the logic from good instructions, or whether the prompting was missing pieces and they followed it correctly but produced broken code due to missing requirements or details.
pyryt•1h ago
- We launched an AI bug scanner 6 months ago. We wanted to know what the bugs that get fixed have in common.
- We clustered 1,000 bugs from 99 codebases and grouped them by failure mechanism.
- 21 recurring mechanisms cover 70% of the bugs. These same mistakes show up again and again in completely unrelated products. We seem to be writing similar bugs over and over again with agents.
- A common denominator is silence. Bugs that make it to production today are not easily ‘legible’. They don’t crash, they pass CI, they look fine. You don’t see them unless you look for them.
If you’d like to know anything else about these bugs, let us know. Happy to analyze the dataset further.