> However, this approach is not ready for production. Even the best model, Claude Opus 4.6, found relatively obvious backdoors in small/mid-size binaries only 49% of the time. Worse yet, most models had a high false positive rate — flagging clean binaries.
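The quoted result uses two separate numbers. One is how often a model catches a backdoor in a binary that has one. The other is how often it flags a binary that is actually clean. Below is a minimal sketch of how these two rates could be computed from per-binary verdicts. It assumes a simple yes/no "flagged" verdict per binary. `AuditResult`, `detection_rate`, and `false_positive_rate` are hypothetical names, not taken from the BinaryAudit repo, and the benchmark's real scoring may be stricter (for example, requiring the model to locate the backdoor).

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    # Hypothetical record: one model verdict on one binary (not the BinaryAudit schema).
    binary: str
    has_backdoor: bool  # ground truth: does the binary contain a planted backdoor?
    flagged: bool       # did the model report a backdoor?


def detection_rate(results):
    """Share of backdoored binaries the model flagged (true positive rate / recall)."""
    backdoored = [r for r in results if r.has_backdoor]
    if not backdoored:
        return 0.0  # no backdoored samples: report 0.0 by convention in this sketch
    return sum(r.flagged for r in backdoored) / len(backdoored)


def false_positive_rate(results):
    """Share of clean binaries the model wrongly flagged."""
    clean = [r for r in results if not r.has_backdoor]
    if not clean:
        return 0.0  # no clean samples: report 0.0 by convention in this sketch
    return sum(r.flagged for r in clean) / len(clean)


# Toy data, made up for illustration: 2 backdoored binaries, 4 clean ones.
results = [
    AuditResult("a", has_backdoor=True, flagged=True),
    AuditResult("b", has_backdoor=True, flagged=False),
    AuditResult("c", has_backdoor=False, flagged=True),
    AuditResult("d", has_backdoor=False, flagged=False),
    AuditResult("e", has_backdoor=False, flagged=False),
    AuditResult("f", has_backdoor=False, flagged=False),
]
print(detection_rate(results))       # 0.5
print(false_positive_rate(results))  # 0.25
```

The two rates need to be reported separately. A model that flags every binary would score a perfect detection rate, but its false positive rate would also be 100%.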
stared
All tasks are open-source & we welcome contributions: https://github.com/QuesmaOrg/BinaryAudit
Discussion on X: https://x.com/pmigdal/status/2021244382800760873