Nice to see a benchmark in this space, especially one with black-box constraints.
saikia_•2d ago
Curious... let me see if this works for our internal setup.
akshay_93•2d ago
I like that the scoring is biased toward bug detection and not just test generation. Generating lots of tests with AI is easy, but that doesn't necessarily mean they're good.
riyajoshi•2d ago