Cool launch! Let me try this with our in-house setup.
calderon_1903•56m ago
Interesting stuff.
riyajoshi•53m ago
Thank you. Would really appreciate feedback on methodology or the evaluation framework!
naveenprasanthv•50m ago
Kind of wild that we're finally getting benchmarks for AI-generated API testing. Feels like the equivalent of SWE-bench, but for finding actual bugs instead of writing code.
saikia_•58m ago