I work on vision systems for structural inspection. A common pain point: while we have plenty of "healthy" images, we often lack a reliable "Golden Set" of rare failures (like shattered porcelain) to validate our models against before deployment.
You can't trust your model's recall if your test set only contains 5 examples of the failure mode.
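To put numbers on that: with an exact (Clopper-Pearson) binomial interval, detecting 5 out of 5 positives is still statistically consistent with recall below 50%. A quick sketch, assuming scipy (the function name and the 190/200 figures are illustrative, not measurements):

    from scipy.stats import beta

    def recall_ci(tp, positives, conf=0.95):
        """Exact (Clopper-Pearson) binomial interval for recall."""
        alpha = 1.0 - conf
        lo = beta.ppf(alpha / 2, tp, positives - tp + 1) if tp > 0 else 0.0
        hi = beta.ppf(1 - alpha / 2, tp + 1, positives - tp) if tp < positives else 1.0
        return lo, hi

    # 5/5 detected looks like 100% recall, but the 95% interval
    # is roughly (0.48, 1.0): too wide to make a deployment call.
    print(recall_ci(5, 5))
    # With 200 labeled positives the interval becomes actionable:
    print(recall_ci(190, 200))   # roughly (0.91, 0.98)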
To fix this, I built a pipeline that generates synthetic evaluation sets. In this example, I took 7 real-world defect samples, extracted their topology/texture, and procedurally generated 200 hard-to-detect variations across different lighting conditions and backgrounds.
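The actual pipeline does more than I can show here, but the core idea is defect compositing plus randomized lighting. A heavily simplified sketch of that step using Pillow; all paths, ranges, and file names below are made up for illustration, not my production code:

    import random
    from pathlib import Path
    from PIL import Image, ImageEnhance

    # Hypothetical directories: crops of the 7 real defects, plus backgrounds.
    DEFECTS = list(Path("defect_crops").glob("*.png"))
    BACKGROUNDS = list(Path("backgrounds").glob("*.jpg"))

    def generate_variant(rng):
        bg = Image.open(rng.choice(BACKGROUNDS)).convert("RGB")
        defect = Image.open(rng.choice(DEFECTS)).convert("RGBA")
        # Random scale and placement.
        s = rng.uniform(0.5, 1.5)
        defect = defect.resize((int(defect.width * s), int(defect.height * s)))
        x = rng.randint(0, max(0, bg.width - defect.width))
        y = rng.randint(0, max(0, bg.height - defect.height))
        bg.paste(defect, (x, y), defect)  # alpha mask gives clean edges
        # (x, y, defect.width, defect.height) is the box label, for free.
        # "Lighting" variation here is just brightness/contrast jitter.
        bg = ImageEnhance.Brightness(bg).enhance(rng.uniform(0.6, 1.4))
        bg = ImageEnhance.Contrast(bg).enhance(rng.uniform(0.8, 1.2))
        return bg

    rng = random.Random(0)
    for i in range(200):
        generate_variant(rng).save(f"synthetic_{i:03d}.jpg")

A nice side effect of compositing is that you never hand-label anything: the paste coordinates are the ground-truth boxes.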
I’m releasing this batch of broken insulators (CC0) specifically to help teams benchmark their model's recall on rare classes:
https://www.silera.ai/blog/free-200-broken-insulators-datase...
- Input: 7 real samples.
- Output: 200 fully labeled evaluation images (COCO/YOLO).
- Use Case: Validation / Test Set (not full training).
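If you want to sanity-check recall on the COCO variant, a minimal pycocotools loop looks like this (the file names and the category name are placeholders, not what ships in the download):

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("annotations.json")            # dataset ground truth
    coco_dt = coco_gt.loadRes("detections.json")  # your model's predictions

    ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
    # Restrict evaluation to the rare class (name assumed here).
    ev.params.catIds = coco_gt.getCatIds(catNms=["broken_insulator"])
    ev.evaluate()
    ev.accumulate()
    ev.summarize()
    print("AR@100 on the rare class:", ev.stats[8])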
How do you guys currently validate recall for "1 in 10,000" edge cases?
Jérôme
embedding-shape•1h ago
If you're releasing this CC0, couldn't you just offer a direct download link instead of requiring registration and purchasing credits for the download? Otherwise you'll just encourage others to rehost the content, and then you won't even be able to tell how many downloads it gets from your server logs.
PS: your "Get Dataset" button breaks once you've clicked it and then gone back from the signup page; it's no longer clickable after that.
jmalevez•40m ago