I work on vision systems for structural inspection. A common pain point: while we have plenty of "healthy" images, we often lack a reliable "Golden Set" of rare failures (like shattered porcelain) to validate our models before deployment.
You can't trust your model's recall estimate if your test set only contains, say, 5 examples of the failure mode.
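To put rough numbers on that: with an exact (Clopper-Pearson) interval, catching 4 of 5 defects gives a point estimate of 0.80 recall but a 95% CI of roughly 0.28 to 0.99. A quick sketch in Python (standard statistics, nothing specific to our pipeline):

    from scipy.stats import beta

    def recall_ci(caught, total, alpha=0.05):
        """Clopper-Pearson (exact) confidence interval for recall."""
        lo = 0.0 if caught == 0 else beta.ppf(alpha / 2, caught, total - caught + 1)
        hi = 1.0 if caught == total else beta.ppf(1 - alpha / 2, caught + 1, total - caught)
        return lo, hi

    # 4 of 5 defects caught: point estimate 0.80, 95% CI roughly (0.28, 0.99)
    print(recall_ci(4, 5))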
To fix this, I built a pipeline for generating synthetic evaluation datasets. In this example, I took 7 real-world defect samples, extracted their topology/texture, and procedurally generated 200 hard-to-detect variations across different lighting conditions and backgrounds.
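The core compositing step is conceptually something like the sketch below (simplified, not the actual pipeline code; the paths, class id, and parameter ranges are placeholders): paste an alpha-masked defect crop onto a background at a random position and scale, jitter the lighting, and emit a YOLO-format box.

    import random
    import cv2
    import numpy as np

    def composite_defect(bg_path, defect_path, out_size=640):
        """Paste an alpha-masked defect crop onto a background with random
        placement, scale and lighting jitter; return image + YOLO label line."""
        bg = cv2.resize(cv2.imread(bg_path), (out_size, out_size))
        defect = cv2.imread(defect_path, cv2.IMREAD_UNCHANGED)  # BGRA crop

        # random scale and placement
        w = int(out_size * random.uniform(0.15, 0.4))
        h = min(out_size, max(1, int(defect.shape[0] * w / defect.shape[1])))
        defect = cv2.resize(defect, (w, h))
        x = random.randint(0, out_size - w)
        y = random.randint(0, out_size - h)

        # alpha-blend the crop into the background
        alpha = defect[:, :, 3:4] / 255.0
        roi = bg[y:y + h, x:x + w].astype(np.float32)
        bg[y:y + h, x:x + w] = (alpha * defect[:, :, :3] + (1 - alpha) * roi).astype(np.uint8)

        # crude lighting variation: global gain/bias jitter
        gain, bias = random.uniform(0.7, 1.3), random.uniform(-25, 25)
        img = np.clip(bg.astype(np.float32) * gain + bias, 0, 255).astype(np.uint8)

        # YOLO label: "class cx cy w h", normalized to image size
        label = (f"0 {(x + w / 2) / out_size:.4f} {(y + h / 2) / out_size:.4f} "
                 f"{w / out_size:.4f} {h / out_size:.4f}")
        return img, label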
I’m releasing this batch of broken insulators (CC0) specifically to help teams benchmark their model's recall on rare classes:
https://www.silera.ai/blog/free-200-broken-insulators-datase...
- Input: 7 real samples.
- Output: 200 fully labeled evaluation images (COCO/YOLO).
- Use Case: Validation / Test Set (not for training).
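If you're on the Ultralytics stack, wiring the set in as a held-out test split and reading off recall looks roughly like this (the yaml and weights names are placeholders):

    from ultralytics import YOLO

    model = YOLO("insulator_detector.pt")        # your trained weights
    metrics = model.val(data="insulators.yaml",  # data.yaml whose test: entry points at this set
                        split="test")
    print("recall:", metrics.box.mr)             # mean recall over classes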
How do you guys currently validate recall for "1 in 10,000" edge cases?
Jérôme