Show HN: I generated a "stress test" of 200 rare defects from 7 real photos

5•jmalevez•3d ago

Hello HN,

I work on vision systems for structural inspection. A common pain point is that while we have plenty of "healthy" images, we often lack a reliable "Golden Set" of rare failures (like shattered porcelain) to validate our models before deployment.

You can't trust your model's recall if your test set has only 5 examples of the failure mode.
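To put a rough number on that, here is a minimal sketch of a 95% Wilson score interval on measured recall. The Wilson interval and the 95% level are my own choices for illustration, not part of the pipeline, and `wilson_interval` is a hypothetical helper name. Even a model that catches 5 out of 5 defects only gets a lower bound of about 0.57, while 200 out of 200 pushes it to about 0.98.

```python
import math


def wilson_interval(hits, positives, z=1.96):
    """Wilson score interval for a binomial proportion (here: recall).

    hits      -- defects the model detected (true positives)
    positives -- defects present in the test set (TP + FN)
    z         -- normal quantile; 1.96 corresponds to ~95% (an assumed choice)
    """
    if positives <= 0:
        raise ValueError("need at least one positive example")
    p = hits / positives
    denom = 1 + z ** 2 / positives
    center = (p + z ** 2 / (2 * positives)) / denom
    half = z * math.sqrt(p * (1 - p) / positives + z ** 2 / (4 * positives ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Perfect measured recall on a tiny test set vs. a larger one.
    for hits, positives in [(5, 5), (200, 200)]:
        low, high = wilson_interval(hits, positives)
        print(f"{hits}/{positives} detected: recall is plausibly in [{low:.3f}, {high:.3f}]")
```

The same measured recall of 100% means very different things at the two sample sizes, which is the whole argument for a larger evaluation set.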

So to fix this, I built a pipeline to generate datasets. In this example, I took 7 real-world defect samples, extracted their topology/texture, and procedurally generated 200 hard-to-detect variations across different lighting and backgrounds.

I’m releasing this batch of broken insulators (CC0) specifically to help teams benchmark their model's recall on rare classes:

https://www.silera.ai/blog/free-200-broken-insulators-datase...

- Input: 7 real samples.

- Output: 200 fully labeled evaluation images (COCO/YOLO).

- Use Case: Validation / Test Set (not full training).
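For anyone moving labels between the two formats, here is a minimal sketch of the box conversion, assuming the usual conventions: COCO boxes are `[x_min, y_min, width, height]` in pixels, and YOLO boxes are `(x_center, y_center, width, height)` normalized by the image size. The function name is hypothetical, and the exact file layout of the released dataset isn't described here, so check the files themselves before relying on this.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert one COCO-style box to a YOLO-style box.

    bbox         -- [x_min, y_min, width, height] in pixels (COCO convention)
    img_w, img_h -- image size in pixels
    Returns (x_center, y_center, width, height), each normalized to 0..1.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


if __name__ == "__main__":
    # A 200x100 px box with its top-left corner at (100, 50) in a 400x200 px image.
    print(coco_to_yolo([100, 50, 200, 100], 400, 200))
```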

How do you guys currently validate recall for "1 in 10,000" edge cases?

Jérôme

Comments

embedding-shape•1h ago

> I’m releasing this batch of broken insulators (CC0) specifically to help teams benchmark their model's recall on rare classes:

If you're releasing this as CC0, couldn't you just offer a download link instead of requiring people to register and purchase credits for the download? Otherwise you'll just be encouraging others to rehost the content, and then you won't even be able to tell how many downloads it gets from the server logs.

PS: your "Get Dataset" button breaks once you've clicked on it and then gone back from the signup page; it's no longer possible to click on it after that.

jmalevez•40m ago

Thanks for the heads up on the broken button. I added the direct HF link: https://huggingface.co/datasets/silera/broken-insulators-syn...

yellow_lead•1h ago

It's not free if you have to trade your info for it. It's not like I have a business case for photos of broken insulators; I'm just trying to check what you made.

jmalevez•58m ago

My bad guys, I didn't mean to make it feel like an email trap.

Here is the direct Hugging Face link: https://huggingface.co/datasets/silera/broken-insulators-syn...
