Show HN: I generated a "stress test" of 200 rare defects from 7 real photos

5•jmalevez•3d ago
Hello HN,

I work on vision systems for structural inspection. A common pain point is that while we have plenty of "healthy" images, we often lack a reliable "Golden Set" of rare failures (like shattered porcelain) to validate our models before deployment.

You can't trust your model's recall if, for example, your test set only has 5 examples of the failure mode.
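To put a number on that intuition, here is a minimal sketch (not part of the released pipeline) that computes a 95% Wilson score interval for recall measured on a small versus a larger set of positives. The function name `wilson_interval` and the example counts (4 of 5 and 160 of 200 detected) are illustrative assumptions, chosen so both cases have the same observed recall of 0.8.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (here: recall = hits / n).

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: same observed recall (0.8), very different certainty.
for hits, n in [(4, 5), (160, 200)]:
    lo, hi = wilson_interval(hits, n)
    print(f"recall {hits}/{n}: 95% interval = [{lo:.2f}, {hi:.2f}]")
```

With only 5 positives the interval covers more than half of the possible range, while with 200 it narrows to roughly a tenth of it, so the same headline recall supports very different deployment decisions.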

To fix this, I built a pipeline that generates such datasets. In this example, I took 7 real-world defect samples, extracted their topology/texture, and procedurally generated 200 hard-to-detect variations across different lighting and backgrounds.

I’m releasing this batch of broken insulators (CC0) specifically to help teams benchmark their model's recall on rare classes:

https://www.silera.ai/blog/free-200-broken-insulators-datase...

- Input: 7 real samples.

- Output: 200 fully labeled evaluation images (COCO/YOLO).

- Use Case: Validation / Test Set (not full training).
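For readers who want to use such a set as a recall benchmark, the following is a rough sketch of how recall could be computed from labeled boxes, assuming ground truth and predictions have already been loaded as `(x1, y1, x2, y2)` pixel boxes for a single class. The helper names `iou` and `recall_at_iou`, the greedy matching, and the 0.5 IoU threshold are assumptions for illustration, not the dataset's official evaluation protocol.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(gt: list[Box], preds: list[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by a distinct prediction.

    Greedy matching: each ground-truth box takes the unused prediction
    with the highest IoU, provided that IoU is at least `thr`.
    """
    if not gt:
        raise ValueError("recall is undefined without ground-truth boxes")
    used: set[int] = set()
    hits = 0
    for g in gt:
        best_j, best_v = None, thr
        for j, p in enumerate(preds):
            if j in used:
                continue
            v = iou(g, p)
            if v >= best_v:
                best_j, best_v = j, v
        if best_j is not None:
            used.add(best_j)
            hits += 1
    return hits / len(gt)


# Toy example: two defects in one image, the model finds only the first.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(1, 1, 10, 10)]
print(recall_at_iou(gt_boxes, pred_boxes))
```

Greedy matching keeps the sketch short; a stricter evaluation would sort predictions by confidence or use an optimal assignment, which can change the result when boxes overlap heavily.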

How do you guys currently validate recall for "1 in 10,000" edge cases?

Jérôme

Comments

embedding-shape•1h ago
> I’m releasing this batch of broken insulators (CC0) specifically to help teams benchmark their model's recall on rare classes:

If you're releasing this as CC0, couldn't you just offer a download link instead of requiring people to register and purchase credits for the download? Otherwise you'll just encourage others to rehost the content, and then you won't even be able to tell from the server logs how many downloads it gets.

PS: your "Get Dataset" button breaks once you've clicked it and then gone back from the signup page; it can't be clicked again after that.

jmalevez•40m ago
Thanks for the heads up on the broken button. I added the direct HF link: https://huggingface.co/datasets/silera/broken-insulators-syn...

yellow_lead•1h ago
It's not free if you have to trade your info for it. It's not like I have a business case for photos of broken insulators; I'm just trying to check what you made.

jmalevez•58m ago
My bad, guys. I didn't mean to make it feel like an email trap.

Here is the direct Hugging Face link: https://huggingface.co/datasets/silera/broken-insulators-syn...
