Adventures in Imbalanced Learning and Class Weight

http://andersource.dev/2025/05/05/imbalanced-learning.html
49•andersource•9mo ago

Comments

ipunchghosts•9mo ago
I read the article, and the takeaway is that class weights and stratified sampling did not help for the OP's problem.
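The takeaway mentions stratified sampling. One common reading of that in imbalanced training is drawing every mini-batch with an equal per-class quota. The sketch below assumes that reading (the article's exact setup may differ), and `balanced_batches` is a hypothetical helper, not a library API.

```python
import random
from collections import defaultdict

def balanced_batches(labels, batch_size, n_batches, seed=0):
    """Hypothetical helper: yield index lists with an equal quota per class.

    Minority-class indices are drawn with replacement, so the same rare
    example can appear many times across batches.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    classes = sorted(by_class)
    per_class = batch_size // len(classes)  # equal share for every class
    for _ in range(n_batches):
        batch = []
        for c in classes:
            batch.extend(rng.choices(by_class[c], k=per_class))
        rng.shuffle(batch)
        yield batch

labels = [0] * 95 + [1] * 5  # illustrative 95:5 imbalance
for batch in balanced_batches(labels, batch_size=8, n_batches=2):
    print(sum(labels[i] for i in batch), "positives out of", len(batch))
    # prints "4 positives out of 8" on each iteration
```

Because the five positives are resampled into every batch, each one is seen far more often than under uniform sampling, which is exactly what this kind of stratification trades on.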
lamename•9mo ago
Nice writeup. On metrics (F1, balanced accuracy, etc.): in truth the choice depends on your problem and what a practical "best" solution is, especially in imbalanced scenarios, but the Matthews Correlation Coefficient (MCC) is probably the best comprehensive, balanced, blind go-to metric, because a high MCC guarantees that more portions of the confusion matrix are good [0,1].

I made a quick interactive, graphical exploration in Python to demonstrate this [2].

[0]: https://biodatamining.biomedcentral.com/articles/10.1186/s13...

[1]: https://biodatamining.biomedcentral.com/articles/10.1186/s13...

[2]: https://www.glidergrid.xyz/post-archive/understanding-the-ro...
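To see why MCC is harder to fool than F1, here is a minimal sketch that computes the metrics mentioned above from a binary confusion matrix. The counts are made up for illustration: 95 actual positives, 5 actual negatives, and a model that labels almost everything positive. F1 never looks at true negatives, so it stays high while balanced accuracy and MCC expose the problem. `binary_metrics` is a hypothetical helper name.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Hypothetical helper: common metrics from a 2x2 confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0   # ignores tn entirely
    specificity = tn / (tn + fp) if tn + fp else 0.0
    balanced_acc = (recall + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Usual convention: MCC is 0 when any row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "balanced_acc": balanced_acc, "mcc": mcc}

m = binary_metrics(tp=90, fp=5, fn=5, tn=0)
print({k: round(v, 3) for k, v in m.items()})
# prints {'precision': 0.947, 'recall': 0.947, 'f1': 0.947, 'balanced_acc': 0.474, 'mcc': -0.053}
```

The model never gets a negative right, and only MCC (slightly below zero) and balanced accuracy (below chance) say so.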

klysm•9mo ago
MCC also generalizes well to multi-class problems. I wish it had a better name, though. It seems like F1 score has better marketing.
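The multi-class point can be made concrete: the K-class generalization (Gorodkin's R_K statistic) works directly on a K x K confusion matrix and reduces to the ordinary binary MCC when K = 2. A minimal sketch, with `multiclass_mcc` as a hypothetical helper name:

```python
import math

def multiclass_mcc(conf):
    """Hypothetical helper: K-class MCC from a K x K confusion matrix.

    conf[i][j] counts samples whose true class is i and predicted class is j.
    """
    k = len(conf)
    s = sum(sum(row) for row in conf)                          # total samples
    c = sum(conf[i][i] for i in range(k))                      # correct predictions
    t = [sum(conf[i]) for i in range(k)]                       # true count per class
    p = [sum(conf[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(x * x for x in p)) * (s * s - sum(x * x for x in t)))
    return num / den if den else 0.0

print(multiclass_mcc([[5, 0, 0], [0, 3, 0], [0, 0, 2]]))  # prints 1.0 (perfect 3-class prediction)
```

For example, `multiclass_mcc([[0, 5], [5, 90]])` gives about -0.053, the same value the binary formula gives for 90 true positives, 5 false positives, 5 false negatives and 0 true negatives.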
andersource•9mo ago
Really neat visualization! And thanks for the tip on MCC.

Out of curiosity I plugged it into the same visualization (performance vs. class weight when optimized with BCE), and it behaves similarly to F1, i.e. it's best without weighting.
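For reference, here is a minimal sketch of BCE with a positive-class weight, assuming the common `pos_weight` convention (PyTorch's `BCEWithLogitsLoss` exposes a parameter of that name; this pure-Python version is only an illustration). Fitting a single constant prediction shows one effect of weighting: the loss-minimizing probability moves from the base rate toward 0.5, which shifts where a fixed 0.5 threshold cuts.

```python
import math

def weighted_bce(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy; the positive-class term is scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def best_constant(y_true, pos_weight):
    """Hypothetical helper: grid-search the one probability that minimizes weighted BCE."""
    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda p: weighted_bce(y_true, [p] * len(y_true), pos_weight))

y = [1] * 10 + [0] * 90  # illustrative 10% positive rate
print(best_constant(y, 1.0), best_constant(y, 9.0))  # prints 0.1 0.5
```

Analytically the optimum is 10w / (10w + 90) for weight w, so w = 9 exactly "balances" this data. One possible reading of the result above: if the threshold or metric is tuned afterwards, weighting mostly relocates the operating point rather than adding information.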

gitroom•9mo ago
Insane how tricky imbalanced stuff gets. I always end up second-guessing my metrics, tbh. Do you think there's ever such a thing as a "right" number for real-world junk, or is it just endless tradeoffs?
andersource•9mo ago
Yeah, it gets tricky. I think eventually it has to be about tradeoffs: no ML system can be 100% correct. I do think there's a "right" decision (up to a point) in the context of the product or business.
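One way to make "the right decision in the context of the business" concrete is to attach a cost to each kind of error and flag a case only when that lowers expected cost. The sketch assumes calibrated probabilities and uses hypothetical costs (a false positive costs 1, a missed positive costs 20); `bayes_threshold` and `total_cost` are illustrative helpers.

```python
def bayes_threshold(cost_fp, cost_fn):
    """Hypothetical helper: flag as positive when p * cost_fn > (1 - p) * cost_fp.

    Solving for p gives cost_fp / (cost_fp + cost_fn); assumes p is calibrated.
    """
    return cost_fp / (cost_fp + cost_fn)

def total_cost(y_true, p_pred, threshold, cost_fp, cost_fn):
    """Hypothetical helper: summed error cost of thresholding p_pred."""
    cost = 0.0
    for y, p in zip(y_true, p_pred):
        flagged = p >= threshold
        if flagged and y == 0:
            cost += cost_fp   # false positive
        elif not flagged and y == 1:
            cost += cost_fn   # missed positive
    return cost

t = bayes_threshold(cost_fp=1, cost_fn=20)   # 1/21, roughly 0.048
print(total_cost([1, 0, 0, 1], [0.9, 0.2, 0.03, 0.04], t, 1, 20))  # prints 21.0
```

The threshold falls out of the cost ratio rather than the class ratio, which is one way to turn "endless tradeoffs" into a single decision once the costs are agreed on.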
zai_nabasif1234•9mo ago
[5/11, 1:41 PM] Meta AI: Imbalanced learning refers to the challenge of training machine learning models on datasets where the classes are not represented equally. This can lead to biased models that perform well on the majority class but poorly on the minority class.

Class Weight
Class weight is a technique used to address class imbalance by assigning different weights to each class during training. The idea is to give more weight to the minority class and less weight to the majority class.

Key Considerations
1. Class weight calculation: Weights can be calculated based on class frequency or other heuristics.
2. Hyperparameter tuning: Class weights can be tuned as hyperparameters during model training.
3. Evaluation metrics: Metrics like F1-score, precision, and recall are often used to evaluate model performance on imbalanced datasets.
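A frequency-based calculation is often the inverse-frequency rule n_samples / (n_classes * count_c), which is what scikit-learn documents for `class_weight="balanced"`. A minimal pure-Python sketch (`balanced_class_weights` is a hypothetical name):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Hypothetical helper: inverse-frequency weights n / (k * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

print(balanced_class_weights([0] * 90 + [1] * 10))  # prints {0: 0.5555555555555556, 1: 5.0}
```

With a 90:10 split each class ends up contributing the same total weight (90 * 0.556 and 10 * 5 both come to 50), which is the point of the heuristic; treating the weight as a tunable hyperparameter instead simply relaxes that constraint.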

Techniques
1. Oversampling: Oversampling the minority class to balance the dataset.
2. Undersampling: Undersampling the majority class to balance the dataset.
3. SMOTE: Synthetic Minority Over-sampling Technique (SMOTE) generates synthetic samples of the minority class.
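These three techniques can be sketched in plain Python for the binary case. This is an illustration under simplifying assumptions (rows as tuples, labels as a list, brute-force neighbor search), and all three function names are hypothetical. `smote_like` follows the SMOTE recipe of interpolating between a minority point and one of its nearest minority neighbors; libraries such as imbalanced-learn provide a tested `SMOTE` class.

```python
import random

def random_oversample(X, y, minority, seed=0):
    """Hypothetical helper: duplicate random minority rows until classes are equal."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    deficit = (len(y) - len(min_idx)) - len(min_idx)
    extra = [rng.choice(min_idx) for _ in range(max(deficit, 0))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]

def random_undersample(X, y, majority, seed=0):
    """Hypothetical helper: keep as many random majority rows as there are other rows."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    rest = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rest + rng.sample(maj, len(rest)))
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_like(X_min, n_new, k=3, seed=0):
    """Hypothetical helper: interpolate between a minority point and one of its
    k nearest minority neighbors (squared Euclidean distance). Needs >= 2 points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        a = X_min[i]
        others = [X_min[j] for j in range(len(X_min)) if j != i]
        neighbors = sorted(others, key=lambda b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)))[:k]
        b = rng.choice(neighbors)
        gap = rng.random()  # random position on the segment from a to b
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.9, 0.8), (1.0, 1.0)]
y = [0, 0, 0, 0, 1, 1]
print(random_oversample(X, y, minority=1)[1])  # labels now hold 4 zeros and 4 ones
print(smote_like([X[4], X[5]], n_new=2))        # new points between (0.9, 0.8) and (1.0, 1.0)
```

Oversampling repeats exact rows, undersampling throws data away, and SMOTE-style interpolation invents plausible-looking rows; which one helps depends on the model and, as the article found, may not beat no resampling at all.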

Applications
1. Fraud detection: Imbalanced learning is crucial in fraud detection, where the minority class (fraudulent transactions) is often much smaller than the majority class (legitimate transactions).
2. Medical diagnosis: Imbalanced learning can be applied to medical diagnosis, where the minority class (diseased patients) may be much smaller than the majority class (healthy patients).

Would you like to know more about imbalanced learning or class weight?

bbstats•9mo ago
The only thing that matters is your estimation of how the balance will change out of distribution or with future data, etc.
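Acting on that estimate is straightforward in one special case: if only the class balance changes (label shift, so P(x | y) stays fixed), a calibrated probability can be moved to the new prior with Bayes' rule. The same correction applies when a model was trained on resampled data whose class ratio differs from production. A minimal sketch with a hypothetical helper and illustrative priors:

```python
def adjust_for_prior_shift(p, train_prior, new_prior):
    """Hypothetical helper: move a calibrated P(y=1 | x) from the training
    positive rate to a new one. Assumes pure label shift (P(x | y) unchanged)."""
    pos = p * new_prior / train_prior
    neg = (1 - p) * (1 - new_prior) / (1 - train_prior)
    return pos / (pos + neg)

# Illustrative: trained on a 50:50 resampled set, deployed where 10% are positive.
print(round(adjust_for_prior_shift(0.5, train_prior=0.5, new_prior=0.1), 3))  # prints 0.1
print(round(adjust_for_prior_shift(0.9, train_prior=0.5, new_prior=0.1), 3))  # prints 0.5
```

A score of 0.9 on balanced training data is only a coin flip at a 10% base rate, which is why the estimate of future balance matters so much; if P(x | y) also drifts, this correction no longer holds.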