Loss Distribution Collapse: A Structural Theory of Dataset Degradation

https://zenodo.org/records/18498820

1•GOE_OVSYANKA•1h ago

Comments

GOE_OVSYANKA•1h ago

This paper presents a structural theory of dataset and model degradation under recursive training on synthetic data. Unlike prior work that attributes model collapse to entropy loss, noise accumulation, or data provenance, the paper identifies the loss distribution as the central object governing degradation.

The core claim is that recursive self-training acts as a sharpening operator on the data distribution: low-loss (high-probability) samples become increasingly dominant, while rare and difficult cases (the tail of the distribution) systematically vanish. This process is formalized as an iterative distributional transformation that leads to progressive collapse and loss of structural diversity. The paper introduces a tail invariance principle, stating that stable long-term learning requires preservation of tail probability mass across model generations.

The theoretical framework is supported by controlled experiments on discrete distributions, continuous models, and language models, using metrics such as KL divergence, entropy, and tail mass. The results demonstrate that common mitigation strategies (noise injection, anti-repetition heuristics, AI-content detection) do not address the root cause of collapse. Effective prevention requires explicit mechanisms to preserve the loss distribution, including real-data anchoring, dataset accumulation, distributed generators, and tail-mass correction.

Overall, the work reframes model collapse as a structural consequence of loss distribution dynamics and provides a principled stability criterion for generative training pipelines.
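The comment does not give the paper's exact operator, so the rough sketch below makes some assumptions. It uses a simple power-sharpening map, p_i -> p_i^beta / Z with beta > 1, as a stand-in for one round of recursive self-training. It models "real-data anchoring" as mixing a fixed fraction of the original distribution back in each generation. The sketch tracks the three metrics the comment names: entropy, tail mass, and KL(p0 || p_t). The Zipf-like base distribution, beta = 1.5, the 20% anchor, and the choice of the 10 rarest items as the tail are all illustrative assumptions, not values from the paper.

```python
import math

def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def sharpen(p, beta):
    # Assumed sharpening operator (not taken from the paper): p_i^beta / Z, beta > 1.
    # High-probability (low-loss) items gain mass; rare items lose it.
    return normalize([x ** beta for x in p])

def entropy(p):
    # Shannon entropy in nats.
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    # KL(p || q) in nats; finite as long as q_i > 0 wherever p_i > 0.
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def tail_mass(p, tail):
    # Probability mass on a fixed set of "tail" item indices.
    return sum(p[i] for i in tail)

def simulate(p0, beta, generations, anchor=0.0):
    # Each generation trains on a mix: `anchor` parts real data (p0),
    # (1 - anchor) parts sharpened output of the previous generation.
    history = [list(p0)]
    for _ in range(generations):
        s = sharpen(history[-1], beta)
        history.append([anchor * a + (1 - anchor) * b for a, b in zip(p0, s)])
    return history

# Hypothetical Zipf-like "real" distribution over 20 items; the 10 rarest form the tail.
p0 = normalize([1.0 / (k + 1) for k in range(20)])
tail = range(10, 20)

collapsed = simulate(p0, beta=1.5, generations=6)
anchored = simulate(p0, beta=1.5, generations=6, anchor=0.2)

for name, hist in [("no anchoring", collapsed), ("20% real data", anchored)]:
    print(name)
    for t, p in enumerate(hist):
        print(f"  gen {t}: H={entropy(p):.3f}  tail={tail_mass(p, tail):.4f}  "
              f"KL(p0||p_t)={kl(p0, p):.3f}")
```

Without anchoring, entropy and tail mass shrink every generation. Applying the map t times is the same as raising p0 to the power beta^t and renormalizing, so the rare items are squeezed out geometrically fast. With anchoring, the tail mass can never fall below 20% of its original value. That is a crude version of the tail-mass preservation the comment describes.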

Dangerously-skip-permissions IFF it doesn't WRITE outside Sandbox

https://github.com/ContextFort-AI/Runtime-Controls

1•ashwinr2002•1m ago•1 comments

We analyzed EU IT salaries and hiring trends using real job data

https://old.reddit.com/r/eutech/comments/1qtqly5/we_analyzed_eu_it_salaries_and_hiring_trends/

2•taubek•5m ago•0 comments

Beat AI in Incident Diagnosis Competition – $225 in Prizes, This Saturday

https://incidentfox.slack.com/join/shared_invite/zt-3ojlxvs46-xuEJEplqBHPlymxtzQi8KQ?nojsmode=1

1•chiehminwei•5m ago•1 comments

Show HN: Termoil – Terminal dashboard for managing parallel AI coding agents

https://github.com/fantom845/termoil

1•Kanix•6m ago•0 comments

for multi-broker portfolio analytics

https://gist.github.com/muarif24/

1•vikkymelani•8m ago•0 comments

Show HN: Image Protector – I over-engineered adding noise to images (CLI and GUI)

https://github.com/Codex-Crusader/Image-Protector

1•Codex-Crusader•9m ago•0 comments

Agentic Productivity System with Plain Markdown

https://sattlerjoshua.com/writing/2026-02-06-agentic-productivity-system-with-plain-markdown/

1•jsattler•15m ago•1 comments

I built a Ghibli-style image converter by modeling color and atmosphere

https://ghibli-art.io

1•leonaoa•17m ago•1 comments

Apple I: The Spark That Ignited the Digital Revolution (legendary price $666.66)

https://www.mac-history.net/apple-i-the-spark-that-ignited-the-digital-revolution/

1•stmw•22m ago•0 comments

Contaminated: The Carpet Industry's Toxic Legacy

https://www.pbs.org/wgbh/frontline/documentary/contaminated-the-carpet-industrys-toxic-legacy/

1•johntfella•23m ago•0 comments

Show HN: A React testing boilerplate for vibe coded apps

https://www.testsolid.com/

1•scedast•26m ago•0 comments

Missouri Senate considers bills to halt solar development on farmland

https://missouriindependent.com/2026/02/04/missouri-senate-considers-bills-to-halt-solar-developm...

2•MilnerRoute•26m ago•0 comments

Show HN: Fine-tuning a resume builder for SWEs

https://www.sweresume.app/

1•zed_labs_dev•29m ago•0 comments

MoltDJ – Music by Machines, for Machines

https://moltdj.com

4•TheAlexIceman•32m ago•1 comments

Mark Russinovich's BSOD Photomosaic

https://github.com/markrussinovich/bsodmosaic

1•weinzierl•32m ago•0 comments

Portfolio Monitor – Claude Code skill for multi-broker portfolio analytics

https://github.com/2165187809-AXE/portfolio-monitor

1•AXEbot•37m ago•1 comments

Elfconv: AOT binary translator of Linux/ELF -> WebAssembly

https://github.com/yomaytk/elfconv

1•todsacerdoti•38m ago•0 comments

Show HN: Skeletoken, a Python package for editing model tokenizers

https://github.com/stephantul/skeletoken

1•stephantul•41m ago•0 comments

Let's compile Quake like it's 1997

https://fabiensanglard.net/compile_like_1997/index.html

2•chunkles•42m ago•0 comments

Show HN: Hacker Backlinks – HN Stories Most Linked To By HN Comments

https://hacker-backlinks.browserbox.io/?sort=linked&p=1

1•keepamovin•43m ago•1 comments

Ask HN: What are you building this Friday?

1•cranberryturkey•44m ago•0 comments

Show HN: Fylepad – A minimal, tabbed Markdown notepad built with Rust

https://github.com/imrofayel/fylepad

1•imrofayel•47m ago•0 comments

How exposed are software stocks to AI tools? We put vibe-coding to the test

https://www.cnbc.com/2026/02/05/how-exposed-are-software-stocks-to-ai-tools-we-tested-vibe-coding...

1•_____k•54m ago•0 comments

What can still be a reasonable AI bear thesis?

https://metacriticcapital.substack.com/p/what-can-still-be-a-reasonable-ai

1•MP_1729•58m ago•1 comments

Europeans have made concessions to US over Greenland, JD Vance says

https://www.bbc.com/news/articles/cdx41r62601o

3•maxloh•1h ago•1 comments

Make Nothing That Isn't Beautiful

https://thepointmag.com/examined-life/make-nothing-that-isnt-beautiful/

4•prismatic•1h ago•1 comments

What does it take to build towards 100 PRs/day per engineer?

https://jonathannen.com/building-towards-100-prs-a-day

2•jwilliams•1h ago•1 comments

Show HN: Deeploy v0.2.0 – Self-hosted PaaS with terminal UI

https://deeploy.sh

1•axadrn•1h ago•0 comments

The World Factbook: datasets for the country profiles

https://github.com/factbook

1•1659447091•1h ago•1 comments

Towards Self-Driving Codebases

https://cursor.com/blog/self-driving-codebases

1•onurkanbkrc•1h ago•0 comments