0.1% synthetic data is enough to degrade AI models (Nature, 2024)

https://medium.com/ai-advances/model-collapse-when-ai-trains-on-ai-generated-data-2c4baf60a016
2•Aedelon•2h ago

Comments

Aedelon•2h ago
Survey of 65+ papers on model collapse. Key finding from Dohmatob et al. (ICLR 2025): even 0.1% synthetic contamination in training data causes measurable degradation.

No major dataset (FineWeb, RedPajama, C4) currently filters for AI-generated content.

bediger4000•1h ago
How about complete crap data? We know there are people generating rubbish specifically to feed to "AI". Can they generate enough to cause problems?
Aedelon•59m ago
Both angles are real but they play out differently. On the deliberate side: Nightshade showed you can poison image models with a few hundred modified samples. Backdoor attacks on LLMs (sleeper agents, trojan triggers) are an active research area, and the attack surface is huge because most training pipelines just scrape the open web. So yes, someone generating garbage on purpose can cause targeted damage, especially if they understand how the data gets collected.

But the scarier part is that nobody needs to try. The accidental contamination is already happening: models train on web data, their outputs end up on the web, and the next generation trains on that. Dohmatob et al. showed 0.1% synthetic contamination is enough to cause measurable degradation. Right now no major dataset (FineWeb, RedPajama, C4) filters for AI-generated content.
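
To make the train-on-your-own-output loop concrete, here is a toy sketch. It is not the Dohmatob et al. experiment and says nothing about the 0.1% threshold. It assumes the "model" is just a Gaussian fitted to its training data, and that each generation trains only on samples drawn from the previous generation's fit. The function name and all parameters are made up for illustration.

```python
import random
import statistics


def simulate_recursive_training(generations=200, sample_size=20, seed=0):
    """Toy loop: fit a Gaussian, sample from the fit, refit on those samples.

    Returns the fitted standard deviation after each generation,
    starting with the "real" distribution N(0, 1) at index 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the real data distribution
    history = [sigma]
    for _ in range(generations):
        # The previous model's outputs become the next training set.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # "Training" is a maximum-likelihood fit of mean and std dev.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_recursive_training()
    for generation in (0, 50, 100, 200):
        print(f"generation {generation:3d}: fitted std = {history[generation]:.6f}")
```

In this setup the fitted spread tends to shrink generation after generation, because every refit on a finite sample loses a little of the tails and nothing ever puts them back. Real pipelines mix synthetic and human data and use far richer models, so treat this as intuition only.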

What makes this harder to think about: data quality and model performance don't always follow "garbage in, garbage out." I wrote about a related paradox where Qwen2.5-Math trained with deliberately wrong reward signals still improved almost as much as with correct ones: https://ai.gopubby.com/false-rewards-make-ai-smarter-paradox...

Models are simultaneously fragile to recursive contamination and weirdly resilient to corrupted training signals. The picture is messier than either side suggests.
