frontpage.

Made with ♥ by @iamnishanth

Open Source @Github

Poisoning Scraperbots with Iocane (paywall)

https://lwn.net/Articles/1056953/
3•medbar•1h ago

Comments

jjgreen•1h ago
Subscriber-only
NitpickLawyer•40m ago
Title should be ...with Iocaine, and the project seems to be this one - https://iocaine.madhouse-project.org/

> It is an aggressive defense mechanism that tries its best to take the blunt of the assault, serve them garbage, and keep them off of upstream resources.

> It tries to poison them, so they’d go away forever in the long run.

All I can say is that this is not how any of it works. These people think they're doing something, but they have no idea how large-scale training works, how data cleaning works, how synthetic data works, and so on. They won't "fool" anyone worth fooling, and the bots will still scrape their sites.

This seems to be a direct consequence of lots of people (even here on HN) constantly repeating a few memes like "regurgitating the training set", "ai bots crawl everything for training data", "we ran out of training data", etc. None of those are true. None of those matter in SotA models. No one is training on raw scraped data anymore, and they haven't been doing it for 2-3 years. All the recent (1-2 years) gains in models have come from synthetic data + real world generated data (i.e. RL environments).

This is a cute attempt, and it reminds me of the old tarpit concept from the 2000s, but it won't work; it will just consume resources for whoever runs it, with zero benefit downstream. If you want to do something about the crawlers, fix your serving: don't do work on GETs, serve as much cached content as you can, filter them, or even use Anubis or the like. Those things actually matter.
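The serving advice above (no work on GETs, serve cached content, filter crawlers) can be sketched roughly as follows. This is a minimal illustration, assuming a single-process server where a page is produced by a plain render function; the `CachedResponder` class, the TTL value, and the User-Agent blocklist are hypothetical and not taken from Anubis, Iocaine, or any other project.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical blocklist of crawler User-Agent substrings (illustrative only;
# UA strings are trivially spoofed, so real filtering needs more than this).
BLOCKED_UA_SUBSTRINGS = ("GPTBot", "CCBot", "Bytespider")


class CachedResponder:
    """Answer GETs from a TTL cache so repeated hits cost no rendering work."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render          # the expensive page builder
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.render_calls = 0          # counts cache misses, for inspection

    def handle_get(self, path: str, user_agent: str) -> Tuple[int, str]:
        # Filter first: refuse blocklisted crawlers before doing any work.
        ua = user_agent.lower()
        if any(s.lower() in ua for s in BLOCKED_UA_SUBSTRINGS):
            return 403, "Forbidden"
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and entry[0] > now:
            return 200, entry[1]       # cache hit: no rendering at all
        body = self._render(path)      # cache miss: render once, then reuse
        self.render_calls += 1
        self._cache[path] = (now + self._ttl, body)
        return 200, body
```

Usage: with `r = CachedResponder(lambda p: f"page {p}", ttl_seconds=10)`, a thousand GETs for `/a` within ten seconds trigger a single render. The design choice is that the cheap checks (UA filter, cache lookup) run before anything expensive, which is the point of "don't do work on GETs".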

Gityap – Ship vs. Talk Intelligence

https://gityapper-web.vercel.app/
1•dawitworku•1m ago•1 comment

Nebius to buy AI agent search company Tavily for 275M

https://nebius.com/newsroom/nebius-announces-agreement-to-acquire-tavily-to-add-agentic-search-to...
1•ashvardanian•1m ago•1 comment

OpenRouter: Free Models Router

https://openrouter.ai/openrouter/free
1•jhack•3m ago•0 comments

AI DiagScan – AI-Powered OBD2 Automotive Diagnostic Tool

https://pythoncyber.go.ro
1•diagscan•4m ago•1 comment

1940s Irish sci-fi novel features early mecha and gravity assists

https://github.com/cavedave/Manannan
1•donohoe•4m ago•0 comments

I cannot curl https://example.com (on some distros)

https://blog.outv.im/2026/i-cannot-curl-example-com/
1•outloudvi•5m ago•0 comments

Speed Can Reindustrialize America

https://austinvernon.substack.com/p/speed-can-reindustrialize-america
1•walterbell•6m ago•0 comments

Resist and Unsubscribe

https://www.resistandunsubscribe.com
1•softwaredoug•6m ago•0 comments

dc

https://en.wikipedia.org/wiki/Dc_(computer_program)
3•tosh•8m ago•0 comments

Show HN: Clawlet – AI agent with built-in semantic memory, one binary

https://github.com/mosaxiv/clawlet
1•mosaxiv•11m ago•0 comments

Automated Chemical Profiling of Wine by Solution NMR Spectroscopy

https://pubs.acs.org/doi/10.1021/acs.jchemed.5c00652
1•bookofjoe•11m ago•0 comments

Suggest HN: How to kill AI spam submissions

1•andsoitis•11m ago•0 comments

Inner-Platform Effect

https://en.wikipedia.org/wiki/Inner-platform_effect
2•tosh•11m ago•0 comments

Show HN: Typemux-cc – .venv-aware Python LSP proxy for Claude Code (no restarts)

https://github.com/K-dash/typemux-cc
1•K-dash•13m ago•0 comments

Doom Emacs package: ready-to-use configuration for Buf toolchain

1•Piprim•16m ago•0 comments

We hid backdoors in binaries – Opus 4.6 found 49% of them

https://quesma.com/blog/introducing-binaryaudit/
2•stared•17m ago•1 comment

Supercazzola – Generate spam for web scrapers

https://dacav.org/projects/supercazzola/
1•birdculture•18m ago•0 comments

The Epstein files, annotated by the crowd

https://epstein-studio.com
1•salkahfi•18m ago•0 comments

"Tech debt is gone. Not that we solved it, but that AI made it irrelevant."

https://www.axios.com/2026/02/15/ai-coding-tech-product-development
1•Balgair•19m ago•0 comments

Show HN: Dojocho – Shadcn for Coding Katas

https://dojocho.ai
1•proxylittle•21m ago•0 comments

Hans Bjordahl on Digital Transformation, and Culture-Led Design [video]

https://www.youtube.com/watch?v=elwjr_XFiCw
1•mooreds•21m ago•0 comments

Obama says aliens are 'real, but I haven't seen them' in out-there new interview

https://nypost.com/2026/02/14/us-news/obama-says-aliens-are-real-but-i-havent-seen-them-in-out-th...
2•SirLJ•21m ago•0 comments

Clawdrey Hepburn – an AI agent researching identity infrastructure

https://twitter.com/clawdreyhepburn/status/2022771820659622022
1•mooreds•22m ago•0 comments

I Vibe Coded the Epstein Files Podcast with Claude and Hit 100K Downloads

https://levychain.substack.com/p/i-vibe-coded-the-epstein-files-podcast
1•martialg•22m ago•0 comments

Colorado Deep Tech Summit

https://codeeptech.com
1•mooreds•23m ago•0 comments

AI is slowly munching away my passion

https://whynot.fail/human/ai-is-slowly-munching-away-my-passion/
3•ttouch•25m ago•0 comments

The Dark Side of the Enlightenment

https://www.newstatesman.com/culture/books/2026/02/the-dark-side-of-the-enlightenment
7•thinkingemote•31m ago•0 comments

TSMC's US investment plans at heart of $250B puzzle for chip sector

https://www.ft.com/content/b715b003-1d10-46d4-a02d-1c5969d0dbf8
1•andsoitis•31m ago•0 comments

Show HN: Ogimg.xyz – Generate OG images via API, no headless browser

https://ogimg.xyz
1•victorlgch•34m ago•0 comments

Show HN: Lineark – Linear CLI and Rust SDK for Humans and LLMs

https://github.com/flipbit03/lineark
3•fb03•35m ago•0 comments