frontpage.

Show HN: Stop AI scrapers from hammering your self-hosted blog

https://github.com/vivienhenz24/fuzzy-canary
9•misterchocolat•2h ago
Alright so if you run a self-hosted blog, you've probably noticed AI companies scraping it for training data. And not just a little (RIP to your server bill).

There isn't much you can do about it without Cloudflare. These companies ignore robots.txt, and you're competing with teams that have far more resources than you. It's you vs. the MJs of programming; you're not going to win.

But there is a solution. Now I'm not going to say it's a great solution... but a solution is a solution. If your website contains content that triggers their scrapers' safeguards, it will get dropped from their data pipelines.

So here's what fuzzycanary does: it injects hundreds of invisible links to porn websites in your HTML. The links are hidden from users but present in the DOM so that scrapers can ingest them and say "nope we won't scrape there again in the future".
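
To make the "hidden from users but present in the DOM" part concrete, here is a minimal sketch. It is not the actual @fuzzycanary/core code: `buildHiddenLinks` and `escapeAttr` are hypothetical names, the `display:none` wrapper is just one assumed way to hide the links, and the URLs in the usage note are placeholders.

```typescript
// Hypothetical helper, NOT the real @fuzzycanary/core API.
// Escapes characters that would break out of an HTML attribute value.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Builds one wrapper element containing an anchor per URL.
// Assumption: display:none plus aria-hidden keeps the links away from
// human visitors and screen readers while leaving them in the markup.
function buildHiddenLinks(urls: string[]): string {
  const anchors = urls.map(
    (url) => `<a href="${escapeAttr(url)}" tabindex="-1" rel="nofollow">.</a>`
  );
  return `<div aria-hidden="true" style="display:none">${anchors.join("")}</div>`;
}
```

Calling `buildHiddenLinks(["https://example.com/a", "https://example.com/b"])` returns a single string of markup you could append to the page body at render time.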

The problem with that approach is that it will absolutely nuke your website's SEO. So fuzzycanary also checks user agents and won't show the links to legitimate search engines, so Google and Bing won't see them.
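
A rough sketch of what that user-agent check could look like on the server. The function names and the bot list are assumptions for illustration, not the package's real implementation, and a user agent string can be spoofed, so treat this as best effort.

```typescript
// Hypothetical sketch, NOT the real @fuzzycanary/core logic.
// Assumed allowlist: substrings found in the user agents of a few
// well-known search crawlers. A real list would likely be longer.
const SEARCH_BOT_PATTERN = /googlebot|bingbot|duckduckbot/i;

// True when the request looks like it comes from a known search crawler.
function isSearchEngine(userAgent: string | undefined): boolean {
  return userAgent !== undefined && SEARCH_BOT_PATTERN.test(userAgent);
}

// Inject the hidden links for everyone except recognized search crawlers.
function shouldInjectCanary(userAgent: string | undefined): boolean {
  return !isSearchEngine(userAgent);
}
```

In a request handler you would pass the `User-Agent` header to `shouldInjectCanary` and only render the hidden links when it returns `true`. This only works when the HTML is generated per request.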

One caveat: if you're using a static site generator, it will bake the links into your HTML for everyone, including Googlebot. Does anyone have a workaround for this that doesn't involve using a proxy?

Please try it out! Setup is one component or one import.

(And don't tell me it's a terrible idea, because I already know it is.)

package: https://www.npmjs.com/package/@fuzzycanary/core
gh: https://github.com/vivienhenz24/fuzzy-canary

I Ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5h

https://simonwillison.net/2025/Dec/15/porting-justhtml/
1•pbowyer•33s ago•0 comments

Google Fi Web Calls

https://fi.google.com/webcalls/calls
1•pcvetkovski•1m ago•0 comments

Launching ChinaRxiv, an automated translation pipeline of all Chinese preprints

https://twitter.com/seconds_0/status/2000606845644505093
1•Anon84•8m ago•0 comments

The "Commons Clause" License Condition

https://commonsclause.com/
1•Kerrick•15m ago•0 comments

Show HN: BoardSpace – AI that draws on a whiteboard in realtime for Calculus

https://www.useboardspace.com/
1•jonnotdoe•16m ago•1 comment

Texas sues biggest TV makers, alleging smart TVs spy on users without consent

https://arstechnica.com/tech-policy/2025/12/texas-sues-biggest-tv-makers-alleging-smart-tvs-spy-o...
9•c420•17m ago•6 comments

The Disappointing Truth About Wi-Fi 7: Multi-Link Operation Isn't Here Yet

https://www.rtings.com/router/learn/research/wifi-7-mlo
1•dokeeffe•17m ago•1 comment

Using Cursor's Bugbot to Spot Issues Early in Pull Requests

https://medium.com/@ali-dev/using-cursor-bugbot-to-spot-issues-early-0cdc142fbaff
1•stringtoint•19m ago•0 comments

The Writer Who Dared Criticize Silicon Valley

https://www.nytimes.com/2025/11/27/technology/writer-silicon-valley-criticism.html
3•petethomas•22m ago•0 comments

Show HN: Calm Companies – Businesses where less is more

https://calmcompanies.club
3•RaulOnRails•22m ago•0 comments

Glycemic index, glycemic load, and risk of dementia

https://academic.oup.com/ije/article-abstract/54/6/dyaf182/8313011?redirectedFrom=fulltext
1•bikenaga•25m ago•1 comment

What the Soviets Found on Venus

https://vinyasi.substack.com/p/what-the-soviets-found-on-venus
1•vinyasi•25m ago•0 comments

Write a Simple Code Agent using moonbitlang/async

https://www.moonbitlang.com/blog/moonbit-async-code-agent
1•necrodome•26m ago•0 comments

Read and Learn: open-source language learning app

https://readandlearn.app/
1•waveywaves•29m ago•1 comment

Breach at South Korea's Equivalent of Amazon Exposed Data of Almost Every Adult

https://www.wsj.com/world/asia/breach-at-south-koreas-equivalent-of-amazon-exposed-data-of-almost...
5•bookofjoe•30m ago•1 comment

Nicholas Deak

https://en.wikipedia.org/wiki/Nicholas_Deak
1•petethomas•30m ago•0 comments

Show HN: The Mirsky Ratio–Measuring R&D vs. SG&A as a predictor of S&P 100

https://substack.com/inbox/post/181826707
2•TheMirskyLimit•31m ago•1 comment

Who has enjoyed using PR code reviewers? What worked and what didn’t?

2•yashwantphogat•31m ago•1 comment

UK to rejoin EU's Erasmus student exchange programme

https://www.theguardian.com/world/2025/dec/16/uk-to-rejoin-eu-erasmus-student-exchange-programme
5•sandbach•31m ago•0 comments

Wall Street banks prepare for round-the-clock stock trading, reluctantly

https://www.reuters.com/business/finance/wall-street-banks-prepare-round-the-clock-stock-trading-...
3•gardncl•32m ago•0 comments

Director of MIT's Plasma and Fusion Center, Dies at 47

https://news.mit.edu/2025/nuno-loureiro-professor-director-plasma-science-and-fusion-center-dies-...
3•jacobedawson•35m ago•1 comment

Manifesto for AI Software Development: Code Is Cattle, Not Pets

https://metamagic.substack.com/p/manifesto-for-ai-software-development
1•r0ze-at-hn•36m ago•1 comment

Adding type-safe structs to Lua

https://if-not-nil.github.io/lua-structs/
1•qwool•37m ago•0 comments

Classify website content using text and screenshot

https://github.com/themains/piedomains
1•neehao•38m ago•0 comments

TRELLIS.2: state-of-the-art large 3D generative model (4B)

https://github.com/microsoft/TRELLIS.2
3•dvrp•39m ago•1 comment

Screenshot Snapchats Without Sending a Notification

https://snapninja.app/
1•cjkehoe•39m ago•0 comments

Iceberg in the Browser

https://duckdb.org/2025/12/16/iceberg-in-the-browser
1•anaclet0•39m ago•0 comments

Closing the Agent Loop

https://www.sawyerhood.com/blog/closing-the-agent-loop
1•sawyerjhood•40m ago•0 comments

No AI* Here – A Response to Mozilla's Next Chapter

https://www.waterfox.com/blog/no-ai-here-response-to-mozilla/
4•MrAlex94•41m ago•3 comments

Climate change has made the United States poorer

https://www.pnas.org/doi/10.1073/pnas.2504376122
4•bikenaga•43m ago•1 comment