Show HN: Stop AI scrapers from hammering your self-hosted blog

https://github.com/vivienhenz24/fuzzy-canary
25•misterchocolat•10h ago
Alright so if you run a self-hosted blog, you've probably noticed AI companies scraping it for training data. And not just a little (RIP to your server bill).

There isn't much you can do about it without Cloudflare. These companies ignore robots.txt, and you're competing with teams with more resources than you. It's you vs the MJs of programming; you're not going to win.

But there is a solution. Now I'm not going to say it's a great solution...but a solution is a solution. If your website contains content that trips their scrapers' safeguards, it will get dropped from their data pipelines.

So here's what fuzzycanary does: it injects hundreds of invisible links to porn websites in your HTML. The links are hidden from users but present in the DOM so that scrapers can ingest them and say "nope we won't scrape there again in the future".

The problem with that approach is that it will absolutely nuke your website's SEO. So fuzzycanary also checks user agents and won't show the links to legitimate search engines, so Google and Bing won't see them.
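To make the two steps above concrete, here is a rough sketch of the idea in TypeScript. This is not the actual @fuzzycanary/core API: the function names (`isSearchEngine`, `buildCanaryLinks`, `injectCanary`), the bot list, and the placeholder URLs are all assumptions made up for illustration. It assumes you are rendering HTML on the server and have access to the request's User-Agent header.

```typescript
// Hypothetical sketch of the approach; NOT the real @fuzzycanary/core API.

// Assumed, incomplete list of substrings found in search-engine crawler user agents.
const SEARCH_ENGINE_BOTS: readonly string[] = ["googlebot", "bingbot", "duckduckbot"];

// Returns true when the User-Agent header looks like a search-engine crawler.
function isSearchEngine(userAgent: string | undefined): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return SEARCH_ENGINE_BOTS.some((bot) => ua.includes(bot));
}

// Builds a block of links that stays in the DOM but is hidden from human visitors.
function buildCanaryLinks(urls: readonly string[]): string {
  const anchors = urls
    .map((url) => `<a href="${encodeURI(url)}" tabindex="-1" rel="nofollow">link</a>`)
    .join("");
  return `<div style="display:none" aria-hidden="true">${anchors}</div>`;
}

// Inserts the hidden block just before </body>, unless the visitor is a search engine.
function injectCanary(
  html: string,
  userAgent: string | undefined,
  urls: readonly string[],
): string {
  if (isSearchEngine(userAgent) || urls.length === 0) return html;
  const block = buildCanaryLinks(urls);
  const idx = html.lastIndexOf("</body>");
  // Fall back to appending when the page has no closing body tag.
  return idx === -1 ? html + block : html.slice(0, idx) + block + html.slice(idx);
}

// Usage with placeholder URLs; a real deployment would supply its own list.
const page = "<html><body><p>Hello</p></body></html>";
const served = injectCanary(page, "Mozilla/5.0 (compatible; SomeScraper/1.0)", [
  "https://example.com/placeholder-1",
  "https://example.com/placeholder-2",
]);
console.log(served);
```

Keep in mind that user agents are trivially spoofed, so a check like this only filters out crawlers that identify themselves honestly.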

One caveat: if you're using a static site generator, it will bake the links into your HTML for everyone, including Googlebot. Does anyone have a workaround for this that doesn't involve using a proxy?

Please try it out! Setup is one component or one import.

(And don't tell me it's a terrible idea because I already know it is)

package: https://www.npmjs.com/package/@fuzzycanary/core
gh: https://github.com/vivienhenz24/fuzzy-canary

Comments

cport1•6h ago
That's a pretty hilarious idea, but in all seriousness, you could use something like https://webdecoy.com/
misterchocolat•6h ago
Yes, but this is free, whereas https://webdecoy.com/ is at least $59 a month.

Show HN: Titan – JavaScript-first framework that compiles into a Rust server

https://www.npmjs.com/package/@ezetgalaxy/titan
22•soham_byte•5d ago•10 comments

Show HN: Learn Japanese contextually while browsing

https://lingoku.ai/learn-japanese
54•englishcat•5h ago•24 comments

Show HN: brig – a devcontainer CLI in Go

https://github.com/nlsantos/brig
2•nsantos•32m ago•0 comments

Show HN: TheAuditor v2.0 – A “Flight Computer” for AI Coding Agents

https://github.com/TheAuditorTool/Auditor
24•ThailandJohn•16h ago•7 comments

Show HN: Sqlit – A lazygit-style TUI for SQL databases

https://github.com/Maxteabag/sqlit
132•MaxTeabag•1d ago•19 comments

Show HN: Interactive Common Lisp: An Enhanced REPL

https://github.com/atgreen/icl
87•atgreen•3d ago•5 comments

Show HN: Obsidenc – a Rust-based paranoid-grade encryption utility

https://github.com/markrai/obsidenc
2•markrai•3h ago•0 comments

Show HN: My Tizen multiplayer drawing game flopped, but then hit 100M drawings

https://www.drawize.com/
21•lombarovic•14h ago•2 comments

Show HN: I built the fastest RSS reader in Zig

https://github.com/superstarryeyes/hys
19•superstarryeyes•10h ago•2 comments

Show HN: Solving the ~95% legislative coverage gap using LLM's

https://lustra.news/
35•fokdelafons•16h ago•21 comments

Show HN: Deterministic PCIe Diagnostics for GPUs on Linux

https://github.com/parallelArchitect/gpu-pcie-diagnostic
15•gpu_systems•9h ago•4 comments

Show HN: Zenflow – orchestrate coding agents without "you're right" loops

https://zencoder.ai/zenflow
27•andrewsthoughts•14h ago•11 comments

Show HN: A real-time 4D fractal explorer in the browser using WebGPU

https://bryanjj.github.io/nebula/
24•bryan0•5d ago•8 comments

Show HN: A pager

https://www.udp7777.com/
100•keepamovin•2d ago•42 comments

Show HN: Python packages for FastAPI apps – auth, logging, config, LLM, more

https://github.com/Netrun-Systems/netrun-oss
4•DanielGarza•6h ago•1 comment

Show HN: Skouriasmeno Papaki – S3 transfer tool, up to 12x faster than AWS-CLI

https://github.com/NetViper-Labs/skouriasmeno-papaki
4•NetViper•7h ago•0 comments

Show HN: F. Incantatem – CLI, Decorator & notebook ext. for traceback analysis

https://github.com/aguilar-ai/fincantatem
2•Paralus•7h ago•0 comments

Show HN: AI Trolley Problem Arena

https://www.aitrolleyproblem.com/
8•justintorre75•7h ago•1 comment

Show HN: Picknplace.js, an Alternative to Drag and Drop

https://jgthms.com/picknplace.js/
27•bbx•14h ago•13 comments

Show HN: A24z – AI Engineering Ops Platform

https://www.a24z.ai/
8•brandonin•9h ago•4 comments

Show HN: Search the lyrics of 500 HÖR Berlin techno sets

https://hor.greg.technology/
16•gregsadetsky•5d ago•11 comments

Show HN: Ducktape – a tiny HTTP/2 wrapper around DuckDB's Appender API

https://github.com/artie-labs/ducktape
9•williamhaw•14h ago•0 comments

Show HN: A community-curated list of BYOC (Bring Your Own Cloud) vendors

https://github.com/nuonco/awesome-byoc
9•realsharkymark•8h ago•0 comments

Show HN: Dev Tools – 24 browser-based utilities with no signup or tracking

https://dev-tools.online
3•ghdj•11h ago•0 comments

Show HN: Cordon – Reduce large log files to anomalous sections

https://github.com/calebevans/cordon
17•calebevans•1d ago•0 comments

Show HN: AI Generated SVG's

https://vectorart.ai
2•tm11zz•11h ago•0 comments

Show HN: Pothole Detection System (YOLOv8 – FastAPI – Docker – React Native)

https://github.com/PeterHdd/pothole-detection-yolo
2•peterhddcoding•14h ago•0 comments

Show HN: Kafkatop 2.0 – top for Kafka – rewritten in Go with partition analytics

https://github.com/sivann/kafkatop
2•sivann•14h ago•0 comments

Show HN: DuckDB Table Visualizer –> Iceberg

https://duckdb.org/visualizer/
2•carlopi•14h ago•0 comments