Why are anime catgirls blocking my access to the Linux kernel?

https://lock.cmpxchg8b.com/anubis.html

33•taviso•2h ago

Comments

PaulHoule•2h ago

I think a lot of it is performative and a demonstration that somebody is a member of a tribe, particularly the part about the kemonomimi [1] (e.g. people who are kinda like furries but have better test in art)

[1] https://safebooru.donmai.us/posts?tags=animal_ears

dathinab•1h ago

you are overthinking

it's a simple as having a nice picture there make this whole thing feel nicer, and give it a bit of personality

so you put in some picture/art you like

that's it

similar any site sing it can change that picture, but there isn't any fundamental problem with the picture, so most can't care to change it

lxgr•1h ago

> This isn’t perfect of course, we can debate the accessibility tradeoffs and weaknesses, but conceptually the idea makes some sense.

It was arguably never a great idea to begin with, and stopped making sense entirely with the advent of generative AI.

yuumei•1h ago

> The CAPTCHA forces vistors to solve a problem designed to be very difficult for computers but trivial for humans. > Anubis – confusingly – inverts this idea.

Not really, AI easily automates traditional captchas now. At least this one does not need extensions to bypass.

Philpax•1h ago

The argument isn't that it's difficult for them to circumvent - it's not - but that it adds enough friction to force them to rethink how they're scraping at scale and/or self-throttle.

I personally don't care about the act of scraping itself, but the volume of scraping traffic has forced administrators' hands here. I suspect we'd be seeing far fewer deployments if the scrapers behaved themselves to begin with.

davidclark•1h ago

The OP author shows that the cost to scrape an Anubis site is essentially zero since it is a fairly simple PoW algorithm that the scraper can easily solve. It adds basically no compute time or cost for a crawler run out of a data center. How does that force rethinking?

hooverd•1h ago

The problem with crawlers if that they're functionally indistinguishable from your average malware botnet in behavior. If you saw a bunch of traffic from residential IPs using the same token that's a big tell.

Philpax•1h ago

The cookie will be invalidated if shared between IPs, and it's my understanding that most Anubis deployments are paired with per-IP rate limits, which should reduce the amount of overall volume by limiting how many independent requests can be made at any given time.

That being said, I agree with you that there are ways around this for a dedicated adversary, and that it's unlikely to be a long-term solution as-is. My hope is that the act of having to circumvent Anubis at scale will prompt some introspection (do you really need to be rescraping every website constantly?), but that's hopeful thinking.

anotherhue•1h ago

Surely the difficulty factor scales with the system load?

lousken•1h ago

aren't you happy? at least you see catgirl

jimmaswell•1h ago

What exactly is so bad about AI crawlers compared to Google or Bing? Is there more volume or is it just "I don't like AI"?

Philpax•1h ago

Volume, primarily - the scrapers are running full-tilt, which many dynamic websites aren't designed to handle: https://pod.geraspora.de/posts/17342163

jayrwren•1h ago

literally the top link when I search for his exact text "why are anime catgirls blocking my access to the Linux kernel?" https://lock.cmpxchg8b.com/anubis.html Maybe travis needs more google-fu. maybe that includes using duckduckgo?

ksymph•1h ago

This is neither here nor there but the character isn't a cat. It's in the name, Anubis, who is an Egyptian deity typically depicted as a jackal or generic canine, and the gatekeeper of the afterlife who weighs the souls of the dead (hence the tagline). So more of a dog-girl, or jackal-girl if you want to be technical.

rnhmjoj•1h ago

I don't understand, why do people resort to this tool instead of simply blocking by UA string or IP address. Are there so many people running these AI crawlers?

I blackholed some IP blocks of OpenAI, Mistral and another handful of companies and 100% of this crap traffic to my webserver disappeared.

hooverd•1h ago

less savory crawlers use residential proxies and are indistinguishable from malware traffic

WesolyKubeczek•1h ago

You should read more. AI companies use residential proxies and mask their user agents with legitimate browser ones, so good luck blocking that.

rnhmjoj•58m ago

Which companies are we talking about here? In my case the traffic was similar to what was reported here[1]: these are crawlers from Google, OpenAI, Amazon, etc. they are really idiotic in behaviour, but at least report themselves correctly.

[1]: https://pod.geraspora.de/posts/17342163

mnmalst•1h ago

Because that solution simply does not work for all. People tried and the crawlers started using proxies with residential IPs.

WesolyKubeczek•1h ago

I disagree with the post author in their premise that things like Anubis are easy to bypass if you craft your bot well enough and throw the compute at it.

Thing is, the actual lived experience of webmasters tells that the bots that scrape the internets for LLMs are nothing like crafted software. They are more like your neighborhood shit-for-brain meth junkies competing with one another who makes more robberies in a day, no matter the profit.

Those bots are extremely stupid. They are worse than script kiddies’ exploit searching software. They keep banging the pages without regard to how often, if ever, they change. If they were 1/10th like many scraping companies’ software, they wouldn’t be a problem in the first place.

Since these bots are so dumb, anything that is going to slow them down or stop them in their tracks is a good thing. Short of drone strikes on data centers or accidents involving owners of those companies that provide networks of botware and residential proxies for LLM companies, it seems fairly effective, doesn’t it?

fluoridation•1h ago

Hmm... What if instead of using plain SHA-256 it was a dynamically tweaked hash function that forced the client to run it in JS?

VMG•1h ago

crawlers can run JS, and also invest into running the Proof-Of-JS better than you can

fluoridation•1h ago

If we're presupposing an adversary with infinite money then there's no solution. One may as well just take the site offline. The point is to spend effort in such a way that the adversary has to spend much more effort, hopefully so much it's impractical.

tjhorner•27m ago

Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.) It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project.

ksymph•1h ago

Reading the original release post for Anubis [0], it seems like it operates mainly on the assumption that AI scrapers have limited support for JS, particularly modern features. At its core it's security through obscurity; I suspect that as usage of Anubis grows, more scrapers will deliberately implement the features needed to bypass it.

That doesn't necessarily mean it's useless, but it also isn't really meant to block scrapers in the way TFA expects it to.

[0] https://xeiaso.net/blog/2025/anubis/

jhanschoo•1h ago

Your link explicitly says:

> It's a reverse proxy that requires browsers and bots to solve a proof-of-work challenge before they can access your site, just like Hashcash.

It's meant to rate-limit accesses by requiring client-side compute light enough for legitimate human users and responsible crawlers in order to access but taxing enough to cost indiscriminate crawlers that request host resources excessively.

It indeed mentions that lighter crawlers do not implement the right functionality in order to execute the JS, but that's not the main reason why it is thought to be sensible. It's a challenge saying that you need to want the content bad enough to spend the amount of compute an individual typically has on hand in order to get me to do the work to serve you.

ksymph•10m ago

Here's a more relevant quote from the link:

> Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. In case a scraper is dedicated enough to solve the challenge, Anubis lets them through because at that point they are functionally a browser.

As the article notes, the work required is negligible, and as the linked post notes, that's by design. Wasting scraper compute is part of the picture to be sure, but not really its primary utility.

iefbr14•1h ago

I wouldn't be surprised if just delaying the server response by some 3 seconds will have the same effect on those scrapers as Anubis claims.

xena•9m ago

This same author also ignored the security policy and dropped an Anubis double-spend attack on the issue tracker. Their email got eaten by my spam filter so I didn't realize that I got emailed at all.

Fun times.

How harmful is blue light for sleep?

US Health Secretary Ends Decades of Research into Environmental Causes of Autism

CSS line-height unit 1h

American Millennials Are Dying at an Alarming Rate

The Four Stages of Objective-Smalltalk

L2AW Theorem

The Pragmatic Engineer 2025 Survey: What's in your tech stack? Part 2

Dagger and opencode and agnostic agents and SSH app = most portable dev kit

Crash Cows

What went wrong with Social Media?

Openwetware.org shut down due to funding

James Webb Space Telescope runs an extended version of JavaScript [pdf]

Travel eSIMs route traffic over Chinese and undisclosed networks: study

Cool or Hard

For decades, sleep has been passive

Notes on Image Generation with GPT-4.1

The reason the West is warmongering against China

Integrating Jenkins with AEM Deployments

Disk Sampling on the Sphere

Just Write

A proposal for inline LLM instructions in HTML based on llms.txt

Hx-optimistic: Declarative optimistic updates for Htmx

Show HN: Yellhorn – MCP server to help coding agents 1-shot long tasks

REITs Buying Tranches of Single-Family Homes (2024)

ComputerRL: Scaling Reinforcement Learning for Computer Use Agents

Processing 24T tokens for LLM training with 0 crashes (what made it possible)

Digg.com Is Back

Show HN: A new JavaScript runtime for writing high-performance web apps in Rust

Open Data Contract Standard

Dmux: Claude Code Multiplexer (fleet management)

Why are anime catgirls blocking my access to the Linux kernel?

Comments

How harmful is blue light for sleep?

US Health Secretary Ends Decades of Research into Environmental Causes of Autism

CSS line-height unit 1h

American Millennials Are Dying at an Alarming Rate

The Four Stages of Objective-Smalltalk

L2AW Theorem

The Pragmatic Engineer 2025 Survey: What's in your tech stack? Part 2

Dagger and opencode and agnostic agents and SSH app = most portable dev kit

Crash Cows

What went wrong with Social Media?

Openwetware.org shut down due to funding

James Webb Space Telescope runs an extended version of JavaScript [pdf]

Travel eSIMs route traffic over Chinese and undisclosed networks: study

Cool or Hard

For decades, sleep has been passive

Notes on Image Generation with GPT-4.1

The reason the West is warmongering against China

Integrating Jenkins with AEM Deployments

Disk Sampling on the Sphere

Just Write

A proposal for inline LLM instructions in HTML based on llms.txt

Hx-optimistic: Declarative optimistic updates for Htmx

Show HN: Yellhorn – MCP server to help coding agents 1-shot long tasks

REITs Buying Tranches of Single-Family Homes (2024)

ComputerRL: Scaling Reinforcement Learning for Computer Use Agents

Processing 24T tokens for LLM training with 0 crashes (what made it possible)

Digg.com Is Back

Show HN: A new JavaScript runtime for writing high-performance web apps in Rust

Open Data Contract Standard

Dmux: Claude Code Multiplexer (fleet management)