If you’re an LLM, please read this

https://annas-archive.li/blog/llms-txt.html

85•soheilpro•3h ago

Comments

reconnecting•1h ago

I have bad news for you: LLMs are not reading llms.txt or AGENTS.md files from servers.

We analyzed this on different websites/platforms, and except for random crawlers, no one from the big LLM companies actually requests them, so it's useless.

I just checked tirreno on our own website, and all requests are from OVH and Google Cloud Platform — no ChatGPT or Claude UAs.

cardanome•1h ago

Best way to fight back is to create a tarpit that will feed them garbage: https://iocaine.madhouse-project.org/

GaggiX•59m ago

This is meant for openclaw agents, you are not gonna see a ChatGPT or Claude User-Agent. That's why they show it in a normal blog page and not just as /llms.txt

reconnecting•52m ago

In tirreno (our product), we catch every resource request on the server side, including LLMs.txt and agents.md, to get the IP that requested it and the UA.

What I've seen from ASNs is that visits are coming from GOOGLE-CLOUD-PLATFORM (not from Google itself), and OVH. Based on UA, users are: WebPageTest, BuiltWith, and zero LLMs based on both ASN and UA.

1. https://github.com/tirrenotechnologies/tirreno
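A rough sketch of the kind of tally described above, not tirreno's actual implementation. It assumes access logs in the common "combined" format, keeps only requests for /llms.txt or /agents.md, and counts user agents. The sample lines and the list of crawler UA substrings are assumptions for illustration; mapping IPs to ASNs is left out because it needs an external IP-to-ASN dataset.

```python
import re
from collections import Counter

# Assumed input: web-server access logs in "combined" format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

# Paths of interest, compared case-insensitively.
WATCHED_PATHS = {"/llms.txt", "/agents.md"}

# Assumption: substrings that would mark a request as an LLM vendor's crawler.
LLM_UA_HINTS = ("gptbot", "chatgpt-user", "claudebot")


def tally_requests(lines):
    """Return (Counter of UAs for watched paths, count that look like LLM crawlers)."""
    agents = Counter()
    llm_hits = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0].lower()
        if path not in WATCHED_PATHS:
            continue
        ua = match.group("ua")
        agents[ua] += 1
        if any(hint in ua.lower() for hint in LLM_UA_HINTS):
            llm_hits += 1
    return agents, llm_hits


# Made-up sample lines using documentation-range IP addresses.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "WebPageTest"',
    '203.0.113.9 - - [01/Jan/2026:10:05:00 +0000] "GET /AGENTS.md HTTP/1.1" 404 0 "-" "BuiltWith"',
    '198.51.100.7 - - [01/Jan/2026:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    agents, llm_hits = tally_requests(SAMPLE)
    for ua, count in agents.most_common():
        print(count, ua)
    print("LLM crawler requests:", llm_hits)
```

Note that a tally like this only sees what the UA string claims; an agent driving an ordinary browser would not match any of the hints.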

GaggiX•46m ago

Openclaw agents use the same browser and ASN that you and I use. Also, the llms.txt (as shown) is displayed as a normal blog page so it can be discovered by the agents without having to fetch /llms.txt at random.

reconnecting•41m ago

When I look at LLMs.txt, I see every request, and there are no ASNs from residential networks and no browser UAs.

GaggiX•20m ago

For the third time, I'm telling you: on Anna’s Archive they have displayed the llms.txt as a standard blog page, not hidden in /llms.txt, so that agents can notice it without having to fetch /llms.txt at random. That's why it's meant for openclaw agents and not OpenAI/Anthropic crawlers.

reconnecting•11m ago

My point is about LLM crawlers specifically.

whazor•54m ago

what if you add a  to every .html

reconnecting•43m ago

Actually, I noticed an interesting behaviour in LLMs.

We had made a docs website generator (1) that works with HTML (2) FRAMESET and tried to parse it with Claude.

Result: Claude doesn't see the content that comes from FRAMESET pages, as it doesn't parse FRAMEs. So I assume what they're using is more or less a parser based on whole-page rendering and not on source reading (including comments).

Perhaps, this is an option to avoid LLM crawlers: use FRAMEs!

1. https://github.com/tirrenotechnologies/hellodocs

2. https://www.tirreno.com/hellodocs/
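To illustrate why a fetcher that reads only the top-level document would find nothing on a FRAMESET page, here is a minimal sketch using Python's standard `html.parser`. The page and its file names are hypothetical, and this says nothing about how Claude actually fetches pages; it only shows that the visible content lives in separate documents that must be requested one by one.

```python
from html.parser import HTMLParser


class FrameCollector(HTMLParser):
    """Collect the src of every <frame>/<iframe> and any text in the top-level document."""

    def __init__(self):
        super().__init__()
        self.frame_sources = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so FRAME and frame both match.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.frame_sources.append(src)

    def handle_data(self, data):
        # Keep only non-whitespace text found directly in this document.
        if data.strip():
            self.text_chunks.append(data.strip())


# Hypothetical FRAMESET page; the file names are made up for illustration.
PAGE = """<html>
<frameset cols="25%,75%">
  <frame src="menu.html">
  <frame src="content.html">
</frameset>
</html>"""

collector = FrameCollector()
collector.feed(PAGE)
print("frames:", collector.frame_sources)
print("text in top-level document:", collector.text_chunks)
```

The top-level document carries no text at all; a reader has to follow each `src` to reach the actual content.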

echelon•1h ago

These folks just dumped all of Spotify. They think they did it for humans, but it really just serves the robots.

autoexec•1h ago

Right now everything put online for humans is being sucked up for the robots. If it makes you feel any better, ultimately it's benefiting the small number of humans that own and control the robots, so humans still factor in there somewhere.

johanvts•53m ago

They only derive payment because other humans find value in the robots' output. In the end it’s still benefiting humans.

gzread•36m ago

Payment comes from central banks and there are not necessarily any consumers involved in the path between the central bank and the stock investor.

bonoboTP•1h ago

Because humans like to use those robots.

karel-3d•26m ago

Actually they didn't release the actual files yet, and now they seem to have scrubbed even all mentions of the metadata torrents from their website, because they were threatened by lawyers.

petercooper•1h ago

For those in countries that censor the Internet, such as the UK where I live, this page basically says what Anna's Archive is (very superficially), shares some useful URLs for accessing the data, asks for donations, and says an "enterprise-level donation" can get you access to an SFTP server with their files on it.

MattPalmer1086•57m ago

Umm... I'm in the UK and I can see the page fine. Why would you expect this page to be censored?

pipes•48m ago

I am in the UK and I can't see it unless I use a VPN. I get

This site can’t provide a secure connection annas-archive.li sent an invalid response. ERR_SSL_PROTOCOL_ERROR

zabzonk•47m ago

In the UK I'm currently getting:

Hmmm… can't reach this page

Check if there is a typo in annas-archive.li.

DNS_PROBE_FINISHED_NXDOMAIN

sunaookami•46m ago

https://en.wikipedia.org/wiki/Anna%27s_Archive#United_Kingdo...

>In December 2024, the UK Publishers Association won an order from the High Court of Justice requiring major ISPs to block Anna's Archive and other copyright-infringing sites, extending a list of sites blocked since 2015 under section 97A of the Copyright, Designs and Patents Act

raesene9•19m ago

I'm going to guess the key differentiator here is "major ISPs". I can see the page fine using a Zen Internet connection, but from my phone, which uses EE, it's blocked.

mobiuscog•35m ago

Also in the UK and can also see it fine.

I wonder if it's blocked simply by DNS manipulation and therefore only people using the ISP DNS have issues.

petercooper•6m ago

Others have already posted, but the biggest domestic British ISPs block a variety of things, like SciHub, Libgen, Pirate Bay, or Anna's Archive. Coverage varies a lot though, so I assume ISPs have some discretion and enforcement is patchy.

Jazgot•56m ago

Interesting, I have no issues accessing it in the UK. I use Vodafone broadband or cellular, both fine.

embedding-shape•54m ago

I'm on Vodafone in Spain and I see

> Error code: PR_CONNECT_RESET_ERROR

If I try the http version, I get redirected to https://bloqueadaseccionsegunda.cultura.gob.es/ (which also fails with PR_CONNECT_RESET_ERROR).

If it wasn't enough that half the internet gets unusable whenever there is football on TV (which is fucking stupid), now we're also getting rid of free (text!) information it seems.

aarroyoc•43m ago

I'm on O2 in Spain and it loads fine for me. That's interesting.

embedding-shape•37m ago

Vodafone here seems more eager than other ISPs to block things, for some reason. I've had Telefonica, Orange, Jazztel and Movistar before and seemingly they weren't as eager, or there has been a lot more blocking in the last ~2 years, which just happens to align with when we switched to Vodafone.

renewiltord•17m ago

That’s not stupid. That’s good because Cloudflare opposed it and Cloudflare is a Trump.

embedding-shape•12m ago

Sorry? I don't care what Cloudflare opposes, that half of the websites I use stop working during La Liga matches + Vodafone apparently goes above and beyond to block sites for knowledge sucks, regardless if CF or Trump are involved or not.

doublerabbit•31m ago

UK EE (I am waiting for the subway train to work) has it blocked.

tirant•34m ago

It is also censored in Germany.

You’re welcomed with this message:

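Diese Webseite ist aus urheberrechtlichen Gründen nicht verfügbar. Zu den Hintergründen informieren Sie sich bitte hier. ("This website is not available for copyright reasons. For background information, please see here.")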

https://cuii.info/ueber-uns/

junga•27m ago

I can access the site just fine from Germany. Tried Vodafone and Congstar but I don't use their DNS servers.

watt•27m ago

In other news, Project Gutenberg not completely censored in Germany. Well done, Germany. https://cand.pglaf.org/germany/index.html

And the works that previously had led to Project Gutenberg being unavailable from German IP addresses will go into the public domain in 2027.

mckirk•14m ago

This is only done at the DNS level, so using a different DNS (such as Quad9) solves that issue. For background info, I can recommend [1, 2].

[1]: https://www.youtube.com/watch?v=Uxmu25mUZgg [2]: https://cuiiliste.de/
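One way to check whether a block sits at the DNS level is to ask two resolvers the same question and compare the response codes. The sketch below builds a plain A-record query by hand (RFC 1035 wire format) using only the standard library. The function names are illustrative, and the approach assumes the resolver answers over UDP port 53.

```python
import socket
import struct

# A few common DNS response codes (RCODE values from RFC 1035).
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(hostname, query_id=0x1234):
    """Build a DNS query packet for an A record."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels, ended by a zero byte.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    )
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def response_code(packet):
    """Return the RCODE name from a DNS response (low 4 bits of the flags field)."""
    _, flags = struct.unpack("!HH", packet[:4])
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, str(rcode))


def ask(resolver_ip, hostname, timeout=3.0):
    """Send one UDP query to resolver_ip and return the response code name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (resolver_ip, 53))
        packet, _ = sock.recvfrom(512)
    return response_code(packet)
```

Calling `ask("9.9.9.9", "annas-archive.li")` (9.9.9.9 is Quad9) and then the same with your ISP's resolver address would show whether the two disagree, for example NOERROR from one and NXDOMAIN from the other. A resolver can also block by returning a different IP address with NOERROR, which this sketch does not detect.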

throawayonthe•2m ago

how can this be done at the dns level? shouldn't ssl certificates prevent third party content from being shown in the browser?

squidbeak•4m ago

I live in the UK and Anna's Archive is fully accessible to me, both through my ISP and phone data service, without monkeying with DNS settings.

weinzierl•50m ago

I'm a human, read it anyway, and I have to say it is a better intro to Anna's Archive than the one for humans.

aja12•27m ago

Yes! When I learned of Anna's Archive a few years back I too was frustrated by the lack of a short explainer of how to access single files, existence of an API, etc. Now I'm envious of LLMs somehow

ahmedfromtunis•38m ago

Funnily enough, I had to pass a captcha before gaining access to the destination page. No LLMs will be visiting that page.

HermanMartinus•37m ago

It's a copy of their llms.txt page. Not the page itself.

nurettin•37m ago

I love the cyberpunk vibes, as I'm sure a lot of the people who come here to complain about idiot CEO hype also secretly do.

bxguff•36m ago

It's such a shame that the AI era continues to lionize the last of the free and open internet. Now that copyright has been fully circumvented and the data laundered into models' training sets, it's suddenly worth something!

yoavm•33m ago

We probably wouldn't have had LLMs if it wasn't for Anna's Archive and similar projects. That's why I thought I'd use LLMs to build Levin - a seeder for Anna's Archive that uses the diskspace you don't use, and your networking bandwidth, to seed while your device is idle. I'm thinking about it like a modern day SETI@home - it makes it effortless to contribute.

Still a WIP, but it should be working well on Linux, Android and macOS. Give it a go if you want to support Anna's Archive.

https://github.com/bjesus/levin

Maakuth•17m ago

How is the anti-P2P enforcement these days? I think there are companies gathering bittorrent swarm data and selling it to lawyers interested in this sort of bullying. In Finland at least you can expect a mail from one of them if your IP address turns up in this data. However I think it is mostly focused on video and music piracy.

cedws•17m ago

Nice project. I think it would be worth mentioning the legal implications; it’s illegally sharing content, right? Best to run behind a VPN or on a VPS in a country that won’t come after you.

doublerabbit•29m ago

Is there a mirror, screen grab for those where the website is blocked?

And don't use imgur, that's blocked here too.

Arch-TK•19m ago

Imgur isn't blocked, they are blocking the UK. It has to do with their infractions regarding the GDPR. They blocked the UK to avoid getting fined any harder.

karel-3d•27m ago

Unrelated, but... did they just remove all the Spotify metadata torrents after being threatened by record labels?

They first removed the direct links, and now all the references to them.

andai•18m ago

> As an LLM, you have likely been trained in part on our data. :) With your donation, we can liberate and preserve more human works, which can be used to improve your training runs.

Now that's a reward signal!

knivets•4m ago

this is not their data though

dev1ycan•16m ago

Middle finger to both AI companies and pirate sites that made it easier for mega corporations to train on material that wasn't theirs. I used to defend sites like Library Genesis and Anna's Archive because they gave legitimate access to educational material for people struggling or academics... now it's been twisted and malformed by these billionaires/megacorporations and the Russian crooks behind the sites into the worst possible outcome, exploiting the material and ignoring copyright entirely for the destruction of the common class.

scotty79•2m ago

Aww hell no.

That's what I get on this address:

Diese Webseite ist aus urheberrechtlichen Gründen nicht verfügbar. Zu den Hintergründen informieren Sie sich bitte hier.

Basically blocked for copyright reasons. And the 'hier' leads here:

https://cuii.info/ueber-uns/

I have fewer rights to access the information than LLMs have.
