I crawled 1,500 sites: 30% block AI bots, 0.2% use llms.txt

https://websiteaiscore.com/blog/case-study-1500-websites-ai-readability-audit
4•aggeeinn•1h ago

Comments

aggeeinn•1h ago
OP here.

I’ve been trying to map out why some sites get cited by Perplexity/ChatGPT and others don't, so I built a custom crawler to audit 1,500 active websites (mix of e-commerce and SaaS).

The most interesting findings:

The Accidental Blockade: ~30% of sites are blocking GPTBot via legacy robots.txt rules or old security plugins (often without the owner knowing).

The "Ghost Town": Only 3 sites (0.2%) had a valid llms.txt file.

The JS Trap: 40% of marketing sites rely so heavily on client-side rendering that they appear as "empty shells" to non-hydrating AI agents.

Context on the tool: I gathered this data using the engine for my project, Website AI Score. We are still in early beta (rough edges included), but we are building towards a complete "Crawl, Fix, & Validate" ecosystem for AEO that will launch fully in early February.

Right now, the scanner is live if you want to check your own site's "AI readability."

Happy to answer questions about the crawling methodology or the specific schema failures we saw in the wild.
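
One way to approximate the "JS Trap" check described above is to count the words a non-hydrating agent would see in the raw HTML, ignoring anything inside script, style, and template tags. This is a minimal sketch, not the crawler's actual method: the names `visible_word_count` and `looks_like_empty_shell` and the 50-word threshold are assumptions made for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is present in the raw HTML without running any JS."""

    # Content of these tags is never shown as page text to a plain fetcher.
    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html):
    """Count whitespace-separated words in the un-rendered markup."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


def looks_like_empty_shell(html, min_words=50):
    # min_words is an arbitrary illustrative threshold, not a measured value.
    return visible_word_count(html) < min_words
```

A page whose body is only a mount point such as `<div id="root"></div>` plus a script bundle would fall under the threshold, while a server-rendered page with real copy would not. Note that text in `<title>` is counted, so a tuned version might restrict the count to the body.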

JohnFen•1h ago
> (often without the owner knowing)

How can you tell this? Why do you call this the "accidental blockade"? Surely, at least some percentage of those sites are doing it intentionally.

aggeeinn•54m ago
Fair question. We distinguish them based on the specificity of the rule. If a robots.txt file explicitly names GPTBot or CCBot, we count that as intentional. The accidental group consists of sites using generic User-agent: * disallows (often left over from staging) or legacy security plugins that block unknown user agents by default. We spot-checked a sample of these owners, and most were completely unaware that their 5-year-old config was actively blocking modern AI agents.
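
The rule-specificity heuristic described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the crawler's real code: `parse_groups` and `classify_block` are hypothetical names, only a site-wide `Disallow: /` is treated as a block, `Allow` overrides are ignored, and blocks applied by security plugins at the server level are invisible to a robots.txt check.

```python
def parse_groups(robots_txt):
    """Split robots.txt into (user_agents, disallow_paths) groups."""
    groups = []
    agents, rules, seen_rule = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a user-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow":
                rules.append(value)
    if agents:
        groups.append((agents, rules))
    return groups


def classify_block(robots_txt, bot="gptbot"):
    """Return 'intentional', 'accidental', or 'open' for the given bot."""
    groups = parse_groups(robots_txt)
    # A group that names the bot takes precedence over the wildcard group.
    named = [rules for agents, rules in groups if bot.lower() in agents]
    if named:
        return "intentional" if any("/" in rules for rules in named) else "open"
    wildcard = [rules for agents, rules in groups if "*" in agents]
    if any("/" in rules for rules in wildcard):
        return "accidental"  # generic site-wide disallow, e.g. left from staging
    return "open"
```

Passing `bot="ccbot"` applies the same check to the other agent named in the comment. The labels only describe how specific the rule is; as the spot-check mentioned above suggests, confirming intent still means asking the owner.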

CableNinja•1h ago
I'd be more curious to find out which AI bots can access my site, so I could stop them.

When ChatGPT was publicly disclosed, I immediately added a block to my nginx config. Ideally, I would like to block them all.

I'm currently relying on the UA and have a tiny if statement in my config that tells every AI I've blocked that my server is simply a teapot.
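
The UA-based teapot response described above boils down to a pattern match on the User-Agent header. Below is a rough Python sketch of that logic, not the commenter's nginx config: `status_for` is a hypothetical helper, and the token list contains only the two bots named in this thread, so a real deployment would need a longer, regularly updated list.

```python
import re

# Illustrative list only: the two agents named in this thread.
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot")

# Case-insensitive match on any listed token anywhere in the header value.
BLOCKED_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_AGENT_TOKENS),
    re.IGNORECASE,
)


def status_for(user_agent):
    """Return 418 for a blocked crawler User-Agent, 200 otherwise."""
    if user_agent and BLOCKED_RE.search(user_agent):
        return 418  # "I'm a teapot"
    return 200
```

This kind of check only catches crawlers that identify themselves honestly. A client sending a browser-like or empty User-Agent passes straight through.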

aggeeinn•45m ago
The 418 status is a nice touch. We actually noticed that whack-a-mole issue across the entire dataset: keeping a static Nginx config in sync with the explosion of new user agents is proving difficult for most admins right now.

If you're curious to stress-test the regex, feel free to drop the URL (or check my profile for email). I can run a quick pass with our crawler to see if it triggers the teapot response or if the headers manage to slip through.
