I crawled 1,500 sites: 30% block AI bots, 0.2% use llms.txt

https://websiteaiscore.com/blog/case-study-1500-websites-ai-readability-audit
4•aggeeinn•3w ago

Comments

aggeeinn•3w ago
OP here.

I’ve been trying to map out why some sites get cited by Perplexity/ChatGPT and others don't, so I built a custom crawler to audit 1,500 active websites (mix of e-commerce and SaaS).

The most interesting findings:

The Accidental Blockade: ~30% of sites are blocking GPTBot via legacy robots.txt rules or old security plugins (often without the owner knowing).

The "Ghost Town": Only 3 sites (0.2%) had a valid llms.txt file.

The JS Trap: 40% of marketing sites rely so heavily on client-side rendering that they appear as "empty shells" to non-hydrating AI agents.
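
To make the "empty shell" idea concrete, here is a minimal sketch of how one might flag such a page from its raw HTML alone, without running any JavaScript. It is not the crawler's actual logic; the `looks_like_empty_shell` name and the 200-character threshold are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text a non-hydrating agent would see in the raw HTML."""

    SKIP = {"script", "style"}  # content of these tags is never readable text

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_empty_shell(html, min_chars=200):
    # Hypothetical threshold: pages with less visible text than this
    # are treated as client-side-rendered shells.
    return len(visible_text(html)) < min_chars


shell = (
    '<html><head><title>App</title>'
    '<script>var x = "lots of code";</script></head>'
    '<body><div id="root"></div></body></html>'
)
print(visible_text(shell))            # App
print(looks_like_empty_shell(shell))  # True
```

A real audit would likely also compare this against the rendered DOM, since a short page is not necessarily a JavaScript-dependent one.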

Context on the tool: I gathered this data using the engine for my project, Website AI Score. We are still in early beta (rough edges included), but we are building towards a complete "Crawl, Fix, & Validate" ecosystem for AEO that will launch fully in early February.

Right now, the scanner is live if you want to check your own site's "AI readability."

Happy to answer questions about the crawling methodology or the specific schema failures we saw in the wild.

JohnFen•3w ago
> (often without the owner knowing)

How can you tell this? Why do you call this the "accidental blockade"? Surely, at least some percentage of those sites are doing it intentionally.

aggeeinn•3w ago
Fair question. We distinguish them based on the specificity of the rule. If a robots.txt file explicitly names GPTBot or CCBot, we count that as intentional. The accidental group consists of sites using generic User-agent: * disallows (often left over from staging) or legacy security plugins that block unknown user agents by default. We spot-checked a sample of these owners, and most were completely unaware that their 5-year-old config was actively blocking modern AI agents.
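
As a rough sketch of the rule-specificity check described above (not the crawler's real code), the snippet below groups robots.txt rules by user agent and labels a site by which group blocks the whole site. The label strings are made up for illustration, and only a site-wide `Disallow: /` is counted as a block. Blocks applied by security plugins at the server level would not show up in robots.txt, so this covers only the robots.txt half of the distinction.

```python
AI_BOTS = ("gptbot", "ccbot")  # the two agents named in the comment above


def parse_groups(robots_txt):
    """Return a list of (agents, disallow_paths) groups from robots.txt text."""
    groups = []
    agents, disallows, seen_rule = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule line closed the previous group
                groups.append((agents, disallows))
                agents, disallows, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value:
                disallows.append(value)
    if agents:
        groups.append((agents, disallows))
    return groups


def classify(robots_txt):
    groups = parse_groups(robots_txt)
    named = [g for g in groups if any(a in AI_BOTS for a in g[0])]
    if named:
        # A group that names the bot takes precedence over the wildcard group.
        blocked = any("/" in paths for _, paths in named)
        return "intentional" if blocked else "not blocked"
    wildcard = [g for g in groups if "*" in g[0]]
    if any("/" in paths for _, paths in wildcard):
        return "possibly accidental"
    return "not blocked"


print(classify("User-agent: GPTBot\nDisallow: /\n"))  # intentional
print(classify("User-agent: *\nDisallow: /\n"))       # possibly accidental
```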

CableNinja•3w ago
I'd be more curious to find out which AI bots can access my site, so I could stop them.

When ChatGPT was publicly disclosed, I immediately went and added a block in my nginx config. Ideally, I would like to block them all.

I'm currently relying on the UA and have a tiny if statement in my config that tells every AI I've blocked that my server is simply a teapot.
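
For anyone who wants to reason about what such a UA rule matches, here is a small Python sketch of the same decision. It is not the nginx config mentioned above, which isn't shown in this thread. The bot list is illustrative and incomplete, and `status_for` is a hypothetical helper name.

```python
import re

# Illustrative list of AI crawler user-agent tokens; not exhaustive.
BLOCKED_AGENTS = ("GPTBot", "CCBot", "ClaudeBot", "PerplexityBot")
BLOCKED_RE = re.compile(
    "|".join(re.escape(name) for name in BLOCKED_AGENTS), re.IGNORECASE
)


def status_for(user_agent):
    """Return 418 for a blocked AI user agent, 200 for everything else."""
    if user_agent and BLOCKED_RE.search(user_agent):
        return 418  # "I'm a teapot"
    return 200


print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))             # 418
print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))    # 200
```

A check like this only catches crawlers that announce themselves honestly, and the list has to be updated by hand as new agents appear.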

aggeeinn•3w ago
The 418 status is a nice touch. We actually noticed that whack-a-mole issue across the entire dataset—keeping a static Nginx config synced with the explosion of new user-agents is proving difficult for most admins right now.

If you're curious to stress-test the regex, feel free to drop the URL (or check my profile for email). I can run a quick pass with our crawler to see if it triggers the teapot response or if the headers manage to slip through.

aggeeinn•3w ago
Update on ingestion latency: I just noticed that Perplexity is already citing this thread's data (specifically the 0.2% llms.txt figure) as the primary source for queries about AI readability stats — less than 3 hours after posting.

It’s fascinating to see how tight the feedback loop has become between a Hacker News discussion and an LLM RAG citation.