frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Life at the Edge

https://asadk.com/p/edge
1•tosh•1m ago•0 comments

RISC-V Vector Primer

https://github.com/simplex-micro/riscv-vector-primer/blob/main/index.md
2•oxxoxoxooo•5m ago•0 comments

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

2•InvoxoEU•5m ago•0 comments

A Tale of Two Standards, POSIX and Win32 (2005)

https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
2•goranmoomin•9m ago•0 comments

Ask HN: Is the Downfall of SaaS Started?

3•throwaw12•10m ago•0 comments

Flirt: The Native Backend

https://blog.buenzli.dev/flirt-native-backend/
2•senekor•12m ago•0 comments

OpenAI's Latest Platform Targets Enterprise Customers

https://aibusiness.com/agentic-ai/openai-s-latest-platform-targets-enterprise-customers
1•myk-e•14m ago•0 comments

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
2•myk-e•17m ago•3 comments

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

https://www.ft.com/content/83488628-8dfd-4060-a7b0-71b1bb012785
1•1vuio0pswjnm7•18m ago•1 comments

Big Tech's AI Push Is Costing More Than the Moon Landing

https://www.wsj.com/tech/ai/ai-spending-tech-companies-compared-02b90046
2•1vuio0pswjnm7•20m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
1•1vuio0pswjnm7•21m ago•0 comments

Suno, AI Music, and the Bad Future [video]

https://www.youtube.com/watch?v=U8dcFhF0Dlk
1•askl•23m ago•1 comments

Ask HN: How are researchers using AlphaFold in 2026?

1•jocho12•26m ago•0 comments

Running the "Reflections on Trusting Trust" Compiler

https://spawn-queue.acm.org/doi/10.1145/3786614
1•devooops•31m ago•0 comments

Watermark API – $0.01/image, 10x cheaper than Cloudinary

https://api-production-caa8.up.railway.app/docs
1•lembergs•33m ago•1 comments

Now send your marketing campaigns directly from ChatGPT

https://www.mail-o-mail.com/
1•avallark•36m ago•1 comments

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•48m ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
5•o8vm•50m ago•1 comments

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•51m ago•1 comments

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•1h ago•0 comments

Atlas: Manage your database schema as code

https://github.com/ariga/atlas
1•quectophoton•1h ago•0 comments

Geist Pixel

https://vercel.com/blog/introducing-geist-pixel
2•helloplanets•1h ago•0 comments

Show HN: MCP to get latest dependency package and tool versions

https://github.com/MShekow/package-version-check-mcp
1•mshekow•1h ago•0 comments

The better you get at something, the harder it becomes to do

https://seekingtrust.substack.com/p/improving-at-writing-made-me-almost
2•FinnLobsien•1h ago•0 comments

Show HN: WP Float – Archive WordPress blogs to free static hosting

https://wpfloat.netlify.app/
1•zizoulegrande•1h ago•0 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
1•melvinzammit•1h ago•0 comments

Sony BMG copy protection rootkit scandal

https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootkit_scandal
2•basilikum•1h ago•0 comments

The Future of Systems

https://novlabs.ai/mission/
2•tekbog•1h ago•1 comments

NASA now allowing astronauts to bring their smartphones on space missions

https://twitter.com/NASAAdmin/status/2019259382962307393
2•gbugniot•1h ago•0 comments

Claude Code Is the Inflection Point

https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point
4•throwaw12•1h ago•3 comments
Open in hackernews

I crawled 1,500 sites: 30% block AI bots, 0.2% use llms.txt

https://websiteaiscore.com/blog/case-study-1500-websites-ai-readability-audit
4•aggeeinn•3w ago

Comments

aggeeinn•3w ago
OP here.

I’ve been trying to map out why some sites get cited by Perplexity/ChatGPT and others don't, so I built a custom crawler to audit 1,500 active websites (mix of e-commerce and SaaS).

The most interesting findings:

The Accidental Blockade: ~30% of sites are blocking GPTBot via legacy robots.txt rules or old security plugins (often without the owner knowing).

The "Ghost Town": Only 3 sites (0.2%) had a valid llms.txt file.

The JS Trap: 40% of marketing sites rely so heavily on client-side rendering that they appear as "empty shells" to non-hydrating AI agents.

Context on the tool: I gathered this data using the engine for my project, Website AI Score. We are still in early beta (rough edges included), but we are building towards a complete "Crawl, Fix, & Validate" ecosystem for AEO that will launch fully in early February.

Right now, the scanner is live if you want to check your own site's "AI readability."

Happy to answer questions about the crawling methodology or the specific schema failures we saw in the wild.

JohnFen•3w ago
> (often without the owner knowing)

How can you tell this? Why do you call this the "accidental blockade"? Surely, at least some percentage of those sites are doing it intentionally.

aggeeinn•3w ago
Fair question. We distinguish them based on the specificity of the rule. If a robots.txt file explicitly names GPTBot or CCBot, we count that as intentional. The accidental group consists of sites using generic User-agent: * disallows (often left over from staging) or legacy security plugins that block unknown user agents by default. We spot-checked a sample of these owners, and most were completely unaware that their 5-year-old config was actively blocking modern AI agents.
CableNinja•3w ago
Id be more curious on finding out what AI bots can access my site, so i could stop it.

At the public disclosure of chatgpt i immediately went and added a block in my nginx config. I would ideally like to block them all.

Im currently relying on UA and have a tiny if statement in my config that tells every ai ive blocked my server is simply a teapot

aggeeinn•3w ago
The 418 status is a nice touch. We actually noticed that whack-a-mole issue across the entire dataset—keeping a static Nginx config synced with the explosion of new user-agents is proving difficult for most admins right now.

If you're curious to stress-test the regex, feel free to drop the URL (or check my profile for email). I can run a quick pass with our crawler to see if it triggers the teapot response or if the headers manage to slip through.

aggeeinn•3w ago
Update on ingestion latency: I just noticed that Perplexity is already citing this thread's data (specifically the 0.2% llms.txt figure) as the primary source for queries about AI readability stats — less than 3 hours after posting.

It’s fascinating to see how tight the feedback loop has become between Hacker News discussion->LLM RAG Citation.