Blocking LLMs from your website cuts you off from next-generation search

https://johnjianwang.medium.com/why-blocking-llms-from-your-website-is-dumb-3dc7c3c9097d
26•johnjwang•2h ago

Comments

riffraff•1h ago
> LLMs are the next generation’s search layer. They’re already generating massive amounts of pipeline for the companies and websites that have gotten good at getting their content displayed in LLMs

[citation needed]

cpursley•1h ago
Just check your analytics dashboards and see where hits are now starting to come from. I saw on LinkedIn the other day that a company in the space I serve had a new customer find them via ChatGPT.
righthand•1h ago
It’s not dumb because Googlebot follows the robots.txt rules. That is the sincere crux of it all. No one is going to casually open their site up to LLMs that are blatantly scraping their site to then use their information to displace them.

Not blocking violent, bad-actor scrapers is dumb. Letting through bad-actor scrapers because a bunch of rich people want to make it the norm is dumb.

LLMs are not directing traffic to the sites, and traffic is the tradeoff that site owners accept with Googlebot. Even if Perplexity or Claude will provide a source, the LLM user is most likely not asking for it or clicking it 99% of the time.

nerdjon•1h ago
That is basically several paragraphs to just say "well you should just adapt to the new world instead of pushing against bad practices". There is barely any "why" actually said here.

We just had the article about how AI search is leading to fewer clicks, so where is that supposed "pipeline"?

It also completely ignores that you may not want your information to be misconstrued (lied about, basically) to the user, with a helpful link telling them where the source is that they may never click. Worse, if they know that the information being told to them is wrong, they may think it was because your site was wrong and trust you less, all without ever clicking that link.

pryelluw•1h ago
“Providing high quality content that LLMs will actually cite is the new game in town.”

That is not my job nor is it my goal. These companies are taking my work, repurposing it, and selling it under the assumption that because they can access it they can sell it.

Maybe the OP should leave their house door open so people can come in and use his couch. The new game in town is to let other people use your couch.

The mental gymnastics in this post qualify for the Special Olympics.

jerf•1h ago
This post gets the reason why people are cutting off LLMs exactly backwards and consequently completely fails to address the core issue. The whole reason people are blocking LLMs is precisely that they believe it kills the flow of readers to your content. The LLMs present your ideas and content, maybe with super-tiny attribution that nobody notices or uses [1], maybe with no attribution at all, and you get nothing. People are blocking LLMs with the precise intent of trying to preserve the flow to their content, be it commercially, reputationally, whatever.

[1]: https://www.pewresearch.org/short-reads/2025/07/22/google-us...

jkingsman•1h ago
> how many of you wouldn’t hook up your website to Google?

If there was a paid-only search engine with dubious ethics practices that was overwhelming my site with traffic in order to resell search trained off of (among other things) my personally generated content, I would absolutely block it.

LLMs are not search engines, and I'm not gaining any followers or customers in any meaningful way because an LLM indexes my site.

> it also cuts you off from the fastest-growing distribution channel on the web.

I haven't seen the needle move at all in my acquisition channels from LLMs. Unless you're a household name or very large, LLMs aren't going to shill for your business.

> most LLMs have an agentic web-search component that will actively generate links

Totally. Which is why I don't care if the LLMs index it. Let web content search be good, and lead LLMs to good content; product placement in LLM weights ain't what I'm gonna optimize for, or even permit, if it comes at a cost to me and my infra.

vb-8448•1h ago
> LLMs are not search engines, and I'm not gaining any followers or customers in any meaningful way because an LLM indexes my site.

^^^^

This

For the moment, and for the foreseeable future, you are just giving your content for free (and have to pay the hosting bill).

jkingsman•12m ago
And the freeness cuts both ways: if I could, I'd happily open my content to Mistral and all the other totally-free/open-source-releasing LLM companies' scrapers. But I can't; the content is going into big corpuses or being scraped directly by the commercial actors with funds to scrape the whole kit and caboodle.
caseyohara•11m ago
> LLMs are not search engines, and I'm not gaining any followers or customers in any meaningful way because an LLM indexes my site.

Counterpoint: my wife owns an accounting firm and publishes a lot of highly valuable informational content on their website's blog. Stuff like sales tax policies and rates in certain states, accounting/payroll best practices articles, etc. I guess you could call it "content marketing".

Lately they have been getting highly qualified leads coming from LLMs that cite her website's content when answering questions like "What is the sales tax nexus policy in California?". Users presumably follow the citation and then engage with the website, eventually becoming a very warm lead.

So LLMs are obviously not search engines in the conventional sense, but it doesn't mean they are not useful at generating valuable traffic to your marketing website.

bellBivDinesh•1h ago
Incredibly simplistic. I’m having a hard time believing a real person wrote this, read it over and decided they had made anything resembling a point.

How about the fact that Google (ideally) sends users to you rather than sharing your work unattributed?

calyth2018•1h ago
Even if we take the argument at face value, it means we should allow LLMs to be trained for free, on the backs of real people's work, just so that there's a chance they improve enough to replace humans, all for a temporary boost in search discovery of our content.

Not to mention LLMs still spew a lot of badly wrong results (no I will not anthropomorphize the models, they're not ready yet).

This is one heck of a poison chalice. Mr. Wang, would you be willing to drink this cup of poisoned wine?

ryandrake•1h ago
Like everything else on the web, LLMs are going to eventually be ruined by marketing teams trying to get them to say "Pepsi" instead of "Coke."
tartoran•20m ago
Long live local LLMs!
lambdadelirium•1h ago
Stupid bait post
mflaherty22•1h ago
Very reductionist - so much so that I'm not even sure you understand why websites block LLMs.
ashwinsundar•1h ago

> But how many of you wouldn’t hook up your website to Google?

Me. https://ashwinsundar.com/robots.txt

Your computer doesn't have the right to scrape what I say or do anything with it.

> I know one of the primary reasons that I do anything online is to provide an outlet for someone else to see it. If I didn’t want someone else to see it, I’d write it down on my notebook, not on the public web.

Sounds like the same spiel from the anti-privacy advocates who think that we should all expose everything we're doing because "you should have nothing to hide".

https://archive.is/WjbcU

This article was written for Wired in 2013 by Moxie Marlinspike, who later went on to develop the Signal protocol.

I don't want my thoughts or ideas spread across the web promiscuously. The things I say publicly are curated and full of context. That's why I have my own website, and don't post elsewhere.

I'm not playing the same game you are, which appears to be to post liberally and have loose thoughts to maximize "reach".

JSR_FDED•1h ago
Nonsensical article. Even if your goal is to create something on the web “for others” (as the article asserts), when 99.9% of your costs go to serving LLM crawlers, it puts that very objective at risk.
frozenseven•58m ago
Why was this flagged? A difference in opinion is no excuse for censorship.
debugnik•5m ago
[delayed]
politelemon•47m ago
I never managed to get far on this post due to the obnoxious pop-ups. Perhaps blocking humans from reading your posts is ok.
andreagrandi•27m ago
Another one not having a clue about what “consent” means. Next?
watwut•10m ago
The whole point of LLMs is to train on content other people created, redirect that traffic to the LLM provider, and ultimately earn money on it. The whole point of LLMs being pushed everywhere is to get free training data too.
merelysounds•9m ago
> most LLMs have an agentic web-search component that will actively generate links

I guess that’s the problem - search being only a component.

Is the possible search traffic worth having your content become part of an LLM’s training set and possibly used elsewhere?

I guess the answer depends on the content and the website’s business model.

skwee357•2m ago
I'm somewhat torn on this one.

As an amateur blogger, I would not like LLMs to "steal" my content and show users the pieces they are looking for while leaving me with zero visitors. The reason I write is to convey a particular message, and its meaning gets lost, or worse, communicated wrongly, when it passes through an LLM.

As an online business owner, I do see both ChatGPT and Perplexity as referrers to my business, meaning that potential customers ask an LLM a question or for a service recommendation, and the LLM directs them to my service. I would not like to lose this channel of organic customer acquisition.

---

On a completely different note, Medium should die as a platform, together with Substack. The number of intrusive popups, "install our app" bars, and paywalls is just insane. Bloggers, especially technically savvy ones, should be able to host their own blog.
