Blocking LLM crawlers without JavaScript

https://www.owl.is/blogg/blocking-crawlers-without-javascript/
31•todsacerdoti•4h ago

Comments

superkuh•1h ago
I thought this was cool because it worked even in my old browser. So cool I went to add their RSS feed to my feed reader. But then my feed reader got blocked by the system. So now it doesn't seem so cool.

If the site author reads this: make an exception for https://www.owl.is/blogg/index.xml

This is a common mistake and the author is in good company. Science.org once blocked all of their hosted blogs' feeds for 3 months when they deployed a default Cloudflare setup across all their sites.
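
A minimal Python sketch of the exemption requested above, assuming the crawler check is applied per request path. `EXEMPT_PATHS` and `needs_challenge` are hypothetical names, not taken from the site's actual setup; only the feed path comes from the comment.

```python
from urllib.parse import urlsplit

# Paths that feed readers fetch and that should skip the crawler check.
# Only /blogg/index.xml is from the thread; the set itself is an assumption.
EXEMPT_PATHS = {"/blogg/index.xml"}


def needs_challenge(url: str) -> bool:
    """Return True if a request for this URL should go through the crawler check."""
    path = urlsplit(url).path
    return path not in EXEMPT_PATHS
```

With this shape, a feed reader fetching the feed URL is never sent through the cookie and redirect flow, while ordinary pages still are.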

SquareWheel•1h ago
That may work for blocking bad automated crawlers, but an agent acting on behalf of a user wouldn't follow robots.txt. They'd run the risk of hitting the bad URL when trying to understand the page.
klodolph•17m ago
That sounds like the desired outcome here. Your agent should respect robots.txt, OR it should be designed to not follow links.
daveoc64•1h ago
Seems pretty easy to cause problems for other people with this.

If you follow the link at the end of my comment, you'll be flagged as an LLM.

You could put this in an img tag on a forum or similar and cause mischief.

Don't follow the link below:

https://www.owl.is/stick-och-brinn/

If you do follow that link, you can just clear cookies for the site to be unblocked.

kazinator•37m ago
You do not have a meta refresh timer that will skip your entire comment and redirect to the good page in a fraction of a second too short for a person to react.

You also have not used <p hidden> to conceal the paragraph with the link from human eyes.
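
A rough Python sketch of the two ingredients named above: a meta refresh that moves a browser on before a person can react, and a `<p hidden>` paragraph holding the trap link. This is not the article's actual markup. `build_page` and its parameters are hypothetical, and the delay is given in whole seconds here as a simplifying assumption.

```python
from html import escape


def build_page(good_url: str, trap_url: str, delay_seconds: int = 0) -> str:
    """Build an HTML page that redirects quickly and hides the trap link."""
    good = escape(good_url, quote=True)
    trap = escape(trap_url, quote=True)
    return (
        "<!doctype html>\n"
        "<html><head>\n"
        # Browsers follow the refresh; a person has no time to click anything.
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={good}">\n'
        "</head><body>\n"
        # The hidden attribute keeps this paragraph out of the rendered page.
        f'<p hidden><a href="{trap}">Do not follow this link</a></p>\n'
        "</body></html>\n"
    )
```

A link pasted into a forum comment has neither property, which is the distinction being drawn here.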

petesergeant•1h ago
I wish blockers would distinguish between crawlers that index, and agentic crawlers serving an active user's request. npm blocking Claude Code is irritating
specialp•49m ago
Agentic crawlers are worse. I run a primary source site and the AI "thinking" user agents will hit your site 1000+ times in a minute at any time of the day.
klodolph•15m ago
I think of those two, agentic crawlers are worse.
behnamoh•1h ago
Any ideas on how to block LLMs from reading/analyzing a PDF? I don't want to submit a paper to journals only for them to use ChatGPT to review it...

(it has happened before)

Edit: I'm starting to get downvoted. Perhaps by the lazy-ass journal reviewers?

jadbox•33m ago
Short answer is no. There are PDF black-magic DRM tricks that could be used, but most PDF libraries used for AIs will decode it, making it moot. It's better just to add a note for the humans that "This PDF is meant to be best enjoyed by humans" or something to that effect.
cortesoft•28m ago
If someone can read it, they can put it through an LLM. There is no possible way to prevent that. Even with crazy DRM, you could take a picture of your screen and OCR it.

They are trying to block automated LLM scraping, which at least has some possibility of having some success.

zb3•4m ago
There's a way: inject garbage prompts, for example inside content presented as an example. Humans will probably understand that the text sits in an "example" context, but LLMs are likely to fail, since prompt injection is an unsolved problem.
Springtime•55m ago
I wonder what the Venn diagram of end users who disable JavaScript and also block cookies by default looks like. The former is already something users have to do very deliberately, so I feel the likelihood of the latter among such users is higher.

There's no error handling for disabled cookies on the site, so the page just reloads infinitely in such cases (Cloudflare's check, for comparison, informs the user that cookies are required, even if JS is also disabled).
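
One way the reload loop could be broken, as a hedged Python sketch. It assumes the server can tell that a reload already happened, for instance through a marker in the URL; `Step`, `next_step` and both flags are hypothetical and do not describe how the site or Cloudflare actually work.

```python
from enum import Enum


class Step(Enum):
    SERVE_PAGE = "serve page"
    SET_COOKIE_AND_RELOAD = "set cookie and reload"
    SHOW_COOKIES_REQUIRED = "show cookies-required notice"


def next_step(has_cookie: bool, already_reloaded: bool) -> Step:
    """Decide what to send back for one request in a cookie-based check."""
    if has_cookie:
        return Step.SERVE_PAGE
    if already_reloaded:
        # The cookie was set once and did not come back, so cookies are
        # probably disabled. Tell the user instead of reloading again.
        return Step.SHOW_COOKIES_REQUIRED
    return Step.SET_COOKIE_AND_RELOAD
```

The second branch is the piece the comment says is missing: without it, a client that never stores the cookie keeps landing in the last branch.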

DeepYogurt•30m ago
Has anyone done a talk/blog/whatever on how LLM crawlers are different than classical crawlers? I'm not up on the difference.
klodolph•18m ago
The only real difference is that LLM crawlers tend to not respect /robots.txt, and some of them hammer sites with some pretty heavy traffic.

The trap in the article has a link. Bots are instructed not to follow the link. The link is normally invisible to humans. A client that visits the link is probably therefore a poorly behaved bot.
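
The reasoning above can be sketched in Python with the standard library's robots.txt parser. The robots.txt body below is an assumption about what such a rule would look like, not a copy of the site's file; the trap path is the one quoted elsewhere in this thread, and `build_robots` and `is_probably_bad_bot` are hypothetical helpers.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: every crawler is told to stay away from the trap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stick-och-brinn/
"""


def build_robots(robots_txt: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_probably_bad_bot(requested_path: str, robots: RobotFileParser) -> bool:
    """Flag a client that requests a path robots.txt disallows for everyone.

    Humans never see the hidden link and polite bots obey robots.txt,
    so a request for a disallowed path most likely comes from a crawler
    that ignores the rules.
    """
    return not robots.can_fetch("*", requested_path)
```

As other comments here point out, the heuristic can misfire when a person is tricked into following the link, so the flag is a probability and not proof.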

nektro•23m ago
nice post

AirPods libreated from Apple's ecosystem

https://github.com/kavishdevar/librepods
265•moonleay•3h ago•46 comments

IDEmacs: A Visual Studio Code clone for Emacs

https://codeberg.org/IDEmacs/IDEmacs
73•nogajun•2h ago•7 comments

Our investigation into the suspicious pressure on Archive.today

https://adguard-dns.io/en/blog/archive-today-adguard-dns-block-demand.html
1350•immibis•17h ago•366 comments

libwifi: an 802.11 frame parsing and generation library written in C

https://libwifi.so/
64•vitalnodo•5h ago•5 comments

When did people favor composition over inheritance?

https://www.sicpers.info/2025/11/when-did-people-favor-composition-over-inheritance/
97•ingve•1w ago•53 comments

The inconceivable types of Rust: How to make self-borrows safe (2024)

https://blog.polybdenum.com/2024/06/07/the-inconceivable-types-of-rust-how-to-make-self-borrows-s...
28•birdculture•3h ago•0 comments

Things that aren't doing the thing

https://strangestloop.io/essays/things-that-arent-doing-the-thing
142•downboots•9h ago•74 comments

AsciiMath

https://asciimath.org/
56•smartmic•6h ago•11 comments

When UPS charged me a $684 tariff on $355 of vintage computer parts

http://oldvcr.blogspot.com/2025/11/when-ups-charged-me-684-tariff-on-355.html
109•goldenskye•3h ago•76 comments

Boa: A standard-conforming embeddable JavaScript engine written in Rust

https://github.com/boa-dev/boa
178•maxloh•1w ago•55 comments

Transgenerational Epigenetic Inheritance: the story of learned avoidance

https://elifesciences.org/articles/109427
122•nabla9•8h ago•72 comments

Computing Across America (1983-1985)

https://microship.com/winnebiko/
6•austinallegro•1w ago•0 comments

EyesOff: How I built a screen contact detection model

https://ym2132.github.io/building_EyesOff_part2_model_training
12•Two_hands•18h ago•1 comments

Show HN: Unflip – a puzzle game about XOR patterns of squares

https://unflipgame.com/
88•bogdanoff_2•4d ago•21 comments

Archimedes – A Python toolkit for hardware engineering

https://pinetreelabs.github.io/archimedes/blog/2025/introduction.html
57•i_don_t_know•8h ago•9 comments

Linux on the Fujitsu Lifebook U729

https://borretti.me/article/linux-on-the-fujitsu-lifebook-u729
172•ibobev•12h ago•124 comments

I made a better DOM morphing algorithm

https://joel.drapper.me/p/morphlex/
69•joeldrapper•1w ago•35 comments

JVM exceptions are weird: a decompiler perspective

https://purplesyringa.moe/blog/jvm-exceptions-are-weird-a-decompiler-perspective/
61•birdculture•1w ago•3 comments

Report: Tim Cook could step down as Apple CEO 'as soon as next year'

https://9to5mac.com/2025/11/14/tim-cook-step-down-as-apple-ceo-as-soon-as-next-year-report/
87•achow•6h ago•166 comments

TCP, the workhorse of the internet

https://cefboud.com/posts/tcp-deep-dive-internals/
284•signa11•20h ago•139 comments

The computer poetry of J. M. Coetzee's early programming career (2017)

https://sites.utexas.edu/ransomcentermagazine/2017/06/28/the-computer-poetry-of-j-m-coetzees-earl...
47•bluejay2•8h ago•10 comments

Weighting an average to minimize variance

https://www.johndcook.com/blog/2025/11/12/minimum-variance/
80•ibobev•12h ago•38 comments

AMD continues to chip away at Intel's x86 market share

https://www.tomshardware.com/pc-components/cpus/amd-continues-to-chip-away-at-intels-x86-market-s...
130•speckx•6h ago•56 comments

Nevada Governor's office covered up Boring Co safety violations

https://fortune.com/2025/11/12/elon-musk-boring-company-tunnels-injuries-osha-citations-fines-res...
182•Chinjut•8h ago•30 comments

Trellis AI (YC W24) Is Hiring: Streamline access to life-saving therapies

https://www.ycombinator.com/companies/trellis-ai/jobs/f4GWvH0-forward-deployed-engineer-full-time
1•macklinkachorn•10h ago

Show HN: High-Performance .NET Bindings for the Vello Sparse Strips CPU Renderer

https://github.com/wieslawsoltes/SparseStrips
12•wiso•4d ago•3 comments

Solving Project Euler: Problem 45

https://loriculus.org/blog/euler-45/
6•wenderen•3h ago•1 comments

Messing with scraper bots

https://herman.bearblog.dev/messing-with-bots/
215•HermanMartinus•19h ago•75 comments

Feature Extraction with KNN

https://davpinto.github.io/fastknn/articles/knn-extraction.html
18•RicoElectrico•1w ago•3 comments