Show HN: I built an SDK that scrambles HTML so scrapers get garbage

https://www.obscrd.dev/
14•larsmosr•2h ago
Hey HN -- I'm a solo dev. Built this because I got tired of AI crawlers reading my HTML in plain text while robots.txt did nothing.

The core trick: shuffle characters and words in your HTML using a seed, then use CSS (flexbox order, direction: rtl, unicode-bidi) to put them back visually. Browser renders perfectly. textContent returns garbage.
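A minimal sketch of the character-level idea, assuming a small seeded PRNG and inline `order` styles. The helper names (`mulberry32`, `scrambleWord`) are hypothetical and this is not obscrd's actual markup (a decoder later in the thread suggests the real output carries `data-o` attributes). It only illustrates how DOM order and visual order can diverge.

```javascript
// Deterministic PRNG (mulberry32) so a given seed always produces the same shuffle.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: emits one <span> per character in shuffled DOM order,
// each tagged with its original position via the CSS `order` property.
function scrambleWord(word, seed) {
  const rand = mulberry32(seed);
  const items = [...word].map((ch, pos) => ({ ch, pos }));
  // Fisher-Yates shuffle driven by the seeded PRNG.
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  const spans = items
    .map(({ ch, pos }) => `<span style="order:${pos}">${escapeHtml(ch)}</span>`)
    .join('');
  // Flex container: children are laid out by `order`, not by DOM position.
  return `<span style="display:inline-flex">${spans}</span>`;
}

console.log(scrambleWord('scraper', 42));
```

Reading the spans in DOM order gives the shuffled characters, while sorting them by `order` recovers the original word, which is also why the scheme is reversible by anyone who reads the ordering values.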

On top of that: email/phone RTL obfuscation with decoy characters, AI honeypots that inject prompt instructions into LLM scrapers, clipboard interception, canvas-based image rendering (no img src in DOM), robots.txt blocking 30+ AI crawlers, and forensic breadcrumbs to prove content theft.
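As a rough illustration of the RTL-plus-decoy approach for contact details, here is a sketch that stores an address reversed, relies on `direction: rtl` with `unicode-bidi: bidi-override` to display it forward, and hides a decoy string with `display: none`. The function name, the decoy value, and the decoy placement are assumptions for the example, not obscrd's implementation.

```javascript
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: stores the address reversed, with a hidden decoy in the middle.
function obfuscateEmail(email, decoy = 'zz9') {
  const reversed = [...email].reverse();
  const mid = Math.floor(reversed.length / 2);
  const head = escapeHtml(reversed.slice(0, mid).join(''));
  const tail = escapeHtml(reversed.slice(mid).join(''));
  // The decoy is in the markup but never rendered.
  const hidden = `<span style="display:none">${escapeHtml(decoy)}</span>`;
  // bidi-override + rtl makes the browser draw the characters right to left.
  return `<span style="unicode-bidi:bidi-override;direction:rtl">${head}${hidden}${tail}</span>`;
}

// What a tag-stripping scraper would extract from the markup.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

console.log(stripTags(obfuscateEmail('user@example.com'))); // prints "moc.elpmzz9axe@resu"
```

A scraper that strips tags gets the reversed string with the decoy embedded, while a browser applying the CSS shows the address in reading order.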

What it doesn't stop: headless browsers that execute CSS, screenshot+OCR, or anyone determined enough to reverse-engineer the ordering. I put this in the README's threat model because I'd rather say it myself than have someone else say it for me. The realistic goal is raising the cost of scraping -- most bots use simple HTTP requests, and we make that useless.

TypeScript, Bun, tsup, React 18+. 162 tests. MIT licensed. Nothing to sell -- the SDK is free and complete.

Best way to understand it: open DevTools on the site and inspect the text.

GitHub: https://github.com/obscrd/obscrd

Comments

mystraline•2h ago
This is also what Facebook does.

Same result: screen readers and assistive software are rendered useless. Basically it's a sign of "I hate disabled people, and AI too"

larsmosr•2h ago
Fair concern. obscrd actually preserves screen reader access. CSS flexbox order is a visual reordering property, so assistive tech follows the visual order and reads the text correctly. Contact components use sr-only spans with clean text and aria-hidden on the obfuscated layer. We target WCAG 2.2 AA compliance.

Happy to have a11y experts poke at it and point out gaps.
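The contact-component pattern described above (clean text in an `sr-only` span, `aria-hidden` on the obfuscated layer) could look roughly like this. The helper name and the exact `sr-only` rule are assumptions based on the commonly used visually-hidden CSS pattern, not code taken from obscrd.

```javascript
// Common "visually hidden" rule: kept in the accessibility tree, invisible on screen.
const SR_ONLY_CSS =
  '.sr-only{position:absolute;width:1px;height:1px;padding:0;margin:-1px;' +
  'overflow:hidden;clip:rect(0,0,0,0);white-space:nowrap;border:0}';

function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: pairs a screen-reader-only clean copy with an
// obfuscated visual layer that assistive tech is told to skip.
function accessibleContact(cleanText, obfuscatedHtml) {
  return (
    `<span class="sr-only">${escapeHtml(cleanText)}</span>` +
    `<span aria-hidden="true">${obfuscatedHtml}</span>`
  );
}

console.log(SR_ONLY_CSS);
console.log(accessibleContact('user@example.com', '<span>moc.elpmaxe@resu</span>'));
```

In this sketch the clean text is present in the markup, so a raw-HTML scraper could read it; the thread does not say whether or how obscrd mitigates that.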

PaulHoule•1h ago
Accessibility APIs have long been the royal road to automation. If scrapers were well-written they'd be using this already, but of course if scrapers were well-written they would scrape your site and you'd never notice.
lich_king•2h ago
You break highlighting and copy-and-paste. If I want to share or comment on a piece of your website... I can't. I guess this can be a "feature" in some rare cases, but a major usability pain otherwise.

I'm not a fan of all the documentation and marketing content for this project evidently being AI-generated because I don't know which parts of it are the things you believe and designed for, and which are just LLM verbal diarrhea. For example, your GitHub threat model says this stops "AI training crawlers (GPTBot, ClaudeBot, CCBot, etc.)" - is this something you've actually confirmed, or just something that AI thinks is true? I don't know how their scrapers work; I'd assume they use headless browsers.

larsmosr•1h ago
Copy-paste breaking is intentional for protected content but it's opt-in per component, not whole-site.

On the AI docs concern, fair point. To answer directly: I've confirmed the obfuscation defeats any scraper reading raw HTML via HTTP requests. Whether GPTBot or ClaudeBot use headless browsers internally, I honestly don't know. The README threat model lists headless browsers under "what it does NOT stop" for that reason.

larsmosr•1h ago
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3;

Official OpenAI documentation: https://platform.openai.com/docs/gptbot

dwa3592•2h ago
Nice. I have been working on something that uses obfuscation, honeypots, etc., and I have come to a few realizations:

- Today you don't have to be a dedicated or motivated reverse engineer; you just need Sonnet 4.6 and let it do the work.

- You need to keep throwing new gotchas at LLMs to keep them on their toes while they try to reverse engineer your website.

larsmosr•1h ago
The bar for reverse engineering dropped to "paste the HTML into Claude and ask it to decode." That's partly why the v2 roadmap moves toward techniques where the readable text never exists in the DOM at all. Static obfuscation patterns need to keep evolving or they become a one-prompt solve.
dec0dedab0de•2h ago
Reminds me of when AOL broke all the script kiddy tools in 1996 by adding an extra space to the title of the window. I didn't have AOL, but my friend made one of those tools, and I helped him figure it out.
lokimedes•1h ago
All I want is an API for my AI, you can ask me for my public key, if you want my human identity verified. The collateral damage of this bot hunting is the emergence of personal AIs. Do we really want that? It feels regressive. (I see the hypocrisy here, we are fighting the scrapers that feed the LLMs that run our personal agents)
larsmosr•1h ago
You are not wrong. But the use case I keep seeing is companies with proprietary content they spent real money creating, who don't want it showing up in someone else's training data for free. It's less about bot hunting and more about content owners having a choice.
gzread•1h ago
Another thing you can do is to install a font with jumbled characters: "a" looks like "x", "b" looks like "n", and so on. Then instead of writing "abc" you write "jmw" and it looks like "abc" on the screen. This has been used as a form of DRM for eBooks.

It breaks copy/paste and screen readers, but so does your idea.
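The font-remapping idea amounts to a substitution cipher whose key lives in the font's glyph table. A small sketch, assuming an arbitrary fixed permutation of the lowercase alphabet (the names `PLAIN`, `STORED`, `encodeForFont`, and `glyphsShown` are made up for illustration):

```javascript
const PLAIN = 'abcdefghijklmnopqrstuvwxyz';
// Arbitrary permutation: the custom font would draw "q" with the glyph for "a", and so on.
const STORED = 'qwertyuiopasdfghjklzxcvbnm';

// Map each character through the alphabet pair; anything else passes through unchanged.
function translate(text, from, to) {
  return [...text]
    .map((ch) => {
      const i = from.indexOf(ch);
      return i === -1 ? ch : to[i];
    })
    .join('');
}

// What gets written into the HTML.
const encodeForFont = (text) => translate(text, PLAIN, STORED);
// What a reader sees once the remapped font is applied.
const glyphsShown = (stored) => translate(stored, STORED, PLAIN);

console.log(encodeForFont('abc')); // prints "qwe"
console.log(glyphsShown(encodeForFont('abc'))); // prints "abc"
```

Because the stored characters really are different code points, copy/paste and screen readers see the encoded text, which matches the tradeoff described above.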

larsmosr•1h ago
Font remapping is actually on the v2 roadmap. The reason v1 uses CSS ordering instead is it preserves screen reader access. Tradeoff is it's reversible (as another commenter just showed). Font remapping is stronger but breaks assistive tech. Solving both is the hard problem.
obsrcdsucks•1h ago

    // Reverses the shuffle by sorting elements on their data-o index.
    // Accepts an HTML string or a DOM element; falls back to the current document.
    function decodeObscrd(htmlOrElement) {
      let root;
      if (typeof htmlOrElement === 'string') {
        root = new DOMParser().parseFromString(htmlOrElement, 'text/html').body;
      } else {
        root = htmlOrElement || document;
      }

      // First obfuscated container: any element whose class contains "obscrd-".
      const container = root.querySelector('[class*="obscrd-"]');
      if (!container) { return; }

      // Direct children carrying data-o are the words; restore their order.
      const words = [...container.children].filter(el => el.hasAttribute('data-o'));
      words.sort((a, b) => +a.dataset.o - +b.dataset.o);

      const result = words.map(word => {
        // Leaf elements with data-o are the individual characters.
        const chars = [...word.querySelectorAll('[data-o]')]
          .filter(el => el.querySelector('[data-o]') === null);
        chars.sort((a, b) => +a.dataset.o - +b.dataset.o);
        return chars.map(c => c.textContent).join('');
      }).join('');

      console.log(result);
      return result;
    }
larsmosr•1h ago
Yep, that works. The data-o attributes are readable in the DOM so you can reverse it with custom code. That's in the threat model. The goal is raising the cost from "curl + cheerio" to "write a custom decoder per site." Most scrapers move on to easier targets.
costco•1h ago
This is an interesting idea... it'd be a fun side project to implement enough of a CSS engine to undo this
larsmosr•1h ago
You are more than welcome to do so. Please keep in mind the realistic goal is raising the cost of scraping. Most bots use simple HTTP requests, and we make that useless.
GaryBluto•1h ago
> Your content, obscured.

Is that supposed to be a good thing?

larsmosr•1h ago
For content you want public, no.
kevinsync•1h ago
I'm surprised that you don't appear to be using it on obscrd.dev lol
larsmosr•1h ago
Well, the information there isn't meant to be hidden, quite the opposite haha. There is a Demo page.
well_ackshually•1h ago
I too, hate people that:

* copy text

* use a screen reader for accessibility purposes (not just on the web, but on mobile too. Your 'light' obfuscation is entirely broken with TalkBack on Android. individual words/characters read, text is not a single block)

* use an RSS feed

* use reader mode in their browser

If you don't want your stuff to be read, and that includes bots, don't put it online.

> Built this because I got tired of AI crawlers reading my HTML in plain text while robots.txt did nothing.

You could have spent that time working on your project, instead of actively making the web worse than it already is.

larsmosr•1h ago
The TalkBack issue is useful feedback, thank you. I tested with NVDA and VoiceOver but not TalkBack on Android. If light mode is reading individual words instead of a continuous block that's a real bug I want to fix.

On the broader point, I hear you, but I think there's a middle ground. Not all content is public knowledge. Some of it is premium, proprietary, or behind a paywall. The people publishing it should get to decide whether it becomes free training data.

h2zizzle•1h ago
I hate everything about this, please use your time on this planet to make life better for people instead of worse.

It is better for a million AI crawlers to get through than for even one search index crawler, that might expose the knowledge on your site to someone who needs it, to be denied.

larsmosr•1h ago
For public knowledge sites this would be the wrong tool entirely. The use case is more like paywalled articles, proprietary product data, or premium content that companies paid to create and don't want scraped into a competitor's training set. obscrd is opt-in per component, not a whole-site lockdown.
verse•1h ago
couldn't read the hero text on my phone

it's white text and the shader background is also mostly white

larsmosr•1h ago
Thanks, what phone/browser? I'll fix that.
yesitcan•1h ago
The irony of building an anti-AI project but writing your marketing and HN post with AI.
