Detect and crash Chromium bots

https://blog.castle.io/detect-and-crash-chromium-bots-with-one-weird-trick-bots-hate-it/

61•avastel•2d ago

Comments

lifthrasiir•3h ago

Previously on HN: Detecting Noise in Canvas Fingerprinting https://news.ycombinator.com/item?id=43170079

The reception was not really positive for the obvious reason at that time.

chrismorgan•3h ago

Checking https://issues.chromium.org/issues/340836884, I’m mildly surprised to find the report just under a year old, with no attention at all (bar a me-too comment after four months), despite having been filed with priority P1, which I understand is supposed to mean “aim to fix it within 30 days”. If it continues to get no attention, I’m curious if it’ll get bumped automatically in five days’ time when it hits one year, given that they do something like that with P2 and P3 bugs, shifting status to Available or something, can’t quite remember.

I say only “mildly”, because my experience on Chromium bugs (ones I’ve filed myself, or ones I’ve encountered that others have filed) has never been very good. I’ve found Firefox much better about fixing bugs.

oefrha•2h ago

> The call to page.evaluate just hangs, and the browser dies silently. browser.close() is never reached, which can cause memory leaks over time.

Not just memory leaks. Since a couple months ago, if you use Chrome via playwright etc. on macOS, it will deposit a copy of Chrome (more than 1GB) into /private/var/folders/kd/<...>/X/com.google.Chrome.code_sign_clone/, and if you exit without a clean browser.close(), the copy of Chrome will remain there. I noticed after it ate up ~50GB in two days. No idea what's the point of this code sign clone thing, but I had to add --disable-features=MacAppCodeSignClone to all my invocations to prevent it, which is super annoying.
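
To illustrate the point about `browser.close()` never being reached: a minimal sketch, assuming an asyncio-based automation client, of putting a deadline on a call that may hang and still running cleanup afterwards. The helper name `call_with_deadline` and its parameters are hypothetical, not part of any library.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


async def call_with_deadline(
    make_call: Callable[[], Awaitable[T]],
    cleanup: Callable[[], Awaitable[None]],
    timeout_s: float,
    cleanup_timeout_s: float = 5.0,
) -> Tuple[bool, Optional[T]]:
    """Await make_call() for at most timeout_s seconds, then always run cleanup.

    Returns (completed, value); completed is False if the call timed out.
    Hypothetical helper for illustration only.
    """
    completed = False
    value: Optional[T] = None
    try:
        value = await asyncio.wait_for(make_call(), timeout=timeout_s)
        completed = True
    except asyncio.TimeoutError:
        # The call hung past the deadline; fall through to cleanup.
        pass
    finally:
        # Cleanup may hang too if the browser is already dead, so bound it as well.
        try:
            await asyncio.wait_for(cleanup(), timeout=cleanup_timeout_s)
        except asyncio.TimeoutError:
            pass
    return completed, value
```

With Playwright's async Python API this might be called as `await call_with_deadline(lambda: page.evaluate(script), browser.close, 10.0)`, with the flag above passed at launch via `args=["--disable-features=MacAppCodeSignClone"]`. Whether a bounded `browser.close()` is enough to remove the clone after the browser has died is not something this sketch establishes.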

closewith•1h ago

That's an open bug at the minute, but the one saving grace is that they're APFS clones so don't actually consume disk space.

oefrha•38m ago

Interesting, IIRC I did free up quite a bit of disk space when I removed all the clones, but I also deleted a lot of other stuff that time so I could be mistaken. du(1) being unaware of APFS clones makes it hard to tell.
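
One way to sidestep du(1) here might be to compare the volume's free space before and after the deletion instead of summing file sizes. A rough sketch, assuming nothing else is writing heavily to the same volume at the time; `bytes_freed_by` is a hypothetical helper name.

```python
import shutil
from typing import Callable


def bytes_freed_by(action: Callable[[], None], volume_path: str = "/") -> int:
    """Return the change in free bytes on the volume holding volume_path.

    A positive result means space was freed while action() ran. Other
    processes writing to the same volume will skew the number.
    """
    before = shutil.disk_usage(volume_path).free
    action()
    after = shutil.disk_usage(volume_path).free
    return after - before
```

Passing a function that removes the leftover clones (for example with `shutil.rmtree`) gives a figure based on what the filesystem reports as free rather than on per-file sizes. The filesystem may reclaim space lazily, so the number is only indicative.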

omneity•2h ago

Relevant plug: At Herd we offer a browser automation and orchestration framework that uses real browsers and thus sidesteps several of these issues[0]. The API is puppeteer-like but doesn't use it as we built the entire framework[1] from scratch.

If you're wondering about the emphasis on MCPs, Herd is a generalist automation framework with a bespoke package format – trails[2], that supports MCP and REST out-of-the-box.

0: https://herd.garden

1: https://herd.garden/docs/reference

2: https://herd.garden/docs/trails-automations

---

EDIT: I understand not everyone likes a shameless plug in another thread. The intention behind it, however, is also informative, as not every browser automation strategy is subject to the issues described in TFA.

The title does say crashing Chromium bots, yet our approach creates "Chromium bots" that do not crash under this premise, providing a useful counter-example.

randunel•1h ago

How do you deal with the usual CF, Akamai and other fingerprinting services blocking you? Or is that the customer's job to figure out?

omneity•1h ago

Thank you for the question! It depends on the scale you're operating at.

1. For individual use (or company use where each user is on their own device), the traffic is typically drowned out in regular user activity, since we use the same browser. No particular measure is needed, it just works. We have options for power users.

2. For large scale use, we offer tailored solutions depending on the anti-bot measures encountered. Part of it is to emulate #1.

3. We don't deal with "blackhat bots", so we don't offer support to work around legitimate anti-bot measures, such as those aimed at social spambots.

lyu07282•1h ago

If you don't put significant effort into it, any headless browser from cloud IP ranges will be banned by large parts of the internet. This isn't just about spam bots, you can't even read news articles in many cases. You will have some competition from residential proxies and other custom automation solutions that take care of all of that for their customers.

omneity•1h ago

Thanks, that's so true! We learned this the hard way building Monitoro[0] and large data scraping pipelines in the past, so we had the opportunity to build up the required muscle.

One thing to note: there are different "tiers" of websites, each requiring different countermeasures. Not everyone is pursuing the high-competition websites, and most importantly, as we learned, in several cases scraping is fully consensual or within the rights of the user. For example:

* Many of our users scrape their own websites to send notifications to their Discord community. It's a super easy way to create alerts without code.

* Sometimes users are locked into their own providers; for example, some companies have years of job posting information in their ATS that they cannot get out. We do help with that.

* Public data websites that are underutilized precisely because the data is difficult to access. We help make that data operational and actionable. For example, we had a sailor set up alerts on buoys to stay safe in high waters. A random example[1]

0: https://monitoro.co

1: https://wavenet.cefas.co.uk/details/312/EXT
