Ask HN: Is a JVM/CDP based browser agent stack fundamentally a bad idea?
1•galaxyeye•1h ago
Hi HN,
We built a very early prototype: a Browser-Agent/browser-automation runtime using Kotlin/JVM and raw CDP. Before investing further, we’d like advice from anyone who has worked on browser agents, AI browsers, large-scale automation, crawling, browser farms, or who has deep knowledge of Chromium/CDP.
We suspect many of our own design assumptions are flawed, so sharp criticism is very welcome.
---
TL;DR
We’re building an open-source runtime:
• AI planning/reasoning/logic lives on the JVM
• Browser actions are driven via raw CDP
• High concurrency via Kotlin coroutines
• A small ML agent learns page structure
But we’re not sure any of this is actually meaningful. Feedback—especially negative feedback—is appreciated.
---
1. JVM + CDP: possibly the wrong abstraction layer
AI planning/reasoning/logic is on the JVM; browser actions are sent through CDP.
Some doubts we cannot resolve internally:
• Is the JVM too heavy for this domain? Will GC pauses and thread scheduling add tail latency?
• Is CDP inherently unsuitable for high-throughput automation?
• Is there actually no demand for a JVM-native browser agent?
• Would Go/Node/Python be more sensible choices?
If the answer is “no, this is the wrong direction,” we’d really like to hear it.
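For context on what "raw CDP" means in our stack: every DevTools command is a JSON envelope ({id, method, params}) sent over a WebSocket to Chrome's --remote-debugging-port, with responses correlated back by id. A minimal sketch of the framing layer, in Kotlin (the function name cdpFrame is ours, not from any real client library):

```kotlin
// Minimal sketch: building a raw CDP command frame by hand.
// A real client would use a JSON library and a WebSocket transport;
// this only shows the wire shape we are working against.

fun jsonEscape(s: String) = s.replace("\\", "\\\\").replace("\"", "\\\"")

fun cdpFrame(id: Int, method: String, params: Map<String, String> = emptyMap()): String {
    val body = params.entries.joinToString(",") { (k, v) ->
        "\"${jsonEscape(k)}\":\"${jsonEscape(v)}\""
    }
    return "{\"id\":$id,\"method\":\"${jsonEscape(method)}\",\"params\":{$body}}"
}
```

For example, cdpFrame(1, "Page.navigate", mapOf("url" to "https://example.com")) produces the frame Chrome expects for a navigation command. Everything above this layer (planning, retries, scheduling) is where the JVM question bites.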
---
2. High-concurrency runtime: likely to fall apart in real workloads
We’re trying to push single-machine throughput on real, complex pages by relying on:
• Kotlin coroutines
• Minimizing DevTools round-trips
• Raw CDP with multi-tab concurrency
But our doubts are even larger:
• Can Chromium realistically survive this scale?
(render-process contention, GPU-thread limits, compositor stalls, etc.)
• Are multi-tab workloads doomed to event interference, reordering, and deadlocks?
• Will CDP scheduling become the true bottleneck?
• Is raw CDP unavoidably more brittle than Playwright?
If you’ve seen similar attempts fail, we’d especially like to know how they failed.
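To make the reordering concern concrete: our working assumption is that any raw-CDP client needs a correlation layer that matches responses to in-flight commands by id and queues events (frames with no id) per session, so multi-tab traffic cannot interleave into the wrong tab's handler. A hedged sketch of that layer (class and method names are illustrative, not from a real library):

```kotlin
// Illustrative correlation layer for multi-tab CDP traffic.
// Responses are routed by (session, id); events are queued per session.

sealed class Incoming {
    data class Response(val session: String, val id: Int, val result: String) : Incoming()
    data class Event(val session: String, val method: String) : Incoming()
}

class CdpRouter {
    // Callbacks for commands we have sent but not yet heard back on.
    private val pending = mutableMapOf<Pair<String, Int>, (String) -> Unit>()
    // Per-session event queues, so tab A's events never reach tab B's handler.
    val events = mutableMapOf<String, MutableList<String>>()

    fun expect(session: String, id: Int, onResult: (String) -> Unit) {
        pending[session to id] = onResult
    }

    fun dispatch(frame: Incoming): Unit = when (frame) {
        is Incoming.Response ->
            pending.remove(frame.session to frame.id)?.invoke(frame.result) ?: Unit
        is Incoming.Event -> {
            events.getOrPut(frame.session) { mutableListOf() }.add(frame.method)
            Unit
        }
    }
}
```

This handles routing, but not backpressure: if one tab floods events faster than its consumer drains them, the queue grows without bound, which is one of the failure modes we worry about above.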
---
3. Non-LLM page-structure learning: probably not generalizable
We built a small ML module to avoid calling an LLM every time we parse HTML.
It works well on e-commerce pages, but we strongly suspect it will break elsewhere.
Concerns:
• Will it fail outright on news, forums, SaaS dashboards, and other domains?
• Has anyone built DOM-structure-learning systems and then abandoned them? Why?
• Is the long tail of the web fundamentally hostile to non-LLM approaches?
Failure stories are particularly valuable.
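To show the flavor of features our module relies on (and why we doubt they generalize): it learns from cheap structural signals rather than text semantics. One classic example is link density, the share of text inside anchor tags; high density suggests navigation boilerplate, low density suggests content. A sketch, assuming a flattened node list rather than a real DOM walk:

```kotlin
// Illustrative structural feature for non-LLM page-structure learning.
// A real implementation walks a parsed DOM; here nodes are pre-flattened.

data class Node(val tag: String, val depth: Int, val textLen: Int)

// Fraction of page text that lives inside <a> tags. This kind of
// heuristic works well on template-heavy e-commerce pages and is
// exactly what we expect to break on the long tail of the web.
fun linkDensity(nodes: List<Node>): Double {
    val total = nodes.sumOf { it.textLen }.coerceAtLeast(1)
    val inLinks = nodes.filter { it.tag == "a" }.sumOf { it.textLen }
    return inLinks.toDouble() / total
}
```

Features like this are fast and LLM-free, but they encode assumptions about page templates, which is the core of our generalization worry.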
---
4. Some questions we have zero confidence about
• Does the world actually need yet another browser-automation stack?
• Do “Browser Agents” have long-term practical value at all?
• Do coroutine-style concurrency models provide real benefits under heavy CDP I/O?
• Should we drop the “agent” layer entirely and just build a runtime?
• What fatal issues exist around resource isolation, multi-tenancy, event storms, or long-tail page behaviors?
• Do all high-concurrency browser runtimes eventually die for the same reasons?
If the answer is “yes, stop now,” we’d prefer to know early.
---
Prototype status
We’ll open-source a very early version (missing docs, missing examples, and possibly flawed designs).
Known issues include:
• Deadlocks on certain complex sites that are hard to reproduce
• CDP event reordering under high concurrency
• Worse-than-expected memory behavior
• Structure-learning module is inaccurate on non-e-commerce pages
If you’ve built systems that interact heavily with browsers, automate them at scale, extract data from them, or treat the browser as a runtime, we’d love to hear which bottlenecks you hit—so we don’t optimize in the wrong direction.
---
Finally
Any single sentence of criticism may save us months.
— Browser4 Team
Comments
grizzles•39m ago
Open source it and you'll get all the feedback you desire.