Ask HN: Is a JVM/CDP based browser agent stack fundamentally a bad idea?
1•galaxyeye•1h ago
Hi HN,
We built a very early prototype: a Browser-Agent/browser-automation runtime using Kotlin/JVM and raw CDP. Before investing further, we’d like advice from anyone who has worked on browser agents, AI browsers, large-scale automation, crawling, browser farms, or who has deep knowledge of Chromium/CDP.
We suspect many of our own design assumptions are flawed, so sharp criticism is very welcome.
---
TL;DR
We’re building an open-source runtime:
• AI planning/reasoning/logic lives on the JVM
• Browser actions are driven via raw CDP
• High concurrency via Kotlin coroutines
• A small ML agent learns page structure
But we’re not sure any of this is actually meaningful. Feedback—especially negative feedback—is appreciated.
---
1. JVM + CDP: possibly the wrong abstraction layer
AI planning/reasoning/logic is on the JVM; browser actions are sent through CDP.
Some doubts we cannot resolve internally:
• Is the JVM too heavy for this domain? Will GC pauses and thread scheduling add tail latency?
• Is CDP inherently unsuitable for high-throughput automation?
• Is there actually no demand for a JVM-native browser agent?
• Would Go/Node/Python be more sensible choices?
If the answer is “no, this is the wrong direction,” we’d really like to hear it.
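For context on what "raw CDP" means in our stack: every DevTools command is a JSON envelope ({id, method, params}) sent over a WebSocket to Chrome's --remote-debugging-port, with responses correlated back by id. A minimal sketch of the framing layer, in Kotlin (the function name cdpFrame is ours, not from any real client library):

```kotlin
// Minimal sketch: building a raw CDP command frame by hand.
// A real client would use a JSON library and a WebSocket transport;
// this only shows the wire shape we are working against.

fun jsonEscape(s: String) = s.replace("\\", "\\\\").replace("\"", "\\\"")

fun cdpFrame(id: Int, method: String, params: Map<String, String> = emptyMap()): String {
    val body = params.entries.joinToString(",") { (k, v) ->
        "\"${jsonEscape(k)}\":\"${jsonEscape(v)}\""
    }
    return "{\"id\":$id,\"method\":\"${jsonEscape(method)}\",\"params\":{$body}}"
}
```

For example, cdpFrame(1, "Page.navigate", mapOf("url" to "https://example.com")) produces the frame Chrome expects for a navigation command. Everything above this layer (planning, retries, scheduling) is where the JVM question bites.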
---
2. High-concurrency runtime: likely to fall apart in real workloads
We’re trying to push single-machine throughput on real, complex pages by relying on:
• Kotlin coroutines
• Minimizing DevTools round-trips
• Raw CDP with multi-tab concurrency
But our doubts are even larger:
• Can Chromium realistically survive this scale?
(render-process contention, GPU-thread limits, compositor stalls, etc.)
• Are multi-tab workloads doomed to event interference, reordering, and deadlocks?
• Will CDP scheduling become the true bottleneck?
• Is raw CDP unavoidably more brittle than Playwright?
If you’ve seen similar attempts fail, we’d especially like to know how they failed.
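To make the reordering concern concrete: our working assumption is that any raw-CDP client needs a correlation layer that matches responses to in-flight commands by id and queues events (frames with no id) per session, so multi-tab traffic cannot interleave into the wrong tab's handler. A hedged sketch of that layer (class and method names are illustrative, not from a real library):

```kotlin
// Illustrative correlation layer for multi-tab CDP traffic.
// Responses are routed by (session, id); events are queued per session.

sealed class Incoming {
    data class Response(val session: String, val id: Int, val result: String) : Incoming()
    data class Event(val session: String, val method: String) : Incoming()
}

class CdpRouter {
    // Callbacks for commands we have sent but not yet heard back on.
    private val pending = mutableMapOf<Pair<String, Int>, (String) -> Unit>()
    // Per-session event queues, so tab A's events never reach tab B's handler.
    val events = mutableMapOf<String, MutableList<String>>()

    fun expect(session: String, id: Int, onResult: (String) -> Unit) {
        pending[session to id] = onResult
    }

    fun dispatch(frame: Incoming): Unit = when (frame) {
        is Incoming.Response ->
            pending.remove(frame.session to frame.id)?.invoke(frame.result) ?: Unit
        is Incoming.Event -> {
            events.getOrPut(frame.session) { mutableListOf() }.add(frame.method)
            Unit
        }
    }
}
```

This handles routing, but not backpressure: if one tab floods events faster than its consumer drains them, the queue grows without bound, which is one of the failure modes we worry about above.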
---
3. Non-LLM page-structure learning: probably not generalizable
We built a small ML module to avoid calling an LLM every time we parse HTML.
It works well on e-commerce pages, but we strongly suspect it will break elsewhere.
Concerns:
• Will it fail outright on news, forums, SaaS dashboards, and other domains?
• Has anyone built DOM-structure-learning systems and then abandoned them? Why?
• Is the long tail of the web fundamentally hostile to non-LLM approaches?
Failure stories are particularly valuable.
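To show the flavor of features our module relies on (and why we doubt they generalize): it learns from cheap structural signals rather than text semantics. One classic example is link density, the share of text inside anchor tags; high density suggests navigation boilerplate, low density suggests content. A sketch, assuming a flattened node list rather than a real DOM walk:

```kotlin
// Illustrative structural feature for non-LLM page-structure learning.
// A real implementation walks a parsed DOM; here nodes are pre-flattened.

data class Node(val tag: String, val depth: Int, val textLen: Int)

// Fraction of page text that lives inside <a> tags. This kind of
// heuristic works well on template-heavy e-commerce pages and is
// exactly what we expect to break on the long tail of the web.
fun linkDensity(nodes: List<Node>): Double {
    val total = nodes.sumOf { it.textLen }.coerceAtLeast(1)
    val inLinks = nodes.filter { it.tag == "a" }.sumOf { it.textLen }
    return inLinks.toDouble() / total
}
```

Features like this are fast and LLM-free, but they encode assumptions about page templates, which is the core of our generalization worry.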
---
4. Some questions we have zero confidence about
• Does the world actually need yet another browser-automation stack?
• Do “Browser Agents” have long-term practical value at all?
• Do coroutine-style concurrency models provide real benefits under heavy CDP I/O?
• Should we drop the “agent” layer entirely and just build a runtime?
• What fatal issues exist around resource isolation, multi-tenancy, event storms, or long-tail page behaviors?
• Do all high-concurrency browser runtimes eventually die for the same reasons?
If the answer is “yes, stop now,” we’d prefer to know early.
---
Prototype status
We’ll open-source a very early version (missing docs, missing examples, and possibly flawed designs).
Known issues include:
• Deadlocks on certain complex sites that are hard to reproduce
• CDP event reordering under high concurrency
• Worse-than-expected memory behavior
• Structure-learning module is inaccurate on non-e-commerce pages
If you’ve built systems that interact heavily with browsers, automate them at scale, extract data from them, or treat the browser as a runtime, we’d love to hear which bottlenecks you hit—so we don’t optimize in the wrong direction.
---
Finally
Any single sentence of criticism may save us months.
— Browser4 Team
Comments
grizzles•39m ago
Open source it and you'll get all the feedback you desire.