Show HN: Open-source browser for AI agents

https://github.com/theredsix/agent-browser-protocol

32•theredsix•3h ago

Hi HN, I forked Chromium and built agent-browser-protocol (ABP) after noticing that most browser-agent failures aren’t really about the model misunderstanding the page. Instead, the problem is that the model is reasoning from stale state.

ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc.), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends them back to the agent along with a screenshot of the frozen page state.

The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.
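To make that loop concrete, here is a rough sketch in Python. None of these names come from ABP: `FakeBrowser`, `StepResult`, `act`, and `scripted_policy` are made up for illustration, and the fake browser only records actions instead of driving a real page. The actual interface is in the repo docs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepResult:
    """What the agent gets back after one action (hypothetical shape)."""
    screenshot: bytes                                  # image of the frozen page
    events: List[str] = field(default_factory=list)   # e.g. "navigation", "alert"


class FakeBrowser:
    """Stand-in for a browser that pauses after every action (not ABP's API)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def act(self, action: str) -> StepResult:
        # A real implementation would perform the action, freeze JS and
        # rendering, then capture state. This fake only records the action
        # and pretends that every click causes a navigation.
        self.log.append(action)
        events = ["navigation"] if action.startswith("click") else []
        return StepResult(screenshot=b"<png bytes>", events=events)


def run_agent(
    browser: FakeBrowser,
    decide: Callable[[StepResult], Optional[str]],
    first_action: str,
    max_steps: int = 10,
) -> List[StepResult]:
    """Act, observe the frozen state, decide the next action, repeat."""
    history: List[StepResult] = []
    action: Optional[str] = first_action
    for _ in range(max_steps):
        if action is None:
            break
        result = browser.act(action)   # state is fresh at this point
        history.append(result)
        action = decide(result)        # the model only ever sees fresh state
    return history


def scripted_policy(actions: List[str]) -> Callable[[StepResult], Optional[str]]:
    """Toy replacement for the LLM: replays a fixed list of actions."""
    remaining = iter(actions)
    return lambda _result: next(remaining, None)
```

The point of the shape is that `decide` never runs against anything older than the result of the action it just took, which is the same turn-taking as a chat loop.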

A few common browser-use failures ABP helps eliminate:

* A modal appears after the last Playwright screenshot and blocks the input the agent was about to use
* Dynamic filters cause the page to reflow between steps
* An autocomplete dropdown opens and covers the element the agent intended to click
* alert() / confirm() interrupts the flow
* Downloads are triggered, but the agent has no reliable way to know when they’ve completed
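As a rough illustration of why a structured event summary helps with cases like these, here is a small Python sketch of a check an agent harness could run before its next action. The event labels (`dialog`, `download_started`, `download_finished`) and the `next_move` helper are assumptions made up for this example, not ABP's actual event schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """One thing that happened during the last action (hypothetical schema)."""
    kind: str         # assumed labels: "dialog", "download_started", "download_finished"
    detail: str = ""


def next_move(planned: str, events: List[Event]) -> str:
    """Pick the next step based on what the last action actually triggered."""
    kinds = [event.kind for event in events]

    # An alert()/confirm() has to be dealt with before anything else.
    if "dialog" in kinds:
        return "handle_dialog"

    # A download that started but has not reported completion is still running.
    if kinds.count("download_started") > kinds.count("download_finished"):
        return "wait_for_download"

    # Nothing surprising happened, so the planned action is still valid.
    return planned
```

For example, `next_move("click:#next", [Event("dialog", "Are you sure?")])` returns `"handle_dialog"` instead of clicking through a blocked page.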

As proof, ABP with Opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites; they just need a better tool to interact with them. Happy to answer questions about the architecture, forking Chrome, or anything else in the comments below.

Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)

Demo video: https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369

Comments

theredsix•3h ago

OP here, happy to answer any questions!

esafak•1h ago

How does it compare with https://agent-browser.dev/ ? It would be great if you could add it to your table: https://github.com/theredsix/agent-browser-protocol?#compari...

theredsix•31m ago

agent-browser's biggest selling point is a CLI wrapper around CDP/puppeteer for context management. It'll have mostly the same pros/cons as CDP on the table.

giancarlostoro•2h ago

Interesting, I wonder if this would help with other projects too. One that comes to mind is archivebox. I don't know if they still have the issue I'm thinking of, but archivebox's Chrome instances would eventually (as the meme goes) consume all available RAM. If freezing execution could stop that, it could be useful for more than just AI agents.

theredsix•27m ago

Yeah, I noticed CPU use goes to near zero during the pausing phase. You can also trigger pause via REST/MCP so a script can take advantage of these abilities as well.

Retr0id•1h ago

> As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark

And what does opus score with "regular" browser harnesses?

esafak•1h ago

https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderb...

Retr0id•1h ago

Hm I can't see Opus 4.6 on there

theredsix•33m ago

I tweeted at OSUNLP and they're backed up on eval validation. In the meantime, here's the benchmark repo with the saved runs and instructions on how to run it locally: https://github.com/theredsix/abp-online-mind2web-results

9wzYQbTYsAIc•54m ago

90% easy or 90% average?

theredsix•34m ago

90% average with 85.51% hard!

9wzYQbTYsAIc•32m ago

Nice! Will take a look at this for my homelab - was debating using crawl.cloudflare.com to try it out, as browser rendering was my next stretch goal.

gregpr07•43m ago

Love it! From first principles: this kinda answers the "do we really even need CDP" I always have in my head building browser use...

theredsix•35m ago

Totally, I feel that CDP was designed for a different category of automations.
