With most other tools, the model is interacting with a live browser and effectively has to reason through a stream of low-level events while the page keeps changing. We instead freeze the page, let the model request one action, execute it, allow all resulting browser events to play out, then freeze again and return one bundled response with everything that happened plus the new stable page state.
So the model isn’t chasing a moving UI or event stream. It gets one grounded step at a time. A big part of the performance gain seems to come from that holistic action envelope.
What makes the approach clean is that it improves the model's ability to interact without redefining the interface itself. Websites and the computer can still be used as they are, and nobody has to develop an entirely new interface protocol at the machine level.
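To make the freeze / act / settle / freeze cycle concrete, here is a toy Python sketch. It is not the project's actual API: `ToyPage`, `StepResult`, `step`, and the `click:add-to-cart` action string are all hypothetical names made up for illustration, and the "browser" is just a dict plus a queue of pending events.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StepResult:
    """One bundled response: what happened plus the settled page state."""
    action: str
    events: list
    state: dict


@dataclass
class ToyPage:
    """Hypothetical stand-in for a browser page (not a real browser API)."""
    state: dict = field(default_factory=lambda: {"cart": "empty", "popup": "hidden"})
    pending: deque = field(default_factory=deque)
    frozen: bool = True

    def dispatch(self, action: str) -> None:
        # An action queues follow-up events rather than applying them at once.
        if action == "click:add-to-cart":
            self.pending.append(("cart", "1 item"))
            self.pending.append(("popup", "visible"))

    def settle(self) -> list:
        # Drain every queued event so the page reaches a stable state.
        log = []
        while self.pending:
            key, value = self.pending.popleft()
            self.state[key] = value
            log.append(f"{key} -> {value}")
        return log


def step(page: ToyPage, action: str) -> StepResult:
    page.frozen = False           # unfreeze for exactly one action
    page.dispatch(action)         # execute the requested action
    events = page.settle()        # let all resulting events play out
    page.frozen = True            # freeze again before replying
    return StepResult(action, events, dict(page.state))


page = ToyPage()
result = step(page, "click:add-to-cart")
print(result.events)  # ['cart -> 1 item', 'popup -> visible']
print(result.state)   # {'cart': '1 item', 'popup': 'visible'}
```

The point of the sketch is the shape of the reply: the caller never sees the two intermediate events as separate messages, only one envelope with the event log and the final state.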
I already have it set up for local Claude agent use and am seeing significant improvement in both accuracy and task efficiency: `claude mcp add browser -- npx -y agent-browser-protocol@rc --mcp`
If you also want to use it with Claude Desktop, add the following to your `claude_desktop_config.json` after installing the MCP server:
```
"mcpServers": {
  "browser": {
    "command": "npx",
    "args": ["-y", "agent-browser-protocol@rc", "--mcp"]
  }
}
```
theredsix•8h ago
The browser shows the model the current page, the model chooses the next action, and the browser returns the new state. Between steps, JavaScript and time are frozen so the page stays still while the model thinks.
That makes things like ecommerce shopping and popup-heavy web app workflows much more reliable.
Using this setup, the project gets ~90% on Online Mind2Web. My bet is that browser agents need a protocol designed for models, not just wrappers around CDP.
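One way to picture "time is frozen between steps" is a virtual clock whose timers fire only when something explicitly advances it. The sketch below is an illustration of that idea only; `FrozenClock`, `set_timeout`, and `advance` are hypothetical names, and nothing here is taken from the project's implementation.

```python
import heapq


class FrozenClock:
    """Hypothetical virtual clock: time moves only when advance() is called."""

    def __init__(self):
        self.now = 0
        self._timers = []  # min-heap of (due_time_ms, sequence, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def set_timeout(self, delay_ms, callback):
        heapq.heappush(self._timers, (self.now + delay_ms, self._seq, callback))
        self._seq += 1

    def advance(self, ms):
        # Run every timer that falls due within the window, in due-time order.
        deadline = self.now + ms
        while self._timers and self._timers[0][0] <= deadline:
            due, _, callback = heapq.heappop(self._timers)
            self.now = due
            callback()
        self.now = deadline


clock = FrozenClock()
fired = []
clock.set_timeout(500, lambda: fired.append("popup"))

# While the model "thinks", nobody calls advance(), so nothing fires.
print(fired)         # []

clock.advance(1000)  # the next step lets time run again
print(fired)         # ['popup']
```

However long the model takes to pick an action, a popup scheduled for 500 ms cannot appear halfway through its reasoning, because wall-clock time is not what drives the timers.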