There are several existing approaches to AI-driven browsing (Atlas, Comet, Claude in Chrome, Dia, Nanobrowser, browserOS). I built this because I wanted something with different trade-offs:
Zero dependencies: pure Chrome DevTools Protocol. No Playwright, LangChain, or browser-use.
Use existing subscriptions: works with Claude Pro/Max or ChatGPT Pro via their official CLI tools. No API keys or per-call billing.
Adversarial-site support: explicit handling for anti-bot measures, captchas, and brittle form flows. Tested on job applications, Gmail automation, and other sites that break most agents.
Hybrid perception: accessibility tree for semantic structure + screenshots for visual-only elements.
Model-agnostic: works with Claude, GPT, Gemini, Mistral, Qwen, or any OpenRouter-compatible model.
Single-agent architecture with human-like mouse movement, using your existing Chrome profile.
The demo shows it completing job applications on sites with captchas and bot protections, which is where many browser agents fail.
Would appreciate feedback on failure cases or sites it struggles with.