Show HN: Sentinel – browser agent using 3x+ fewer tokens (open benchmark)

https://github.com/ArasHuseyin/browser-agent-benchmark

1•isoldex•31m ago

Comments

isoldex•31m ago

Hi HN - author of Sentinel here.

I built Sentinel after using browser-use and Stagehand on a client project and hitting two recurring issues: flaky reliability on multi-step flows, and token costs that ate the budget on anything non-trivial. I suspected the root cause was architectural - both lean on the LLM re-reading large portions of the page each step - and tried Chrome's Accessibility Object Model (AOM) as the observation layer instead.
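To make the observation-layer idea concrete, here is a minimal sketch of why an accessibility-tree view can be far smaller than the raw page. It is not Sentinel's actual code: the node shape (`role`, `name`, `children`), the `INTERACTIVE_ROLES` filter, and the `flatten` helper are all assumptions chosen for illustration.

```python
# Hypothetical sketch: turn an accessibility-style tree into a compact,
# indexed list of actionable elements. The node shape and the role filter
# are assumptions for illustration, not Sentinel's real data model.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def flatten(node, out=None):
    """Depth-first walk that keeps only nodes with an interactive role."""
    if out is None:
        out = []
    role = node.get("role")
    name = node.get("name", "")
    if role in INTERACTIVE_ROLES:
        # The index gives the model a short handle to refer back to.
        out.append(f'[{len(out)}] {role} "{name}"')
    for child in node.get("children", []):
        flatten(child, out)
    return out


# Toy tree standing in for a login page.
tree = {
    "role": "generic",
    "children": [
        {"role": "heading", "name": "Login"},
        {"role": "textbox", "name": "Email"},
        {"role": "generic", "children": [{"role": "button", "name": "Sign in"}]},
    ],
}

lines = flatten(tree)
print("\n".join(lines))
# [0] textbox "Email"
# [1] button "Sign in"
```

The model would see two short lines instead of the page markup; whether that matches what Sentinel actually sends is something to check in the repo.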

To check whether that architectural choice actually mattered, I built a 9-task benchmark comparing Sentinel, Stagehand, and browser-use on the same Gemini 3 Flash Preview model, with the same prompts and the same programmatic validators, at 5 runs per task-tool combo. Raw per-run JSON is committed so you can recompute or challenge every number.

Headline numbers:

- Tokens: Sentinel uses 3.1x-56.9x fewer than browser-use, 1.4x-13.3x fewer than Stagehand.
- Reliability: Sentinel 100% (45/45), browser-use 100% (45/45), Stagehand 86.7% (39/45).
- Speed: Sentinel is fastest on 5 of 9 tasks.
- The harder the task, the bigger the token gap.
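For anyone recomputing these figures from the raw per-run JSON, the arithmetic might look like the sketch below. The record fields (`tool`, `task`, `success`, `tokens`) are assumed names, not the repo's confirmed schema, and the sample values are placeholders rather than benchmark results. Taking the ratio of per-task mean tokens is also an assumption about how the "Nx fewer" figures were derived.

```python
from statistics import mean

# Placeholder records; field names and values are illustrative assumptions,
# not data from the benchmark repository.
runs = [
    {"tool": "sentinel", "task": "t1", "success": True, "tokens": 1000},
    {"tool": "sentinel", "task": "t1", "success": True, "tokens": 1200},
    {"tool": "browser-use", "task": "t1", "success": True, "tokens": 3300},
    {"tool": "browser-use", "task": "t1", "success": False, "tokens": 3300},
]


def success_rate(runs, tool):
    """Fraction of a tool's runs that passed their validator."""
    outcomes = [r["success"] for r in runs if r["tool"] == tool]
    return sum(outcomes) / len(outcomes)


def mean_tokens(runs, tool, task):
    """Average token count for one tool on one task."""
    return mean(r["tokens"] for r in runs if r["tool"] == tool and r["task"] == task)


def token_ratio(runs, baseline, tool, task):
    """How many times more tokens the baseline used than the tool on a task."""
    return mean_tokens(runs, baseline, task) / mean_tokens(runs, tool, task)


print(success_rate(runs, "browser-use"))
print(token_ratio(runs, "browser-use", "sentinel", "t1"))
```

As a sanity check on the reported reliability figure, 39 successes out of 45 runs rounds to 86.7%.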

Caveats up front:

- I built Sentinel - treat this as a starting point for your own verification, not an impartial survey. README has a full known-limitations section.
- Single model (Gemini 3 Flash Preview, which is also Stagehand's documented recommendation).
- 9 tasks is small; raw JSON is there if you want to add tasks or rerun on a different model.
- Each framework is used with its idiomatic API (Sentinel/Stagehand: discrete act()/extract(); browser-use: agent-loop prompt). Forcing them into the same call pattern would disadvantage whichever is optimized for the other.

Sentinel is already in production with paying clients (all self-hosted), which covers development costs. A managed offering is on the table if there's real demand: you'd pay infra + model usage at cost, no margin. Drop a comment if that would unblock you, otherwise I'd rather not maintain hosting nobody needs.
