Show HN: Sentinel – browser agent using 3x+ fewer tokens (open benchmark)

https://github.com/ArasHuseyin/browser-agent-benchmark
1•isoldex•31m ago

Comments

isoldex•31m ago
Hi HN - author of Sentinel here.

I built Sentinel after using browser-use and Stagehand on a client project and hitting two recurring issues: flaky reliability on multi-step flows, and token costs that ate the budget on anything non-trivial. I suspected the root cause was architectural - both lean on the LLM re-reading large portions of the page each step - and tried Chrome's Accessibility Object Model (AOM) as the observation layer instead.

To check whether that architectural choice actually mattered, I built a 9-task benchmark comparing Sentinel, Stagehand, and browser-use on the same Gemini 3 Flash Preview model, with the same prompts and the same programmatic validators, at 5 runs per task-tool combination. Raw per-run JSON is committed so you can recompute or challenge every number.
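
For anyone who wants to recompute the numbers from the per-run JSON, here is a minimal sketch of the aggregation in Python. The record fields (`tool`, `task`, `success`, `tokens`) are assumptions for illustration, not the repo's actual schema, and the sample records are placeholders rather than benchmark data; adjust the keys to whatever the committed files use.

```python
from collections import defaultdict
from statistics import mean


def success_rate(records):
    """Return {tool: (passed, total)} counted over every run."""
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        counts[r["tool"]][1] += 1          # total runs for this tool
        if r["success"]:
            counts[r["tool"]][0] += 1      # runs the validator accepted
    return {tool: tuple(c) for tool, c in counts.items()}


def mean_tokens(records):
    """Return {(tool, task): mean token count over that pair's runs}."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["tool"], r["task"])].append(r["tokens"])
    return {key: mean(values) for key, values in buckets.items()}


# Placeholder records with a hypothetical schema, NOT real benchmark data.
records = [
    {"tool": "tool_a", "task": "t1", "success": True, "tokens": 100},
    {"tool": "tool_a", "task": "t1", "success": True, "tokens": 300},
    {"tool": "tool_b", "task": "t1", "success": False, "tokens": 900},
]

print(success_rate(records))
print(mean_tokens(records))
```

Grouping by (tool, task) before averaging keeps one long task from dominating a tool's overall token figure.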

Headline numbers:
- Tokens: Sentinel uses 3.1x-56.9x fewer than browser-use, 1.4x-13.3x fewer than Stagehand.
- Reliability: Sentinel 100% (45/45), browser-use 100% (45/45), Stagehand 86.7% (39/45).
- Speed: Sentinel is fastest on 5 of 9 tasks.
- The harder the task, the bigger the token gap.
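
The arithmetic behind the reliability figures can be checked in a few lines: 9 tasks times 5 runs gives 45 runs per tool, and 39 of 45 rounds to 86.7%. The sketch below also shows one plausible reading of an "Nx fewer tokens" factor, as baseline tokens divided by candidate tokens. That definition and the helper names are assumptions; the post does not spell out the exact formula.

```python
TASKS = 9
RUNS_PER_TASK = 5
RUNS_PER_TOOL = TASKS * RUNS_PER_TASK  # 45 runs for each tool


def pass_percentage(passed, total):
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


def fewer_factor(baseline_tokens, candidate_tokens):
    """Assumed definition of 'Nx fewer': baseline divided by candidate."""
    return baseline_tokens / candidate_tokens


print(RUNS_PER_TOOL)
print(pass_percentage(45, RUNS_PER_TOOL))
print(pass_percentage(39, RUNS_PER_TOOL))
# Placeholder token counts, not taken from the benchmark.
print(fewer_factor(600, 200))
```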

Caveats up front:
- I built Sentinel - treat this as a starting point for your own verification, not an impartial survey. README has a full known-limitations section.
- Single model (Gemini 3 Flash Preview, which is also Stagehand's documented recommendation).
- 9 tasks is small; raw JSON is there if you want to add tasks or rerun on a different model.
- Each framework is used with its idiomatic API (Sentinel/Stagehand: discrete act()/extract(); browser-use: agent-loop prompt). Forcing them into the same call pattern would disadvantage whichever is optimized for the other.

Sentinel is already in production with paying clients (all self-hosted), which covers development costs. A managed offering is on the table if there's real demand: you'd pay infra + model usage at cost, no margin. Drop a comment if that would unblock you; otherwise I'd rather not maintain hosting nobody needs.
