Show HN: rtrvr.ai – New Free SOTA AI Web Agent Beats Even Operator

https://www.rtrvr.ai/blog/web-bench-results

8•arjunchint•7mo ago

We just benchmarked our agent, rtrvr.ai, on the Halluminate (YC S25) Web Bench, and it set a new state of the art with an 81% success rate. For perspective, this surpasses not only all other autonomous agents but also the human-intervention baseline of OpenAI's Operator (76.5%).

It also completes tasks an astonishing 7x faster than the next leading alternative.

This isn't just an incremental improvement; it's a validation of our core architectural philosophy. Our performance stems from two key differentiators:

- Local-First Operation: As a Chrome Extension, rtrvr.ai operates directly within the user's browser. This eliminates the latency, bot detection and access issues that plague cloud browser agents.

- DOM-Based Interaction: Instead of relying on brittle visual parsing (CUA), our agent interacts directly with the page's HTML structure, which lets it skip clicks and stay resilient to pop-ups and overlays. We can also use the latest and fastest models, such as Gemini Flash, for superior performance.
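
To make the DOM-based idea concrete, here is a minimal sketch in Python. It is not rtrvr.ai's actual implementation; the class name, function name, tag list, and sample page are all hypothetical. It only illustrates why reading the HTML structure is unaffected by overlays: the interactive elements are enumerated from the markup itself, so a pop-up that covers a button on screen does not hide it from the agent.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags an agent might treat as actionable.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements from raw HTML, ignoring visual layout."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # element whose text content is being collected

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            entry = {
                "index": len(self.elements),
                "tag": tag,
                "attrs": dict(attrs),
                "text": "",
            }
            self.elements.append(entry)
            # <input> is a void element, so it has no text content to collect.
            if tag != "input":
                self._open = entry

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            # Collapse whitespace so the label is compact.
            self._open["text"] = " ".join(self._open["text"].split())
            self._open = None


def list_interactive_elements(html):
    """Return a flat, indexed list of actionable elements found in the HTML."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


# Hypothetical page: an overlay sits in the markup, but the form is still reachable.
SAMPLE_HTML = (
    '<div class="overlay">Subscribe to our newsletter!</div>'
    '<form><input name="q" type="text"><button id="go"> Search </button></form>'
    '<a href="/next">Next page</a>'
)

if __name__ == "__main__":
    for element in list_interactive_elements(SAMPLE_HTML):
        print(element["index"], element["tag"], element["attrs"], element["text"])
```

A compact, indexed list like this is far smaller than a screenshot, which is one plausible reason a fast text model can act on it; a real agent would also need to handle dynamic content, iframes, and shadow DOM, which this sketch ignores.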

This leads to a critical industry insight: Cloud Browser Agents are not a viable long-term solution for reliable web automation.

Our benchmark analysis shows that over 94% of rtrvr.ai's failures were "agent errors" (fixable AI logic), while only 5% were "infrastructure errors." For cloud agents, this ratio is often inverted. You can't build a reliable agent if you can't even guarantee access to the environment.

Finally, it cost us only ~$40 to run this benchmark, whereas we estimate that Halluminate spent more than $1k in infrastructure costs for each agent it evaluated.

The future of web automation won't be fought from remote data centers. It will be run symbiotically from your browser. Our results are the first major data point proving this thesis and putting the first nail in the coffin for cloud browser agents.

Full Report: https://www.rtrvr.ai/blog/web-bench-results

Or, if you just want some Agentic-SMR of a web agent doing tasks online, tune into the playlist: https://www.youtube.com/watch?v=HWPZI8PjuLY&list=PL5rk1YARPB...

Try out the magic of a working web agent yourself; install it at: https://chromewebstore.google.com/detail/rtrvrai-ai-web-agen...

Bring your own API key from ai.studio and use Google's Gemini Free Tier to run our web agent for free! We literally have a button that gets our agent to open AI Studio, create a key, and configure itself, all automatically.

Comments

quarkcarbon279•7mo ago

How did you keep your costs so low? Eval costs, especially with agents, can climb quickly. What did it cost for the other agents?

arjunchint•7mo ago

We directly leverage the user's own browser, so there are no cloud browser hosting or proxying costs! We averaged only $0.10/task.

The whole idea of cloud browser agents is a stupid paradigm. The agents are not only 7x slower but also incur hosting and proxying costs for that extra time!

Our biggest cost is just LLM inference, so we can let our users bring their own API key and use our service for free!