frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Level Up Your Gaming

https://d4.h5go.life/
1•LinkLens•54s ago•1 comments

Di.day is a movement to encourage people to ditch Big Tech

https://itsfoss.com/news/di-day-celebration/
1•MilnerRoute•2m ago•0 comments

Show HN: AI generated personal affirmations playing when your phone is locked

https://MyAffirmations.Guru
1•alaserm•3m ago•0 comments

Show HN: GTM MCP Server- Let AI Manage Your Google Tag Manager Containers

https://github.com/paolobietolini/gtm-mcp-server
1•paolobietolini•4m ago•0 comments

Launch of X (Twitter) API Pay-per-Use Pricing

https://devcommunity.x.com/t/announcing-the-launch-of-x-api-pay-per-use-pricing/256476
1•thinkingemote•4m ago•0 comments

Facebook seemingly randomly bans tons of users

https://old.reddit.com/r/facebookdisabledme/
1•dirteater_•5m ago•1 comments

Global Bird Count

https://www.birdcount.org/
1•downboots•6m ago•0 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
2•soheilpro•8m ago•0 comments

Jon Stewart – One of My Favorite People – What Now? With Trevor Noah Podcast [video]

https://www.youtube.com/watch?v=44uC12g9ZVk
1•consumer451•10m ago•0 comments

P2P crypto exchange development company

1•sonniya•23m ago•0 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
1•jesperordrup•28m ago•0 comments

Write for Your Readers Even If They Are Agents

https://commonsware.com/blog/2026/02/06/write-for-your-readers-even-if-they-are-agents.html
1•ingve•29m ago•0 comments

Knowledge-Creating LLMs

https://tecunningham.github.io/posts/2026-01-29-knowledge-creating-llms.html
1•salkahfi•29m ago•0 comments

Maple Mono: Smooth your coding flow

https://font.subf.dev/en/
1•signa11•36m ago•0 comments

Sid Meier's System for Real-Time Music Composition and Synthesis

https://patents.google.com/patent/US5496962A/en
1•GaryBluto•44m ago•1 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
5•keepamovin•45m ago•1 comments

Show HN: Empusa – Visual debugger to catch and resume AI agent retry loops

https://github.com/justin55afdfdsf5ds45f4ds5f45ds4/EmpusaAI
1•justinlord•47m ago•0 comments

Show HN: Bitcoin wallet on NXP SE050 secure element, Tor-only open source

https://github.com/0xdeadbeefnetwork/sigil-web
2•sickthecat•50m ago•1 comments

White House Explores Opening Antitrust Probe on Homebuilders

https://www.bloomberg.com/news/articles/2026-02-06/white-house-explores-opening-antitrust-probe-i...
1•petethomas•50m ago•0 comments

Show HN: MindDraft – AI task app with smart actions and auto expense tracking

https://minddraft.ai
2•imthepk•55m ago•0 comments

How do you estimate AI app development costs accurately?

1•insights123•56m ago•0 comments

Going Through Snowden Documents, Part 5

https://libroot.org/posts/going-through-snowden-documents-part-5/
1•goto1•56m ago•0 comments

Show HN: MCP Server for TradeStation

https://github.com/theelderwand/tradestation-mcp
1•theelderwand•59m ago•0 comments

Canada unveils auto industry plan in latest pivot away from US

https://www.bbc.com/news/articles/cvgd2j80klmo
3•breve•1h ago•1 comments

The essential Reinhold Niebuhr: selected essays and addresses

https://archive.org/details/essentialreinhol0000nieb
1•baxtr•1h ago•0 comments

Rentahuman.ai Turns Humans into On-Demand Labor for AI Agents

https://www.forbes.com/sites/ronschmelzer/2026/02/05/when-ai-agents-start-hiring-humans-rentahuma...
1•tempodox•1h ago•0 comments

StovexGlobal – Compliance Gaps to Note

1•ReviewShield•1h ago•1 comments

Show HN: Afelyon – Turns Jira tickets into production-ready PRs (multi-repo)

https://afelyon.com/
1•AbduNebu•1h ago•0 comments

Trump says America should move on from Epstein – it may not be that easy

https://www.bbc.com/news/articles/cy4gj71z0m0o
7•tempodox•1h ago•4 comments

Tiny Clippy – A native Office Assistant built in Rust and egui

https://github.com/salva-imm/tiny-clippy
1•salvadorda656•1h ago•0 comments
Open in hackernews

Pure-vision browser agent scores 94% on WebVoyager (SOTA)

https://github.com/magnitudedev/webvoyager
5•anerli•7mo ago

Comments

anerli•7mo ago
Hey HN, Anders and Tom from Magnitude (YC S25) here. On our last Show HN post about our open-source browser agent, someone left a comment - "there are multiple similar projects like this posted here daily, and this one likely isn't the best". So we asked ourselves, are they right? We decided to run on WebVoyager (a well known benchmark for browser agents) to test ourselves. We scored 94%, beating all other browser agents and making Magnitude state-of-the-art.

You can view the entire run here: https://magnitude-webvoyager.vercel.app/

The original WebVoyager benchmark was meant to demonstrate a new technique for interacting with the browser by annotating the DOM. Since then, vision models have come a long way in terms of accuracy and visual understanding. Our pure-vision approach with our framework and today's models surpasses the hybrid DOM strategies used by the original WebVoyager paper and other agents like browser-use.

So why does pure-vision beat hybrid DOM approaches?

- Generalizes far better - handles canvas elements, iframes, drag-and-drop, precise text selection, and many other scenarios elegantly where hybrid DOM would struggle and need to implement hacks for those cases to work

- Easier for the LLM - we think LLM performance is roughly proportional to prompt clarity. If the prompt contains a crowded screenshot with loads of colored boxes + a long list of element labels and is asked to pick one, vs given a clean screenshot + where do you want to click - the latter seems far easier

We believe another reason for our success is that we can still hook into the browser as needed. We can use browser-native actions like tab switching, can look at network traffic to know when a page is ready, or use the DOM for other purposes like data extraction. Computer use agents like Operator or Claude Computer Use on the other hand are limited to generic mouse and keyboard controls.

It's worth mentioning that WebVoyager is a strange and flawed benchmark. It contains many tasks that depend on the current date (and need their dates updated), tasks that depend on the time of day, and some tasks that are impossible or too ambiguous to properly evaluate. In the repo we detailed exactly the patches we made to the original WebVoyager benchmark such that each task is at least theoretically possible.

Why does this all matter? People are trying to adopt agents for real use cases, but they often fail to make it to production. We want to enable developers to build with production-ready browser agents - which is why it's important to get the fundamental interaction paradigm right. We think this benchmark is a step in the right direction, showing that pure-vision has best-in-class performance in the browser domain. Curious to hear what others think about this, would love to get your feedback!