Pure-vision browser agent scores 94% on WebVoyager (SOTA)

https://github.com/magnitudedev/webvoyager
5•anerli•7mo ago

Comments

anerli•7mo ago
Hey HN, Anders and Tom from Magnitude (YC S25) here. On our last Show HN post about our open-source browser agent, someone left a comment - "there are multiple similar projects like this posted here daily, and this one likely isn't the best". So we asked ourselves, are they right? We decided to run on WebVoyager (a well-known benchmark for browser agents) to test ourselves. We scored 94%, beating all other browser agents and making Magnitude state-of-the-art.

You can view the entire run here: https://magnitude-webvoyager.vercel.app/

The original WebVoyager benchmark was meant to demonstrate a new technique for interacting with the browser by annotating the DOM. Since then, vision models have come a long way in terms of accuracy and visual understanding. Our pure-vision approach with our framework and today's models surpasses the hybrid DOM strategies used by the original WebVoyager paper and other agents like browser-use.

So why does pure-vision beat hybrid DOM approaches?

- Generalizes far better - handles canvas elements, iframes, drag-and-drop, precise text selection, and many other scenarios elegantly, where hybrid DOM approaches struggle and need case-specific hacks to work

- Easier for the LLM - we think LLM performance is roughly proportional to prompt clarity. Compare a prompt with a crowded screenshot covered in colored boxes plus a long list of element labels, where the model is asked to pick one, against a clean screenshot and the question "where do you want to click?" - the latter seems far easier
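
To make the prompt-clarity point concrete, here is a minimal sketch of the two prompt shapes. Everything in it is an assumption for illustration: the `Element` shape, the prompt wording, and the `to_viewport` helper are hypothetical and are not taken from Magnitude or browser-use.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One annotated DOM element, as a hybrid-DOM agent might list it (hypothetical shape)."""
    label: int
    tag: str
    text: str


def build_dom_prompt(task: str, elements: list[Element]) -> str:
    """Hybrid-DOM style: the model must pick one label out of a long list."""
    lines = [f"[{e.label}] <{e.tag}> {e.text}" for e in elements]
    return (
        f"Task: {task}\n"
        "The screenshot is annotated with numbered boxes.\n"
        "Elements:\n" + "\n".join(lines) + "\n"
        "Reply with the label of the element to click."
    )


def build_vision_prompt(task: str) -> str:
    """Pure-vision style: clean screenshot, model answers with pixel coordinates."""
    return (
        f"Task: {task}\n"
        "Reply with the x,y pixel coordinates to click on the screenshot."
    )


def to_viewport(
    x: float,
    y: float,
    shot_size: tuple[int, int],
    viewport_size: tuple[int, int],
) -> tuple[int, int]:
    """Map coordinates from a (possibly downscaled) screenshot to the real viewport."""
    sx = viewport_size[0] / shot_size[0]
    sy = viewport_size[1] / shot_size[1]
    return round(x * sx), round(y * sy)
```

The DOM-style prompt grows with every element on the page, while the vision-style prompt stays the same size regardless of page complexity. The coordinate mapping is the extra step a pure-vision agent needs whenever the screenshot sent to the model is not at viewport resolution.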

We believe another reason for our success is that we can still hook into the browser as needed. We can use browser-native actions like tab switching, can look at network traffic to know when a page is ready, or use the DOM for other purposes like data extraction. Computer use agents like Operator or Claude Computer Use, on the other hand, are limited to generic mouse and keyboard controls.
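
As a rough illustration of "look at network traffic to know when a page is ready", one common heuristic is to wait until no requests are in flight and the network has been quiet for a short period. The sketch below is an assumption about how such a check could look, not Magnitude's implementation; the class name `NetworkIdleTracker` and the 0.5 second default are hypothetical. In a real agent the two callbacks would be wired to the browser automation library's request and response events.

```python
class NetworkIdleTracker:
    """Tracks in-flight requests; the page counts as ready after a quiet period."""

    def __init__(self, quiet_seconds: float = 0.5) -> None:
        self.quiet_seconds = quiet_seconds  # hypothetical default, tune per site
        self.in_flight = 0
        self.last_activity = 0.0

    def on_request(self, now: float) -> None:
        """Call when the browser starts a request."""
        self.in_flight += 1
        self.last_activity = now

    def on_response(self, now: float) -> None:
        """Call when a request finishes or fails."""
        self.in_flight = max(0, self.in_flight - 1)
        self.last_activity = now

    def is_ready(self, now: float) -> bool:
        """Ready when nothing is in flight and the network has been quiet long enough."""
        quiet_for = now - self.last_activity
        return self.in_flight == 0 and quiet_for >= self.quiet_seconds
```

Timestamps are passed in explicitly so the logic can be exercised without a browser. An agent limited to mouse and keyboard controls has no access to these events and would have to fall back on fixed delays or on comparing consecutive screenshots.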

It's worth mentioning that WebVoyager is a strange and flawed benchmark. It contains many tasks that depend on the current date (and need their dates updated), tasks that depend on the time of day, and some tasks that are impossible or too ambiguous to properly evaluate. In the repo we detailed exactly the patches we made to the original WebVoyager benchmark such that each task is at least theoretically possible.
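
The actual patches are documented in the repo. Purely as an illustration of what "updating dates" can involve, here is a hedged sketch that rolls stale dates in a task description forward until they lie in the future. It assumes dates appear in `YYYY-MM-DD` form, which is an assumption for this example and not necessarily how WebVoyager tasks are written; the function names are hypothetical.

```python
import re
from datetime import date

# Assumption: dates in task text look like 2023-12-25.
_DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def roll_forward(d: date, today: date) -> date:
    """Move a past date forward by whole years until it is after `today`."""
    year = d.year
    while True:
        try:
            candidate = d.replace(year=year)
        except ValueError:  # Feb 29 in a non-leap year
            candidate = date(year, 2, 28)
        if candidate > today:
            return candidate
        year += 1


def patch_task_dates(task: str, today: date) -> str:
    """Rewrite every YYYY-MM-DD date in the task text so it lies in the future."""

    def _sub(match: re.Match) -> str:
        try:
            original = date(*(int(g) for g in match.groups()))
        except ValueError:
            return match.group(0)  # not a real calendar date, leave untouched
        return roll_forward(original, today).isoformat()

    return _DATE_RE.sub(_sub, task)
```

For example, with `today = date(2025, 7, 1)`, the task "Book a hotel for 2023-12-25." becomes "Book a hotel for 2025-12-25.", while a date that is already in the future is left unchanged.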

Why does this all matter? People are trying to adopt agents for real use cases, but they often fail to make it to production. We want to enable developers to build with production-ready browser agents - which is why it's important to get the fundamental interaction paradigm right. We think this benchmark is a step in the right direction, showing that pure-vision has best-in-class performance in the browser domain. Curious to hear what others think about this, would love to get your feedback!