Vision agents vs. structured APIs on the same internal tool task

5•FirestarAlpha•1h ago
Vision agents (browser-use, computer-use) are the default way to let AI agents operate web apps that have no API. The alternative is to write an MCP server or REST API for each app, but every app needs its own, and enterprise teams often have 20+ internal tools.

We ran both agents on a Reflex port of a React demo (a small business's admin panel). The task was to find the "Smith" with the most orders, accept their pending reviews, and mark their most recent order as delivered.
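For readers who want the task pinned down precisely, here is a minimal sketch of its logic over in-memory records. The data model (`Customer`, `Order`, `Review`, the status strings) and the helper `run_task` are hypothetical; the post does not describe the admin panel's schema or the auto-generated endpoints, so this shows only what the API agent has to compute, not the calls it actually made.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical records standing in for the admin panel's data (not the real schema).
@dataclass
class Customer:
    id: int
    name: str


@dataclass
class Order:
    id: int
    customer_id: int
    placed: date
    status: str  # e.g. "pending", "shipped", "delivered"


@dataclass
class Review:
    id: int
    customer_id: int
    status: str  # "pending" or "accepted"


def run_task(customers, orders, reviews, surname="Smith"):
    """Hypothetical helper: pick the customer with `surname` who has the most
    orders, accept their pending reviews, mark their latest order delivered."""
    matches = [c for c in customers if c.name.split()[-1] == surname]
    if not matches:
        raise LookupError(f"no customer with surname {surname!r}")
    # Count orders per matching customer; ties go to the first match.
    target = max(matches, key=lambda c: sum(o.customer_id == c.id for o in orders))
    # Accept every pending review left by that customer.
    for r in reviews:
        if r.customer_id == target.id and r.status == "pending":
            r.status = "accepted"
    # Mark the customer's most recently placed order as delivered.
    their_orders = [o for o in orders if o.customer_id == target.id]
    if not their_orders:
        raise LookupError(f"{target.name} has no orders")
    latest = max(their_orders, key=lambda o: o.placed)
    latest.status = "delivered"
    return target, latest


# Usage with made-up sample data.
customers = [Customer(1, "Ann Smith"), Customer(2, "Bob Smith"), Customer(3, "Cy Jones")]
orders = [
    Order(10, 1, date(2024, 1, 5), "shipped"),
    Order(11, 2, date(2024, 2, 1), "shipped"),
    Order(12, 2, date(2024, 3, 9), "pending"),
    Order(13, 3, date(2024, 3, 10), "pending"),
]
reviews = [Review(20, 2, "pending"), Review(21, 2, "accepted"), Review(22, 1, "pending")]
target, latest = run_task(customers, orders, reviews)
print(target.name, latest.id, latest.status)  # Bob Smith 12 delivered
```

With structured access, the "which Smith has the most orders" step is a single count over records; a vision agent has to get the same answer by reading tables from screenshots.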

Results (medians, n=5 API / n=3 vision):

- Vision agent: 47 steps, 495k tokens, ~14 min
- API agent: 8 calls, 12k tokens, 19.7s
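To make the gap concrete, the medians above can be turned into rough ratios. This is back-of-the-envelope arithmetic on the reported medians only. "~14 min" is approximated as 840 s, and vision "steps" and API "calls" are not identical units, so treat the per-step figures as indicative.

```python
# Medians as reported in the post; "~14 min" approximated as 840 s.
vision = {"steps": 47, "tokens": 495_000, "seconds": 14 * 60}
api = {"steps": 8, "tokens": 12_000, "seconds": 19.7}

token_ratio = vision["tokens"] / api["tokens"]      # how many times more tokens
time_ratio = vision["seconds"] / api["seconds"]     # how many times longer
tokens_per_step_vision = vision["tokens"] / vision["steps"]
tokens_per_step_api = api["tokens"] / api["steps"]

print(f"tokens: {token_ratio:.0f}x, wall time: {time_ratio:.0f}x")
print(f"tokens per step: vision {tokens_per_step_vision:,.0f}, API {tokens_per_step_api:,.0f}")
# tokens: 41x, wall time: 43x
# tokens per step: vision 10,532, API 1,500
```

Roughly 40x on both tokens and time, and each vision step costs about 7x more tokens than an API call, which is consistent with every round-trip carrying a screenshot.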

Given only the abstract task, the vision agent failed. It completed the task only after we supplied a 14-step UI walkthrough, and even then it made 47 round-trips, each carrying a full-page screenshot.

Variance across the vision runs was wide (853–1296s, 407k–751k tokens), so a single run isn't representative; the API runs were tightly clustered. This is the cost of being lazy about building an agent-friendly interface.

The endpoints the API agent used (Path B) were auto-generated by a plugin that shipped in Reflex 0.9 this week. Full methodology: https://reflex.dev/blog/vision-agents-vs-api-calls/

Benchmarking Local LLM/Harness Combinations

https://neuralnoise.com///2026/harness-bench-wip/
1•pminervini•2m ago•0 comments

Cyborg Evals

https://www.lesswrong.com/posts/zctBgvzxamFThgc3T/cyborg-evals
1•frmsaul•2m ago•1 comment

Real Linux. In a browser tab. No install. No server. No Docker

https://linuxontab.com/
1•kilian-ai•3m ago•0 comments

The Evolution of Open Source with Kelsey Hightower [video]

https://www.youtube.com/watch?v=a5-zTLJprpU
1•mooreds•4m ago•0 comments

Anthropic wants to be the AWS of agentic AI

https://thenewstack.io/anthropic-agents-managed-aws-claude/
1•Brajeshwar•4m ago•0 comments

Tess Observations

https://tess.mit.edu/
1•mooreds•4m ago•0 comments

What is Windows K2? Inside Microsoft's big plan to save Windows 11

https://www.windowscentral.com/microsoft/windows-11/what-is-windows-k2-everything-you-need-to-kno...
1•robotnikman•5m ago•0 comments

What Happens in the First 24 Hours After a New Asset Goes Live

https://www.bleepingcomputer.com/news/security/what-happens-in-the-first-24-hours-after-a-new-ass...
1•mooreds•6m ago•0 comments

Ukraine Bets on Battlefield AI

https://apnews.com/article/russia-ukraine-war-artificial-intelligence-europe-a7d2cce367f68caa3598...
1•beezle•6m ago•0 comments

Monthly News – April 2026

https://blog.linuxmint.com/?p=5022
1•paulnpace•6m ago•0 comments

Coding agents expose this: same VPS, 3 runs, ~65% drift

https://webbynode.com/articles/coding-agents-infrastructure-vps-benchmarks
1•gsgreen•7m ago•0 comments

The Enhanced Games, Where Athletes Compete on Steroids, HGH, Adderall

https://www.vanityfair.com/news/story/inside-the-enhanced-games
2•zdw•8m ago•0 comments

Difference between good debt and bad debt

https://smartmoneyguides.quora.com/
1•hennix22•9m ago•0 comments

Digging into Claude Code and codex source codes to understand how they work

https://nimasadri11.github.io/random/annotated-agent/
1•nimasadri11•9m ago•0 comments

From items to users: Rebuilding Plaid's API in flight

https://medium.com/plaid-engineering/from-items-to-users-rebuilding-plaids-api-in-flight-8e8aa037...
2•bassoonspinach•9m ago•0 comments

Palantir's AI Targeting System Running the Iran War [video]

https://www.youtube.com/watch?v=CHLFl26p7Po
2•smallerfish•11m ago•0 comments

The Alice and Bob After Dinner Speech

https://hex.ooo/library/alicebob.html
1•tempodox•11m ago•0 comments

IBM Selectric

https://en.wikipedia.org/wiki/IBM_Selectric
2•paulpauper•11m ago•0 comments

A Year on an E-Reader

https://wombat.bearblog.dev/a-year-on-an-e-reader/
1•speckx•11m ago•0 comments

Paraconsistent Logic (Substantive Revision)

https://plato.stanford.edu/entries/logic-paraconsistent/
1•StatsAreFun•12m ago•0 comments

SFO Gate Explorer

https://www.flysfo.com/passengers/services/gate-explorer
1•CaliforniaKarl•12m ago•0 comments

Greptile's New Pricing Is Predatory

https://greptile-fail.vercel.app/
2•not-chatgpt•13m ago•0 comments

Before DevRel Was a Thing

https://meghangill.substack.com/p/before-devrel-was-a-thing
1•meghan•14m ago•0 comments

The invisible force making food less nutritious

https://www.washingtonpost.com/climate-environment/interactive/2026/carbon-pollution-diluting-key...
2•johnbarron•15m ago•0 comments

Introducing Stage: Engineers deserve a better code review platform

https://stagereview.app/blog/introducing-stage
2•cpan22•16m ago•0 comments

More Tokens Isn't More Intelligence

https://briannelee.substack.com/p/more-tokens-isnt-more-intelligence
1•BrianneLee011•18m ago•0 comments

AI On-Call Engineer That Fixes Prod While I Sleep

https://twitter.com/DVremenko/status/2049885593992126682
1•dimavrem22•18m ago•2 comments

Show HN: Milkdrop Visualizations with WASM+WebGPU [TW: flashing lights]

https://milkdrop.mahae.dev/
1•mkoh•22m ago•0 comments

Granite 4.1 LLMs: How They're Built

https://huggingface.co/blog/ibm-granite/granite-4-1
1•Brajeshwar•22m ago•0 comments

Main quests, subquests, side quests and minigames

https://stevepavlina.com/blog/2020/02/main-quest-subquest-side-quest-or-minigame/
1•highfrequency•23m ago•0 comments