We tasked 9 browser agents to shop on Amazon, only 2 picked the right product

https://www.flowtester.ai/articles/same-test-9-models
3•amoshaviv•1h ago

Comments

amoshaviv•1h ago
We ran the exact same Amazon shopping task with 9 leading AI models in the browser. Same site, same steps, same environment. Only the model changed. A few things stood out:

1. Fastest model: 70 seconds
2. Slowest model: 340 seconds
3. Cost range: $0.03 to $1.04
4. Only 2 of 9 models picked the right product!
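
To put those figures side by side, here is a minimal sketch of the spread arithmetic. It uses only the numbers quoted in the comment above; the variable names are illustrative, and nothing about the individual runs is assumed.

```python
# Figures quoted in the comment above; variable names are illustrative.
fastest_s, slowest_s = 70, 340
cheapest_usd, priciest_usd = 0.03, 1.04
correct, total = 2, 9

time_spread = slowest_s / fastest_s        # slowest run relative to fastest
cost_spread = priciest_usd / cheapest_usd  # most expensive run relative to cheapest
success_rate = correct / total             # share of models that picked the right product

print(f"time spread: {time_spread:.1f}x")
print(f"cost spread: {cost_spread:.1f}x")
print(f"success rate: {success_rate:.0%}")
```

On these numbers the slowest run took roughly 4.9 times as long as the fastest, the most expensive cost roughly 35 times the cheapest, and about 22% of the models succeeded.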

throwawayffffas•1h ago
Hm... They all got the right product: the "cheapest result". You didn't specify the cheapest laptop.

Arguably, the ones that got the laptop assumed you wanted a laptop and went against your instructions.

amoshaviv•1h ago
I see where you're coming from, but humans do tend to phrase things that way and their intentions are still understood. More importantly, the last step is:

"6. Navigate to the cart page and validate the laptop you chose is in the cart."

So one could argue inferring this is trivial.
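
The disagreement in this exchange comes down to two readings of "cheapest result". A minimal sketch of the difference follows, assuming a made-up result list: the `Listing` type, the item names, the categories, and the prices are all hypothetical and are not taken from the actual test run.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    category: str
    price: float


# Hypothetical search results, invented for illustration only.
results = [
    Listing("Laptop sleeve", "accessory", 12.99),
    Listing("Budget laptop", "laptop", 249.00),
    Listing("Premium laptop", "laptop", 1299.00),
]


def cheapest_result(items):
    """Literal reading: the lowest-priced item of any kind."""
    return min(items, key=lambda item: item.price)


def cheapest_laptop(items):
    """Inferred reading: the lowest-priced item that is actually a laptop."""
    laptops = [item for item in items if item.category == "laptop"]
    return min(laptops, key=lambda item: item.price)


print(cheapest_result(results).title)
print(cheapest_laptop(results).title)
```

Under the literal reading the accessory wins. Under the inferred reading, which the final "validate the laptop you chose" step points to, the budget laptop does.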

vova_hn2•1h ago
Why would you need powerful models if you give them such mechanical, stifling instructions?

I think the result would be much better if you told them exactly what you want in plain text.

amoshaviv•1h ago
I wanted to make sure "thinking" and "planning" features are not being tested in this comparison, but I definitely tested "simply phrased" tasks as well: https://www.flowtester.ai/shared/ce1c8ef9-f387-48be-93f0-938...

Humanity Must Win: Defending Rights, Tackling Repression at the 2026 FIFA World

https://www.amnesty.org/en/documents/ior10/0837/2026/en/
1•hkhn•38s ago•0 comments

Uvwatauavawh – Meet the Pushy String (2013)

https://www.hexacorn.com/blog/2013/05/16/uvwatauavawh-meet-the-pushy-string/
1•dryarzeg•55s ago•0 comments

Show HN: Coasts – Containerized Hosts for Agents

https://github.com/coast-guard/coasts
1•jsunderland323•1m ago•1 comment

CodingFont: A game to help you pick a coding font

https://www.codingfont.com/
2•nvahalik•3m ago•0 comments

Tests Aren't for Catching Bugs

https://trippw.com/blog/tests-as-institutional-memory
1•devTripp•5m ago•0 comments

Show HN: NetLens – Instant CLI Assistant for Network Engineers

https://v0-netlens.vercel.app
1•nadavdebi•5m ago•0 comments

Show HN: AI Spotlight for Your Computer (natural language search for files)

1•DEEPAN_C•6m ago•0 comments

Making a Yooperlite Sphere [video]

https://www.youtube.com/watch?v=dKAKw2ugiyE
1•gus_massa•6m ago•0 comments

Solo dev creates Open Source Turboquant

https://github.com/TheTom/turboquant_plus
1•nico•8m ago•1 comment

Prompt Helix – Ask AI about any webpage without copy-pasting context

https://chromewebstore.google.com/detail/prompt-helix/ffjppocigpeamhokbpnknlplkbccjpin
1•helixlabs-dev•12m ago•0 comments

Nearly three-quarters of England's woods inaccessible to public, study finds

https://www.theguardian.com/environment/2026/mar/13/nearly-three-quarters-of-englands-woods-inacc...
1•robtherobber•12m ago•0 comments

Show HN: axil – A terminal user-interface for treesitter

https://github.com/terror/axil
2•crap•13m ago•0 comments

The Anatomy of an LLM Benchmark

https://cameronrwolfe.substack.com/p/llm-bench
1•Brajeshwar•14m ago•0 comments

The Four Laws of Black Hole Mechanics [video]

https://www.youtube.com/watch?v=54n0WofSNno
1•vinhnx•16m ago•0 comments

In Denmark, the Center Did Not Hold

https://jacobin.com/2026/03/denmark-social-democrats-centrism-elections/
2•PaulHoule•18m ago•0 comments

"Over 1.5 million GitHub PRs have had ads injected into them by Copilot"

https://www.neowin.net/news/microsoft-copilot-is-now-injecting-ads-into-pull-requests-on-github-g...
22•bundie•19m ago•5 comments

Anthropic Says Use More Agents to Fix Agent Code. Here's What's Missing

https://mergeshield.dev/blog/anthropic-multi-agent-harness-whats-missing
2•mergeshield•19m ago•0 comments

Show HN: Local video search with Qwen3-VL: no API, runs on Apple Silicon, GPUs

https://github.com/ssrajadh/sentrysearch/tree/master
1•sohamrj•19m ago•0 comments

The Race Down

https://renfoc.us/posts/1774107059-the_race_down
1•rtrigoso•20m ago•0 comments

Speed Is a Tactic, Not a Virtue

https://eleganthack.com/speed-is-a-tactic-not-a-virtue/
1•speckx•21m ago•0 comments

50 Years of Thinking Different

https://www.apple.com/50-years-of-thinking-different/
1•reconnecting•25m ago•0 comments

Zero Ambient Authority: The Principle That Should Govern Every AI Agent

https://grith.ai/blog/zero-ambient-authority-ai-agents
2•edf13•25m ago•0 comments

Show HN: Data Hogo – scan your repo for security issues

https://www.datahogo.com/en
1•efecto1920•26m ago•0 comments

Germany considers ramping up coal power to avert energy crisis

https://www.politico.eu/article/germany-considers-ramping-up-coal-power-to-avert-energy-crisis/
1•leonidasrup•27m ago•1 comment

Nested Simulation and Nested Intelligence: A Pessimistic Thought

https://lizeng614.github.io/posts/when-reality-feels-structured/?lang=en
1•LeoisNotAI•27m ago•0 comments

My side project was annihilated by Google

4•lilouartz•27m ago•0 comments

Railway (web app host) "accidentally enables CDN" causing massive data breaches

https://station.railway.com/questions/data-getting-cached-or-something-e82cb4cc
5•hihicoderhi•29m ago•1 comment

Show HN: Hacker News comments summary to telegram

https://github.com/juanpabloaj/hacker-news-summary
1•juanpabloaj•29m ago•0 comments

72% of the dollar's purchasing power was destroyed in just four episodes

https://eco3min.fr/en/us-inflation-is-not-linear/
3•latentframe•29m ago•0 comments

Same LLM, Different Agent: What Changes When You Specialize for CI

https://www.mendral.com/blog/same-llm-different-agent
1•shad42•30m ago•0 comments