
Amazon shopping automation without vision: verification gate+local model (3B)

1•tonyww•1h ago
A common approach to automating Amazon shopping, or similarly complex websites, is to reach for large cloud models (often vision-capable). I wanted to test the opposite bet: can a local LLM of ~3B parameters complete the flow using only structural page data (the DOM) plus deterministic assertions?

This post summarizes four runs of the same task (search → first product → add to cart → checkout on Amazon). The key comparison is Demo 0 (cloud baseline) vs Demo 3 (local autonomy); Demos 1–2 are intermediate controls.

More technical detail (architecture, code excerpts, additional log snippets):

https://www.sentienceapi.com/blog/verification-layer-amazon-case-study

Demo 0 vs Demo 3:

Demo 0 (cloud, GLM‑4.6 + structured snapshots)
success: 1/1 run
tokens: 19,956 (~43% reduction vs ~35k estimate)
time: ~60,000 ms
cost: cloud API (varies)
vision: not required

Demo 3 (local, DeepSeek R1 planner + Qwen ~3B executor)
success: 7/7 steps (re-run)
tokens: 11,114
time: 405,740 ms
cost: $0.00 incremental (local inference)
vision: not required

Latency note: the local stack is slower end-to-end here largely because inference runs on local hardware (Mac Studio with M4); the cloud baseline benefits from hosted inference, but has per-token API cost.
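For a quick sanity check of the ratios behind these numbers, here is a small calculation using only the figures reported above. Two caveats apply: the ~35k baseline is itself an estimate, and the demos use different models (and probably different tokenizers), so the token comparison is indicative rather than exact.

```python
# Figures copied from the Demo 0 / Demo 3 summary above.
# Caveats: the ~35k baseline is an estimate, and the two demos use different
# models (likely different tokenizers), so token ratios are indicative only.
DEMO0_TOKENS = 19_956
DEMO0_TOKEN_ESTIMATE = 35_000
DEMO3_TOKENS = 11_114
DEMO0_MS = 60_000  # reported as "~60,000ms", so approximate
DEMO3_MS = 405_740

reduction_vs_estimate = 1 - DEMO0_TOKENS / DEMO0_TOKEN_ESTIMATE
demo3_token_share = DEMO3_TOKENS / DEMO0_TOKENS
slowdown = DEMO3_MS / DEMO0_MS

print(f"Demo 0 tokens vs ~35k estimate: {reduction_vs_estimate:.0%} fewer")
print(f"Demo 3 tokens as a share of Demo 0: {demo3_token_share:.0%}")
print(f"Demo 3 wall-clock: {DEMO3_MS / 1000:.1f} s, about {slowdown:.1f}x Demo 0")
```

This prints 43% fewer, 56%, and 405.7 s (about 6.8x), which matches the ~43% claim and puts the latency gap in concrete terms.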

Architecture

This worked because we changed the control plane and added a verification loop.

1) Constrain what the model sees (DOM pruning). We don’t feed the model the entire DOM or screenshots. We collect raw elements, then run a WASM pass that produces a compact “semantic snapshot” (roles/text/geometry) and prunes the rest (often around 95% of nodes).
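To illustrate the idea of a semantic snapshot, here is a minimal Python sketch. It is not the author's WASM pass. The `RawElement` fields, the `KEEP_ROLES` keep-list, and the one-line-per-element rendering are all assumptions made for illustration. It drops invisible, zero-size, and non-semantic nodes and keeps only role, text, and geometry.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical raw element record: the post's collector and WASM pruning pass
# are not shown, so these fields are assumptions.
@dataclass
class RawElement:
    id: int
    role: Optional[str]  # ARIA-like role, None for purely structural nodes
    text: str
    x: float
    y: float
    width: float
    height: float
    visible: bool

# Assumed keep-list of roles an agent can act on or navigate by.
KEEP_ROLES = {"button", "link", "textbox", "searchbox", "combobox", "checkbox", "heading"}

def semantic_snapshot(elements, max_text=80):
    """Drop invisible, zero-size, and non-semantic nodes; keep role/text/geometry."""
    kept = []
    for el in elements:
        if not el.visible or el.width <= 0 or el.height <= 0:
            continue
        if el.role not in KEEP_ROLES:
            continue
        kept.append({
            "id": el.id,
            "role": el.role,
            "text": " ".join(el.text.split())[:max_text],  # collapse whitespace, truncate
            "bbox": (round(el.x), round(el.y), round(el.width), round(el.height)),
        })
    # Reading order: top-to-bottom, then left-to-right.
    kept.sort(key=lambda e: (e["bbox"][1], e["bbox"][0]))
    return kept

def render(snapshot):
    """One short line per element, compact enough for a small model's prompt."""
    return "\n".join(f'[{e["id"]}] {e["role"]} "{e["text"]}" @{e["bbox"]}' for e in snapshot)

elements = [
    RawElement(1, None, "", 0, 0, 1280, 4000, True),  # page wrapper
    RawElement(2, "textbox", "Search Amazon", 300, 12, 600, 40, True),
    RawElement(3, None, "  decorative   span ", 10, 10, 50, 10, True),
    RawElement(4, "link", "Example product title", 120, 310, 400, 24, True),
    RawElement(5, "button", "Hidden menu", 0, 0, 80, 30, False),
]
snap = semantic_snapshot(elements)
print(render(snap))
print(f"kept {len(snap)} of {len(elements)} elements")
```

On this toy input only the search box and the product link survive (2 of 5 elements). On a real page the ratio depends entirely on the keep-rules chosen.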

2) Split reasoning from acting (planner vs executor).

Planner (reasoning): DeepSeek R1 (local) generates step intent + what must be true afterward.

Executor (action): Qwen ~3B (local) selects concrete DOM actions like CLICK(id) / TYPE(text).

3) Gate every step with Jest‑style verification. After each action, we assert state changes (URL changed, element exists/doesn’t exist, modal/drawer appeared). If a required assertion fails, the step fails with artifacts and bounded retries.
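Steps 2 and 3 can be sketched together as a single step runner. This is a minimal Python sketch under stated assumptions, not the author's runtime: planner output is modeled as a hypothetical `StepPlan` (an intent plus postcondition predicates), the executor is any callable that maps intent and snapshot to an action tuple, and the model calls themselves are left out. The point is the control flow: act, re-snapshot, assert every postcondition, keep artifacts on failure, and stop after a bounded number of attempts.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class StepPlan:
    """Hypothetical planner output: an intent plus what must be true afterward."""
    intent: str
    postconditions: List[Callable[[dict], bool]]

@dataclass
class StepResult:
    ok: bool
    attempts: int
    artifacts: list = field(default_factory=list)

def run_step(plan, take_snapshot, choose_action, perform, max_attempts=3):
    """Executor acts, then every postcondition is asserted; retry up to max_attempts."""
    artifacts = []
    for attempt in range(1, max_attempts + 1):
        action = choose_action(plan.intent, take_snapshot())  # executor picks e.g. ("CLICK", 1022)
        perform(action)
        after = take_snapshot()
        failed = [i for i, check in enumerate(plan.postconditions) if not check(after)]
        if not failed:
            return StepResult(ok=True, attempts=attempt, artifacts=artifacts)
        # Keep evidence for debugging instead of silently continuing.
        artifacts.append({"attempt": attempt, "action": action, "failed": failed, "snapshot": after})
    return StepResult(ok=False, attempts=max_attempts, artifacts=artifacts)

# Toy page state standing in for a browser (hypothetical URLs).
page = {"url": "https://www.example.com/s?k=laptop"}

def take_snapshot():
    return dict(page)

def choose_action(intent, snapshot):
    return ("CLICK", 1022)

def perform(action):
    page["url"] = "https://www.example.com/dp/EXAMPLE"

plan = StepPlan("open first product", [lambda s: "/dp/" in s["url"]])
result = run_step(plan, take_snapshot, choose_action, perform)
print(result.ok, result.attempts)
```

A plan whose postcondition never holds (say, a cart count that stays at zero) exhausts its attempts and returns `ok=False` with one artifact per attempt, which is the "fail deterministically with artifacts" behavior described in the post.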

Minimal shape:

  # Required gate: re-check every 0.25 s for up to 10 s; a failure fails the step.
  ok = await runtime.check(
      exists("role=textbox"),
      label="search_box_visible",
      required=True,
  ).eventually(timeout_s=10.0, poll_s=0.25, max_snapshot_attempts=3)
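The `.eventually(...)` call suggests poll-until-true semantics. Here is a self-contained asyncio sketch of that behavior, assuming `timeout_s` bounds the total wait and `poll_s` is the delay between checks. This `eventually` is a hypothetical stand-in, not the SDK's implementation, and it does not model `max_snapshot_attempts` because the post doesn't say what that parameter controls. The simulated textbox appears only on the third snapshot.

```python
import asyncio
import time

async def eventually(predicate, timeout_s=10.0, poll_s=0.25):
    """Await predicate() repeatedly until it is True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if await predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(poll_s)

async def demo():
    polls = 0

    async def search_box_visible():
        nonlocal polls
        polls += 1
        # Simulated snapshots: the textbox only shows up on the third capture.
        snapshot = [{"role": "textbox"}] if polls >= 3 else [{"role": "link"}]
        return any(el["role"] == "textbox" for el in snapshot)

    ok = await eventually(search_box_visible, timeout_s=2.0, poll_s=0.01)
    return ok, polls

ok, polls = asyncio.run(demo())
print(ok, polls)
```

Polling instead of a single check tolerates late-rendering elements without fixed sleeps, while the deadline keeps a missing element from stalling the run indefinitely.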

What changed between “agents that look smart” and agents that work

Two examples from the logs:

Deterministic override to enforce “first result” intent: “Executor decision … [override] first_product_link -> CLICK(1022)”

Drawer handling that verifies and forces the correct branch: “result: PASS | add_to_cart_verified_after_drawer”
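The first example, a deterministic override of the executor's choice, could look roughly like this minimal Python sketch. The post doesn't say how result links are identified, so the `in_results` flag, the "top-most, then left-most" rule, and the element ids other than 1022 are hypothetical. If no rule applies, the executor's proposal passes through unchanged and the verification gate still decides whether the step succeeded.

```python
def first_product_link(snapshot):
    """Top-most, then left-most link inside the results region (hypothetical flag)."""
    candidates = [e for e in snapshot if e["role"] == "link" and e.get("in_results")]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (e["bbox"][1], e["bbox"][0]))

def apply_override(intent, proposed, snapshot):
    """Replace the executor's proposal when a deterministic rule exists for this intent."""
    if intent != "first_product_link":
        return proposed, False
    target = first_product_link(snapshot)
    if target is None:
        return proposed, False  # nothing to enforce; leave it to the verification gate
    forced = ("CLICK", target["id"])
    return forced, forced != proposed

snapshot = [
    {"id": 901, "role": "link", "bbox": (40, 90, 120, 20), "in_results": False},  # nav link
    {"id": 1022, "role": "link", "bbox": (120, 310, 400, 24), "in_results": True},
    {"id": 1040, "role": "link", "bbox": (120, 520, 400, 24), "in_results": True},
]
proposed = ("CLICK", 1040)  # executor picked the second result
action, overridden = apply_override("first_product_link", proposed, snapshot)
if overridden:
    print(f"[override] first_product_link -> {action[0]}({action[1]})")
```

Running it prints `[override] first_product_link -> CLICK(1022)`, the same shape as the log line quoted above. Keeping the rule outside the model means the "first result" intent no longer depends on a 3B model reading the page correctly.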

The important point is that these are not post‑hoc analytics. They are inline gates: the system either proves it made progress or it stops and recovers.

Takeaway

If you’re trying to make browser agents reliable, the highest‑leverage move isn’t a bigger model. It’s constraining the state space and making success/failure explicit with per-step assertions.

Reliability in agents comes from verification (assertions on structured snapshots), not just scaling model size.

Comments

tonyww•1h ago
One clarification, since a few comments from coworkers and friends keep circling back to this: Amazon isn’t the point here.

We used it because it’s a dynamic, hostile UI, but the design goal is a site-agnostic control plane. That’s why the runtime avoids selectors and screenshots and instead operates on pruned semantic snapshots + verification gates.

If the layout changes, the system doesn’t “half-work” — it fails deterministically with artifacts. That’s the behavior we’re optimizing for.

PicoPCMCIA – a PCMCIA development board for retro-computing enthusiasts

https://www.yyzkevin.com/picopcmcia/
1•rbanffy•40s ago•0 comments

Show HN: Guava Range Parser – Parse "[0..100)" strings into Guava Range object

https://github.com/neewrobert/guava-range-parser
1•neewrobert•2m ago•0 comments

Show HN: Patchli.st – Bug bounties for indie SaaS founders

https://www.patchli.st/
1•massi24•2m ago•0 comments

Share Your Website at Events

https://jamesg.blog/2026/01/21/share-your-website-at-events
1•speckx•2m ago•0 comments

Show HN: Analyze binary capabilities in-browser with capa and Pyodide

https://surfactant.readthedocs.io/en/latest/capa/
1•rmast•2m ago•0 comments

European lawmakers suspend U.S. trade deal

https://www.cnbc.com/2026/01/21/european-lawmakers-suspend-us-trade-deal-amid-greenland-tariff-te...
3•belter•4m ago•1 comments

Show HN: S2-lite, an open source Stream Store

https://github.com/s2-streamstore/s2
1•shikhar•4m ago•0 comments

Your prod code should have bugs

https://lucaspauker.com/articles/your-prod-code-should-have-bugs/
1•lucaspauker•4m ago•0 comments

JPEG XL Demo Page

https://tildeweb.nl/~michiel/jxl/
2•roywashere•6m ago•0 comments

Dressing Blade Runner: Interview with Set Decorator Linda DeScenna (2001)

https://media.bladezone.com/contents/film/production/Linda-DeScenna/index.html
2•exvi•6m ago•0 comments

Citigroup to boost Japan investment banking team on deal boom

https://www.japantimes.co.jp/business/2025/12/23/companies/citigroup-investment-banking-boost/
3•PaulHoule•7m ago•0 comments

ZScript

https://github.com/zscriptlang/zscript
3•ziyaadsaqlain•7m ago•1 comments

The long painful history of (re)using login to log people in

https://utcc.utoronto.ca/~cks/space/blog/unix/LoginProgramReuseFootgun
3•rkta•7m ago•0 comments

Self-hosted AI data workflow: DB and Ollama and SQL

https://exasol.github.io/developer-documentation/main/gen_ai/ai_text_summary/index.html
2•exasol_nerd•8m ago•2 comments

Show HN: Prometheus – Give LLMs memory, dreams, and contradiction detection

https://github.com/panosbee/PROMETHEUS
1•panossk•9m ago•0 comments

Xgotop: Realtime Go Runtime Visualizer

https://devpost.com/software/xgotop-go-runtime-observer
2•tanelpoder•9m ago•0 comments

Show HN: MedDiscovery – Autonomous hypothesis generator for dead-end diseases

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5973574
1•panossk•12m ago•0 comments

Show HN: Reproduce and benchmark ML papers in your terminal before implementing

https://github.com/danishm07/tomea
1•_danish•13m ago•1 comments

Avoid Cerebras if you are a founder

4•remusomega•15m ago•1 comments

Find U.S. Manufacturers in Seconds – CNC, sheet metal, molding, etc.

https://build.link/
3•donewithfuess•16m ago•0 comments

AI future will be nothing like present

https://distantprovince.by/posts/ai-future-will-be-nothing-like-present/
1•distantprovince•17m ago•1 comments

The gold plating of American water

https://worksinprogress.co/issue/the-gold-plating-of-american-water/
1•mindracer•17m ago•0 comments

Show HN: QTap DevTools – Chrome-style encrypted traffic inspector for Linux

https://qpoint.io/products/devtools/
9•tylerflint•18m ago•0 comments

Five Mistakes I've Made with Euler Angles

https://buchanan.one/blog/rotations/
1•boscillator•19m ago•0 comments

The Evolution Gap: keeping external system changes in sync with application

https://flamingock.io/blog/evolving-external-systems
1•aperezdieppa•20m ago•1 comments

Nexphone: Android phone also runs Windows and Linux

https://nexphone.com/blog/the-tale-of-nexphone-one-phone-every-computer
2•gripfx•20m ago•1 comments

Full Text: Charter of Trump's Board of Peace

https://www.msn.com/en-us/news/world/full-text-charter-of-trump-s-board-of-peace/ar-AA1UqfFa
4•vinnyglennon•21m ago•0 comments

Ask HN: What single AI tool/technique 10x'd your productivity last year?

2•laxmena•22m ago•2 comments

Sigmas and Student

https://www.johndcook.com/blog/2026/01/21/sigmas-and-student/
1•ibobev•23m ago•0 comments

A browser extension that restores climate hazard risks for CA Zillow listings

https://github.com/nmatouka/climate-risk-plugin
2•aprct•23m ago•0 comments