frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Amazon shopping automation without vision

5•tonyww•9h ago
A common approach to automating Amazon shopping or similar complex websites is to reach for large cloud models (often vision-capable). I wanted to test a contradiction: can a ~3B parameter local LLM model complete the flow using only structural page data (DOM) plus deterministic assertions?

This post summarizes four runs of the same task (search → first product → add to cart → checkout on Amazon). The key comparison is Demo 0 (cloud baseline) vs Demo 3 (local autonomy); Demos 1–2 are intermediate controls.

More technical detail (architecture, code excerpts, additional log snippets):

https://www.sentienceapi.com/blog/verification-layer-amazon-case-study

Demo 0 vs Demo 3:

Demo 0 (cloud, GLM‑4.6 + structured snapshots) success: 1/1 run tokens: 19,956 (~43% reduction vs ~35k estimate) time: ~60,000ms cost: cloud API (varies) vision: not required

Demo 3 (local, DeepSeek R1 planner + Qwen ~3B executor) success: 7/7 steps (re-run) tokens: 11,114 time: 405,740ms cost: $0.00 incremental (local inference) vision: not required

Latency note: the local stack is slower end-to-end here largely because inference runs on local hardware (Mac Studio with M4); the cloud baseline benefits from hosted inference, but has per-token API cost.

Architecture

This worked because we changed the control plane and added a verification loop.

1) Constrain what the model sees (DOM pruning). We don’t feed the entire DOM or screenshots. We collect raw elements, then run a WASM pass to produce a compact “semantic snapshot” (roles/text/geometry) and prune the rest (often on the order of ~95% of nodes).

2) Split reasoning from acting (planner vs executor).

Planner (reasoning): DeepSeek R1 (local) generates step intent + what must be true afterward. Executor (action): Qwen ~3B (local) selects concrete DOM actions like CLICK(id) / TYPE(text). 3) Gate every step with Jest‑style verification. After each action, we assert state changes (URL changed, element exists/doesn’t exist, modal/drawer appeared). If a required assertion fails, the step fails with artifacts and bounded retries.

Minimal shape:

ok = await runtime.check( exists("role=textbox"), label="search_box_visible", required=True, ).eventually(timeout_s=10.0, poll_s=0.25, max_snapshot_attempts=3)

What changed between “agents that look smart” and agents that work Two examples from the logs:

Deterministic override to enforce “first result” intent: “Executor decision … [override] first_product_link -> CLICK(1022)”

Drawer handling that verifies and forces the correct branch: “result: PASS | add_to_cart_verified_after_drawer”

The important point is that these are not post‑hoc analytics. They are inline gates: the system either proves it made progress or it stops and recovers.

Takeaway If you’re trying to make browser agents reliable, the highest‑leverage move isn’t a bigger model. It’s constraining the state space and making success/failure explicit with per-step assertions.

Reliability in agents comes from verification (assertions on structured snapshots), not just scaling model size.

Comments

tonyww•9h ago
One clarification since a few comments from coworkers/friends are circling this: Amazon isn’t the point here.

We used it because it’s a dynamic, hostile UI, but the design goal is a site-agnostic control plane. That’s why the runtime avoids selectors and screenshots and instead operates on pruned semantic snapshots + verification gates.

If the layout changes, the system doesn’t “half-work” — it fails deterministically with artifacts. That’s the behavior we’re optimizing for.

cjbarber•1h ago
looks interesting, though note:

> Show HN is for something you've made that other people can play with.

> Off topic: blog posts, sign-up pages, newsletters, lists, and other reading material. Those can't be tried out, so can't be Show HNs. Make a regular submission instead.

https://news.ycombinator.com/showhn.html

Take potentially dangerous PDFs, and convert them to safe PDFs

https://github.com/freedomofpress/dangerzone
29•dp-hackernews•1h ago•8 comments

Show HN: ChartGPU – WebGPU-powered charting library (1M points at 60fps)

https://github.com/ChartGPU/ChartGPU
481•huntergemmer•9h ago•143 comments

Claude's new constitution

https://www.anthropic.com/news/claude-new-constitution
268•meetpateltech•8h ago•256 comments

Golfing APL/K in 90 Lines of Python

https://aljamal.substack.com/p/golfing-aplk-in-90-lines-of-python
41•aburjg•5d ago•7 comments

Show HN: RatatuiRuby wraps Rust Ratatui as a RubyGem – TUIs with the joy of Ruby

https://www.ratatui-ruby.dev/
41•Kerrick•4d ago•4 comments

Skip is now free and open source

https://skip.dev/blog/skip-is-free/
254•dayanruben•9h ago•99 comments

Letting Claude play text adventures

https://borretti.me/article/letting-claude-play-text-adventures
65•varjag•5d ago•23 comments

The WebRacket language is a subset of Racket that compiles to WebAssembly

https://github.com/soegaard/webracket
85•mfru•4d ago•20 comments

Challenges in join optimization

https://www.starrocks.io/blog/inside-starrocks-why-joins-are-faster-than-youd-expect
37•HermitX•7h ago•8 comments

Show HN: Rails UI

https://railsui.com/
98•justalever•5h ago•62 comments

Stevey's Birthday Blog

https://steve-yegge.medium.com/steveys-birthday-blog-34f437139cb5
14•throwawayHMM19•1d ago•3 comments

Jerry (YC S17) Is Hiring

https://www.ycombinator.com/companies/jerry-inc/jobs/QaoK3rw-software-engineer-core-automation-ma...
1•linaz•3h ago

TrustTunnel: AdGuard VPN protocol goes open-source

https://adguard-vpn.com/en/blog/adguard-vpn-protocol-goes-open-source-meet-trusttunnel.html
47•kumrayu•7h ago•10 comments

Waiting for dawn in search: Search index, Google rulings and impact on Kagi

https://blog.kagi.com/waiting-dawn-search
209•josephwegner•6h ago•133 comments

Mystery of the Head Activator

https://www.asimov.press/p/head-activator
12•mailyk•3d ago•1 comments

Three types of LLM workloads and how to serve them

https://modal.com/llm-almanac/workloads
29•charles_irl•8h ago•1 comments

An explanation of cheating in Doom2 Deathmatch (1999)

https://www.doom2.net/doom2/cheating.html
16•Lammy•4d ago•1 comments

Setting Up a Cluster of Tiny PCs for Parallel Computing

https://www.kenkoonwong.com/blog/parallel-computing/
23•speckx•5h ago•11 comments

SIMD programming in pure Rust

https://kerkour.com/introduction-rust-simd
43•randomint64•2d ago•14 comments

Nested code fences in Markdown

https://susam.net/nested-code-fences.html
183•todsacerdoti•11h ago•61 comments

Can you slim macOS down?

https://eclecticlight.co/2026/01/21/can-you-slim-macos-down/
162•ingve•16h ago•201 comments

Tell HN: 2 years building a kids audio app as a solo dev – lessons learned

27•oliverjanssen•10h ago•24 comments

Scientists find a way to regrow cartilage in mice and human tissue samples

https://www.sciencedaily.com/releases/2026/01/260120000333.htm
242•saikatsg•6h ago•65 comments

Slouching Towards Bethlehem – Joan Didion (1967)

https://www.saturdayeveningpost.com/2017/06/didion/
53•jxmorris12•6h ago•4 comments

Open source server code for the BitCraft MMORPG

https://github.com/clockworklabs/BitCraftPublic
28•sfkgtbor•7h ago•9 comments

I finally got my sway layout to autostart the way I like it

https://hugues.betakappaphi.com/2026/01/19/sway-layout/
18•__hugues•15h ago•4 comments

Without benchmarking LLMs, you're likely overpaying

https://karllorey.com/posts/without-benchmarking-llms-youre-overpaying
132•lorey•1d ago•70 comments

JPEG XL Test Page

https://tildeweb.nl/~michiel/jxl/
158•roywashere•7h ago•109 comments

Show HN: TerabyteDeals – Compare storage prices by $/TB

https://terabytedeals.com
57•vektor888•3h ago•49 comments

TeraWave Satellite Communications Network

https://www.blueorigin.com/news/blue-origin-introduces-terawave-space-based-network-for-global-co...
113•T-A•5h ago•83 comments