Show HN: Amazon shopping automation without vision

5•tonyww•9h ago
A common approach to automating Amazon shopping or similar complex websites is to reach for large cloud models (often vision-capable). I wanted to test the opposite assumption: can a ~3B-parameter local LLM complete the flow using only structural page data (the DOM) plus deterministic assertions?

This post summarizes four runs of the same task (search → first product → add to cart → checkout on Amazon). The key comparison is Demo 0 (cloud baseline) vs Demo 3 (local autonomy); Demos 1–2 are intermediate controls.

More technical detail (architecture, code excerpts, additional log snippets):

https://www.sentienceapi.com/blog/verification-layer-amazon-case-study

Demo 0 vs Demo 3:

Demo 0 (cloud, GLM‑4.6 + structured snapshots)
  success: 1/1 run
  tokens: 19,956 (~43% reduction vs. a ~35k estimate)
  time: ~60,000 ms (~1 min)
  cost: cloud API (varies)
  vision: not required

Demo 3 (local, DeepSeek R1 planner + Qwen ~3B executor)
  success: 7/7 steps (re-run)
  tokens: 11,114
  time: 405,740 ms (~6.8 min)
  cost: $0.00 incremental (local inference)
  vision: not required

Latency note: the local stack is slower end-to-end here (roughly 6.8× the cloud run's wall-clock time), largely because inference runs on local hardware (a Mac Studio with M4). The cloud baseline benefits from hosted inference but incurs per-token API cost.

Architecture

This worked because we changed the control plane and added a verification loop.

1) Constrain what the model sees (DOM pruning). We don’t feed the model the entire DOM or screenshots. Instead, we collect raw elements, run a WASM pass to produce a compact “semantic snapshot” (roles/text/geometry), and prune the rest (often ~95% of nodes).
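The post doesn't show the pruning pass itself, so here is a minimal Python sketch of the idea, assuming the browser-side collector yields records with a role, text, bounding box, and visibility flag. `RawElement`, `KEEP_ROLES`, and `semantic_snapshot` are hypothetical names, and this plain filter is not the post's WASM implementation; it only illustrates "keep role/text/geometry for actionable nodes, drop everything else."

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical raw element record, standing in for whatever the browser-side
# collector emits. This does NOT reproduce the post's WASM pruning pass.
@dataclass
class RawElement:
    id: int
    tag: str
    role: str | None          # ARIA-style role, None if the element has none
    text: str
    bbox: tuple[float, float, float, float]  # x, y, width, height in CSS px
    visible: bool

# Assumed set of roles worth showing the executor; an illustrative choice.
KEEP_ROLES = {"button", "link", "textbox", "searchbox", "combobox", "checkbox", "dialog"}

def semantic_snapshot(elements: list[RawElement], max_text: int = 80) -> list[dict]:
    """Drop invisible, zero-area, and non-interactive nodes; keep only
    id, role, trimmed text, and rounded geometry for the rest."""
    snapshot = []
    for el in elements:
        x, y, w, h = el.bbox
        if not el.visible or w <= 0 or h <= 0:
            continue  # nothing the model could act on
        if el.role not in KEEP_ROLES:
            continue  # layout wrappers, decorative nodes, etc.
        snapshot.append({
            "id": el.id,                           # what CLICK(id) refers to
            "role": el.role,
            "text": el.text.strip()[:max_text],    # truncated to bound tokens
            "bbox": [round(x), round(y), round(w), round(h)],
        })
    return snapshot
```

The `id` survives pruning because it is the handle the executor later emits in actions like CLICK(id); everything else the model sees is reduced to role, short text, and coarse geometry.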

2) Split reasoning from acting (planner vs executor).

Planner (reasoning): DeepSeek R1 (local) generates step intent + what must be true afterward.
Executor (action): Qwen ~3B (local) selects concrete DOM actions like CLICK(id) / TYPE(text).

3) Gate every step with Jest‑style verification. After each action, we assert state changes (URL changed, element exists/doesn’t exist, modal/drawer appeared). If a required assertion fails, the step fails with artifacts and bounded retries.
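To make the planner/executor split and the per-step gate concrete, here is a minimal sketch under stated assumptions: the planner's output is modeled as a `StepPlan` holding an intent plus a postcondition, the executor model is any callable that maps (intent, snapshot) to an action string, and the browser is abstracted as `execute`/`observe` callables. `StepPlan`, `StepFailed`, `run_step`, and the default of two retries are all hypothetical, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Callable

Snapshot = list[dict]  # pruned semantic snapshot: one dict per kept element

@dataclass
class StepPlan:
    label: str                                       # assertion label for logs/artifacts
    intent: str                                      # what the planner wants done
    postcondition: Callable[[Snapshot, str], bool]   # planner's "must be true afterward"

class StepFailed(Exception):
    """Raised when a required postcondition never passes; carries artifacts."""
    def __init__(self, label: str, artifacts: dict):
        super().__init__(f"required assertion failed: {label}")
        self.artifacts = artifacts

def run_step(
    plan: StepPlan,
    choose_action: Callable[[str, Snapshot], str],   # executor model stand-in
    execute: Callable[[str], None],                  # performs e.g. "CLICK(1022)"
    observe: Callable[[], tuple[Snapshot, str]],     # returns (snapshot, url)
    max_retries: int = 2,
) -> list[dict]:
    """Executor acts, then the planner's postcondition gates the step.
    Retries are bounded; on exhaustion the step fails with its history."""
    history = []
    snapshot, _url = observe()
    for attempt in range(max_retries + 1):
        action = choose_action(plan.intent, snapshot)
        execute(action)
        snapshot, url = observe()              # re-observe after acting
        passed = plan.postcondition(snapshot, url)
        history.append({"attempt": attempt, "action": action, "url": url, "passed": passed})
        if passed:
            return history
    # Deterministic failure: no silent "half-working" continuation.
    raise StepFailed(plan.label, {"history": history, "last_snapshot": snapshot})
```

The design point is that the executor never decides whether it succeeded; the postcondition written at planning time does, and a step either returns proof of progress or raises with its attempt history attached.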

Minimal shape:

  ok = await runtime.check(
      exists("role=textbox"),
      label="search_box_visible",
      required=True,
  ).eventually(timeout_s=10.0, poll_s=0.25, max_snapshot_attempts=3)
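The post doesn't spell out what `.eventually()` does internally. As a guess at the general idea only, here is a self-contained asyncio sketch of "re-snapshot and re-check until the assertion holds or the timeout expires." `eventually` and `has_role` are hypothetical stand-ins, not the runtime's API, and the sketch does not model `max_snapshot_attempts`, whose exact semantics aren't given in the post.

```python
import asyncio
import time
from typing import Awaitable, Callable

Snapshot = list[dict]

def has_role(role: str) -> Callable[[Snapshot], bool]:
    """Hypothetical predicate builder: true if any element has `role`."""
    return lambda snapshot: any(el.get("role") == role for el in snapshot)

async def eventually(
    predicate: Callable[[Snapshot], bool],
    take_snapshot: Callable[[], Awaitable[Snapshot]],
    timeout_s: float = 10.0,
    poll_s: float = 0.25,
) -> bool:
    """Re-snapshot until `predicate` holds (True) or `timeout_s` elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        snapshot = await take_snapshot()      # fresh page state on every poll
        if predicate(snapshot):
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(poll_s)           # yield to the event loop between polls
```

Polling a fresh snapshot, rather than asserting once, is what lets a gate tolerate slow-rendering dynamic UI without turning into an unbounded wait.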

What changed between “agents that look smart” and agents that work

Two examples from the logs:

Deterministic override to enforce “first result” intent: “Executor decision … [override] first_product_link -> CLICK(1022)”

Drawer handling that verifies and forces the correct branch: “result: PASS | add_to_cart_verified_after_drawer”
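The log lines above are quoted without the code behind them. Here is a hypothetical sketch of what such an override and a branching add-to-cart check could look like, assuming each snapshot element is a dict with `id`, `role`, `text`, and `bbox` fields. Only the intent name `first_product_link` and the label `add_to_cart_verified_after_drawer` come from the logs; the `kind` field, the "Added to Cart"/"Subtotal" strings, and the cart-page label are invented for illustration.

```python
from __future__ import annotations

Snapshot = list[dict]

def override_first_result(intent: str, proposed: str, snapshot: Snapshot) -> str:
    """If the step intent is 'first_product_link', ignore the executor's pick and
    click the top-most, then left-most product link in the snapshot."""
    if intent != "first_product_link":
        return proposed
    # "kind" is a hypothetical tag a snapshot pass might attach to product links.
    products = [el for el in snapshot if el.get("kind") == "product_link"]
    if not products:
        return proposed  # nothing to enforce; let verification catch it
    first = min(products, key=lambda el: (el["bbox"][1], el["bbox"][0]))  # (y, x)
    return f"CLICK({first['id']})"

def verify_add_to_cart(snapshot: Snapshot, url: str) -> str | None:
    """Return the label of the branch that proved the add-to-cart, or None."""
    text = " ".join(el.get("text", "") for el in snapshot).lower()
    drawer_open = any(el.get("role") == "dialog" for el in snapshot)
    if drawer_open and "added to cart" in text:
        return "add_to_cart_verified_after_drawer"
    if "/cart" in url and "subtotal" in text:
        return "add_to_cart_verified_on_cart_page"
    return None  # neither branch proved progress: the step should fail
```

The override is deterministic because "first" is resolved from geometry rather than left to the model, and the verifier returns which branch passed, so a log line can record how progress was proven rather than merely that an action ran.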

The important point is that these are not post‑hoc analytics. They are inline gates: the system either proves it made progress or it stops and recovers.

Takeaway

If you’re trying to make browser agents reliable, the highest‑leverage move isn’t a bigger model. It’s constraining the state space and making success/failure explicit with per-step assertions.

Reliability in agents comes from verification (assertions on structured snapshots), not just scaling model size.

Comments

tonyww•9h ago
One clarification, since a few comments from coworkers and friends are circling this: Amazon isn’t the point here.

We used it because it’s a dynamic, hostile UI, but the design goal is a site-agnostic control plane. That’s why the runtime avoids selectors and screenshots and instead operates on pruned semantic snapshots + verification gates.

If the layout changes, the system doesn’t “half-work” — it fails deterministically with artifacts. That’s the behavior we’re optimizing for.

cjbarber•1h ago
looks interesting, though note:

> Show HN is for something you've made that other people can play with.

> Off topic: blog posts, sign-up pages, newsletters, lists, and other reading material. Those can't be tried out, so can't be Show HNs. Make a regular submission instead.

https://news.ycombinator.com/showhn.html

Show HN: ChartGPU – WebGPU-powered charting library (1M points at 60fps)

https://github.com/ChartGPU/ChartGPU
477•huntergemmer•9h ago•142 comments

Show HN: RatatuiRuby wraps Rust Ratatui as a RubyGem – TUIs with the joy of Ruby

https://www.ratatui-ruby.dev/
41•Kerrick•4d ago•4 comments

Show HN: Rails UI

https://railsui.com/
97•justalever•5h ago•62 comments

Show HN: TerabyteDeals – Compare storage prices by $/TB

https://terabytedeals.com
57•vektor888•3h ago•49 comments

Show HN: Company hiring trends and insights from job postings

https://jobswithgpt.com/company-profiles/
35•sp1982•6h ago•4 comments

Show HN: Semantic search engine for Studio Ghibli movie

https://ghibli-search.anini.workers.dev/
11•aninibread•10h ago•7 comments

Show HN: Retain – A unified knowledge base for all your AI coding conversations

https://github.com/BayramAnnakov/retain
9•Bayram•4h ago•5 comments

Show HN: See the carbon impact of your cloud as you code

https://dashboard.infracost.io/
52•hkh•9h ago•20 comments

Show HN: Amazon shopping automation without vision

5•tonyww•9h ago•2 comments

Show HN: SpeechOS – Wispr Flow-inspired voice input for any web app

https://www.speechos.ai/
11•gangster_dave•8h ago•5 comments

Show HN: yolo-cage – AI coding agents that can't exfiltrate secrets

https://github.com/borenstein/yolo-cage
43•borenstein•9h ago•63 comments

Show HN: Hyve – Parallel isolated workspaces for coding agents, multi-repo dev

5•eladkishon•16h ago•1 comments

Show HN: PicoFlow – a tiny DSL-style Python library for LLM agent workflows

4•shijizhi_1919•9h ago•0 comments

Show HN Guidelines

https://news.ycombinator.com/yli.html
2•cjbarber•1h ago•0 comments

Show HN: Mastra 1.0, open-source JavaScript agent framework from the Gatsby devs

https://github.com/mastra-ai/mastra
206•calcsam•1d ago•69 comments

Show HN: Automatically build sales playbook. For founders doing sales

6•Mrakermo•4h ago•0 comments

Show HN: I'm eating at all the phở restaurants in Portland, at least twice

https://pho.curtisbarnard.com/
2•oregoncurtis•2h ago•3 comments

Show HN: UltraContext – A simple context API for AI agents with auto-versioning

https://ultracontext.ai/
15•ofabioroma•9h ago•18 comments

Show HN: I built an AI coach for introverted leaders

https://www.leadquiet.com/landing
2•chux52•3h ago•0 comments

Show HN: Sornic – Turn any article into a podcast in 10 seconds

https://sornic.com
2•digi_wares•4h ago•1 comments

Show HN: Consensus for Side Effects

https://github.com/abokhalill/chr2
2•yousef06•4h ago•0 comments

Show HN: Haven – Anti Brain Rot Android Launcher

https://play.google.com/store/apps/details?id=dev.speczo.haven&hl=en_US
2•sunamic•57m ago•0 comments

Show HN: I built a chess explorer that explains strategy instead of just stats

https://www.atlaschess.me/
9•Ahmad_shuja•8h ago•5 comments

Show HN: Exploring structure in a 1D cellular automaton

https://github.com/arvatamas/The-Cosmic-Mirror
3•aaatamas•4h ago•0 comments

Show HN: Agent Skills Leaderboard

https://skills.sh
124•andrewqu•1d ago•41 comments

Show HN: Claude Skill for App Store Compliance

https://github.com/safaiyeh/app-store-review-skill
2•jsafaiyeh•4h ago•0 comments

Show HN: SeeClaudeCode – visualize Claude Code's edits to your repo in real time

https://seeclaudecode.fly.dev/
3•ninajlu•5h ago•1 comments

Show HN: A minimal beads-like issue tracker for AI agents

https://github.com/obsfx/trekker
2•obsfx•5h ago•1 comments

Show HN: What I learned building a local-only password manager (PassForgePro)

https://github.com/can-deliktas/PassForgePro
5•can-deliktas•5h ago•0 comments

Show HN: TopicRadar – Track trending topics across HN, GitHub, ArXiv, and more

https://apify.com/mick-johnson/topic-radar
34•MickolasJae•1d ago•9 comments