frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
242•isitcontent•16h ago•27 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
345•vecti•18h ago•153 comments

Show HN: MCP App to play backgammon with your LLM

https://github.com/sam-mfb/backgammon-mcp
2•sam256•44m ago•1 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
310•eljojo•19h ago•192 comments

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

https://github.com/voice-of-japan/Virtual-Protest-Protocol/blob/main/README.md
5•sakanakana00•1h ago•1 comments

Show HN: I built Divvy to split restaurant bills from a photo

https://divvyai.app/
3•pieterdy•1h ago•0 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
77•phreda4•16h ago•14 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
93•antves•1d ago•70 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
17•denuoweb•2d ago•2 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
26•dchu17•21h ago•12 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
49•nwparker•1d ago•11 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
2•melvinzammit•3h ago•0 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
152•bsgeraci•1d ago•64 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•4h ago•2 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
18•NathanFlurry•1d ago•9 comments

Show HN: Compile-Time Vibe Coding

https://github.com/Michael-JB/vibecode
10•michaelchicory•5h ago•1 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
15•keepamovin•6h ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•21h ago•7 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
172•vkazanov•2d ago•49 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
5•rahuljaguste•15h ago•1 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•9h ago•0 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•10h ago•4 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
2•rs545837•11h ago•1 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
12•KevinChasse•21h ago•16 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•12h ago•1 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•22h ago•0 comments

Show HN: Gohpts tproxy with arp spoofing and sniffing got a new update

https://github.com/shadowy-pycoder/go-http-proxy-to-socks
2•shadowy-pycoder•13h ago•0 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•13h ago•0 comments
Open in hackernews

A verification layer for browser agents: Amazon case study

https://www.sentienceapi.com/blog/verification-layer-amazon-case-study
28•tonyww•2w ago
A common approach to automating Amazon shopping or similar complex websites is to reach for large cloud models (often vision-capable). I wanted to test a contradiction: can a ~3B parameter local LLM model complete the flow using only structural page data (DOM) plus deterministic assertions?

This post summarizes four runs of the same task (search → first product → add to cart → checkout on Amazon). The key comparison is Demo 0 (cloud baseline) vs Demo 3 (local autonomy); Demos 1–2 are intermediate controls.

More technical detail (architecture, code excerpts, additional log snippets):

https://www.sentienceapi.com/blog/verification-layer-amazon-...

Demo 0 vs Demo 3:

Demo 0 (cloud, GLM‑4.6 + structured snapshots) success: 1/1 run tokens: 19,956 (~43% reduction vs ~35k estimate) time: ~60,000ms cost: cloud API (varies) vision: not required

Demo 3 (local, DeepSeek R1 planner + Qwen ~3B executor) success: 7/7 steps (re-run) tokens: 11,114 time: 405,740ms cost: $0.00 incremental (local inference) vision: not required

Latency note: the local stack is slower end-to-end here largely because inference runs on local hardware (Mac Studio with M4); the cloud baseline benefits from hosted inference, but has per-token API cost.

Architecture

This worked because we changed the control plane and added a verification loop.

1) Constrain what the model sees (DOM pruning). We don’t feed the entire DOM or screenshots. We collect raw elements, then run a WASM pass to produce a compact “semantic snapshot” (roles/text/geometry) and prune the rest (often on the order of ~95% of nodes).

2) Split reasoning from acting (planner vs executor).

Planner (reasoning): DeepSeek R1 (local) generates step intent + what must be true afterward. Executor (action): Qwen ~3B (local) selects concrete DOM actions like CLICK(id) / TYPE(text). 3) Gate every step with Jest‑style verification. After each action, we assert state changes (URL changed, element exists/doesn’t exist, modal/drawer appeared). If a required assertion fails, the step fails with artifacts and bounded retries.

Minimal shape:

ok = await runtime.check( exists("role=textbox"), label="search_box_visible", required=True, ).eventually(timeout_s=10.0, poll_s=0.25, max_snapshot_attempts=3)

What changed between “agents that look smart” and agents that work Two examples from the logs:

Deterministic override to enforce “first result” intent: “Executor decision … [override] first_product_link -> CLICK(1022)”

Drawer handling that verifies and forces the correct branch: “result: PASS | add_to_cart_verified_after_drawer”

The important point is that these are not post‑hoc analytics. They are inline gates: the system either proves it made progress or it stops and recovers.

Takeaway If you’re trying to make browser agents reliable, the highest‑leverage move isn’t a bigger model. It’s constraining the state space and making success/failure explicit with per-step assertions.

Reliability in agents comes from verification (assertions on structured snapshots), not just scaling model size.

Comments

tonyww•2w ago
One clarification since a few comments from coworkers/friends are circling this: Amazon isn’t the point here.

We used it because it’s a dynamic, hostile UI, but the design goal is a site-agnostic control plane. That’s why the runtime avoids selectors and screenshots and instead operates on pruned semantic snapshots + verification gates.

If the layout changes, the system doesn’t “half-work” — it fails deterministically with artifacts. That’s the behavior we’re optimizing for.

tomhow•2w ago
Can you please clarify: is this project something that "people can play with"? I.e., can users download the code and sample data and try it out for themselves, or play with it some other way?

That's a prerequisite for Show HN.

I'm removing the Show HN prefix for now, until we get clarity. Then we can consider re-upping the post once we know exactly how to present it.

tonyww•2w ago
yes, the repo is publicly available: https://github.com/SentienceAPI/sentience-sdk-playground you can pull it and set up the dependencies including sentience API key, then run the main.py in the planner_executor_local folder
ares623•2w ago
> If the layout changes, the system doesn’t “half-work” — it fails deterministically with artifacts. That’s the behavior we’re optimizing for.

how is this different than building a scraper script that does it traditionally?

blibble•2w ago
it costs a lot more
tonyww•2w ago
Good question. On the surface, it does look very similar to the traditional scraper/script, but there's a subtle difference in where the logic lives and how failures are handled.

A traditional scraper/script hard-codes selectors and control flow up front. When the layout changes, it usually breaks at an arbitrary line and you debug it manually.

In this setup, the agent chooses actions at *runtime* from a bounded action space, and the system uses the built-in predicates (e.g. url_changes, drawer_appeared, etc) to verify the outcomes. When it fails, it fails at a specific semantic assertion with artifacts, not a missing selector.

So it’s less “replace scripts” and more “apply test-style verification and recovery to AI-driven decisions instead of static code.”

cjbarber•2w ago
looks interesting, though note:

> Show HN is for something you've made that other people can play with.

> Off topic: blog posts, sign-up pages, newsletters, lists, and other reading material. Those can't be tried out, so can't be Show HNs. Make a regular submission instead.

https://news.ycombinator.com/showhn.html

tonyww•2w ago
Sorry for the misunderstanding, I intended to post it as news or engineering article, which is why I didn't include *Show HN* in the title