
Show HN: Semantic geometry visual grounding for AI web agents (Amazon demo)

2•tonyww•1mo ago
Hi HN,

I’m a solo founder working on SentienceAPI, a perception & execution layer that helps LLM agents act reliably on real websites.

LLMs are good at planning steps, but they fail a lot when actually interacting with the web. Vision-only agents are expensive and unstable, and DOM-based automation breaks easily on modern pages with overlays, dynamic layouts, and lots of noise.

My approach is semantic geometry-based visual grounding.

Instead of giving the model raw HTML (huge context) or a screenshot (imprecise) and asking it to guess, the API first reduces a webpage into a small, grounded action space made only of elements that are actually visible and interactable. Each element includes geometry plus lightweight visual cues, so the model can decide what to do without guessing.
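To make that concrete, here's a minimal sketch of the agent-side loop, assuming elements shaped like the simplified JSON I posted in the comments; `choose_action` and the `llm` callable are illustrative stand-ins, not the actual product code:

```
import json

def choose_action(llm, elements, goal):
    # Illustrative: ask the model to pick one element id from a compact,
    # pre-grounded action space instead of raw HTML or raw pixels.
    prompt = (
        f"Goal: {goal}\n"
        "Visible, interactable elements (JSON):\n"
        f"{json.dumps(elements)}\n"
        "Reply with the id of the single element to act on."
    )
    element_id = int(llm(prompt))  # `llm` is any text-completion callable
    chosen = next(e for e in elements if e["id"] == element_id)
    # Grounded coordinates: act on the center of the element's bounding box.
    box = chosen["bbox"]
    return box["x"] + box["w"] / 2, box["y"] + box["h"] / 2
```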

I built a reference app called MotionDocs on top of this. The demo below shows the system navigating Amazon Best Sellers, opening a product, and clicking “Add to cart” using grounded coordinates (no scripted clicks).

Demo video (Add to Cart): https://youtu.be/1DlIeHvhOg4

How the agent sees the page (map mode wireframe): https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co...

This wireframe shows the reduced action space surfaced to the LLM. Each box corresponds to a visible, interactable element.
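A wireframe like that can be rendered in a few lines with Pillow, one of the libraries this prototype uses; the sketch below is illustrative, not the actual renderer:

```
from PIL import Image, ImageDraw

def draw_wireframe(screenshot_path, elements, out_path):
    # Overlay one box per visible, interactable element; highlight primary
    # actions so the visual hierarchy is obvious at a glance.
    img = Image.open(screenshot_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for e in elements:
        b = e["bbox"]
        color = "red" if e.get("visual_cues", {}).get("is_primary") else "gray"
        draw.rectangle(
            [b["x"], b["y"], b["x"] + b["w"], b["y"] + b["h"]],
            outline=color, width=3,
        )
        draw.text((b["x"], max(b["y"] - 12, 0)), f'{e["id"]} {e["role"]}', fill=color)
    img.save(out_path)
```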

Code excerpt (simplified):

```
from sentienceapi_sdk import SentienceApiClient
from motiondocs import generate_video

# Run the agent against a live page and record the session as a video.
video = generate_video(
    url="https://www.amazon.com/gp/bestsellers/",
    instructions="Open a product and add it to cart",
    sentience_client=SentienceApiClient(api_key="your-api-key-here"),
)

video.save("demo.mp4")
```

How it works (high level):

The execution layer treats the browser as a black box and exposes three modes:

* Map: identify interactable elements with geometry and visual cues
* Visual: align geometry with screenshots for grounding
* Read: extract clean, LLM-ready text
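To make the modes concrete, here's a minimal sketch of what calling them could look like; the method names `map`, `visual`, and `read` are assumptions based on the mode names, not the SDK's documented interface:

```
from sentienceapi_sdk import SentienceApiClient

client = SentienceApiClient(api_key="your-api-key-here")
url = "https://www.amazon.com/gp/bestsellers/"

# Hypothetical method names, one per mode:
elements = client.map(url)       # Map: interactable elements + geometry and cues
screenshot = client.visual(url)  # Visual: screenshot aligned with that geometry
page_text = client.read(url)     # Read: clean, LLM-ready text
```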

The key insight is the visual cues, especially a simple is_primary signal. Humans don't read every pixel; we scan for visual hierarchy. Encoding that hierarchy directly lets the agent prioritize the right actions without processing raw pixels or a noisy DOM.
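As a minimal sketch, prioritizing on that signal can be as simple as a sort key (the `prioritize` helper is illustrative):

```
def prioritize(elements):
    # Surface primary calls-to-action first so the model's shortlist
    # mirrors the page's visual hierarchy.
    return sorted(
        elements,
        key=lambda e: not e.get("visual_cues", {}).get("is_primary", False),
    )
```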

Why this matters:

* smaller action space → fewer hallucinations
* deterministic geometry → reproducible execution
* cheaper than vision-only approaches

TL;DR: I’m building a semantic geometry grounding layer that turns web pages into a compact, visually grounded action space for LLM agents. It gives the model a cheat sheet instead of asking it to solve a vision puzzle.

This is early work, not launched yet. I’d love feedback or skepticism, especially from people building agents, RPA, QA automation, or dev tools.

— Tony W

Comments

tonyww•1mo ago
Example JSON Response (Simplified):

```
[
  {
    "id": 42,
    "role": "button",
    "text": "Add to Cart",
    "bbox": { "x": 935, "y": 529, "w": 200, "h": 50 },
    "visual_cues": { "cursor": "pointer", "is_primary": true, "color_name": "yellow" }
  },
  {
    "id": 43,
    "role": "link",
    "text": "Privacy Policy",
    "bbox": { "x": 100, "y": 1200, "w": 80, "h": 20 },
    "visual_cues": { "cursor": "pointer", "is_primary": false }
  }
]
```
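To show how grounded coordinates become an action, here's a minimal sketch that clicks the center of the "Add to Cart" bbox from the response above. Playwright is a stand-in browser backend for illustration, not necessarily what the system uses internally:

```
from playwright.sync_api import sync_playwright

# Element 42 from the JSON response above.
element = {"id": 42, "bbox": {"x": 935, "y": 529, "w": 200, "h": 50}}

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.amazon.com/gp/bestsellers/")
    box = element["bbox"]
    # Click by coordinates at the bbox center, not by DOM selector.
    page.mouse.click(box["x"] + box["w"] / 2, box["y"] + box["h"] / 2)
    browser.close()
```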

This prototype builds on several open-source libraries:

* MoviePy – video composition and rendering
* Pillow (PIL) – image processing and overlays

The demo app (MotionDocs) uses the public SentienceAPI SDK, generated from OpenAPI, which is the same interface used by the system internally.