frontpage.

I replaced the front page with AI slop and honestly it's an improvement

https://slop-news.pages.dev/slop-news
1•keepamovin•3m ago•1 comment

Economists vs. Technologists on AI

https://ideasindevelopment.substack.com/p/economists-vs-technologists-on-ai
1•econlmics•5m ago•0 comments

Life at the Edge

https://asadk.com/p/edge
1•tosh•11m ago•0 comments

RISC-V Vector Primer

https://github.com/simplex-micro/riscv-vector-primer/blob/main/index.md
2•oxxoxoxooo•14m ago•1 comment

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

2•InvoxoEU•15m ago•0 comments

A Tale of Two Standards, POSIX and Win32 (2005)

https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
2•goranmoomin•19m ago•0 comments

Ask HN: Is the Downfall of SaaS Started?

3•throwaw12•20m ago•0 comments

Flirt: The Native Backend

https://blog.buenzli.dev/flirt-native-backend/
2•senekor•21m ago•0 comments

OpenAI's Latest Platform Targets Enterprise Customers

https://aibusiness.com/agentic-ai/openai-s-latest-platform-targets-enterprise-customers
1•myk-e•24m ago•0 comments

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
2•myk-e•26m ago•4 comments

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

https://www.ft.com/content/83488628-8dfd-4060-a7b0-71b1bb012785
1•1vuio0pswjnm7•27m ago•1 comment

Big Tech's AI Push Is Costing More Than the Moon Landing

https://www.wsj.com/tech/ai/ai-spending-tech-companies-compared-02b90046
3•1vuio0pswjnm7•29m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
2•1vuio0pswjnm7•31m ago•0 comments

Suno, AI Music, and the Bad Future [video]

https://www.youtube.com/watch?v=U8dcFhF0Dlk
1•askl•33m ago•2 comments

Ask HN: How are researchers using AlphaFold in 2026?

1•jocho12•36m ago•0 comments

Running the "Reflections on Trusting Trust" Compiler

https://spawn-queue.acm.org/doi/10.1145/3786614
1•devooops•41m ago•0 comments

Watermark API – $0.01/image, 10x cheaper than Cloudinary

https://api-production-caa8.up.railway.app/docs
1•lembergs•42m ago•1 comment

Now send your marketing campaigns directly from ChatGPT

https://www.mail-o-mail.com/
1•avallark•46m ago•1 comment

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•58m ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
5•o8vm•1h ago•1 comment

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•1h ago•1 comment

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•1h ago•0 comments

Atlas: Manage your database schema as code

https://github.com/ariga/atlas
1•quectophoton•1h ago•0 comments

Geist Pixel

https://vercel.com/blog/introducing-geist-pixel
2•helloplanets•1h ago•0 comments

Show HN: MCP to get latest dependency package and tool versions

https://github.com/MShekow/package-version-check-mcp
1•mshekow•1h ago•0 comments

The better you get at something, the harder it becomes to do

https://seekingtrust.substack.com/p/improving-at-writing-made-me-almost
2•FinnLobsien•1h ago•0 comments

Show HN: WP Float – Archive WordPress blogs to free static hosting

https://wpfloat.netlify.app/
1•zizoulegrande•1h ago•0 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
1•melvinzammit•1h ago•0 comments

Sony BMG copy protection rootkit scandal

https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootkit_scandal
2•basilikum•1h ago•0 comments

The Future of Systems

https://novlabs.ai/mission/
2•tekbog•1h ago•1 comment

Show HN: Semantic geometry visual grounding for AI web agents (Amazon demo)

2•tonyww•1mo ago
Hi HN,

I’m a solo founder working on SentienceAPI, a perception & execution layer that helps LLM agents act reliably on real websites.

LLMs are good at planning steps, but they often fail when actually interacting with the web. Vision-only agents are expensive and unstable, and DOM-based automation breaks easily on modern pages with overlays, dynamic layouts, and lots of noise.

My approach is semantic geometry-based visual grounding.

Instead of giving the model raw HTML (huge context) or a screenshot (imprecise) and asking it to guess, the API first reduces a webpage into a small, grounded action space made only of elements that are actually visible and interactable. Each element includes geometry plus lightweight visual cues, so the model can decide what to do without guessing.
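To make the reduction step concrete, here is a minimal sketch. This is my illustration, not the actual API: the field names (`interactable`, `bbox`, `role`, `text`) are assumptions loosely based on the description and the simplified JSON example in the comments.

```python
# Hypothetical sketch: reduce a raw element dump to a compact action space.
# Field names are assumed for illustration, not the real SentienceAPI schema.

def reduce_to_action_space(elements, viewport_height=1080):
    """Keep only elements that are visible in the viewport and interactable."""
    action_space = []
    for el in elements:
        if not el.get("interactable"):
            continue  # drop decorative / inert nodes
        bbox = el.get("bbox")
        if bbox is None or bbox["y"] > viewport_height:
            continue  # drop off-screen elements
        action_space.append({
            "id": el["id"],
            "role": el["role"],
            "text": el["text"],
            "bbox": bbox,
        })
    return action_space

raw = [
    {"id": 1, "role": "button", "text": "Add to Cart", "interactable": True,
     "bbox": {"x": 935, "y": 529, "w": 200, "h": 50}},
    {"id": 2, "role": "div", "text": "", "interactable": False,
     "bbox": {"x": 0, "y": 0, "w": 1920, "h": 3000}},
    {"id": 3, "role": "link", "text": "Privacy Policy", "interactable": True,
     "bbox": {"x": 100, "y": 2400, "w": 80, "h": 20}},
]

print(reduce_to_action_space(raw))  # only the "Add to Cart" button survives
```

The point of the sketch is the size of the result: the model sees a handful of grounded candidates instead of the full DOM.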

I built a reference app called MotionDocs on top of this. The demo below shows the system navigating Amazon Best Sellers, opening a product, and clicking “Add to cart” using grounded coordinates (no scripted clicks).

Demo video (Add to Cart): https://youtu.be/1DlIeHvhOg4

How the agent sees the page (map mode wireframe): https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co...

This wireframe shows the reduced action space surfaced to the LLM. Each box corresponds to a visible, interactable element.

Code excerpt (simplified):

```
from sentienceapi_sdk import SentienceApiClient
from motiondocs import generate_video

video = generate_video(
    url="https://www.amazon.com/gp/bestsellers/",
    instructions="Open a product and add it to cart",
    sentience_client=SentienceApiClient(api_key="your-api-key-here"),
)

video.save("demo.mp4")
```

How it works (high level):

The execution layer treats the browser as a black box and exposes three modes:

* Map: identify interactable elements with geometry and visual cues
* Visual: align geometry with screenshots for grounding
* Read: extract clean, LLM-ready text

The key insight is visual cues, especially a simple `is_primary` signal. Humans don't read every pixel; we scan for visual hierarchy. Encoding that directly lets the agent prioritize the right actions without processing raw pixels or a noisy DOM.
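As a toy illustration of how a cue like `is_primary` could drive prioritization (the element schema follows the simplified JSON example in the comments; the scoring weights are my own assumption, not the shipped behavior):

```python
# Toy prioritization: surface primary-styled elements first, mirroring how a
# human scans visual hierarchy. The weights are illustrative assumptions.

def rank_actions(elements):
    def score(el):
        cues = el.get("visual_cues", {})
        s = 0
        if cues.get("is_primary"):
            s += 10          # prominent call-to-action
        if cues.get("cursor") == "pointer":
            s += 1           # clickable affordance
        return -s            # negate so sorted() puts high scores first
    return sorted(elements, key=score)

elements = [
    {"id": 43, "role": "link", "text": "Privacy Policy",
     "visual_cues": {"cursor": "pointer", "is_primary": False}},
    {"id": 42, "role": "button", "text": "Add to Cart",
     "visual_cues": {"cursor": "pointer", "is_primary": True}},
]

print([el["text"] for el in rank_actions(elements)])
# ['Add to Cart', 'Privacy Policy']
```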

Why this matters:

* smaller action space → fewer hallucinations
* deterministic geometry → reproducible execution
* cheaper than vision-only approaches

TL;DR: I’m building a semantic geometry grounding layer that turns web pages into a compact, visually grounded action space for LLM agents. It gives the model a cheat sheet instead of asking it to solve a vision puzzle.
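A hypothetical sketch of what that cheat sheet could look like as a prompt payload. The one-line-per-element format here is assumed for illustration; it is not the real serialization.

```python
# Serialize a grounded action space into a compact, LLM-ready "cheat sheet":
# one short line per element instead of thousands of raw DOM tokens.
# The line format is an illustrative assumption.

def to_cheat_sheet(action_space):
    lines = []
    for el in action_space:
        b = el["bbox"]
        cues = el.get("visual_cues", {})
        primary = " PRIMARY" if cues.get("is_primary") else ""
        lines.append(
            f'[{el["id"]}] {el["role"]} "{el["text"]}" '
            f'@({b["x"]},{b["y"]}){primary}'
        )
    return "\n".join(lines)

action_space = [
    {"id": 42, "role": "button", "text": "Add to Cart",
     "bbox": {"x": 935, "y": 529, "w": 200, "h": 50},
     "visual_cues": {"is_primary": True}},
    {"id": 43, "role": "link", "text": "Privacy Policy",
     "bbox": {"x": 100, "y": 1200, "w": 80, "h": 20},
     "visual_cues": {"is_primary": False}},
]

print(to_cheat_sheet(action_space))
# [42] button "Add to Cart" @(935,529) PRIMARY
# [43] link "Privacy Policy" @(100,1200)
```

Two lines of context per element keeps the action space inside a small prompt budget, which is where the cost advantage over vision-only agents comes from.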

This is early work, not launched yet. I’d love feedback or skepticism, especially from people building agents, RPA, QA automation, or dev tools.

— Tony W

Comments

tonyww•1mo ago
Example JSON Response (Simplified):

```
[
  {
    "id": 42,
    "role": "button",
    "text": "Add to Cart",
    "bbox": { "x": 935, "y": 529, "w": 200, "h": 50 },
    "visual_cues": { "cursor": "pointer", "is_primary": true, "color_name": "yellow" }
  },
  {
    "id": 43,
    "role": "link",
    "text": "Privacy Policy",
    "bbox": { "x": 100, "y": 1200, "w": 80, "h": 20 },
    "visual_cues": { "cursor": "pointer", "is_primary": false }
  }
]
```
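A minimal sketch of how an agent-side consumer might use such a response: once the model picks an element, the grounded click point is just the center of its bbox. Field names are taken from the example above; the actual click-dispatch call is omitted.

```python
import json

# Minimal consumer of the simplified response: find the chosen element and
# compute grounded click coordinates from its bounding box.

response = '''[
  {"id": 42, "role": "button", "text": "Add to Cart",
   "bbox": {"x": 935, "y": 529, "w": 200, "h": 50},
   "visual_cues": {"cursor": "pointer", "is_primary": true, "color_name": "yellow"}},
  {"id": 43, "role": "link", "text": "Privacy Policy",
   "bbox": {"x": 100, "y": 1200, "w": 80, "h": 20},
   "visual_cues": {"cursor": "pointer", "is_primary": false}}
]'''

def click_point(elements, target_text):
    """Return the center of the target's bbox for a grounded click."""
    for el in elements:
        if el["text"] == target_text:
            b = el["bbox"]
            return (b["x"] + b["w"] // 2, b["y"] + b["h"] // 2)
    return None  # element not in the action space

print(click_point(json.loads(response), "Add to Cart"))  # (1035, 554)
```

Because the coordinates come from deterministic geometry rather than a model's pixel estimate, the same plan replays to the same click point, which is what makes the execution reproducible.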

This prototype builds on several open-source libraries:

* MoviePy – video composition and rendering
* Pillow (PIL) – image processing and overlays

The demo app (MotionDocs) uses the public SentienceAPI SDK, generated from OpenAPI, which is the same interface used by the system internally.