
What did we learn from the AI Village in 2025?

https://theaidigest.org/village/blog/what-we-learned-2025
1•mrkO99•16s ago•0 comments

An open replacement for the IBM 3174 Establishment Controller

https://github.com/lowobservable/oec
1•bri3d•2m ago•0 comments

The P in PGP isn't for pain: encrypting emails in the browser

https://ckardaris.github.io/blog/2026/02/07/encrypted-email.html
2•ckardaris•4m ago•0 comments

Show HN: Mirror Parliament where users vote on top of politicians and draft laws

https://github.com/fokdelafons/lustra
1•fokdelafons•5m ago•1 comments

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

1•Chance-Device•6m ago•0 comments

We Mourn Our Craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
1•ColinWright•9m ago•0 comments

Jim Fan calls pixels the ultimate motor controller

https://robotsandstartups.substack.com/p/humanoids-platform-urdf-kitchen-nvidias
1•robotlaunch•13m ago•0 comments

Exploring a Modern SMPTE 2110 Broadcast Truck with My Dad

https://www.jeffgeerling.com/blog/2026/exploring-a-modern-smpte-2110-broadcast-truck-with-my-dad/
1•HotGarbage•13m ago•0 comments

AI UX Playground: Real-world examples of AI interaction design

https://www.aiuxplayground.com/
1•javiercr•14m ago•0 comments

The Field Guide to Design Futures

https://designfutures.guide/
1•andyjohnson0•14m ago•0 comments

The Other Leverage in Software and AI

https://tomtunguz.com/the-other-leverage-in-software-and-ai/
1•gmays•16m ago•0 comments

AUR malware scanner written in Rust

https://github.com/Sohimaster/traur
3•sohimaster•18m ago•1 comments

Free FFmpeg API [video]

https://www.youtube.com/watch?v=6RAuSVa4MLI
3•harshalone•18m ago•1 comments

Are AI agents ready for the workplace? A new benchmark raises doubts

https://techcrunch.com/2026/01/22/are-ai-agents-ready-for-the-workplace-a-new-benchmark-raises-do...
2•PaulHoule•23m ago•0 comments

Show HN: AI Watermark and Stego Scanner

https://ulrischa.github.io/AIWatermarkDetector/
1•ulrischa•24m ago•0 comments

Clarity vs. complexity: the invisible work of subtraction

https://www.alexscamp.com/p/clarity-vs-complexity-the-invisible
1•dovhyi•25m ago•0 comments

Solid-State Freezer Needs No Refrigerants

https://spectrum.ieee.org/subzero-elastocaloric-cooling
2•Brajeshwar•25m ago•0 comments

Ask HN: Will LLMs/AI Decrease Human Intelligence and Make Expertise a Commodity?

1•mc-0•26m ago•1 comments

From Zero to Hero: A Brief Introduction to Spring Boot

https://jcob-sikorski.github.io/me/writing/from-zero-to-hello-world-spring-boot
1•jcob_sikorski•27m ago•1 comments

NSA detected phone call between foreign intelligence and person close to Trump

https://www.theguardian.com/us-news/2026/feb/07/nsa-foreign-intelligence-trump-whistleblower
11•c420•27m ago•1 comments

How to Fake a Robotics Result

https://itcanthink.substack.com/p/how-to-fake-a-robotics-result
1•ai_critic•27m ago•0 comments

It's time for the world to boycott the US

https://www.aljazeera.com/opinions/2026/2/5/its-time-for-the-world-to-boycott-the-us
3•HotGarbage•28m ago•0 comments

Show HN: Semantic Search for terminal commands in the Browser (No Back end)

https://jslambda.github.io/tldr-vsearch/
1•jslambda•28m ago•1 comments

The AI CEO Experiment

https://yukicapital.com/blog/the-ai-ceo-experiment/
2•romainsimon•30m ago•0 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
5•surprisetalk•33m ago•1 comments

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
4•TheCraiggers•34m ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
2•birdculture•35m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
14•doener•35m ago•2 comments

MyFlames: View MySQL execution plans as interactive FlameGraphs and BarCharts

https://github.com/vgrippa/myflames
1•tanelpoder•37m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•37m ago•0 comments

Show HN: Sentience – Semantic Visual Grounding for AI Agents (WASM and ONNX)

2•tonyww•1mo ago
Hi HN, I’m the solo founder behind SentienceAPI. I spent last December building a browser automation runtime designed specifically for LLM agents.

The Problem: Building reliable web agents is painful. You essentially have two bad choices:

Raw DOM: Dumping document.body.innerHTML is cheap/fast but overwhelms the context window (100k+ tokens) and lacks spatial context (agents try to click hidden or off-screen elements).

Vision Models (GPT-4o): Sending screenshots is robust but slow (3-10s latency) and expensive (~$0.01/step). Worse, they often hallucinate coordinates, missing buttons by 10 pixels.

The Solution: Semantic Geometry. Sentience is a "Visual Cortex" for agents. It sits between the browser and your LLM, turning noisy websites into clean, ranked, coordinate-aware JSON.

How it works (The Stack):

Client (WASM): A Chrome Extension injects a Rust/WASM module that prunes 95% of the DOM (scripts, tracking pixels, invisible wrappers) directly in the browser process. It handles Shadow DOM, nested iframes ("Frame Stitching"), and computed styles (visibility/z-index) in <50ms.
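
To give a feel for what that pass does, here is the gist of the filtering idea in plain TypeScript. This is illustrative only: the shipped pruner is Rust/WASM and far more thorough, and the tag lists and field names here are my own.

    // Sketch: drop non-visual and invisible nodes, keep interactable
    // elements with their geometry. Not the actual Rust/WASM module.
    interface PrunedNode {
      tag: string;
      text: string;
      rect: { x: number; y: number; w: number; h: number };
    }

    const SKIP = new Set(["SCRIPT", "STYLE", "NOSCRIPT", "TEMPLATE"]);
    const INTERACTIVE = new Set(["A", "BUTTON", "INPUT", "SELECT", "TEXTAREA"]);

    function isVisible(el: Element): boolean {
      const s = getComputedStyle(el);
      if (s.display === "none" || s.visibility === "hidden") return false;
      const r = el.getBoundingClientRect();
      return r.width > 0 && r.height > 0; // zero-size wrappers are noise
    }

    function prune(root: Element, out: PrunedNode[] = []): PrunedNode[] {
      for (const el of Array.from(root.children)) {
        if (SKIP.has(el.tagName) || !isVisible(el)) continue; // skip whole subtree
        if (INTERACTIVE.has(el.tagName) || el.getAttribute("role") === "button") {
          const r = el.getBoundingClientRect();
          out.push({
            tag: el.tagName,
            text: (el.textContent ?? "").trim().slice(0, 80),
            rect: { x: r.x, y: r.y, w: r.width, h: r.height },
          });
        }
        prune(el, out); // recurse into visible subtrees only
      }
      return out;
    }

    // Usage: prune(document.body) -> a compact list instead of 100k+ raw tokens.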

Gateway (Rust/Axum): The pruned tree is sent to a Rust gateway that applies heuristic importance scoring with simple visual cues (e.g. is_primary).
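
A toy version of that scoring, with made-up weights (is_primary is the kind of cue I mean; the gateway's real heuristics differ):

    // Sketch: heuristic importance score for one pruned element.
    // Weights are invented for illustration, not the gateway's actual values.
    interface Cues {
      isPrimary: boolean;  // visually dominant, e.g. accent-colored CTA button
      inViewport: boolean;
      area: number;        // rect.w * rect.h in CSS pixels
      interactive: boolean;
    }

    function importance(c: Cues): number {
      let score = 0;
      if (c.interactive) score += 2;          // clickable beats decorative
      if (c.isPrimary) score += 3;            // strong visual cue
      if (c.inViewport) score += 1;           // visible without scrolling
      score += Math.min(c.area / 10_000, 2);  // bigger targets rank higher, capped
      return score;
    }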

Brain (ONNX): A server-side ML layer (running ms-marco-MiniLM via ort) semantically re-ranks the elements based on the user’s goal (e.g., "Search for shoes").
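
Conceptually this step is plain cross-encoder re-ranking. A sketch, where scorePair stands in for the actual ONNX inference (which runs server-side via ort):

    // Sketch of the semantic re-rank. `scorePair` is a stand-in for the
    // cross-encoder (ms-marco-MiniLM) inference the server runs.
    interface Ranked { text: string; x: number; y: number }

    async function rerank(
      goal: string,
      candidates: Ranked[],
      scorePair: (goal: string, text: string) => Promise<number>,
      topK = 50,
    ): Promise<Ranked[]> {
      const scored = await Promise.all(
        candidates.map(async (c) => ({ c, s: await scorePair(goal, c.text) })),
      );
      scored.sort((a, b) => b.s - a.s); // most relevant to the goal first
      return scored.slice(0, topK).map((x) => x.c);
    }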

Result: Your agent gets a list of the top 50 most relevant interactable elements, each with exact (x, y) coordinates, an importance score, and visual cues, helping the LLM agent make decisions.
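
In other words, the model sees something shaped roughly like this per element (field names are my shorthand, not the exact wire format; the docs have the real schema):

    // Rough shape of one ranked element as handed to the LLM.
    // Field names are illustrative, not the exact API schema.
    interface RankedElement {
      x: number;           // click point in CSS pixels
      y: number;
      role: string;        // "button", "link", "textbox", ...
      text: string;        // visible label, truncated
      importance: number;  // combined heuristic + semantic score
      cues: string[];      // e.g. ["is_primary", "in_viewport"]
    }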

Performance:

Cost: ~$0.001 per step (vs. $0.01+ for Vision)

Latency: ~400ms (vs. 5s+ for Vision)

Payload: ~1400 tokens (vs. 100k for Raw HTML)

Developer Experience (The "Cool" Stuff): I hated debugging text logs, so I built Sentience Studio, a "Time-Travel Debugger." It records every step (DOM snapshot + Screenshot) into a .jsonl trace. You can scrub through the timeline like a video editor to see exactly what the agent saw vs. what it hallucinated.
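
Because the trace is plain JSON Lines, you can also post-process it outside the Studio UI. A Node sketch (the record fields are my assumption about the format, not its spec):

    // Sketch: replay a Studio trace outside the UI (Node.js).
    // The TraceStep fields are assumptions about the .jsonl records.
    import { readFileSync } from "node:fs";

    interface TraceStep {
      step: number;
      action: string;      // what the agent decided to do
      screenshot: string;  // path to the captured frame
    }

    const steps: TraceStep[] = readFileSync("run.jsonl", "utf8")
      .split("\n")
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));

    for (const s of steps) {
      console.log(`#${s.step} ${s.action} -> ${s.screenshot}`);
    }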

Links:

Docs & SDK: https://www.sentienceapi.com/docs

GitHub (Python SDK): https://github.com/SentienceAPI/sentience-python

GitHub (TypeScript SDK): https://github.com/SentienceAPI/sentience-ts

Studio Demo: https://www.sentienceapi.com/docs/studio

Build Web Agent: https://www.sentienceapi.com/docs/sdk/agent-quick-start

Screenshots with importance labels (gold stars): https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co... 2026-01-06 at 7.19.41 AM.png

I’m handling the backend in Rust and the SDKs in Python/TypeScript. The project is now in beta; I would love feedback on the architecture or the ranking logic!

Comments

tonyww•1mo ago
One thing I didn’t emphasize enough in the post: I originally tried the “labeled screenshot + vision model” approach pretty hard. (see this screenshot labeled with bbox + ID: https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co...)

In practice it performed worse than expected. Once you overlay dense bounding boxes and numeric IDs, the model has to solve a brittle symbol-grounding problem (“which number corresponds to intent?”). On real pages (Amazon, Stripe docs, etc.) this led to more retries and mis-clicks, not fewer.

What worked better for me was moving that grounding step out of the model entirely and giving it a bounded set of executable actions (role + visibility + geometry), then letting the LLM choose which action, not where to click.
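
To make the comparison concrete, here is the shape of that bounded-action approach, sketched in TypeScript (the schema is illustrative, not Sentience's actual one):

    // Sketch: the LLM picks an action id from a closed menu; the runtime
    // owns the geometry. Shapes are illustrative, not the real schema.
    interface Action {
      id: number;
      role: "button" | "link" | "textbox";
      label: string;
      rect: { x: number; y: number; w: number; h: number };
    }

    function toMenu(actions: Action[]): string {
      return actions.map((a) => `${a.id}: ${a.role} "${a.label}"`).join("\n");
    }

    // The model returns an id; only the executor touches coordinates,
    // so a 10-pixel coordinate hallucination is no longer possible.
    function execute(actions: Action[], chosenId: number): { x: number; y: number } {
      const a = actions.find((x) => x.id === chosenId);
      if (!a) throw new Error(`model chose unknown action ${chosenId}`);
      return { x: a.rect.x + a.rect.w / 2, y: a.rect.y + a.rect.h / 2 }; // click center
    }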

Curious if others have seen similar behavior with vision-based agents, especially beyond toy demos.