The original vi is a product of its time (and its time has passed)

https://utcc.utoronto.ca/~cks/space/blog/unix/ViIsAProductOfItsTime
1•ingve•3m ago•0 comments

Circumstantial Complexity, LLMs and Large Scale Architecture

https://www.datagubbe.se/aiarch/
1•ingve•10m ago•0 comments

Tech Bro Saga: big tech critique essay series

1•dikobraz•13m ago•0 comments

Show HN: A calculus course with an AI tutor watching the lectures with you

https://calculus.academa.ai/
1•apoogdk•17m ago•0 comments

Show HN: 83K lines of C++ – cryptocurrency written from scratch, not a fork

https://github.com/Kristian5013/flow-protocol
1•kristianXXI•22m ago•0 comments

Show HN: SAA – A minimal shell-as-chat agent using only Bash

https://github.com/moravy-mochi/saa
1•mrvmochi•22m ago•0 comments

Mario Tchou

https://en.wikipedia.org/wiki/Mario_Tchou
1•simonebrunozzi•23m ago•0 comments

Does Anyone Even Know What's Happening in Zim?

https://mayberay.bearblog.dev/does-anyone-even-know-whats-happening-in-zim-right-now/
1•mugamuga•24m ago•0 comments

The last Morse code maritime radio station in North America [video]

https://www.youtube.com/watch?v=GzN-D0yIkGQ
1•austinallegro•26m ago•0 comments

Show HN: Hacker Newspaper – Yet another HN front end optimized for mobile

https://hackernews.paperd.ink/
1•robertlangdon•27m ago•0 comments

OpenClaw Is Changing My Life

https://reorx.com/blog/openclaw-is-changing-my-life/
2•novoreorx•35m ago•0 comments

Everything you need to know about lasers in one photo

https://commons.wikimedia.org/wiki/File:Commercial_laser_lines.svg
2•mahirsaid•37m ago•0 comments

SCOTUS to decide if 1988 video tape privacy law applies to internet uses

https://www.jurist.org/news/2026/01/us-supreme-court-to-decide-if-1988-video-tape-privacy-law-app...
1•voxadam•38m ago•0 comments

Epstein files reveal deeper ties to scientists than previously known

https://www.nature.com/articles/d41586-026-00388-0
3•XzetaU8•46m ago•1 comments

Red teamers arrested conducting a penetration test

https://www.infosecinstitute.com/podcast/red-teamers-arrested-conducting-a-penetration-test/
1•begueradj•53m ago•0 comments

Show HN: Open-source AI powered Kubernetes IDE

https://github.com/agentkube/agentkube
2•saiyampathak•56m ago•0 comments

Show HN: Lucid – Use LLM hallucination to generate verified software specs

https://github.com/gtsbahamas/hallucination-reversing-system
2•tywells•59m ago•0 comments

AI Doesn't Write Every Framework Equally Well

https://x.com/SevenviewSteve/article/2019601506429730976
1•Osiris30•1h ago•0 comments

Aisbf – an intelligent routing proxy for OpenAI compatible clients

https://pypi.org/project/aisbf/
1•nextime•1h ago•1 comments

Let's handle 1M requests per second

https://www.youtube.com/watch?v=W4EwfEU8CGA
1•4pkjai•1h ago•0 comments

OpenClaw Partners with VirusTotal for Skill Security

https://openclaw.ai/blog/virustotal-partnership
1•zhizhenchi•1h ago•0 comments

Goal: Ship 1M Lines of Code Daily

2•feastingonslop•1h ago•0 comments

Show HN: Codex-mem, 90% fewer tokens for Codex

https://github.com/StartripAI/codex-mem
1•alfredray•1h ago•0 comments

FastLangML: Context-aware lang detector for short conversational text

https://github.com/pnrajan/fastlangml
1•sachuin23•1h ago•1 comments

LineageOS 23.2

https://lineageos.org/Changelog-31/
2•pentagrama•1h ago•0 comments

Crypto Deposit Frauds

2•wwdesouza•1h ago•0 comments

Substack makes money from hosting Nazi newsletters

https://www.theguardian.com/media/2026/feb/07/revealed-how-substack-makes-money-from-hosting-nazi...
4•lostlogin•1h ago•0 comments

Framing an LLM as a safety researcher changes its language, not its judgement

https://lab.fukami.eu/LLMAAJ
1•dogacel•1h ago•0 comments

Are there anyone interested about a creator economy startup

1•Nejana•1h ago•0 comments

Show HN: Skill Lab – CLI tool for testing and quality scoring agent skills

https://github.com/8ddieHu0314/Skill-Lab
1•qu4rk5314•1h ago•0 comments

Show HN: Sentience – Semantic Visual Grounding for AI Agents (WASM and ONNX)

2•tonyww•1mo ago
Hi HN, I’m the solo founder behind SentienceAPI. I spent this past December building a browser automation runtime designed specifically for LLM agents.

The Problem: Building reliable web agents is painful. You essentially have two bad choices:

Raw DOM: Dumping document.body.innerHTML is cheap/fast but overwhelms the context window (100k+ tokens) and lacks spatial context (agents try to click hidden or off-screen elements).

Vision Models (GPT-4o): Sending screenshots is robust but slow (3-10s latency) and expensive (~$0.01/step). Worse, they often hallucinate coordinates, missing buttons by 10 pixels.

The Solution: Semantic Geometry

Sentience is a "Visual Cortex" for agents. It sits between the browser and your LLM, turning noisy websites into clean, ranked, coordinate-aware JSON.

How it works (The Stack):

Client (WASM): A Chrome Extension injects a Rust/WASM module that prunes 95% of the DOM (scripts, tracking pixels, invisible wrappers) directly in the browser process. It handles Shadow DOM, nested iframes ("Frame Stitching"), and computed styles (visibility/z-index) in <50ms.
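
To give a flavor of the pruning step, here's a rough Python sketch of the kind of filter involved (conceptual only: the real module is Rust/WASM running against the live DOM, and the node fields below are assumptions, not the actual data model):

    # Conceptual sketch only: the real pruning runs as Rust/WASM in the browser.
    # The node fields ("tag", "style", "rect", "children") are assumptions.
    PRUNED_TAGS = {"script", "style", "noscript", "meta", "link", "template"}

    def is_visible(node):
        style = node.get("style", {})   # computed-style snapshot
        rect = node.get("rect", {})     # bounding box in viewport coordinates
        if style.get("display") == "none" or style.get("visibility") == "hidden":
            return False
        if float(style.get("opacity", 1)) == 0:
            return False
        return rect.get("width", 0) > 0 and rect.get("height", 0) > 0

    def prune(node):
        """Drop scripts, trackers, and invisible wrappers; keep visible content."""
        if node.get("tag") in PRUNED_TAGS or not is_visible(node):
            return None
        children = [c for c in map(prune, node.get("children", [])) if c]
        return {**node, "children": children}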

Gateway (Rust/Axum): The pruned tree is sent to a Rust gateway that applies heuristic importance scoring with simple visual cues (e.g., is_primary).
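
As an illustration of what that heuristic scoring could look like (the weights and cue names below are made up for the example, not the gateway's actual rules):

    # Toy importance heuristic; the weights and cue names are assumptions.
    def importance(el):
        score = 0.0
        if el.get("is_primary"):                        # primary call-to-action cue
            score += 3.0
        if el.get("role") in {"button", "link", "textbox"}:
            score += 2.0
        if el.get("in_viewport"):
            score += 1.5
        rect = el.get("rect", {})
        score += min(rect.get("width", 0) * rect.get("height", 0) / 50_000, 1.0)
        return score

    elements = [
        {"role": "button", "is_primary": True, "in_viewport": True,
         "rect": {"width": 120, "height": 40}, "text": "Search"},
        {"role": "link", "in_viewport": False,
         "rect": {"width": 80, "height": 16}, "text": "Terms of service"},
    ]
    ranked = sorted(elements, key=importance, reverse=True)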

Brain (ONNX): A server-side ML layer (running ms-marco-MiniLM via ort) semantically re-ranks the elements based on the user’s goal (e.g., "Search for shoes").
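
The re-ranking itself is standard cross-encoder scoring. A minimal Python sketch of the same idea with sentence-transformers (the server-side path runs the ONNX model via ort, so this is only a conceptual equivalent):

    # Conceptual equivalent in Python; the server runs the ONNX export via ort.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    goal = "Search for shoes"
    candidates = [
        "button: Search",
        "link: Terms of service",
        "textbox: What are you looking for?",
    ]

    scores = model.predict([(goal, c) for c in candidates])
    reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]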

Result: Your agent gets the Top 50 most relevant interactable elements, each with exact (x, y) coordinates, an importance score, and visual cues, so the LLM agent can decide what to do next.
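
For a sense of the shape, a single element in that payload might look roughly like this (field names are illustrative, not the documented schema):

    # Hypothetical example of one ranked element; the field names are assumptions.
    element = {
        "id": 17,
        "role": "button",
        "text": "Add to cart",
        "x": 912, "y": 431,          # click point in viewport coordinates
        "importance": 0.93,
        "cues": {"is_primary": True, "in_viewport": True},
    }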

Performance:

Cost: ~$0.001 per step (vs. $0.01+ for Vision)

Latency: ~400ms (vs. 5s+ for Vision)

Payload: ~1400 tokens (vs. 100k for Raw HTML)

Developer Experience (The "Cool" Stuff): I hated debugging text logs, so I built Sentience Studio, a "Time-Travel Debugger." It records every step (DOM snapshot + Screenshot) into a .jsonl trace. You can scrub through the timeline like a video editor to see exactly what the agent saw vs. what it hallucinated.
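
Because the trace is plain .jsonl, you can also post-process it outside the Studio UI; a quick sketch of loading the steps (the field names here are just for illustration):

    # Quick sketch of walking a recorded trace; the record fields are illustrative.
    import json

    with open("agent-run.jsonl") as f:
        steps = [json.loads(line) for line in f if line.strip()]

    for step in steps:
        print(step.get("step"), step.get("action"), step.get("screenshot"))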

Links:

Docs & SDK: https://www.sentienceapi.com/docs

GitHub (Python SDK): https://github.com/SentienceAPI/sentience-python

GitHub (TypeScript SDK): https://github.com/SentienceAPI/sentience-ts

Studio Demo: https://www.sentienceapi.com/docs/studio

Build Web Agent: https://www.sentienceapi.com/docs/sdk/agent-quick-start

Screenshots with importance labels (gold stars): https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co... 2026-01-06 at 7.19.41 AM.png

I’m handling the backend in Rust and the SDKs in Python/TypeScript. The project is now in beta; I would love feedback on the architecture or the ranking logic!

Comments

tonyww•1mo ago
One thing I didn’t emphasize enough in the post: I originally tried the “labeled screenshot + vision model” approach pretty hard. (see this screenshot labeled with bbox + ID: https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co...)

In practice it performed worse than expected. Once you overlay dense bounding boxes and numeric IDs, the model has to solve a brittle symbol-grounding problem (“which number corresponds to intent?”). On real pages (Amazon, Stripe docs, etc.) this led to more retries and mis-clicks, not fewer.

What worked better for me was moving that grounding step out of the model entirely and giving it a bounded set of executable actions (role + visibility + geometry), then letting the LLM choose which action, not where to click.
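
Concretely, the framing looks roughly like this (a simplified sketch, not the actual agent loop; the action list and call_llm placeholder are illustrative):

    # Simplified sketch of "choose an action, not a coordinate".
    actions = [
        {"id": 0, "role": "textbox", "text": "Search", "x": 400, "y": 80},
        {"id": 1, "role": "button", "text": "Search", "x": 640, "y": 80},
        {"id": 2, "role": "link", "text": "Today's Deals", "x": 120, "y": 80},
    ]

    menu = "\n".join(f'{a["id"]}: {a["role"]} "{a["text"]}"' for a in actions)
    prompt = (
        "Goal: search for shoes.\n"
        "Pick exactly one action id from the list below and answer with the id only.\n"
        + menu
    )

    # choice = int(call_llm(prompt))   # call_llm stands in for any chat-completion API
    choice = 1                          # e.g. the model answers "1"
    target = actions[choice]            # the executor clicks at target["x"], target["y"]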

Curious if others have seen similar behavior with vision-based agents, especially beyond toy demos.