
Show HN: Sentience – Semantic Visual Grounding for AI Agents (WASM and ONNX)

2•tonyww•1d ago
Hi HN, I’m the solo founder behind SentienceAPI. I spent last December building a browser automation runtime designed specifically for LLM agents.

The Problem: Building reliable web agents is painful. You essentially have two bad choices:

Raw DOM: Dumping document.body.innerHTML is cheap and fast, but it overwhelms the context window (100k+ tokens) and lacks spatial context, so agents try to click hidden or off-screen elements.

Vision Models (GPT-4o): Sending screenshots is robust but slow (3-10s latency) and expensive (~$0.01/step). Worse, the models often hallucinate coordinates, missing buttons by 10 pixels.

The Solution: Semantic Geometry. Sentience is a "Visual Cortex" for agents. It sits between the browser and your LLM, turning noisy websites into clean, ranked, coordinate-aware JSON.

How it works (The Stack):

Client (WASM): A Chrome Extension injects a Rust/WASM module that prunes 95% of the DOM (scripts, tracking pixels, invisible wrappers) directly in the browser process. It handles Shadow DOM, nested iframes ("Frame Stitching"), and computed styles (visibility/z-index) in <50ms.
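To make the pruning step concrete, here is a toy Python sketch of that kind of pass. The real implementation is Rust/WASM running inside the browser process; the element fields and thresholds below are illustrative assumptions, not the actual data model.

```python
# Toy sketch of the client-side pruning pass. Elements are modeled as dicts
# with tag, size, and computed-style fields; all names are illustrative.

NOISE_TAGS = {"script", "style", "noscript", "template", "link", "meta"}

def is_visible(el: dict) -> bool:
    """Mimic the computed-style checks: display, visibility, size."""
    style = el.get("style", {})
    if style.get("display") == "none" or style.get("visibility") == "hidden":
        return False
    return el.get("width", 0) > 0 and el.get("height", 0) > 0

def prune(elements: list[dict]) -> list[dict]:
    """Drop scripts, tracking pixels, and invisible wrappers."""
    kept = []
    for el in elements:
        if el["tag"] in NOISE_TAGS:
            continue
        if not is_visible(el):
            continue
        # 1x1 images are almost always tracking pixels
        if el["tag"] == "img" and el.get("width") == 1 and el.get("height") == 1:
            continue
        kept.append(el)
    return kept
```

The point is that this filtering is pure bookkeeping over already-computed styles, which is why it can run in under 50ms before anything touches the network.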

Gateway (Rust/Axum): The pruned tree is sent to a Rust gateway that applies heuristic importance scoring with simple visual cues (e.g., is_primary).
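As a rough illustration of heuristic scoring, here is a Python sketch (the real gateway is Rust/Axum). The is_primary cue comes from the post; the other features and all the weights are made up for illustration.

```python
# Hypothetical importance-scoring heuristic; weights are arbitrary examples.

INTERACTIVE_ROLES = {"button", "link", "textbox", "combobox", "checkbox"}

def importance(el: dict, viewport_height: int = 900) -> float:
    score = 0.0
    if el.get("role") in INTERACTIVE_ROLES:
        score += 2.0   # interactable elements matter most
    if el.get("is_primary"):
        score += 1.5   # visual cue: primary-styled call to action
    if el.get("y", 10**9) < viewport_height:
        score += 1.0   # above the fold
    if el.get("text"):
        score += 0.5   # labeled elements beat empty wrappers
    return score
```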

Brain (ONNX): A server-side ML layer (running ms-marco-MiniLM via ort) semantically re-ranks the elements based on the user’s goal (e.g., "Search for shoes").
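The re-ranker's contract is simple: (goal, element text) -> relevance score. Below is a toy bag-of-words stand-in for that interface; the real service runs a ms-marco-MiniLM cross-encoder via ONNX Runtime, not this lexical overlap.

```python
# Toy stand-in for the semantic re-ranker, showing only the interface:
# score each element's text against the user's goal, then sort.

def relevance(goal: str, text: str) -> float:
    g = set(goal.lower().split())
    t = set(text.lower().split())
    return len(g & t) / max(len(g), 1)

def rerank(goal: str, elements: list[dict], top_k: int = 50) -> list[dict]:
    scored = sorted(elements,
                    key=lambda el: relevance(goal, el.get("text", "")),
                    reverse=True)
    return scored[:top_k]
```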

Result: Your agent gets a list of the top 50 most relevant interactable elements, each with exact (x, y) coordinates, an importance score, and visual cues, helping the LLM agent make decisions.
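To give a feel for what the agent consumes, here is a hypothetical example of that coordinate-aware JSON. The field names are illustrative, not the documented API schema.

```python
import json

# Hypothetical payload shape; the real schema may differ.
payload = json.loads("""
{
  "elements": [
    {"id": 0, "role": "textbox", "text": "Search", "x": 412, "y": 88,
     "importance": 4.5, "cues": {"is_primary": false}},
    {"id": 1, "role": "button", "text": "Search for shoes", "x": 640, "y": 88,
     "importance": 5.0, "cues": {"is_primary": true}}
  ]
}
""")
# An agent can act on this directly, e.g. pick the most important element:
best = max(payload["elements"], key=lambda el: el["importance"])
```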

Performance:

Cost: ~$0.001 per step (vs. $0.01+ for Vision)

Latency: ~400ms (vs. 5s+ for Vision)

Payload: ~1400 tokens (vs. 100k for Raw HTML)

Developer Experience (The "Cool" Stuff): I hated debugging text logs, so I built Sentience Studio, a "Time-Travel Debugger." It records every step (DOM snapshot + Screenshot) into a .jsonl trace. You can scrub through the timeline like a video editor to see exactly what the agent saw vs. what it hallucinated.
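Since the trace is plain .jsonl (one JSON object per step), any tool can scrub through it. A minimal sketch of consuming such a trace, with assumed field names:

```python
import io
import json

# One JSON object per line = one agent step; field names are assumptions.
def load_trace(fp) -> list[dict]:
    return [json.loads(line) for line in fp if line.strip()]

trace_file = io.StringIO(
    '{"step": 0, "action": "click", "screenshot": "step0.png"}\n'
    '{"step": 1, "action": "type", "screenshot": "step1.png"}\n'
)
steps = load_trace(trace_file)
# A debugger UI can now index into `steps` like a video timeline.
```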

Links:

Docs & SDK: https://www.sentienceapi.com/docs

GitHub (Python SDK): https://github.com/SentienceAPI/sentience-python

GitHub (TypeScript SDK): https://github.com/SentienceAPI/sentience-ts

Studio Demo: https://www.sentienceapi.com/docs/studio

Build Web Agent: https://www.sentienceapi.com/docs/sdk/agent-quick-start

Screenshots with importance labels (gold stars): https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co... 2026-01-06 at 7.19.41 AM.png

I’m handling the backend in Rust and the SDKs in Python/TypeScript. The project is now in beta; I’d love feedback on the architecture or the ranking logic!

Comments

tonyww•1d ago
One thing I didn’t emphasize enough in the post: I originally tried the “labeled screenshot + vision model” approach pretty hard. (see this screenshot labeled with bbox + ID: https://sentience-screenshots.sfo3.cdn.digitaloceanspaces.co...)

In practice it performed worse than expected. Once you overlay dense bounding boxes and numeric IDs, the model has to solve a brittle symbol-grounding problem (“which number corresponds to intent?”). On real pages (Amazon, Stripe docs, etc.) this led to more retries and mis-clicks, not fewer.

What worked better for me was moving that grounding step out of the model entirely and giving it a bounded set of executable actions (role + visibility + geometry), then letting the LLM choose which action, not where to click.
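The "bounded action set" idea can be sketched as follows: enumerate executable actions from visible elements (role + geometry) and let the model answer with an action index, never a coordinate. All names below are illustrative, not the project's actual API.

```python
# Build a bounded set of executable actions from grounded elements.
# The LLM picks *which* action; the runtime supplies *where* from geometry.

def build_actions(elements: list[dict]) -> list[dict]:
    actions = []
    for el in elements:
        if not el.get("visible"):
            continue
        if el["role"] == "button":
            actions.append({"act": "click", "target": el["text"],
                            "xy": (el["x"], el["y"])})
        elif el["role"] == "textbox":
            actions.append({"act": "type", "target": el["text"],
                            "xy": (el["x"], el["y"])})
    return actions
```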

Curious if others have seen similar behavior with vision-based agents, especially beyond toy demos.