
Show HN: VR.dev – Open-source verifiers for what AI agents did

https://www.vr.dev/
3•SkiFreeWin3•2h ago
Hey HN,

Quick origin story: vr.dev started as a virtual reality project. The domain fit perfectly. The developer adoption did not. Rather than let a good domain go to waste, I pivoted to the other kind of VR: verification and rewards for AI agents.

The problem I kept running into: agents report success, but system state tells a different story. The database row is still active. The IMAP sent folder is empty. The tests pass because the agent modified the tests. Real-world benchmarks put agent success at 12-30%, and even among reported successes, a large fraction are procedurally wrong in ways that are hard to catch without actually checking state.

So I built a library of verifiers that check real system state rather than trusting agent self-reports. There are 38 of them across 19 domains right now, organized into three tiers: HARD (deterministic probes against databases, files, APIs, git), SOFT (LLM rubric scoring for things like tone or coherence that don't have a deterministic test), and AGENTIC (verifiers that actively probe the environment via headless browser, IMAP, or shell).
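To make the HARD tier concrete, here is a minimal sketch of a deterministic database probe for the "row is still active" case above. It is not vrdev's API: `VerifierResult`, `verify_row_deactivated`, and the `users` table are hypothetical names chosen for illustration, and SQLite stands in for whatever database the agent actually touched.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class VerifierResult:
    """Hypothetical result type: pass/fail, a score in [0, 1], and evidence."""
    passed: bool
    score: float
    evidence: str


def verify_row_deactivated(conn: sqlite3.Connection, user_id: int) -> VerifierResult:
    """Hypothetical HARD verifier: read the row directly instead of trusting
    the agent's report that the account was deactivated."""
    row = conn.execute("SELECT status FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return VerifierResult(False, 0.0, f"user {user_id} not found")
    status = row[0]
    passed = status != "active"
    return VerifierResult(passed, 1.0 if passed else 0.0, f"status={status!r}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'active'), (2, 'deactivated')")
    # Suppose the agent claimed it deactivated both users; only one actually changed.
    print(verify_row_deactivated(conn, 1))
    print(verify_row_deactivated(conn, 2))
```

The point of the HARD tier is that the verdict comes from the probe, so the same query gives the same answer no matter what the agent wrote in its final message.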

The design decision I'd most like feedback on is the composition model. SOFT scores are gated behind HARD checks, so if the deterministic check fails, the composed score is 0.0 regardless of what the LLM judge says. The idea is to make reward hacking structurally harder rather than just hoping the judge catches it.
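A minimal sketch of the gating rule described above, assuming each HARD check yields a boolean and each SOFT judge yields a score in [0, 1]. `compose_gated` is a hypothetical name, and averaging the SOFT scores is an assumption made for illustration; vrdev may combine them differently.

```python
from typing import Sequence


def compose_gated(hard_passed: Sequence[bool], soft_scores: Sequence[float]) -> float:
    """Hypothetical composition: SOFT (LLM-judge) scores only count if every
    HARD check passed; any HARD failure forces the result to 0.0."""
    # Assumption: with no HARD checks at all there is nothing to gate on,
    # so the sketch refuses to award reward.
    if not hard_passed or not all(hard_passed):
        return 0.0
    for s in soft_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"SOFT score out of range: {s}")
    if not soft_scores:
        return 1.0  # all HARD checks passed and there is no rubric to apply
    # Assumption: SOFT scores are combined by simple averaging.
    return sum(soft_scores) / len(soft_scores)


if __name__ == "__main__":
    # The judge loves the email's tone (0.95), but the sent folder is empty.
    print(compose_gated(hard_passed=[False], soft_scores=[0.95]))
    # State checks pass, so the judge's scores now matter.
    print(compose_gated(hard_passed=[True, True], soft_scores=[0.5, 1.0]))
```

The design choice is that a persuasive transcript can never outweigh a failed state check, because the SOFT score is never even consulted when the gate is closed.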

It's MIT licensed and runs locally via pip install vrdev, with no dependency on the hosted API, which matters if you're using it in a training loop. Full verifier list at https://vr.dev/registry.

Curious whether the HARD/SOFT/AGENTIC taxonomy makes sense to people, whether fail_closed is the right default, and whether anyone has built something similar and run into problems I haven't hit yet.
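To frame the fail_closed question, here is a sketch under one assumption about what it means: when a verifier cannot observe state at all (it raises, e.g. because the IMAP server is unreachable), fail-closed counts that as a failure, while fail-open counts it as a pass. `run_verifier` is a hypothetical helper, not part of vrdev.

```python
from typing import Callable


def run_verifier(check: Callable[[], bool], fail_closed: bool = True) -> bool:
    """Hypothetical wrapper: run a state check and decide what an error means."""
    try:
        return bool(check())
    except Exception:
        # The verifier could not observe system state. Fail-closed treats
        # that as "not verified"; fail-open treats it as "assume success".
        return not fail_closed


def imap_probe() -> bool:
    # Stand-in for a probe whose environment is broken.
    raise ConnectionError("IMAP server unreachable")


if __name__ == "__main__":
    print(run_verifier(imap_probe))                     # fail-closed default
    print(run_verifier(imap_probe, fail_closed=False))  # fail-open
```

One argument for fail-closed as the default in a training loop: with fail-open, any episode that breaks the environment badly enough to crash the verifier would still earn reward, which is exactly the kind of shortcut a policy can learn to exploit.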

https://vr.dev https://github.com/vrDotDev/vr-dev https://pypi.org/project/vrdev/

Show HN: Timelog – C-native, fast, in-memory LSM-style time index for Python

https://github.com/VldChk/timelog
1•vld_chk•7s ago•0 comments

Arduino's new AI-centric board is the VENTUNO Q

https://hackaday.com/2026/03/10/arduinos-new-ai-centric-board-is-the-ventuno-q/
1•geerlingguy•39s ago•0 comments

I Let AI Replace Me for a Week

https://substack.com/home/post/p-190492497
1•pr7vinas•44s ago•1 comment

New spherical flexure joint designs (compliant mechanisms) [video]

https://www.youtube.com/watch?v=DAngcygU7tc
1•thunderbong•1m ago•0 comments

Winter getting shorter in 80% of major US cities, new data shows

https://www.theguardian.com/us-news/2026/feb/27/us-winters-getting-shorter
1•PaulHoule•1m ago•0 comments

Meta Acquired Moltbook

https://techcrunch.com/2026/03/10/meta-acquired-moltbook-the-ai-agent-social-network-that-went-vi...
1•rippeltippel•1m ago•0 comments

Paperclip – Open-source orchestration for zero-human companies

https://github.com/paperclipai/paperclip
1•devinfoley•2m ago•0 comments

Show HN: AgentUQ, a token-logprob runtime gate for LLM agents

https://github.com/antoinenguyen27/agentUQ
1•AntoineN2•3m ago•0 comments

Show HN: Point it at your local dev server, get a demo video with AI voiceover

https://demofly.ai
3•mhamann•3m ago•1 comment

Show HN: Streamsniff – diagnose and fix your streaming video quality

https://streamsniff.com
1•Sean-Der•6m ago•1 comment

Scientists Get a Glimpse of How New Pandemics Are Made

https://www.nytimes.com/2026/03/09/science/covid-coronavirus-evolution.html
1•Brajeshwar•6m ago•0 comments

Shift in Gulf Stream could signal the collapse of a major ocean current system

https://phys.org/news/2026-03-shift-gulf-stream-collapse-major.html
2•Brajeshwar•6m ago•1 comment

Faultline – distributed job queue with exactly-once execution guarantees

https://github.com/kritibehl/faultline
1•kritibehl•6m ago•1 comment

How to attract hummingbirds to your yard

https://www.popsci.com/environment/how-to-attract-hummingbirds-to-yard/
1•Brajeshwar•6m ago•1 comment

An ex-L3Harris Trenchant boss stole and sold cyber exploits to Russia

https://techcrunch.com/2025/11/03/how-an-ex-l3-harris-trenchant-boss-stole-and-sold-cyber-exploit...
2•tiahura•7m ago•0 comments

Ig Nobels ceremony moves to Europe over security concerns

https://arstechnica.com/science/2026/03/ig-nobels-ceremony-moves-to-europe-over-security-concerns/
1•voxadam•7m ago•1 comment

Ask HN: Are people shipping their AI "vibe-coded" apps to production?

1•infiniumtek•7m ago•0 comments

Nvidia and Thinking Machines Lab draw multi-year chip deal

https://www.siliconrepublic.com/business/nvidia-thinking-machines-lab-chip-deal-vera-rubin
2•wuschel•8m ago•1 comment

Infrastructure as Data: Why our OpenTofu main.tf has zero resources

https://medium.com/@heinancabouly/escaping-the-devops-concierge-trap-how-we-built-a-data-driven-s...
1•HeinanCA•8m ago•1 comment

Meta acquires AI agent social network Moltbook

https://www.reuters.com/business/meta-acquires-ai-agent-social-network-moltbook-2026-03-10/
3•tosh•10m ago•2 comments

The Flexible AI Agent Framework that keeps things simple

https://www.valiantlynx.com/blogs/machine-core-the-flexible-ai-agent-framework-that-keeps-things-...
2•madshalden•10m ago•0 comments

The truth behind the 2026 J.P. Morgan Healthcare Conference

https://www.lesswrong.com/posts/eopA4MqhrE4dkLjHX/the-truth-behind-the-2026-j-p-morgan-healthcare...
1•surprisetalk•10m ago•0 comments

Kniterate Notes

https://soup.agnescameron.info//2026/03/07/kniterate-notes.html
1•surprisetalk•10m ago•0 comments

10 Minutes is 1% of Your Day (2022)

https://taylor.town/10-minutes
1•surprisetalk•10m ago•0 comments

Joey Parrish – Streaming Video on 80s Gaming Hardware

https://www.youtube.com/watch?v=GZdxdpw-3nI
1•surprisetalk•11m ago•0 comments

System76 tries to talk Colorado down over OS age checks

https://www.theregister.com/2026/03/10/foss_age_verification_2/
1•LorenDB•11m ago•0 comments

Training a Neural Network in 16-Bit Fixed Point on a 1982 BBC Micro

https://www.jamesdrandall.com/posts/neural_network_bbc_micro/
1•mariuz•11m ago•0 comments

Understudy: Scenario Testing for AI Agents

https://github.com/gojiplus/understudy
1•neehao•11m ago•0 comments

Show HN: Ash, an Agent Sandbox for Mac

https://ashell.dev
2•amsha•11m ago•0 comments

Claude Code Attempted 752 /proc/*/environ Reads. 256 Succeeded. Codex: 0

https://grith.ai/blog/syscall-trace-ai-coding-agents
3•edf13•12m ago•0 comments