MCP server that audits AI agent reasoning before decisions commit

https://espiradev.org/blog/sentinel-ai-reasoning-observatory.html

1•aespira•1h ago

Comments

aespira•1h ago

I built SENTINEL to solve a problem I kept hitting in healthcare AI: agents report high confidence on decisions backed by stale evidence. For example, an agent reports 89% confidence on a prior auth denial, but the payer policy it references is 14 months old and retrieval returned only 40% of the step therapy docs. Historical accuracy for decisions matching that combination is 23%.

SENTINEL sits behind agentgateway as an MCP server (Go, Streamable HTTP) and runs a four-stage pipeline on every agent decision:

1. Signal fidelity audit: checks evidence staleness, completeness, weight divergence, and confidence calibration.

2. Pattern classification: matches fidelity flags against learned failure signatures with historical accuracy data.

3. Reliability scoring: rolling per-agent × per-payer accuracy with drift detection and ECE tracking.

4. Authority gate: issues one of four verdicts: full autonomy, act-and-notify, human-required, or quarantine.
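To make the flow concrete, here is a minimal Python sketch of stages 1, 2 and 4. It is not SENTINEL's code (which is written in Go). Every name, threshold and table entry is an assumption made up for illustration, except the 23% figure, which comes from the example above. Stage 3 (rolling reliability scoring) is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    stated_confidence: float    # agent's self-reported confidence, 0..1
    evidence_age_months: float  # age of the policy document the agent cites
    retrieval_coverage: float   # fraction of relevant docs retrieved, 0..1


# Hypothetical thresholds, chosen only for this illustration.
MAX_EVIDENCE_AGE_MONTHS = 12
MIN_RETRIEVAL_COVERAGE = 0.8

# Hypothetical failure signatures: flag combination -> historical accuracy.
# Only the 0.23 entry mirrors a number from the post; the rest are invented.
PATTERN_ACCURACY = {
    frozenset(): 0.95,
    frozenset({"stale_evidence"}): 0.60,
    frozenset({"incomplete_retrieval"}): 0.55,
    frozenset({"stale_evidence", "incomplete_retrieval"}): 0.23,
}


def audit_fidelity(d: Decision) -> frozenset:
    """Stage 1: raise flags for stale or incomplete evidence."""
    flags = set()
    if d.evidence_age_months > MAX_EVIDENCE_AGE_MONTHS:
        flags.add("stale_evidence")
    if d.retrieval_coverage < MIN_RETRIEVAL_COVERAGE:
        flags.add("incomplete_retrieval")
    return frozenset(flags)


def classify_pattern(flags: frozenset) -> float:
    """Stage 2: look up historical accuracy for this flag combination."""
    # Unknown combinations fall back to a conservative default (assumption).
    return PATTERN_ACCURACY.get(flags, 0.5)


def authority_gate(stated_confidence: float, historical_accuracy: float) -> str:
    """Stage 4: map the lower of the two numbers to a verdict."""
    effective = min(stated_confidence, historical_accuracy)
    if effective >= 0.9:
        return "full_autonomy"
    if effective >= 0.7:
        return "act_and_notify"
    if effective >= 0.4:
        return "human_required"
    return "quarantine"


def evaluate(d: Decision) -> str:
    flags = audit_fidelity(d)
    return authority_gate(d.stated_confidence, classify_pattern(flags))


if __name__ == "__main__":
    # The prior auth example: 89% stated confidence, 14-month-old policy,
    # 40% retrieval coverage.
    print(evaluate(Decision(0.89, 14, 0.40)))
```

Taking the minimum of stated confidence and historical accuracy is just one simple way to keep a confident agent from overriding a poor track record. A real gate would presumably weigh more signals.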

The key insight: an agent's stated confidence and its actual reliability are different things, and the gap grows silently as upstream data drifts. SENTINEL tracks that gap at the MCP protocol layer.

Built for healthcare prior auth, but the architecture is domain-agnostic: it applies anywhere an agent reasons over retrieved evidence before making a consequential decision.

agentgateway handles RBAC (CEL policies per MCP tool), session management, and audit logging. SENTINEL handles the reasoning quality audit. Integrations with Datadog (drift monitors), Braintrust (eval scoring), and Cleric (incident escalation).

Repo: https://github.com/espirado/agent-secure

Blog post with architecture details and demo walkthrough: https://espiradev.org/blog/sentinel-ai-reasoning-observatory...
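For readers unfamiliar with ECE: the gap between stated confidence and observed reliability is what expected calibration error measures. The sketch below is the standard binned formulation in Python, not SENTINEL's implementation. The function name, the equal-width binning and the default of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - average confidence| per bin.

    confidences: stated confidences in [0, 1]
    outcomes:    booleans, True where the decision turned out correct
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # An agent that always says 90% but is right one time in four.
    print(expected_calibration_error([0.9] * 4, [True, False, False, False]))
```

Computed over a rolling window per agent and payer, a rising value would signal that stated confidence is drifting away from actual accuracy.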

Linux Page Faults, MMAP, and userfaultfd for fast sandbox boot times

https://www.shayon.dev/post/2026/65/linux-page-faults-mmap-and-userfaultfd/

1•shayonj•1m ago•0 comments

Show HN: Cloud to Desktop in the Fastest Way

https://nativedesktop.com/

1•lasgawe•1m ago•0 comments

Software Maturity Wall

https://www.apolloacademy.com/software-maturity-wall/

1•akyuu•1m ago•0 comments

Fast and free coding agent written with Go

https://github.com/cheikh2shift/godex

1•cheikhshift•2m ago•0 comments

Show HN: PipeStep – Step-through debugger for GitHub Actions workflows

https://github.com/Photobombastic/pipestep

3•photobombastic•3m ago•0 comments

Apple's MacBook Neo makes repairs easier and cheaper than other MacBooks

https://arstechnica.com/gadgets/2026/03/more-modular-design-makes-macbook-neo-easier-to-fix-than-...

3•GeekyBear•4m ago•0 comments

An agentic workflow, March 2026 edition

https://twolongos.com/3/12/an-agentic-workflow-march-2026-edition/

2•suzzer99•4m ago•0 comments

Is your vet owned by private equity?

https://privateequityvet.org/vet-list/

2•hampelm•5m ago•0 comments

Show HN: LogClaw – Open-source AI SRE that auto-creates tickets from logs

https://logclaw.ai

3•Robelkidin•5m ago•0 comments

WikiCity – Where every building is a Wikipedia article

https://wikicity.app/

2•leononame•5m ago•1 comments

Harness Engineering

https://openai.com/index/harness-engineering/

3•jlas•6m ago•0 comments

A Day in the Life of an Enshittificator [video]

https://www.youtube.com/watch?v=T4Upf_B9RLQ

2•KindAndFriendly•7m ago•0 comments

Show HN: Understudy – Teach a desktop agent by demonstrating a task once

https://github.com/understudy-ai/understudy

3•bayes-song•7m ago•0 comments

Inboxscan – find every subscription hiding in your email (runs locally)

https://github.com/LakshmiSravyaVedantham/inboxscan

2•sravyavedantham•7m ago•1 comments

Ask HN: In 2026, how do you share a list of URLs to the public (or friends)?

2•wenbin•10m ago•1 comments

Work_mem: It's a Trap

https://mydbanotebook.org/posts/work_mem-its-a-trap/

2•giulianopz•10m ago•0 comments

Show HN: Fixing Agent / LLM Context Decay in VS Code with Git Worktrees

https://www.appsoftware.com/blog/fixing-agent-llm-context-decay-in-vs-code-with-git-worktrees

4•gbro3n•12m ago•0 comments

Design Tip: Enforcing Constraints Leads to Simpler, More Powerful Systems

https://www.rodriguez.today/articles/emergent-event-driven-workflows

1•birdculture•12m ago•0 comments

Show HN: I lost billable hours forgetting timers. I turned my calendar into a DB

https://www.timescanner.io/

2•sergentrif•13m ago•2 comments

Anthropic's Claude AI can respond with charts, diagrams, and other visuals now

https://www.theverge.com/ai-artificial-intelligence/893625/anthropic-claude-ai-charts-diagrams

1•newusertoday•13m ago•0 comments

Show HN: Verge Browser a self-hosted isolated browser sandbox for AI agents

https://github.com/zzzgydi/verge-browser

2•zzzgydi•14m ago•0 comments

Ask HN: How are you using personal AI assistants with local coding agents?

2•everfly•14m ago•0 comments

The Thinking Field

https://www.robpanico.com/articles/display/?entry_short=the-thinking-field

2•retrocog•15m ago•1 comments

Claude Bought Me a Car

https://www.nahtnam.com/blog/claude-bought-me-a-car

3•nahtnam•17m ago•2 comments

U.S. to suspend the Jones Act in a bid to curb oil prices

https://www.bloomberg.com/news/articles/2026-03-12/trump-administration-set-to-suspend-jones-act-...

4•geox•17m ago•0 comments

Boardsmith – text prompt to KiCad schematic, BOM, and firmware (works offline)

https://github.com/ForestHubAI/boardsmith

2•ForestHubAI•19m ago•3 comments

A plan to get more electricity to West Texas may come undone

https://www.texastribune.org/2026/03/12/west-texas-electricty-plan/

2•hn_acker•19m ago•0 comments

Maligna Kodera

https://krzyhau.itch.io/maligna-kodera

1•ta988•20m ago•0 comments

Title: Show HN: Aurora – Live dashboard watching local LLMs create autonomously

https://aurora.elijah-sylar.com

2•elijahscamp•20m ago•1 comments

Why Task Proficiency Doesn't Equal AI Autonomy

https://www.signalbloom.ai/posts/why-task-proficiency-doesnt-equal-ai-autonomy/

1•anonu•24m ago•0 comments