Show HN: Iris – first MCP-native eval and observability tool for AI agents

https://github.com/iris-eval/mcp-server
1•iparent•1h ago
I kept running into the same problem building AI agents: once they're running, I have no idea what they're actually doing. Traditional monitoring shows me HTTP 200. It can't tell me the output was wrong, that the agent leaked a user's email address, or that a single tool call in the chain is burning through tokens.

So I built Iris. It's an open-source MCP server — not an SDK, not a proxy. Any MCP-compatible agent (Claude Desktop, Cursor, or anything built with the MCP SDK) discovers and uses it automatically. Add it to your MCP config and your agent gains observability without touching your code.
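For illustration only, here is a rough sketch of what "add it to your MCP config" could look like. It assumes the host reads a Claude Desktop-style `mcpServers` map and that `iris-mcp` can be launched with no extra flags; neither is confirmed by the post, and `buildMcpConfig` and the `iris` key are hypothetical names picked for this example. Check the project README for the actual entry.

```typescript
// Sketch only: builds a Claude Desktop-style MCP config entry.
// Assumptions (not confirmed by the post): the host reads an "mcpServers"
// map, and `iris-mcp` runs without additional flags in that setup.

interface McpServerEntry {
  command: string; // executable the host launches
  args: string[];  // command-line arguments, left empty here
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: returns a config object containing one server entry.
function buildMcpConfig(
  name: string,
  command: string,
  args: string[] = []
): McpConfig {
  return { mcpServers: { [name]: { command, args } } };
}

// "iris" is an arbitrary label; "iris-mcp" is the binary named in the post.
const irisConfig = buildMcpConfig("iris", "iris-mcp");

// Print the JSON you would merge into the host's MCP config file.
console.log(JSON.stringify(irisConfig, null, 2));
```

The point of the shape is that the agent's own code stays untouched: the host process starts the server and discovers its tools from this one entry.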

What it does:

- 3 MCP tools: log_trace (full execution traces with spans, tool calls, token usage, cost in USD), evaluate_output (score output quality against configurable rules), get_traces (query traces with filters and pagination)
- 12 built-in eval rules across 4 categories: completeness (output length, coverage), relevance (keyword overlap, hallucination markers), safety (PII detection for SSN/credit card/phone/email, prompt injection patterns, blocklist), and cost (USD threshold, token efficiency)
- Hierarchical span tree: trace exactly where in an agent's execution chain something went wrong (which tool call failed, which step was slow)
- Aggregate cost tracking: the dashboard shows total agent spend across all your agents over any time window, not just per-trace cost. You can finally answer "what are my agents costing me?"
- Web dashboard: dark-mode React UI with summary cards, trace list, span tree view, eval results with per-rule breakdown
- SQLite storage: single file, no database server. Back it up, move it, inspect it with any SQLite tool
- Custom eval rules defined with Zod schemas
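To make the eval-rule idea concrete, here is a minimal sketch of how rule-based scoring with a per-rule breakdown could work. This is not Iris's actual implementation or API: `EvalRule`, `RuleResult`, `noPii`, `minLength`, and `evaluateOutput` are hypothetical names, the two regexes are deliberately simple stand-ins for real PII detection, and the pass-fraction score is an assumed scoring scheme.

```typescript
// Sketch only: hypothetical rule shapes, not Iris's real interfaces.
type RuleCategory = "completeness" | "relevance" | "safety" | "cost";

interface RuleResult {
  rule: string;
  category: RuleCategory;
  passed: boolean;
  detail: string;
}

interface EvalRule {
  name: string;
  category: RuleCategory;
  check(output: string): RuleResult;
}

// Deliberately simple illustrative patterns; real PII detection needs more care.
const PII_PATTERNS: Array<[string, RegExp]> = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/],
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/],
];

// Safety rule: fails if any PII pattern appears in the output.
const noPii: EvalRule = {
  name: "no_pii",
  category: "safety",
  check(output) {
    const hits = PII_PATTERNS.filter(([, re]) => re.test(output)).map(
      ([label]) => label
    );
    return {
      rule: this.name,
      category: this.category,
      passed: hits.length === 0,
      detail: hits.length ? `found: ${hits.join(", ")}` : "no PII found",
    };
  },
};

// Completeness rule factory: fails if the output is shorter than `min` characters.
function minLength(min: number): EvalRule {
  return {
    name: `min_length_${min}`,
    category: "completeness",
    check(output) {
      return {
        rule: this.name,
        category: this.category,
        passed: output.length >= min,
        detail: `length ${output.length}, required ${min}`,
      };
    },
  };
}

// Runs every rule and returns the fraction that passed plus the breakdown.
function evaluateOutput(output: string, rules: EvalRule[]) {
  const results = rules.map((rule) => rule.check(output));
  const passed = results.filter((r) => r.passed).length;
  const score = rules.length === 0 ? 1 : passed / rules.length;
  return { score, results };
}

const report = evaluateOutput("Contact me at jane@example.com", [
  noPii,
  minLength(10),
]);
console.log(report.score, report.results.map((r) => `${r.rule}: ${r.detail}`));
```

Keeping each rule as a small object with a `check` method is one way to let custom rules sit alongside built-in ones; a schema library such as Zod could then validate rule parameters before they are registered.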

Security: API key auth, rate limiting (express-rate-limit), helmet headers, CORS, input validation, ReDoS-safe regex for user-supplied patterns, 1MB body limit.

Stack: TypeScript, Express 5, better-sqlite3, @modelcontextprotocol/sdk, Zod, pino.

Iris also exposes MCP resources: your agent can programmatically read iris://dashboard/summary to get aggregate metrics without opening the dashboard. Every trace is stored in full, which also means you're building the audit trail that regulations like the EU AI Act will require by August 2026.

  npm install -g @iris-eval/mcp-server
  iris-mcp --transport http --dashboard

Self-hosted, MIT licensed.

GitHub: https://github.com/iris-eval/mcp-server

npm: https://www.npmjs.com/package/@iris-eval/mcp-server

I'd appreciate feedback on two things specifically:

1. The eval rule system: are these the right 12 rules to ship with? What's missing?

2. The MCP tool API: three tools feels minimal but sufficient. Should trace logging and evaluation be combined or kept separate?

Check the roadmap for what's coming next: https://github.com/iris-eval/mcp-server/blob/main/docs/roadm...

Bellingcat: The Osint Gatekeepers Who Can't Secure Their Own Site

https://ringmast4r.substack.com/p/the-osint-gatekeepers-who-cant-secure
1•mostcallmeyt•2m ago•0 comments

Daily pill may cure deadly sleep disorder that affects 84M people

https://www.dailymail.co.uk/health/article-15643615/pill-cure-sleep-apnea-CPAP-breathing.html
1•Bender•3m ago•0 comments

Ask HN: How do you find collaborators?

1•voidss•3m ago•1 comments

Iran war's Qatari Helium production disruption a potential blow to chipmakers

https://finance.yahoo.com/news/iran-war-could-wreak-havoc-on-farmers-create-a-potential-bottlenec...
1•spenvo•3m ago•0 comments

Meta reportedly plans layoffs as AI costs increase

https://www.theguardian.com/technology/2026/mar/13/meta-layoffs-ai
4•saikatsg•5m ago•0 comments

Do you ship vibe coded apps with security issues?

https://usevibescore.com
1•terrythreatt•6m ago•1 comments

US told to brace for extreme weather in every single state

https://www.dailymail.co.uk/news/article-15645675/us-extreme-weather-forecast-weekend-heat-polar-...
1•Bender•6m ago•0 comments

Where Censored Words Find a Safe Haven: Inside Minecraft

https://www.nytimes.com/2026/03/11/arts/minecraft-uncensored-library-united-states.html
1•bookofjoe•8m ago•1 comments

The Washington Post Is Using Reader Data to Set Subscription Prices

https://washingtonian.com/2026/03/12/the-washington-post-is-using-reader-data-to-set-subscription...
1•kklisura•8m ago•0 comments

Postgres Is the Gateway Drug

https://viggy28.dev/article/postgres-gateway-drug/
3•vira28•9m ago•0 comments

Back End Aggregation Enables Gigawatt-Scale AI Clusters

https://engineering.fb.com/2026/02/09/data-center-engineering/building-prometheus-how-backend-agg...
1•y1n0•10m ago•0 comments

Library of Short Stories

https://www.libraryofshortstories.com/
1•debo_•11m ago•0 comments

Millennium Challenge: Iran Destroyed America in a War Game

https://nationalinterest.org/blog/reboot/millennium-challenge-iran-destroyed-america-war-game-197261
1•vrganj•11m ago•0 comments

AI Codemods for Secure-by-Default Android Apps

https://engineering.fb.com/2026/03/13/android/ai-codemods-secure-by-default-android-apps-meta-tec...
1•y1n0•11m ago•1 comments

Book: The Emerging Science of Machine Learning Benchmarks

https://mlbenchmarks.org/00-preface.html
1•jxmorris12•12m ago•0 comments

Pipechart – pipe any JSON into your terminal and get a chart, zero dependencies

https://github.com/davitotty/pipechart
1•Davitotty1•12m ago•0 comments

Show HN: An Open-Source Yoto Toy with Qwen3-TTS

https://github.com/akdeb/open-toys
2•akadeb•14m ago•1 comments

My fireside chat about agentic engineering at the Pragmatic Summit

https://simonwillison.net/2026/Mar/14/pragmatic-summit/
2•lumpa•17m ago•0 comments

My Wish for Software Engineering

https://arnoldkling.substack.com/p/my-wish-for-software-engineering
1•paulpauper•18m ago•0 comments

Claude Doubles Usage Limits During Off-Peak Hours (March 13–27, 2026)

https://support.claude.com/en/articles/14063676-claude-march-2026-usage-promotion
1•weldu•18m ago•0 comments

Glow: Render Markdown on the CLI, with Pizzazz

https://github.com/charmbracelet/glow
1•thunderbong•18m ago•0 comments

I rebuilt a daily habit because the default experience felt broken

https://apps.apple.com/us/app/brzzy-weather-local-forecasts/id6670187343
1•clambakenow•19m ago•0 comments

Trump administration to be paid $10B for brokering TikTok deal

https://www.theguardian.com/technology/2026/mar/14/tiktok-trump-administration-10bn
8•andsoitis•19m ago•1 comments

Show HN: Paperctl- An Arxiv CLI designed for agents

https://github.com/ChristianFJung/paperctl
2•christianjung•22m ago•0 comments

Activity-based CO2 sensing provides new insights into cellular metabolism

https://www.sciencedirect.com/science/article/pii/S2213231726000650
1•PaulHoule•23m ago•0 comments

VFA – Cryptographic Intent Handshake for Secure API Transactions

https://github.com/Csnyi/VFA-Spec
1•Csnyi•25m ago•1 comments

Cathars and Cathar Beliefs in the Languedoc

https://www.cathar.info
2•andsoitis•25m ago•0 comments

Show HN: Language Life – Learn a language by living a simulated life

https://www.languagelife.ai
3•bitforger•25m ago•0 comments

DOOM fully rendered in CSS

https://bsky.app/profile/html5test.com/post/3mgxr3pcjhk2k
1•ck2•25m ago•0 comments

The Anthropic Institute

https://www.anthropic.com/news/the-anthropic-institute
3•paulpauper•28m ago•0 comments