
Show HN: Auditi – open-source LLM tracing and evaluation platform

https://github.com/deduu/auditi
3•ariansyah•1h ago
I've been building AI agents at work and the hardest part isn't the prompts or orchestration – it's answering "is this agent actually good?" in production.

Tracing tells you what happened. But I wanted to know how well it happened. So I built Auditi – it captures your LLM traces and spans and automatically evaluates them with LLM-as-a-judge + human annotation workflows.

Two lines to get started:

  auditi.init(api_key="...")
  auditi.instrument()  # monkey-patches OpenAI/Anthropic/Gemini

Every API call is captured with full span trees, token usage, and costs. No code changes to your existing LLM calls.

The interesting technical bit: the SDK monkey-patches client.chat.completions.create() at runtime (similar to how OpenTelemetry auto-instruments HTTP libraries). It wraps streaming responses with proxy iterators that accumulate content and extract usage from the final chunk – so even streamed responses get full cost tracking without the user doing anything.
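
For readers who want to picture the pattern described above, here is a minimal sketch of runtime patching plus a streaming proxy iterator. It is not Auditi's actual code: FakeClient, StreamProxy, instrument, and the chunk fields content and usage are hypothetical stand-ins chosen for illustration, and a real SDK client has a different shape.

```python
import functools
from types import SimpleNamespace


class StreamProxy:
    """Wraps a chunk iterator, accumulating text and capturing usage."""

    def __init__(self, chunks, on_done):
        self._chunks = iter(chunks)
        self._on_done = on_done
        self._parts = []
        self._usage = None

    def __iter__(self):
        return self

    def __next__(self):
        try:
            chunk = next(self._chunks)
        except StopIteration:
            # Stream exhausted: report what was accumulated, exactly once.
            if self._on_done is not None:
                self._on_done("".join(self._parts), self._usage)
                self._on_done = None
            raise
        if getattr(chunk, "content", None):
            self._parts.append(chunk.content)
        # Assumption: usage arrives on the final chunk; keep the latest seen.
        if getattr(chunk, "usage", None) is not None:
            self._usage = chunk.usage
        return chunk


def instrument(client, record):
    """Replace client.create with a wrapper that records each call."""
    original = client.create  # bound method, captured before patching

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        # Simplification: only detects stream when passed as a keyword.
        if kwargs.get("stream"):
            return StreamProxy(
                result, lambda text, usage: record.append((text, usage))
            )
        record.append((result.content, result.usage))
        return result

    client.create = wrapper  # the monkey-patch


class FakeClient:
    """Hypothetical stand-in for an LLM client, used only for this demo."""

    def create(self, prompt, stream=False):
        if stream:
            return iter([
                SimpleNamespace(content="Hel", usage=None),
                SimpleNamespace(content="lo", usage=None),
                SimpleNamespace(content="", usage={"total_tokens": 7}),
            ])
        return SimpleNamespace(content="Hello", usage={"total_tokens": 7})


def demo():
    client = FakeClient()
    traces = []
    instrument(client, traces)
    # The caller iterates the stream as usual; recording happens on exhaustion.
    streamed = "".join(c.content for c in client.create("hi", stream=True))
    client.create("hi")
    return streamed, traces
```

Calling demo() drives one streamed and one non-streamed request through the patched method. The design point is that the caller's code does not change: the proxy yields the same chunks it receives and only reports the accumulated text and usage once the stream ends.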

What makes this different from just tracing:

- Built-in evaluators – 7 managed LLM judges (hallucination, relevance, correctness, toxicity, etc.) run automatically on every trace
- Span-level evaluation – scores each step in a multi-step agent, not just the final output
- Human annotation queues – when you need ground truth, not just vibes
- Dataset export – annotated traces export as JSONL/CSV/Parquet for fine-tuning

Self-host with docker compose up.

I'd love feedback from anyone running AI agents or LLMs in production. What metrics do you actually look at? How do you decide if an agent response is "good enough"?

GitHub: https://github.com/deduu/auditi

Show HN: A compiled programming language for LLM-to-LLM communication [pdf]

https://sifsystemsmcrd.com/KL_White_Paper.pdf
1•tmbird•11s ago•0 comments

Show HN: See what your AI agents do under the hood

https://pingpulsehq.com
1•shafeeq2207•1m ago•0 comments

EPA to repeal its own conclusion that greenhouse gases warm the planet

https://www.nbcnews.com/science/climate-change/epa-to-repeal-endangerment-finding-climate-change-...
1•geox•1m ago•0 comments

Can you trust LastPass in 2026? Inside the quest to rebuild its security culture

https://www.zdnet.com/article/lastpass-2026-rebuilding-trust-ceo-interview/
3•arusahni•5m ago•0 comments

Show HN: Z-Image Base – Fast AI Image Generator (Open-Source, Free Tier)

https://z-imagebase.com/
1•chengai1106•5m ago•0 comments

When the Competition Is Down the Hall

https://k2xl.substack.com/p/when-the-competition-is-down-the
1•k2xl•6m ago•0 comments

The Banality of MAGA Evil

https://paulkrugman.substack.com/p/the-banality-of-maga-evil
4•rbanffy•6m ago•0 comments

Show HN: Onlybots.cam

https://onlybots.cam
1•m0rtyn•7m ago•0 comments

PostmarketOS at FOSDEM 2026 and Hackathon

https://postmarketos.org/blog/2026/02/10/fosdem-and-hackathon/
1•birdculture•7m ago•0 comments

How We Built the Fastest Kimi K2.5 on Artificial Analysis

https://www.baseten.co/blog/how-we-built-the-fastest-kimi-k2-5-on-artificial-analysis/
1•philipkiely•8m ago•0 comments

The Budget and Economic Outlook: 2026 to 2036

https://www.cbo.gov/publication/61882
1•mraniki•9m ago•1 comment

Web-Git-sum – Git is not GitHub

https://mitxela.com/projects/web-git-sum
1•moebrowne•13m ago•0 comments

Show HN: MEVA, a desktop Markdown reader for AI-generated docs

https://usemeva.com/
1•ss_meva•14m ago•0 comments

Trends in Prevalence of Autism by Adaptive and Intellectual Functioning Levels

https://onlinelibrary.wiley.com/doi/10.1002/aur.70167
1•hn_acker•15m ago•1 comment

Mamdani Hires Groundbreaking Computer Scientist as Chief Tech Officer

https://www.nytimes.com/2026/02/10/nyregion/mamdani-lisa-gelobter-gif.html
10•leephillips•15m ago•0 comments

Ask HN: Why electronics are still so unrecyclable?

2•alexandrehtrb•16m ago•0 comments

Stablecoins for Skeptics

https://news.alvaroduran.com/p/stablecoins-for-skeptics
1•ohduran•17m ago•1 comment

The Truth About No-KYC Crypto Cards, from Someone Who Ran One

https://twitter.com/defyneric/status/2021116183898886201
1•CrazyRobot•17m ago•0 comments

Who's the Agent Now?

https://danturkel.com/2026/02/11/agents.html
1•daturkel•17m ago•0 comments

Freenginx 1.29.5 Release

https://freenginx.org/en/CHANGES
1•neustradamus•18m ago•0 comments

Show HN: I built a tool to help generate short form videos

https://evokescenes.com/
1•delayedrelease•21m ago•2 comments

Show HN: SPICEBridge – MCP server for AI circuit design via ngspice

https://github.com/clanker-lover/spicebridge
1•clanker-lover•22m ago•0 comments

Blender source code was 9 files in January-8-1994

https://files.mastodon.social/media_attachments/files/115/825/585/900/044/589/original/b0c7ba495a...
2•marcodiego•22m ago•0 comments

The temporary closure of airspace over El Paso has been lifted

https://twitter.com/FAANews/status/2021583720465969421
2•lultimouomo•24m ago•1 comment

Sabotage Risk Report: Claude Opus 4.6 [pdf]

https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf
1•rootforce•24m ago•0 comments

Chowla conjecture on the minimum of a cosine series

https://www.johndcook.com/blog/2026/02/07/chowla/
1•ibobev•25m ago•0 comments

Fibonacci numbers and time-space tradeoffs

https://www.johndcook.com/blog/2026/02/08/time-space-tradeoffs/
2•ibobev•25m ago•0 comments

"Have I Been Stalked" post-mortem

https://dustri.org/b/have-i-been-stalked-post-mortem.html
2•speckx•25m ago•0 comments

Computing Large Fibonacci Numbers

https://www.johndcook.com/blog/2026/02/08/computing-large-fibonacci-numbers/
2•ibobev•25m ago•0 comments

Life on Earth is lucky: A rare chemical fluke may have made our planet habitable

https://www.space.com/space-exploration/search-for-life/life-on-earth-is-lucky-a-rare-chemical-fl...
2•Brajeshwar•26m ago•0 comments