Show HN: Auditi – open-source LLM tracing and evaluation platform

https://github.com/deduu/auditi
3•ariansyah•2h ago
I've been building AI agents at work and the hardest part isn't the prompts or orchestration – it's answering "is this agent actually good?" in production.

Tracing tells you what happened. But I wanted to know how well it happened. So I built Auditi – it captures your LLM traces and spans and automatically evaluates them with LLM-as-a-judge + human annotation workflows.

Two lines to get started:

  auditi.init(api_key="...")
  auditi.instrument()  # monkey-patches OpenAI/Anthropic/Gemini

Every API call is captured with full span trees, token usage, and costs. No code changes to your existing LLM calls.

The interesting technical bit: the SDK monkey-patches client.chat.completions.create() at runtime (similar to how OpenTelemetry auto-instruments HTTP libraries). It wraps streaming responses with proxy iterators that accumulate content and extract usage from the final chunk – so even streamed responses get full cost tracking without the user doing anything.
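To make the streaming part concrete, here is a minimal sketch of the general pattern (patch `create()`, wrap the stream in a proxy iterator). It is not Auditi's actual code: `FakeCompletions`, `StreamProxy`, `instrument`, and the `captured` list are hypothetical names, and the chunk shape (a `content` string plus a `usage` dict on the final chunk) is an assumption standing in for a real provider SDK.

```python
import functools
from types import SimpleNamespace

# Hypothetical stand-in for a provider SDK; not the real OpenAI client.
class FakeCompletions:
    def create(self, *, model, messages, stream=False):
        usage = {"prompt_tokens": 5, "completion_tokens": 2}
        if not stream:
            return SimpleNamespace(content="Hello", usage=usage)
        # Assumption: usage arrives only on the final chunk of a stream.
        return iter([
            SimpleNamespace(content="Hel", usage=None),
            SimpleNamespace(content="lo", usage=None),
            SimpleNamespace(content="", usage=usage),
        ])

class StreamProxy:
    """Yields the wrapped stream unchanged while recording content and usage."""

    def __init__(self, stream, on_done):
        self._stream = iter(stream)
        self._on_done = on_done
        self._parts = []
        self._usage = None
        self._reported = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            chunk = next(self._stream)
        except StopIteration:
            # Report once, when the caller has consumed the whole stream.
            if not self._reported:
                self._reported = True
                self._on_done("".join(self._parts), self._usage)
            raise
        if chunk.content:
            self._parts.append(chunk.content)
        if chunk.usage is not None:
            self._usage = chunk.usage
        return chunk

captured = []  # hypothetical in-memory trace sink

def instrument(cls):
    """Replace cls.create with a wrapper that records every call."""
    original = cls.create

    @functools.wraps(original)
    def traced_create(self, *args, **kwargs):
        result = original(self, *args, **kwargs)

        def record(text, usage):
            captured.append(
                {"model": kwargs.get("model"), "output": text, "usage": usage}
            )

        if kwargs.get("stream"):
            return StreamProxy(result, record)
        record(result.content, result.usage)
        return result

    cls.create = traced_create

instrument(FakeCompletions)
client = FakeCompletions()
text = "".join(c.content for c in client.create(model="demo", messages=[], stream=True))
```

The calling code still iterates the stream exactly as before; the record is written only when the iterator is exhausted, so a caller that abandons the stream early would leave no trace in this simplified version.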

What makes this different from just tracing:

- Built-in evaluators – 7 managed LLM judges (hallucination, relevance, correctness, toxicity, etc.) run automatically on every trace
- Span-level evaluation – scores each step in a multi-step agent, not just the final output
- Human annotation queues – when you need ground truth, not just vibes
- Dataset export – annotated traces export as JSONL/CSV/Parquet for fine-tuning
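As a rough illustration of span-level evaluation and JSONL export, the sketch below scores every step of a two-step trace and serializes the result. Everything in it is an assumption for illustration: `Span`, `relevance_judge`, `evaluate_trace`, and `to_jsonl` are hypothetical names, and the judge is a toy word-overlap heuristic standing in for a real LLM judge.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Span:
    name: str
    input: str
    output: str
    scores: dict = field(default_factory=dict)

def relevance_judge(span):
    """Toy stand-in for an LLM judge: share of input words echoed in the output."""
    asked = set(span.input.lower().split())
    answered = set(span.output.lower().split())
    return len(asked & answered) / len(asked) if asked else 0.0

def evaluate_trace(spans, judges):
    """Run every judge on every span, not only on the final output."""
    for span in spans:
        for judge_name, judge in judges.items():
            span.scores[judge_name] = judge(span)
    return spans

def to_jsonl(spans):
    """Serialize one span per line, the usual JSONL layout."""
    return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in spans)

trace = [
    Span("retrieve", "refund policy", "refund policy document found"),
    Span("answer", "refund policy", "please contact support"),
]
evaluate_trace(trace, {"relevance": relevance_judge})
exported = to_jsonl(trace)
```

Scoring each span separately shows which step went wrong: here the retrieval step scores well while the final answer does not, which a single trace-level score would hide.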

Self-host with docker compose up.

I'd love feedback from anyone running AI agents or LLMs in production. What metrics do you actually look at? How do you decide if an agent response is "good enough"?

GitHub: https://github.com/deduu/auditi

Show HN: AI agents play SimCity through a REST API

https://hallucinatingsplines.com
72•aed•2d ago•23 comments

Show HN: Renovate – The Kubernetes-Native Way

https://github.com/mogenius/renovate-operator
13•JanLepsky•1h ago•10 comments

Show HN: Musical Interval Trainer

https://valtterimaja.github.io/musical-interval-trainer/
14•Gravityloss•3h ago•7 comments

Show HN: CodeMic

https://codemic.io/#hn
45•seansh•3d ago•21 comments

Show HN: Triclock – A Triangular Clock

https://triclock.franzai.com/
3•franze•1h ago•1 comment

Show HN: Rowboat – AI coworker that turns your work into a knowledge graph (OSS)

https://github.com/rowboatlabs/rowboat
181•segmenta•23h ago•50 comments

Show HN: JavaScript-first, open-source WYSIWYG DOCX editor

https://github.com/eigenpal/docx-js-editor
117•thisisjedr•1d ago•41 comments

Show HN: ClawPool – Pool Claude tokens to make $$$ or crazy cheap Claude Code

https://clawpool.ai
4•pablojamjam•3h ago•1 comment

Show HN: Gridpaper: Scientific figures in the browser, built on gnuplot via WASM

https://gridpaper.org/examples/
2•hnarayanan•2h ago•1 comment

Show HN: Auditi – open-source LLM tracing and evaluation platform

https://github.com/deduu/auditi
3•ariansyah•2h ago•0 comments

Show HN: I built a macOS tool for network engineers – it's called NetViews

https://www.netviews.app
228•n1sni•1d ago•55 comments

Show HN: Distr 2.0 – A year of learning how to ship to customer environments

https://github.com/distr-sh/distr
94•louis_w_gk•1d ago•29 comments

Show HN: I tried to build a soundproof sleep capsule

https://www.lepekhin.com/2026/02/10/Soundproof-Sleep-Capsule
3•bizzz•3h ago•0 comments

Show HN: Lorem.video – placeholder videos generated from URLs

https://lorem.video/
4•guntis_dev•3h ago•2 comments

Show HN: Stripe-no-webhooks – Sync your Stripe data to your Postgres DB

https://github.com/pretzelai/stripe-no-webhooks
62•prasoonds•22h ago•27 comments

Show HN: I made paperboat.website, a platform for friends and creativity

https://paperboat.website/home/
70•yethiel•23h ago•29 comments

Show HN: Baby Vault – A 100% offline, privacy-first PWA for new parents

https://babyvault.moshmage.com/
4•moshmage•4h ago•2 comments

Show HN: I built managed OpenClaw hosting with 60s provisioning in 6 days

https://clawhosters.com/blog/posts/how-i-built-60-second-vps-provisioning
2•yixn_io•4h ago•0 comments

Show HN: Multimodal perception system for real-time conversation

https://raven.tavuslabs.org
48•mert_gerdan•21h ago•14 comments

Show HN: I taught GPT-OSS-120B to see using Google Lens and OpenCV

41•vkaufmann•10h ago•30 comments

Show HN: ArtisanForge: Learn Laravel through a gamified RPG adventure

https://artisanforge.online/
38•grazulex•3d ago•3 comments

Show HN: I built a tool for lazy founders – it's called BunnyDesk

https://bunnydesk.ai
2•jacobsyc•4h ago•0 comments

Show HN: Sol LeWitt-style instruction-based drawings in the browser

https://intervolz.com/sollewitt/
42•intervolz•20h ago•7 comments

Show HN: Claudit – Claude Code Conversations as Git Notes, Automatically

https://github.com/re-cinq/claudit
6•EngineerBetter•4h ago•0 comments

Show HN: OpenClaw Kubernetes Operator

https://github.com/OpenClaw-rocks/k8s-operator
2•stubbi•5h ago•1 comment

Show HN: Building My Own Google Analytics for $0

https://www.adwait.me/writings/building-my-own-google-analytics
10•adwait12345•8h ago•5 comments

Show HN: Εἶδος – A non-Turing-complete language built on Plato's Theory of Forms

https://github.com/realadeel/eidos
2•proletarian•6h ago•2 comments

Show HN: Windy – Place wind turbines on a map, see residential impact

https://windy-pi.vercel.app/
2•baqiwaqi•6h ago•0 comments

Show HN: Web Scraping Sandbox Website

https://scrapingsandbox.com/
2•vrathee•6h ago•1 comment

Show HN: Elysia JIT "Compiler", why it's one of the fastest JavaScript frameworks

https://elysiajs.com/internal/jit-compiler
50•saltyaom•3d ago•10 comments