frontpage.


Show HN: Lumina – Open-source observability for AI systems (OpenTelemetry-native)

https://github.com/use-lumina/Lumina
1•Evanson•2h ago
Hey HN! I built Lumina – an open-source observability platform for AI/LLM applications. Self-host it in 5 minutes with Docker Compose, all features included.

The Problem:

I've been building LLM apps for the past year, and I kept running into the same issues:

- LLM responses would randomly change after prompt tweaks, breaking things.
- Costs would spike unexpectedly (turns out a bug was hitting GPT-4 instead of 3.5).
- No easy way to compare "before vs after" when testing prompt changes.
- Existing tools were either too expensive or missing features in free tiers.

What I Built:

Lumina is OpenTelemetry-native, meaning:

- Works with your existing OTEL stack (Datadog, Grafana, etc.).
- No vendor lock-in: standard trace format.
- Integrates in 3 lines of code.

Key features:

- Cost & quality monitoring – automatic alerts when costs spike or responses degrade.
- Replay testing – capture production traces, replay them after changes, see diffs.
- Semantic comparison – not just string matching; uses Claude to judge whether responses are "better" or "worse."
- Self-hosted tier – 50k traces/day, 7-day retention, ALL features included (alerts, replay, semantic scoring).

How it works:

```bash
# Start Lumina
git clone https://github.com/use-lumina/Lumina
cd Lumina/infra/docker
docker-compose up -d
```

```typescript
// Add to your app (no API key needed for self-hosted!)
import { Lumina } from '@uselumina/sdk';

const lumina = new Lumina({
  endpoint: 'http://localhost:8080/v1/traces',
});

// Wrap your LLM call
const response = await lumina.traceLLM(
  async () => await openai.chat.completions.create({...}),
  { provider: 'openai', model: 'gpt-4', prompt: '...' }
);
```

That's it. Every LLM call is now tracked with cost, latency, tokens, and quality scores.
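To make the cost-tracking idea concrete, here is a minimal sketch of per-call cost accounting from token counts. The rates below are assumed example prices (USD per 1K tokens), not Lumina's built-in table, and `estimateCost` is an illustrative helper, not part of the SDK:

```typescript
// Estimate the dollar cost of a single LLM call from its token usage.
// Rates are illustrative placeholders -- real pricing varies by
// provider and model, and changes over time.

interface Usage {
  promptTokens: number;
  completionTokens: number;
}

interface Rate {
  promptPer1K: number;     // USD per 1K prompt tokens
  completionPer1K: number; // USD per 1K completion tokens
}

const RATES: Record<string, Rate> = {
  "gpt-4": { promptPer1K: 0.03, completionPer1K: 0.06 },
  "gpt-3.5-turbo": { promptPer1K: 0.0005, completionPer1K: 0.0015 },
};

function estimateCost(usage: Usage, rate: Rate): number {
  return (
    (usage.promptTokens / 1000) * rate.promptPer1K +
    (usage.completionTokens / 1000) * rate.completionPer1K
  );
}
```

With numbers like these, the bug described above (traffic silently routed to GPT-4 instead of 3.5) shows up as calls that are tens of times more expensive, which is exactly the kind of shift a cost alert can catch.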

What makes it different:

1. Free self-hosted with limits that work – 50k traces/day and 7-day retention (the daily cap resets at midnight UTC). All features included: alerts, replay testing, and semantic scoring. Enough for most development and small production workloads. Need more? Upgrade to managed cloud.
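
A daily cap that resets at midnight UTC can be sketched as a small counter keyed by the UTC date. This is illustrative only, not Lumina's actual implementation:

```typescript
// Per-day quota that resets at midnight UTC. Taking `now` as a
// parameter keeps the logic testable; a real service would also
// persist the counter (e.g. in Redis) rather than in memory.

function utcDayKey(now: Date): string {
  // e.g. "2026-01-27" -- changes exactly at midnight UTC
  return now.toISOString().slice(0, 10);
}

class DailyQuota {
  private used = 0;
  private day = "";

  constructor(private readonly limit: number) {}

  /** Returns true if one more trace fits within today's quota. */
  tryConsume(now: Date = new Date()): boolean {
    const key = utcDayKey(now);
    if (key !== this.day) {
      // First trace of a new UTC day: reset the counter.
      this.day = key;
      this.used = 0;
    }
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }
}
```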

2. OpenTelemetry-native – Not another proprietary format. Use standard OTEL exporters, works with existing infra. Can send traces to both Lumina AND Datadog simultaneously.
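
Sending the same traces to two OTLP backends at once can be done with standard OpenTelemetry JS packages, roughly like this. This is a sketch assuming a recent `@opentelemetry/sdk-node` that accepts a `spanProcessors` array; the second endpoint (port 4318) is the conventional OTLP/HTTP default and should be verified against your own backend:

```typescript
// Fan out spans to Lumina and a second OTLP-compatible backend.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  spanProcessors: [
    // Local self-hosted Lumina collector.
    new BatchSpanProcessor(
      new OTLPTraceExporter({ url: "http://localhost:8080/v1/traces" })
    ),
    // Any other OTLP receiver (e.g. an agent listening on the
    // standard OTLP/HTTP port) gets an identical copy of each span.
    new BatchSpanProcessor(
      new OTLPTraceExporter({ url: "http://localhost:4318/v1/traces" })
    ),
  ],
});

sdk.start();
```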

3. Replay testing – The killer feature. Capture 100 production traces, change your prompt, replay them all, and get a semantic diff report. Like snapshot testing for LLMs.
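
The replay loop itself is simple to picture: re-run captured inputs through the changed prompt and report what differs. The sketch below uses exact string comparison for clarity; the names are illustrative, not Lumina's SDK API, and the real tool scores differences semantically rather than literally:

```typescript
// Replay captured production inputs through a candidate LLM function
// and report which responses changed ("snapshot testing for LLMs").

interface CapturedTrace {
  input: string;
  output: string; // response recorded in production
}

type LLMFn = (input: string) => string;

interface DiffReport {
  total: number;
  changed: number;
  diffs: { input: string; before: string; after: string }[];
}

function replay(traces: CapturedTrace[], candidate: LLMFn): DiffReport {
  const diffs: DiffReport["diffs"] = [];
  for (const t of traces) {
    const after = candidate(t.input);
    if (after !== t.output) {
      diffs.push({ input: t.input, before: t.output, after });
    }
  }
  return { total: traces.length, changed: diffs.length, diffs };
}
```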

4. Fast – Built with Bun, Postgres, Redis, NATS. Sub-500ms from trace to alert. Handles 10k+ traces/min on a single machine.

What I'm looking for:

- Feedback on the approach (is OTEL the right foundation?)
- Bug reports (tested on Mac/Linux/WSL2, but I'm sure there are issues)
- Ideas for what features matter most (alerts? replay? cost tracking?)
- Help with the semantic scorer (currently uses Claude, want to make it pluggable)

Why open source:

I want this to be the standard for LLM observability. That only works if it's:

- Free to use and modify (Apache 2.0)
- Easy to self-host (Docker Compose, no cloud dependencies)
- Open to contributions (good first issues tagged)

The business model is managed hosting for teams that don't want to run infrastructure. But the core product is and always will be free.

Try it:

- GitHub: https://github.com/use-lumina/Lumina
- Docs: https://docs.uselumina.io
- Quick start: 5 minutes from `git clone` to dashboard

I'd love to hear what you think! Especially interested in:

- What observability problems you're hitting with LLMs
- Missing features that would make this useful for you
- Any similar tools you're using (and what they do better)

Thanks for reading!

Methods for protecting yourself against an LRAD system – Tech Ingredients (2020) [video]

https://www.youtube.com/watch?v=CXKTBQBugIA
1•goda90•38s ago•0 comments

Forever Overhead – David Foster Wallace

https://welcometotheloonybin.wordpress.com/2008/09/17/forever-overhead/
1•ofalkaed•1m ago•0 comments

MCP Apps

http://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/
1•sanj•3m ago•0 comments

Ask HN: How to avoid skill atrophy in LLM-assisted programming era?

2•py4•4m ago•0 comments

Pretty much 100% of our code is written by Claude Code and Opus 4.5

https://twitter.com/bcherny/status/2015979257038831967
1•sysoleg•4m ago•0 comments

Stanford scientists reveal oldest map of the night sky

https://www.kqed.org/news/12070647/stanford-scientists-reveal-oldest-map-of-the-night-sky-previou...
1•dr_dshiv•6m ago•0 comments

AI and Society: The Three Phases of Technological Adoption

https://ure.us/articles/ai-and-society-the-three-phases-of-technological-adoption/
1•sschotten•6m ago•0 comments

OpenAI Prism

https://openai.com/prism/
1•davidbarker•7m ago•0 comments

Show HN: LemonSlice – Give your voice agents a face

6•lcolucci•8m ago•0 comments

Ag-jail – Sandbox antigravity to avoid persistent/background processes

https://github.com/M-Wham/ag-jail
1•mwham•9m ago•1 comments

Clawdbot is a security nightmare [video]

https://www.youtube.com/watch?v=kSno1-xOjwI
4•carlos-menezes•9m ago•0 comments

Southwest's Open-Seating Era Comes to an End

https://www.wsj.com/lifestyle/travel/my-last-dash-for-open-seats-on-southwest-90aec391
1•JumpCrisscross•10m ago•0 comments

Show HN: AnalysisXYZ – Browser-based CSV/Excel analyzer (privacy focused)

https://www.analysisxyz.dev
1•kushagarwal2907•12m ago•1 comments

Ask HN: How do you manage memory and context across Claude Code sessions?

1•nadis•13m ago•0 comments

Prep Early to Land an Overseas Job

https://relocateme.substack.com/p/how-to-prepare-for-an-overseas-job
1•andrewstetsenko•15m ago•0 comments

The Doomsday Clock is now at 85 seconds to midnight

https://thebulletin.org/doomsday-clock/
3•pbhak•15m ago•0 comments

Show HN: An open-source starter for developing with Postgres and ClickHouse

https://github.com/ClickHouse/postgres-clickhouse-stack
1•saisrirampur•16m ago•0 comments

UPS to cut additional 30,000 jobs in Amazon unwind, turnaround plan

https://www.cnbc.com/2026/01/27/ups-job-cuts-amazon-unwind-turnaround-plan.html
5•belter•17m ago•4 comments

VibeCodingBench: Benchmark Vibe Coding Models for Fun

https://twitter.com/yq_acc/status/2016201908181205358
1•jiayaoqijia•17m ago•1 comments

Former astronaut on lunar spacesuits: "I don't think they're great "

https://arstechnica.com/space/2026/01/former-astronaut-on-lunar-spacesuits-i-dont-think-theyre-gr...
1•rbanffy•18m ago•0 comments

How to Enable ProMotion 120Hz Mode in Safari (Mac, iPhone, and iPad)

https://birchtree.me/blog/how-to-enable-120hz-mode-in-safari-mac-iphone-and-ipad/
1•alwillis•20m ago•0 comments

37signals Isn't Smarter Than You, but They Are Different

https://www.nateberkopec.com/blog/37signals-is-not-smarter-than-you/
1•gaws•20m ago•0 comments

The Peptide Craze, a Surge in Use of Off-Label and Non-FDA Approved Peptides

https://erictopol.substack.com/p/the-peptide-craze
3•ck2•21m ago•1 comments

Bankers at Morgan Stanley are eviscerating Tesla's "robotaxi" performance

https://bsky.app/profile/niedermeyer.online/post/3mdg6hlruzk2o
4•doener•22m ago•0 comments

Will It Rain

https://rainycheck.com/
1•slowinthehead•23m ago•0 comments

Show HN: I built a tool that broke my 15-year doomscrolling habit in one week

https://tolerance.lol
1•wduncan•24m ago•1 comments

Maia 200: The AI accelerator built for inference – The Official Microsoft Blog

https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-accelerator-built-for-inference/
1•rbanffy•25m ago•0 comments

Ask HN: After Brex's $5B exit, are Ramp customers misreading risk?

2•fintecheng•26m ago•0 comments

The GNU C Library is moving from Sourceware

https://lwn.net/Articles/1056206/
2•rascul•28m ago•0 comments

Show HN: Watermark – Browser based image/video watermarking with FFmpeg.wasm

https://watermark.akatski.com
1•a_void_sky•28m ago•0 comments