I built a demo of this. It ingests AI conversations and runs three GPT-4o-mini workers over them: an intent classifier, a quality scorer (LLM-as-judge), and a task-completion detector. Results surface in a dashboard designed for PMs, not engineers.
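Each worker is basically a prompt-plus-parse step. A minimal sketch of the intent classifier, assuming the standard OpenAI Python SDK; the intent labels and prompt here are illustrative, not the production set:

```python
# Minimal sketch of one worker (illustrative, not the production code).
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical label set for an AI app builder's conversations
INTENTS = ["scaffold_app", "api_integration", "bug_fix", "styling", "other"]

async def classify_intent(conversation: str) -> str:
    """Label a conversation with one coarse intent category."""
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"Classify the user's intent as one of: "
                           f"{', '.join(INTENTS)}. Reply with the label only.",
            },
            {"role": "user", "content": conversation},
        ],
        temperature=0,
    )
    label = resp.choices[0].message.content.strip()
    # Fall back to "other" if the model replies with anything off-list
    return label if label in INTENTS else "other"
```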
Stack: Python SDK (zero deps, async) → FastAPI → Supabase → GPT-4o-mini workers → Next.js dashboard.
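To give a feel for the ingestion side, here's roughly what an SDK call looks like. The package name, client class, method, and field names below are placeholders for illustration, not the real API:

```python
# Hypothetical usage of the zero-dependency async SDK (all names illustrative).
import asyncio
from myproduct import Client  # placeholder import, not the real package name

async def main():
    client = Client(api_key="...")  # points at the FastAPI ingest endpoint
    await client.log_conversation(
        conversation_id="conv_123",
        user_id="user_456",
        messages=[
            {"role": "user", "content": "Add Stripe checkout to my app"},
            {"role": "assistant", "content": "Sure, here's the integration..."},
        ],
    )

asyncio.run(main())
```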
Demo with sample data (not live product, validating the concept): https://dashboard-xi-taupe-75.vercel.app
The sample data models an AI app builder. Interesting patterns: scaffolding works great (78% success), but API integrations fail 75% of the time, and users who enter bug-fix loops almost always churn.
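("Bug-fix loop" here means several consecutive turns classified as bug-fix intent. A toy version of the heuristic, with a made-up threshold, just to make the definition concrete:)

```python
# Toy heuristic for flagging bug-fix loops (threshold is illustrative).
def in_bug_fix_loop(intent_sequence: list[str], min_run: int = 3) -> bool:
    """True if the conversation contains min_run consecutive bug_fix turns."""
    run = 0
    for intent in intent_sequence:
        run = run + 1 if intent == "bug_fix" else 0
        if run >= min_run:
            return True
    return False

# e.g. in_bug_fix_loop(["scaffold_app", "bug_fix", "bug_fix", "bug_fix"]) -> True
```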
Key design question: is the "insights layer" (auto-generated recommendations, revenue-at-risk estimates, root-cause identification) valuable enough to differentiate, given that Langfuse or Helicone could bolt product analytics onto their existing tracing tools?
Looking for honest feedback, especially from AI product builders.