Ask HN: Best tools for LLM product analytics (evals+monitoring+product metrics)?

2•aman_madhukar02•2h ago
I’m building an LLM-powered product and trying to figure out the right analytics / quality stack. By “product analytics” I mean more than token counts: I want evals, production monitoring, sliceable error analysis, release gating, and the ability to tie model/prompt changes to product KPIs.

What I’m looking for:

- Offline evals / scorecards (benchmarks, rubrics, automated tests)
- Production monitoring (drift, hallucination detection, latency/cost metrics)
- Ability to tag & slice by model version / prompt version / user segment
- Integration with product metrics (user success, retention, conversion) and CI/CD gating
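
One cheap way to keep slicing by model version, prompt version, and user segment possible is to tag every scored interaction when it is written, then group on whichever tags you need. A minimal sketch in Python, assuming a hypothetical `EvalRecord` schema and `slice_metrics` helper (neither comes from any particular tool; the field names are illustrative):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class EvalRecord:
    """One scored interaction (offline eval case or sampled production turn).

    Hypothetical schema: field names are illustrative, not from a specific tool.
    """
    model_version: str
    prompt_version: str
    user_segment: str
    passed: bool       # result of whatever rubric or automated check was applied
    latency_ms: float
    cost_usd: float


def slice_metrics(
    records: Iterable[EvalRecord],
    keys: Tuple[str, ...] = ("model_version", "prompt_version"),
) -> Dict[tuple, dict]:
    """Group records by the given tag fields and summarize each slice."""
    groups = defaultdict(list)
    for r in records:
        # The slice key is the tuple of tag values, e.g. ("model-a", "p1").
        groups[tuple(getattr(r, k) for k in keys)].append(r)

    summary = {}
    for slice_key, rs in groups.items():
        n = len(rs)
        summary[slice_key] = {
            "n": n,
            "pass_rate": sum(r.passed for r in rs) / n,
            "mean_latency_ms": sum(r.latency_ms for r in rs) / n,
            "total_cost_usd": sum(r.cost_usd for r in rs),
        }
    return summary
```

Because the tags are attached once at write time, a new cut of the data (for example `keys=("user_segment",)`) is just a different `keys` tuple rather than a new pipeline.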

Prefer options that are scriptable and support custom metrics/rubrics. Open-source and SaaS are both fine. Privacy/on-prem options are a plus.
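
For a sense of what “scriptable, with custom rubrics and CI/CD gating” can mean at its simplest: a rubric can be plain named functions, and a release gate can compare a candidate’s per-check pass rates against the current baseline. A minimal sketch in Python; `RUBRIC`, `score`, `gate`, and `gate_exit_code` are hypothetical names, and the three checks are placeholders rather than a recommended rubric:

```python
import sys
from typing import Callable, Dict, List

# A rubric here is simply a dict of named checks, each mapping a model output to pass/fail.
# These three checks are illustrative placeholders, not a recommended rubric.
Check = Callable[[str], bool]

RUBRIC: Dict[str, Check] = {
    "non_empty": lambda out: bool(out.strip()),
    "under_200_words": lambda out: len(out.split()) <= 200,
    "no_ai_disclaimer": lambda out: "as an ai" not in out.lower(),
}


def score(outputs: List[str], rubric: Dict[str, Check]) -> Dict[str, float]:
    """Fraction of outputs that pass each check in the rubric."""
    return {
        name: sum(check(o) for o in outputs) / len(outputs)
        for name, check in rubric.items()
    }


def gate(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Names of checks where the candidate scores more than `tolerance` below the baseline."""
    return [
        name
        for name, base in baseline.items()
        if candidate.get(name, 0.0) < base - tolerance
    ]


def gate_exit_code(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.02,
) -> int:
    """Print any regressions to stderr and return a CI-style exit code (0 = pass, 1 = block)."""
    failures = gate(candidate, baseline, tolerance)
    for name in failures:
        print(
            f"regression in {name}: {baseline[name]:.3f} -> {candidate.get(name, 0.0):.3f}",
            file=sys.stderr,
        )
    return 1 if failures else 0
```

In a CI job this might be wired up as `sys.exit(gate_exit_code(score(new_outputs, RUBRIC), score(old_outputs, RUBRIC)))`, where `new_outputs` and `old_outputs` are hypothetical lists of responses from the candidate and current prompt/model on the same eval set. The fixed `tolerance` is a simplification: on a small eval set, run-to-run noise can easily exceed two points, so a real gate may need a larger set or a statistical test.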

Things I’ve considered (but haven’t committed to): open-source eval frameworks, ML monitoring libs, and a few commercial platforms that claim “LLM evals + monitoring.” I’m not married to any single approach.

Questions for the community:

What tools/platforms have you used for full-stack LLM analytics (evals -> prod monitoring -> product KPI correlation)?

What worked vs what failed at scale? Any gotchas (cost, data volume, latency, false positives in hallucination detection)?

Recommended combos (e.g., offline eval + experiment platform + monitoring tool) that actually worked in production?

Any “must-have” rubrics/metrics you’d recommend for a product team shipping LLM features?
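
On the false-positive gotcha specifically: one way to check a hallucination detector is to run it on a sample that humans have also labeled and compare the two. A minimal sketch in Python, assuming you already have a list of detector flags and matching human labels (True = hallucinated); `evaluate_detector` and `DetectorReport` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectorReport:
    precision: float            # of flagged answers, share humans also judged hallucinated
    recall: float               # of human-judged hallucinations, share the detector flagged
    false_positive_rate: float  # of answers humans judged fine, share the detector flagged anyway


def _safe_ratio(num: int, den: int) -> float:
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate_detector(flags: List[bool], human_labels: List[bool]) -> DetectorReport:
    """Compare detector flags with human labels on the same sample of answers."""
    if len(flags) != len(human_labels):
        raise ValueError("flags and human_labels must be the same length")
    pairs = list(zip(flags, human_labels))
    tp = sum(f and h for f, h in pairs)          # flagged and truly hallucinated
    fp = sum(f and not h for f, h in pairs)      # flagged but actually fine
    fn = sum(h and not f for f, h in pairs)      # missed hallucinations
    tn = sum(not f and not h for f, h in pairs)  # correctly left alone
    return DetectorReport(
        precision=_safe_ratio(tp, tp + fp),
        recall=_safe_ratio(tp, tp + fn),
        false_positive_rate=_safe_ratio(fp, fp + tn),
    )
```

Tracking `false_positive_rate` separately from precision makes it visible when a detector is flagging a meaningful share of good answers, which is the case that tends to erode trust in alerts.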

If you’ve got a short writeup, blog post, or GitHub repo showing your setup, please drop it; I’ll read it and credit you. Happy to share more about my product (multi-turn assistant + retrieval + some tool calls) if that helps.

Thanks!
