
Show HN: Agent/LLM observability for tracing, cost, evals, and debugging

https://aback-handbell-1cd.notion.site/Progress-Observability-Platform-2b081d53bbc680fa9f98e7ece233b756

1•zlatkov•2mo ago

Hi HN - I’m Alex, currently Head of Agent Development Tools at Progress. Before this, I was co-founder and CEO of a session replay startup called SessionStack, which was acquired in August this year.

Since then, I’ve been deep in LLM/agent dev tools, with observability as my main focus.

I ran a small poll on LinkedIn recently about where teams are with observability for LLM-powered apps/agents. Results:

• 20% instrument LLM observability from day 1
• 30% plan to implement later
• 20% are building an in-house solution
• 30% are still learning about this space

That 20% building in-house was the most interesting to me, so I followed up with a mix of early-stage, YC founders and more mature orgs. The drivers I kept hearing:

1) Local / self-hosted models
Some teams assume there aren’t viable observability options for local/hybrid LLM stacks, so DIY feels like the default. In practice, there are ways to do this, but they’re easy to miss right now.

2) Cost uncertainty
Token usage is hard to estimate early on, so pricing feels unpredictable. A minimal in-house layer looks safer than surprise bills.

3) Control + speed
Bootstrapping basic tracing/logging is straightforward and gives full ownership while teams iterate quickly on the core product.
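To make drivers 2 and 3 concrete, here is roughly what a "minimal in-house layer" tends to look like on day 1. This is only a sketch under stated assumptions, not how any particular team or product does it: the `trace` helper, the `Span` fields, the `example-model` name, the prices in `PRICE_PER_1K`, and `fake_llm_call` (which counts words as a stand-in for tokens) are all made up for illustration.

```python
import json
import time
import uuid
from contextlib import contextmanager
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical prices in USD per 1,000 tokens; swap in your provider's real rates.
PRICE_PER_1K = {"example-model": {"input": 0.50, "output": 1.50}}


@dataclass
class Span:
    """One traced LLM call: what ran, how long it took, and how many tokens it used."""
    name: str
    model: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    input_tokens: int = 0
    output_tokens: int = 0
    duration_s: float = 0.0
    error: Optional[str] = None

    @property
    def cost_usd(self):
        price = PRICE_PER_1K[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


SPANS = []  # in-memory sink; a real setup would ship these somewhere durable


@contextmanager
def trace(name, model):
    """Time the wrapped block, record any exception, and emit one JSON log line."""
    span = Span(name=name, model=model)
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        span.error = repr(exc)
        raise
    finally:
        span.duration_s = time.perf_counter() - start
        SPANS.append(span)
        print(json.dumps({**asdict(span), "cost_usd": span.cost_usd}))


def fake_llm_call(prompt):
    """Stand-in for a model call; counts words as a crude proxy for tokens."""
    reply = "ok"
    return reply, len(prompt.split()), len(reply.split())


with trace("summarize", "example-model") as span:
    text, span.input_tokens, span.output_tokens = fake_llm_call(
        "summarize this short document"
    )
```

That is the whole appeal: a few dozen lines give you latency, token counts, and a rough cost per call. Everything it leaves out (price tables that change, nested agent steps, sampling, storage, dashboards) is where the maintenance work described next comes from.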

This reminds me a lot of early APM / product analytics. Many teams started with “we’ll just implement our own logging.” Totally reasonable at the beginning — but once usage and complexity scaled, that logging quietly turned into:

• an internal platform to maintain
• a backlog of features to build
• a growing surface area of edge cases to debug

…often becoming a real distraction from the core business.

Our bet is that LLM/agent observability follows the same path: teams start with DIY logging, then realize it’s becoming a side-product, and over time adopting a standard platform early becomes the norm. We’re also seeing APM/analytics vendors expand into LLM flows, which reinforces that direction.

What we’re building

My team and I are working on LLM/agent observability focused on usage, cost/pricing, evaluations, and debugging. Most teams we talk to still don’t have anything in place, even when LLMs are core to the product, so we’re trying to make the “day 1” setup practical.

We're part of a larger org, but this team is being run like a startup within it: small group, fast cycles, heavy on user conversations, and shipping quickly based on real usage. That setup is why we’re doing early access and iterating closely with teams.

Early preview / notes here: https://aback-handbell-1cd.notion.site/Progress-Observabilit...

We’re planning to support self-hosted options as well.

If this is relevant to what you’re building and you want to help us shape the LLM observability tooling you need, we have a free Early Access Program here: https://www.telerik.com/agent-observability-early-access
