Here's a demo if you're interested: https://www.youtube.com/watch?v=cc4DWrJF7hk. When agents fail, choose wrong tools, or blow cost budgets, there's no way to know why - usually just logs and guesswork. As agents move from demos to production with real SLAs and real users, this is not sustainable.
Neel and I lived this, building agents at SenseHQ and Accenture where we found that debugging agents was often harder than actually building them. Agents are untrustworthy in prod because there’s no good infrastructure to verify what they’re actually doing.
In practice this looks like: - A support agent that began misclassifying refund requests as product questions, which meant customers never reached the refund flow. - A document drafting agent that would occasionally hallucinate missing sections when parsing long specs, producing confident but incorrect outputs. There’s no stack trace or 500 error and you only figure this out when a customer is angry.
We both realized teams were flying blind in production, and that agent native monitoring was going to be foundational infrastructure for every serious AI product. We started Sentrial as a verification layer designed to take care of this.
How it works: You wrap your client with our SDK in only a couple of lines. From there, we detect drift for you: - Wrong tool invocations - Misunderstood intents - Hallucinations - Quality regressions over time. You see it on our platform before a customer files a ticket.
There’s a quick mcp set up, just give claude code: claude mcp add --transport http Sentrial https://www.sentrial.com/docs/mcp
We have a free tier (14 days, no credit card required). We’d love any feedback from anyone running agents whether they be for personal use or within a professional setting.
We’ll be around in the comments!
rajit•1h ago
anayrshukla•1h ago
What we do in practice is let the team mark a few tool calls as right or wrong in context, then use that to learn the pattern for that agent. From there, we can flag similar cases automatically by looking at the convo state, the tool chosen, the arguments, and what happened next.
So we’re learning what “correct” looks like for your workflow and then catching repeats of the same kind of mistake.