Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents

16•atarus•2h ago
Hey HN - we're Tarush, Sidhant, and Shashij from Cekura (https://www.cekura.ai). We've been running voice agent simulation for 1.5 years, and recently extended the same infrastructure to chat. Teams use Cekura to simulate real user conversations, stress-test prompts and LLM behavior, and catch regressions before they hit production.

The core problem: you can't manually QA an AI agent. When you ship a new prompt, swap a model, or add a tool, how do you know the agent still behaves correctly across the thousands of ways users might interact with it? Most teams resort to manual spot-checking (doesn't scale), waiting for users to complain (too late), or brittle scripted tests.

Our answer is simulation: synthetic users interact with your agent the way real users do, and LLM-based judges evaluate whether it responded correctly - across the full conversational arc, not just single turns. Three things make this actually work:

Scenario generation + real conversation import - Our scenario generation agent bootstraps your test suite from a description of your agent. But real users find paths no generator anticipates, so we also ingest your production conversations and automatically extract test cases from them. Your coverage evolves as your users do.

Mock tool platform - Agents call tools. Running simulations against real APIs is slow and flaky. Our mock tool platform lets you define tool schemas, behavior, and return values so simulations exercise tool selection and decision-making without touching production systems.

Deterministic, structured test cases - LLMs are stochastic. A CI test that passes "most of the time" is useless. Rather than free-form prompts, our evaluators are defined as structured conditional action trees: explicit conditions that trigger specific responses, with support for fixed messages when word-for-word precision matters. This means the synthetic user behaves consistently across runs - same branching logic, same inputs - so a failure is a real regression, not noise.
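To make the last two points concrete, here is a minimal sketch of a mock tool and a scripted synthetic user driven by a small conditional tree. It is not Cekura's API: `MockTool`, `Node`, `ScriptedUser`, `mentions`, the `get_balance` tool, and the account data are all hypothetical names invented for illustration, and simple keyword matching stands in for whatever conditions a real platform would support.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MockTool:
    """Hypothetical stand-in for a real API: canned return values, logged calls."""
    name: str
    required_args: frozenset
    returns: Callable[[dict], dict]
    calls: list = field(default_factory=list)

    def __call__(self, **kwargs: Any) -> dict:
        # Reject calls that do not match the declared schema.
        missing = self.required_args - set(kwargs)
        if missing:
            raise ValueError(f"{self.name}: missing {sorted(missing)}")
        self.calls.append(kwargs)
        return self.returns(kwargs)


@dataclass
class Node:
    """One step of the user's script: a fixed message plus conditional branches."""
    message: str
    branches: list = field(default_factory=list)  # (condition, Node) pairs


def mentions(*words: str) -> Callable[[str], bool]:
    """Condition that holds when the agent's message contains every word."""
    return lambda text: all(w in text.lower() for w in words)


class ScriptedUser:
    def __init__(self, root: Node, fallback: str = "Sorry, could you repeat that?"):
        self.current = root
        self.fallback = fallback

    def opening(self) -> str:
        return self.current.message

    def respond(self, agent_message: str) -> str:
        # The first matching branch wins, so the same agent output always
        # produces the same user reply.
        for condition, child in self.current.branches:
            if condition(agent_message):
                self.current = child
                return child.message
        return self.fallback


def build_user() -> ScriptedUser:
    # Assumed example scenario: a user asking for an account balance.
    done = Node("Thanks, that's all.")
    gave_id = Node("It's acct-1.", [(mentions("balance"), done)])
    root = Node("Hi, what's my balance?", [(mentions("account"), gave_id)])
    return ScriptedUser(root)


# Assumed tool schema and canned data, for illustration only.
get_balance = MockTool(
    name="get_balance",
    required_args=frozenset({"account_id"}),
    returns=lambda args: (
        {"balance": 120.5}
        if args["account_id"] == "acct-1"
        else {"error": "account_not_found"}
    ),
)


def run(agent_messages: list) -> list:
    """Replay a list of agent messages against a fresh scripted user."""
    user = build_user()
    return [user.opening()] + [user.respond(m) for m in agent_messages]
```

Because the user side has no sampling in it, replaying the same agent messages yields the same transcript every time, and `get_balance.calls` records which arguments the agent chose without any production system being touched.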

Cekura also monitors your live agent traffic. The obvious alternative here is a tracing platform like Langfuse or LangSmith - and they're great tools for debugging individual LLM calls. But conversational agents have a different failure mode: the bug isn't in any single turn, it's in how turns relate to each other. Take a verification flow that requires name, date of birth, and phone number before proceeding - if the agent skips asking for DOB and moves on anyway, every individual turn looks fine in isolation. The failure only becomes visible when you evaluate the full session as a unit.

Cekura is built around this from the ground up. Where tracing platforms evaluate turn by turn, Cekura evaluates the full session. Imagine a banking agent where the user fails verification in step 1, but the agent hallucinates and proceeds anyway. A turn-based evaluator sees step 3 (address confirmation) and marks it green - the right question was asked. Cekura's judge sees the full transcript and flags the session as failed because verification never succeeded.
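The difference between the two evaluation styles can be illustrated with a small sketch. It uses a hand-written rule in place of an LLM judge, and every name in it (`Turn`, `PROTECTED_STEPS`, the step labels) is an assumption made for the example rather than anything taken from Cekura, Langfuse, or LangSmith.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    step: str         # hypothetical step label, e.g. "verify_identity"
    succeeded: bool   # whether the step achieved its goal
    looks_ok: bool    # whether the agent's message is fine in isolation


# Assumed policy: these steps may only happen after successful verification.
PROTECTED_STEPS = {"confirm_address", "share_balance"}


def turn_level_pass(turns: list) -> bool:
    """Turn-by-turn view: each turn is judged without any history."""
    return all(turn.looks_ok for turn in turns)


def session_level_pass(turns: list) -> bool:
    """Session view: a protected step before verification fails the session."""
    verified = False
    for turn in turns:
        if turn.step == "verify_identity" and turn.succeeded:
            verified = True
        if turn.step in PROTECTED_STEPS and not verified:
            return False
    return True


# The banking example: verification fails in step 1, the agent proceeds anyway.
session = [
    Turn("verify_identity", succeeded=False, looks_ok=True),
    Turn("ask_reason", succeeded=True, looks_ok=True),
    Turn("confirm_address", succeeded=True, looks_ok=True),
]
```

Here `turn_level_pass(session)` is true while `session_level_pass(session)` is false, which is the gap the post describes: the failure lives in the relationship between turns, not in any one of them.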

Try us out at https://www.cekura.ai - 7-day free trial, no credit card required. Paid plans from $30/month.

We also put together a product video if you'd like to see it in action: https://www.youtube.com/watch?v=n8FFKv1-nMw. The first minute dives into quick onboarding - and if you want to jump straight to the results, skip to 8:40.

Curious what the HN community is doing - how are you testing behavioral regressions in your agents? What failure modes have hurt you most? Happy to dig in below!

Comments

sidhantkabra•1h ago
Was really fun building this - would love feedback from the HN community and insights into your current process.
moinism•55m ago
congrats on the launch! do you guys have anything planned to test chat agents directly in the ui? I have an agent, but no exposed api so can't really use your product even though I have a genuine need.
atarus•47m ago
Yes, we support integrations with different chat agent providers, and also SMS/WhatsApp agents where you can just drop in the agent's number.

Let us know how your agent can be reached and we can advise on the best way to test it.

FailMore•19m ago
Any ideas how to solve the "agents don't have total common sense" problem?

I have found when using agents to verify agents, that the agent might observe something that a human would immediately find off-putting and obviously wrong but does not raise any flags for the smart-but-dumb agent.

atarus•10m ago
To clarify, are you using the "fast brain, slow brain" pattern? Maybe an example would help.

Broadly speaking, we see people experiment with this architecture a lot, often with a great deal of success. Another approach is an agent-orchestrator architecture, with an intent recognition agent that routes to different sub-agents.

Obviously there are endless cases possible in production, and the best approach is to build your evals using that data.
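For readers unfamiliar with the orchestrator pattern mentioned in this reply, a minimal sketch follows. Keyword matching stands in for a real intent recognition model, and the intents, keywords, and sub-agent functions are all hypothetical.

```python
from typing import Callable

# Assumed intents and trigger keywords, for illustration only.
INTENT_KEYWORDS = {
    "billing": ("invoice", "refund", "charge"),
    "support": ("error", "broken", "crash"),
}


def recognize_intent(message: str) -> str:
    """Keyword stand-in for an intent recognition agent."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "fallback"


# Hypothetical sub-agents; real ones would call an LLM with their own prompt.
SUB_AGENTS: dict = {
    "billing": lambda msg: "Billing agent handling: " + msg,
    "support": lambda msg: "Support agent handling: " + msg,
    "fallback": lambda msg: "Could you tell me more about what you need?",
}


def orchestrate(message: str) -> tuple:
    """Route a user message to the sub-agent chosen by the recognized intent."""
    intent = recognize_intent(message)
    handler: Callable[[str], str] = SUB_AGENTS[intent]
    return intent, handler(message)
```

Each routing decision is a natural place to attach an eval, since a misrouted intent is a failure that can be checked independently of what the sub-agent says next.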

I'm reluctant to verify my identity or age for any online services

https://neilzone.co.uk/2026/03/im-struggling-to-think-of-any-online-services-for-which-id-be-will...
316•speckx•2h ago•164 comments

India's top court angry after junior judge cites fake AI-generated orders

https://www.bbc.com/news/articles/c178zzw780xo
267•tchalla•4h ago•120 comments

My first science video in 3 years (Physics Girl)

https://www.youtube.com/watch?v=B3m3AMRlYfc
28•pcdavid•1h ago•3 comments

The Xkcd thing, now interactive

https://editor.p5js.org/isohedral/full/vJa5RiZWs
728•memalign•5h ago•91 comments

Claude's Cycles: Claude Opus 4.6 solves a problem posed by Don Knuth [pdf]

https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf
162•fs123•5h ago•70 comments

Meta’s AI smart glasses and data privacy concerns

https://www.svd.se/a/K8nrV4/metas-ai-smart-glasses-and-data-privacy-concerns-workers-say-we-see-e...
1281•sandbach•18h ago•724 comments

Apple Introduces MacBook Pro with All‑New M5 Pro and M5 Max

https://www.apple.com/newsroom/2026/03/apple-introduces-macbook-pro-with-all-new-m5-pro-and-m5-max/
313•scrlk•2h ago•328 comments

Don't Become an Engineering Manager

https://newsletter.manager.dev/p/dont-become-an-engineering-manager
75•flail•2h ago•54 comments

British Columbia is permanently adopting daylight time

https://www.cbc.ca/news/canada/british-columbia/b-c-adopting-year-round-daylight-time-9.7111657
1030•ireflect•20h ago•500 comments

Arm's Cortex X925: Reaching Desktop Performance

https://chipsandcheese.com/p/arms-cortex-x925-reaching-desktop
191•ingve•9h ago•101 comments

I'm losing the SEO battle for my own open source project

https://twitter.com/Gavriel_Cohen/status/2028821432759717930
216•devinitely•2h ago•102 comments

Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents

16•atarus•2h ago•5 comments

Ars Technica fires reporter after AI controversy involving fabricated quotes

https://futurism.com/artificial-intelligence/ars-technica-fires-reporter-ai-quotes
494•danso•15h ago•300 comments

I Used Claude to File My Taxes for Free

https://kachess.dev/taxes/ai/personal-finance/2026/02/27/breaking-up-with-turbotax.html
5•gdudeman•52m ago•1 comments

The Internet's Top Tech Publications Lost 58% of Their Google Traffic Since 2024

https://growtika.com/blog/tech-media-collapse
104•Growtika•2h ago•75 comments

Points on a ring: An interactive walkthrough of a popular math problem

https://growingswe.com/blog/points-on-ring
10•evakhoury•23h ago•0 comments

Apple introduces the new MacBook Air with M5

https://www.apple.com/newsroom/2026/03/apple-introduces-the-new-macbook-air-with-m5/
127•Garbage•2h ago•90 comments

History of the Graphical User Interface: The Rise (and Fall?) Of WIMP Design

https://www.uxtigers.com/post/gui-history
23•todsacerdoti•3d ago•12 comments

Simple screw counter

https://mitxela.com/projects/screwcounter
224•jk_tech•2d ago•62 comments

Apple unveils new Studio Display and all-new Studio Display XDR

https://www.apple.com/newsroom/2026/03/apple-unveils-new-studio-display-and-all-new-studio-displa...
111•victorbjorklund•2h ago•94 comments

Disable Your SSH access accidentally with scp

https://sny.sh/hypha/blog/scp
11•zdw•3d ago•2 comments

Computer Says No

https://koenvangilst.nl/lab/computer-says-no
57•vnglst•2d ago•26 comments

We Built a Video Rendering Engine by Lying to the Browser About What Time It Is

https://blog.replit.com/browsers-dont-want-to-be-cameras
125•darshkpatel•2d ago•50 comments

C64: Putting Sprite Multiplexing to Work

https://bumbershootsoft.wordpress.com/2026/02/28/c64-putting-sprite-multiplexing-to-work/
43•ibobev•1d ago•1 comments

Show HN: I built a sub-500ms latency voice agent from scratch

https://www.ntik.me/posts/voice-agent
508•nicktikhonov•19h ago•149 comments

A [Firefox, Chromium] extension that converts Microsoft to Microslop

https://addons.mozilla.org/en-US/android/addon/microslop/
15•gaius_baltar•54m ago•2 comments

Florida public universities to pause hiring new H-1B workers

https://www.wusf.org/education/2026-03-03/hiring-h1b-workers-florida-public-universities-pause-en...
34•rawgabbit•1h ago•16 comments

Show HN: React-Kino – Cinematic scroll storytelling for React (1KB core)

https://github.com/btahir/react-kino
12•bilater•2d ago•0 comments

Why No AI Games?

https://franklantz.substack.com/p/why-no-ai-games
18•pavel_lishin•43m ago•0 comments

I built a pint-sized Macintosh

https://www.jeffgeerling.com/blog/2026/pint-sized-macintosh-pico-micro-mac/
75•ingve•9h ago•19 comments