I built AgentCheck, an open-source testing tool for LLM agents. It lets you:
Snapshot full agent runs (prompt, LLM calls, tool outputs, final answer)
Replay the trace locally — no API calls, no token costs
Diff agent behavior over time
Assert outputs to catch regressions
Why? Because today, most AI agents are tested by spot-checking outputs or rerunning flaky evals — which breaks CI, costs money, and misses edge cases. AgentCheck works more like Jest or VCR.py, but for LLM workflows. It records and replays traces so you can test agents like real software.
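If you haven't used a record/replay tool before, here's the basic pattern in a few lines. This is a generic sketch of the idea, not AgentCheck's actual code or API: hash each LLM call's arguments, save the response to a JSON trace on the first run, and serve it from the trace on every run after that.

```python
import hashlib
import json
from pathlib import Path


def replayable(call_llm, trace_path: Path):
    """Wrap an LLM call so the first run records responses to a JSON trace
    and every later run replays them with no API calls or token costs."""
    trace = json.loads(trace_path.read_text()) if trace_path.exists() else {}

    def wrapped(**kwargs):
        # Key each call by a hash of its arguments so identical requests replay identically.
        key = hashlib.sha256(json.dumps(kwargs, sort_keys=True).encode()).hexdigest()
        if key not in trace:  # record mode: hit the real API once
            trace[key] = call_llm(**kwargs)
            trace_path.write_text(json.dumps(trace, indent=2))
        return trace[key]  # replay mode: serve the saved response

    return wrapped
```

For example, `replayable(lambda **kw: client.chat.completions.create(**kw).model_dump(), Path("trace.json"))` gives you a chat function that behaves identically in CI with the network unplugged. AgentCheck applies the same idea to the whole agent run, not just individual LLM calls.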
It’s CLI-first, dev-friendly, and designed to plug into LangChain/OpenAI workflows.
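And here's roughly what a regression check against a recorded trace can look like in a test suite. Again, this is an illustrative sketch: the trace path and field names below are made up for the example and are not AgentCheck's real schema.

```python
import json
from pathlib import Path

# Hypothetical trace file and layout, purely for illustration.
TRACE = Path("traces/support_agent.json")


def test_agent_final_answer_is_stable():
    """Replay-style regression check: no live LLM calls, just the recorded trace."""
    trace = json.loads(TRACE.read_text())
    final_answer = trace["final_answer"]  # field name is an assumption

    # Assert on stable properties of the output rather than exact wording,
    # so minor phrasing changes don't break CI.
    assert "refund" in final_answer.lower()
    assert len(trace["llm_calls"]) <= 3  # guard against runaway tool loops
```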
It's still early, so I'd love feedback, contributors, and use cases from folks building agentic systems. The code's here: https://github.com/hvardhan878/agentcheck
Thanks!