We built Eval Studio, a CLI tool for testing AI agents locally.
Most agent workflows I’ve seen don’t have any real evaluation layer. People test manually or rely on prompt tweaks.
I wanted something closer to how we treat backend systems, where you can run tests before shipping.
Eval Studio:
* scans your repo and detects likely agents
* generates eval datasets based on your agent
* runs tests locally against your implementation
* surfaces failures and behavioral gaps
It doesn’t require deploying anything — it runs directly on your local setup.
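For context, the kind of check this enables looks roughly like the sketch below. This is not Eval Studio's actual API; every name here is hypothetical, just to illustrate the idea of running behavioral eval cases against a local agent implementation and surfacing failures.

```python
# Minimal sketch of a local eval run. NOT Eval Studio's actual API;
# all names are hypothetical, for illustration only.

def agent(prompt: str) -> str:
    # Stand-in for your real agent; replace with your implementation.
    if "refund" in prompt.lower():
        return "I can help you start a refund."
    return "Sorry, I don't know."

# Each case pairs an input with a behavioral check,
# rather than an exact string match.
eval_cases = [
    {"input": "How do I get a refund?",
     "check": lambda out: "refund" in out.lower()},
    {"input": "What is your name?",
     "check": lambda out: "sorry" not in out.lower()},
]

def run_evals(agent_fn, cases):
    """Run every case and collect the ones that fail."""
    failures = []
    for case in cases:
        output = agent_fn(case["input"])
        if not case["check"](output):
            failures.append({"input": case["input"], "output": output})
    return failures

failures = run_evals(agent, eval_cases)
print(f"{len(eval_cases) - len(failures)}/{len(eval_cases)} passed")
```

The point of the tool is automating the tedious parts of this loop: discovering the agents, generating the cases, and reporting the gaps.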
Get your API key and try it: dutchmanlabs.com
Would really appreciate feedback, especially from people building LLM apps or agent workflows.