I've been building AI agents lately and kept running into the same problem: how do you test them?
Manually prompting the agent before each release is tedious and doesn't scale, and existing solutions for testing agents are often complex to integrate.
To help with this, I built a simple open-source testing framework that uses AI to validate AI: you define the expected behavior and let an LLM judge whether the output is semantically correct.
The LLMJudge returns a score (0-1) and its reasoning for why the test passed or failed.
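To give a concrete idea of the pattern, here's a minimal sketch of LLM-as-judge testing in Python. This is an illustration of the general idea, not the framework's actual API: the `judge` function, the `JudgeResult` dataclass, the prompt, and the test below are assumptions I'm making for the example.

```python
# Illustrative sketch of LLM-as-judge semantic testing (not the real semantictest API).
# A judge model scores the agent's output against a plain-language expectation
# and returns a 0-1 score plus reasoning, which the test asserts on.

import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class JudgeResult:
    score: float      # 0.0 (fails the expectation) to 1.0 (fully meets it)
    reasoning: str    # the judge model's explanation

def judge(output: str, expectation: str, llm: Callable[[str], str]) -> JudgeResult:
    """Ask a judge LLM whether `output` semantically satisfies `expectation`."""
    prompt = (
        "You are a strict test judge. Score how well the OUTPUT satisfies the "
        "EXPECTATION on a scale from 0 to 1 and explain why.\n"
        f"EXPECTATION: {expectation}\n"
        f"OUTPUT: {output}\n"
        'Reply as JSON: {"score": <float>, "reasoning": "<string>"}'
    )
    raw = llm(prompt)  # any chat-completion call can be plugged in here
    data = json.loads(raw)
    return JudgeResult(score=float(data["score"]), reasoning=data["reasoning"])

# Hypothetical usage in a test: fail the build if the agent's answer drifts semantically.
def test_refund_policy_answer(agent, llm):
    answer = agent("What is your refund policy?")
    result = judge(answer, "Mentions the 30-day refund window and how to request one", llm)
    assert result.score >= 0.8, result.reasoning
```

The point is that the expectation is semantic rather than an exact string match, so the test still passes when the agent phrases a correct answer differently.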
You can try it live here (no signup required): https://semantictest.dev
The playground runs real LLMJudge validation, so you can see how the semantic testing works.
The code is fully open source, and you can find extensive documentation here: https://docs.semantictest.dev
Would love feedback from you guys!
Thank you!