Your agent worked yesterday. Today it's broken. What changed?
EvalView catches agent regressions before they hit prod: tool-call changes, hallucinations, cost spikes.
Save a baseline, then run evalview run --diff. CI fails if behavior drifts from that baseline.
pip install evalview && evalview demo
No API key needed. Ollama support for free local evals. Chat mode if you hate memorizing CLI flags.
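For anyone wondering what "diff against a baseline" means in practice, here is a rough sketch of the idea in plain Python. It is not EvalView's code or file format: RunSummary, diff_runs, exit_code, and the 1.2 cost ratio are all hypothetical names and values made up for illustration, and it only covers tool changes and cost spikes, not hallucination checks.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical summary of one agent run (not EvalView's real format)."""
    tools: list[str]   # tool names, in the order the agent called them
    cost_usd: float    # total spend for the run


def diff_runs(baseline: RunSummary, current: RunSummary,
              max_cost_ratio: float = 1.2) -> list[str]:
    """Return a list of drift findings; an empty list means no drift."""
    findings = []
    # Any change in the tool-call sequence counts as behavioral drift.
    if current.tools != baseline.tools:
        findings.append(
            f"tool sequence changed: {baseline.tools} -> {current.tools}"
        )
    # Flag a cost spike when spend exceeds the baseline by the assumed ratio.
    if baseline.cost_usd > 0 and current.cost_usd > baseline.cost_usd * max_cost_ratio:
        findings.append(
            f"cost rose from ${baseline.cost_usd:.4f} to ${current.cost_usd:.4f}"
        )
    return findings


def exit_code(findings: list[str]) -> int:
    """Non-zero exit code makes a CI job fail when drift was found."""
    return 1 if findings else 0
```

In a CI script you would load the saved baseline, summarize the fresh run, and pass exit_code(diff_runs(baseline, current)) to sys.exit so that any finding fails the build.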
Built this after an agent started inventing numbers in production. Would love feedback from anyone shipping agents.
Hidai
hidai25•1d ago