I built it because I was tired of debugging agent failures by grepping through logs, and the AI observability tools I found all seemed to require intrusive instrumentation, sending my prompts and responses to a cloud service, or both. I wanted to debug agent runs locally, without worrying about vendor lock-in or data privacy.
Orchid is that tool. Call inspection works well for my use cases, but the replay feature is perhaps more interesting: it makes LLM pipeline testing deterministic without mocking or re-running expensive API calls.
Free, self-hosted, runs on your machine or infrastructure: https://github.com/mario-guerra/orchid-trace
Would love feedback from anyone building multi-step agentic systems or struggling with non-deterministic LLM test failures.