It’s a forward-deployed research agent designed to live alongside real AI systems and continuously:
- Generate hypotheses about where models may fail
- Design and run experiments in:
  - LAB (sandboxed)
  - SHADOW (mirrored production traffic)
  - PRODUCTION (real users, gated)
- Classify failures across:
  - Reasoning
  - Long-context behavior
  - Tool use
  - Feedback loops
  - Deployment & latency
- Propose interventions
- Simulate those interventions on real traces before deployment
- Gate risky changes with optional human approval
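
To make the "simulate on real traces" step concrete, here is a minimal sketch of replaying recorded traces through a candidate intervention and comparing failure rates before and after. Everything in it is an assumption for illustration: `Trace`, `classify`, `simulate`, and `cap_latency` are hypothetical names, the classifier is a toy heuristic that only covers two of the five failure classes, and none of this is taken from the Agent-Tinman codebase.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class FailureClass(Enum):
    # The five classes named in the post; the identifiers are illustrative.
    REASONING = "reasoning"
    LONG_CONTEXT = "long_context"
    TOOL_USE = "tool_use"
    FEEDBACK_LOOP = "feedback_loop"
    DEPLOYMENT = "deployment"


@dataclass(frozen=True)
class Trace:
    """Hypothetical record of one request/response pair."""
    prompt: str
    output: str
    latency_ms: float


def classify(trace: Trace, latency_budget_ms: float = 2000.0) -> Optional[FailureClass]:
    """Toy heuristic: slow responses and empty outputs count as failures."""
    if trace.latency_ms > latency_budget_ms:
        return FailureClass.DEPLOYMENT
    if not trace.output.strip():
        return FailureClass.REASONING
    return None


def failure_rate(traces: Iterable[Trace]) -> float:
    """Fraction of traces that the toy classifier flags."""
    items = list(traces)
    if not items:
        return 0.0
    return sum(classify(t) is not None for t in items) / len(items)


def simulate(
    traces: Iterable[Trace],
    intervention: Callable[[Trace], Trace],
) -> Tuple[float, float]:
    """Return (failure rate before, failure rate after) the intervention."""
    items = list(traces)
    return failure_rate(items), failure_rate(intervention(t) for t in items)


def cap_latency(trace: Trace) -> Trace:
    """Hypothetical intervention: a 1500 ms timeout with a canned fallback."""
    return Trace(
        prompt=trace.prompt,
        output=trace.output or "fallback answer",
        latency_ms=min(trace.latency_ms, 1500.0),
    )


recorded = [
    Trace("q1", "answer", 150.0),
    Trace("q2", "", 300.0),         # empty output
    Trace("q3", "answer", 5000.0),  # over the latency budget
    Trace("q4", "answer", 900.0),
]
before, after = simulate(recorded, cap_latency)
print(f"failure rate before={before:.2f} after={after:.2f}")
```

A real replay would have to re-run the model or pipeline with the intervention applied rather than rewrite the recorded trace, so this only shows the shape of the before/after comparison.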
It’s meant for teams who already run AI in production and want continuous, structured failure discovery, not just offline evals.
It’s:
- Open source (Apache 2.0)
- Python-first
- Designed to integrate as a sidecar via a pipeline adapter
- Built around explicit modes, risk tiers (SAFE / REVIEW / BLOCK), and severity levels (S0–S4)
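
As a rough illustration of how modes, risk tiers, and severity levels could combine into a gate, here is a small sketch. The policy thresholds are invented for the example, it assumes S4 is the most severe level (the post does not say which end of the scale is worse), and `assess` and `may_proceed` are hypothetical names rather than the project's actual API.

```python
from enum import Enum, IntEnum


class Mode(Enum):
    LAB = "lab"                # sandboxed
    SHADOW = "shadow"          # mirrored production traffic
    PRODUCTION = "production"  # real users, gated


class Severity(IntEnum):
    # Assumption: higher number means more severe.
    S0 = 0
    S1 = 1
    S2 = 2
    S3 = 3
    S4 = 4


class RiskTier(Enum):
    SAFE = "safe"
    REVIEW = "review"
    BLOCK = "block"


def assess(mode: Mode, severity: Severity) -> RiskTier:
    """Map (mode, severity) to a risk tier under an invented example policy."""
    if mode is Mode.LAB:
        return RiskTier.SAFE  # a sandbox cannot affect real users
    if mode is Mode.SHADOW:
        return RiskTier.REVIEW if severity >= Severity.S3 else RiskTier.SAFE
    # PRODUCTION: strictest thresholds
    if severity >= Severity.S3:
        return RiskTier.BLOCK
    if severity >= Severity.S1:
        return RiskTier.REVIEW
    return RiskTier.SAFE


def may_proceed(tier: RiskTier, human_approved: bool = False) -> bool:
    """SAFE always runs, REVIEW needs a human sign-off, BLOCK never runs."""
    if tier is RiskTier.SAFE:
        return True
    if tier is RiskTier.REVIEW:
        return human_approved
    return False


tier = assess(Mode.PRODUCTION, Severity.S2)
print(tier.name, may_proceed(tier), may_proceed(tier, human_approved=True))
```

Keeping the tier decision separate from the approval check means the same policy table can be reused whether approval comes from a human, a ticket, or is disabled entirely.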
This is early but functional. I’d really appreciate:
- Skeptical feedback
- Edge cases you think would break this
- Whether this solves a real problem for you or not
Repo: https://github.com/oliveskin/Agent-Tinman
Happy to answer anything technical.
oliveskin•1d ago