Why
Agent runs are stochastic and tool calls fail, which makes failures hard to reproduce, measure, and fix at scale. It's also hard to align agent behavior with goals across output quality/format, cost, and latency. We need a loop that feeds user feedback and LLM evaluators directly back into the agent code (prompts, configs, models, graphs) without overfitting.
How
- Simulation: LLM personas, mocked MCP servers/tools, and synthetic data; can condition on real traces
- Evaluation: code-based and LLM-based evaluators; turn human reviews into optimization-ready benchmarks
- Optimization with Maestro: tune prompts, configs, and even the agent graph for better quality, cost, and latency
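To make the loop above concrete, here is a minimal, self-contained sketch of simulate -> evaluate -> optimize. All names below (run_agent, code_evaluator, llm_evaluator, the persona strings) are illustrative stand-ins, not the relai SDK API; the model and judge calls are mocked so the snippet runs as-is.

import random
import time
from dataclasses import dataclass

@dataclass
class Trace:
    prompt: str
    persona: str
    output: str
    latency_s: float
    cost_usd: float

def run_agent(prompt: str, persona: str) -> Trace:
    # Stand-in for one simulated agent run against an LLM persona (mocked model call).
    start = time.time()
    output = f"[reply to '{persona}' under prompt '{prompt[:20]}...']"
    return Trace(prompt, persona, output, time.time() - start, cost_usd=0.002)

def code_evaluator(trace: Trace) -> float:
    # Code-based check, e.g. output format or required fields.
    return 1.0 if trace.output.startswith("[") else 0.0

def llm_evaluator(trace: Trace) -> float:
    # Stand-in for an LLM judge scoring quality against a task rubric (mocked).
    return random.uniform(0.6, 1.0)

def evaluate(prompt: str, personas: list[str]) -> dict:
    # Run the candidate prompt across simulated personas and aggregate quality/cost/latency.
    traces = [run_agent(prompt, p) for p in personas]
    return {
        "quality": sum(0.5 * code_evaluator(t) + 0.5 * llm_evaluator(t) for t in traces) / len(traces),
        "cost": sum(t.cost_usd for t in traces),
        "latency": max(t.latency_s for t in traces),
    }

if __name__ == "__main__":
    personas = ["impatient power user", "first-time user", "adversarial tester"]
    candidates = [
        "You are a support agent. Answer concisely.",
        "You are a support agent. Ask a clarifying question before answering.",
    ]
    # Pick the candidate with the best quality score; a real optimizer would also
    # trade off cost and latency, and could propose graph-level changes, not just prompts.
    scores = {c: evaluate(c, personas) for c in candidates}
    best = max(scores, key=lambda c: scores[c]["quality"])
    print("best prompt:", best, scores[best])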
Try it
pip install relai
GitHub: https://github.com/relai-ai/relai-sdk
Docs: https://docs.relai.ai/ (2-min overview: https://youtu.be/qKsJUD_KP40)
Looking for feedback on
- Where graph-level suggestions help (beyond prompt tuning)
- Evaluator signals you rely on for reliability (and what we're missing)
- Simulation setups/environments you'd want out of the box
Notes
Founder here. Happy to share internals, tradeoffs, and limitations.
Works with LangGraph / OpenAI Agents / Google ADK / etc. The SDK is Apache-2.0 licensed.