- controlling what they can do
- tracking costs
- debugging failures
- making it safe for real workloads

So we built AgentRuntime, the infrastructure layer we wished we had. Not an agent framework, but the platform around agents:

- policies
- memory
- workflows
- observability
- cost tracking
- RAG
- governance

Agents and policies are defined in YAML, so it's infrastructure-as-code rather than a chatbot builder.

Example – agents and policies in YAML

    # agent.yaml – declarative agent config
    name: support_agent
    model:
      provider: anthropic
      name: claude-3-5-sonnet
    context_assembly:
      enabled: true
      embeddings:
        provider: openai
        model: text-embedding-3-small
      providers:
        - type: knowledge
          config:
            sources: ["./docs"]
            top_k: 3

    # policies/safety.yaml – governance as code
    name: security-policy
    rules:
      - id: block-file-deletion
        condition: tool.name == "file_delete"
        action: deny

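To make the rule semantics concrete, here is a rough Python sketch of how a deny rule like the one above could be evaluated against a tool call. It is illustrative only and not AgentRuntime's code: the real policy engine uses CEL, while this stand-in only understands conditions of the form `path == "literal"`. The names `Rule`, `lookup`, `matches`, and `decide`, and the default-allow behavior, are assumptions made for the example.

```python
import re
from dataclasses import dataclass


# Hypothetical rule shape mirroring the YAML fields (id, condition, action).
@dataclass
class Rule:
    id: str
    condition: str
    action: str


# Only handles conditions of the form: dotted.path == "literal".
# A real CEL engine supports far more; this regex is a stand-in.
_EQ = re.compile(r'^\s*([\w.]+)\s*==\s*"([^"]*)"\s*$')


def lookup(ctx, path):
    """Resolve a dotted path such as 'tool.name' against nested dicts."""
    cur = ctx
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


def matches(rule, ctx):
    """Return True if the rule's equality condition holds for ctx."""
    m = _EQ.match(rule.condition)
    if m is None:
        raise ValueError(f"unsupported condition: {rule.condition!r}")
    path, literal = m.groups()
    return lookup(ctx, path) == literal


def decide(rules, ctx, default="allow"):
    """Return (action, rule_id) for the first matching rule, else the default.

    Defaulting to "allow" is an assumption of this sketch.
    """
    for rule in rules:
        if matches(rule, ctx):
            return rule.action, rule.id
    return default, None


RULES = [Rule("block-file-deletion", 'tool.name == "file_delete"', "deny")]
```

Calling `decide(RULES, {"tool": {"name": "file_delete"}})` yields the deny action together with the id of the rule that fired, which is the kind of pair an audit log would want to record.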
CLI – run and inspect

    # Create and run an agent
    agentctl agent create researcher --goal "Research AI safety" --llm gpt-4
    agentctl agent run researcher
    agentctl runs watch <run-id>

    # Manage policies
    agentctl policy list
    agentctl policy activate security-policy 1.0.0

    # RAG – ingest docs and ground responses in your knowledge base
    agentctl context ingest ./docs
    agentctl run --agent agent.yaml --goal "How do I deploy?"

    # Agent-level debugging
    agentctl debug -c agent.yaml -g "Analyze this dataset."

Cost tracking is exposed via the API (per agent/tenant), and the Web UI shows analytics. The workflow debugger (breakpoints, step-through) lives in the pkg layer; the CLI debug is for agent execution.

What’s in there

Governance
- Policy engine (CEL)
- Risk scoring
- Encrypted audit logs
- RBAC
- Multi-tenancy
- Fully YAML-configurable

Orchestration
- Visual workflow designer (React Flow)
- DAG workflows
- Multi-agent coordination
- Conditional logic
- Plugin hot-reload
- Workflow marketplace

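For anyone unfamiliar with DAG workflows, the core idea is that each step declares what it depends on and the scheduler derives a valid order. Below is a small Python sketch using the standard library's `graphlib`. It is not how AgentRuntime schedules work (that part is Go); the `WORKFLOW` steps and the `execution_batches` helper are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "fetch": set(),
    "summarize": {"fetch"},
    "classify": {"fetch"},
    "report": {"summarize", "classify"},
}


def execution_batches(steps):
    """Group steps into batches whose dependencies are already satisfied.

    Steps inside one batch do not depend on each other, so they could
    run in parallel.
    """
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With the workflow above, `fetch` runs first, `classify` and `summarize` form the second batch, and `report` runs last. A cyclic definition is rejected up front by `prepare()` instead of hanging at run time.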
Memory & Context
- Working memory
- Persistent memory
- Semantic memory
- Event log

Context assembly combines:
- policies
- workflow state
- memory
- tool outputs
- knowledge

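As a rough picture of what "combining" those sources can mean, here is a minimal Python sketch that concatenates labeled sections in a fixed order under a size budget. Everything here is an assumption for illustration: the ordering, the character budget, the skip-if-too-large strategy, and the names `SECTION_ORDER` and `assemble_context` are not taken from AgentRuntime.

```python
# Section order follows the list above; treating it as a priority order,
# and budgeting by characters, are both assumptions of this sketch.
SECTION_ORDER = ["policies", "workflow_state", "memory", "tool_outputs", "knowledge"]


def assemble_context(sections, max_chars=2000):
    """Concatenate non-empty sections in priority order within a budget.

    A section that does not fit in the remaining budget is skipped whole
    rather than truncated.
    """
    parts = []
    remaining = max_chars
    for name in SECTION_ORDER:
        text = sections.get(name, "").strip()
        if not text:
            continue
        block = f"[{name}]\n{text}\n"
        if len(block) > remaining:
            continue
        parts.append(block)
        remaining -= len(block)
    return "".join(parts)
```

Skipping whole sections keeps each block intact, at the cost of sometimes leaving budget unused; truncating the lowest-priority section would be an equally reasonable choice.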
RAG features:
- embeddings (OpenAI or local)
- SQLite for development
- Postgres + vector stores in production

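The `top_k` setting in the agent config controls how many knowledge chunks get pulled in per query. A minimal Python sketch of that retrieval step, assuming cosine similarity over embedding vectors, looks like this. The three-dimensional vectors and file names are toy values; real embeddings come from a provider and have many more dimensions, and a production vector store would use an approximate index instead of a full sort.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Return the k doc ids most similar to the query.

    index is a list of (doc_id, vector) pairs.
    """
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Toy index with made-up vectors.
INDEX = [
    ("deploy.md", [0.9, 0.1, 0.0]),
    ("billing.md", [0.0, 0.2, 0.9]),
    ("helm.md", [0.8, 0.3, 0.1]),
    ("faq.md", [0.4, 0.4, 0.4]),
]
```

For a query vector pointing along the first axis, `top_k([1.0, 0.0, 0.0], INDEX)` ranks `deploy.md` and `helm.md` ahead of `faq.md` and leaves out `billing.md`.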
Observability
- Cost attribution via API
- SLA monitoring
- Distributed tracing (OpenTelemetry)
- Prometheus metrics
- Deterministic replay (5 modes)

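Cost attribution per agent/tenant boils down to pricing each model call and summing by key. Here is a small Python sketch of that aggregation. The `Usage` record, the `attribute` function, and the `PRICE_PER_1K` table are hypothetical, and the prices are placeholders, not real provider pricing.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder prices in USD per 1,000 tokens; NOT real provider pricing.
PRICE_PER_1K = {"model-a": {"input": 0.003, "output": 0.015}}


# Hypothetical record of one model call.
@dataclass
class Usage:
    tenant: str
    agent: str
    model: str
    input_tokens: int
    output_tokens: int


def cost_of(u):
    """Price a single call from its token counts."""
    price = PRICE_PER_1K[u.model]
    return (u.input_tokens * price["input"] + u.output_tokens * price["output"]) / 1000


def attribute(usages):
    """Sum cost per (tenant, agent) pair."""
    totals = defaultdict(float)
    for u in usages:
        totals[(u.tenant, u.agent)] += cost_of(u)
    return dict(totals)
```

Keying on the `(tenant, agent)` pair means the same totals can be rolled up to per-tenant figures later by summing over agents.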
Production
- Kubernetes operator (Agent, Workflow, Policy CRDs)
- Helm charts
- Istio config
- Auto-scaling
- Backup / restore
- GraphQL + REST API

Implementation
- ~50k LOC of Go
- Hundreds of tests
- Built with production in mind

Runs on:

Local
- SQLite
- In-memory runtime

Production
- Postgres
- Redis
- Qdrant / Weaviate

Happy to answer questions or help people get started.