The agent itself is the easy part. The hard part is everything around it: where does it execute safely? What happens when it fails midway through a workflow? How do you trigger it from your existing tools? How do you even know what it did?
I kept stitching together Docker, a workflow engine, a notification layer, and custom retry logic. Every team I talked to was doing the same thing. So I built Polos - an open-source runtime that handles the production layer so you just write the agent.
What it does:
- Sandboxed execution: agents run sensitive operations inside managed Docker containers with built-in tools for file I/O, bash, and web search. You don't manage the sandbox or its lifecycle; Polos does. Support for more sandbox backends like E2B is planned.
- Slack integration: @mention an agent in Slack, get responses in thread. Trigger workflows from Slack, receive notifications, collect input. Agents become part of your team's existing workflow.
- Durable workflows: if an agent fails mid-run, it resumes from the exact step that failed. Built-in prompt caching with 60-80% cost savings on retries.
- Observability: OpenTelemetry tracing for every step, tool call, and decision.
- LLM agnostic: works with OpenAI, Anthropic, Google, or any provider via Vercel AI SDK and LiteLLM.
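The durable-workflow idea can be sketched conceptually. This is not Polos's API or implementation, just a minimal stdlib-Python illustration of step-level checkpointing under one assumption: a workflow is a linear sequence of named steps whose results are persisted as each step completes, so a retry skips anything already recorded and resumes at the first step without a result.

```python
# Conceptual sketch only (hypothetical names, not the Polos SDK):
# checkpoint each step's result; on retry, skip completed steps and
# resume at the first step that has no recorded result.

def run_workflow(steps, checkpoint):
    """steps: list of (name, fn); checkpoint: dict persisted between runs."""
    results = {}
    for name, fn in steps:
        if name in checkpoint:            # already completed on a prior run
            results[name] = checkpoint[name]
            continue
        results[name] = fn(results)       # may raise; checkpoint survives
        checkpoint[name] = results[name]  # persist before moving on
    return results

# First run: "deploy" fails, but "fetch" and "build" are checkpointed.
checkpoint = {}
attempts = {"deploy": 0}

def deploy(results):
    attempts["deploy"] += 1
    if attempts["deploy"] == 1:
        raise RuntimeError("transient failure")
    return f"deployed {results['build']}"

steps = [
    ("fetch", lambda r: "source"),
    ("build", lambda r: f"artifact({r['fetch']})"),
    ("deploy", deploy),
]

try:
    run_workflow(steps, checkpoint)       # fails at "deploy"
except RuntimeError:
    pass

# Retry resumes from "deploy"; "fetch" and "build" are not re-executed.
out = run_workflow(steps, checkpoint)
```

Because resumption only needs an ordered log of completed steps, no DAG declaration is required up front; this is also why retries are cheap to combine with prompt caching, since earlier steps (and their prompts) are never replayed.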
The stack is a Rust orchestrator (Axum + Tokio + PostgreSQL), Python and TypeScript SDKs, and a Vite-based UI. You can install and run a durable, sandboxed agent in under 5 minutes:
```shell
curl -fsSL https://install.polos.dev/install.sh | bash
npx create-polos
cd my-project && polos dev
```
Here's a 3-min demo of a coding agent that picks up a GitHub issue, fixes the code in a sandbox, and submits a PR: https://www.youtube.com/watch?v=KYVBpdZ_5eM
Happy to discuss the technical decisions: why Rust for the orchestrator, how durable execution works without a DAG, and the sandbox lifecycle model.