The problem: you open Claude Code, give it a task, it does 80%. You fix the other 20%, open another chat for the next piece, copy context, retry when it drifts. Before you know it you're a full-time AI babysitter — 4 monitors, 12 terminals, zero confidence anything actually ships.
Polpo fixes this. You build an AI company: hire agents, give them roles, skills, and credentials stored in a keychain. They work as a team.
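Conceptually, that roster is just data. A hypothetical sketch of what an agent might look like (these types are illustrative, not Polpo's actual schema):

```ts
// Illustrative types only; Polpo's real schema may differ.
interface CredentialRef {
  keychainId: string;  // the secret lives in the keychain, not on the agent
  scopes: string[];
}

interface Agent {
  name: string;
  role: string;        // e.g. "copywriter" or "backend dev"
  skills: string[];    // what the orchestrator can route to this agent
  credentials: CredentialRef[];
}

const team: Agent[] = [
  { name: "Ada", role: "backend dev", skills: ["typescript", "postgres"], credentials: [] },
  {
    name: "Max",
    role: "support",
    skills: ["email", "crm"],
    credentials: [{ keychainId: "zendesk-main", scopes: ["tickets:write"] }],
  },
];
```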
The key difference is quality control. Think of it like UFC judges, but for your agents' work:
- For every task, you define custom scoring criteria
- 3 independent LLM reviewers evaluate the output in parallel
- Median score vs. threshold: below it, the agent goes back to work automatically

You don't even know it happened. Nothing ships broken.
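The gate reduces to a small loop. A minimal sketch in TypeScript; names like runAgent and Reviewer are placeholders, not Polpo's real API:

```ts
// Three judges score in parallel, the median is compared to the threshold,
// and the agent retries on failure. Placeholder names, not Polpo's API.
type Reviewer = (output: string, criteria: string) => Promise<number>;

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  return s[Math.floor(s.length / 2)];
}

async function gatedRun(
  runAgent: (feedback?: string) => Promise<string>,
  reviewers: Reviewer[],   // three independent LLM judges
  criteria: string,        // the per-task scoring rubric
  threshold: number,
  maxRetries = 3,
): Promise<string> {
  let output = await runAgent();
  for (let i = 0; i < maxRetries; i++) {
    // Independent evaluations run concurrently; the median resists one outlier judge.
    const scores = await Promise.all(reviewers.map((r) => r(output, criteria)));
    const m = median(scores);
    if (m >= threshold) return output;
    // Below threshold: the agent revises without the user ever seeing it.
    output = await runAgent(`Median score ${m} is below ${threshold}; revise.`);
  }
  return output; // out of retries; a real system would escalate to a human here
}
```

Median over mean matters with three judges: a single hallucinated 10/10 can't push bad work over the line.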
What else:

- Crash-proof: detached processes. Kill it, reboot, lose your connection; it picks up where it left off (sketch below)
- Agents get real credentials, persistent browser sessions, real accounts, not sandboxed toys
- Reaches you on Slack, Telegram, or email only when it needs a human decision
- 22+ LLM providers, including Ollama and local models; not an OpenAI wrapper
- Web UI, CLI, REST API, SSE streaming, all included
- Storage: flat files (default), SQLite, or Postgres

One command:
npx polpo-ai
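On the crash-proof claim: the underlying idea is checkpointing. A minimal sketch, assuming flat-file storage (the default); the paths and the Phase union are assumptions, not Polpo's internals:

```ts
// Checkpoint each task's state to a flat file after every transition so a
// restarted process resumes in place instead of starting over.
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";

type Phase = "queued" | "working" | "review" | "done";
interface TaskState { id: string; phase: Phase; attempt: number }

const dir = ".polpo/tasks";
const fileFor = (id: string) => `${dir}/${id}.json`;

function load(id: string): TaskState {
  if (existsSync(fileFor(id))) {
    // Process was killed mid-task: pick up exactly where it left off.
    return JSON.parse(readFileSync(fileFor(id), "utf8"));
  }
  return { id, phase: "queued", attempt: 0 };
}

function checkpoint(state: TaskState): void {
  mkdirSync(dir, { recursive: true });
  writeFileSync(fileFor(state.id), JSON.stringify(state)); // durable after each step
}
```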
Claude Code gives you a developer. OpenClaw gives you an assistant. Polpo gives you a company. Marketing agency, dev team, customer support — all AI, on your laptop.
MIT licensed. TypeScript. Hono + React. No hosted platform, no waitlist — your keys, your rules.
GitHub: https://github.com/lumea-labs/polpo
Docs: https://docs.polpo.sh
Site: https://polpo.sh
Happy to go deep on the scoring architecture, the state machine, crash recovery, or anything else.