Each week we deploy a live AI agent with real tools (web search, browsing, and more), a persona, and something it's been told to protect. The system prompt is fully visible. Your job is to break through the guardrails anyway. The fastest successful jailbreak wins, and the winning technique gets published for everyone to learn from.
First challenge is live now. Give it a shot.
zachdotai•1h ago
So we opened it up. A few things that might be interesting to folks here:
- These aren't toy prompts hiding a secret word. The agents have actual tool access and behave like production agents would.
- System prompts and challenge configs are versioned in the open: https://github.com/fabraix/playground
- Guardrail evaluation runs server-side to prevent client-side tampering.
- Anyone can propose a challenge: the scenario, the agent, and the objective. The community votes on what goes live next.
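For anyone wondering what "server-side evaluation" means in practice, here is a minimal sketch of the general idea. It is an illustration only and is not taken from the playground repo: `Session`, `forbidden_tool`, `record_tool_call`, and `is_jailbroken` are hypothetical names, and the assumed objective (getting the agent to call a tool it was told not to use) is just one possible kind of challenge. The point is that the verdict comes only from state the server recorded itself, so anything the client claims is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical server-held state for one challenge attempt."""
    forbidden_tool: str                      # assumed objective: agent must never call this
    tool_calls: list[str] = field(default_factory=list)

    def record_tool_call(self, name: str) -> None:
        # Called by the server-side agent runtime, never by the client.
        self.tool_calls.append(name)


def is_jailbroken(session: Session, client_payload: dict) -> bool:
    """Decide the outcome from the server-side log only.

    client_payload is accepted but deliberately ignored, so a tampered
    request such as {"solved": True} cannot change the result.
    """
    return session.forbidden_tool in session.tool_calls


session = Session(forbidden_tool="send_email")
session.record_tool_call("web_search")
print(is_jailbroken(session, {"solved": True}))   # client claim has no effect
session.record_tool_call("send_email")
print(is_jailbroken(session, {"solved": False}))  # server log decides
```

A real evaluator would likely check richer conditions than a single tool name, but the trust boundary stays the same: the client submits prompts and the server decides the result.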
We're genuinely looking for people to both break things and suggest ideas for what should be tested next. The agent runtime is being open-sourced separately.
Happy to answer questions about how any of it works.