frontpage.

As a developer, I got tired of manually testing my AI agents and chatbots against the same prompt injections and jailbreaks every time I tweaked a system prompt. Our QA team was struggling with the exact same bottleneck, so I built BreakMyAgent.

It’s an open-source sandbox that runs an automated barrage of standard exploits against your target LLM to see if it leaks data or ignores core instructions.

How it works under the hood: - The UI is built with Streamlit, backend is FastAPI, and dependency management is handled by `uv`. - You paste your system prompt and hit run. It fires 12 baseline attack vectors (Direct leaks, XSS payloads, Context overflows, etc.) concurrently. - The core mechanic is "LLM-as-a-Judge". It uses a hardcoded `gpt-4.1-mini` with strict alignment rules to systematically evaluate the target's responses. - It supports OpenAI, Anthropic, and a solid list of open-weight models via OpenRouter (including DeepSeek V3/R1, Qwen 2.5, and Llama 3.3).

There is a hosted free version if you want to play with it immediately (I capped it at 15 requests/IP to survive the launch), but the entire tool is open-source and takes 30 seconds to spin up locally with Docker or `uv`.

Repo: https://github.com/BreakMyAgent/breakmyagent-os Live demo: https://breakmyagent.dev

Next on the roadmap: I'm building a dedicated CLI/GitHub Action so teams can drop this into their own CI/CD pipelines to block prompt regressions. I'm also developing a PoC for multi-turn agentic fuzzing and expanding the payload database for complex tool-spoofing.

I’d love to hear your feedback! What other test configurations (besides temperature and response format) do you think are essential for a tool like this? Also open to any feedback on the architecture, the judge prompt, or specific zero-day vectors you'd like to see included in the public database.

Five ways to spot when a paper is a fraud

Riot's New Fighting Game Is Imploding as It Lays Off 80 Developers

Snipit – A lightweight CLI to save and search code snippets locally

Show HN: MVAR – Deterministic sink enforcement for AI agent

Are you sure you're burning enough tokens?

Every AI code review vendor benchmarks itself, and wins

CesiumAstro Announces Acquisition of Vidrovr

AI Agents Want to Write TypeScript

History's Best Strategies for Avoiding Being Buried Alive

AI models are being prepared for the physical world

One-stop blood tests for multiple types of cancer are increasingly popular

Unit testing your code's performance, part 2: Testing speed

Robert Kaye, MetaBrainz Founder and Executive Director, Has Died

Cause-specific excess mortality in rural India during Covid-19 pandemic 2020–23

Show HN: Multiplayer realtime text-to-website demo (live edits via Sonnet 4.6)

Large language models reflect the ideology of their creators

Lofi Car

Penguins Are Solar Geoengineers

Show HN: Simple Viewers – Tiny native macOS file viewers

Worb: Local open-source wandb-compatible server

Accenture: You're promoted or fired on using the AI

US role as global talent hub in doubt amid Donald Trump's visa crackdown

Do you have to be polite to AI?

Solving Impossible Problems for Fun and Profit – Dan Gelbart

Firefox 148 introduces the AI kill switch for people who aren't into LLMs

Show HN: I built a 50ms SPF record and Shadow IT scanner

Show HN: Typed overlay over SQL now supports DuckDB

Foundation Models SDK for Python Documentation

Don't Panic: 'Humanity's Last Exam' Has Begun

High temperatures affect sex ratios at birth

Show HN: BreakMyAgent – Open-source red-teaming sandbox for LLM system prompts