We built ModelRed to test AI models and apps for security issues. We ran 4,182 attack probes against 9 leading models to see what would break.
Leaderboard: https://modelred.ai (no signup, just check it out)
Claude scored 9.5/10 but still failed on medical/financial prompts. Mistral Large scored 3.3/10. The gap between best and worst is huge.
We test for prompt injections, data leaks, jailbreaks, risky tool calls, and domain-specific hacks: basically everything that can go wrong when your LLM has access to real data and APIs. The platform runs these tests continuously and blocks CI/CD deployments when scores drop.
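To give a sense of what a CI/CD gate could look like, here's a minimal sketch in Python: pull the latest assessment score and fail the pipeline if it drops below a threshold. The endpoint URL, the MODELRED_API_KEY env var, the response fields, and the threshold are all assumptions for illustration, not ModelRed's actual API.

    # Hypothetical CI gate sketch. Endpoint, token env var, and response
    # shape are assumptions, not ModelRed's documented API.
    import json
    import os
    import sys
    import urllib.request

    API_URL = "https://modelred.ai/api/v1/assessments/latest"  # assumed endpoint
    MIN_SCORE = 8.0  # assumed threshold: block deploys below this score

    req = urllib.request.Request(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['MODELRED_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        assessment = json.load(resp)

    score = assessment["score"]  # assumed field name
    print(f"Latest security score: {score}/10")

    if score < MIN_SCORE:
        print(f"Score below {MIN_SCORE}, failing the pipeline.")
        sys.exit(1)  # non-zero exit blocks the deployment step

In practice you'd run something like this as a step before your deploy job, so a regression in the score stops the release instead of shipping it.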
Works with any provider (OpenAI, Anthropic, AWS, Hugging Face endpoints, OpenRouter, etc.).
Looking for around 20 people/teams shipping AI in production to be early design partners: help us figure out which features actually matter, contribute attack vectors, and shape the roadmap.
Weirdest finding: the same prompt injection works on 60% of the models we tested, because everyone copies the same defense patterns.
Happy to answer questions about methodology or specific vulnerabilities, and to talk if you want to be a design partner.