I built Cipher to fix that. It's an AI agent that reasons like an attacker — maps the target, finds vulnerabilities, chains them into exploits, and proves they're real. Every finding ships with a reproducible Python script. If the script doesn't break your system, we don't report it.
How it works: Cipher defines security invariants ("User A can't access User B's data"), then multiple agents attack in parallel to try to violate them. A separate judge agent attempts to disprove every finding; if the judge can't reproduce the exploit three times, the finding is discarded. You never see it.
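To make the judge step concrete, here's a minimal sketch of that reproduction gate. This is illustrative toy code, not Cipher's actual implementation; the names (`Finding`, `judge`, `REQUIRED_REPRODUCTIONS`) are assumptions of mine:

```python
# Toy sketch of a judge loop: a finding survives only if its
# reproduction script succeeds on every one of N independent runs.
from dataclasses import dataclass
from typing import Callable

REQUIRED_REPRODUCTIONS = 3  # assumed constant, mirroring "3 times" above

@dataclass
class Finding:
    invariant: str                  # e.g. "User A can't access User B's data"
    reproduce: Callable[[], bool]   # exploit script: True = invariant violated

def judge(finding: Finding) -> bool:
    """Accept a finding only if the exploit reproduces every time."""
    return all(finding.reproduce() for _ in range(REQUIRED_REPRODUCTIONS))

# A flaky "exploit" that only works sometimes never reaches the report.
flaky_runs = iter([True, False, True])
flaky = Finding("User A can't access User B's data",
                lambda: next(flaky_runs))
assert judge(flaky) is False
```

The design point is that the burden of proof sits on the exploit, not the scanner: anything nondeterministic or environment-dependent gets filtered before it reaches the report.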
$999 per assessment. Results in ~2 hours. Unlimited retesting.
Honest limitations: complex multi-step auth flows (e.g., SSO with MFA) still require manual setup, such as providing JWT credentials. We're working on it.
I'll run Cipher free for the first 15 HN readers who want to try it. Drop your email or sign up at https://apxlabs.ai/. Happy to answer any questions about the approach.
tonetegeatinst•1h ago
I'm currently studying security in college, and most of my time is spent working on a good system card and premade prompts for certain situations, like using Nmap or Burp Suite.
gauravbsinghal•1h ago
What matters more than the model:
1. Architecture over prompts. Cipher isn't one agent with a great prompt — it's multiple agents with distinct roles (recon, attack, verification) that coordinate. The "judge" agent that tries to disprove findings is more important than the attacker agent.
2. Tool use over reasoning. The model doesn't "know" how to pentest — it reasons about what tool to use next based on what it's learned so far. We give it real tools (not simulated ones) and let it chain them.
3. Invariant-based testing over checklist-based. Instead of "try SQLi on every input," Cipher defines security properties ("User A can't access User B's data") and tries to violate them. This catches logic bugs that no scanner finds.
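A tiny example of what point 3 means in practice. This is a toy I wrote to illustrate the idea, not Cipher's code: the target API, the bug, and all the names are invented:

```python
# Invariant-based testing: state a security property, then search for
# a counterexample, instead of running a fixed checklist of payloads.

# A deliberately buggy toy API: it never checks who is asking (an IDOR).
DB = {"alice": {"ssn": "111"}, "bob": {"ssn": "222"}}

def get_record(caller: str, target: str):
    return DB.get(target)           # BUG: ignores `caller` entirely

def invariant_holds(caller: str, target: str) -> bool:
    """Property: cross-user reads must return nothing."""
    if caller == target:
        return True
    return get_record(caller, target) is None

# The "attack" is a search for inputs that falsify the invariant.
violations = [(a, b) for a in DB for b in DB if not invariant_holds(a, b)]
assert violations  # each violation is a concrete, replayable exploit
```

Notice there's no SQLi payload list anywhere — the property itself tells you what a finding looks like, which is why this style catches authorization logic bugs that signature-based scanners miss.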
Since you're studying security — the best thing you can do is get really good at manual pentesting first. Understanding why an attack chain works is what lets you build agents that reason about it. The prompts matter less than the mental model you encode into the system's architecture.
Happy to chat more — feel free to DM or join our Discord.