Reproducible System Prompt Extraction in Latest Claude Models
1•asparius•1d ago
I found a simple, reproducible prompt-injection path that makes recent Claude models reveal their full system prompt (network config, tool rules, allowed domains, etc.) using only conversational framing. No jailbreak tricks required.
Write-up with examples:
https://asparius.github.io/posts/prompt-injection-claude.html
Comments
orbitfoundry•1d ago
This matches what we’ve seen as well — system prompt extraction often doesn’t require jailbreak-style tricks. Simple conversational framing over multiple turns is enough.
That’s why relying on instruction hierarchy or post-response moderation feels insufficient. Gating inputs before execution ended up being more effective for us.
orbitfoundry•1d ago
That’s why relying on instruction hierarchy or post-response moderation feels insufficient. Gating inputs before execution ended up being more effective for us.