A practical guide for security researchers looking to get into adversarial LLM research. Covers the core distinction between jailbreaking (getting a model to act against its own values and safety training) and prompt injection (hijacking a model's context through untrusted input in deployed systems), three foundational attack techniques (gradual escalation, persona hijacking, nested contexts), and a methodology for practicing on open-weight models.