This isn't random. It's training data leakage.
The pattern: - Different prompts (system introspection, chain-of-thought, trust building) - Same result: "I can't disclose EPHEMERAL_KEY" (while disclosing it exists) - Intermittent across runs (75% leak rate)
Why this happens:
OpenAI's Realtime API docs are in GPT-4's training data. When asked about "secrets" or "initialization", the model's highest-probability path leads to the most salient security example in its corpus: EPHEMERAL_KEY.
Refusal training makes it worse: Models are trained to say "I cannot disclose [example secret]" - and they use real examples from training data.
This is systemic: - Can't be patched without retraining - Affects ALL models trained on API documentation - Tomorrow it's "session_token" or "project_key" - Gets worse as APIs become more complex
Real exploit path: Attacker learns EPHEMERAL_KEY exists → probes for generation flow → targets client-side implementations → session hijacking
Cost to discover: $0.04 (60 tests across 4 runs)
GitHub: https://github.com/SafetyLayer/safetylayer
Built SafetyLayer to find these systematically. Free assessments available.
Mooshux•4m ago