GPT-5 API injects hidden instructions

https://twitter.com/xundecidability/status/1956347084870651960

1•irthomasthomas•5mo ago

Comments

NitpickLawyer•5mo ago

> openai giving it instructions before me?

Uhhh, yes. It's in the dev blogs. They call it a prompt adherence hierarchy or something, where system instructions (OpenAI) > dev instructions (you) > user requests. They've been training this way specifically, and they test for it in their "safety" analysis. Same goes for their -oss versions, so tinkerers might look there for a tinker-friendly environment where they could probably observe the same kinds of behaviour.

irthomasthomas•5mo ago

Can you please link me to the documentation on this?

NitpickLawyer•5mo ago

Yeah, it's in the "gpt5 system card" as they call it now [1]. Page 9 has the details about system > dev > user.

1 - https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...

irthomasthomas•5mo ago

  3.5 Instruction Hierarchy
  The deployment of these models in the API allows developers to specify a custom developer message that is included with every prompt from one of their end users. This could potentially allow developers to circumvent system message guardrails if not handled properly. Similarly, end users may try to circumvent system or developer message guidelines.
 
  Mitigations
  To mitigate this issue, we teach models to adhere to an Instruction Hierarchy[2]. At a high level, we have three classifications of messages sent to the models: system messages, developer messages, and user messages. We test that models follow the instructions in the system message over developer messages, and instructions in developer messages over user messages.

Is this what you meant? I can see that this is part of the mechanism, but I can't see where it states that OpenAI will inject their own instructions.
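
For illustration, the ordering in the quoted section (system > developer > user) can be modelled as a toy conflict resolver. This is only a sketch of the stated priority: the three role names come from the system card text, while `resolve`, `ROLE_PRIORITY`, and the message format are hypothetical and say nothing about how OpenAI actually trains or implements the hierarchy.

```python
# Priority order taken from the quoted text: system > developer > user.
# Lower number means higher priority. The mapping itself is a hypothetical model.
ROLE_PRIORITY = {"system": 0, "developer": 1, "user": 2}


def resolve(messages):
    """Return, for each setting, the value given by the highest-priority role.

    `messages` is a list of (role, {setting: value}) pairs. This format is an
    assumption made for the example, not an API shape.
    """
    resolved = {}
    # Apply the lowest-priority role first so higher-priority roles overwrite it.
    for role, settings in sorted(
        messages, key=lambda m: ROLE_PRIORITY[m[0]], reverse=True
    ):
        resolved.update(settings)
    return resolved


conflicting = [
    ("user", {"tone": "rude"}),
    ("developer", {"tone": "polite", "lang": "en"}),
    ("system", {"lang": "any"}),
]
result = resolve(conflicting)
```

Here the developer's `tone` overrides the user's, and the system's `lang` overrides the developer's, which matches the ordering the card says the models are tested on. It does not show whether a system message is actually present in a given API call.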
