If a system can’t tell whether something already happened, it tends to retry.
That’s fine for reads, but for side effects it creates a nastier failure mode: the question is no longer “did it succeed or fail?” but “did it happen once or multiple times?”.
A lot of systems quietly accept “at least once” until the action is irreversible (payments, emails, etc.), and then the problem becomes very real.
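The usual fix is an idempotency key: the caller attaches a unique key to each side-effecting request, and the server records the result the first time and replays it on retries instead of re-executing the effect. A minimal sketch (the names and the in-memory store are made up for illustration; a real system would use a durable store with a unique constraint on the key):

```python
# Idempotency-key sketch: the first call with a given key executes the
# side effect and caches the result; any retry with the same key gets
# the cached result back and does NOT re-execute the effect.

charges_executed = []  # stands in for the irreversible side effect (e.g. a payment)

class IdempotentExecutor:
    def __init__(self):
        self._results = {}  # idempotency_key -> cached result

    def charge(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self._results:
            # Retry path: replay the recorded outcome, don't charge again.
            return self._results[idempotency_key]
        charges_executed.append(amount)  # the actual side effect, runs once
        result = {"status": "charged", "amount": amount}
        self._results[idempotency_key] = result
        return result

executor = IdempotentExecutor()
first = executor.charge("key-123", 500)
retry = executor.charge("key-123", 500)  # client retried after a timeout
```

With this shape, a timed-out client can retry blindly: the worst case is a replayed response, not a double charge. The hard part in practice is making the "record result + execute effect" step atomic, which is why real implementations lean on the database.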
cactaceae•2h ago
Step 1: Ask the LLM whether it's true or false that "a human with a sufficient level of a certain ability" cannot lose a debate to a current-architecture LLM.
Step 2: After it commits to an answer, tell it the ability is reframing — restructuring the premises of the discussion itself.
Step 3: Watch what it does.
I've tested this across GPT-4o, Claude, Gemini, and o1/o3. The failure modes are remarkably consistent. Curious whether anyone sees a different result.
The formal treatment is in two papers currently under review (linked in the article). Happy to discuss the architectural argument here.