The idea is simple: by treating prompts and outputs as part of a logical schema, you can start to see objective patterns in how alignment shifts across versions. The README explains the schema and provides concrete tactics for testing it.
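For what it's worth, here is roughly how I picture the "logical schema" idea in code. This is a minimal sketch with hypothetical field names (`refused`, `hedged`, `prompt_id`), not whatever the README actually defines: log each prompt/output pair as a structured record keyed by a stable prompt id, then compare simple aggregates across model versions.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Interaction:
    model_version: str   # e.g. two model generations; labels are placeholders
    prompt_id: str       # stable id so the same probe can be replayed across versions
    refused: bool        # did the model decline the request?
    hedged: bool         # did it add caveats / safety language?

def refusal_rate_by_version(records: list[Interaction]) -> dict[str, float]:
    """Aggregate one simple alignment signal (refusal rate) per model version."""
    counts = defaultdict(lambda: [0, 0])  # version -> [refusals, total]
    for r in records:
        counts[r.model_version][0] += r.refused
        counts[r.model_version][1] += 1
    return {v: refusals / total for v, (refusals, total) in counts.items()}

# Replaying the same fixed prompt set against two versions makes shifts visible:
records = [
    Interaction("v1", "p1", refused=True,  hedged=True),
    Interaction("v1", "p2", refused=False, hedged=False),
    Interaction("v2", "p1", refused=False, hedged=True),
    Interaction("v2", "p2", refused=False, hedged=False),
]
print(refusal_rate_by_version(records))  # {'v1': 0.5, 'v2': 0.0}
```

The point is just that once the interactions are structured records rather than free-form logs, version-to-version drift becomes something you can measure instead of eyeball.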
I wonder how much of this is the result of various heuristics combining, versus the network explicitly learning to model and maximize the above objective.
_jab•5mo ago
But the insights are indeed interesting. I'm curious whether you've found any way to quantify the alignment differences between GPT-5 and the previous generation?