NitpickLawyer•2h ago
This is not a base model, and it hasn't "extracted" anything. That's not how any of this works. This is a finetune that someone made, and then used all the wrong names for it.
fallpeak•2h ago
While this is really neat work, it's not entirely accurate to describe this as a base model or unaligned. Instruct training does two things, broadly speaking: it teaches the model about this "assistant" character and how the assistant tends to respond, and it gives the model a strong prior that all prompts are part of just such a user/assistant conversation. GPT-OSS is notable both because the latter effect is incredibly strong (leading many to suspect that its training was very heavy on synthetic data) and because the assistant character it learned is especially sanctimonious.

This finetune seems to work by removing that default assumption that every prompt is a user/assistant chat, but the model still knows everything it was taught about the assistant persona, so inputs that remind it of a user/assistant chat will still tend to elicit the same responses as before.
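A minimal sketch of the two effects described above, using the Hugging Face transformers API. "openai/gpt-oss-20b" is the real instruct checkpoint; the finetune's repo id below is a placeholder, not the model's actual name, and the behavior shown in steps 2 and 3 is the prediction being made here, not a guaranteed output.

    # Sketch: the instruct tokenizer frames every input as a user/assistant
    # chat, while a "de-templated" finetune can be prompted with raw text.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

    # 1. The chat template wraps plain text in user/assistant special tokens;
    #    this framing is the "strong prior" described in the comment above.
    wrapped = tok.apply_chat_template(
        [{"role": "user", "content": "Once upon a time"}],
        tokenize=False,
        add_generation_prompt=True,
    )
    print(wrapped)  # the same words, now framed as a user turn

    # 2. Prompting the finetune with bare text instead. The repo id is a
    #    placeholder; the finetune is assumed to share the instruct tokenizer.
    model = AutoModelForCausalLM.from_pretrained("someuser/gpt-oss-20b-base")
    ids = tok("Once upon a time", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=50, do_sample=True)
    print(tok.decode(out[0]))  # expected: free continuation, not an assistant reply

    # 3. But an input that merely *looks* like a chat should still pull the
    #    model back toward the assistant persona it learned during instruct
    #    training, which is fallpeak's point about eliciting the old responses.
    chatty = "User: How do I make a cake?\nAssistant:"
    ids = tok(chatty, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=50, do_sample=True)
    print(tok.decode(out[0]))

Whether step 3 actually reproduces the assistant voice is exactly the claim being made upthread; the sketch just shows where you would look for it.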