It's interesting that people are writing tools that go inside the weights and do things. We're getting past the black box era of LLMs.
That may or may not be a good thing.
However, after a few rounds of conversation, it gets into loops and just repeats things over and over again. The main JOSIE models worked best of all and were still useful even after abliteration.
Also, as I said in a top level comment, what this project wants to achieve has been done for a while and it's called Heretic: https://github.com/p-e-w/heretic
(Not vibecoded by a Twitter influgrifter)
And yeah, doing stuff like deleting layers or nulling out whole expert heads has a certain ice pick through the eye socket quality.
That said, some kind of automated model brain surgery will likely be viable one day.
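For what it's worth, the "abliteration" mentioned above is usually gentler than deleting whole layers: the common recipe estimates a "refusal direction" from activation differences and projects it out of the weight matrices. A minimal NumPy sketch of that idea (all names, shapes, and the random data here are illustrative, not from Heretic or any particular repo):

```python
import numpy as np

def refusal_direction(refused_acts, answered_acts):
    # Estimate the refusal direction as the normalized difference of
    # mean hidden-state activations on refused vs. answered prompts.
    d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W, d):
    # Orthogonalize a weight matrix against d:
    # W' = W - d d^T W, so W' can no longer write along d.
    d = d / np.linalg.norm(d)
    return W - np.outer(d, d) @ W

# Toy demo with random "activations" standing in for real model traces.
rng = np.random.default_rng(0)
hidden = 8
W = rng.standard_normal((hidden, hidden))
d = refusal_direction(rng.standard_normal((16, hidden)) + 1.0,
                      rng.standard_normal((16, hidden)))
W_ablated = ablate_direction(W, d)
# The ablated matrix has (near-)zero output component along d.
print(np.abs(d @ W_ablated).max())
```

The surgical appeal is that everything else in the matrix is left untouched; the caveat, as noted above, is that the behavior you're excising may not live along any single direction.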
p-e-w's Heretic (https://news.ycombinator.com/item?id=45945587) is what you want if you're looking for an automatic de-censoring solution.
You're not just using a tool — you're co-authoring the science.
This README is an absolute headache: it's filled with AI writing, terminology that doesn't exist or is being misused, and unsound ideas. For example, it focuses heavily on doing "ablation studies", by which it means removing random layers of an already-trained model to find the source of the refusals(?), which is a fool's errand because such behavior is trained into the model as a whole and would not be found in any particular layer. I can only assume somebody vibe-coded this and spent way too much time being told "You're absolutely right!" while bouncing back the worst ideas.

That doesn't mean there couldn't be a "concept neuron" doing the vast majority of the heavy lifting for content refusal, though.
I just hear him promoting OBLITERATUS all day long and trying to get models to say naughty things
It just says "the README sucks." Which, I'm inclined to agree, it does.
LLM-generated text has no place in prose -- it saves the author a little effort at a much larger cost to the aggregate readers.