It's interesting that people are writing tools that go inside the weights and do things. We're getting past the black box era of LLMs.
That may or may not be a good thing.
However, after a few rounds of conversation, it gets into loops and just repeats things over and over again. The main JOSIE models worked best of all and were still useful even after abliteration.
Also, as I said in a top level comment, what this project wants to achieve has been done for a while and it's called Heretic: https://github.com/p-e-w/heretic
(Not vibecoded by a Twitter influgrifter)
And yeah, doing stuff like deleting layers or nulling out whole expert heads has a certain ice pick through the eye socket quality.
That said, some kind of automated model brain surgery will likely be viable one day.
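For what it's worth, the "abliteration" mentioned above is usually gentler than deleting whole layers: the common recipe estimates a "refusal direction" from activation differences and projects it out of the weight matrices. A minimal NumPy sketch of that idea (all names, shapes, and the random data here are illustrative, not from Heretic or any particular repo):

```python
import numpy as np

def refusal_direction(refused_acts, answered_acts):
    # Estimate the refusal direction as the normalized difference of
    # mean hidden-state activations on refused vs. answered prompts.
    d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W, d):
    # Orthogonalize a weight matrix against d:
    # W' = W - d d^T W, so W' can no longer write along d.
    d = d / np.linalg.norm(d)
    return W - np.outer(d, d) @ W

# Toy demo with random "activations" standing in for real model traces.
rng = np.random.default_rng(0)
hidden = 8
W = rng.standard_normal((hidden, hidden))
d = refusal_direction(rng.standard_normal((16, hidden)) + 1.0,
                      rng.standard_normal((16, hidden)))
W_ablated = ablate_direction(W, d)
# The ablated matrix has (near-)zero output component along d.
print(np.abs(d @ W_ablated).max())
```

The surgical appeal is that everything else in the matrix is left untouched; the caveat, as noted above, is that the behavior you're excising may not live along any single direction.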
p-e-w's Heretic (https://news.ycombinator.com/item?id=45945587) is what you want if you're looking for an automatic de-censoring solution.
You're not just using a tool — you're co-authoring the science.
This README is an absolute headache: it's filled with AI writing, terminology that doesn't exist or is being misused, and unsound ideas. For example, it focuses heavily on doing "ablation studies", by which it means removing random layers of an already-trained model to find the source of the refusals(?), which is a fool's errand because such behavior is trained into the model as a whole and would not be found in any particular layer. I can only assume somebody vibe-coded this and spent way too much time being told "You're absolutely right!" while bouncing back the worst ideas.

That doesn't mean there couldn't be a "concept neuron" doing the vast majority of the heavy lifting for content refusal, though.
I just hear him promoting OBLITERATUS all day long and trying to get models to say naughty things
It just says "the README sucks." Which, I'm inclined to agree, it does.
LLM-generated text has no place in prose -- it saves the author a little effort at a much larger cost to the aggregate readers.