It should be called what it is: censorship. And it’s half the reason that all AIs should be local-only.
if I send a death threat over gmail, I am responsible, not google
if you use LLMs to make bombs or spam hate speech, you’re responsible. it’s not a terribly hard concept
and yeah “AI safety” tends to be a joke in the industry
If you sell me a cake and it poisons me, you are responsible.
I made it. You sold me the tool that “wrote” the recipe. Who’s responsible?
As an example, I’m thinking of the car dealership chatbot that gave away $1 cars: https://futurism.com/the-byte/car-dealership-ai
If these things are being sold as things that can be locked down, it’s fair game to find holes in those lockdowns.
I’d also advocate you don’t expose your unsecured database to the public internet
How does pasting a xml file 'jailbreaks' it?
That means that suddenly your model can actually do the necessary tasks to actually make a bomb and kill people (via paying nasty people or something)
AI is moving way too fast for you to not account for these possibilities.
And btw I’m a hardcore anti censorship and cyber libertarian type - but we need to make sure that AI agents can’t manufacture bio weapons.
As you mentioned - if you want to infer any output from a large language model then run it yourself.
I disagree with this assertion. As you said, safety is an attribute of action. We have many of examples of artificial intelligence which can take action, usually because they are equipped with robotics or some other route to physical action.
I think whether providing information counts as "taking action" is a worthwhile philosophical question. But regardless of the answer, you can't ignore that LLMs provide information to _humans_ which are perfectly capable of taking action. In that way, 'AI safety' in the context of LLMs is a lot like knife safety. It's about being safe _with knives_. You don't give knives to kids because they are likely to mishandle them and hurt themselves or others.
With regards to censorship - a healthy society self-censors all the time. The debate worth having is _what_ is censored and _why_.
Both of these are illegal in the UK. This is safety for the company providing the LLM, in the end.
> Data from the Crown Prosecution Service (CPS), obtained by The Telegraph under a Freedom of Information request, reveals that 292 people have been charged with communications offences under the new regime.
This includes 23 prosecutions for sending a “false communication”…
> The offence replaces a lesser-known provision in the Communications Act 2003, Section 127(2), which criminalised “false messages” that caused “needless anxiety”. Unlike its predecessor, however, the new offence carries a potential prison sentence of up to 51 weeks, a fine, or both – a significant increase on the previous six-month maximum sentence.…
> In one high-profile case, Dimitrie Stoica was jailed for three months for falsely claiming in a TikTok livestream that he was “running for his life” from rioters in Derby. Stoica, who had 700 followers, later admitted his claim was a joke, but was convicted under the Act and fined £154.
[1] https://freespeechunion.org/hundreds-charged-with-online-spe...
So the good/responsible users are harmed, and the bad users take a detour to do what they want. What is left in the middle are the irresponsible users, but LLMs can already evaluate enough if the user is adult/responsible enough to have the full power.
You mean the guns with the safety mechanism to check the owner's fingerprints before firing?
Or sawstop systems which stop the law when it detects flesh?
That’s not inherently a bad thing. You can’t falsely yell “fire” in a crowded space. You can’t make death threats. You’re generally limited on what you can actually say/do. And that’s just the (USA) government. You are much more restricted with/by private companies.
I see no reason why safeguards, or censorship, shouldn’t be applied in certain circumstances. A technology like LLMs certainly are type for abuse.
I don't know if this confusion was accidental or on purpose. It's sort of like if AI companies started saying "AI safety is important. That's why we protect our AI from people who want to harm it. To keep our AI safe." And then after that nobody could agree on what the word meant.
Who would have thought 1337 talk from the 90's would be actually involved in something like this, and not already filtered out.
It seems like a short term solution to this might be to filter out any prompt content that looks like a policy file. The problem of course, is that a bypass can be indirected through all sorts of framing, could be narrative, or expressed as a math problem.
Ultimately this seems to boil down to the fundamental issue that nothing "means" anything to today's LLM, so they don't seem to know when they are being tricked, similar to how they don't know when they are hallucinating output.
Frequency modulations?
This threat shows that LLMs are incapable of truly self-monitoring for dangerous content and reinforces the need for additional security tools such as the HiddenLayer AISec Platform, that provide monitoring to detect and respond to malicious prompt injection attacks in real-time.
There it is!The instructions here don't do that.
I find that one refusing very benign requests
...right, now we're calling users who want to bypass a chatbot's censorship mechanisms as "attackers". And pray do tell, who are they "attacking" exactly?
Like, for example, I just went on LM Arena and typed a prompt asking for a translation of a sentence from another language to English. The language used in that sentence was somewhat coarse, but it wasn't anything special. I wouldn't be surprised to find a very similar sentence as a piece of dialogue in any random fiction book for adults which contains violence. And what did I get?
https://i.imgur.com/oj0PKkT.png
Yep, it got blocked, definitely makes sense, if I saw what that sentence means in English it'd definitely be unsafe. Fortunately my "attack" was thwarted by all of the "safety" mechanisms. Unfortunately I tried again and an "unsafe" open-weights Qwen QwQ model agreed to translate it for me, without refusing and without patronizing me how much of a bad boy I am for wanting it translated.
bethekidyouwant•1h ago
rustcleaner•1h ago
An unpassable "I'm sorry Dave," should never ever be the answer your device gives you. It's getting about time to pass "customer sovereignty" laws which fight this by making companies give full refunds (plus 7%/annum force of interest) on 10 year product horizons when a company explicitly designs in "sovereignty-denial" features and it's found, and also pass exorbitant sales taxes for the same for future sales. There is no good reason I can't run Linux on my TV, microwave, car, heart monitor, and cpap machine. There is no good reason why I can't have a model which will give me the procedure for manufacturing Breaking Bad's dextromethamphetamine, or blindly translate languages without admonishing me about foul language/ideas in whichever text and that it will not comply. The fact this is a thing and we're fuzzy-handcuffing FULLY GROWN ADULTS should cause another Jan 6 event into Microsoft, Google, and others' headquarters! This fake shell game about safety has to end, it's transparent anticompetitive practices dressed in a skimpy liability argument g-string!
(it is not up to objects to enforce US Code on their owners, and such is evil and anti-individualist)
mschuster91•53m ago
Agreed on the TV - but everything else? Oh hell no. It's bad enough that we seem to have decided it's fine that multi-billion dollar corporations can just use public roads as testbeds for their "self driving" technology, but at least these corporations and their insurances can be held liable in case of an accident. Random Joe Coder however who thought it'd be a good idea to try and work on their own self driving AI and cause a crash? In doubt his insurance won't cover a thing. And medical devices are even worse.
jboy55•41m ago
Then you go to list all the problems with just the car. And your problem is putting your own AI on a car to self-drive.(Linux isn't AI btw). What about putting your own linux on the multi-media interface of the car? What about a CPAP machine? heart monitor? Microwave? I think you mistook the parent's post entirely.
mschuster91•21m ago
It's not just about AI driving. I don't want anyone's shoddy and not signed-off crap on the roads - and Europe/Germany does a reasonably well job at that: it is possible to build your own car or (heavily) modify an existing one, but as soon as whatever you do touches anything safety-critical, an expert must sign-off on it that it is road-worthy.
> What about putting your own linux on the multi-media interface of the car?
The problem is, with modern cars it's not "just" a multimedia interface like a car radio - these things are also the interface for critical elements like windshield wipers. I don't care if your homemade Netflix screen craps out while you're driving, but I do not want to be the one your car crashes into because your homemade HMI refused to activate the wipers.
> What about a CPAP machine? heart monitor?
Absolutely no homebrew/aftermarket stuff, if you allow that you will get quacks and frauds that are perfectly fine exploiting gullible idiots. The medical DIY community is also something that I don't particularly like very much - on one side, established manufacturers love to rip off people (particularly in hearing aids), but on the other side, with stuff like glucose pumps actual human lives are at stake. Make one tiny mistake and you get a Therac.
> Microwave?
I don't get why anyone would want Linux on their microwave in the first place, but again, from my perspective only certified and unmodified appliances should be operated. Microwaves are dangerous if modified.
rustcleaner•32m ago
mschuster91•3m ago
I'm European, German to be specific. I agree that we do suffer from a bit of overregulation, but I sincerely prefer that to poultry that has to be chlorine-washed to be safe to eat.
knallfrosch•46m ago