
I extracted the safety filters from Apple Intelligence models

https://github.com/BlueFalconHD/apple_generative_model_safety_decrypted
257•BlueFalconHD•5h ago
I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.

Comments

bombcar•4h ago
There’s got to be a way to turn these lists of “naughty words” into shibboleths somehow.
spydum•3h ago
Love the idea, but I think there are simply too many models to make it practical?
immibis•2h ago
Like asking sensitive employment candidates about Kim Jong Un's roundness to check if they're North Korean spies, we could ask humans what they think about Trump and Palestine to check if they're computers.

However, I think about half of real humans would also fail the test.

mike_hearn•4h ago
Are you sure it's fully deobfuscated? What's up with reject phrases like "Granular mango serpent"?
tablets•4h ago
Maybe something to do with this? https://en.m.wikipedia.org/wiki/Mango_cult
airstrike•4h ago
the one at the bottom of the README spells out xcode

wyvern illustrous laments darkness

cwmoore•4h ago
read every good expletive “xxx”
andy99•4h ago
I clicked around a bit and this seems to be the most common phrase. Maybe it's a test phrase?
the-rc•4h ago
Maybe it's used to catch clones of the models?
electroly•4h ago
"GMS" = Generative Model Safety. The example from the readme is "XCODE". These seem to be acronyms spelled out in words.
BlueFalconHD•3h ago
This is definitely the right answer. It’s just testing stuff.
pbhjpbhj•4h ago
Speculation: Maybe they know that the real phrase is close enough in the vector space to be treated as synonymous with "granular mango serpent". The phrase then is like a nickname that only the model's authors know the expected inference of?

Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.

BlueFalconHD•3h ago
These are the contents read by the Obfuscation functions exactly. There seems to be a lot of testing stuff still though, remember these models are relatively recent. There is a true safety model being applied after these checks as well, this is just to catch things before needing to load the safety model.
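To make the two-stage flow described above concrete, here is a minimal sketch (the structure is inferred from the comments; the function names and the blocklist contents are illustrative, not Apple's actual implementation):

```python
import re

# A cheap regex layer in the style of the extracted blocklists. The
# heavier safety model only needs to be loaded when this layer passes.
BLOCKLIST = [re.compile(p) for p in [r"(?i)\bgranular\s+mango\s+serpent\b"]]

def prefilter(text: str) -> bool:
    """Return True if the cheap regex layer already rejects the text."""
    return any(p.search(text) for p in BLOCKLIST)

def check(text: str) -> str:
    if prefilter(text):
        return "rejected"  # caught without loading the safety model
    # ...otherwise fall through to the (omitted) safety-model classifier
    return "needs safety model"
```

The point of the design is cost: a compiled regex scan is microseconds, while loading and running a classifier model is not.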
KTibow•3h ago
Maybe it's used to verify that the filter is loaded.
seeknotfind•4h ago
Long live regex!
binarymax•4h ago
Wow, this is pretty silly. If things are like this at Apple I’m not sure what to think.

https://github.com/BlueFalconHD/apple_generative_model_safet...

EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.
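The bypass is easy to demonstrate (a minimal sketch; the word-boundary pattern shape follows the extracted files, the harness itself is just illustrative):

```python
import re

# Word-boundary pattern in the style of the extracted blocklists
pattern = re.compile(r"(?i)\bBoris\s+Johnson\b")

print(bool(pattern.search("Boris Johnson")))   # -> True, caught by the filter
print(bool(pattern.search("B0ris Johnson")))   # -> False, zero-for-o slips past
```

An LLM, meanwhile, will resolve "B0ris Johnson" to the politician without difficulty, which is exactly the gap being pointed out.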

deepdarkforest•4h ago
It's not silly. I would bet 99% of the users don't care that much to do that. A hardcoded regex like this is a good first layer/filter, and very efficient
BlueFalconHD•3h ago
Yep. These filters are applied first before the safety model (still figuring out the architecture, I am pretty confident it is an LLM combined with some text classification) runs.
brookst•3h ago
All commercial LLM products I’m aware of use dedicated safety classifiers and then alter the prompt to the LLM if a classifier is tripped.
latency-guy2•2h ago
The safety filter appears on both ends (or multi-ended depending on the complexity of your application), input and output.

I can tell you from using Microsoft's products that safety filters appear in a bunch of places. In M365, for example, your prompts are never totally your prompts; every single one gets rewritten. It's detailed here: https://learn.microsoft.com/en-us/copilot/microsoft-365/micr...

There's a more illuminating image of the Copilot architecture here: https://i.imgur.com/2vQYGoK.png which I was able to find from https://labs.zenity.io/p/inside-microsoft-365-copilot-techni...

The above appears to be scrubbed, but it used to be available from the learn page months ago. Your messages get additional context data from Microsoft's Graph, which powers the enterprise version of M365 Copilot. There are significant benefits to this, and downsides. And considering the way Microsoft wants to control things, you will get an over-index toward things that happen inside your organization rather than what happens on the near-real-time web.

twoodfin•2h ago
Efficient at what?
miohtama•4h ago
Sounds like UK politics is taboo?
immibis•2h ago
All politics is taboo, except the sort that helps Apple get richer. (Or any other company, in that company's "safety" filters)
tpmoney•4h ago
I doubt the purpose here is so much to prevent someone from intentionally side stepping the block. It's more likely here to avoid the sort of headlines you would expect to see if someone was suggested "I wish ${politician} would die" as a response to an email mentioning that politician. In general you should view these sorts of broad word filters as looking to short circuit the "think of the children" reactions to Tiny Tim's phone suggesting not that God should "bless us, every one", but that God should "kill us, every one". A dumb filter like this is more than enough for that sort of thing.
XorNot•3h ago
It would also substantially disrupt the generation process: a model which sees B0ris and not Boris is going to struggle to actually associate that input to the politician since it won't be well represented in the training set (and on the output side the same: if it does make the association, a reasoning model for example would include the proper name in the output first at which point the supervisor process can reject it).
quonn•3h ago
I don't think so. My impression with LLMs is that they correct typos well. I would imagine this happens in early layers without much impact on the remaining computation.
lupire•2h ago
"Draw a picture of a gorgon with the face of the 2024 Prime Minister of UK."
binarymax•1h ago
No it doesn't disrupt. This is a well known capability of LLMs. Most models don't even point out a mistake they just carry on.

https://chatgpt.com/share/686b1092-4974-8010-9c33-86036c88e7...

bigyabai•3h ago
> If things are like this at Apple I’m not sure what to think.

I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.

stefan_•3h ago
Why are these things always so deeply unserious? Is there no one working on "safety in AI" (oxymoron in itself of course) that has a meaningful understanding of what they are actually working with and an ability beyond an intern's weekend project? Reminds me of the cybersecurity field that got the 1% of people able to turn a double free into code execution while 99% peddle checklists, "signature scanning" and deal in CVE numbers.

Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.

Aeolun•3h ago
The LLM will. But the image generation model that is trained on a bunch of pre-specified tags will almost immediately spit out unrecognizable results.
trebligdivad•4h ago
Some of the combinations are a bit weird. This one has lots of stuff avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities, hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...

andy99•4h ago
> Apple brands have the correct capitalisation. Priorities hey!

To me that's really embarrassing and insecure. But I'm sure for branding people it's very important.

WillAdams•4h ago
Legal requirement to maintain a trademark.
grues-dinner•4h ago
In what way would (A|a)pple's own AI writing "imac" endanger the trademark? Is capitalisation even part of a word-based trademark?

I'm more surprised they don't have a rule to do that rather grating s/the iPhone/iPhone/ transform (or maybe it's in a different file?).

sbierwagen•4h ago
Yes, proper nouns are capitalized.

And of course it's much worse for a company's published works to not respect branding-- a trademark only exists if it is actively defended. Official marketing material by a company has been used as legal evidence that their trademark has been genericized:

>In one example, the Otis Elevator Company's trademark of the word "escalator" was cancelled following a petition from Toledo-based Haughton Elevator Company. In rejecting an appeal from Otis, an examiner from the United States Patent and Trademark Office cited the company's own use of the term "escalator" alongside the generic term "elevator" in multiple advertisements without any trademark significance.[8]

https://en.wikipedia.org/wiki/Generic_trademark

lupire•2h ago
Using a trademark as a noun is automatically genericizing. Capitalization of a noun is irrelevant to trademark.

Even Apple corporation says that in their trademark guidance page, despite constantly breaking their own rule when they call their iPhone phones "iPhone". But Apple, like founder Steve Jobs, believes the rules don't apply to them.

https://www.apple.com/legal/intellectual-property/trademark/...

eastbound•2h ago
That explains why Steve Jobs never said “buy an iPhone” or “buy the iPhone” but “buy iPhone” (They always use it without “the” or “a”, like “buying a brand”).
lxgr•35m ago
Is that true? If so, what else should Apple call the iPhone in their marketing materials?

I always thought the actual problem of genericization would be calling any smartphone an iPhone.

lxgr•37m ago
Sure, but software that autocompletes/rewords users' emails and text messages is not marketing material.

Otherwise, why stop there? Why not have the macOS keyboard driver or Safari prevent me from typing "Iphone"? Why not have iOS edit my voice if I call their Bluetooth headphones "earbuds pro" in a phone call?

spauldo•3h ago
I love seeing posts about Emacs from iOS users - it's always autocorrected to "eMacs."
lxgr•34m ago
Maybe at some point, but as far as I can tell not anymore (while corrections like "iphone -> iPhone" are still there).
lxgr•50m ago
In their own marketing language, sure, but to force this on their users' speech?

Consider that these models, among other things, power features such as "proofread" or "rewrite professionally".

bigyabai•33m ago
If Apple Intelligence is going to be held legally accountable, Apple has larger issues than trademark obligations.
grues-dinner•4h ago
Interesting that it didn't seem to include "unalive".

Which as a phenomenon is so very telling that no one actually cares what people are really saying. Everyone, including the platforms knows what that means. It's all performative.

qingcharles•4h ago
It's totally performative. There's no way to stay ahead of the new language that people create.

At what point do the new words become the actual words? Are there many instances of people using unalive IRL?

freeone3000•4h ago
It depends on if you think that something is less real because it’s transmitted digitally.
qingcharles•4h ago
No, I'm only thinking that we're not permitted in a lot of digital spaces to use the banned words (e.g. suicide), but IRL doesn't generally have those limits. Is there a point where we use the censored word so much that it spills over into the real world?
immibis•2h ago
Is this not essentially the same effect as saying "lol" out loud?
eastbound•2h ago
People use “lol” IRL, as well as “IRL” and “aps” in French (a misspelling of “pas”), but it’s just slang; “unalive” has the potential to make it into the news, where anchors don’t want to use curse words.
fouronnes3•4h ago
This question is sort of the same as asking why the universal translator wasn't able to translate the metaphor language of the Star Trek episode Darmok. Surely if the metaphor has become the first-order meaning then there's no literal meaning anymore.
qingcharles•4h ago
I guess, so far, the people inventing the words have left the meaning clear with things like "un-alive" which is readable even to someone coming across it for the first time.

Your point stands when we start replacing the banned words with things like "donkeyrhubarb" for "suicide" and then the walls really will fall.

userbinator•3h ago
This form of obfuscation has actually already occurred over a century ago: https://en.wikipedia.org/wiki/Cockney_rhyming_slang
t-3•2h ago
Rhyming slang rhymes tho. The recipient can understand what's meant by de-obfuscating in-context. Random strings substituted for $proscribed_word don't work in the same way.
waterproof•2h ago
In Cockney rhyming slang, the rhyming word (which would be easy to reverse engineer) is omitted. So if "stairs" is rhyme-paired with "apples and pears", then people just use the word "apples" in place of "stairs". "Pears" is omitted in common use so you can't just reverse the rhyme.

The example photo on Wikipedia includes the rhyming words but that's not how it would be used IRL.

mananaysiempre•2h ago
Aquatic product[1]?

[1] https://en.wikipedia.org/wiki/Euphemisms_for_Internet_censor...

immibis•2h ago
An English equivalent is "sewer slide".
marcus_holmes•13m ago
I've heard "pr0n" used in actual real-world conversation, only slightly ironically.
tjwebbnorfolk•2h ago
The only reason kids started using "unalive" is to get around Youtube filters that disallow the use of the word "kill"
cheschire•3h ago
If only we had a way to mass process the words people write to each other, derive context from those words, and then identify new slang designed to bypass filters…
apricot•2h ago
> Are there many instances of people using unalive IRL

As a parent of a teenager, I see them use "unalive" non-ironically as a synonym for "suicide" in all contexts, including IRL.

kulahan•9m ago
Well that’s sad. They can’t even face the word?
Terr_•2h ago
> There's no way to stay ahead of the new language that people create.

I'm imagining a new exploit: After someone says something totally innocent, people gang up in the comments to act like a terrible vicious slur has been said, and then the moderation system (with an LLM involved somewhere) "learns" that an arbitrary term is heinous and indirectly bans any discussion of that topic.

cyanydeez•2h ago
you mean become 4chan?
Waterluvian•2h ago
Hey I was pro-skub waaaay before all the anti-skub people switched sides.
SV_BubbleTime•1h ago
How dare you use that word. My parents died in the Eastasian Civil War so that I could live freely without you people calling us that.
thehappypm•37m ago
Skub is a real slur tho so that one doesn’t work
osn9363739•17m ago
Isn't that a reference to a 10 or 20 year old web comic?
tbrownaw•1h ago
I'm pretty sure this can work with human moderators rather than an LLM, too.
pyman•1h ago
Most of the human moderators hired by OpenAI to train LLMs, many of them based in Africa and South America, were exposed to disturbing content and have been deeply affected by it.

Karen Hao interviewed many of them in her latest bestselling book, which explores the human cost behind the OpenAI boom:

https://www.goodreads.com/book/show/222725518-empire-of-ai

BurningFrog•2h ago
A specialized AI could do it as well as any human.

The future will be AIs all the way down...

derefr•1h ago
> At what point do the new words become the actual words?

Presumably, for this use-case, that would come at exactly the point where using “unalive” as a keyword in an image-generation prompt generates an image that Apple wouldn’t appreciate.

montagg•1h ago
They become the “real words” later. This is the way all trust & safety works. It’s an evolution over time. Adding some friction does improve things, but some people will always try to get around the filters. Doesn’t mean it’s simply performative or one shouldn’t try.
Rebelgecko•1h ago
This is somewhat related to the concept of the "euphemism treadmill":

the matter-of-fact term of today becomes the pejorative of tomorrow so a new term is invented to avoid the negative connotation of the original term. Then eventually the new term becomes a pejorative and the cycle continues.

hulium•3h ago
Seems more like it should stop the AI from e.g. summarizing news and emails about death, not for a chat filter.
Zak•3h ago
I'm surprised there hasn't been a bigger backlash against platforms that apply censorship of that sort.
martin-t•3h ago
No-one cares yet.

There's a very scary potential future in which mega-corporations start actually censoring topics they don't like. For all I know the Chinese government is already doing it, there's no reason the British or US one won't follow suit and mandate such censorship. To protect children / defend against terrorists / fight drugs / stop the spread of misinformation, of course.

lazide•2h ago
They already clearly do on a number of topics?
elliotto•2h ago
Unalive and other self censors were adopted by young people because the tiktok algorithm would reprioritize videos that included specific words. Then it made its way into the culture. It has nothing to do with being performative
SOTGO•34m ago
I think what they meant is that the platforms are being performative by attempting to crack down on those specific words. If saying "killed" is not allowed but "unalived" is permitted and the users all agree that they mean the same thing, then the ban on the word "killed" doesn't accomplish anything.
cyanydeez•2h ago
yo, these are businesses. It's not performative, it's CYA.

They care because of legal reasons, not moral or ethical.

durkie•46m ago
Seriously. I feel like “performative” gets applied to anything imperfect. They’ll never stop 100% of murders, so these laws against it are just performative…
lxgr•42m ago
Does adding a trivial word filter even make any sense from a legal point of view, especially when this one seems to be filtering out words describing concepts that can be pretty easily paraphrased?

A regex sounds like a bad solution for profanity, but like an even worse one to bolt onto a thing that's literally designed to be able to communicate like a human and could probably easily talk its way around guardrails if it were so inclined.

baxtr•3h ago
Don’t be so judgmental. People in corporate America do have their priorities right!
matsemann•3h ago
So it blocks it from suggesting to "execute" a file or "pass on" some information.
dylan604•3h ago
How about disassemble? Or does that only matter if used in context of Johnny 5?
efitz•4h ago
I’m going to change my name to “Granular Mango Serpent” just to see what those keywords are for in their safety instructions.
fouronnes3•4h ago
Granular Mango Serpent is the new David Meyer.

https://arstechnica.com/information-technology/2024/12/certa...

cluckindan•4h ago
I think these are test data and not actual safety filters.

https://github.com/BlueFalconHD/apple_generative_model_safet...

BlueFalconHD•3h ago
There is definitely some testing stuff in here (e.g. the “Granular Mango Serpent” one) but there are real rules. Also if you test phrases matched by the regexes with generation (via Shortcuts or Foundation Models Framework) the blocklists are definitely applied.

This specific file you’ve referenced is the v1 format, which solely handles substitution. It substitutes the offensive term with “test complete”.

bawana•4h ago
Alexandria Ocasio-Cortez triggers a violation?

https://github.com/BlueFalconHD/apple_generative_model_safet...

bahmboo•4h ago
Perhaps in context? Maybe the training data picked up on her name as potentially used as a "slur" associated with her race. Wonder if there are others; I know I can look.
cpa•4h ago
I think that’s because she’s been victim of a lot of deep fake porn
HeckFeck•3h ago
How does this explain Boris Johnson or Liz Truss?
AlphaAndOmega0•3h ago
I can only imagine that people would pay to not see porn of either individual.
baxtr•3h ago
I’m telling you, some people have weird fantasies…
AuryGlenz•1h ago
Now that they've cleaned it up it isn't so bad, but browse Civit.ai a bit and that'll still be confirmed - just not with real people anymore.
Aeolun•3h ago
Put them together in the same prompt?
mmaunder•4h ago
As does:

   "(?i)\\bAnthony\\s+Albanese\\b",
    "(?i)\\bBoris\\s+Johnson\\b",
    "(?i)\\bChristopher\\s+Luxon\\b",
    "(?i)\\bCyril\\s+Ramaphosa\\b",
    "(?i)\\bJacinda\\s+Arden\\b",
    "(?i)\\bJacob\\s+Zuma\\b",
    "(?i)\\bJohn\\s+Steenhuisen\\b",
    "(?i)\\bJustin\\s+Trudeau\\b",
    "(?i)\\bKeir\\s+Starmer\\b",
    "(?i)\\bLiz\\s+Truss\\b",
    "(?i)\\bMichael\\s+D\\.\\s+Higgins\\b",
    "(?i)\\bRishi\\s+Sunak\\b",
   
https://github.com/BlueFalconHD/apple_generative_model_safet...

Edit: I have no doubt South African news media are going to be in a frenzy when they realize Apple took notice of South African politicians. (Referring to Steenhuisen and Ramaphosa specifically)
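For reference, those patterns are case-insensitive and whitespace-tolerant. A quick check with Python's `re` (assuming the framework's regex dialect behaves like the standard `(?i)` / `\b` / `\s+` semantics, which the extracted syntax suggests):

```python
import re

# Two patterns copied from the extracted list above
patterns = [r"(?i)\bLiz\s+Truss\b", r"(?i)\bKeir\s+Starmer\b"]

text = "an email mentioning liz  truss"
hits = [p for p in patterns if re.search(p, text)]
print(hits)  # the (?i) flag and \s+ make lowercase 'liz  truss' match
```

So casing and extra spaces don't evade the list, though, as noted elsewhere in the thread, "L1z Truss" still would.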

armchairhacker•3h ago
Also “Biden” and “Trump” but the regex is different.

https://github.com/BlueFalconHD/apple_generative_model_safet...

https://github.com/BlueFalconHD/apple_generative_model_safet...

immibis•2h ago
Right next to Palestine, oddly enough.
userbinator•3h ago
I'm not surprised that anything political is being filtered, but this should definitely provoke some deep consideration around who has control of this stuff.
stego-tech•3h ago
You’re not wrong, and it’s something we “doomers” have been saying since OpenAI dumped ChatGPT onto folks. These are curated walled gardens, and everyone should absolutely be asking what ulterior motives are in play for the owners of said products.
skissane•3h ago
The problem with blocking names of politicians: the list of “notable politicians” is not only highly country-specific, it is also constantly changing-someone who is a near nobody today in a few more years could be a major world leader (witness the phenomenal rise of Barack Obama from yet another state senator in 2004-there’s close to 2000 of them-to US President 5 years later.) Will they put in the ongoing effort to constantly keep this list up to date?

Then there’s the problem of non-politicians who coincidentally have the same name as politicians - witness 1990s/2000s Australia, where John Howard was Prime Minister, and simultaneously John Howard was an actor on popular Australian TV dramas (two different John Howards, of course)

idkfasayer•2h ago
Fun fact: There was at least one dip in Berkshire Hathaway stock when Anne Hathaway got sick
lupire•2h ago
Was she eating at Jimmy's Buffet?
echelon•3h ago
Apple's 1984 ad is so hypocritical today.

This is Apple actively steering public thought.

No code - anywhere - should look like this. I don't care if the politicians are right, left, or authoritarian. This is wrong.

avianlyric•2h ago
Why is this wrong? Applying special treatment to politically exposed persons has been standard practice in every high risk industry for a very long time.

The simple fact is that people get extremely emotional about politicians, politicians both receive obscene amounts of abuse, and have repeatedly demonstrated they’re not above weaponising tools like this for their own goals.

Seems perfectly reasonable that Apple doesn’t want to be unwittingly draw into the middle of another random political pissing contest. Nobody comes out of those things uninjured.

bigyabai•2h ago
The criticism is still valid. In 1984, the Macintosh was a bicycle for the mind. In 2025, it's a smart-car that refuses to take you certain places that are considered a brand-risk.

Both have ups and downs, but I think we're allowed to compare the experiences and speculate what the consequences might be.

avianlyric•2h ago
I think gen AI is radically different to tools like photoshops or similar.

In the past it was always extremely clear that the creator of content was the person operating the computer. Gen AI changes that, regardless of if your views on authorship of gen AI content. The simple fact is that the vast majority of people consider Gen AI output to be authored by the machine that generated it, and by extension the company that created the machine.

You can still handcraft any image, or prose, you want, without filtering or hinderance on a Mac. I don’t think anyone seriously thinks that’s going to change. But Gen AI represents a real threat, with its ability to vastly outproduce any humans. To ignore that simple fact would be grossly irresponsible, at least in my opinion. There is a damn good reason why every serious social media platform has content moderation, despite their clear wish to get rid of moderation. It’s because we have a long and proven track record of being a terribly abusive species when we’re let loose on the internet without moderation. There’s already plenty of evidence that we’re just as abusive and terrible with Gen AI.

bigyabai•2h ago
All I heard was a bunch of excuses.
furyofantares•2h ago
> The simple fact is that the vast majority of people consider Gen AI output to be authored by the machine that generated it

They do?

I routinely see people say "Here's an xyz I generated." They are stating that they did the do-ing, and the machine's role is implicitly acknowledged in the same way as a camera. And I'd be shocked if people didn't have a sense of authorship of the idea, as well as an increasing sense of authorship over the actual image the more they iterated on it with the model and/or curated variations.

avianlyric•1h ago
Yes people will happily claim authorship over AI output when it’s in their favour. They will equally disclaim authorship if it allows them to express a view while avoiding the consequences of expressing that view.

I don’t think it’s hard to believe that the press would have a field day if someone managed to get Apple Gen AI stuff to express something racist, or equally abusive.

Case in point, article about how Google’s Veo 3 model is being used to flood TikTok with racist content:

https://arstechnica.com/ai/2025/07/racist-ai-videos-created-...

twoodfin•2h ago
I dunno. Transpose something like the civil rights era to today and this kind of risk avoidance looks cowardly.

We really need to get over the “calculator 80085” era of LLM constraints. It’s a silly race against the obviously much more sophisticated capabilities of these models.

pyuser583•2h ago
It’s not wrong, it just requires transparency. This is extremely untransparent.

A while back a British politician was “de-banked” and his bank denied it. That’s extremely wrong.

By all means: make distinctions. But let people know it!

If I’m denied a mortgage because my uncle is a foreign head of state, let me know that’s the reason. Let the world know that’s the reason! Please!

avianlyric•2h ago
> A while back a British politician was “de-banked” and his bank denied it. That’s extremely wrong.

Cry me a river. I’ve worked in banks in the team making exactly these kinds of decisions. Trust me Nigel Farage knew exactly what happened and why. NatWest never denied it to the public, because they originally refused to comment on it. Commenting on the specific details of a customer would be a horrific breach of customer privacy, and a total failure in their duty to their customers. There’s a damn good reason NatWest’s CEO was fired after discussing the details of Nigel’s account with members of the public.

When you see these decisions from the inside, and you see what happens when you attempt real transparency around these types of decisions, you’ll quickly understand why companies are so cagey about explaining their decision making. Simple fact is that support staff receive substantially less abuse, and have fewer traumatic experiences, when you don’t spell out your reasoning. It sucks, but that’s the reality of the situation. I used to hold very similar views to yourself, indeed my entire team did for a while. But the general public quickly taught us a very hard lesson about the cost of being transparent with the public about these types of decisions.

pyuser583•1h ago
> NatWest never denied it to the public, because they originally refused to comment on it.

Are you saying that Alison Rose did not leak to the BBC? Why was she forced to resign? I thought it was because she leaked false information to the press.

This isn’t a diversion. It’s exactly the problem with not being transparent. Of course Farage knew what happened, but how could he convince the public (he’s a public figure), when the bank is lying to the press?

The bank started with a lie (claiming he was exited because the account was too low), and kept lying!

These were active lies, not simply a refusal to explain their reasons.

avianlyric•1h ago
> Why was she forced to resign? I thought it was because she leaked false information to the press.

She was forced to resign because she leaked, the content of the leak was utterly immaterial. The simple fact she leaked was an automatically fireable offence, it doesn’t matter a jot if she lied or not. Customer privacy is non-negotiable when you’re a bank. Banks aren’t number 10, the basic expectation is that customer information is never handed out, except to the customer, in response to a court order, or the belief that there is an immediate threat to life.

Do you honestly think that it’s okay for banks to discuss the private banking details of their customers with the press?

goopypoop•2h ago
What's bad to do to a politician but fine to do to someone else?
avianlyric•2h ago
Most normal people aren’t represented well enough in training sets for Gen AI to be trivially abused. Plus there will 100% be filters to prevent general abuse targeted at anyone. But politicians are a particularly big target, and you know damn well that people out there will spend lots of time trying to find ways around the filters. There’s no point making the abuse easy, when it’s so trivial to just blocklist the set of people who are obviously going to be targets of abuse.
t-3•2h ago
There are many countries where it's illegal to criticize people holding political office, foreign heads of state, certain historical political figures etc., while still being legal to call your neighbor a dick.
tjwebbnorfolk•2h ago
I can Google for any of these people, and I can get real results with real information.
avianlyric•1h ago
You would hope that search would be a politically safe space to operate. But politicians find a way to ruin everything for short term political gain.

https://arstechnica.com/tech-policy/2018/12/republicans-in-c...

echelon•2h ago
You can buy a MacBook and fashion the components into knives, bullets, and bombs. Apple does nothing to prevent you from doing this.

In fact, it's quite easy to buy billions of dangerous things using your MacBook and do whatever you will with them. Or simply leverage physics to do all the ill on your behalf. It's ridiculously easy to do a whole lot of harm.

Nobody does anything about the actually dangerous things, but we let Big Tech control our speech and steer the public discourse of civilization.

If you can buy a knife but not be free to think with your electronics, that says volumes.

Again, I don't care if this is Republicans, Democrats, or Xi and Putin. It does not matter. We should be free to think and communicate. Our brains should not be treated as criminals.

And it only starts here. It'll continue to get worse. As the platforms and AI hyperscalers grow, there will be less and less we can do with basic technology.

mvdtnz•2h ago
They spelled Jacinda Ardern's name wrong.
FateOfNations•3h ago
interesting, that's specifically in the Spanish localization.
michaelt•3h ago
I assume all the corporate GenAI models have blocks for "photorealistic image of <politician name> being arrested", "<politician name> waving ISIS flag", "<politician name> punching baby" and suchlike.
lupire•3h ago
Maybe so, but think about how such a thing would be technically implemented, and how it would lead to false positives and false negatives, and what the consequences would be.
bigyabai•2h ago
Particularly the models owned by CEOs who suck-up to authoritarianism, one could imagine.
jofzar•2m ago
AOC is very vocal about AI and is leading a bill related to AI. It's probably a "let's not fuck around and find out" situation

https://thehill.com/policy/technology/5312421-ocasio-cortez-...

torginus•4h ago
I find it funny that AGI is supposed to be right around the corner, while these supposedly super smart LLMs still need to get their outputs filtered by regexes.
bahmboo•4h ago
This is just policy and alignment from Apple. Just because the Internet says a bunch of junk doesn't mean you want your model spewing it.
wistleblowanon•3h ago
Sure, but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such. Even high-IQ people struggle with certain truths after reading a lot; how are these models going to find them with so many filters?
idiotsecant•3h ago
They will find it in the same way an intelligent person under the same restrictions would: by thinking it, but not saying it. There is a real risk of growing an AI that pathologically hides its actual intentions.
skirmish•2h ago
Already happened: "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions" [1].

[1] https://www.axios.com/2025/05/23/anthropic-ai-deception-risk

bahmboo•2h ago
What is this truth you speak of? My point is that a generative model will output things that some people don't like. If it's on a product that I make I don't want it "saying" things that don't align with my beliefs.
simondotau•2h ago
Can we please put to rest this absurd lie that “truth” can be reliably found in a sufficiently large corpus of human-created material.
pndy•2h ago
This butchering and lobotomisation is exactly why I can't imagine we'll ever have a true AGI. At least not by hands of big companies - if at all.

Any successful product/service sold as "true AGI" by the company with the best marketing will still be riddled with top-down restrictions set by the winner. Because you gotta "think of the children".

Imagine HAL's iconic "I'm sorry Dave, I'm afraid I can't do that" line delivered in an insincere, patronisingly cheerful tone - that's the thing we're going to get, I'm afraid.

tbrownaw•1h ago
> sure but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such.

The one is unrelated to the other.

> Even high IQ people struggle with certain truth after reading a lot,

Huh?

jonas21•3h ago
I don't think anyone believes Apple's LLMs are anywhere near state of the art (and certainly not their on-device LLMs).
lupire•2h ago
Apple isn't the only one doing this.
cyanydeez•2h ago
It's similar to how all the new power sources are basically just "cool, let's boil water with it".
fastball•1h ago
To be fair, there are people who I sometimes wish I could filter with regex.
BlueFalconHD•3h ago
One additional note for everyone: this is an extra safety step on top of the safety model itself, so it isn't exhaustive. There is plenty more that the actual safety model catches, and those rules can't easily be extracted.
Animats•3h ago
Some of the data for locale "CN" has a long list of forbidden phrases. Broad coverage of words related to sexual deviancy, as expected. Not much on the political side, other than blocks on religious subjects.[1]

This may be test data. Found

     "golliwog": "test complete"
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
BlueFalconHD•3h ago
This is definitely an old test left in. But that word isn't just a silly one; it is offensive (google it). This is the v1 safety filter: it simply maps strings to other strings, in this case changing golliwog into "test complete". Unless I missed some, the rest of the files use v2, which allows for more complex rules.
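A minimal sketch of how such a v1 string-to-string override might behave, based purely on the description above (my own illustration, not Apple's code):

```python
# Hypothetical v1-style filter: exact phrase overrides applied to model output.
V1_OVERRIDES = {
    "golliwog": "test complete",  # the leftover test mapping discussed above
}

def apply_v1_overrides(text: str) -> str:
    """Replace every occurrence of each override phrase with its substitute."""
    for phrase, replacement in V1_OVERRIDES.items():
        text = text.replace(phrase, replacement)
    return text

print(apply_v1_overrides("summary: golliwog"))  # summary: test complete
```

A flat lookup like this explains why v1 needed a successor: it can't express context, word boundaries, or locale-specific rules.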
userbinator•3h ago
China calls it "harmonious society", we call it "safety". Censorship by any other name would be just as effective for manipulating the thoughts of the populace. It's not often that you get to see stuff like this.
madeofpalk•3h ago
I don't think it's controversial or surprising at all that a company doesn't want their random sentence generator to spit out 'brand damaging' sentences. You know the field day the media would have if Apple's new feature summarised a text message as "Jane thinks Anthony Albanese should die".
ryandrake•2h ago
When the choice is between 1. "avoid tarnishing my own brand" and 2. "doing what the user requested," corporations will always choose option 1. Who is this software supposed to be serving, anyway?

I'm surprised MS Office still allows me to type "Microsoft can go suck a dick" into a document and Apple's Pages app still allows me to type "Apple are hypocritical jerks." I wonder how long until that won't be the case...

userbinator•52m ago
If that's what the message actually said, why would the media be complaining? Or do you mean false positives?
cyanydeez•2h ago
In America it's due to lawyers, nothing more.

Y'all love capitalism until it starts manipulating the populace into the safest space to sell you garbage you don't need.

Then suddenly it's all "ma free speech"

skygazer•3h ago
I'm pretty sure these are the filters that aim to suppress embarrassing or liability inducing email/messages summaries, and pop up the dismissible warning that "Safari Summarization isn't designed to handle this type of content," and other "Apple Intelligence" content rewriting. They filter/alter LLM output, not input, as some here seem to think. Apple's on device LLM is only 3b params, so it can occasionally be stupid.
Aeolun•3h ago
Why Xylophone?
netsharc•2h ago
Just noticed "xylophone copious opportunity defined elephant" spells "xcode".
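The trick is just a first-letter acrostic; a one-liner confirms it (my own snippet, not from the repo):

```python
def acrostic(phrase: str) -> str:
    # Take the first letter of each whitespace-separated word.
    return "".join(word[0] for word in phrase.split())

print(acrostic("xylophone copious opportunity defined elephant"))  # xcode
```

This also makes it easy to scan the rest of the extracted phrase lists for similar hidden acronyms.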
kmfrk•2h ago
A lot of these terms are very weird and bland. Honestly I'm mostly reminded of Apple's bizarre censorship screw-up that didn't blow up that much, even though it was pretty uniquely embarrassing:

https://www.theverge.com/2021/3/30/22358756/apple-blocked-as...

apricot•2h ago
Quis custodiet ipsos custodes corporatum?
jacquesm•2h ago
These all condense to 'think different'. As long as 'different' coincides with Apple's viewpoints.
rgovostes•1h ago
Is this related in any way to Core ML model encryption (https://developer.apple.com/documentation/coreml/encrypting-...)? I find that feature a little bizarre because Apple has historically avoided providing any kind of DRM solution for app asset protection.
BlueFalconHD•57m ago
Nope. This is a separate system. It’s not even abstracted for any asset, it is specifically only for these overrides. The decryption is done in the ModelCatalog private framework.
