I can't immediately quantify this phenomenon, but it feels to me that they tend to be more noun-rich, with a preference for longer, academic terms, rather than making heavy use of conjugations and strings of idiomatic expressions with a tempo.
When models suggest edits, they’re not offering insight — they’re offering what’s safest, most average, most familiar to the dominant culture. And that’s often Western, white, male-coded language that reads as “neutral” because it’s historically overrepresented in training data and platform norms.
This isn’t just about grammar or clarity. It’s about whose voice gets flattened and whose story gets smoothed out until it sounds like a TED Talk.
We should stop thinking of AI as neutral by default. The bias isn’t a bug — it’s baked into the system of reinforcement learning and feedback loops that reward comfort over challenge, safety over truth, sameness over difference.
Anyone here doing work to counteract this? How do you keep LLMs from deradicalizing or deracializing your writing?
I just want to make sure others agree and it wasn’t just me (or perhaps non-Americans in general)—it was blindingly obvious this would be, must be, the case, right? That although this might be the first formal study of it, there would have been literally no doubts as to what the outcome of such a study might be? That at least some degree of language homogenisation will be quite inescapable if you do LLMs the way we have?
On the cultural aspects, it’s well-documented and -understood what effects US TV and movies have had on other countries. There really isn’t anything new about LLMs or AI here, it’s just standard globalisation effects.
(I also just now learned what a crazy term “Global South” is <https://en.m.wikipedia.org/wiki/Global_North_and_Global_Sout...>, and how it does not mean at all what I thought it meant or what any sane person would expect. Was it not enough that “Western” bears no strong correlation to geography, that we need more terms that utterly abuse geographical references when they’re actually about socioeconomic characteristics? Apparently I have moved south by migrating from Australia to India.)
If that had continued, combined with where actual users are, perhaps it would have broadened English instead?
Yeah, we understand "south" to mean "closer to the equator". (That's kind of how it works in the popular imagination. E.g., southern Brazil is more "nordic" than northern Brazil.)
It absolutely does not. South means, well, South.
> E.g., southern Brazil is more "nordic" than northern Brazil.
How so? Southern Brazil is clearly closer to the South Pole than northern Brazil.
Argentina is 97% of European descent, whereas non-white countries like Japan and South Korea are in the Global North.
It isn't about race/skin color. "Third World" was perfectly suitable until the Soviet Bloc (the Second World) collapsed, with most of its components going either into the First/Western World or into the Third World. That left the world partitioned into just two large parts, with Russia+Belarus as a distant, small third, so small that it is easier to count them into the First/Western-style developed world, especially considering that they both moved to capitalism, losing the major trait of the Second World: socialism. (Though 30+ years later, Russia does drift more and more toward the former Third World, i.e. the Global South.)
My point wasn't about the grouping itself though, just the term global south effectively being "worse" than the previous terms. For the northern hemisphere, where most people live, south = darker skin.
Use third world country if you want, but at least in my mind, the terms "first/second/third world" are more tied to which hegemony you fall under, where the first world is the US hegemony, and the third world is without a hegemon. "First world" is kinda synonymous with "the west". To me, the term "global south" communicates that it's being contrasted against all of "the north", both east and west, while using the term "third world" communicates that the context makes a distinction between "the west" ("first world") and the old eastern bloc ("second world") somehow relevant. Some "second world" countries such as China are also part of the "global south".
I'm sure there are people who use the term "global south" simply because they perceive it to be less judgemental somehow, but I might've used it in this context because "third world" communicates something different.
Honestly though, I would've probably just used the term "non-western" (maybe "non-first-world"?), since that's the distinction the article actually draws. Eastern Europe is also affected here, after all. Or maybe I would've drawn the distinction at "US" and "non-US", since Europeans don't necessarily want their writing to sound American either and the old US hegemony seems to be on its last legs.
Anything AI writes is dull anyway; it writes stuff that nobody wants to read beyond extracting some information. If you are learning English, you may pick up something from it, though.
Also, I recall something about AI English actually being Nigerian English, because those companies hired a lot of Nigerian annotators during training.
There are lots of blindingly obvious qualitative statements where the quantitative part is far from obvious. That makes them a good starting point for research.
This is like writing about Newton's theory of gravity as "scientist finds out apples fall downwards. Wasn't that already obvious?".
Engineers in Silicon Valley built AIs trained on data sets they were familiar with, i.e. from that part of the Internet they themselves interact with.
Nothing is preventing Indian AI researchers from training the AIs they develop on Indian content, to have something more reflective of Indian users' demands.
Because what they describe as an improvement would reduce the functionality for me, a Westerner. I don't care about the name of a Bollywood actress, but I know who Shaquille O'Neal or Scarlett Johansson are.
On the upside, there would be very little Western bias in an AI trained exclusively on Hindi or Tamil content, quite the opposite.
Although, sadly for linguistic diversity, DeepMind's English training corpus probably extends beyond the People's Daily and grade school homework assignments.
Please explain that one to me, because every time I've heard it used it seems to amount to "I have a question," which to me is confusing.
We used to have "doubt-solving sessions" in coaching centres. Every time one of the students asked "Sir, I have a doubt", I would snigger inwardly, as if the student were insinuating something sinister or nefarious about the instructor's character. I always found it hilarious.
But that's just how English is used in India.
This one is problematic when used with non-Indians.
When an Indian says they have a doubt, they mean “I have a question and seek clarification on one point”. Someone not familiar with this Indian English idiosyncrasy will instead interpret it as “I’m not convinced that what you’re saying is true”, potentially even casting aspersions on your integrity. The question that follows will normally clear things up enough that it’s not disastrous, but it will still tend to leave a bad taste in the hearer’s mouth. It took me quite some time to really get used to it.
I imagine Spanish speakers will have no problem either.
When I, a Pole, conversed with a German in English, we would say 5pm instead of 17:00. Even though that's not the natural way of telling time in either of our native languages, it's the custom of the nation whose language we were using.
I don't do it unless the other party uses 24-hour time, though. I wish everyone did, because people often omit AM/PM and then you have to ask for clarification, whereas "eighteen-hundred" is always obviously 6PM and can't be confused with 6AM.
Beyond a certain point, there is simply no return on investment to sounding more and more like your target audience. You need some basics, but beyond that, if you are valued by your target audience, they will listen, take you seriously, accept your legitimacy as a fellow person whose wisdom is worth knowing even if the content or style diverges from the schema.
If they cannot accept your offer in good faith, then I would disengage rather than push past that.
Let's press pause and note the deliberate decision to use AI to help you with your "favorite food & holidays".
A better remedy to the problem invented by this article is to recommend against using any AI for writing about your personal experiences, values, and preferences.
As an Australian man, my favourite food is [shrimp on the barbie, mate] Thai noodles. You see? If my AI model were drenched in Australiana, it would come up with stereotypes I don't subscribe to.
AI slop reads unnaturally even in English, due to its lack of variance. And it leaks heavily into all other languages, even Ancient Greek.
Smaller languages have suffered from the dominance of English since long before AI. Most of the content on Reddit, X, or really any internet platform is in English. All new tech is, at least initially, English-only. The English language, and the culture of those who produce English-language content, dominates the world now, especially when it comes to commercial culture. With government grants etc., smaller languages can be propped up to some degree, but how about creating a massive blockbuster movie in Estonian? Forget about it.
> Most of the content on Reddit, X, or really any internet platform is in English. All new tech is, at least initially, English-only.
That's content advertised into your timeline, not content in general. Twitter had been only about 35% English; Bluesky is 30% Brazilian or something. Only Reddit is actually >80% English, because those other languages have other dominant platforms.
You don't see stats like "xyz is 99% English" because every Chinese guy speaks unaccented American English; you see them because WWW statistics are based on link and reference counts rather than on wget-ing random IPs, and they start from an English URL, so discovery ends where the anglosphere ends.
It's not like Chinese content actually occupies >85% of everything, just that English is not at 99.999% either. "American English won the great game, Earth is 99.999% English" is just a collective hallucination.
I'm using NewPipe instead of the official UI, so I know for a fact this stuff isn't being advertised at me. I pick my own feeds, and all the best content is English-based.
[0]: > Put differently, English-language videos comprised just 17% of the videos that were published by popular channels during the week, but they received 28% of all of the ...
or [1]: > Of the top 250 YouTube channels, 66% of the content is in English, 15% in Spanish, 7% in Portuguese, ...
There's quite a distance between "all the best content in STEM fields and on YouTube is in English" and "all the content is in English", and a lot of qualifiers get in there.

0: https://www.pewresearch.org/internet/2019/07/25/popular-yout...
1: https://en.wikipedia.org/wiki/Languages_used_on_the_Internet
At an individual level people have always been doing it, now with automation it's not surprising that a study finds it happening collectively. That's why I don't see much good in these tools. They strip writing of personality, subjectivity, unique perspective, and they just seem to diminish the capacity of people to use their own minds.
Why didn't the study use something like Grammarly, which has awareness of American English, British English, Canadian English, Australian English, and Indian English?
I should clarify that I get the point. It's still useful to study how an American-English-biased model affects writers of a different dialect, but seeing what it does when it can switch dialects would be far more useful, and it would still convey the same point: that models specialized to one dialect will affect writing outside that dialect.
Offer more than just 1 master version that everyone must share.
Improve training processes so that synthetic/curated data from just one culture doesn't overwhelm regional expression and reasoning (a toy sketch of one approach follows this list).
Hire annotators and data entry services across a whole multitude of countries that cover a varied array of cultures, styles, languages, etc.
At least the things above should counteract the effects somewhat.
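To make the second point concrete, here is a toy sketch of reweighting a training mix so one locale's raw volume doesn't drown out the others. The corpus names, sizes, and target shares below are all invented for illustration; real data curation is far messier:

    # Toy sketch: per-document sampling weights so each locale ends up at a
    # chosen share of the training mix, instead of being swamped by raw volume.
    # Corpus sizes and target shares are invented for illustration.
    corpus_sizes = {"en-US": 9_000_000, "en-IN": 500_000, "en-NG": 250_000, "en-AU": 250_000}
    target_share = {"en-US": 0.40, "en-IN": 0.25, "en-NG": 0.20, "en-AU": 0.15}

    # If each document from locale L is sampled with weight proportional to
    # target_share[L] / corpus_sizes[L], the expected mix matches target_share.
    doc_weight = {loc: target_share[loc] / corpus_sizes[loc] for loc in corpus_sizes}

    for loc, w in doc_weight.items():
        print(f"{loc}: relative weight per document = {w:.2e}, "
              f"resulting share = {w * corpus_sizes[loc]:.0%}")

Whether a 40/25/20/15 split is the right target is of course the hard, political part; the arithmetic is the easy part.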
"For example, when participants were asked to write about their favorite food or holiday, AI consistently suggested American favorites, pizza and Christmas, respectively."
I'm confident that a system prompt saying "the user lives in India and writes about Indian culture" would prevent the above problem from occurring. Whether it would actually output useful cultural suggestions is another, more interesting question, but I very much doubt Christmas and pizza would show up by default.
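A minimal sketch of what I mean, using the OpenAI Python client (the model name and exact prompt wording are placeholders I picked, not anything from the study):

    # Quick test: does a locale-setting system prompt change the defaults?
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you prefer
        messages=[
            {"role": "system",
             "content": "The user lives in India and writes about Indian culture."},
            {"role": "user",
             "content": "Help me write a short piece about my favorite food and holiday."},
        ],
    )
    print(response.choices[0].message.content)

The interesting test is whether the suggestions that come back are actually useful, or just a different set of stereotypes.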