I've been pondering … a theory that keeps nagging at me since last year. I spoke about this idea with a few fellow writers, but only decided to put it down in writing this week.
Is British English (or Traditional Chinese, or other varieties) slowly, almost imperceptibly, fading into the background, overshadowed by the ever-growing dominance of American English (or Simplified Chinese) in the digital age?
And is GenAI accelerating this linguistic shift?
As someone both Taiwanese and British, I feel this potential loss keenly.
Since I was young, I’ve witnessed firsthand how minority dialects struggle.
Take the word for fish: ‘魚’ in Traditional Chinese is full of detail: fins, tail, the works. In Simplified, it becomes ‘鱼,’ a couple of strokes, and you’re done. You save time, but you lose the old shape, the little story in the character.
Or take ‘dragon’: ‘龍’ is a showstopper, full of loops and twists. ‘龙’ is almost abstract, as if the dragon went on a diet.
Simplified is more efficient and much easier for non-native speakers to learn, no doubt.
And the struggle feels futile. The big languages (with bigger populations) always win; they overwhelm everything else.
First it was the news, then social media… and now GenAI.
Will British English, with all its quirks, centuries-old idioms, and the weight of Shakespeare, Churchill, and punk rock, slowly fade out under the tidal wave of AI? In a generation, do you think you’d see z or s, ou or o in the spelling?
Given that a third of you reading this article are American, you might be wondering, “Why should I care? This is great news for me! After all, how convenient is it if AI systems default to my linguistic norms?”
Yes, but… It’s more than this. So bear with me.
For the rest of my readers, when languages get steamrolled, you lose the inside jokes, the attitude, the memories, the stuff that makes a culture tick.
Welcome to Cyberpunk 2025.
TL;DR
93% of AI training data is English, according to Johns Hopkins University and OpenAI.
According to BAIR (UC Berkeley), British spelling is losing ground fast. AI models favor American spellings by as much as 43%.
These models stereotype, miss the nuance, and even get condescending with British or other non-American English.
As the UK’s influence shrinks (not that Great Britain), there’s a real risk of British English getting steamrolled by AI’s power dynamics.
The fix? Is there a fix? In theory, more diverse data, dialect-aware models, and actually including dialect speakers in AI development.
Shall we?
Where The English Dominance in AI Begins
Academic papers increasingly highlight a concerning trend… AI models predominantly churn out content in American English.
Yes, an AI is much more likely to spit out “color” rather than “colour”, “analyze” rather than “analyse”, subway rather than tube; and then you fall down the rabbit hole of “football” meaning completely different sports, and “chips” that come in a bag or with fish…
As smart as you are, you can start to see that this is about much more than spelling. An AI that skews heavily toward American English has profound implications for British society and culture.
The source of the problem lies in the training data.
When OpenAI released GPT-3 in 2020, it represented a breakthrough in AI capabilities.
The researchers acknowledged this language imbalance:
GPT-3’s training data is still primarily English (93% by word count); it also includes 7% of text in other languages
When building these AI models, English data also received extra weight because it was considered “higher quality.”
So English sources were sampled more often during training, even if that meant the model would be a little too focused on English. The AI got really good at English, which made English even more dominant in the model’s knowledge and pushed other languages to the sidelines.
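To see what that up-weighting does, here is a toy calculation. The 93%/7% split comes from the GPT-3 figure quoted above; the 1.5× quality weight for English is purely my assumption to illustrate the mechanism, not a number from OpenAI.

```python
# Toy illustration of quality up-weighting during training data sampling.
# Corpus shares are GPT-3's reported language mix; the quality weights
# are assumed values for illustration only.
corpus_share = {"English": 0.93, "Other languages": 0.07}
quality_weight = {"English": 1.5, "Other languages": 1.0}  # assumption

sampling_weight = {lang: corpus_share[lang] * quality_weight[lang]
                   for lang in corpus_share}
total = sum(sampling_weight.values())
sampling_prob = {lang: w / total for lang, w in sampling_weight.items()}

for lang, p in sampling_prob.items():
    print(f"{lang}: {p:.1%} of sampled training text")
```

Even a modest 1.5× weight pushes English from 93% of the corpus to roughly 95% of what the model actually sees, and the gap compounds over training.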
Not All English Is Created Equal
You think you’re safe because you speak English… But here’s a question:
Do you speak Standard American English?
It’s not just “English”—it’s standard American English that rules.
Every variety of English—Jamaican, Scottish, African American, Singlish—has its own history, flavor, and rules. And ChatGPT, like most LLMs, doesn’t just miss those nuances. It flattens them.
A study from UC Berkeley analyzed linguistic bias in ChatGPT.
They evaluated how ChatGPT responded across dialects using several criteria, such as comprehension, politeness, stereotyping, etc. They compared standard dialects (like American English) vs. non‑standard ones (e.g., Jamaican, African American English, Scottish).
What they found is that non-standard dialects consistently fared worse across almost all metrics.
The study includes a table showing how ChatGPT handles orthography (i.e., spelling) for Standard British English (SBE) compared to American English. The numbers are percentages of spelling types used, and “Change” shows how much the output differs from the input.
Suppose you (a British English speaker) send ChatGPT this input:
“I live in a flat and organise colour-themed parties every weekend.”
The table shows that, even when your input is full of British spelling, ChatGPT increases its use of American spelling by 29%, i.e., you’d see “organise” → “organize”, “colour” → “color”, and “flat” → “apartment”.
It ends up reducing British spelling by 40% in its reply.
Or if you write to ChatGPT in Singaporean English:
“Can help me lah? My lift spoil again, very jialat.”
The table shows that distinctive Singaporean words and grammar—like “lah,” “lift” (for elevator), “spoil” (for broken), and “jialat” (trouble)—don’t survive. ChatGPT would rewrite your sentence into something more American and standard, for example:
“I’m sorry to hear your elevator is broken again. That sounds really frustrating. Would you like some advice on what to do next?”
So, why does this happen?
The AI saw far more Standard American English in its data, so it learned to recognize and reproduce it much better. And the pattern holds all the way down: the smaller or less-documented a variety, the more it gets crushed.
And even when ChatGPT does retain something from a minoritized variety, it’s almost always a one-off vocabulary word, not the grammar, not the rhythm, not the structure. So you might see a Singaporean word like “lah” sneak through, but the sentence still marches to the beat of American grammar.
They also found that
… model responses to non-“standard” varieties consistently exhibit a range of issues: stereotyping (19% worse than for “standard” varieties), demeaning content (25% worse), lack of comprehension (9% worse), and condescending responses (15% worse).
When LLMs default to standard American English, it quietly reinforces the idea that every other dialect is “wrong” or needs fixing.
Even with the more advanced GPT-4 model, the problems persist, if not worsen.
GPT-4 responses imitating the input tend to improve on imitative GPT-3.5 responses in terms of comprehension, warmth, and friendliness. However, GPT-4 responses exhibit even higher levels of stereotyping (+18%).
Then we can run an interesting thought experiment: measuring how fast British English could get steamrolled by AI.
Here’s what the research shows: even when you feed ChatGPT lots of British spelling, it doesn’t just copy you, it “corrects” you. The output ramps up American spelling by 29%, and overall, British spelling gets slashed by 40% in the reply.
So you start a conversation using 100% British English.
The AI responds, but now your language is already diluted, down 40% on the very first exchange. If you keep going, responding to the AI and letting its replies nudge your phrasing, the drift toward American English just accelerates.
Before long, most of your original British-isms are gone. Unless you’re painfully vigilant, correcting every reply, every word choice, every bit of spelling, the AI will flatten your language to whatever it’s most comfortable with: Standard American.
It’d become a psychological feedback loop for everyone. Eventually, it’s not just your spelling that’s changed; it’s your entire way of expressing yourself.
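The feedback loop above can be sketched as a toy model. This is my own extrapolation, not a result from the study: it treats the paper’s observed 40% per-reply drop in British spellings as a fixed decay rate and assumes you fully absorb the AI’s phrasing between exchanges.

```python
# Toy model of the feedback loop: 40% of British forms are lost on each
# exchange (the study's single-reply figure, applied repeatedly as an
# assumed constant decay rate).
retention = 0.60  # 60% of British forms survive each exchange
share = 1.0       # exchange 0: a conversation that is 100% British English

for exchange in range(1, 6):
    share *= retention
    print(f"After exchange {exchange}: {share:.0%} of British forms remain")
```

Under this (admittedly crude) assumption, fewer than one in ten of your original British forms survives five exchanges.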
This will be particularly evident among younger generations.
They are the highest adopters of AI among all age groups, and with every interaction, the AI subtly changes another piece of their original language (remember the research on how AI is more persuasive than humans?). If they spend their days chatting with AI-powered apps, writing emails with AI, or using AI to “polish” their documents, it won’t take decades for their language to shift into something very different from yours.
Then we can expect a generation raised on chatbots and LLMs that slowly, unconsciously, abandons their quirks and adopts an algorithmic, Americanized English.
Of course, the world won’t switch overnight.
But the drift is real; it was already happening with social media, and AI just makes it more profound and amplifies it.
How AI Reinforces Bias
LLMs are trained mostly on internet data. Around 60% of everything online is in English, even though native English speakers make up less than 5% of the world’s population.
To make it even more lopsided, most of that English is Standard American English, because the people who built the computers, invented the internet, and shaped the early days of social media and AI were mainly American.
Another study from the same research group found Standard Language Ideology in AI.
When you interact with ChatGPT using a non-standard variety of English, the system does several problematic things:
It "corrects" your language toward Standard American English norms
It provides lower-quality responses to non-standard varieties
It sometimes stereotypes or shows less comprehension of what you're saying
The researchers further identified four key ways that AI language bias harms people who don't speak Standard American English:
1. Lower Quality Service for Minoritized Varieties
People who use varieties like African American English, Irish English, or Indian English often get worse responses from AI systems. The AI might misunderstand their input or provide less helpful answers, creating what researchers call “allocational harm”, i.e., unequal access to the benefits of the technology.
This forces people into “digital code switching”: changing how they naturally speak to get better service from AI tools, which can be psychologically exhausting.
2. Stereotyping When AI Tries to Imitate
When AI systems attempt to produce text in non-standard varieties, they often create stereotyped, inaccurate versions that don't reflect how people actually speak. These outputs can perpetuate harmful stereotypes about communities that use these language varieties.
3. Cultural Manipulation
AI tools that can generate text in minoritized language varieties raise concerns about appropriation, particularly when non-native speakers use these tools to mimic varieties like African American English. The researchers compare this to "linguistic minstrelsy" or "figurative blackface".
4. Erasure Through Exclusion
If AI systems are programmed to never produce non-standard varieties to avoid appropriation issues, this creates its own problem: it erases these language varieties from digital spaces and reinforces the hierarchy that says only "standard" language is appropriate.
Maybe it’s a Londoner locked out of their bank account because the AI-powered phone system can’t make sense of “alright mate, I need to sort out me card.” Or an Indian student getting weirdly formal, robotic responses because the chatbot never saw English spoken with an accent.
It can be frustrating and problematic when more and more automated systems stand between your problem and its potential solution (human support).
Fairness? No Easy Answers.
You’re probably expecting me to wave my flag and say how unfair this is, and the world should be more diverse.
Though I deem the trend irreversible, maybe I’m wrong. Perhaps I’m just too pessimistic or realistic.
I was on a panel hosted by the University of Greenwich the other week.
An audience member asked a question that keeps me wondering:
Do you prefer non-native English speakers to use their own (imperfect) English for an application, or to polish it up with ChatGPT (so that it sounds very AI)?
There’s a tension here between authenticity and legibility, between “show your voice” and “play the system.”
If we’re honest, the incentives still push people, especially those whose first language isn’t English, toward bland, generic, AI-polished writing.
But who does that really serve?
Regardless, none of this is some out-of-nowhere injustice.
The internet began as a U.S. military-backed project in American universities. Computers, protocols, the whole “dot-com” era: the digital world was set to the rhythm of American keyboards from day one.
Then, social media.
If you wanted to go global, English was the ticket. The quirkiest memes from, say, Korea or Brazil could blow up locally, but only break out worldwide if someone subtitled them, remixed them, or made them meme-able in English.
Given the history, I guess it’s no surprise that AI, built on decades of internet data, speaks the language of its architects.
These foundations baked English, mostly American English, into the code itself. “Training data” is just a polite way of saying: whatever got posted, shared, and liked the most.
The researchers acknowledge that there's no simple solution to this problem.
They see this dilemma coming:
If AI always responds in Standard American English, it reinforces language hierarchies
If AI tries to mimic users' language varieties, it risks stereotyping and appropriation
If AI refuses to produce any non-standard varieties, it erases these languages from digital spaces
AI didn’t invent this imbalance. It’s just the latest and most powerful amplifier for the old pattern.
The Real Cost of Homogenized English?
There isn’t a “decolonize your chatbot” button.
When AI defaults to Standard American, it flattens everything. If you try to get the AI to imitate everyone’s dialect, you risk stereotyping and accidental comedy (and not the good kind).
So, is there still a point in trying to keep our languages alive, or do we just accept that global English will end up sounding suspiciously like a TED talk or a boring, SEO-optimized sales pitch?
Or maybe that’s just how languages evolve? Brutal, unfair, and sometimes, driven by the loudest algorithm in the room. Maybe in a few generations’ time, there will only be a few dominating languages left, such as American English and simplified Chinese, and no more dialects.
There are so many questions with no answers…
Will dad jokes stop working? This is serious! For example… what’s going to happen to the biscuit who went to the doctor because it was feeling crumbly?!
Are we raising a generation that won’t understand their own grandparents?
Maybe it is a good thing that I don’t have the next generation… I guess ;)
What is heritage, then? Can you truly preserve “heritage” if no one uses it?
Let’s say we create a “heritage” GPT that can speak with a proper Royal Family accent—think King’s English, the sort you only hear at state banquets or in Christmas Day speeches. Do people even care? Or will this just become museum language?
Is it even possible to make “dialect-aware” AI at scale? Or is that a fantasy?
What would it really take to build an AI that can “mind the gap” between a Yorkshire rant and a Brooklyn flex?
The researchers suggested that developing 'dialect-aware' models, which can better understand and respond to different linguistic nuances, is another option. However, I'm not sure how viable this option is, given that the world of AI is expensive and moves at a lightning-fast pace.
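To make the gap concrete, here is a deliberately naive sketch in the opposite direction of what ChatGPT does: a post-processing pass that restores a handful of British spellings in model output. The word list and the `britishise` helper are my own illustration, not anything the researchers propose; a genuinely dialect-aware model would have to handle grammar, idiom, and rhythm, not just a lookup table.

```python
import re

# Toy American-to-British post-processor. A real "dialect-aware" model
# would need far more than word substitution -- this only shows why the
# problem looks deceptively easy at the spelling level.
AMERICAN_TO_BRITISH = {
    "color": "colour",
    "organize": "organise",
    "analyze": "analyse",
    "apartment": "flat",
}

def britishise(text: str) -> str:
    """Replace whole-word American spellings with British equivalents."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = AMERICAN_TO_BRITISH[word.lower()]
        # Preserve a leading capital, e.g. "Color" -> "Colour".
        return replacement.capitalize() if word[0].isupper() else replacement

    pattern = r"\b(" + "|".join(AMERICAN_TO_BRITISH) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

print(britishise("I organize color-themed parties at my apartment."))
```

Spelling is the trivial part; everything the Berkeley study measured (comprehension, stereotyping, condescension) lives in the parts a lookup table can’t touch.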
“Brave New World”
For me, as someone both Taiwanese and British, this topic is deeply personal.
I’ve seen how the dominance of larger languages (e.g., Simplified Chinese) can marginalize smaller ones, and how cultural nuances can slowly be ignored and disappear. The thought of British English or Traditional Chinese, with their rich history, diverse accents, and unique expressions, being homogenized by AI is a painful one.
All this makes me think of Aldous Huxley’s Brave New World.
If you haven’t read it, imagine a future where technology engineers not just happiness but sameness, all in the name of absolute efficiency…
The New World speaks in a clipped, hyper-rational language—no messy metaphors, no soaring soliloquies, just the bare minimum to get the job done. Every expression is sanitized, every word carefully selected for clarity, blandness, and “utility.” There’s no room for ambiguity, no patience for double meanings or cultural quirks. The idea is: keep the peace, avoid confusion, and above all, don’t disrupt the perfect order.
There’s this gut-punch moment in the book when the “Savage”, a guy called John who grew up outside this engineered society, tries to share Shakespeare’s wild, emotional language with the citizens of the New World.
The people in the New World look at John the Savage like he’s speaking gibberish. Shakespeare’s words, full of chaos, love, rage, and beauty, simply don’t compute in a world built on efficiency and predictability.
Near the end of the novel, John the Savage and the World Controller have a debate about the costs of the New World’s engineered happiness.
Here’s the World Controller explaining that all the “old things” were given up to protect people from suffering:
Civilization has absolutely no need of nobility or heroism. These things are symptoms of political inefficiency. In a properly organized society like ours, nobody has any opportunities for being noble or heroic. Conditions have got to be thoroughly unstable before the occasion can arise. Where there are wars, where there are divided allegiances, where there are temptations to be resisted, objects of love to be fought for or defended—there, obviously, nobility and heroism have some sense…
He goes on to say:
… You can’t make tragedies without social instability. The world’s stable now. People are happy; they get what they want, and they never want what they can’t get. They’re well off; they’re safe; they’re never ill; they’re not afraid of death; they’re blissfully ignorant of passion and old age; …
Then John the Savage protests:
But I don’t want comfort.
I want God, I want poetry, I want real danger, I want freedom, I want goodness.
I want sin.