As in, the way languages are structured is different. Some are more precise, some are less, the information density per syllable is different, etc.
So besides pure performance differences due to training data, I'm curious whether there's some fundamental difference in the way LLMs interact with data in different languages, even if the end information is the same. Because even just in English, phrasing things slightly differently can yield different results.
Edit: it would be interesting to see the "thinking" of the model done in different languages. Is an identical problem thought about in more or less the same way, or does the agent go down a different train of thought depending on the language it is thinking in?
Using similar words should land you in similar places in the latent space, even if the actual words or their order are slightly different. Where it gets interesting is how well English words map to their counterparts in other languages, and what practical differences that makes. From various studies, it seems that the gravitational pull of English-language/culture training data is substantial, but an LLM can switch cultures and values when prompted in different languages.
curioussquirrel•1h ago
Some notable insights:
- GPT-5 is strong at text normalization and translation but regressed on content generation vs GPT-4o. Chinese outputs had spacing/punctuation issues, Polish read like "translationese" even with no source text.
- Gemini 2.5 Pro scored 4.56/5 on Kinyarwanda. In our first study (late 2024), no model could produce coherent text in that language.
- Top LLMs outscored humans working under realistic constraints (time-limited, single pass, no QA). Humans didn't rank 1st in any language. (We're now planning a follow-up to zoom in on that.)
- Tokenizer efficiency matters again: reasoning models burn 5-10x more tokens on thinking. Claude Sonnet 4.5 encodes Tamil at 1.19 chars/token vs Gemini's 4.24, roughly a 3.5x cost difference for the same output. There has been a lot of talk about the Opus 4.7 tokenizer; this is the same issue, just in a multilingual setting.
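To make the tokenizer-efficiency point concrete, here is a back-of-the-envelope sketch using the Tamil chars-per-token figures quoted above. The document length and the assumption of identical per-token pricing are illustrative, not taken from the study:

```python
def tokens_for(chars: int, chars_per_token: float) -> int:
    """Approximate token count for a text of `chars` characters."""
    return round(chars / chars_per_token)

TAMIL_CHARS = 10_000  # hypothetical document length in characters

# Chars/token figures from the comment above
sonnet_tokens = tokens_for(TAMIL_CHARS, 1.19)  # Claude Sonnet 4.5
gemini_tokens = tokens_for(TAMIL_CHARS, 4.24)  # Gemini

ratio = sonnet_tokens / gemini_tokens
print(f"Sonnet: {sonnet_tokens} tokens, Gemini: {gemini_tokens} tokens")
print(f"Cost ratio at equal per-token price: ~{ratio:.1f}x")
```

Under those assumptions the same Tamil text costs roughly 3.5x more to process with the less efficient tokenizer, before any reasoning-token multiplier is applied on top.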
If you find the study useful and want to help us convince the execs to keep funding this, a signup on the landing page goes a long way: https://www.rws.com/artificial-intelligence/train-ai-data-se...
Happy to answer questions!
curioussquirrel•1h ago
- Gemini 3 Pro is a multilingual monster.
- GPT-5.4 is a really good translation model, with big improvements over earlier releases in the 5 family.
- Opus 4.6 is good but usually third place.
- Somehow, Grok 4.20 is surprisingly good at some long-tail languages? Its performance profile is really odd, unlike all the other models.
EDIT: layout