How?
I think I've maybe occasionally seen "translit." in text used to mark that the following is transliterated, but I could see that being easily glossed over.
Korean -> English makes more sense.
For example, if you searched for the Japanese word for washing machine, you wouldn't think twice about seeing not only "洗濯機" (how it's written in kanji) but also "sentakuki" or "sentakki" in the results. Even to non-Japanese speakers it's pretty clear that this is the Japanese word written in Latin-character transliteration, and it's pretty much exactly what you'd say.
With Korean, it looks more jarring, as the input method is apparently very different, and seems to map the keys for unrelated Latin letters to Hangul letters? (I have no idea, I don't know anything about Hangul other than it's based on syllables, kind of like Hiragana/Katakana, and apparently very logical.)
More or less, yes. Each Hangul character represents a syllable, and is composed of two or more components (jamo) representing individual phonemes (like vowels or consonants) which make up the syllable. The keys on a Korean keyboard are mapped to those jamo.
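To make the "composed of jamo" part concrete, here is a minimal sketch of how a precomposed Hangul syllable can be computed from its components, using the arithmetic layout of the Unicode Hangul Syllables block. The function name `compose_syllable` is made up for illustration.

```python
# Jamo orderings used by the Unicode Hangul Syllables block (starting at U+AC00).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 leading consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]      # 27 finals plus "no final"


def compose_syllable(lead: str, vowel: str, final: str = "") -> str:
    """Combine a leading consonant, a vowel and an optional final into one syllable."""
    index = (INITIALS.index(lead) * 21 + MEDIALS.index(vowel)) * 28 + FINALS.index(final)
    return chr(0xAC00 + index)


print(compose_syllable("ㅎ", "ㅏ", "ㄴ"))  # 한
print(compose_syllable("ㄱ", "ㅏ"))        # 가
```

An input method does essentially this on the fly as you press the jamo keys, plus the bookkeeping for deciding where one syllable ends and the next begins.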
Further details: https://en.wikipedia.org/wiki/Korean_language_and_computers
It is probably more like a bopomofo keyboard for Chinese.
For example, instead of typing “buzhidao” to get 不知道, you just type “bzd” and pick the top suggestion. Since all the phonetic endings are gone, it does look a little cryptic, but it means if you don’t have a pinyin keyboard, you can still type something fast that is highly correlated with your actual phrase.
For example, when you’re searching for a movie title on your smart TV, Teenage Mutant Ninja Turtles (similarly abbreviated TMNT) becomes rzsg. Some Chinese search tools will pick up on this; whether through statistics, fuzzy matching, or specific 简拼 (jiǎnpīn) support, I don’t know.
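A rough sketch of what explicit abbreviation support could look like, assuming the tool already has each title split into pinyin syllables. The tiny `TITLES` index and the function names are invented for illustration, and it assumes the movie is listed under its usual Chinese title 忍者神龟; a real tool may well do this statistically instead.

```python
def jianpin(syllables):
    """Abbreviate a pinyin phrase to the first letter of each syllable."""
    return "".join(s[0] for s in syllables if s)


# Hypothetical index: title -> toneless pinyin syllables.
TITLES = {
    "不知道": ["bu", "zhi", "dao"],
    "忍者神龟": ["ren", "zhe", "shen", "gui"],
}


def search_by_abbreviation(abbrev, titles=TITLES):
    """Return every title whose first-letter abbreviation matches the input."""
    abbrev = abbrev.lower()
    return [title for title, syllables in titles.items() if jianpin(syllables) == abbrev]


print(search_by_abbreviation("bzd"))   # ['不知道']
print(search_by_abbreviation("rzsg"))  # ['忍者神龟']
```

Many phrases share the same initials, so in practice the matches would still need ranking by popularity or context.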
BTW, this happens all the time in Korea, because it's extremely common for someone to type something while forgetting to switch to the correct input method. Try these, for example:
추ㅜ
gozjsbtm
elwmsl
vkdlTjs
You can also swear in a comedic way by just typing the Hangul sequence in Latin, e.g. tlqkf
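For anyone who wants to decode the Latin-looking strings above without a Korean keyboard, here is a simplified sketch. It assumes the standard 2-set (Dubeolsik) layout and a QWERTY keyboard, and it deliberately skips compound vowels (like ㅘ) and compound finals (like ㄳ), so it is not a full input-method implementation. The function name `keys_to_hangul` is made up.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# 2-set (Dubeolsik) layout, keyed by the QWERTY character printed on the same key.
KEY_TO_JAMO = dict(zip(
    "qwertyuiopasdfghjklzxcvbnm" "QWERTOP",
    "ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔㅁㄴㅇㄹㅎㅗㅓㅏㅣㅋㅌㅊㅍㅠㅜㅡ" "ㅃㅉㄸㄲㅆㅒㅖ",
))


def keys_to_hangul(keys: str) -> str:
    """Reinterpret QWERTY keystrokes as 2-set Korean input (simplified)."""
    # Shifted keys without their own jamo fall back to the unshifted key.
    jamo = [KEY_TO_JAMO.get(k, KEY_TO_JAMO.get(k.lower(), k)) for k in keys]
    out, i = [], 0
    while i < len(jamo):
        starts_syllable = (
            jamo[i] in INITIALS and i + 1 < len(jamo) and jamo[i + 1] in MEDIALS
        )
        if not starts_syllable:
            out.append(jamo[i])  # stray jamo or non-letter: pass through unchanged
            i += 1
            continue
        lead, vowel, final = jamo[i], jamo[i + 1], ""
        i += 2
        # A consonant is a final only if it does not begin the next syllable.
        next_is_vowel = i + 1 < len(jamo) and jamo[i + 1] in MEDIALS
        if i < len(jamo) and jamo[i] in FINALS and not next_is_vowel:
            final = jamo[i]
            i += 1
        code = (INITIALS.index(lead) * 21 + MEDIALS.index(vowel)) * 28 + FINALS.index(final)
        out.append(chr(0xAC00 + code))
    return "".join(out)


print(keys_to_hangul("gozjsbtm"))  # 해커뉴스
print(keys_to_hangul("elwmsl"))    # 디즈니
print(keys_to_hangul("vkdlTjs"))   # 파이썬
```

Those come out as Hacker News, Disney and Python. The first example in the list goes the other way: it is what you get when you type a Latin word while the Korean input method is still on.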
Hah, this comment is the top result when I searched with StartPage. There are a bunch of Korean results though.
https://trends.google.com/trends/explore?date=all&q=frqnce&h...
You'll notice it peaks every northern hemisphere summer. On French keyboards, Q and A are reversed compared to US keyboards, and every summer, millions of French people go on vacation, and start Google searching for things back home on unfamiliar keyboards.
It declines with the rise of the smartphone, as they're bringing their keyboards with them.
Why it suddenly spikes in the last few years, I don't know.
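The "frqnce" effect is easy to reproduce. A minimal sketch, covering only the A/Q and Z/W letter swaps between AZERTY and QWERTY (the real layouts also differ in where M and the punctuation keys sit, which is ignored here); the function name is made up:

```python
# Only the letter swaps between AZERTY and US QWERTY; punctuation and M are ignored.
AZERTY_SWAP = str.maketrans("aqzwAQZW", "qawzQAWZ")


def swap_azerty_qwerty(text: str) -> str:
    """Map text typed with AZERTY muscle memory on a QWERTY keyboard, or back."""
    return text.translate(AZERTY_SWAP)


print(swap_azerty_qwerty("france"))  # frqnce
print(swap_azerty_qwerty("frqnce"))  # france
```

Because the mapping is a pure swap, the same function converts in both directions.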
Haven’t finished the article yet but this jumped out at me. This doesn’t ring true to me. Google runs an extortion scheme - since you can buy ads on your competitors’ trademarks, and since no users can tell ads from results (and since the organic results are now buried so far, they rarely get clicks anyway) if you don’t buy your brand keywords your competitors will get all your traffic.
As others have said, keyboard mismatches are common enough that Google might have built out logic for it specifically. But that’s not necessary, and even “old school” search engines could learn these things.
The first time “alemwjsl” is searched you might not have any data, but the user will probably fix their keyboard and retype in Korean. That gives you a query correction mapping. And you can assume if query1 yields no clicks and they update to query2, q1 is a synonym for q2 and serve results for q2 instead.
Then, if a session contains a query “alemwjsl” and a click on midjourney.com and another session “midj” also contains a click on midjourney.com, those are co-clicked queries.
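A toy sketch of both signals, assuming a made-up log format where each session is an ordered list of `(query, clicked_url)` pairs and `clicked_url` is `None` when nothing was clicked. The function names and the sample data are purely illustrative, not how any particular search engine stores its logs.

```python
from collections import Counter, defaultdict


def learn_rewrites(sessions):
    """Count (q1, q2) pairs where q1 got no click and the next query q2 did."""
    rewrites = Counter()
    for session in sessions:
        for (q1, click1), (q2, click2) in zip(session, session[1:]):
            if click1 is None and click2 is not None and q1 != q2:
                rewrites[(q1, q2)] += 1
    return rewrites


def co_clicked_queries(sessions):
    """Group queries from any session that led to a click on the same URL."""
    by_url = defaultdict(set)
    for session in sessions:
        for query, url in session:
            if url is not None:
                by_url[url].add(query)
    # Only URLs reached from more than one distinct query are interesting.
    return {url: queries for url, queries in by_url.items() if len(queries) > 1}


sessions = [
    [("alemwjsl", None), ("미드저니", "midjourney.com")],  # user fixes the input method
    [("alemwjsl", "midjourney.com")],
    [("midj", "midjourney.com")],
]
print(learn_rewrites(sessions))
print(co_clicked_queries(sessions))
```

With enough traffic, the counts tell you which rewrites are reliable enough to apply automatically and which are just noise.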
You can also even start to represent queries by the words in their associated clicked documents or vice versa. This helps to get around the fact that people might search “how much superbowl tickets” and “superbowl tickets price” but the official page might not contain either of those strings.
Of course there are more advanced methods now (neural nets) but it’s cool to see how it worked in the past.
Also, for people that don’t use bilingual keyboards this is a pretty interesting finding.
I've got nothing to add there that people haven't already been saying - this was a fascinating quirk of humanity and technology. Really good full-circle adventure uncovering the source.
I'm commenting because I have to know what you're doing with your website and blog. It looks like a markdown/obsidian/static site generator. It's gorgeous and amazing. Did you write it yourself? Is it open source software?
yorwba•5d ago
Keyboard layout mismatches are common enough that I assume Google has a layout detection stage hardcoded just like they have typo correction hardcoded. And the creators of said algorithms probably understand very well how they work. (The naïve way would be to convert from every possible layout to every other layout, but I think you could build something more lightweight using Hidden Markov Models.)
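For illustration, the naïve approach might look something like this: reinterpret the query under each candidate layout mix-up and keep the first reading that matches a known term. Everything here is hypothetical (the layout table only knows the AZERTY A/Q and Z/W letter swaps, and the vocabulary is a toy set); it is not a description of what Google actually does, and it is not the HMM variant.

```python
# Candidate reinterpretations; a real system would cover many more layout pairs.
AZERTY_SWAP = str.maketrans("aqzwAQZW", "qawzQAWZ")
LAYOUT_FIXES = {
    "none": lambda text: text,
    "azerty_on_qwerty": lambda text: text.translate(AZERTY_SWAP),
}


def best_interpretation(query, vocabulary):
    """Return (layout_name, text) for the first reading found in the vocabulary."""
    for name, fix in LAYOUT_FIXES.items():
        candidate = fix(query)
        if candidate in vocabulary:
            return name, candidate
    return "none", query  # nothing matched: leave the query untouched


vocabulary = {"france", "paris"}
print(best_interpretation("frqnce", vocabulary))  # ('azerty_on_qwerty', 'france')
print(best_interpretation("paris", vocabulary))   # ('none', 'paris')
```

The cost grows with the number of layout pairs you try, which is presumably why something lighter-weight would be attractive at scale.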