The error rate goes up to 1 in 66 for 256KB (in memory only).
When you need to store your dictionary in under 1 byte per word, a trie won't cut it.
I once built a spell checker plus corrector which had to run in 32kB under a DOS hotkey, interacting with some word processor. On top of that, it had to run from CD ROM, and respond within a second. I could do 4 lookups, in blocks of 8kB, which gave me the option to look up the word in normal order, in reverse order, and a phonetic transcription in both directions. Each 8kB block contained quite a few words, can't remember how many. Then counting the similarities, and returning them as a sorted list. It wasn't perfect, but worked reasonably well.
[1] Adding that for professional spell checking you'd need at least 100k lemmata plus all inflections plus information per word if you have to accept compounds/agglutination.
Not as much detail as the blog.codingconfessions.com article mentioned above, maybe some of the other/later techniques were added later on?
Link to the online version of the 1985 May Programming Pearls column: https://dl.acm.org/doi/10.1145/3532.315102
The PDF version of that article: https://dl.acm.org/doi/pdf/10.1145/3532.315102
Should have used it on his spell-correct article.
Even Microsoft Word, being a local app and everything, manages to work better than Google’s cloud-based offerings. That’s surely evidence that progress is far from being linear.
I have a spelling checker
It came with my PC
It highlights for my review
Mistakes I cannot sea.
I ran this poem thru it
I'm sure your pleased to no
Its letter perfect in it's weigh
My checker told me sew.
https://www.thoughtco.com/spell-checker-poem-by-mark-eckman-...
It seems that anything that helps people gets this reaction these days. On the one hand, the argument 100% resonates with me. On the other hand, spelling isn't really the end, is it? It's just a means to an end, so what's wrong with making the means easier? Did people worry that you'd stop knowing how to plant potatoes when trading was invented? EDIT: The example doesn't make sense because agriculture is newer than trading, but you get the idea.
https://www.washingtonpost.com/archive/opinions/1985/06/02/t...
People pushed back on the grammar checks when they landed in Word.
Before that, people pushed back on calculators in secondary schools. This was a huge point of contention in all classes except trigonometry, and calculators were definitely not allowed in the SAT/ACT.
Word’s grammar checker has improved quite a lot. But I absolutely hate the style checker and its useless advice. Yes, I know how the passive voice works and yes, it is appropriate in this sentence. Also, it’s not really a problem in English but Word still can’t do spaces properly so it wants to put normal spaces everywhere and it’s fucking ugly. I wish it would spend as much time fixing inappropriate breaking spaces (in English as well).
I'd argue that in writing, spelling is something completely arbitrary and thus of no fundamental importance. Arithmetic is the same way: there are lots of algorithms to do it and they are all valid. So calculators and spell checkers are fine. And you should use them.
The same is not true for grammar, or for getting AI to write an essay for you.
Perhaps partly because most schoolkids then wouldn't have been using word processors as their main writing tool at school and people using them in a corporate environment were pleased not to make embarrassing errors in their emails.
I'd argue that the negative people were correct. People can't spell anymore, not even with a spellchecker. Maybe they never could? I'm not against spellcheckers, I think they are amazing, but they haven't helped much.
However, also sounds weird, but I recall myself and some of my peers questioning spellcheckers, "Why do I need this?", because spelling was a primary mission of our education. We were all raised constantly being tested on spelling. In fact, I think I disabled the spellchecker on my old-ass 286 because it caused delays in the overall experience.
I actually consider spellcheck to have improved my spelling dramatically over the years. The little red squiggles under words have helped me to recognize my misspellings, especially the words that are hard for me to get right consistently.
Same went for using MacWord vs AppleWorks. MacWord had a built in dictionary, AppleWorks didn't.
I failed that interview by overengineering.
Almost. You needed to clarify what the interviewer was asking and discover requirements. As much as HN likes to hate on coding interviews requiring specific algorithm knowledge, determining requirements is very much part of the job, and engineers have a tendency to build what they want to build, not what the customer wants.
Writing a spell checker that quickly identifies if a word is in a list of valid words (the problem described in the article) is a trivial problem for anyone who has basic algorithms and data structure knowledge. It's the classic example for using a trie: https://en.wikipedia.org/wiki/Trie
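For anyone who hasn't seen the trie approach, here is a minimal sketch of the membership check only. The class and method names are my own for illustration, and this makes no attempt at the memory constraints discussed in the article.

```python
class TrieNode:
    """One node per character position; children are keyed by the next character."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            # Create the child node on first use, otherwise walk into it.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        # A prefix of a valid word is not itself valid unless it was inserted.
        return node.is_word


trie = Trie(["spell", "spelling", "speller"])
print("spelling" in trie)  # True
print("spel" in trie)      # False
```

Lookup cost depends on the length of the word rather than the size of the word list, which is why it is the textbook answer. A dict per node is very memory-hungry in Python, so this only illustrates the idea.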
The problem described in the article is doing it within very limited storage space. How do you store your list of 200K words on a system with only 256K of memory? This is the challenging part.
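One family of answers is to give up exact storage and accept a small false-positive rate. The sketch below uses a Bloom filter purely as an illustration of that trade-off; it is my stand-in, not a claim about what the article or any historic spell checker actually did. The sizes, the hash count of 7, and the double-hashing scheme are all assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never 0
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )


# Hypothetical budget: the whole 256 KB given to the filter.
bf = BloomFilter(num_bits=256 * 1024 * 8, num_hashes=7)
for w in ("spell", "checker", "dictionary"):
    bf.add(w)
print("spell" in bf)  # True
```

With 200K words in that budget you get roughly 10 bits per word. The price is that some misspellings will be accepted as valid, and you cannot get the word list back out to generate suggestions.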
Your hard disk is almost always larger than your RAM. You only load into memory what's needed at the moment. I hope that gives a hint on how to proceed with the above problem.
In my opinion, this is where a local ML/AI model, no internet required, would be the most beneficial today.
Even had to use a search engine with, "thoughts and opi" because I forgot how to spell opinion before posting this. The in-application spell checker was 100% useless at assisting me.
Instead of how LLMs operate by taking the current text and taking the most likely next token, you take your full text and use an LLM to find the likeliness/rank of each token. I'd imagine this creates a heatmap that shows which parts are the most 'surprising'.
You wouldn't catch all misspellings, but it could be very useful information to find what flows and what doesn't - or perhaps explicitly go looking for something out of the norm to capture attention.
I constantly type "form" instead of "from" for example and spelling checkers don't help at all. Even a simple LLM could easily notice out of place words like that. And LLMs also could easily go further and do grammar and style checking.
This should also be pretty cheap (just one pass through the LLM).
I often find myself butchering the spelling of a word in a way where the correct answer is obvious to human eyes (probably because of "typoglycemia" [1]) and an AI LLM immediately understands what I meant to say, but Apple's spellcheck has "No Guesses Found."
Does anyone else have this experience?
E.g.
> No entries for "typoglycemia", did you mean "hypoglycemia"?
Actually, come to think of it, the problem must be a bit easier than on smartphones, right? Real keyboard input is very precise. Smartphone keyboards already guess what word you were trying to spell, so they are influencing the typos in the direction of likely words… cannibalizing the very guess list that the dictionary uses!
That said, trying to use long press on iOS (or whatever it actually is), is one of those places that often drives me nuts. I don't know if the issue is a specific app or the OS or what but sometimes I want the popup menu to appear and I can't get it to appear. Or I do something to make it appear but it doesn't appear for x hundred milliseconds, during which I think it didn't get my gesture so I start a new one, just as it's finally responding in which case my new gesture dismisses it. Repeat 3-4 times before I'm ready to tear my hair out
It also shows why canvas based websites suck. Open Google Docs, select a word, press Cmd-Ctrl-D, ... nothing. Try it in gmail (which is not canvas based) and it works.
- they really don’t want you saying bad words of any kind.
- they do not look at context at all
- they focus too much on the first letter of the word for suggestions
Not true anymore, I just typed fuck in this comment without having to fight it. They made a change I think last year and they even announced it.
> they do not look at context at all
Also not true. It's true that they're not perfect at it, but replacements after you've typed 2 more words happen specifically because it can tell better what you want to say. Sometimes that works against you because language is highly personal.
They also do that in Apple Notes. On the iPad the search can only match word prefixes. So if you type "oo" and the entire note consists of just the word "foo", it will find nothing. This doesn't even require fuzzy search, yet they couldn't be bothered while solving the much more difficult handwriting recognition problem.
Also the iPhone's Settings app still doesn't have all settings in the search index. So it's impossible to find the section "headphone safety" & "reduce loud audio" using words like "headphone", "audio" or "safety". This setting was introduced five years ago, by the way.
Since "typo" comes from "typography", it roughly means "symbolic". So "typoglycemia" should mean "symbolic sugar of the blood". Low typos in your blood would be "hypotypemia".
I have no idea why "typoglycemia" refers to a human ability to autocorrect, but it brings me joy, so I'm not going to question it ^_^
I remember when MacBooks briefly came out with a ridiculously bright standby LED that required black electrical tape over it if you wanted to sleep with it in the house. Shortly after, no more status LEDs on any MacBook (thank you!).
Nowadays I find non stop little annoyances with threads from others on the same issues on Apple devices. From.the.overly.prominent.full.stop when searching textually in the URL bar to the crappy spell check and crappy spam filtering. As much as Jobs apparently came across as an asshole there’s a need for someone at the top to say ‘WTF is this, fix it or get fired!’.
There's plenty of people described as 'quiet and polite, but firm' or some similar variations.
Lee Kuan Yew is one example that comes to mind immediately. Warren Buffett might be a good example from the world of business.
(Your favourite search engine or chatbot is probably more than happy to give you a steady stream of other examples.)
Yelling at a rank-and-file to unfuck some random system, then not giving them any time, resources, or tools to fix it is just being a dictatorial dickhead.
It did break prioritization in the opinion of the ground level teams and their goals but I argue it's not bad to at least periodically do this since grating against the current org structure prioritization and goals is not a bad thing to do on occasion.
Chances are they'll find there's no team that considers themselves the owners of spell check or spam filtering, and the goal the keyboard team is going for is likely some silly thing like "number of sentences with correct punctuation", leading to the current ridiculous outcome where the period in the URL bar is way too prominent, especially considering we don't even type full URLs into the search bar that often these days.
Dear Apple leads: if you're reading this do a short initiative where execs aim to file an annoyance a day. It's not hard to find such. There will be some complaints at the ground level that these executive annoyances get too much priority but part of that will be because you're questioning lower level org priorities (a healthy thing to do!), not because the issues don't matter. The end result will bring Apple a bit more in line with the quality we saw during the Jobs period since this is exactly the kind of shake up he did on occasion.
(I’m sorry, it doesn’t matter but I couldn’t help it in a discussion on quality)
I suspect an LLM wouldn’t be the most optimal choice
In terms of 'can I run it locally on an early 2000s machine?' LLMs are definitely the wrong choice.
In terms of 'what can I quickly hack together in 2025 regardless of variable cost?' LLMs might be the right choice.
> I wonder how it does work, I remember MS Word having a fairly decent grammar checker [...]
You can get pretty far with some lookup tables and some heuristics.
the middle appreciate metrics and deliverables
One example is how this product manager type, because of company politics, isn’t really under the same department as the other software teams.
Because of his very very narrow horse blinkers approach, he doesn’t see or even comprehend why we’d want to align with literally anything in any other team and that includes visual UI stuff.
That’s why we have a bright neon pink “Back” button. Right in the literal center of the screen. It’s insane.
I was given a small electric fan. It’s great in that it’s portable and I can use it in some of the crummy hotels I have to stay in.
Unfortunately, it has a bright blue LED on it so it’s a pain to use at night when you’re trying to sleep.
It’s so bright that even covered with tape it still shines through the thin plastic of the fan body.
What really gets me is why they bothered putting an operating light on it in the first place?
It’s a fan. The fact that it’s working tells you it’s working.
A Jobs or Torvalds type character would have pointed that out.
I suspect though that it’s often a case of people noticing these type of design flaws but not having the authority to fix them while those with the authority don’t care.
Kinda related but also not really, my own pet peeve is the pouring spout in many products, coffee machine, water jugs, buckets... they might look effective but I find that more often than not, they are curved too much and drip all over when actually pouring.
And I always have to wonder, after serving coffee from one of those things, did the person who designed it never even try it just once? Didn't they ever use such a thing, did they never ever pour water from a pot?
Adding an appropriate diode in its place is advised.
It was more just the observation that an unnecessary light had been included that degrades the performance of the product.
I find it intriguing how that comes to be. On paper it seems like adding the light wouldn't hurt the product even if not useful, but nobody actually used it, it seems.
More often than not, those annoying features are direct requests from the person up top who smacks people. They want that feature because they think it will sell, and it's no use trying to argue with them because you'll just get smacked again.
I think that is pretty unusual in large companies.
When did the OG MacBook Air have instant on at launch in 2008?
IIRC the M1 brought Instant on and Jobs wasn't around anymore.
What we really mean is before you complete the action of fully opening the hinge to 120deg which is something like 1.5-2seconds?
AFAIK pre M1 days it would be still a few seconds after fully opening and now it's more like < 1sec.
Compare to most corporations where the only thing you can do to get fired is fail at office politics and failure to deliver/delivering the lowest quality crap that can be passed off is just business as usual.
Alas, humans don't come fully customisable. You get to pick from the packages on offer. And it seemed like for Apple Steve Jobs' good parts only came as part of a package that also included his bad parts.
These things need to be well-placed to be effective. Sounds like it was.
See from the replies to this how well you got your point across.
It's just a word.
If you’re booting a computer or building web search, every subsystem can contribute to latency. If you have more teams and more features, you’re likely to have more latency.
In the early days of Google, Larry Page would push hard on this as well, in person. So Google search was fast.
But later the company became larger and bureaucratized, so nobody was in charge of latency. So then each team contributes a bit to latency, and that’s what ends up shipping.
Google products used to be known for being fast, but they’ve reverted to the mean
One of the most aggravating things in iOS. Trips me up almost every day (and it's been there for what? 10 years now?)
So if you suffer from this it's not even your fault. You're literally hitting the spacebar but some incentive at Apple in their org structure has led to the period literally having waaaay too much weighting and the lack of exec oversight at Apple in the post Jobs days is leading to us all.typing.periods.whenever.we.just.wanted.to.search.
But.you.took.a.weight.from.my.shoulders. I always.thought. I was an incompetent who couldn't hit the spacebar correctly.
The lack of status LEDs is actually the only thing I really REALLY hate about MacBooks!
Too often I have been bitten by the thing not properly going to sleep because SOMETHING keeps a wake lock (and of course macOS doesn't indicate this anywhere outside of Energy Monitor, nested in System Activity) and overheating in my bag as a result. A simple LED would have been a good visual indicator that it is still awake.
Here are some random examples I thought of for this comment. Notice how everything is spelled wrong as though the screen input doesn’t match the location of the buttons.
- tomoroww eather in united.kingdom
- lookip exhange rate
- devopper news
- download twotter.video
And if you're not paying attention, your message ends up looking like you're having a stroke.
Here are some nice examples (excluding obvious edit distance based ones which it does right)
"snowbalfight" --> "snowball fight"
"unrelevant" --> "irrelevant"
"fone" --> "phone"
"the the" --> "The"
And all of this with auto capitalization if it notices you're at the start of a sentence, and stuff like handling proper nouns, punctuation, etc.
What I find really interesting is swipe-type spell checking (it's basically word prediction) on phones. That is a really cool problem to solve well. Sometimes it works like a dream and other times it's annoying. I wonder how they write those.
Yes: Apple doesn't care.
> Does anyone else...
Yes. I just typed in "Tipografical earer" - and iOS 18.6 suggested "Tipograxical" for the first word, and one of "eared", "eager", and "eater" for the second word.
I use the swipe feature because I guess I have wide fingertips and frequently hit unintended, adjacent keys when pecking on the keyboard (especially as I’ve gotten older). The words produced by swiping often make no grammatical sense, and are frequently esoteric words that I just can’t believe rank high enough on a basic frequency list to suggest. Not to mention my own vocabulary, which apparently is not considered by the keyboard at all.
I had a way better experience using SwiftKey on my android phone 15 years ago.
It's somewhat funny that human performance is seen as a baseline here, and not the pinnacle of achievement to aim for.
(I agree with you. I just find it entertaining.)
Another interesting challenge with CJK languages was just displaying them. You need higher-resolution graphics and a much bigger character ROM to even consider that.
Pinyin is sort of the standard for romanization, although other systems exist, as well as inputs that aren't based on romanization (bopomofo).
Take the pinyin `fei`. Just looking at the tones that can be on this word, it can mean at least 4 words (my dictionary app couldn't find any neutral tone words). In reality, it's at least dozens, each with different contextual meanings.
Very interesting! It was certainly a different technological challenge...
Also discussed here: https://news.ycombinator.com/item?id=40537464
Thinking back, how the heck did they do spell checking algorithms on a 6502? That’s a bit of code I’d like to see reverse engineered!
It did both spell checking and correction (and had an anagram finder as a bonus), had integration with several different wordprocessors, check as you type functionality, AND its own integrated editor on top of that. The built in dictionary had a claimed 58k words (with a claimed checking speed of 10k words per minute). All of this was somehow squeezed into 128k (as a ROM on a carrier board with a hardware bank switching mechanism paging in 16K at once).
It still is. The spell checker on my Android phone is a PIA. It's too dumb to correct many typos, there's no way of highlighting wrongly used but correct words such as 'fro' and 'for', etc. There's no automatic or user defined substitution such as correcting 'rhe' with 'the' and yet keep the words highlighted until a final revision.
Word processor spellers have no way of tagging certain words that one may or may not wish to use depending on context. A classic example that's caught me out past the draft and found its way into the final document without me noticing it is 'pubic' for 'public'. Why doesn't my speller highlight such words in red and ask whether I actually meant to use this word?
Moreover, spellers are not all of the same level of accuracy; for example, Microsoft Word's speller is much better than LibreOffice's, much to my annoyance, as LibreOffice is my main (preferred) WP.
Nor is there a method of collecting misspelled words or typos and tagging them as spelling errors or typos for the purpose of helping one's spelling or typing. It'd be nice to have a list of my misspelled words together with their correct spelling, that way I could become a better speller. Also, spellers could be integrated with full dictionaries—highlight the word and press F1 for its meaning, etc.
There are no dictionary formats that are both universal and smart, that is that would allow for easy amalgamation between dictionaries and yet could contain user defined words and other user metadata which would be distinguished from the general corpus of words when crossed or amalgamated. For example, a smart dictionary format could contain metadata that would allow a dictionary and thesaurus to coexist in the same word list, similarly so different dictionaries, technical, medical etc.
All up, spellcheckers are still a damn mess. They need urgent attention.
The computer has to figure out whether the word is in the dictionary, but it also has to figure out a suggestion for what to change it to.
And even after just that, we already have a bug- homonym mistakes- homonyms are in the dictionary but they’re misspelled (that was intentional btw).
How misspelled is another problem. We’ve had Levenshtein et al algorithms for a long time, but how different can you get? A really badly misspelled word might not have any good replacement candidates within your edit distance limit.
There are also optimizations like frequently mistyped words (acn-> can), acronyms, etc.
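To make the edit-distance limit concrete, here is a rough sketch of plain Levenshtein distance plus a naive suggestion pass. The function names, the tiny word list, and the default limit of 2 are assumptions for illustration; a real checker would not scan the whole dictionary like this.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary, max_distance=2):
    """Return candidates within max_distance, closest first."""
    scored = sorted((levenshtein(word, cand), cand) for cand in dictionary)
    return [cand for dist, cand in scored if dist <= max_distance]


words = ["spelling", "spieling", "dictionary"]
print(suggest("speling", words))     # ['spelling', 'spieling']
print(suggest("xyzzyplugh", words))  # []
```

The second call is the "badly misspelled" case: nothing falls within the limit, so there is nothing to suggest. Note also that plain Levenshtein scores "acn" to "can" as 2, since a swap of adjacent letters counts as two substitutions, which is one reason a table of frequent typos helps.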
It was never just about size.
Thinking of the example given about being able to just load the word list into memory, I did something of that ilk when my son’s fifth grade class read a book which had a concept of dollar words: You assign a value to each letter, a=1, b=2, … z=26, add up the value and try to get exactly 100. It was pretty trivial to write a program that read the word list and produced the complete list of dollar words (although I didn’t share that with my son, I did give him access to the word list and challenged him to write the program himself).
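In case anyone wants to try it, the dollar-word search is roughly this. The function names are mine, and the sketch takes any iterable of words rather than assuming a particular word list file.

```python
def word_value(word):
    """Sum letter values with a=1, b=2, ... z=26; anything outside a-z is ignored."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower() if "a" <= ch <= "z")


def dollar_words(words):
    """Keep only the words whose letters add up to exactly 100."""
    return [w for w in words if word_value(w) == 100]


print(word_value("excellent"))                                  # 100
print(dollar_words(["excellent", "turkey", "spell", "dollar"]))  # ['excellent', 'turkey']
```

To run it over a full list you could pass in the stripped lines of whatever word file you have on hand.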
At the moment, I’m building up a Spanish rhyming dictionary by using a Spanish word list, reversing the words and sorting the reversed list to find the groups of words that are most likely to rhyme, which was something that 30 years ago would have been a challenge on my desktop computer but now is a brief script that I’m just as likely to manage through perl 1-liners and shell pipes as not.
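The reverse-and-sort trick looks something like this in Python instead of perl and shell pipes. This is a sketch under my own assumptions: the function name and the fixed three-letter ending are illustrative, and it compares spelling only, so it finds words that are likely to rhyme rather than guaranteed to.

```python
from itertools import groupby


def rhyme_groups(words, suffix_len=3):
    """Group words that share their last suffix_len letters."""
    # Sorting by the reversed word puts words with the same ending side by side.
    by_ending = sorted(words, key=lambda w: w[::-1])
    groups = []
    for _, group in groupby(by_ending, key=lambda w: w[-suffix_len:]):
        members = list(group)
        if len(members) > 1:  # a word alone has nothing to rhyme with
            groups.append(members)
    return groups


words = ["gato", "pato", "zapato", "casa", "masa", "sol"]
print(rhyme_groups(words))
# [['casa', 'masa'], ['gato', 'pato', 'zapato']]
```

Because the list is already sorted by reversed spelling, words within each group are ordered by how much of their ending they share, which makes the output easy to skim.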
Would you ask someone to "check my spell"? Not unless you're a wizard, I suppose.
gnabgib•4d ago
2023 (314 points, 180 comments) https://news.ycombinator.com/item?id=34971924
2020 (363 points, 143 comments) https://news.ycombinator.com/item?id=25296900
2012 (94+156 points, 70+61 comments) https://news.ycombinator.com/item?id=4640658 https://news.ycombinator.com/item?id=3466388
dang•15h ago
A spellchecker used to be a major feat of software engineering (2008) - https://news.ycombinator.com/item?id=34971924 - Feb 2023 (180 comments)
A spellchecker used to be a major feat of software engineering (2008) - https://news.ycombinator.com/item?id=25296900 - Dec 2020 (143 comments)
A Spellchecker Used to Be a Major Feat of Software Engineering (2008) - https://news.ycombinator.com/item?id=10789019 - Dec 2015 (29 comments)
A Spellchecker Used to Be a Major Feat of Software Engineering - https://news.ycombinator.com/item?id=4640658 - Oct 2012 (70 comments)
A Spellchecker Used To Be A Major Feat of Software Engineering - https://news.ycombinator.com/item?id=3466388 - Jan 2012 (61 comments)
A Spellchecker Used to Be a Major Feat of Software Engineering - https://news.ycombinator.com/item?id=212221 - June 2008 (22 comments)