
Half million 'Words with Spaces' missing from dictionaries

https://www.linguabase.org/words-with-spaces.html
39•gligierko•2h ago

Comments

happycat5000•2h ago
These are under-respected for non-native English speakers.
grantpitt•1h ago
Can you say more on this?
gligierko•1h ago
They don't get into enough learning lists, and from my perspective, they are great additions to word games because the more transparent compounds are unique and legit words that can more than double the accessible vocabulary.
hrnnnnnn•1h ago
Consider phrasal verbs like "shut up", "get lost" or "kick off". Knowing what the parts mean doesn't let you understand the whole.

In your native tongue you take these for granted, but in a second language you have to learn that the sum is more (or different) than the parts.

f1shy•1h ago
Phrasal verbs are listed under the main verb. I never ever had a problem with that. As a native speaker sometimes I still have to search for some in some strange context.
dragonwriter•1h ago
These are called idiomatic phrases, and many (all natural?) languages have them, and, yes, they are pitfalls for language learners.
smt88•1h ago
These particular examples are figures of speech, so "shut" in "shut up" still means the same thing it would mean in "shut the door." And "up" is used the same way as "cover up."

So the issue is just that this is figurative language, and you have to know that a kickoff is the beginning of certain sports, for example. It's more of a cultural issue than something a dictionary needs to fix.

ndr42•1h ago
I imagine that languages like German that create composites of nouns have less of a problem with this:

English: cream of mushroom soup

Spanish: sopa cremosa de champiñones

German: Champignoncremesuppe

looperhacks•1h ago
I just checked, Champignoncremesuppe is not in my dictionary ;)

It has some compound words. But including too many of them would quickly get out of hand

ndr42•1h ago
You are right! So the situation for German is worse: Millions of words are missing... ;-)
ticulatedspline•1h ago
but can't you basically make anything a composite noun in German? That it's a single word doesn't really help you decide if it has enough presence unto itself to be defined in the dictionary.

Seems like they would have just as much of a problem since the issue is delineating when a "phrase" becomes a "word"

Wobbles42•8m ago
More to the point, how do German dictionaries handle this?

Is there a distinction between words that get enumerated and compound nouns that do not?

It does seem, though, that German speakers might be more comfortable with the fuzziness that apparently exists at the edges of what the word "word" means.

agmater•1h ago
In Dutch we indeed happily do this even for English loanwords like "creditcard" or something more obscure like "lockpick". When in doubt, remove the space.
grantpitt•1h ago
Very cool project! Reminds me of Chiang's great short story 'The Truth of Fact, the Truth of Feeling':

> “If you speak slowly, you pause very briefly after each word. Thatʼs why we leave a space in those places when we write. Like this: How. Many. Years. Old. Are. You?” He wrote on his paper as he spoke, leaving a space every time he paused: Anyom a ou kuma a me?

> “But you speak slowly because youʼre a foreigner. Iʼm Tiv, so I donʼt pause when I speak. Shouldnʼt my writing be the same?”

anotherhue•1h ago
Clearly those Irish monks are to blame.
dec0dedab0de•1h ago
There are nearly half a million compound phrases that aren’t in any dictionary—simply because they contain spaces. “Boiling water.” “Saturday night.” “Help me.”

I would hope that none of those examples were taking up space in a dictionary.

gligierko•1h ago
Some are better than others. Many semi-transparents could get legit coverage. And many are good fodder for word game content.
dec0dedab0de•1h ago
The rest of the article did a good job explaining that. I just think those were terrible examples for the introduction. I think "shut up", "good night", and "hot dog" would have really got the point across better, but those might already be in dictionaries.
ticulatedspline•1h ago
They're clearly a bit over-zealous about what examples they think have meaning. They cite substitution as a good test for a phrase but double down on boiling water.

> Lexicographers used a substitutability test: if you can swap synonyms freely, it’s not a lexical unit. “Cold feet” (meaning fear) can’t become “frigid feet”—so it gets an entry. But the test cuts both ways. You can say “boiling water” but not “seething water” or “raging water.” The phrase resists substitution too.

These aren't failures of substitution because "raging" isn't a synonym in this case, whereas "frigid" would be a reasonable one.

I wonder perhaps if the author is confusing the idiom "hot water" which is in there https://en.wiktionary.org/wiki/hot_water and would fail the substitution test.
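A rough sketch of the substitutability test being argued about here, in Python. Everything in it is hypothetical: the `SYNONYMS` map and the `ATTESTED` set are tiny hand-made stand-ins for a real thesaurus and corpus, and `resists_substitution` is just an illustrative name. The point it shows is that the test only means something if the swapped-in words are genuine synonyms in that sense.

```python
def resists_substitution(phrase, synonyms, attested):
    """Return True if no synonym-swapped variant of `phrase` is attested.

    `synonyms` maps a word to its same-sense alternatives and `attested`
    is the set of phrases known to be used with the same meaning.
    Both are assumptions supplied by the caller, not real data.
    """
    words = phrase.split()
    for i, word in enumerate(words):
        for alt in synonyms.get(word, ()):
            variant = " ".join(words[:i] + [alt] + words[i + 1:])
            if variant in attested:
                return False  # a synonym swap works, so the phrase is compositional
    return True


# Hypothetical toy data: true synonyms only, so "raging" is not listed for "boiling".
SYNONYMS = {
    "cold": ["frigid", "chilly"],
    "boiling": ["simmering", "bubbling"],
}
ATTESTED = {"cold feet", "boiling water", "simmering water", "bubbling water"}

cold_feet_is_unit = resists_substitution("cold feet", SYNONYMS, ATTESTED)
boiling_water_is_unit = resists_substitution("boiling water", SYNONYMS, ATTESTED)
```

With this toy data "cold feet" resists substitution and "boiling water" does not, which is the distinction the comment above is drawing. Put "raging" into the synonym list and the result flips, so the outcome depends entirely on how strict the synonym source is.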

gligierko•37m ago
I removed that sentence/claim, I see the point that "boiling" and "raging" was a bad example.
butvacuum•1h ago
'hot dog' belongs in a thesaurus, not a dictionary. It's just a type of sausage.
dec0dedab0de•1h ago
It’s a type of sausage, but they are definitely not synonymous. At least not in American English.
smt88•1h ago
In the US, if you ordered a hot dog and got a sausage (or vice versa), it would be very reasonable to return the item and ask for something else. They are culturally completely different, the same way Cheerios in milk is not another cold soup like gazpacho is.
alecbz•54m ago
All words in a thesaurus would generally also be in a dictionary? The difference between a thesaurus and a dictionary is what each tells you about a word.
Wobbles42•29m ago
A dictionary is an enumeration of words. A thesaurus is a mapping between existing words.

Every word in a thesaurus belongs in a dictionary.

dragonwriter•1h ago
Yeah, the good examples are usually in dictionaries as headwords, the moderate examples are usually in dictionaries as phrases within the entry for one (or more) of the words that comprise them, leaving fairly weak examples actually “missing” if you want to use “missing words with spaces” as the basis for content.
michaeld123•52m ago
Fair point. I just rewrote the intro w/ the naming-function argument first.
simlevesque•1h ago
The first two I kind of understand what the author means. But "help me" and "severe pain" made me think that I'm just not the right public for this text.
dec0dedab0de•1h ago
I don’t see how boiling water could ever be a single word. Would that mean we need entries for every other liquid boiling?

i guess Saturday night could have some extra details explaining the context around our standard work week. But even that is a stretch.

Wobbles42•23m ago
A single word for boiling water would be like the single word "slush" we have for ice in water.

It likely could apply to other liquids in the same mixed state, but would be assumed to refer to water (or solutions or colloidal mixtures primarily consisting of water) in common speech.

Water is extremely common, and has anomalously high heats of crystallization and vaporization, so it is the most common example of a mixed phase system and the only one most people encounter in everyday life.

jakub_g•1h ago
It's quite interesting that "boiling water" in many Slavic languages is actually a separate word (and not derived from "water", but from "boiling"; similar to how the author mentions "ice" being used instead of "frozen water").
epgui•1h ago
I mean it’s interesting that this is generally the case with many (or even most) words across languages… But I’d wager it’s more the norm than the exception, so I don’t know if “boiling water” is that interesting of an example.
dec0dedab0de•1h ago
It was mentioned in other comments but boiled water is steam, and frozen water is ice. We do not have separate words for freezing water or boiling water.

in the slavic languages do they have a different way to describe boiling or freezing milk, or any other liquid?

Wobbles42•32m ago
We have the word slush to mean a mixture of ice and water. A single word for boiling water would occupy a similar conceptual space.

While these are not separate states of matter, they ARE special thermodynamic systems, with the particular property that they tend to remain exactly at the phase transition temperature while heat is added or removed from the system.

This is a somewhat esoteric technical distinction, but it has practical everyday consequences. It's why boiling food works so consistently as a universal cooking option.

You don't need to control the temperature of boiling water, it is an exact temperature that depends only on ambient pressure. As a consequence recipes work by only specifying time, sometimes with a single adjustment for people at higher altitudes.

This is remarkable given the wide variety of containers and heat sources used, and it is used practically by virtually every cooking tradition, even if its reason for working is not common knowledge.

It shouldn't be surprising it'd acquire a single word as a unified concept.
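To put rough numbers on the "depends only on ambient pressure" point, here is a small Python sketch. It assumes the Antoine equation with commonly tabulated constants for water (valid roughly between 1 and 100 °C, pressure in mmHg) and a standard-atmosphere approximation for pressure versus altitude. The function names are made up for illustration, and the results are estimates, not reference values.

```python
import math

# Antoine constants for water (assumed range roughly 1-100 °C, pressure in mmHg).
A, B, C = 8.07131, 1730.63, 233.426


def boiling_point_c(pressure_mmhg):
    """Temperature in °C at which water's vapor pressure equals the ambient pressure."""
    return B / (A - math.log10(pressure_mmhg)) - C


def pressure_at_altitude_mmhg(altitude_m):
    """Standard-atmosphere approximation of ambient pressure at a given altitude."""
    pascals = 101325.0 * (1.0 - 2.25577e-5 * altitude_m) ** 5.25588
    return pascals * 760.0 / 101325.0


sea_level = boiling_point_c(760.0)
mile_high = boiling_point_c(pressure_at_altitude_mmhg(1600.0))
print(f"sea level: {sea_level:.1f} °C, 1600 m: {mile_high:.1f} °C")
```

Under these assumptions water boils at very close to 100 °C at sea level and several degrees lower at 1600 m. Pot, stove, and flame size do not enter the calculation at all, which is why a single altitude adjustment in a recipe is enough.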

dec0dedab0de•14m ago
but what about boiling milk? or boiling oil? I get your point, I just don't understand why we would have a word for boiling water but then still need boiling-x for everything else that boils.

edit: In those other languages is it like how we use ice? where water is the default, but it could mean any frozen liquid?

kgwgk•1h ago

    > Got a word           Didn’t
    > frozen water → ice   boiling water
Freezing water doesn’t have a word. Boiled water does have a word.
hagbard_c•1h ago
Freezing water doesn't have a word, it only gets one after water has changed phase. Boiling water also gets a word once it has changed phase: steam.

ice - water - steam

kgwgk•1h ago
Right. (I’m not sure if you’re aware but that’s exactly what I said.)
hagbard_c•1h ago
Almost but not exactly, 'boiled water' can go two ways: phase changed to steam (at which point it is no longer 'boiled water') or boiled and cooled again. Pedantic? Sure. Fits right in here? Absolutely.
dragonwriter•1h ago
Steam is liquid water droplets suspended in gas; water in the gas phase is “water vapor” which also doesn't have a single word.

This is also an interesting case because “vapor” without a qualifier also refers to a suspension of solid or liquid particles in gas (of which “steam” is a particular example).

dec0dedab0de•59m ago
this is an interesting distinction that i was unaware of.
mcswell•50m ago
"Steam is liquid water droplets suspended in gas": You clearly did not work on steam-powered ships (or land-based steam power plants). I was Main Propulsion Assistant on a steam powered destroyer, and I can assure you that every effort is made to prevent droplets being suspended in the steam--because such droplets erode the blades on steam turbines. To that end, steam coming out of the steam drum (the upper part of the boiler) is run through superheaters, which raise the temperature of the incoming steam to evaporate any droplets. On our ship, the steam coming off the steam drum was a bit over 1200 psi and 600 some degrees Fahrenheit. After it goes through the superheaters, it's about the same pressure but 975 degrees.

And there's effectively no other gas in the steam, because dissolved air in the boiler's feedwater (particularly oxygen and carbon dioxide) has to be removed to prevent corrosion. To that end, water going into the boiler is first run through a deaerator, to remove any air that dissolved in the water as it came through the condenser.

dragonwriter•44m ago
> You clearly did not work on steam-powered ships (or land-based steam power plants)

Well, that's true, I haven't, BUT still I went back and forth writing and deleting and rewriting and eventually deleting a whole digression about the special case of the jargon of steam power and how it uses “wet steam” (or “saturated steam”) for “steam” in the general use sense and “dry steam” for “water vapor” and “superheated steam” for dry steam created by heating wet steam away from contact with water, before deciding that was way too much, but, yeah, that's all true. (And, in details about the actual processes used, a lot more than I knew or would have gone into even if I had and had decided to keep the digression.)

hagbard_c•46m ago
Nope, water vapour is the gas phase of water mixed with other gases while steam is just the gas phase of water. Water vapour can condense into tiny droplets which can freeze into ice crystals, both of which are visible as 'clouds'. Steam is not visible until it condenses into droplets at which point it no longer is steam but water suspended in another medium, usually air.
pvillano•53m ago
A mixture of melting ice and water suitable for drinking has a word: ice water. It's not an adjective noun phrase. It has a more specific meaning than just the two words together. You can order an ice water at a restaurant.
aaroninsf•1h ago
With Twain in mind, might I suggest we adopt the simple expedient of snake casing such terms.
pvillano•43m ago
Finally, someone who actually thought about where to draw the line instead of rejecting words with spaces entirely.
below43•1h ago
“Hospital bills”. That’s very country specific. Also, that’s two words.
soperj•1h ago
What does it mean?
eternauta3k•1h ago
It's what your insurance gets from the hospital after they provided a service to you.
tialaramex•57m ago
Hospital bills feels like a pretty ordinary compound to me - not like "good morning" or "ginger ale" where you can't just use what you know about the two words to figure out what the compound must mean.

Some cases are basically impossible: with "crash blossoms" you don't stand any chance without knowing why we call them that.

Some are middling difficult, "Home Secretary" requires that you know every meaning for the two words and then you happen to pick the correct obscure meaning, a "Secretary" could be in charge, and "Home" could mean the entire country as distinct from everywhere else.

But "Hospital bills" doesn't seem even marginally difficult

quesera•20m ago
I had to look up "crash blossoms"! But that's just an idiom, which is always tricky in translation. It might also be slang. Idioms and slang are borderline dictionary material, different editors make different choices, and they change over time.

But "ginger ale" seems straightforward to me. It's an ale, flavored with ginger. Not even idiomatic, just descriptive. Root beer. Grape soda. Orange chicken.

Wobbles42•9m ago
There seems to be a lot of overlap between this compound word concept and idioms. Both are largely atomic, defy analysis via individual word definition, and fairly language (and culture or dialect) specific.

Dictionaries are also language specific. We don't necessarily expect a 1:1 mapping of words between languages. I have personally always wondered if this subtly shapes thoughts in different languages as well.

below43•17m ago
In most English speaking countries it's a far from common phrase (ie. it's very USA-centric).
quesera•13m ago
OK. But is the meaning any less literally-obvious than "grocery bills" or "electricity bills"?

Maybe you don't have "hospital bills". I don't have "landscaping bills", but I know exactly what they are.

kelseyfrog•1h ago
The name for these is "collocations".

Collocation dictionaries are lists of collocations. The reason they're absent from single word dictionaries is that there are about 25x more collocations than single words.

danesparza•1h ago
I don't think 'Words with spaces' is a thing.

I think maybe the word the author is looking for is 'phrase'

epgui•1h ago
It’s probably a thing, especially with loan-words (eg.: “avant garde”), and there are probably much better examples… But the examples in the article make no sense to me.
alecbz•1h ago
I think 'phraseme' is closer: https://en.wikipedia.org/wiki/Phraseme
Wobbles42•19m ago
The difference between phrases and "words with spaces" is addressed.

The confusion might be that this seems to be a spectrum rather than a binary phenomenon.

We have single words at one extreme, ordinary sentences at the other, and in the middle we have idiomatic assemblies of words that span a range of substitutability.

"Hot dog" and "Saturday night" are arguably great examples, because they exist at the opposite extremes of the spectrum. Saturday night can retain some of the original meaning following substitution, whereas hot dog almost deserves a hyphen.

quesera•15m ago
I disagree that "saturday night" ever means anything other than the literal meaning of the nighttime of the day of saturday.

You can argue that there's a connotative association with the phrase. Sure. Just like "beach weather", or "blizzard conditions". But that doesn't make "saturday night" special in any way.

thmpp•1h ago
While 'this analysis would not have been possible without LLM', I am not sure the LLM analysis was well reviewed after it was done. From the obscure/familiar word list, some of the n-grams, e.g. "is resource", "seq size", "db xref" surely happen in the wild (as we well know), but I doubt we can argue they are missing from the dictionary. Knowing the realm, I would argue none of them are words, not even collocations. If "is resource" is, why not "has resource"? So while the path is surely interesting, this analysis lacks the scrutiny you would expect from a high-level LLM analysis.
michaeld123•48m ago
The very bottom of the slider is there to illustrate where LLM artifacts and Wiktionary noise live — it's not presented as legitimate vocabulary. The slider lets you see the full quality gradient, including where it breaks down.
JackFr•1h ago
"Opaque MWE"? Does no one know the word "idiom"?
hmokiguess•1h ago
On another note, I always wished "never mind" was spelled "nevermind"
pvillano•46m ago
"Each other" is like that for me, and according to search results, a lot of other people. I pronounce it ee-chother.

"Eachother" feels as natural as "somebody", "nobody", "anybody" to me

AlotOfReading•1h ago
A compound word isn't just a phrase. The latter is a group of words that indicate a single concept. The former is a new word that has a distinct meaning from the subwords that compose it. "I love you" is an example of a clausal phrase. The meaning is entirely evident from the words that compose it. In contrast, a "hot dog" is not a particularly warm canine, and has its own OED entry [0] as a compound word.

And some of the entries on this list are wrong. "Good night" exists in OED as "goodnight" [1] because there are multiple ways it's used. One is the clausal phrase "I hope you have a good night", which can be modified by changing the adjective, e.g. "great night" or "terrible night". "Goodnight" the bedtime ritual can't be modified the same way, so OED chooses to write it as a compound word without spaces.

[0] https://www.oed.com/dictionary/hot-dog_n

[1] https://www.oed.com/dictionary/goodnight_n

johnhamlin•1h ago
I got into solving the NYT crossword during Covid. I couldn’t solve a Monday when I started; now I do Mondays downs-only and look forward to Saturdays. Along the way, I developed a sixth sense for when an answer will be more than one word. I’ve thought a lot about it and can’t really describe how I do it. (Some other puzzles clarify if an answer spans multiple words, but I find the ambiguity adds to the fun.)
Wobbles42•4m ago
Do you think this comes from a gradual internalization of a real linguistic concept? Or is it more a familiarity with common (if unspoken) conventions of the puzzle makers?

I suspect the answer isn't binary, but it's interesting to think about.

This "sixth sense" phenomenon seems to pop up a lot. Crosswords are a great example. The sense some people are getting for detecting LLM output might be another.

speak_plainly•1h ago
Dictionaries are a mixed bag at best. If you apply David Kaplan’s character/content distinction from Demonstratives, you have to ask: should pure indexicals, which are essentially 'contentless' pointers be treated the same way as standard words? Let alone the thousands of rigid designators in this dataset that map directly to specific objects in the real world. At a certain point, is there no room left for encyclopedias?
johnhamlin•1h ago
Fascinating! I’d add “word nerd” to the list to describe the authors.
alecbz•1h ago
"to be" is a very weird example because that's just the full infinitive of "be" which is definitely in dictionaries: https://www.merriam-webster.com/dictionary/be
MarkusQ•58m ago
This boils down to an "is Pluto a planet" debate.

We act as if some languages have "compound words" that can encompass entire sentences (subject & object attaching to the verb as prefixes or suffixes) while others don't form compounds, and most are somewhere in between. But these are all statements about lexicographic conventions and say nothing about the languages. In reality all languages are muddles sprawling across a multidimensional continuum, and they abso-frigging-lutely do n't sit neatly in such pigeonholes.

Wobbles42•14m ago
This is a great comparison. We're arguing about the definition of "word", and attempting to expand it to include edge cases where two words with separate meanings have a different atomic meaning when combined.

We could have a similar debate about whether common suffixes and prefixes should be regarded as individual words.

Much like "planets" don't really exist as a separate natural object, words don't really exist in natural languages. They are artificial concepts, and therefore we will always have edge cases.

I would argue that it is still a useful discussion, as it sheds light on the nature of language (or of celestial bodies), even if the definitions defy the same rigour as mathematical concepts.
