This is pretty cool. Here's another question: how much language compression would we get if we collapsed all related words into a single synonymous word?
Here's what ChatGPT came up with:
Assume an English-like active vocabulary V = 50,000 word types (a rough stand-in for “distinct words” seen commonly). A realistic guess: ~30% reduction for a more aggressive, embedding-style collapse in typical English text, i.e. collapsing words with similar meaning directions in vector space: happy, glad, pleased, delighted → happy
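For anyone who wants to poke at the idea, here is a minimal sketch of the collapse step. It assumes a tiny hand-written synonym table (`SYNONYM_GROUPS`, hypothetical) instead of real embedding clusters, and it measures the drop in distinct word types on one toy sentence. It says nothing about whether the ~30% figure holds for real text.

```python
# Toy synonym clusters. Hypothetical, hand-written stand-in for clusters
# that would otherwise come from grouping nearby word embeddings.
SYNONYM_GROUPS = [
    ["happy", "glad", "pleased", "delighted"],
    ["big", "large", "huge"],
]


def build_canonical_map(groups):
    """Map every word in a group to the group's first word."""
    return {word: group[0] for group in groups for word in group}


def collapse(tokens, canonical):
    """Replace each token by its canonical synonym, if it has one."""
    return [canonical.get(token, token) for token in tokens]


def vocab_reduction(tokens, canonical):
    """Fraction of distinct word types removed by the collapse."""
    before = len(set(tokens))
    after = len(set(collapse(tokens, canonical)))
    return 1 - after / before


text = "i was glad and she was delighted to see the huge dog and the large cat"
tokens = text.split()
canonical = build_canonical_map(SYNONYM_GROUPS)
collapsed = collapse(tokens, canonical)
reduction = vocab_reduction(tokens, canonical)

print(" ".join(collapsed))
print(f"distinct types: {len(set(tokens))} -> {len(set(collapsed))}")
print(f"type reduction: {reduction:.1%}")
```

Note this counts word types (vocabulary size), not the length of the text; the token count is unchanged because each word is swapped one for one.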
Nzen•3h ago
If you want to see examples of this in practice, I recommend reading Randall Munroe's Thing Explainer [0] or some Simple English Wikipedia articles [1].
There's a nice utopian book about a world where they do this. They then even remove the comparatives and superlatives, to have for example "plushappy". And with the language controlled and simplified in this manner, everything is doubleplusgood forever.
namanyayg•1h ago
Had me at the first half! One of my favorite mind-blowing books, which I had the pleasure of reading during my senior year of HS.
tantalor•3h ago
But lots of words have multiple definitions
nph278•3h ago
Pick whichever works best.
pugworthy•1h ago
Then you get handed words like stenohaline...
jihadjihad•3h ago
congress [0]
> collection of non governors really exhibiting self service
this is the type of important work that transformer LLMs are actually really good at, I think
orangecat•2h ago
Can confirm, Claude is quite good at this. "Intelligence" -> "Inner neural thinking enables learning, logic, insight, grasping, evaluating, navigating challenges efficiently".
JKCalhoun•1h ago
Hub for Analysis, Code, Knowledge, Engineering, Research, and Noteworthy Emerging Web Stories
dwrensha•1h ago
highly adept computer knowers explaining recent network exploits while sitting
clueless•3h ago
Nzen•3h ago
[0] https://xkcd.com/thing-explainer/
[1] https://simple.wikipedia.org/wiki/Rabbit (versus https://en.wikipedia.org/wiki/Rabbit)
falcor84•1h ago