Show HN: Semantic Calculator (king-man+woman=?)

https://calc.datova.ai
86•nxa•6h ago
I've been playing with embeddings and wanted to try out what results the embedding layer will produce based on just word-by-word input and addition / subtraction, beyond what many videos / papers mention (like the obvious king-man+woman=queen). So I built something that doesn't just give the first answer, but ranks the matches based on distance / cosine similarity. I polished it a bit so that others can try it out, too.

For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.
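
For readers who haven't seen the technique, here is a minimal sketch of word arithmetic with ranked matches. It is not the site's code: the three-dimensional vectors, the `EMBEDDINGS` table, and the `cosine`, `evaluate`, and `rank` helpers are all invented for illustration, and a real tool would look up precomputed model embeddings instead.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# The axes are roughly (royalty, maleness, femaleness).
EMBEDDINGS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.7, 0.7, 0.1],
    "apple":  [0.0, 0.1, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def evaluate(terms):
    """Sum signed word vectors, e.g. [(+1, 'king'), (-1, 'man'), (+1, 'woman')]."""
    dim = len(next(iter(EMBEDDINGS.values())))
    result = [0.0] * dim
    for sign, word in terms:
        # A KeyError here corresponds to a word missing from the dictionary.
        for i, x in enumerate(EMBEDDINGS[word]):
            result[i] += sign * x
    return result

def rank(terms, top_k=3):
    """Return the top_k (word, similarity) pairs closest to the expression."""
    target = evaluate(terms)
    scored = [(cosine(target, vec), word) for word, vec in EMBEDDINGS.items()]
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:top_k]]

# With these toy vectors, 'queen' ranks first.
print(rank([(+1, "king"), (-1, "man"), (+1, "woman")]))
```

Showing the whole ranked list rather than only the top hit makes it easier to see how close the runners-up are.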

Comments

antidnan•6h ago
Neat! Reminds me of infinite craft

https://neal.fun/infinite-craft/

thaumasiotes•3h ago
I went to look at infinite craft.

It provides a panel filled with slowly moving dots. Right of the panel, there are objects labeled "water", "fire", "wind", and "earth" that you can instantiate on the panel and drag around. As you drag them, the background dots, if nearby, will grow lines connecting to them. These lines are not persistent.

And that's it. Nothing ever happens, there are no interactions except for the lines that appear while you're holding the mouse down, and while there is notionally a help window listing the controls, the only controls are "select item", "delete item", and "duplicate item". There is also an "about" panel, which contains no information.

n2d4•3h ago
In the panel, you can drag one of the items (eg. Water) onto another one (eg. Earth), and it will create a new word (eg. Plant). It uses AI, so it goes very deep
thaumasiotes•3h ago
No, that was the first thing I tried. The only thing that happens is that the two objects will now share their location. There are no interactions.
n2d4•3h ago
Probably a bug then, you can check YouTube to find videos of people playing it (eg. [0])

[0] https://youtu.be/8-ytx84lUK8

firejake308•6h ago
King-man+woman=Navratilova, who is apparently a Czech tennis player. Apparently, it's very case-sensitive. Cool idea!
fph•5h ago
"King" (capital) probably was interpreted as https://en.wikipedia.org/wiki/Billie_Jean_King , that's why a tennis player showed up.
nxa•5h ago
when I first tried it, king was referring to the instrument and I was getting a result king-man+woman=flute ... :-D
BeetleB•5h ago
Heh. This is fun:

Navratilova - woman + man = Lendl

nikolay•5h ago
Really?!

  man - brain = woman
  woman - brain = businesswoman
2muchcoffeeman•5h ago
Man - brain = Irish sea
nikolay•5h ago
Case matters, obviously! Try "man" with a lower-case "M"!
Alifatisk•5h ago
Why does case matter? How does it affect the meaning?
bfLives•5h ago
“Man” is probably being interpreted as the Isle of Man.

https://en.m.wikipedia.org/wiki/Isle_of_Man

G1N•5h ago
Man (capital M) is probably being interpreted as some proper noun, maybe Isle of Man in this case?
karel-3d•5h ago
woman+penis=newswoman (businesswoman is second)

man+vagina=woman (ok that is boring)

sapphicsnail•5h ago
Telling that Jewess, feminist, and spinster were near matches as well.
nxa•5h ago
I probably should have prefaced this with "try at your own risk, results don't reflect the author's opinions"
dmonitor•3h ago
I'm sure it would be trivial to get it to say something incredibly racist, so that's probably a worthwhile disclaimer to put on the website
dalmo3•5h ago
I think subtraction is broken. None of what I tried made any sense. Water - oxygen = gin and tonic.
adzm•5h ago
noodle+tomato=pasta

this is pretty fun

growlNark•5h ago
Surely the correct answer would be `pasta-in-tomato-sauce`? Pasta exists outside of tomato sauce.
cabalamat•5h ago
What does it mean when it surrounds a word in red? Is this signalling an error?
nxa•5h ago
Yes, word in red = word not found. That's mostly the case when you try plurals or non-nouns (for now).
rpastuszak•5h ago
This is neat!

I think you need to disable auto-capitalisation because on mobile the first word becomes uppercase and triggers a validation error.

iambateman•5h ago
Try lowercasing; my phone tried to capitalize and it was a problem.
fallinghawks•5h ago
Seems to be a word not in its dictionary. Seems to not have any country or language names.

Edit: these must be capitalized to be recognized.

zerof1l•5h ago
male + age = female

female + age = male

G1N•5h ago
twelve-ten+five=

six (84%)

Close enough I suppose

lightyrs•5h ago
I don't get it but I'm not sure I'm supposed to.

    life + death = mortality
    life - death = lifestyle

    drug + time = occasion
    drug - time = narcotic

    art + artist + money = creativity
    art + artist - money = muse

    happiness + politics = contentment
    happiness + art      = gladness
    happiness + money    = joy
    happiness + love     = joy
grey-area•5h ago
Does the system you’re querying ‘get it’? From the answers it doesn’t seem to understand these words or their relations. Once in a while it’ll hit on something that seems to make sense.
bee_rider•4h ago

    Life + death = mortality  
is pretty good IMO, it is a nice blend of the concepts in an intuitive manner. I don’t really get

   drug + time = occasion
But

   drug - time = narcotic
Is kind of interesting; one definition of narcotic is

> a drug (such as opium or morphine) that in moderate doses dulls the senses, relieves pain, and induces profound sleep but in excessive doses causes stupor, coma, or convulsions

https://www.merriam-webster.com/dictionary/narcotic

So we can see some element of losing time in that type of drug. I guess? Maybe I’m anthropomorphizing a bit.

woodruffw•5h ago
colorless+green+ideas doesn't produce anything of interest, which is disappointing.
dmonitor•3h ago
well green is not a creative color, so that's to be expected
skeptrune•5h ago
This is super fun. Offering the ranked matches makes it significantly more engaging than just showing the final result.
spindump8930•5h ago
First off, this interface is very nice and a pleasure to use, congrats!

Are you using word2vec for these, or embeddings from another model?

I also wanted to add some flavor since it looks like many folks in this thread haven't seen something like this - it's been known since 2013 that we can do this (but it's great to remind folks especially with all the "modern" interest in NLP).

It's also known (in some circles!) that a lot of these vector arithmetic things need some tricks to really shine. For example, excluding the words already present in the query[1]. Others in this thread seem surprised at some of the biases present - there's also a long history of work on that [2,3].

[1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...

[2] https://arxiv.org/abs/1905.09866

[3] https://arxiv.org/abs/1903.03862
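
To illustrate the exclusion trick mentioned above, here is a small sketch with made-up vectors (the `VECS` table and the `nearest` helper are hypothetical, not taken from any of the linked work). When "man" and "woman" sit close together, king - man + woman lands very near "king" itself, so the query words have to be filtered out before "queen" can win.

```python
import math

# Invented toy vectors; 'man' and 'woman' are deliberately close together.
VECS = {
    "king":   [1.0, 0.1, 0.8],
    "man":    [0.1, 0.3, 0.9],
    "woman":  [0.1, -0.1, 0.9],
    "queen":  [0.6, -0.6, 1.0],
    "prince": [0.9, 0.3, 0.6],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(plus, minus, exclude_query_words):
    """Nearest word to sum(plus) - sum(minus), optionally skipping the inputs."""
    dim = len(next(iter(VECS.values())))
    target = [0.0] * dim
    for word in plus:
        target = [t + x for t, x in zip(target, VECS[word])]
    for word in minus:
        target = [t - x for t, x in zip(target, VECS[word])]
    skip = set(plus) | set(minus) if exclude_query_words else set()
    candidates = [(cosine(target, vec), word) for word, vec in VECS.items() if word not in skip]
    return max(candidates)[1]

print(nearest(["king", "woman"], ["man"], exclude_query_words=False))  # king
print(nearest(["king", "woman"], ["man"], exclude_query_words=True))   # queen
```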

nxa•5h ago
Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.

The dictionary is based on https://wordnet.princeton.edu/, no word2vec. It's just a plain lookup among precomputed embeddings (with mxbai-embed-large). And yes, I'm excluding words that are present in the query.

It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).

kaycebasques•4h ago
(Question for anyone) how could I go about replicating this with Gemini Embedding? Generate and store an embedding for every word in the dictionary?
nxa•4h ago
Yes, that's pretty much what it is. Watch out for homographs.
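
A rough sketch of that precompute-and-store step, under loud assumptions: `fake_embed` is a deterministic stand-in for whatever embedding API you call (it is not Gemini's or any real client), the `DICTIONARY` entries and glosses are invented, and embedding the word together with its gloss is just one possible way to keep homographs apart.

```python
import hashlib

def fake_embed(text, dim=8):
    """Placeholder for a real embedding API call; returns a deterministic pseudo-vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]

# Hypothetical dictionary entries. One spelling can carry several senses,
# and capitalisation distinguishes proper nouns from common nouns.
DICTIONARY = [
    {"word": "king", "sense": 1, "gloss": "a male sovereign; ruler of a kingdom"},
    {"word": "King", "sense": 1, "gloss": "a tennis player with the surname King"},
    {"word": "bass", "sense": 1, "gloss": "the lowest part of the musical range"},
    {"word": "bass", "sense": 2, "gloss": "a kind of freshwater or marine fish"},
]

def build_index(entries, embed=fake_embed):
    """Embed every (word, sense) once and keep the vectors for later lookups."""
    index = {}
    for entry in entries:
        key = (entry["word"], entry["sense"])
        # Embedding the gloss with the word is an assumption made here
        # so that homographs end up with different vectors.
        index[key] = embed(f'{entry["word"]}: {entry["gloss"]}')
    return index

def lookup(index, word):
    """Return every stored sense for an exact, case-sensitive spelling."""
    return {key: vec for key, vec in index.items() if key[0] == word}

index = build_index(DICTIONARY)
print(sorted(lookup(index, "bass")))  # two senses for one spelling
```

At query time the tool then has to choose which sense to use, for example the most common one.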
7373737373•5h ago
it doesn't know the word human
grey-area•5h ago
As you might expect from a system with knowledge of word relations but without understanding or a model of the world, this generates gibberish which occasionally sounds interesting.
fallinghawks•5h ago
goshawk-cocaine = gyrfalcon , which is funny if you know anything about goshawks and gyrfalcons

(Goshawks are very intense, gyrs tend to be leisurely in flight.)

kataqatsi•5h ago
garden + sin = gardening

hmm...

MYEUHD•5h ago
king - man + woman = queen

queen - woman + man = drone

bee_rider•4h ago
The second makes sense, I think, if you are a bee.
blobbers•5h ago
rice + fish = fish meat

rice + fish + raw = meat

hahaha... I JUST WANT SUSHI!

godelski•5h ago

  data + plural = number
  data - plural = research
  king - crown = (didn't work... crown gets circled in red)
  king - princess = emperor
  king - queen = kingdom
  queen - king = worker
  king + queen = queen + king = kingdom
  boy + age = (didn't work... boy gets circled in red)
  man - age = woman
  woman - age = newswoman
  woman + age = adult female body (tied with man)
  girl + age = female child
  girl + old = female child
The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high dimensional spaces). This is still quite an unsolved problem and seems a pretty critical one to resolve that doesn't get enough attention.
Affric•4h ago
Yeah I did similar tests and got similar results.

Curious tool but not what I would call accurate.

n2d4•3h ago
For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:

   data + plural    = datasets
   data - plural    = datum
   king - crown     = ruler
   king - princess  = man
   king - queen     = prince
   queen - king     = woman
   king + queen     = royalty
   boy + age        = man
   man - age        = boy
   woman - age      = girl
   woman + age      = elderly woman
   girl + age       = woman
   girl + old       = grandmother

The results are surprisingly good, I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP! Although it does show how generic LLMs can solve some tasks better than traditional NLP.

The prompt I used:

> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:

nbardy•3h ago
I hate to be pedantic, but the llm is definitely doing embedding math. In fact that’s all it does.
franga2000•3h ago
This is an LLM approximating a semantic calculator, based solely on trained-in knowledge of what that is and probably a good amount of sample output, yet somehow beating the results of a "real" semantic calculator. That's crazy!

The more I think about it the less surprised I am, but my initial thoughts were quite simply "no way" - surely an approximation of an NLP model made by another NLP model can't beat the original, but the LLM training process (and data volume) is just so much more powerful I guess...

CamperBob2•2h ago
This is basically the whole idea behind the transformer. Attention is much more powerful than embedding alone.
refulgentis•1h ago
...welcome to ChatGPT, everyone! If you've been asleep since...2022?

(some might say all an LLM does is embeddings :)

gweinberg•3h ago
I got a bunch of red stuff also. I imagine the author cached embeddings for some words but not really all that many to save on credits. I gave it mermaid - woman and got merman, but when I tried to give it boar + woman - man or ram + woman - man, it turns out it has never heard of rams or boars.
thatguysaguy•3h ago
Can you elaborate on what the unsolved problem you're referring to is?
mathgradthrow•2h ago
Distance is extremely well defined in high dimensional spaces. That isn't the problem.
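
For context on this exchange: distances remain perfectly well defined in any dimension, but one effect commonly filed under "curse of dimensionality" is that distances between random points concentrate, so the nearest and farthest neighbours look almost equally far away. The sketch below shows only that effect on uniformly random points; `relative_contrast` is a name made up here, and real embeddings are not uniformly random, so this is an illustration rather than a diagnosis of the tool.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows.
for dim in (2, 20, 500):
    print(dim, relative_contrast(dim))
```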
ericdiao•5h ago
Interesting: parent + male = female (83%)

Can not personally find the connection here, was expecting father or something.

ericdiao•5h ago
Though dad is in the list with lower confidence (77%).

High-dimensional vectors are always hard to explain. This is an example.

TZubiri•4h ago
I'm getting Navratilova instead of queen. And can't get other words to work, I get red circles or no answer at all.
gus_massa•4h ago
From another comment, https://news.ycombinator.com/item?id=43988861 King (with capital K) was a top 1 male tennis player.
nxa•4h ago
This might be helpful: I haven't implemented it in the UI, but from the API response you can see what the word definitions are, both for the input and the output. If the output has homographs, likeliness is split per definition, but the UI only shows the best one.

Also, if it gets buried in comments, proper nouns need to be capitalized (Paris-France+Germany).

I am planning on patching up the UI based on your feedback.

ericdiao•4h ago
wine - alcohol = grape juice (32%)

Accurate.

afandian•4h ago
There was a site like this a few years ago (before all the LLM stuff kicked off) that had this and other NLP functionality. Styling was grey and basic. That’s all I remember.

I’ve been unable to find it since. Does anyone know which site I’m thinking of?

halter73•4h ago
I'm not sure this is old enough, but could you be referencing https://neal.fun/infinite-craft/ from https://news.ycombinator.com/item?id=39205020?
montebicyclelo•4h ago
> king-man+woman=queen

Is the famous example everyone uses when talking about word vectors, but is it actually just very cherry picked?

I.e. are there a great number of other "meaningful" examples like this, or do you actually end up, the majority of the time, with some kind of vaguely tangentially related word when adding and subtracting word vectors?

(Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)

(Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)

raddan•4h ago
> is it actually just very cherry picked?

100%

gregschlom•4h ago
Also, as I just learned the other day, the result was never equal, just close to "queen" in the vector space.
charcircuit•36m ago
And queen isn't even the closest.
mcswell•31m ago
What is the closest?
Retr0id•3h ago
I think it's slightly uncommon for the vectors to "line up" just right, but here are a few I tried:

actor - man + woman = actress

garden + person = gardener

rat - sewer + tree = squirrel

toe - leg + arm = digit

groby_b•3h ago
I think it's worth keeping in mind that word2vec was specifically trained on semantic similarity. Most embedding APIs don't really give a lick about the semantic space

And, worse, most latent spaces are decidedly non-linear. And so arithmetic loses a lot of its meaning. (IIRC word2vec mostly avoided nonlinearity except for the loss function). Yes, the distance metric sort-of survives, but addition/multiplication are meaningless.

(This is also the reason choosing your embedding model is a hard-to-reverse technical decision - you can't just transform existing embeddings into a different latent space. A change means "reembed all")

jbjbjbjb•2h ago
Well when it works out it is quite satisfying

India - Asia + Europe = Italy

Japan - Asia + Europe = Netherlands

China - Asia + Europe = Soviet-Union

Russia - Asia + Europe = European Russia

calculation + machine = computer

bee_rider•1h ago
Hmm, well I got

    cherry - picker = blackwood
if that helps.
jumploops•4h ago
This is super neat.

I built a game[0] along similar lines, inspired by infinite craft[1].

The idea is that you combine (or subtract) “elements” until you find the goal element.

I’ve had a lot of fun with it, but it often hits the same generated element. Maybe I should update it to use the second (third, etc.) choice, similar to your tool.

[0] https://alchemy.magicloops.app/

[1] https://neal.fun/infinite-craft/

ezbie•4h ago
Can someone explain to me what the fuck this is supposed to be!?
mhitza•3h ago
Semantic subtraction within an embedding representation of text ("meaning")
matallo•4h ago
uncle + aunt = great-uncle (91%)

great idea, but I find the results unamusing

HWR_14•3h ago
Your aunt's uncle is your great-uncle. It's more correct than your intuition.
matallo•3h ago
I asked ChatGPT (after posting my comment) and this is the response. "Uncle + Aunt = Great-Uncle is incorrect. A great-uncle is the brother of your grandparent."
lcnPylGDnU4H9OF•4h ago
Some of these make more sense than others (and bookshop is hilarious even if it's only the best answer by a small margin; no shade to bookshop owners).

  map - legend = Mercator projection
  noodle - wheat = egg noodle
  noodle - gluten = tagliatelle
  architecture - calculus = architectural style
  answer - question = comment
  shop - income = bookshop
  curry - curry powder = cuisine
  rice - grain = chicken and rice
  rice + chicken = poultry
  milk + cereal = grain
  blue - yellow = Fiji
  blue - Fiji = orange
  blue - Arkansas + Bahamas + Florida - Pluto = Grenada
kylecazar•4h ago
Woman + president = man
tlhunter•3h ago
man + woman = adult female body
__MatrixMan__•3h ago
Here's a challenge: find something to subtract from "hammer" which does not result in a word that has "gun" as a substring. I've been unsuccessful so far.
neom•3h ago
if I'm allowed only 1 something, I can't find anything either, if I'm allowed a few somethings, "hammer - wine - beer - red - child" will get you there. Guessing given that a gun has a hammer and is also a tool, it's too heavily linked in the small dataset.
tough•3h ago
hammer + man = adult male body (75%)
rdlw•2h ago
Close, that's addition
Retr0id•3h ago
Well that's easy, subtract "gun" :P
mrastro•3h ago
The word "gun" itself seems to work. Package this as a game and you've got a pretty fun game on your hands :)
downboots•3h ago
Bullet
aniviacat•2h ago
Gun related stuff works: bullet, holster, barrel

Other stuff that works: key, door, lock, smooth

Some words that result in "flintlock": violence, anger, swing, hit, impact

soxfox42•2h ago
hammer - red = lock
neom•3h ago
cool but not enough data to be useful yet I guess. Most of mine either didn't have the words or were a few % off the answer, vehicle - road + ocean gave me hydrosphere, but the other options below were boat, ship, etc. Klimt almost made it from Mozart - music + painting. doctor - hospital + school = teacher, nailed it.

Getting to cornbread elegantly has been challenging.

downboots•3h ago
three + two = four (90%)
LadyCailin•3h ago
Haha, yes, this was my first thought too. It seems it’s quite bad at actual math!
yigitkonur35•3h ago
shows how bad embeddings are in a practical way
rdlw•3h ago
I've always wondered if there's a way to find which vectors are most important in a model like this. The gender vector man-woman or woman-man is the one always used in examples, since English has many gendered terms, but I wonder if it's possible to generate these pairs given the data. Maybe to list all differences of pairs of vectors, and see if there are any clusters. I imagine some grammatical features would show up, like the plurality vector people-person, or the past tense vector walked-walk, but maybe there would be some that are surprisingly common but don't seem to map cleanly to an obvious concept.

Or maybe they would all be completely inscrutable and man-woman would be like the 50th strongest result.
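
One way this could be tried, sketched on invented data: take the difference vector of every word pair, then greedily group differences that point in (nearly) the same or exactly opposite direction. The `WORDS` table, the 0.95 threshold, and the `cluster_differences` helper are all assumptions for illustration; on a real vocabulary the number of pairs grows quadratically, so some sampling or approximate search would be needed.

```python
import math
from itertools import combinations

# Invented 4-d vectors; axes are (gender, person, royalty, acting).
WORDS = {
    "man":     [1, 1, 0, 0],
    "woman":   [-1, 1, 0, 0],
    "king":    [1, 1, 1, 0],
    "queen":   [-1, 1, 1, 0],
    "actor":   [1, 1, 0, 1],
    "actress": [-1, 1, 0, 1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cluster_differences(words, threshold=0.95):
    """Greedily group word pairs whose difference vectors are nearly parallel."""
    clusters = []  # each cluster: (representative_vector, [pairs])
    for (a, va), (b, vb) in combinations(words.items(), 2):
        diff = [x - y for x, y in zip(va, vb)]
        for rep, pairs in clusters:
            # abs() treats a-b and b-a as the same direction.
            if abs(cosine(diff, rep)) >= threshold:
                pairs.append((a, b))
                break
        else:
            clusters.append((diff, [(a, b)]))
    clusters.sort(key=lambda c: len(c[1]), reverse=True)
    return [pairs for _, pairs in clusters]

# The largest cluster here is the gender direction.
print(cluster_differences(WORDS)[0])
```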

Jimmc414•2h ago
dog - cat = paleolith

paleolith + cat = Paleolithic Age

paleolith + dog = Paleolithic Age

paleolith - cat = neolith

paleolith - dog = hand ax

cat - dog = meow

Wonder if some of the math is off or I am not using this properly

downboots•2h ago
mathematics - Santa Claus = applied mathematics

hacker - code = professional golf

quantum_state•2h ago
The app produces nonsense ... such as quantum - superposition = quantum theory !!!
nxa•2h ago
artificial intelligence - bullsh*t = computer science (34%)
behnamoh•1h ago
This. I'm tired of so many "it's over, shocking, game changer, it's so over, we're so back" announcements that turn out to be just gpt-wrappers or resume-builder projects.

Very few papers that actually say something meaningful are left unnoticed, but as soon as you say something generic like "language models can do this", it gets featured in "AI influencer" posts.

galaxyLogic•1h ago
What about starting with the result and finding set of words that when summed together give that result?

That could be seen as trying to find the true "meaning" of a word.
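
A naive sketch of that reverse direction, assuming a greedy search is good enough: repeatedly pick the vocabulary word whose vector, once subtracted, leaves the smallest residual, and stop when nothing improves. The `VOCAB` vectors, the `QUEEN` target, and the `decompose` helper are invented for illustration; on real embeddings the decomposition would rarely be this clean or unique.

```python
import math

# Invented toy vectors; axes are (gender, person, royalty, acting).
VOCAB = {
    "man":     [1.0, 1.0, 0.0, 0.0],
    "woman":   [-1.0, 1.0, 0.0, 0.0],
    "royalty": [0.0, 0.0, 1.0, 0.0],
    "actor":   [1.0, 1.0, 0.0, 1.0],
}
QUEEN = [-1.0, 1.0, 1.0, 0.0]  # target vector; the word itself is left out of VOCAB

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def decompose(target, vocab, max_terms=3, tol=1e-9):
    """Greedily choose words whose vectors sum as closely as possible to target."""
    residual = list(target)
    chosen = []
    while len(chosen) < max_terms and norm(residual) > tol:
        candidates = [
            (norm(sub(residual, vec)), word)
            for word, vec in vocab.items()
            if word not in chosen
        ]
        if not candidates:
            break
        best_norm, best_word = min(candidates)
        if best_norm >= norm(residual):
            break  # no remaining word brings us closer
        chosen.append(best_word)
        residual = sub(residual, vocab[best_word])
    return chosen, norm(residual)

print(decompose(QUEEN, VOCAB))  # (['woman', 'royalty'], 0.0)
```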

GrantMoyer•1h ago
These are pretty good results. I messed around with a dumber and more naive version of this a few years ago[1], and it wasn't easy to get sensible output most of the time.

[1]: https://github.com/GrantMoyer/word_alignment

e____g•27m ago
man - intelligence = woman (36%)

woman + intelligence = man (77%)

Oof.

maxcomperatore•10m ago
Just use an LLM API to generate results, it will be far better and more accurate than a weird home-cooked algorithm
