Word2vec-style vector arithmetic on docs embeddings

https://technicalwriting.dev/embeddings/arithmetic/index.html

81•kaycebasques•3mo ago

Comments

aDyslecticCrow•3mo ago

There have been quite a few writing tools that are effectively just GPT wrappers with pre-defined prompts. "rephrase this more formally". Personally I find them to modify too much or are difficult to use effectively. Asking a for a few different rephrasings and then merging it myself ends up being my workflow.

But ever since learning about word2vec, I've been thinking that there must be a better way. "Push" a section in with the formal vector a bit. Add a pinch of "brief", dial up the "humour" vector. I think it could create a very controllable and efficient writing tool.

acmiyaguchi•3mo ago

This does exist to some degree, as far as I understand, along the lines of style-transfer and ControlNet in visual domains. Anthropic has some research called "persona vectors" which effectively push generative behaviors toward or away from particular traits.

[0] https://www.anthropic.com/research/persona-vectors [1] https://arxiv.org/abs/2507.21509

aDyslecticCrow•3mo ago

That's a fascinating paper you linked. A step further than the OP article.

Not quite a usable commercial writing tool like i want, but it shows that extracting and applying a vector of a concept to the embedding is very useful.

Its also a potentially a very effective AI alignment tool like anthropic mentioned. Steering or restricting the model embedding loop instead of convincing it with a convoluted system prompt.

nostrebored•3mo ago

> How do we actually use this in technical writing workflows or documentation experiences? I’m not sure. I was just curious to learn whether or not it would work.

There are a few easy applications.

* When surfacing relevant documents, you can keep a list of the previous documents visited and boost in the "direction" that the customer is headed (could be an average of the previous N docs or weight towards frequency). But then you're just building a worse recsys for something where latency probably isn't that critical.

* If you know for every feature you release, you need an API doc, an FAQ, usage samples for different workflows or verticals you're targetting, you can represent each of these as f(doc) + f(topic) and find the existing doc set. But then, you can have much more deterministic workflows from just applying structure.

It's nice that you have a super flexible tool in the toolbox, but I think a lot of text based embedding applications (especially on out of domain data like long, unchunked technical docs) are just better off being something else if you have the time.

kaycebasques•3mo ago

> If you know for every feature you release, you need an API doc, an FAQ, usage samples for different workflows or verticals you're targetting, you can represent each of these as f(doc) + f(topic) and find the existing doc set. But then, you can have much more deterministic workflows from just applying structure.

This one sounds promising to me, thanks for the suggestion. We technical writers often build out "docs completeness" spreadsheets where we track how completely each product feature is covered, exactly as you described. E.g. the rows are features, column B is "Reference", column C is "Tutorial" etc. So cell B1 would contain the deeplink to the reference for some particular feature. When we inherit a huge, messy docs set (which is fairly common) it can take a very long time to build out a docs completeness dashboard. I think the embeddings workflow you're suggesting could speed up the initial population of these dashboards a lot.

seg_lol•3mo ago

You can probably do this in a day with a CLI based LLM like Claude Code. It can write the tools that would allow you to sort, test and cross check your doc sets.

jdthedisciple•3mo ago

Intriguing! This inspired me to run the example "calculation" ("king" - "man" + "woman") against several well-known embedding models and order them by L2 distance between the actual output and the embedding for "queen". Result:

    voyage-3-large:             0.54
    voyage-code-3:              0.62
    qwen3-embedding:4b:         0.71
    embeddinggemma:             0.84
    voyage-3.5-lite:            0.94
    text-embedding-3-small:     0.97
    voyage-3.5:                 1.01
    text-embedding-3-large:     1.13

Shocked by the apparently bad performance of OpenAI's SOTA model. Also always had a gut feeling that `voyage-3-large` secretly may be the best embedding model out there. Have I been vindicated? Make of it what you will ...

Also `qwen3-embedding:4b` is my current favorite for local RAG for good reason...

smallerize•3mo ago

That never exactly worked for word2vec either. https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...

kaycebasques•3mo ago

From the linked article:

> The widely known example only works because the implementation of the algorithm will exclude the original vector from the possible results!

I saw this issue in the "same topic, different domain" experiment when using EmbeddingGemma with the default task types. But when using custom task types, the vector arithmetic worked as expected. I didn't have to remove the original vector from the results or control for that in any way. So while the criticism is valid for word2vec I'm skeptical that modern embedding models still have this issue.

Very curious to learn whether modern models are still better at some analogies (e.g. male/female) and worse at others, though. Is there any more recent research on that topic? The linked article is from 2019.

gojomo•3mo ago

Not sure you can judge whether these modern models do well on the 'arithmetic analogization' task based on absolute similarity values – & especially L2 distances.

That it ever worked was simply that, among the universe of candidate answers, the right answer was closer to the arithmetic-result-point than other candidates – not necessarily close on any absolute scale. Especially in higher dimensions, everything gets very angularly far from everything else - the "curse of dimensionality".

But the relative differences may still be just as useful/effective. So the real evaluation of effectiveness can't be done with the raw value diff(king-man+woman, queen) alone. It needs to check if that value is less than that for every other alternative to 'queen'.

(Also: canonically these exercises were done as cosine-similarities, not Euclidean/L2 distance. Rank orders will be roughly the same if all vectors normalized to the unit sphere before arithmetic & comparisons, but if you didn't do that, it would also make these raw 'distance' values less meaningful for evaluating this particular effect. The L2 distance could be arbitrarily high for two vectors with 0.0 cosine-difference!)

jdthedisciple•3mo ago

> It needs to check if that value is less than that for every other alternative to 'queen'.

There you go: Closest 3 words (by L2) to the output vector for the following models, out of the most common 2265 spoken English words among which is also "queen":

    voyage-3-large:             king (0.46), woman (0.47), young (0.52), ... queen (0.56)
    ollama-qwen3-embedding:4b:  king (0.68), queen (0.71), woman (0.81)
    text-embedding-3-large:     king (0.93), woman (1.08), queen (1.13)

All embeddings are normalized to unit length, therefore L2 dists are normalized.

gojomo•3mo ago

Thanks!

So of those 3, despite the superficially "large" distances, 2 of the 3 are just as good at this particular analogy as Google's 2013 word2vec vectors, in that 'queen' is the closest word to the target, when query-words ('king', 'woman', 'man') are disqualified by rule.

But also: to really mimic the original vector-math and comparison using L2 distances, I believe you might need to leave the word-vectors unnormalized before the 'king'-'man'+'woman' calculation – to reflect that the word-vectors' varied unnormalized magnitudes may have relevant translational impact – but then ensure the comparison of the target-vector to all candidates is between unit-vectors (so that L2 distances match the rank ordering of cosine-distances). Or, just copy the original `word2vec.c` code's cosine-similarity-based calculations exactly.

Another wrinkle worth considering, for those who really care about this particular analogical-arithmetic exercise, is that some papers proposed simple changes that could make word2vec-era (shallow neural network) vectors better for that task, and the same tricks might give a lift to larger-model single-word vectors as well.

For example:

- Levy & Goldberg's "Linguistic Regularities in Sparse and Explicit Word Representations" (2014), suggesting a different vector-combination ("3CosMul")

- Mu, Bhat & Viswanath's "All-but-the-Top: Simple and Effective Postprocessing for Word Representations" (2017), suggesting recentering the space & removing some dominant components

jdthedisciple•3mo ago

Interesting papers, thanks.

> you might need to leave the word-vectors unnormalized before the 'king'-'man'+'woman' calculation – to reflect that the word-vectors' varied unnormalized magnitudes may have relevant translational impact

I believe translation should be scale-invariant, and scale should not affect rank ordering

gojomo•3mo ago

> I believe translation should be scale-invariant, and scale should not affect rank ordering

I don't believe this is true with regard to ending angles after addition steps between vectors of varying magnitudes.

Imagine just in 2D: vector A at 90° & magnitude 1.0, vector B at 0° & magnitude 0.5, and vector B' at 0° but normalized to magnitude 1.0.

The vectors (A+B) and (A+B') will be at both different magnitudes and different directions.

Thus, cossim(A,(A+B')) will be notably less than cossim(A,(A+B)), and more generally, if imagining the whole unit circles as filled with candidate nearest-neighbors, (A+B) and (A+B') may have notably different ranked lists of cosine-similarity nearest-neighbors.

jdthedisciple•3mo ago

You are totally right of course!

It had slipped my (tired) mind that vector magnitudes are actually discarded in embedding model training.

thornton•3mo ago

We’ve done similar work. Use case was identifying pages in an old website that now 404 and where they should be redirected to.

Basically doc2vec and cosine similarity. Totally nonsensical matching outputs to the point matching on title tag vectors or precis was better so now I’m curious if we just did something wrong…

gojomo•3mo ago

If by 'doc2vec' you mean the word2vec-like 'Paragraph Vectors' technique: even though that's a far simpler approach than the transformer embeddings, it usually works pretty well for coarse document similarity. Even the famous word2vec vector-addition operations kinda worked, as illustrated by some examples in the followup 'Paragraph Vector' paper in 2015: https://arxiv.org/abs/1507.07998

So if for you the resulting doc-to-doc similarities seemed nonsensical, there was likely some process error in model training or application.

Claude Code Controller

Software design is now cheap

Show HN: Are You Random? – A game that predicts your "random" choices

Poland to probe possible links between Epstein and Russia

Effectiveness of AI detection tools in identifying AI-generated articles

Warsaw Circle

Reverse Engineering Raiders of the Lost Ark for the Atari 2600

The AI4Agile Practitioners Report 2026

Digital Independence Day

What a bot hacking attempt looks like: SQL injections galore

Show HN: FlashMesh – An encrypted file mesh across Google Drive and Dropbox

Show HN: AgentLens – Open-source observability and audit trail for AI agents

Show HN: ShipClaw – Deploy OpenClaw to the Cloud in One Click

Unlock the Power of Real-Time Google Trends Visit: Www.daily-Trending.org

Explanation of British Class System

Show HN: Jwtpeek – minimal, user-friendly JWT inspector in Go

Willow – Protocols for an uncertain future [video]

Feedback on a client-side, privacy-first PDF editor I built

Clay Christensen's Milkshake Marketing (2011)

Show HN: WeaveMind – AI Workflows with human-in-the-loop

Show HN: Seedream 5.0: free AI image generator that claims strong text rendering

A contributor trust management system based on explicit vouches

Show HN: Analyzing 9 years of HN side projects that reached $500/month

The Floating Dock for Developers

Arcan Explained – A browser for different webs

We are not scared of AI, we are scared of irrelevance

Quartz Crystals

Show HN: I built a free dictionary API to avoid API keys

Show HN: Kybera – Agentic Smart Wallet with AI Osint and Reputation Tracking

Show HN: brew changelog – find upstream changelogs for Homebrew packages

Claude Code Controller

Software design is now cheap

Show HN: Are You Random? – A game that predicts your "random" choices

Poland to probe possible links between Epstein and Russia

Effectiveness of AI detection tools in identifying AI-generated articles

Warsaw Circle

Reverse Engineering Raiders of the Lost Ark for the Atari 2600

The AI4Agile Practitioners Report 2026

Digital Independence Day

What a bot hacking attempt looks like: SQL injections galore

Show HN: FlashMesh – An encrypted file mesh across Google Drive and Dropbox

Show HN: AgentLens – Open-source observability and audit trail for AI agents

Show HN: ShipClaw – Deploy OpenClaw to the Cloud in One Click

Unlock the Power of Real-Time Google Trends Visit: Www.daily-Trending.org

Explanation of British Class System

Show HN: Jwtpeek – minimal, user-friendly JWT inspector in Go

Willow – Protocols for an uncertain future [video]

Feedback on a client-side, privacy-first PDF editor I built

Clay Christensen's Milkshake Marketing (2011)

Show HN: WeaveMind – AI Workflows with human-in-the-loop

Show HN: Seedream 5.0: free AI image generator that claims strong text rendering

A contributor trust management system based on explicit vouches

Show HN: Analyzing 9 years of HN side projects that reached $500/month

The Floating Dock for Developers

Arcan Explained – A browser for different webs

We are not scared of AI, we are scared of irrelevance

Quartz Crystals

Show HN: I built a free dictionary API to avoid API keys

Show HN: Kybera – Agentic Smart Wallet with AI Osint and Reputation Tracking

Show HN: brew changelog – find upstream changelogs for Homebrew packages

Word2vec-style vector arithmetic on docs embeddings

Comments