Anitag2vec – learning vector embeddings from unordered tag sets

https://github.com/michael-0acf4/anitag2vec
1•michael-0acf4•1h ago

Comments

michael-0acf4•1h ago
This is similar in spirit to Deep Sets (learning embeddings over unordered collections), but instead of a permutation-invariant MLP it uses a small Transformer encoder without positional encoding. In practice this helps capture things like spelling variation and co-occurrence structure between tags, since tag sets are usually much better behaved than generic sets.
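To illustrate why dropping the positional encoding gives order independence, here is a minimal stdlib-only sketch. It is not the anitag2vec architecture: `embed_tag`, `self_attention`, `embed_set`, and `DIM` are hypothetical names, the per-tag vectors are fixed pseudo-random stand-ins for learned embeddings, and the attention layer uses identity projections. Self-attention without positions is permutation-equivariant, so mean-pooling its output should give the same vector for any ordering of the tags.

```python
import math
import random

DIM = 8  # toy embedding size, chosen only for illustration


def embed_tag(tag: str) -> list[float]:
    """Hypothetical stand-in for a learned tag embedding."""
    rng = random.Random(tag)  # string seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors: list[list[float]]) -> list[list[float]]:
    """Single-head attention, identity Q/K/V projections, no positional encoding."""
    scale = math.sqrt(DIM)
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(DIM)])
    return out


def embed_set(tags: list[str]) -> list[float]:
    """Embed each tag, let tags attend to each other, then mean-pool."""
    attended = self_attention([embed_tag(t) for t in tags])
    n = len(attended)
    return [sum(v[i] for v in attended) / n for i in range(DIM)]


a = embed_set(["cat", "blue_sky", "smile"])
b = embed_set(["smile", "cat", "blue_sky"])
# Same set, different order: the pooled vectors agree up to float rounding.
print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)))
```

Adding a positional encoding to the output of `embed_tag` before attention would break this property, since each tag's vector would then depend on where it appears in the list.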

The approach itself is very general; however, the available models were trained on imageboard and anime-focused tags. I did some experiments with the clearly biased models on completely nonsensical tags, and they still performed fairly well. This suggests some latent understanding of permutation invariance: cosine scores decrease, but not significantly, when one tag set is a subset of the other.
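A check like the one described above could be sketched as follows. This does not use the actual anitag2vec API: `invariance_report` is a hypothetical helper that accepts any `encode` callable, and `toy_encode` is a placeholder encoder (a mean of fixed pseudo-random per-tag vectors) included only so the snippet runs on its own. The numbers it prints say nothing about the real models.

```python
import math
import random
from typing import Callable, Sequence

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def invariance_report(encode: Callable[[Sequence[str]], Vector],
                      tags: Sequence[str], seed: int = 0) -> dict[str, float]:
    """Compare a tag set against a shuffled copy and against a subset of itself."""
    shuffled = list(tags)
    random.Random(seed).shuffle(shuffled)
    subset = list(tags[: max(1, len(tags) // 2)])  # first half as the subset
    full = encode(tags)
    return {
        "shuffled": cosine(full, encode(shuffled)),
        "subset": cosine(full, encode(subset)),
    }


def toy_encode(tags: Sequence[str], dim: int = 16) -> Vector:
    """Placeholder encoder (NOT anitag2vec): mean of per-tag pseudo-random vectors."""
    vecs = []
    for tag in tags:
        rng = random.Random(tag)  # deterministic per tag
        vecs.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


# Deliberately nonsensical tags, as in the experiment described above.
report = invariance_report(toy_encode, ["zxqv", "blorp", "nnngh", "quux"])
print(report)
```

To run it against a real model, replace `toy_encode` with a function that wraps the model's own encoding call and returns a plain list of floats.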

More links:
- https://huggingface.co/michael-0acf4/anitag2vec
- https://blog.afmichael.dev/posts/2026/set-embeddings-and-ani...