Most vector search tools I’ve seen focus on one type of input, like text or images.
But in practice, a lot of real-world data lives in tables with a mix of text, numbers, and categorical fields.
I ran into this while trying to find similar stocks, tools, and products across multiple fields (e.g., description, sector, P/E ratio).
Text-only search gave poor results. Naive feature concat didn’t work either.
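The post doesn't say exactly why naive concatenation failed. One plausible culprit (an assumption, not a diagnosis taken from the package) is scale: when raw columns are glued into one vector, a column like market cap in dollars swamps everything else, so cosine similarity barely sees the other fields. The toy values below are hypothetical.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical rows: [P/E ratio, market cap in USD]
value_stock = np.array([8.0, 2.0e11])
growth_stock = np.array([95.0, 2.1e11])

# Raw concatenation: market cap dwarfs P/E, so the two rows look nearly identical
# even though their P/E ratios differ by more than 10x.
raw_sim = cosine(value_stock, growth_stock)

# Z-score each column over a small hypothetical table before comparing.
table = np.vstack([value_stock, growth_stock, np.array([30.0, 5.0e9])])
z = (table - table.mean(axis=0)) / table.std(axis=0)
scaled_sim = cosine(z[0], z[1])

print(f"raw: {raw_sim:.4f}  standardized: {scaled_sim:.4f}")
```

After standardization, the P/E gap actually shows up in the similarity score instead of being rounded away.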
So I built a small Python package, hybrid-vectorizer.
It handles mixed-column similarity search using block-wise embeddings + cosine similarity.
No training required. Just plug in your tabular data and run.
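A minimal sketch of the block-wise idea follows. It is not hybrid-vectorizer's actual API or implementation; every function name and weight here is hypothetical. Assumptions: bag-of-words counts stand in for a real text-embedding model, categoricals are one-hot encoded, and numeric columns are z-scored. Each block is unit-normalized and scaled by the square root of its weight before concatenation, so the cosine of two combined rows equals the weighted average of their per-block cosines. No block can dominate just because it has more dimensions or bigger numbers.

```python
import re
import numpy as np

def _unit_rows(m: np.ndarray) -> np.ndarray:
    """L2-normalize each row; all-zero rows are left as zeros."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def text_block(texts):
    """Bag-of-words counts (assumed stand-in for a sentence-embedding model)."""
    tokens = [re.findall(r"[a-z0-9]+", t.lower()) for t in texts]
    vocab = {w: i for i, w in enumerate(sorted({w for ts in tokens for w in ts}))}
    m = np.zeros((len(texts), len(vocab)))
    for r, ts in enumerate(tokens):
        for w in ts:
            m[r, vocab[w]] += 1
    return m

def categorical_block(values):
    """One-hot encode a single categorical column."""
    cats = {c: i for i, c in enumerate(sorted(set(values)))}
    m = np.zeros((len(values), len(cats)))
    for r, v in enumerate(values):
        m[r, cats[v]] = 1.0
    return m

def numeric_block(columns):
    """Z-score each numeric column so no column dominates by scale.
    Note: with a single numeric column, row-normalization later reduces it to a sign."""
    m = np.column_stack([np.asarray(c, dtype=float) for c in columns])
    std = m.std(axis=0)
    return (m - m.mean(axis=0)) / np.where(std == 0, 1.0, std)

def combine(blocks, weights):
    """Unit-normalize each block, scale by sqrt(weight), concatenate, renormalize.
    For rows with no all-zero block, the dot product of two output rows equals
    sum(w_b * cos_b) / sum(w_b), a weighted average of per-block cosines."""
    parts = [np.sqrt(w) * _unit_rows(b) for b, w in zip(blocks, weights)]
    return _unit_rows(np.hstack(parts))

def most_similar(vectors, query_idx, k=2):
    """Indices of the k rows most cosine-similar to the query row, excluding itself."""
    sims = vectors @ vectors[query_idx]
    order = [int(i) for i in np.argsort(-sims) if i != query_idx]
    return order[:k]

# Hypothetical toy table: description + sector + P/E + market cap
descriptions = [
    "cloud software and enterprise data platform",
    "enterprise cloud software subscriptions",
    "regional bank offering loans and deposits",
    "community bank with consumer loans",
]
sectors = ["Technology", "Technology", "Financials", "Financials"]
pe = [60.0, 55.0, 11.0, 9.0]
market_cap = [3.0e11, 1.2e11, 2.0e10, 4.0e9]

vectors = combine(
    [text_block(descriptions), categorical_block(sectors), numeric_block([pe, market_cap])],
    weights=[0.5, 0.25, 0.25],  # hypothetical per-block weights
)
print(most_similar(vectors, query_idx=0, k=1))
```

The per-block weights are the main knob in this sketch. Raising the text weight favors semantic overlap in descriptions, while raising the numeric weight favors rows with similar valuations.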
Some use cases it supports:
Similar stocks → description + sector + P/E + market cap
Similar movies → plot + crew + year + ratings
Similar tools → task + specs + geometry
Would love feedback or thoughts if you’ve struggled with something similar.
PyPI: https://pypi.org/project/hybrid-vectorizer/