Every provider has its own UI (or none), debugging embeddings is guesswork, and migrating data between systems is painful. I wanted a single tool where I could browse, search, visualize, and compare vector data across providers.
So I built Vector Inspector.
It currently supports:
- Chroma
- Qdrant (local + server)
- Postgres/pgvector
- Pinecone (partial support)
You can browse collections, inspect metadata, run searches, compare distances, visualize embeddings, and debug cases where a vector “should” match but doesn’t. The goal is to make it feel like a forensic tool for vector data — something that helps you understand what your embeddings are actually doing.
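To make the "should match but doesn't" case concrete, here is a small standalone sketch (plain numpy, not Vector Inspector code) of one common culprit: the query and the stored vectors disagree on normalization, so dot-product and Euclidean rankings diverge from cosine rankings.

```python
# Standalone illustration (not Vector Inspector code): how normalization
# and metric choice can make an "obvious" match rank poorly.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.6, 0.8])           # unit-length query vector
doc_a = np.array([0.6, 0.8]) * 0.1     # same direction, but never normalized
doc_b = np.array([0.8, 0.6])           # different direction, unit length

# Under cosine, doc_a is the perfect match (1.0 vs 0.96)...
print("cosine:    a =", cosine(query, doc_a), " b =", cosine(query, doc_b))
# ...but under dot product or Euclidean distance, doc_b wins.
print("dot:       a =", float(query @ doc_a), " b =", float(query @ doc_b))
print("euclidean: a =", float(np.linalg.norm(query - doc_a)),
      " b =", float(np.linalg.norm(query - doc_b)))
```

That is exactly the kind of mismatch the tool is meant to make visible rather than leave you guessing about.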
There’s an OSS tier (Vector Inspector) and a more advanced version (Vector Studio) with upcoming features like clustering overlays, model-to-model comparison, and provenance coloring.
One of the biggest problems I kept hitting was missing provenance. You load a collection and you have no idea:
- what model produced these vectors
- whether they were normalized
- whether some vectors came from a different model entirely
- whether the source text was cleaned or chunked differently
Without that context, debugging is almost impossible. Vector Inspector tries to make provenance a first-class concept: if the metadata exists, it shows it; if it’s missing, it makes that visible too, so you can actually debug your embeddings instead of guessing.
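To show what I mean by first-class provenance, here is a minimal sketch of attaching it at write time, using Chroma's Python client. The metadata keys (embedding_model, normalized, chunk_size, source_hash) are an illustrative convention of mine, not a schema Vector Inspector requires.

```python
# Sketch: record provenance alongside each vector so it can be inspected later.
# The key names below are an illustrative convention, not a required schema.
import chromadb

client = chromadb.Client()  # in-memory Chroma, just for the example
collection = client.create_collection(name="docs")

embedding = [0.12, -0.03, 0.87]  # pretend this came from your embedding model

collection.add(
    ids=["doc-1#chunk-0"],
    embeddings=[embedding],
    documents=["First chunk of the source document..."],
    metadatas=[{
        "embedding_model": "text-embedding-3-small",  # which model produced this vector
        "embedding_dim": len(embedding),
        "normalized": True,           # was the vector L2-normalized before storage?
        "chunk_size": 512,            # how the source text was chunked
        "source_hash": "sha256:...",  # ties the vector back to the exact input text
    }],
)
```

With metadata like that in place, a collection where embedding_model differs between rows, or where normalized is simply absent, stops being a guessing game: the inspector can show you the gap directly.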
I’d love feedback from the HN crowd — especially around:
- workflows you’d want for multi-provider setups
- what’s missing for real debugging
- how you’d expect migrations to work
- any pain points you’ve hit with embeddings or vector DBs
- how you'd like creation workflows (embedding and re-embedding data) to be handled
Repo: https://github.com/anthonypdawson/vector-inspector
Landing page: https://vector-inspector.divinedevops.com
- Provider support is solid for Chroma, Qdrant, and Postgres/pgvector. Pinecone works for most read workflows but isn't at full parity yet.
- The tool is designed to be “forensic first”: surfacing metadata, provenance, and mismatches rather than hiding them behind abstractions.
- Visualization is intentionally minimal right now; clustering overlays and model-to-model comparison are in progress.
- I’m especially interested in how people think about creation workflows (re-embedding, mixed-model collections, reproducibility, etc.) since teams handle this very differently.
Just to set expectations: it’s basically been me running it so far. PyPI has been getting a lot of traffic, but real-world usage is still very small. I’m really curious how it behaves with other people’s data and workflows — that feedback is incredibly helpful at this stage.
If you hit anything confusing, missing, or surprising, I’d love to hear it. Real-world debugging stories are gold for shaping the next set of features.