What's especially nice about that approach is that you can hang each of the embeddings off the same rows in the db and tune how their scores are blended at query time.
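A minimal sketch of that query-time blend, in Python with toy vectors and made-up model names, just to show the shape of it:

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def blended_score(query_vecs: dict, doc_vecs: dict, weights: dict) -> float:
    # One similarity per embedding model, mixed with query-time weights.
    return sum(w * cosine(query_vecs[m], doc_vecs[m]) for m, w in weights.items())

# Toy vectors standing in for two models hung off the same row.
query = {"prose": np.array([0.1, 0.9]), "code": np.array([0.8, 0.2])}
doc = {"prose": np.array([0.2, 0.8]), "code": np.array([0.9, 0.1])}

# Lean on the code-tuned embedding for a code-heavy query.
print(blended_score(query, doc, weights={"prose": 0.3, "code": 0.7}))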
If you haven't tried it yet: what you're searching is presumably standardized enough that there are sprawling glossaries of acronyms, and processing those into custom word lists will boost scores. Go a little further and build little graphs/maps of them all and it pays off doubly: you get 'free' autocomplete, plus the ability to specify on the query side which specific acronym(s) you meant or don't want.
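The query side of that can be as simple as expanding (or pinning) acronyms before search; the glossary entries and helper here are made up, but it's the pattern I mean:

# Hypothetical glossary distilled from the corpus's own acronym lists.
GLOSSARY = {
    "pr": ["pull request", "public relations", "purchase requisition"],
    "sla": ["service level agreement"],
}

def expand_query(query: str, pinned: dict | None = None) -> list:
    # `pinned` lets the user say which expansion they meant,
    # which effectively excludes the others.
    pinned = pinned or {}
    variants = [query]
    for token in query.lower().split():
        if token in GLOSSARY:
            expansions = [pinned[token]] if token in pinned else GLOSSARY[token]
            variants += [query.lower().replace(token, e) for e in expansions]
    return variants

print(expand_query("open PR backlog", pinned={"pr": "pull request"}))

And prefix-matching the same glossary keys is what gets you the 'free' autocomplete.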
I've recently been playing around with these for some code + prose + extracted-prose + records semantic search stuff; it's a fun rabbit hole.
Thanks for the article though.
It's built on Postgres, which I know you said you left behind, but one of the cool features it supports is hybrid search over multiple vector representations of a passage: you can run a dense (e.g. Nomic) and a sparse (e.g. SPLADE) search side by side. Reranking is also built in, although it lacks automatic caching (since, in general, the corpus changes over time).
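For anyone who hasn't wired up hybrid search before: reciprocal rank fusion is one standard way to merge the dense and sparse result lists without having to calibrate their raw scores against each other (not claiming this is what it does internally, it's just the common trick):

def rrf(ranked_lists: list, k: int = 60) -> list:
    # Reciprocal rank fusion: merge ranked doc-id lists; no score calibration needed.
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["d7", "d2", "d9"]   # e.g. Nomic cosine order
sparse_hits = ["d2", "d5", "d7"]  # e.g. SPLADE order
print(rrf([dense_hits, sparse_hits]))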
It also deploys to fly.io/railway and costs a few bucks a month to run if you're willing to use cloud-hosted embedding models (otherwise, you can run TEI/vLLM on CPU or GPU for the setup you described).
I hope it's helpful to someone.
We support both commercial APIs and self-hosted options:
- Cohere (rerank-english-v3.0, etc.)
- Voyage AI (rerank-2.5)
- Jina AI (jina-reranker-v3)
Self-hosted (no API key needed):
- TEI - https://github.com/huggingface/text-embeddings-inference
- vLLM - https://docs.vllm.ai/en/v0.8.1/serving/openai_compatible_server.html#rerank-api
You register a reranker once with the CLI:

# Cohere
goodmem reranker create \
--display-name "Cohere" \
--provider-type COHERE \
--endpoint-url "https://api.cohere.com" \
--model-identifier "rerank-english-v3.0" \
--cred-api-key "YOUR_API_KEY"
# Self-hosted TEI (e.g., BAAI/bge-reranker-v2-m3)
goodmem reranker create \
--display-name "TEI Local" \
--provider-type TEI \
--endpoint-url "http://localhost:8081" \
--model-identifier "BAAI/bge-reranker-v2-m3"
Then you can experiment interactively through the TUI:

goodmem memory retrieve \
--space-id <your-space> \
--post-processor-interactive \
"your query"
For your setup, I think TEI is probably the path of least resistance; it has first-class reranker support and runs well on CPU.
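If you go the TEI route, reranking is a single HTTP call; a sketch in Python, assuming the reranker from the registration example above is serving on :8081:

import requests

resp = requests.post(
    "http://localhost:8081/rerank",
    json={"query": "how do I rotate api keys", "texts": ["doc one ...", "doc two ..."]},
)
# Each entry pairs an input index with a relevance score.
for hit in sorted(resp.json(), key=lambda h: h["score"], reverse=True):
    print(hit["index"], hit["score"])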
For text search, I'm using lnx, which is built on Tantivy. I disabled the vector search feature for now, but I'll re-enable it after some optimization. The site is at https://stray.video
gomoboo•1w ago
I’m leaning on OpenAI for my embedding needs but will be trying llama-server in the future. I stuck with Postgres because it was easy to run on my Dokku installation. Great to know SQLite is an option there too. My corpus is too small for Postgres to elect to use an index, so it’s running the same full table scans SQLite would. For seeding I use a msgpack file and ship it with the code when deploying.
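The seeding step looks roughly like this (a Python stand-in for my actual C#, and the record layout here is illustrative):

import msgpack
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

# Hypothetical seed layout: (id, text, embedding) records precomputed at build
# time and shipped with the deploy, so nothing gets re-embedded at boot.
with open("seed.msgpack", "rb") as f:
    rows = msgpack.unpack(f)

with psycopg.connect("dbname=app") as conn:
    register_vector(conn)  # lets us pass numpy arrays into the vector column
    for doc_id, text, emb in rows:
        conn.execute(
            "INSERT INTO docs (id, body, embedding) VALUES (%s, %s, %s) "
            "ON CONFLICT (id) DO NOTHING",
            (doc_id, text, np.array(emb, dtype=np.float32)),
        )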
This is my site: https://customelon.com (niche need of tariff and excise information for shipping to The Bahamas)
It’s built with ASP.NET, Postgres/pgvector, and OpenAI embeddings/LLMs. Ingestion is via Textract, with a lot of chunking helpers layered on top to preserve context.
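By chunking helpers I mean things like prefixing every chunk with its heading path so it still "knows" where it came from after retrieval; simplified, and in Python rather than my actual C#:

def chunk_with_context(sections: list, max_chars: int = 1200) -> list:
    # sections: (heading_path, body) pairs; each chunk carries its heading path.
    chunks = []
    for heading, body in sections:
        for start in range(0, len(body), max_chars):
            chunks.append(f"{heading}\n\n{body[start:start + max_chars]}")
    return chunks

print(chunk_with_context([("Tariff Schedule > Chapter 22 > Beer", "Rate: 35%. ...")]))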
Again, great article.
cckolon•5d ago
Cool site :)