Problems I’m running into:
- Metadata pickle file loads entirely into RAM
- No incremental indexing — have to rebuild the FAISS index from scratch
- Query performance degrades with concurrent use
- Want to scale to 1M+ chunks but not sure FAISS + pickle is the right long-term architecture
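For context on the pickle problem: one pattern I've seen is keeping metadata in SQLite keyed by the same integer id the vector index uses, so only the rows a query actually hits are read into RAM. A minimal sketch (table schema and field names are illustrative, not my actual code):

```python
# Hypothetical sketch: chunk metadata in SQLite instead of one big pickle,
# keyed by the same integer id used in the vector index.
import sqlite3

con = sqlite3.connect("metadata.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id INTEGER PRIMARY KEY,   -- matches the vector id in the index
        source TEXT,
        text TEXT
    )
""")
con.executemany(
    "INSERT OR REPLACE INTO chunks (id, source, text) VALUES (?, ?, ?)",
    [(0, "manual.pdf", "First chunk ..."), (1, "faq.md", "Second chunk ...")],
)
con.commit()

# After a vector search returns hit ids, fetch just those rows.
hit_ids = [1]
placeholders = ",".join("?" * len(hit_ids))
rows = con.execute(
    f"SELECT id, source, text FROM chunks WHERE id IN ({placeholders})",
    hit_ids,
).fetchall()
```

The whole 1.5GB never has to be resident; lookups are O(hits), not O(corpus).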
My questions for those who’ve scaled local or offline RAG systems:
- How do you store metadata efficiently at this scale?
- Is there a practical pattern for incremental FAISS updates?
- Would a vector DB (Qdrant, Weaviate, Milvus) be a better fit for offline use?
- Any lessons learned from running large FAISS indexes on consumer hardware?
Not looking for product feedback — just architectural guidance from people who’ve built similar systems.
andre-z•2mo ago
paul2495•2mo ago
One thing I'm exploring now is Qdrant in embedded mode, since the tool has to run in fully air-gapped environments (no internet, no external services, distributed on a portable SSD). The embedded version stores everything in a simple file-based directory, similar to SQLite:
```python
from qdrant_client import QdrantClient

client = QdrantClient(path="./qdrant_data")  # local-only, no server
```

If that model works reliably, it would solve several problems FAISS creates for my use case:
- incremental updates instead of full index rebuilds
- storing metadata as payloads instead of a 1.5GB pickle
- much easier filtering (e.g., per-source, per-customer, per-tool)
- better concurrency under load
I’m still benchmarking, but curious about your experience: Have you used Qdrant’s embedded mode in a production/offline scenario? And if so, how does it behave with larger collections (500k–1M vectors) on consumer hardware?
Not dismissing FAISS — just trying to pick the right long-term architecture for an offline tool that gets updated via USB and needs to stay lightweight for the end user.