Problems I’m running into:
- Metadata pickle file loads entirely into RAM
- No incremental indexing — have to rebuild the FAISS index from scratch
- Query performance degrades with concurrent use
- Want to scale to 1M+ chunks but not sure FAISS + pickle is the right long-term architecture
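For context on the pickle problem: one pattern I've seen is keeping metadata in SQLite keyed by the same integer id the vector index uses, so only the rows a query actually hits are read into RAM. A minimal sketch (table schema and field names are illustrative, not my actual code):

```python
# Hypothetical sketch: chunk metadata in SQLite instead of one big pickle,
# keyed by the same integer id used in the vector index.
import sqlite3

con = sqlite3.connect("metadata.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id INTEGER PRIMARY KEY,   -- matches the vector id in the index
        source TEXT,
        text TEXT
    )
""")
con.executemany(
    "INSERT OR REPLACE INTO chunks (id, source, text) VALUES (?, ?, ?)",
    [(0, "manual.pdf", "First chunk ..."), (1, "faq.md", "Second chunk ...")],
)
con.commit()

# After a vector search returns hit ids, fetch just those rows.
hit_ids = [1]
placeholders = ",".join("?" * len(hit_ids))
rows = con.execute(
    f"SELECT id, source, text FROM chunks WHERE id IN ({placeholders})",
    hit_ids,
).fetchall()
```

The whole 1.5GB never has to be resident; lookups are O(hits), not O(corpus).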
My questions for those who’ve scaled local or offline RAG systems:
- How do you store metadata efficiently at this scale?
- Is there a practical pattern for incremental FAISS updates?
- Would a vector DB (Qdrant, Weaviate, Milvus) be a better fit for offline use?
- Any lessons learned from running large FAISS indexes on consumer hardware?
Not looking for product feedback — just architectural guidance from people who’ve built similar systems.
andre-z•2mo ago
paul2495•2mo ago
One thing I'm exploring now is Qdrant in embedded mode, since the tool has to run in fully air-gapped environments (no internet, no external services, distributed on a portable SSD). The embedded version stores everything in a simple file-based directory, similar to SQLite:
```python
from qdrant_client import QdrantClient

client = QdrantClient(path="./qdrant_data")  # local-only, no server
```

If that model works reliably, it would solve several problems FAISS creates for my use case:
- incremental updates instead of full index rebuilds
- storing metadata as payloads instead of a 1.5GB pickle
- much easier filtering (e.g., per-source, per-customer, per-tool)
- better concurrency under load
I’m still benchmarking, but curious about your experience: Have you used Qdrant’s embedded mode in a production/offline scenario? And if so, how does it behave with larger collections (500k–1M vectors) on consumer hardware?
Not dismissing FAISS — just trying to pick the right long-term architecture for an offline tool that gets updated via USB and needs to stay lightweight for the end user.