We seem to have turned "grep with semantics" into a microservices architecture problem.
What this is: local_faiss_mcp is a minimal Model Context Protocol (MCP) server that wraps FAISS and sentence-transformers. It runs entirely locally (no API keys, no external services) and connects to Claude Desktop via stdio.
How it works:
You run server.py (Claude Desktop launches it automatically via its config; see the config sketch after this list).
It uses all-MiniLM-L6-v2 (on CPU) to embed text.
It stores the vectors in a flat FAISS index on disk alongside a JSON metadata file.
It exposes two tools to the LLM: ingest_document and query_rag_store (rough sketch below).
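To make that concrete, here’s a minimal sketch of the shape of the server, using the SDK’s FastMCP helper. The tool names match the ones above, but the file paths, metadata layout, and function bodies are simplified stand-ins rather than code lifted from the repo:

```python
import json
from pathlib import Path

import faiss
from mcp.server.fastmcp import FastMCP
from sentence_transformers import SentenceTransformer

# NOTE: illustrative paths/schema; the repo's actual layout may differ.
INDEX_PATH = Path("rag.index")     # flat FAISS index on disk
META_PATH = Path("rag_meta.json")  # JSON sidecar mapping vector id -> text

model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
DIM = model.get_sentence_embedding_dimension()  # 384 for MiniLM-L6-v2

# Load the persisted index/metadata, or start fresh
index = faiss.read_index(str(INDEX_PATH)) if INDEX_PATH.exists() else faiss.IndexFlatL2(DIM)
meta = json.loads(META_PATH.read_text()) if META_PATH.exists() else []

mcp = FastMCP("local_faiss")

@mcp.tool()
def ingest_document(text: str) -> str:
    """Embed a document and persist it to the local FAISS store."""
    index.add(model.encode([text]))            # encode() returns float32, shape (1, DIM)
    meta.append(text)
    faiss.write_index(index, str(INDEX_PATH))  # flush index and metadata to disk
    META_PATH.write_text(json.dumps(meta))
    return f"stored document #{index.ntotal - 1}"

@mcp.tool()
def query_rag_store(query: str, k: int = 3) -> list[str]:
    """Return the k nearest stored documents for a query."""
    if index.ntotal == 0:
        return []
    _, ids = index.search(model.encode([query]), min(k, index.ntotal))
    return [meta[i] for i in ids[0] if i != -1]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, which is what Claude Desktop speaks
```

The flat index is an exact brute-force scan, which is the whole point: no IVF/HNSW tuning, no extra dependencies, at the cost of O(n) search per query.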
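Hooking it into Claude Desktop is then just one entry in claude_desktop_config.json, something like this (server name and path are placeholders):

```json
{
  "mcpServers": {
    "local_faiss": {
      "command": "python",
      "args": ["/path/to/local_faiss_mcp/server.py"]
    }
  }
}
```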
The stack:
Python
mcp (Python SDK)
faiss-cpu
sentence-transformers
It’s intended for personal workflows (notes, logs, specs) where you want persistent memory for an agent without the infrastructure overhead.
Repo: https://github.com/nonatofabio/local_faiss_mcp
I’d love feedback on the implementation: specifically, ideas for handling the chunking logic better without bloating the dependencies, and whether anyone runs into performance issues with larger indices (10k+ vectors).
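For context on the chunking question, the kind of dependency-free approach I have in mind is roughly this (a sketch, not necessarily what ingest_document does today):

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> list[str]:
    """Hypothetical helper: naive fixed-window chunker with overlap."""
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back off to the last space so we don't cut mid-word
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = max(end - overlap, start + 1)  # keep some overlap between chunks
    return [c for c in chunks if c]
```

Each chunk would then be embedded and ingested as its own vector.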