I built a macOS desktop app that lets you chat with your documents completely locally.
No cloud, no API keys, no data leaving your machine. Everything runs offline on Apple Silicon using GGUF models and llama.cpp.
What it does:
- Upload PDFs, text files, and images
- OCR for images and scanned PDFs (ingestion sketch below)
- Local embeddings + retrieval (RAG)
- Chat with documents using a local LLM
- Models are downloaded on first run and stored locally
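For anyone curious what the ingestion side of a pipeline like this looks like, here is a minimal sketch: plain text extraction for text-based PDFs, OCR for images, and naive fixed-size chunking. The library choices (pypdf, pytesseract) and the chunk size are illustrative assumptions, not necessarily what the app ships with.

    # Minimal ingestion sketch: extract text, OCR images, split into chunks.
    # pypdf / pytesseract / chunk sizes are illustrative choices only.
    from pypdf import PdfReader
    from PIL import Image
    import pytesseract

    def extract_text(path: str) -> str:
        if path.lower().endswith(".pdf"):
            reader = PdfReader(path)
            return "\n".join(page.extract_text() or "" for page in reader.pages)
        if path.lower().endswith((".png", ".jpg", ".jpeg")):
            return pytesseract.image_to_string(Image.open(path))
        with open(path, encoding="utf-8", errors="ignore") as f:
            return f.read()

    def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
        # Overlapping fixed-size chunks so retrieval doesn't cut answers mid-thought.
        return [text[i:i + size] for i in range(0, len(text), size - overlap)]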
Tech stack:
- Electron (frontend)
- Python backend bundled as a native binary
- llama.cpp + GGUF (currently Gemma / Mistral class models)
- SentenceTransformers for embeddings
- FAISS for vector search (retrieval sketch below)
- Runs entirely on-device (CPU / Metal)
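The core retrieve-then-generate loop on this stack fits in a few lines with SentenceTransformers, FAISS, and llama-cpp-python. The embedding model, GGUF file name, top-k, and prompt format below are assumptions for illustration, not the app's exact configuration.

    # Retrieval + generation sketch: embed chunks, index with FAISS,
    # retrieve top-k for a question, and answer with a local GGUF model.
    import numpy as np
    import faiss
    from sentence_transformers import SentenceTransformer
    from llama_cpp import Llama

    chunks = ["Termination requires 30 days written notice.", "Payment is due net 15."]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    emb = embedder.encode(chunks, normalize_embeddings=True)
    index = faiss.IndexFlatIP(emb.shape[1])             # inner product = cosine on normalized vectors
    index.add(np.asarray(emb, dtype="float32"))

    llm = Llama(
        model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # assumed local GGUF path
        n_ctx=4096,
        n_gpu_layers=-1,  # offload to Metal on Apple Silicon
    )

    question = "What does the contract say about termination?"
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), 2)
    context = "\n\n".join(chunks[i] for i in ids[0])

    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    out = llm(prompt, max_tokens=256)
    print(out["choices"][0]["text"])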
Why I built this: I wanted a privacy-first alternative to cloud document chat tools. Packaging a full local LLM + OCR + RAG pipeline into a single macOS app turned out to be much harder than expected (Gatekeeper, PyInstaller, dylibs, model size, etc.).
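On the packaging side, the general approach is to freeze the Python backend into a single binary that Electron spawns. A hedged sketch using PyInstaller's programmatic API is below; the entry point and --collect-all targets are hypothetical, and a real build needs extra work for dylib paths and keeping the multi-GB models out of the bundle.

    # Sketch of freezing the Python backend with PyInstaller.
    # "backend/main.py" and the --collect-all targets are illustrative only.
    import PyInstaller.__main__

    PyInstaller.__main__.run([
        "backend/main.py",                       # hypothetical entry point
        "--onefile",
        "--name", "document-chat-backend",
        "--collect-all", "llama_cpp",            # bundle llama.cpp shared libraries
        "--collect-all", "sentence_transformers",
    ])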
Download: GitHub release (macOS Apple Silicon): https://github.com/navid72m/chatbot/releases/tag/v.0.1.2
Note on macOS security: Because the app is not signed yet, macOS may block it on first launch. You can run:

    xattr -rd com.apple.quarantine "/Applications/Document Chat.app"
I’d really appreciate feedback on:
- UX for document chat
- Model choices / performance
- How others approach local RAG on desktop
Happy to answer technical questions.