OP here. I spent the last few months building a local semantic search engine for my personal archive (1997-2025).
The Problem: I have 28 years of digital history (journals, emails, notes) that I wanted to query to find patterns, but I didn't want to upload sensitive data to a cloud vector store.
The Stack:
Ingestion: Python scripts to parse chaotic formats (mbox, docx, json).
Embeddings: Local FAISS index.
Inference: Running Qwen 2.5 (32b) via Ollama on MBP. App on my NAS via Tailscale.
UI: Simple React frontend.
Full Architecture Notes: I wrote up the detailed breakdown (chunking strategy, PII redaction pipeline) here: https://botwork.com/trace
botwork•1h ago
The Problem: I have 28 years of digital history (journals, emails, notes) that I wanted to query to find patterns, but I didn't want to upload sensitive data to a cloud vector store.
The Stack:
Ingestion: Python scripts to parse chaotic formats (mbox, docx, json).
Embeddings: Local FAISS index.
Inference: Running Qwen 2.5 (32b) via Ollama on MBP. App on my NAS via Tailscale.
UI: Simple React frontend.
Full Architecture Notes: I wrote up the detailed breakdown (chunking strategy, PII redaction pipeline) here: https://botwork.com/trace