Built this using NexaAI's local VLM (Qwen3-VL) + semantic embeddings. How it works:
1. VLM generates a natural-language description of each image (one-time processing)
2. Descriptions are converted to 384D embeddings
3. Search by meaning using cosine similarity (<100ms for 1000 images; sketch below)
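For the curious, steps 2-3 boil down to a few lines. A minimal sketch, assuming a 384D sentence encoder like all-MiniLM-L6-v2 (the exact model and the helper names here are illustrative, not pinned by the project):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384D embeddings

    def build_index(descriptions):
        """Embed each VLM-generated description once (dict: image path -> text)."""
        paths = list(descriptions)
        vecs = model.encode([descriptions[p] for p in paths],
                            normalize_embeddings=True)
        return paths, vecs

    def search(query, paths, vecs, k=5):
        """On unit vectors, cosine similarity is just a dot product."""
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = vecs @ q
        top = np.argsort(-scores)[:k]
        return [(paths[i], float(scores[i])) for i in top]

Normalizing at encode time means search is a single matrix-vector product, which is why it stays under 100ms at this scale.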
Everything runs locally. Your photos never leave your device. Zero API costs.
Technical reality: CPU processing is slow (~20-30s per image on the first pass), but search is instant afterward. A flat JSON file works fine as the vector store for personal collections; you'd want FAISS for 10k+ images (sketch below).
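If you do outgrow JSON, the swap is small. A rough sketch using an exact inner-product index, so results match the cosine search above (function names are mine, not from the project):

    import faiss
    import numpy as np

    def build_faiss(vecs):
        """Exact inner-product index; equals cosine similarity on normalized vectors."""
        index = faiss.IndexFlatIP(vecs.shape[1])
        index.add(vecs.astype(np.float32))
        return index

    def faiss_search(index, q, k=5):
        """Returns (row_id, score) pairs for the top-k matches."""
        scores, ids = index.search(q.astype(np.float32).reshape(1, -1), k)
        return list(zip(ids[0].tolist(), scores[0].tolist()))

IndexFlatIP is still brute force, just much faster than Python loops; past ~100k vectors you'd move to an approximate index like HNSW.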
It's a prototype, but it works and solves a real problem I had. Open to feedback.
Built for NexaAI's Builder Bounty Program (on-device AI, privacy-first).
Demo video: https://youtu.be/YVkPa-aJpEo
Medium writeup: https://medium.com/@pankajgoyal4152/building-smart-photo-fin...