But v1 re-searched raw chunks from scratch every query. So I rebuilt it.
v2 (mcptube-vision) follows Karpathy's LLM Wiki pattern. At ingest time, it extracts transcripts, detects scene changes with ffmpeg, describes key frames via a vision model, and writes structured wiki pages. Knowledge compounds across videos rather than being re-discovered. FTS5 + a two-stage agent (narrow then reason) for retrieval.
MCPTube works both as CLI (BYOK) and MCP server. I tested MCPTube with Claude Code, Claude Desktop, VS Code Copilot, Cursor, and others. Zero API key needed server-side.
Coming soon: I am also building SaaS platform. This platform supports playlist ingestion, team wikis, etc. I like to share early access signup: https://0xchamin.github.io/mcptube/
Happy to discuss architecture tradeoffs — FTS5 vs vectors, file-based wiki vs DB, scene-change vs fixed-interval sampling. Give it a try via `pip install mcptube`. Also, please do star the repo if you enjoy my contribution (https://github.com/0xchamin/mcptube)