Turns out Qwen3-VL-Embedding can natively embed video into the same shared vector space: no API calls, fully offline. It runs on Apple Silicon (MPS) and NVIDIA GPUs (CUDA). The 8B model needs ~18 GB of RAM; on smaller machines, use the 2B model.
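The MPS/CUDA support boils down to standard PyTorch device selection. A minimal sketch of how a tool like this might pick the backend (the `pick_device` helper is hypothetical, not the tool's actual code):

```python
import torch

def pick_device() -> str:
    # Prefer NVIDIA CUDA, then Apple Silicon MPS, else fall back to CPU.
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(f"embedding on {device}")
```

The model and its inputs are then moved to that device with `.to(device)` as usual; nothing video-specific changes in the device handling.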
sentrysearch index /path --backend local
Also added: a similarity threshold to suppress weak matches, and a Tesla metadata overlay that renders speed and location onto matched clips.
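The similarity threshold is a simple cutoff on cosine similarity between the query embedding and each clip embedding. A sketch of the idea (the `filter_matches` helper and the 0.35 cutoff are illustrative assumptions, not the project's actual values):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    # Cosine similarity between two embedding vectors.
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_matches(query_vec, clips, threshold=0.35):
    # Drop clips below the threshold; return the rest, best match first.
    # `clips` is a list of (clip_name, embedding_vector) pairs.
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in clips]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Without a cutoff, nearest-neighbor search always returns *something*, even for queries with no real match in the library; the threshold turns "closest clip" into "closest clip, if it's actually close."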
Details in the README.