Demo: https://www.youtube.com/watch?v=NuOVlyr-e1w
Repo: https://github.com/RealComputer/GlassKit
I initially tried a multimodal LLM for scene understanding, but its latency was too high and its output too inconsistent for this use case, so I switched to a small object detection model (a fine-tuned RF-DETR). The app simply runs an inference loop over the camera feed. This also makes on-device/offline use feasible, although today inference still runs on a local server.
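For readers wondering what "an inference loop on the camera feed" might look like, here is a minimal sketch. It is not taken from the GlassKit repo: `Detection`, `inference_loop`, `fake_detect`, and the 0.5 confidence cutoff are all hypothetical names and values, and `fake_detect` stands in for a call to the fine-tuned detector.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List, Tuple


@dataclass
class Detection:
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max in pixels


def inference_loop(
    frames: Iterable[Any],
    detect: Callable[[Any], List[Detection]],
    min_confidence: float = 0.5,  # assumed cutoff, tune for the real model
) -> Iterator[List[Detection]]:
    """Run the detector on each frame and yield the confident detections."""
    for frame in frames:
        detections = detect(frame)
        yield [d for d in detections if d.confidence >= min_confidence]


def fake_detect(frame: Any) -> List[Detection]:
    # Hypothetical stand-in for the detector; a real version would run
    # the model on the frame and convert its output to Detection objects.
    return [
        Detection("cup", 0.9, (10.0, 20.0, 110.0, 220.0)),
        Detection("spoon", 0.3, (0.0, 0.0, 5.0, 5.0)),
    ]


# Usage: any iterable of frames works, e.g. a generator reading from a camera.
for kept in inference_loop(range(3), fake_detect):
    print([d.label for d in kept])
```

Keeping the detector behind a plain callable means the same loop could call a local server today and an on-device model later without changing the rest of the code.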
tash_2s•1h ago