Built a demo using Gemini Live and Ultralytics' YOLO models running on Stream's Video API for real-time feedback. In this example, I'm having the LLM give the player feedback as they try to improve their form.
On the backend, it uses Stream's Python SDK to capture the WebRTC frames from the player, runs them through YOLO to detect the player's arms and body, and then feeds them to the Gemini Live API. Once Gemini responds, the audio output is encoded and sent directly back into the call, where the player can hear it and reply.
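Roughly, the loop looks like the sketch below. The Stream-side helpers (`next_frame`, `publish_audio`) are hypothetical stand-ins for the SDK's frame and audio hooks, not actual Stream API names; the Ultralytics and google-genai calls follow those libraries' public APIs, and the model/weight names are published ones.

```python
# Minimal sketch of the frame -> YOLO -> Gemini Live -> call-audio loop.
import asyncio
import cv2
from ultralytics import YOLO
from google import genai
from google.genai import types

pose_model = YOLO("yolo11n-pose.pt")  # pose weights: detects body/arm keypoints
client = genai.Client()               # reads GOOGLE_API_KEY from the environment

async def coach_loop(next_frame, publish_audio):
    """next_frame() -> BGR ndarray from the call; publish_audio(pcm) -> plays to the call.
    Both are hypothetical wrappers around Stream's Python SDK."""
    config = {"response_modalities": ["AUDIO"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001", config=config
    ) as session:

        async def send_frames():
            while True:
                frame = await next_frame()
                # Run pose detection and draw the keypoints so the model
                # sees an explicit skeleton, not just raw video.
                annotated = pose_model(frame, verbose=False)[0].plot()
                ok, jpeg = cv2.imencode(".jpg", annotated)
                if ok:
                    await session.send_realtime_input(
                        media=types.Blob(data=jpeg.tobytes(), mime_type="image/jpeg")
                    )
                await asyncio.sleep(1.0)  # Live API expects ~1 video frame/sec

        async def play_responses():
            # Gemini streams back 24 kHz 16-bit PCM audio; forward it to the call.
            async for msg in session.receive():
                if msg.data:
                    await publish_audio(msg.data)

        await asyncio.gather(send_frames(), play_responses())
```

Annotating the frame with the detected keypoints before sending it is one way to ground Gemini's coaching in the actual pose; in practice you could also gate on the YOLO output so only frames with a person in them reach the model.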