Gemini Live is capable: full-duplex audio (interrupt the AI mid-sentence), screen sharing so the AI sees what you're looking at, tool calling, and built-in VAD. But using it from a browser is painful:
- Browser audio is 48kHz; Gemini wants 16kHz in and sends 24kHz out (sketch of this step below)
- PCM16 endianness conversions
- Buffer management to avoid clicks and gaps
- Keeping your API key out of client code
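For a sense of what the hook handles, here's a minimal sketch of the downsample-to-PCM16 step. It's illustrative, not the hook's actual internals, and it uses naive decimation (a real pipeline should low-pass filter first to avoid aliasing):

```ts
// Convert browser Float32 audio (typically 48kHz) to 16kHz little-endian
// PCM16, which is what Gemini Live expects on the input side.
function floatTo16kHzPcm16(input: Float32Array, inputRate = 48000): ArrayBuffer {
  const ratio = inputRate / 16000; // e.g. 3 for 48kHz input
  const outLength = Math.floor(input.length / ratio);
  const buffer = new ArrayBuffer(outLength * 2); // 2 bytes per PCM16 sample
  const view = new DataView(buffer);
  for (let i = 0; i < outLength; i++) {
    // Naive decimation: take every ratio-th sample, clamped to [-1, 1].
    const sample = Math.max(-1, Math.min(1, input[Math.floor(i * ratio)]));
    // Scale to signed 16-bit range; `true` = little-endian.
    view.setInt16(i * 2, sample < 0 ? sample * 0x8000 : sample * 0x7fff, true);
  }
  return buffer;
}
```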
So I wrapped it into a single hook:
```ts
const { connect, transcripts, isConnected } = useGeminiLive({
  proxyUrl: 'wss://your-project.supabase.co/functions/v1/gemini-live-proxy',
});
```
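In a component it looks something like this (a sketch: the import path and the shape of `transcripts` are my assumptions here; check the README for the real API):

```tsx
import { useGeminiLive } from 'gemini-live-react'; // assumed package name

function VoiceChat() {
  const { connect, transcripts, isConnected } = useGeminiLive({
    proxyUrl: 'wss://your-project.supabase.co/functions/v1/gemini-live-proxy',
  });

  return (
    <div>
      <button onClick={connect} disabled={isConnected}>
        {isConnected ? 'Connected' : 'Start talking'}
      </button>
      {/* Assumes each transcript entry is a { role, text } object. */}
      {transcripts.map((t, i) => (
        <p key={i}>{t.role}: {t.text}</p>
      ))}
    </div>
  );
}
```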
Includes a Supabase Edge Function proxy (so your API key stays server-side), screen sharing, auto-reconnection, real-time transcription, and full TypeScript types.
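The proxy idea in a nutshell, sketched below. This isn't the shipped function, just the shape of it: the Edge Function holds the API key in its env vars and relays WebSocket frames between the browser and Gemini Live's BidiGenerateContent endpoint. Production code also needs to authenticate the client before upgrading:

```ts
// supabase/functions/gemini-live-proxy/index.ts -- a minimal relay sketch.
const GEMINI_WS =
  'wss://generativelanguage.googleapis.com/ws/' +
  'google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent';

Deno.serve((req) => {
  const { socket: client, response } = Deno.upgradeWebSocket(req);
  // The key lives in the function's environment, never in the browser.
  const upstream = new WebSocket(`${GEMINI_WS}?key=${Deno.env.get('GEMINI_API_KEY')}`);

  // Buffer client frames that arrive before the upstream socket opens.
  const pending: (string | ArrayBuffer)[] = [];
  upstream.onopen = () => {
    pending.forEach((m) => upstream.send(m));
    pending.length = 0;
  };

  client.onmessage = (e) => {
    if (upstream.readyState === WebSocket.OPEN) upstream.send(e.data);
    else pending.push(e.data);
  };
  upstream.onmessage = (e) => client.send(e.data);
  client.onclose = () => upstream.close();
  upstream.onclose = () => client.close();

  return response;
});
```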
GitHub: https://github.com/loffloff/gemini-live-react
Everyone builds voice AI with OpenAI's Realtime API, but Gemini Live is cheaper and screen sharing is underrated. Am I missing something?