I built gemini-live-react, a React hook that fixes the audio developer experience and adds the features I needed to build real AI agents:
Session recording – record transcripts, audio metadata, tool calls, browser actions, and DOM snapshots into a single JSON file for debugging and replay
Workflow builder – define multi-step browser automations as a simple state machine (branching + error handling)
Smart element detection – auto-detect clickable elements so agents don’t rely on brittle selectors
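For anyone unfamiliar with the pattern, here is a rough sketch of what a state-machine workflow with branching and error handling can look like. Every name in it (`Step`, `Workflow`, `runWorkflow`, `onError`) is made up for illustration and is not the library's actual API; the step bodies are stubs standing in for real browser actions.

```typescript
// Hypothetical shapes for illustration only; not the gemini-live-react API.
type StepResult = { ok: boolean; data?: unknown };

interface Step {
  // Performs the step's action (a click, typing, a navigation, ...).
  run: (ctx: Record<string, unknown>) => Promise<StepResult>;
  // Branching: picks the next state from the result; null ends the workflow.
  next: (result: StepResult) => string | null;
  // Error handling: state to jump to if run() throws. If omitted, the error propagates.
  onError?: string;
}

type Workflow = { initial: string; steps: Record<string, Step> };

// Walks the state machine and returns the list of states visited, in order.
async function runWorkflow(
  wf: Workflow,
  ctx: Record<string, unknown> = {},
  maxTransitions = 50, // guard against accidental cycles
): Promise<string[]> {
  const visited: string[] = [];
  let current: string | null = wf.initial;
  while (current !== null) {
    if (visited.length >= maxTransitions) {
      throw new Error("transition limit exceeded");
    }
    const step = wf.steps[current];
    if (!step) throw new Error(`unknown state: ${current}`);
    visited.push(current);
    try {
      const result = await step.run(ctx);
      current = step.next(result);
    } catch (err) {
      if (!step.onError) throw err;
      ctx.lastError = err; // make the failure available to the recovery step
      current = step.onError;
    }
  }
  return visited;
}

// Stubbed example: the submit step fails, so the machine branches to "recover".
const login: Workflow = {
  initial: "openForm",
  steps: {
    openForm: { run: async () => ({ ok: true }), next: () => "submit" },
    submit: {
      run: async () => {
        throw new Error("button not found");
      },
      next: () => "done",
      onError: "recover",
    },
    recover: { run: async () => ({ ok: true }), next: () => null },
    done: { run: async () => ({ ok: true }), next: () => null },
  },
};
```

Calling `runWorkflow(login)` resolves to `["openForm", "submit", "recover"]`, since the failing `submit` step routes to its `onError` state instead of `done`.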
It's used for voice-driven web agents, where the loop is: AI sees the UI → decides → clicks/types → repeats
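The loop itself is simple enough to sketch in a few lines. This is only an illustration under assumed names (`Action`, `AgentDeps`, `runAgentLoop`); the three callbacks stand in for whatever captures the UI, calls the model, and drives the browser, and none of them come from the library.

```typescript
// Hypothetical types for illustration only; not the gemini-live-react API.
type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "done" };

interface AgentDeps {
  // "Sees UI": returns some description of the page (screenshot, DOM summary, ...).
  observe: () => Promise<string>;
  // "Decides": maps the observation to the next action (e.g. a model call).
  decide: (observation: string) => Promise<Action>;
  // "Clicks/types": performs the chosen action in the browser.
  act: (action: Action) => Promise<void>;
}

// Runs observe -> decide -> act until the agent says "done" or maxSteps is hit.
// Returns every decided action, including the final "done" if one was reached.
async function runAgentLoop(
  deps: AgentDeps,
  maxSteps = 20,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const observation = await deps.observe();
    const action = await deps.decide(observation);
    history.push(action);
    if (action.kind === "done") break;
    await deps.act(action);
  }
  return history;
}
```

The `maxSteps` cap is a design choice worth keeping in any real version: without it, a model that never emits a terminal action would loop forever.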
Tech: React hook (~2k LOC), AudioWorklet, WS proxy (Deno/Supabase), TypeScript
GitHub: https://github.com/loffloff/gemini-live-react
npm: npm install gemini-live-react
Looking for feedback on the workflow abstraction: state machines felt right, but I'm curious what others use.