Here's a sneak peek from the interview: https://x.com/ptservlor/status/2024597444890128767
The user speaks, Deepgram transcribes it, the OpenClaw Gateway routes the transcript to your agent, ElevenLabs turns the response into speech, and LemonSlice generates a lip-synced avatar from that audio. Everything streams over LiveKit in real time.
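If you're curious how the pieces wire together, here's a minimal sketch using the LiveKit Agents Python framework. The Deepgram and ElevenLabs plugins are real LiveKit plugins; the OpenClaw Gateway hookup (assuming it exposes an OpenAI-compatible endpoint) and the LemonSlice avatar step are stand-ins for what the skill repo actually ships.

```python
# Minimal pipeline sketch. Assumptions are flagged in the comments;
# the skill repo's Python example is the real reference.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import deepgram, elevenlabs, openai


async def entrypoint(ctx: agents.JobContext):
    # STT -> agent -> TTS, all streamed through the LiveKit room.
    session = AgentSession(
        stt=deepgram.STT(),  # speech to text
        # Assumption: the OpenClaw Gateway speaks an OpenAI-compatible
        # API, so the stock OpenAI plugin can route through it.
        llm=openai.LLM(base_url="http://localhost:8000/v1", api_key="unused"),
        tts=elevenlabs.TTS(),  # text to speech
    )
    # LemonSlice sits downstream of TTS: it consumes the audio and
    # publishes a lip-synced avatar video track into the same room.
    # That wiring lives in the skill repo and is not shown here.
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```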
End-to-end latency is about 1 to 2 seconds, depending on the LLM. The lip sync from LemonSlice honestly surprised us; it works way better than we expected.
The skill repo has a complete Python example, env setup, a troubleshooting guide, and a Next.js frontend guide if you want to build your own web UI.
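For reference, the env setup boils down to a handful of API keys. The first five variable names below are the defaults the LiveKit, Deepgram, and ElevenLabs SDKs read; the LemonSlice one is an assumption, so check the repo's env docs for the exact names it expects.

```
# Assumed variable names; the skill repo's env docs are authoritative.
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
DEEPGRAM_API_KEY=...
ELEVEN_API_KEY=...
LEMONSLICE_API_KEY=...
```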