When I was learning Japanese with new friends, I didn't know how to ask where we were going or where the bathroom was, and that was really motivating. Since in-country immersion isn't an option in most cases, language exchange is the next best option. However, in-person language exchange can be expensive or difficult to schedule, so I built ConvoLive to make language practice more engaging for myself.
Open beta on Android and iOS: https://convolive.com
Technically interesting bits:
- Lip-synced avatars using WebGL.
- Continuous speech recognition on recent phone models, so you don't have to press a button to speak.
- Using device speech recognition and caching assets means that most external LLM calls are limited to the free-form chat mode.
- Unfortunately, running an LLM locally on the device ended up being too demanding and slow.
- GPT-4o provided better speed and results than GPT-5.
- Multimodal quiz system with drag-and-drop, fill-in-the-blank, and multiple-choice exercises.
- Free-form conversation gives you suggestions to keep the chat going, but you can also ask how to say something.
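A minimal sketch of the "cached assets first, external LLM only for free-form chat" idea from the list above, in TypeScript. This is illustrative only and not ConvoLive's code: the `Mode` values, the `ReplyPlan` shape, and the `planReply` function are all hypothetical names, and the actual network call is left out so the routing decision stays easy to see.

```typescript
// Hypothetical types; the real app may organise this very differently.
type Mode = "scripted" | "freeform";

type ReplyPlan =
  | { kind: "cached"; text: string } // served from assets bundled or cached on the device
  | { kind: "llm"; prompt: string }; // needs an external LLM call

// `input` is a lesson line id in scripted mode, or the recognized speech in free-form mode.
function planReply(
  mode: Mode,
  input: string,
  cachedLines: ReadonlyMap<string, string>
): ReplyPlan {
  if (mode === "scripted") {
    const text = cachedLines.get(input);
    if (text !== undefined) {
      return { kind: "cached", text };
    }
    // Cache miss in scripted mode: fall through to the LLM as a fallback.
  }
  return { kind: "llm", prompt: input };
}
```

With a routing step like this, scripted lessons and quizzes never leave the device, and only free-form turns (or cache misses) would cost an API call.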
What works:
- People seem to love or hate the avatars.
- Testers are saying they feel more comfortable speaking.
- The avatars aren't groundbreaking but they do make it feel less like talking to a chatbot.
- Being able to speak freely without clicking seems more natural.
What doesn't:
- People seem to love or hate the avatars.
- Since the app promotes speaking out loud, beginners might be wary of using it outside their home.
- The app might be better suited to people who already understand fundamentals like tenses or different alphabets, which don't fit into a per-conversation approach.
- Newer, better-quality TTS models do not provide the viseme lip-sync data I need for animation.
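To show why the viseme data matters for the avatars, here is a rough TypeScript sketch of how a viseme timeline could drive a mouth shape during audio playback. The `VisemeEvent` shape, the numeric viseme IDs, and the `activeViseme` helper are assumptions for illustration; different TTS services expose this data in different formats, and this is not the app's actual animation code.

```typescript
// Hypothetical event shape: one entry per mouth-shape change in the audio.
interface VisemeEvent {
  timeMs: number;   // offset from the start of the audio clip
  visemeId: number; // index of the mouth shape to show (IDs are made up here)
}

// Returns the viseme active at `playbackMs`, assuming `events` is sorted by time.
// Before the first event (or with no events) it returns `restId`, a neutral mouth.
function activeViseme(
  events: readonly VisemeEvent[],
  playbackMs: number,
  restId = 0
): number {
  let lo = 0;
  let hi = events.length - 1;
  let found = -1;
  // Binary search for the last event that starts at or before playbackMs.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (events[mid].timeMs <= playbackMs) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found === -1 ? restId : events[found].visemeId;
}
```

A render loop would call something like this once per frame with the current audio position. Without per-clip timing events from the TTS model, there is nothing to look up, which is the limitation described in the last bullet.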
Currently supports Spanish, Japanese, Italian, German, Portuguese, and French.
Curious what other language learners here think – is this approach useful, or do we not want to talk to our phones in broken Spanish? Should I keep working on this?
pickettd•31m ago