We've trained a small <1MB voice classifier model that runs on CPU in 4ms. Can be run next to silero VAD in voice AI deployments.
What we noticed in production deployments of voice assistants in Contact Centers in EU is that human consultants pick up immediately how to inflect verbs and ajdectives after one utterance from the caller. But voice AI agents don't know it until 1-2 minutes into the call when they are either corrected or the caller uses explicitly words with male/female form a couple of times.
Our model solves this just from the first utterance of caller speech. The voice AI pipeline can inject the classification as context to the system prompt. We observed a significant impact of this on the adoption of voice AI in practice.
Model + paper: https://huggingface.co/syntropicsignal-ai/gender-voice-class...
properbrew•34m ago