A few issues I've seen:

- Accuracy degradation as the conversation length increases
- Latency stacking (STT + potentially multiple LLM sessions) making interactions feel pretty sluggish
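To make the latency-stacking point concrete, here's a toy back-of-the-envelope model of perceived turn latency (end of user speech to first agent audio). All the stage timings and the `overlap` parameter are hypothetical, just to show why strict stage hand-offs feel sluggish and why streaming partial results helps:

```python
# Toy model of perceived latency in a voice-agent pipeline.
# All numbers and the overlap model are hypothetical illustrations,
# not measurements from any real system.

STAGES = [
    0.8,  # STT: full-utterance transcription time (s)
    0.6,  # LLM: time to first token (s)
    0.3,  # TTS: time to first audio (s)
]

def perceived_latency(stages, overlap=0.0):
    """Time until the last stage starts producing output.

    overlap=0.0 models strict hand-offs (each stage waits for the
    previous one to fully finish, so latencies simply stack).
    overlap=0.5 crudely models streaming, where each downstream
    stage starts after only half of the upstream stage has run.
    """
    start = 0.0
    for dur in stages[:-1]:
        start += (1.0 - overlap) * dur
    return start + stages[-1]

sequential = perceived_latency(STAGES)             # ~1.7 s
streamed = perceived_latency(STAGES, overlap=0.5)  # ~1.0 s
```

Even this crude model shows why streaming tricks matter: shaving each hand-off compounds across stages, and the savings grow if you add retrieval or a second LLM call to the chain.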
I'd love to hear from anyone else who has worked on real-time voice agents, call center bots, or even just personal experiments:
- What pain points have you hit in building these pipelines?
- Did you find any workarounds or tools that helped? (Chunking, smarter retrieval, smaller NLU models, streaming tricks, etc.)
- Anything you wish existed that would have saved you time?
Happy to get technical in the discussion. I'm particularly interested in how people are approaching LLM accuracy and context at scale.