Building a real-time voice system means dealing with several hard problems, such as audio resampling, jitter buffers, and frame pacing. This blog post gives you a mental model of what happens behind the scenes: how voice-based media systems are built and how they handle these problems in the background.
https://gokuljs.com/blogs/when-latency-becomes-audible