We just launched our inference gateway for voice agent models.
Voice agent developers can use our inference service to call many different voice agent models (e.g., STT, LLM, TTS) with a single API key.
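To make the "one key, many models" idea concrete, here's a minimal sketch — the post doesn't show the gateway's actual API, so the class and method names (`VoiceGateway`, `route`) and the model names are purely illustrative:

```python
# Hypothetical sketch of a single-credential gateway routing requests to
# different voice-pipeline model types. Not the service's real client API.
class VoiceGateway:
    def __init__(self, api_key):
        # one credential covers STT, LLM, and TTS calls
        self.api_key = api_key

    def route(self, model):
        """Return which pipeline stage a model name maps to (illustrative)."""
        kinds = {"whisper-1": "stt", "gpt-4o-mini": "llm", "tts-1": "tts"}
        return kinds.get(model, "unknown")

gw = VoiceGateway("sk-example")
print(gw.route("whisper-1"))  # -> stt
print(gw.route("tts-1"))      # -> tts
```

The point is only that the developer holds one key and the gateway handles dispatch across model categories.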
We built the service because end-to-end latency is critically important for voice agents. In a text-based app, making a user wait a few extra seconds for a response is generally acceptable. But when talking to AI, delays can make conversations feel awkward or unnatural.
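To illustrate why those delays add up, here's a back-of-the-envelope latency budget for one conversational turn — the stage numbers are hypothetical examples for illustration, not measurements of this or any service:

```python
# Illustrative per-stage latency budget for a voice agent turn.
# Numbers are hypothetical; real figures vary by model and provider.
STAGES_MS = {
    "stt": 300,       # speech-to-text: finalize transcript after user stops
    "llm_ttft": 450,  # LLM time-to-first-token
    "tts_ttfb": 250,  # text-to-speech time-to-first-audio-byte
}

def end_to_end_ms(stages):
    """Total silence the caller hears before the agent starts speaking."""
    return sum(stages.values())

print(end_to_end_ms(STAGES_MS))  # -> 1000
```

Even with each stage individually fast, the serial sum lands around a second — which is why shaving latency at every hop matters for voice in a way it doesn't for text.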
We built some interesting approaches to model exploration and latency reduction into the service, with much more to come. We thought the HN community would find it interesting and might have some thoughtful feedback. Please let me know what you think!
@ac