Built a real-time voice AI agent console for a YC W25 startup assessment (Freya Voice). Focus was on production-ready implementation with minimal latency.
GitHub: https://github.com/05sanjaykumar/Freya-Voice-YC25-Assessment
Key specs:
- 133ms average latency (voice input → AI response → audio output)
- LiveKit for WebRTC streaming
- Next.js frontend + Python FastAPI backend
- Multi-stage Docker deployment
- Full observability and session management
- In-memory storage for speed (a deliberate trade-off for the assessment scope)
Tech stack:
- Frontend: Next.js, TypeScript, LiveKit client SDK
- Backend: FastAPI, LiveKit server SDK, OpenAI
- Infrastructure: Docker multi-stage builds, production configs
Design decisions I made:
- Voice-first interface (no text fallback) to match the real-world use case
- In-memory session storage (speed over persistence for the MVP)
- LiveKit over raw WebSockets (proven real-time infrastructure)
- Concurrent audio processing to hit latency targets
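The concurrency point is the main latency lever: instead of running STT → LLM → TTS strictly in sequence, synthesis can start on the first LLM chunk while the rest of the response is still streaming. A minimal `asyncio` sketch of that overlap (the `llm_stream` and `tts_chunk` stage functions are stand-ins, not the repo's actual code):

```python
import asyncio


async def llm_stream(prompt: str):
    # Stand-in for a streaming LLM call: yields tokens as they arrive.
    for token in ["Hi", " there", "!"]:
        await asyncio.sleep(0.01)  # simulated per-token network delay
        yield token


async def tts_chunk(text: str) -> bytes:
    # Stand-in for synthesizing one chunk of audio.
    await asyncio.sleep(0.01)
    return text.encode()


async def respond(prompt: str) -> list[bytes]:
    pending: list[asyncio.Task] = []
    async for token in llm_stream(prompt):
        # Kick off synthesis for each chunk immediately, without waiting
        # for the full LLM response -- the stages overlap in time.
        pending.append(asyncio.create_task(tts_chunk(token)))
    # Await in order so audio chunks come back in the right sequence.
    return [await task for task in pending]


chunks = asyncio.run(respond("hello"))
```

With serial stages, total latency is the sum of all three; with this overlap it approaches the slowest stage plus one chunk of each neighbor, which is how sub-200ms round trips become feasible.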
Built in ~1 week for the assessment. Didn't land the role (extremely competitive) but learned a ton about real-time systems and WebRTC optimization.
Happy to discuss the latency optimization techniques or design trade-offs!