In this episode, I sat down with Shawkat Kabbara, Founder & CEO of Papr AI, to discuss why traditional RAG systems fundamentally break at scale.
Key insights:
• Why retrieval gets WORSE with more data (backed by Google research)
• The 400-500ms latency requirement that kills personalized voice AI
• How multi-agent systems create hallucination cascades
• Brain-inspired predictive memory architecture as the solution
• Why you need to predict context, not just retrieve it
We dive into retrieval loss metrics, the working memory approach inspired by the prefrontal cortex, and why Papr built prediction layers instead of another search-based system.
Perfect for developers building production AI systems who are hitting the limits of RAG.