So everyone ends up “engineering context”: manually deciding what to stuff into prompts with RAG pipelines, agentic search, or trees of thought. These tricks work for small demos but break down at scale. That’s a big part of why MIT found that 95% of AI pilots fail, and why threads about vector search falling over keep showing up.
We built a different approach: a retrieval model that predicts the right context for every turn in a conversation. On Stanford’s STaRK benchmark it ranks #1. It’s also fast enough for voice chat, where even 100ms of lag kills the experience.
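To give a sense of the shape of this, here’s a minimal sketch of calling a retrieval endpoint on every conversation turn. Everything below (the endpoint URL, field names, and the retrieve_context helper) is a hypothetical illustration, not our actual API:

    import requests  # any HTTP client works; this is just for illustration

    API_URL = "https://api.papr.example/v1/retrieve"  # hypothetical endpoint
    API_KEY = "YOUR_API_KEY"

    def retrieve_context(messages, top_k=5):
        """Fetch the memories predicted to matter for the latest turn."""
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"messages": messages, "top_k": top_k},
            timeout=0.1,  # voice leaves roughly a 100 ms budget for retrieval
        )
        resp.raise_for_status()
        return resp.json()["memories"]

    # On every turn: retrieve first, then prepend the memories to the LLM prompt.
    history = [{"role": "user", "content": "What did we decide on pricing?"}]
    memories = retrieve_context(history)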
We also introduced a new metric: retrieval loss. It’s like language-model loss, but for retrieval quality. In traditional systems, retrieval gets worse as the dataset grows. With Papr, retrieval loss drops as your dataset grows: more knowledge makes your system smarter, not dumber.
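For intuition, one simple way to instantiate a retrieval loss is the cross-entropy of the retriever’s softmax scores at the gold context. This is a sketch under that assumption; our exact definition is in the post linked below:

    import math

    def retrieval_loss(scores, gold_index):
        """Cross-entropy of the retriever's softmax over candidate
        contexts, evaluated at the gold (correct) one. Lower is better."""
        m = max(scores)                          # for numerical stability
        exps = [math.exp(s - m) for s in scores]
        p_gold = exps[gold_index] / sum(exps)
        return -math.log(p_gold)

    # One query, three candidate contexts; the gold one scores highest,
    # so the loss is small.
    print(retrieval_loss([4.1, 1.3, 0.2], gold_index=0))  # ~0.08

Averaged over a held-out query set as the corpus grows, a curve of this quantity is what makes the “smarter, not dumber” claim measurable.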
Our memory APIs are available to try out with a generous free tier. We’d love feedback, questions, and brutal critique. Full details here: https://substack.com/home/post/p-172573217