What's particularly impressive:

- Speed: Near-instant responses even for complex queries
- Quality: Accurate, well-sourced answers with citations
- Integration: Seamlessly pulls from the knowledge graph and fresh web results
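To be concrete about what I mean by that last "integration" point, here's a minimal sketch of how I imagine the retrieval step: structured facts and fresh web snippets fetched in parallel, then merged into one grounded prompt. All the function names and prompt wording are my own stand-ins, not anything Google has published.

```python
import asyncio

# Hypothetical retrieval step: knowledge-graph facts and fresh web snippets
# fetched concurrently, then merged into a single grounded prompt.
# This is just my mental model, not Google's actual pipeline.

async def fetch_kg_facts(query: str) -> list[str]:
    # Stand-in for a structured knowledge-graph lookup.
    return [f"[KG] entity facts related to: {query}"]

async def fetch_web_snippets(query: str) -> list[str]:
    # Stand-in for a fresh web-search retrieval.
    return [f"[Web] recent snippet about: {query}"]

async def build_grounded_prompt(query: str) -> str:
    facts, snippets = await asyncio.gather(
        fetch_kg_facts(query),
        fetch_web_snippets(query),
    )
    context = "\n".join(facts + snippets)
    return (
        "Answer with citations, using only this context:\n"
        f"{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    print(asyncio.run(build_grounded_prompt("example query")))
```

Even a toy version like this highlights why the latency surprises me: the fresh web retrieval alone would normally dominate the response time, which is part of why I suspect aggressive caching somewhere.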
I'm wondering:

- What model(s) are they running under the hood?
- How are they achieving such low latency at scale?
- Are they using some kind of speculative execution or caching strategy? (sketch of what I mean right after this list)
- How does their infrastructure differ from standalone LLM APIs?
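By "caching strategy" I mean something like semantic response caching: embed the incoming query, look for a near-duplicate that has already been answered, and serve the stored answer instead of running the model again. The toy code below is purely illustrative; the embedding, the threshold, and the linear-scan lookup are all my own assumptions.

```python
import numpy as np

# Toy semantic cache: if a new query is close enough (cosine similarity) to a
# previously answered one, return the cached answer and skip the model call.
# Purely illustrative; not a claim about how Google's system works.

def toy_embed(text: str) -> np.ndarray:
    # Crude bag-of-characters embedding, only so the demo is self-contained.
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

class SemanticCache:
    def __init__(self, embed_fn, threshold: float = 0.98):
        self.embed_fn = embed_fn      # maps text -> unit-norm vector
        self.threshold = threshold    # cosine-similarity cutoff for a "hit"
        self.entries = []             # list of (embedding, answer) pairs

    def lookup(self, query: str):
        q = self.embed_fn(query)
        for emb, answer in self.entries:
            if float(np.dot(q, emb)) >= self.threshold:
                return answer         # cache hit: no model call needed
        return None

    def store(self, query: str, answer: str):
        self.entries.append((self.embed_fn(query), answer))

def answer(query: str, cache: SemanticCache, llm_fn) -> str:
    """Check the cache first; fall back to the (expensive) model call."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    result = llm_fn(query)
    cache.store(query, result)
    return result

cache = SemanticCache(toy_embed)
fake_llm = lambda q: f"model answer for: {q}"
print(answer("what is the tallest mountain?", cache, fake_llm))   # model call
print(answer("what is the tallest mountain??", cache, fake_llm))  # likely a cache hit
```

A production system would presumably use an ANN index rather than a linear scan, and speculative decoding attacks latency from a different angle (a cheap draft model proposes tokens that the big model verifies in parallel), but I have no idea which, if either, they actually rely on.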
For those who've worked on similar systems or have insights into Google's approach, I'd love to hear your thoughts on what makes this possible.