Curious how you handle context window management here. I've been building something similar and found balancing latency with accuracy to be the hardest part. Are you doing any AST parsing to reduce the payload or just relying on larger context windows? It seems like the latency would be a dealbreaker for real-time use without aggressive filtering or model switching.
storystarling•17m ago