A few specific questions:

- Multi-turn conversation handling: does it manage state for you, or do you have to thread history manually?
- Long-running tasks (minutes to hours): any gotchas with timeouts or checkpointing?
- Latency overhead: one GitHub issue mentions ~12s per query. Is that still an issue, or has it improved?
- Any other production rough edges we should know about?
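To be concrete about the manual-threading case: this is roughly the pattern we'd fall back to if the SDK doesn't manage state. Everything here is a sketch with hypothetical names (`run_turn`, `call_model`), no real SDK calls:

```python
# Hypothetical sketch of manually threading conversation history.
# `call_model` stands in for whatever client call the SDK exposes.
from typing import Callable

Message = dict[str, str]  # {"role": ..., "content": ...}

def run_turn(
    history: list[Message],
    user_input: str,
    call_model: Callable[[list[Message]], str],
) -> list[Message]:
    """Append the user turn, call the model with the full history,
    and return a new history including the assistant reply."""
    history = history + [{"role": "user", "content": user_input}]
    reply = call_model(history)
    return history + [{"role": "assistant", "content": reply}]

# Usage with a stub model that echoes the last user message:
history: list[Message] = []
history = run_turn(history, "hello", lambda msgs: f"echo: {msgs[-1]['content']}")
```

If the SDK already does this internally (and handles truncation/summarization when history grows), that alone would tip the comparison for us.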
For context: most of our context is pre-computed, with occasional JIT tool calls. We're comparing against Pydantic AI and LangGraph but trying to avoid over-engineering. Appreciate any war stories.