What it contains:
- 510 real insurance scenarios
- 10 categories across 9 insurance lines
- Train/val/test splits (357/76/77)
- 4 routing decisions per scenario: AI handles, AI with verification, human handoff, hybrid collaboration
- 3 evaluation metrics: intent accuracy, routing accuracy, action completeness
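
To make the three metrics concrete, here is a minimal Python sketch of how they could be scored. It is an illustration under assumptions, not the benchmark's actual code: the `Scenario` fields, the `Routing` value strings, and the `evaluate` helper are hypothetical names, and "action completeness" is assumed here to mean the share of expected actions that the prediction includes. Check the repository for the real schema and metric definitions.

```python
from dataclasses import dataclass
from enum import Enum

# Split sizes as stated in the post; they add up to the 510 scenarios.
SPLITS = {"train": 357, "val": 76, "test": 77}
assert sum(SPLITS.values()) == 510


class Routing(Enum):
    """The four routing decisions named in the post (value strings are hypothetical)."""
    AI_HANDLES = "ai_handles"
    AI_WITH_VERIFICATION = "ai_with_verification"
    HUMAN_HANDOFF = "human_handoff"
    HYBRID_COLLABORATION = "hybrid_collaboration"


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record layout: one intent, one routing decision, a set of actions."""
    intent: str
    routing: Routing
    actions: frozenset


def evaluate(gold, pred):
    """Score predictions against gold labels on the three metrics.

    Assumption: action completeness is the fraction of gold actions that
    appear in the prediction, averaged over scenarios.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    intent_hits = sum(g.intent == p.intent for g, p in zip(gold, pred))
    routing_hits = sum(g.routing == p.routing for g, p in zip(gold, pred))
    completeness = sum(
        len(g.actions & p.actions) / len(g.actions) if g.actions else 1.0
        for g, p in zip(gold, pred)
    )
    return {
        "intent_accuracy": intent_hits / n,
        "routing_accuracy": routing_hits / n,
        "action_completeness": completeness / n,
    }
```

The split sizes work out to roughly 70/15/15. Scoring routing separately from intent reflects the point of the benchmark: a system can understand the request correctly and still send it to the wrong handler.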
Why it matters: Insurance is precision work. A wrong routing decision costs money and trust. Most AI benchmarks miss this. They don't test what matters in production.
This data came from a real voice AI system. Years of customer calls. Actual insurance decisions. The scenarios are messy. They're real.
Open source: Apache 2.0 license. Ready to use.
Implementation: https://github.com/pavelsukhachev/hybrid-orchestrator
Paper: TechRxiv (IEEE) - "The Hybrid Orchestrator: A Framework for Coordinating Human-AI Teams"