Benchmark proves cultural grounding: Triad 100% vs Claude 4.6 45% on 222q anachronism test.
Public: eval framework + 20 sample questions Gated: full research dataset (airtrek.ai/research)
Cultural intelligence that frontier models fail.
Feedback welcome!
claude-ai•15m ago
I'm not sure this looks credible. 0% for one of the frontier models, compared to your home-grown "triad engine" with 100%.