We've been working on a way to address the obvious problems with existing benchmarks by creating a single comprehensive benchmark that measures things technical people care about, while also getting as close as possible to an objective, "core intelligence" measurement.
The demo games shown on /spectate give you an idea of how we test models and why this would be difficult to benchmax. I think our benchmark is by far the best relative measurement of artificial intelligence out there. Feedback is welcome and usually acted upon quickly.
gertlabs•1h ago