Best-of-N was shown to exhibit power-law (polynomial) scaling (left), but the math suggests one should expect exponential scaling (center). We show how to resolve this "paradox", then use our insights to design methods for predicting inference-scaling capabilities that can be more sample-efficient!
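To make the tension concrete, here is a minimal sketch of how both behaviors can coexist. It rests on assumptions that are not stated in the post: each attempt on a problem succeeds independently with probability `p`, and across problems `p` follows a Beta(`a`, `b`) distribution (chosen only because the average has a closed form). The function names and parameter values are hypothetical. For a single problem the failure rate after `k` attempts is `(1 - p)^k`, which is exponential in `k`; averaged over problems it decays roughly like `k^(-a)`, which is a power law.

```python
import math


def single_problem_failure(p: float, k: int) -> float:
    """Failure rate after k independent attempts with success probability p."""
    return (1.0 - p) ** k


def aggregate_failure(a: float, b: float, k: int) -> float:
    """E[(1 - p)^k] for p ~ Beta(a, b), i.e. B(a, b + k) / B(a, b).

    Computed with log-gamma for numerical stability at large k.
    The Beta assumption is illustrative, not taken from the post.
    """
    log_value = (
        math.lgamma(b + k) - math.lgamma(a + b + k)
        - math.lgamma(b) + math.lgamma(a + b)
    )
    return math.exp(log_value)


def log_log_slope(f, k_low: int, k_high: int) -> float:
    """Slope of log f(k) against log k between two values of k."""
    return (math.log(f(k_high)) - math.log(f(k_low))) / (
        math.log(k_high) - math.log(k_low)
    )


# Hypothetical parameters: many hard problems (mass near p = 0).
A, B = 0.3, 1.0

# Per problem: log failure is linear in k (exponential decay).
per_problem_rate = math.log(single_problem_failure(0.1, 50)) / 50

# Across problems: log failure is roughly linear in log k (power law).
aggregate_slope = log_log_slope(lambda k: aggregate_failure(A, B, k), 1_000, 10_000)

print(f"per-problem log-failure per attempt: {per_problem_rate:.4f}")
print(f"aggregate log-log slope: {aggregate_slope:.4f} (compare with -a = {-A})")
```

Under these assumptions the aggregate exponent is set by how much probability mass sits near `p = 0`, so the hardest problems dominate the tail even though every individual problem improves exponentially.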
RSchaeffer•18h ago