That said, I’m unclear how much this helps in practice; we don’t usually sift through, say, 32 responses from our 2B parameter models. I guess if you instrumented parallel reasoning processes in batch, this might be helpful. Perhaps that’s what o1-pro is doing in the background, actually.
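For concreteness, here’s roughly what I mean by “instrumented in batch”: a minimal best-of-N / self-consistency sketch against an OpenAI-compatible API. The model name, N=32, temperature, and the answer-extraction regex are all placeholders I made up, not anything o1-pro is confirmed to do.

```python
# Minimal best-of-N / self-consistency sketch: sample N completions in one
# batched request, extract each final answer, and majority-vote over them.
# Assumes an OpenAI-compatible endpoint; all concrete names are placeholders.
import os
import re
from collections import Counter

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

N = 32  # number of parallel samples per prompt
prompt = (
    "What is 17 * 24? Think step by step, "
    "then give the final answer as 'Answer: <number>'."
)

# One batched request returns N independent samples (temperature > 0 so they differ).
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": prompt}],
    n=N,
    temperature=0.8,
)

def extract_answer(text: str) -> str | None:
    # Crude heuristic: take the last 'Answer: ...' span in the completion.
    matches = re.findall(r"Answer:\s*([^\n]+)", text)
    return matches[-1].strip() if matches else None

answers = [extract_answer(c.message.content) for c in resp.choices]
votes = Counter(a for a in answers if a is not None)
best, count = votes.most_common(1)[0]
print(f"Majority answer: {best} ({count}/{N} samples agree)")
```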
Anyway, this one seems to me like it might make its way onto the “good idea” list once RL is available in the training pipeline.
codelion•15h ago
Cerebras has used optillm for optimising inference with techniques like CePO and LongCePO.
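If anyone wants to try it, the rough shape of a call through the optillm proxy looks like this. This is only a sketch: it assumes a locally running proxy on port 8000 and the approach-prefix convention from the optillm README; the model name and key handling are placeholders.

```python
# Sketch of routing a request through a local optillm proxy, assuming it is
# running on localhost:8000 and that an approach is selected by prefixing the
# model name (e.g. "cepo-"). Model name, port, and API key are placeholders.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "optillm"),  # proxy may not need a real key
    base_url="http://localhost:8000/v1",                  # optillm's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="cepo-llama-3.3-70b",  # "cepo-" prefix selects the CePO approach (assumed convention)
    messages=[{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
)
print(response.choices[0].message.content)
```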
peepeepoopoo114•15h ago