That said, I’m unclear on how much this helps in practice; we don’t usually parse through, say, 32 responses from our 2B-parameter models. I guess if you instrumented parallel reasoning processes in batch, this might be helpful. Perhaps that’s what o1-pro is doing in the background, actually.
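For concreteness, here’s a minimal sketch of what “parsing through 32 responses” looks like as plain best-of-N / self-consistency sampling against an OpenAI-compatible endpoint. The model name and the answer-extraction heuristic are placeholders I made up, not anything specific to a particular library:

```python
# Minimal best-of-N / self-consistency sketch: sample N candidates in one
# batched request, then majority-vote over the extracted final answers.
# Model name and extract_answer heuristic are placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # or OpenAI(base_url=...) for a local / proxied server

def extract_answer(text: str) -> str:
    # Placeholder heuristic: treat the last non-empty line as the "answer".
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    return lines[-1] if lines else ""

def self_consistency(prompt: str, n: int = 32, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        n=n,                # sample n completions in a single batched call
        temperature=0.8,    # diversity is what makes the vote informative
    )
    answers = [extract_answer(c.message.content) for c in resp.choices]
    # Majority vote over the candidate answers.
    return Counter(answers).most_common(1)[0][0]
```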
Anyway, this one seems to me like it might make its way onto the “good idea” list when RL is available in the training pipeline.
justanotheratom•10h ago
diwank•10h ago
codelion•9h ago
Cerebras has used optillm for optimising inference with techniques like CePO and LongCePO.
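For anyone curious how that’s typically wired up: optillm runs as an OpenAI-compatible proxy, and the inference technique is selected by prefixing the model slug. The port, the placeholder model, and the "cepo" prefix below are my assumptions; check the optillm README for the exact slugs and defaults.

```python
# Rough sketch of calling a locally running optillm proxy (OpenAI-compatible).
# Assumptions: proxy listens on localhost:8000/v1, and the CePO approach is
# selected with a "cepo-" model prefix -- verify against the optillm README.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # optillm proxy endpoint (assumed default)
    api_key="optillm",                    # placeholder; proxy forwards to the real backend
)

resp = client.chat.completions.create(
    model="cepo-gpt-4o-mini",  # "<approach>-<model>" prefix convention (assumed)
    messages=[{"role": "user", "content": "Solve this step by step: ..."}],
)
print(resp.choices[0].message.content)
```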
peepeepoopoo114•9h ago