Between MoE, aggressive quantization, and synthetic data pipelines, it’s getting harder to tell whether bigger models are actually better or just more expensive to train.
It would be more interesting to see capability per dollar or per watt, not parameter count.
ramshanker•1h ago