The one thing I didn't see that would be good is some validation that the architectures that perform best on large models are the same architectures that perform best on small models.
I.e., validating the assumption that you can use small models with small amounts of training/compute to determine the best architecture for large models and high training budgets.
Even if it doesn't translate, it would still be very cool to be able to quickly evolve better small models (1M to 400M params), but I believe the implied goal (and what everyone wants) is that this exploration and discovery of novel architectures would be applicable to the really big models as well.
If you could only AI-discover larger models by spending OpenAI/Anthropic/... budgets per exploration, then we're not really gaining much in terms of novel ideas, as the cost (time and budget) would be too prohibitive.
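One cheap way to test that assumption would be to train the same set of candidate architectures at a small and a large scale and check whether their rankings agree, e.g. via Spearman rank correlation. A minimal sketch below; the loss numbers and parameter counts are hypothetical illustration values, not results from the paper.

```python
# Sketch: do architecture rankings transfer from small to large scale?
# All loss values here are hypothetical, for illustration only.

def rank(values):
    """Return the rank of each value (0 = best, i.e. lowest loss)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; fine for distinct losses)."""
    n = len(xs)
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical eval losses for five candidate architectures
small_scale = [3.10, 3.05, 3.20, 2.98, 3.15]   # e.g. trained at 10M params
large_scale = [2.40, 2.35, 2.55, 2.30, 2.60]   # e.g. trained at 1B params

rho = spearman(small_scale, large_scale)
print(f"rank correlation: {rho:.2f}")  # near 1.0 => small-scale search transfers
```

A correlation near 1.0 would support using cheap small-scale runs to pick architectures for expensive large-scale training; a low correlation would mean the search has to be redone at scale.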
Jimmc414•9h ago
They discovered 106 new state-of-the-art linear attention architectures through a fully autonomous AI research loop. The authors are making comparisons to AlphaGo’s move 37.