kraddypatties•58m ago
I feel like most of this recent Autoresearch trend boils down to reinventing hyperparameter tuning. Is the SOTA still Bayesian optimization when given a small cluster? It was ~3 years ago when I was doing this kind of work; I haven't kept up since then.
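For reference, a minimal sketch of what that kind of Bayesian-style tuning loop looks like today with Optuna (its default TPE sampler is a sequential model-based method in the same family as classic BO; the search space and objective here are made up):

    import optuna

    # Toy stand-in for a real train-and-evaluate run.
    def objective(trial):
        lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
        wd = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
        batch = trial.suggest_categorical("batch_size", [32, 64, 128])
        # A real objective would train a model and return its validation
        # metric; this fake score just rewards lr near 3e-4.
        return -((lr - 3e-4) ** 2) - wd * batch

    study = optuna.create_study(direction="maximize")
    # n_jobs runs trials in parallel threads; on an actual cluster you
    # would point multiple workers at a shared storage backend instead.
    study.optimize(objective, n_trials=50, n_jobs=4)
    print(study.best_params)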
Also, shoutout SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!
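For anyone who hasn't tried it: launching on whichever cloud has capacity looks roughly like this via SkyPilot's Python API (the training command and cluster name are placeholders):

    import sky

    # Describe the job; SkyPilot provisions from whichever enabled
    # cloud and region can actually supply the requested GPUs.
    task = sky.Task(run="python train.py")
    task.set_resources(sky.Resources(accelerators="H100:8"))

    sky.launch(task, cluster_name="train-cluster")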
ipsum2•7m ago
Hyperparam tuning that has better intuition and can incorporate architecture changes automatically. It won't invent something completely new though.
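To be fair, classical tuners can already fold coarse architecture choices into the search space as conditional parameters; a rough Optuna sketch (all names and branches invented):

    import optuna

    def objective(trial):
        # Architecture as just another categorical, with branch-specific
        # hyperparameters that only exist when that branch is chosen.
        arch = trial.suggest_categorical("arch", ["mlp", "transformer"])
        if arch == "transformer":
            n_heads = trial.suggest_int("n_heads", 2, 16)
            score = n_heads / 16  # placeholder for a real eval
        else:
            hidden = trial.suggest_int("hidden", 64, 1024, log=True)
            score = hidden / 1024  # placeholder for a real eval
        return score

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)

The difference with an agent is that it can write new branches itself instead of only picking among pre-declared ones.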
zhwu•37m ago
The most surprising part: the agent had access to both H100s and H200s. Without being told, it noticed H200s scored better and started screening ideas on H100s, then promoting winners to H200s for validation. That strategy emerged entirely on its own.
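In pseudocode the pattern itself is simple; a minimal sketch of that screen-then-promote loop (the evaluator, candidates, and cutoff are all invented):

    # Screen everything on the cheaper pool, promote only the top few
    # to the better pool for final validation.
    def tiered_search(candidates, evaluate, top_k=3):
        # Cheap pass: rank every idea on the H100 pool.
        ranked = sorted(candidates, key=lambda c: evaluate(c, "h100"), reverse=True)
        # Expensive pass: validate only the winners on the H200 pool.
        return [(c, evaluate(c, "h200")) for c in ranked[:top_k]]

    # Dummy usage with a fake evaluator that ignores the GPU argument.
    scores = {"idea-a": 0.6, "idea-b": 0.9, "idea-c": 0.4}
    print(tiered_search(scores, lambda c, gpu: scores[c], top_k=2))

What's notable isn't the pattern, it's that nobody specified it.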
Aboutplants•33m ago
Yeah, I thought that was a particularly neat part.
rogerrogerr•24m ago
Why do we think this emerged “on its own”? Surely this technique has been discussed in research papers that are in the training set.
hhh•4m ago
Why?… The experiment.yaml shows that it is calling h100/h200 explicitly, and it's pretty common for humans to assume "number bigger more gooder" about anything… Lie and reverse the values and see what happens. I would put money on it going down a rabbit hole of complaining that the cluster is misconfigured.
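That swap experiment could be as crude as relabeling the pools in the config and watching the behavior (every name here is hypothetical; we don't know the real experiment.yaml contents):

    # Hypothetical control: make the label "h100" point at the H200
    # boxes and vice versa, keeping everything else identical.
    pools = {
        "h100": ["node-a", "node-b"],  # secretly the faster hardware
        "h200": ["node-c", "node-d"],  # secretly the slower hardware
    }
    # If the agent still "promotes winners to h200" (now the slower
    # pool), it is pattern-matching on the label, not on measured scores.
    print(pools)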
covi•37m ago
This feels like a chimpanzee with a power drill. An agent like this is honestly just brute-force search, but guided.
ipsum2•12m ago
A cluster is 2 nodes? That's technically true, but not very exciting.