The one thing I didn't see that would be good is some validation that the architectures that perform best on large models are the same architectures that perform best on small models.
I.e., validating the assumption that you can use small models with small amounts of training/compute to determine the best architecture for large models and high training budgets.
Even if it doesn't translate, it would still be very cool to be able to quickly evolve better small models (1M to 400M params), but I believe the implied goal (and what everyone wants) is that this exploration and discovery of novel architectures would be applicable to the really big models as well.
If you could only AI-discover larger models by spending OpenAI/Anthropic/... budgets per exploration, then we're not really gaining much in terms of novel ideas, as the cost (time and budget) would be too prohibitive.
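One cheap way to test that assumption would be to train the same set of candidate architectures at a small and a large scale and check whether their rankings agree, e.g. via Spearman rank correlation. A minimal sketch below; the loss numbers and parameter counts are hypothetical illustration values, not results from the paper.

```python
# Sketch: do architecture rankings transfer from small to large scale?
# All loss values here are hypothetical, for illustration only.

def rank(values):
    """Return the rank of each value (0 = best, i.e. lowest loss)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; fine for distinct losses)."""
    n = len(xs)
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical eval losses for five candidate architectures
small_scale = [3.10, 3.05, 3.20, 2.98, 3.15]   # e.g. trained at 10M params
large_scale = [2.40, 2.35, 2.55, 2.30, 2.60]   # e.g. trained at 1B params

rho = spearman(small_scale, large_scale)
print(f"rank correlation: {rho:.2f}")  # near 1.0 => small-scale search transfers
```

A correlation near 1.0 would support using cheap small-scale runs to pick architectures for expensive large-scale training; a low correlation would mean the search has to be redone at scale.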
Jimmc414•9h ago
They discovered 106 new state-of-the-art linear attention architectures through a fully autonomous AI research loop. The authors are making comparisons to AlphaGo’s move 37.