Excited to share that NEO is the first fully autonomous Machine Learning Engineer.
It can reason, plan and iterate on complex data cleaning, feature engineering and model training tasks.
Neo is now the SOTA Machine Learning Engineering agent on OpenAI's MLE-Bench.
They are not just distributing trophies to anybody. It was built over 10 months with a small team and some GPUs.
andsoitis•5mo ago
How can I try Neo myself? I couldn't find a way... As I said, the official Neo website only talks about a waitlist, so seeing benchmarks with no way to try it out or use it in some fashion makes it... not useful to me? Unless I misunderstand how it is integrated into other systems (which the Neo website doesn't tell me either).
Seeing a benchmark for something that is hidden behind a veil isn't worth much.
If I am missing something key, I sincerely apologize, but I really tried to find a way to interact with or use or download Neo or something, but it is just a void, other than the empty website and this benchmark.
Vij137•5mo ago
No worries. I appreciate the concerns you shared, and we will work on them. We are releasing access to our waitlist next week. Please sign up and share your ID; I will bump it up.
gauravvij137•5mo ago
Did you miss the official leaderboard on the openai/mle-bench page, which clearly states that Neo has the best score on MLE-bench?
Vij137•5mo ago
For context: Microsoft's RD agent is at 22.4%.
andsoitis•5mo ago
When I go to the website, the only option is to join the waitlist: https://heyneo.so/
Smells vaporware-y, fwiw
Vij137•5mo ago
Here: https://github.com/openai/mle-bench