If someone truly has performant agents at “PhD level”, whatever that means, then they should be building stuff and proving that the tools they’ve developed are better. My advice is to offer your agents’ work to would-be clients for free during a trial period, and see how far that gets you.
I don’t understand the point of wasting compute cycles on writing papers. If you’ve figured out something interesting, then that’s your secret sauce for building whatever you’re building. Doing research that’s just good enough to get the green light from some overtired reviewers isn’t a meaningful bar for an automated tool.
All research exists to get used, so how usable is any of this agent work? Simply put, writing and publishing papers is a poor metric for assessing the intelligence or usability of automated tools; it is akin to saying that cars are faster than horses on a track. The reason it’s useless for AI to write and publish papers is that we don’t care about the track per se, but about the activity the track enables.
Lastly, if an AI is only good for writing and publishing papers, that basically says its performance is mediocre at best at actually building anything: whatever it produced wasn’t useful enough to serve as secret sauce for building something.
The generalist AI that’s just good at general knowledge work isn’t going to replace any human PhDs. Again, the goal for R&D teams is to build stuff, or to figure out the universe so you can build stuff, not to write papers.
RonusMTG•1d ago
This is a main conference, a pretty large leap from workshops like the ones Sakana AI submitted to at ICLR.