Sakana AI has presented its work “Learning to Orchestrate Agents in Natural Language with the Conductor,” accepted to ICLR 2026. The idea is simple but powerful: instead of forcing a single model to handle an entire task on its own, the researchers trained a separate 7B model to act as a manager for other AIs.
This Conductor doesn’t write code or solve tasks directly. It looks at a problem and decides which agents to deploy, what subtask to give each one, and what context to provide. Essentially, it’s not just a router between models — it’s a meta-prompt engineer that assembles a working AI team tailored to a specific task.
What’s most interesting is that this behavior emerged not from hardcoded rules, but through reinforcement learning. For simple questions, the Conductor might rely on a single model call. For complex tasks, it builds a chain on its own: a planner, an executor, a verifier, and a correcting agent. It closely resembles how a strong team breaks down complex work into distinct roles.
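The Conductor's actual decisions are expressed in natural language, but the shape of each decision (which agent, what subtask, what context) can be sketched as a tiny data structure. This is a toy illustration with hypothetical names, not the paper's interface; the hardcoded rule merely mimics the learned behavior of using one call for simple tasks and a role-based chain for complex ones:

```python
from dataclasses import dataclass

@dataclass
class AgentCall:
    """One delegation decision: which agent, what subtask, what context."""
    agent: str    # name of a model in the pool
    subtask: str  # natural-language instruction for that agent
    context: str  # the slice of prior work this agent gets to see

def conduct(task: str, pool: list[str]) -> list[AgentCall]:
    """Toy stand-in for the Conductor's output.

    The real Conductor is a trained 7B model that learned this behavior
    via reinforcement learning; the rule below only shows the interface.
    """
    if len(task.split()) < 10:  # crude "is this task simple?" proxy
        return [AgentCall(pool[0], task, context="")]
    return [
        AgentCall(pool[0], f"Write a step-by-step plan for: {task}", ""),
        AgentCall(pool[1], "Execute the plan.", "<planner output>"),
        AgentCall(pool[2], "Check the result for errors.", "<executor output>"),
        AgentCall(pool[3], "Fix any issues found.", "<verifier output>"),
    ]
```

A short question would thus produce a single call, while a long task description yields the planner/executor/verifier/corrector chain described above.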
The results look impressive. The 7B Conductor outperformed every individual model in its pool, including GPT-5, Gemini, Claude, and the open-source models available at the time of the research. The paper reports new state-of-the-art results on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%). At the same time, the system proved cheaper than heavyweight multi-agent approaches like Mixture-of-Agents.
One standout feature is called Recursive Test-Time Scaling. The Conductor can select itself as one of the working agents, re-evaluate the output produced by its team, figure out where things went wrong, and assemble a new corrective workflow. In other words, scaling at inference happens not just by “thinking longer,” but by dynamically reconfiguring a new team in response to an error.
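That recursive loop can be sketched in a few lines. The function names here (`conduct`, `execute`, `verify`) are hypothetical stand-ins for the trained components, not an API from the paper; the point is only the control flow, in which failed verification feeds back into a fresh orchestration decision:

```python
def run_with_recursive_scaling(task, conduct, execute, verify, max_rounds=3):
    """Toy loop for the recursive test-time scaling idea (all names
    hypothetical): run the assembled team, and if verification fails,
    let the conductor see the failure and assemble a corrective workflow.
    """
    feedback = ""
    result = None
    for _ in range(max_rounds):
        plan = conduct(task, feedback)       # build (or rebuild) the team
        result = execute(plan)               # run the agent workflow
        ok, feedback = verify(task, result)  # conductor re-evaluates output
        if ok:
            break
    return result
```

Note that compute scales with the number of rounds actually needed, so easy tasks exit after one pass while hard ones trigger reconfiguration.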
The key takeaway isn’t that this is yet another multi-agent framework. What matters is that models are beginning to learn not only how to answer, but how to manage other models. Where AI systems used to be built around a single “smartest” agent, the focus is now shifting toward orchestration, roles, verification, and collective reasoning.
And it seems that Sakana is building its new multi-agent system, Sakana Fugu, precisely on this foundation.
immanuwell•1h ago
a tiny 7b model learning to boss around much bigger llms by figuring out who talks to whom and actually beating them - is genuinely wild