> SIMA 2 Reasoning:
> The user wants me to go to the ‘tomato house’. Based on the description ‘ripe tomato’, I identify the red house down the street.
> "The house that looks like a ripe tomato!"
that was transformed into a "user prompt" in a more instructional format
> "Go to the tomato house"
And both were used in the agent output. At least the Y-axes on the graphs look more reasonable than some other recent benchmarks.
They've acquired this bad habit of keeping all their scientific experiments closed by default and just publishing press releases. I wish their work were open-source by default and closed only when there's a good reason.
Don't get me wrong, I suppose this is more of a compliment. I really like what they are doing and I wish we could all participate in these advances.
Workaccount2•1h ago
> In subsequent training, SIMA 2’s own experience data can then be used to train the next, even more capable version of the agent. We were even able to leverage SIMA 2’s capacity for self-improvement in newly created Genie environments – a major milestone toward training general agents across diverse, generated worlds.
Pretty neat. I wonder how that works with Gemini; I suppose SIMA is a model (agent?) that runs on top of it?
FuckButtons•14m ago