Will be interesting to see how Gemini 3 does later this year.
Also, when they ask you to identify traffic lights, do you select the post? And when it’s motorcycles/bicycles, do you select the guy riding it?
Either that or it was never about the buses and fire hydrants.
The worst offenders will just loop you forever, no matter how many solves you get right.
Reload challenges are hard because of how the agent-action loop works. But the models were pretty good at identifying when a tile contained an item.
Some interesting debates around Gemini Computer Use - surely Google can post-train this away, right? (Currently it has no problem solving it haha)
PaulHoule•1h ago
golfer•53m ago
mdahardy•47m ago