So there might be awesome progress behind the scenes, just not ready for the general public.
VLA (vision-language-action) models essentially take a webcam screenshot plus some text (think "put the red block in the right box") and output the motor control commands needed to achieve that.
Note: "Gemini Robotics-ER" is not a VLA, though Gemini does have a VLA model too: "Gemini Robotics".
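To make the "image + text in, motor commands out" interface above concrete, here is a minimal sketch. It assumes a scheme that published VLAs such as RT-2 and OpenVLA are described as using: each continuous action dimension is discretized into 256 bins and emitted as a token. Nothing here is Gemini Robotics' actual API; `Observation`, `Action`, `detokenize`, `FixedPolicy` and the 7-DoF action layout are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Observation:
    image_jpeg: bytes      # one camera frame (the "webcam screenshot")
    instruction: str       # e.g. "put the red block in the right box"


@dataclass
class Action:
    # Hypothetical 7-DoF command: end-effector deltas (x, y, z, roll, pitch, yaw)
    # followed by a gripper open/close value, all normalized to [low, high].
    values: List[float]


class VLAPolicy(Protocol):
    """Anything that maps an observation to discrete action tokens."""
    def predict_tokens(self, obs: Observation) -> List[int]: ...


def detokenize(tokens: List[int], low: float = -1.0, high: float = 1.0,
               n_bins: int = 256) -> Action:
    """Map each token (a bin index) back to the center of its bin."""
    width = (high - low) / n_bins
    for t in tokens:
        if not 0 <= t < n_bins:
            raise ValueError(f"token {t} outside [0, {n_bins})")
    return Action([low + (t + 0.5) * width for t in tokens])


class FixedPolicy:
    """Placeholder for a real model: always emits the same 7 tokens."""
    def predict_tokens(self, obs: Observation) -> List[int]:
        return [128, 127, 128, 128, 128, 128, 255]


def control_step(policy: VLAPolicy, obs: Observation) -> Action:
    """One closed-loop step: observe, predict tokens, convert to a motor command."""
    return detokenize(policy.predict_tokens(obs))


if __name__ == "__main__":
    obs = Observation(b"", "put the red block in the right box")
    print(control_step(FixedPolicy(), obs).values)
```

On a real robot, `control_step` would run in a loop at a fixed rate, with a fresh camera frame each time, and a low-level controller would turn the normalized values into joint or end-effector motion.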
That's a bit exaggerated, no? Early Roombas would get tangled in socks, drag pet poop all over the floor, break glass stuff and so on, and yet the market accepted that and evolved, and now we have plenty of cleaning robots from various companies, including cheap spying ones from China.
I actually think there's a lot of value in being the first to deploy bots into homes, even if they aren't perfect. The amount of data you'd collect is invaluable and, by the looks of it, can't be synthetically generated in a lab.
I think the "safer" option is still the "bring them to factories first, offices next and homes last", but anyway I'm sure someone will jump straight to home deployments.
My non-AI dishwasher can't even always keep the water inside. Nothing is perfect.
Anyway, cool.
Of course, this is for counting animal legs while giving coordinates and reading analog clocks, not coding or solving puzzles. I imagine this model's image performance relative to its weight is very high.
A few robot legs and arms, big battery, off-the-shelf GPU. Solar panels.
Prompt: "Take care of all this land within its limits and grow some veggies."
Or it could turn out to look like satoyama (Japanese peasant forests), or it could be more similar to the crop rotation traditionally practiced in many parts of Central Africa, where root crops were important.
In Russia, before the Soviets forced "modern scientific agriculture" on peasants in the name of modernization, they practiced things like contour farming (interplanting rows of crops along the contours of the land to slow water down) and maslins (intermixing multiple varieties of wheat and barley in the same patch). Contour farming is now an active area of research for its ability to prevent topsoil loss and build soil health, while maslins provide superior yield stability and need little to no pesticide.
That's not even getting into the roughly 40,000 to 120,000 varieties of rice we've documented, most of which are hyper-adapted to a very specific location, often even a single village.
My point is there is no one way to take care of a plot of land. It's all relative to a number of factors beyond just the abiotic characteristics of the land itself. Your goals and intentions matter and you will always find localized unique adaptations.
sho_hn•2h ago
The gauge-reading example here is great, but in reality of course having the system synthesize that Python script, run the CV tasks, come back with the answer etc. is currently quite slow.
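For a sense of what such a synthesized script might boil down to, here is a minimal sketch of the final arithmetic step: given the pixel coordinates of the dial center and needle tip (assumed to come from the model's pointing output or an earlier CV step), compute the needle angle and linearly interpolate the reading. The function names and the 225°/270° calibration (a common dial layout with the zero mark at 7:30 and full scale at 4:30) are assumptions for illustration, not the code from the actual demo.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates; y grows downward


def needle_angle_deg(center: Point, tip: Point) -> float:
    """Needle direction in degrees, clockwise from 12 o'clock, in [0, 360)."""
    dx = tip[0] - center[0]
    dy = tip[1] - center[1]
    # atan2(dx, -dy) measures clockwise from "up" because image y points down.
    return math.degrees(math.atan2(dx, -dy)) % 360.0


def gauge_reading(angle: float, min_angle: float, sweep: float,
                  min_value: float, max_value: float, tol: float = 1e-6) -> float:
    """Linearly interpolate the dial value from the needle angle.

    min_angle: clockwise angle of the zero mark; sweep: degrees from zero to full scale.
    """
    delta = (angle - min_angle) % 360.0
    if delta > 360.0 - tol:       # needle sits on the zero mark, up to float error
        delta = 0.0
    if delta > sweep + tol:       # needle is in the dead zone between max and min
        raise ValueError("needle is outside the calibrated sweep")
    fraction = min(delta / sweep, 1.0)
    return min_value + fraction * (max_value - min_value)


if __name__ == "__main__":
    # Hypothetical 0-100 psi gauge, needle pointing straight up -> mid-scale.
    angle = needle_angle_deg((100, 100), (100, 40))
    print(gauge_reading(angle, min_angle=225, sweep=270, min_value=0, max_value=100))
```

The arithmetic itself is trivial; the slow part described above is the round trip of generating, running and reading back a script like this.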
Once things get much faster, you can also start using image generation to have models extrapolate possible futures from the photos they take, describe those futures back to themselves, and make decisions based on that, in loops like this. I think the assumption is that our brains do something similar unconsciously, before we integrate the results into our conscious conception of mind.
I'm really curious what things we could build if we had 100x or 1000x inference throughput.