Gemini Robotics-ER 1.6

https://deepmind.google/blog/gemini-robotics-er-1-6/

110•markerbrod•2h ago

Comments

sho_hn•2h ago

It does all start to feel like we'd get fairly close to being able to convincingly emulate a lot of human or at least animal behavior on top of the existing generative stack, by using brain-like orchestration patterns ... if only inference was fast enough to do much more of it.

The gauge-reading example here is great, but in reality of course having the system synthesize that Python script, run the CV tasks, come back with the answer etc. is currently quite slow.

Once things go much faster, you can also start to use image generation to have models extrapolate possible futures from photos they take, and then describe them back to themselves and make decisions based on that, loops like this. I think the assumption is that our brains do similar things unconsciously, before we integrate into our conscious conception of mind.

I'm really curious what things we could build if we had 100x or 1000x inference throughput.

moonu•2h ago

Idk if you've seen this already but Taalas does this interesting thing where they embed the model directly onto the chip, this leads to super-fast speeds (https://chatjimmy.ai) but the model they're using is an old small Llama model so the quality is pretty bad. But they say that it can scale, so if that's really true that'd be pretty insane and unlock the inference you're talking about.

lachlan_gray•1h ago

Robotics/control systems is exactly what came to mind when I saw this release! What struck me is the possibility of look-ahead search in real time, a bit like AlphaZero's MCTS.

Kostic•1h ago

Taalas showed that you could make LLMs faster by turning them into ASICs and get 10k+ token generation. It's a matter of time now.

timmg•46m ago

Actually pretty interesting to think: in a few years you might buy a raspberry pi style computer board with an extra chip on it with one of these types of embodiment models and you can slap it in a rover or something.

LetsGetTechnicl•53m ago

What if we put slop images into slop machines and got slop^2 back out

tootie•45m ago

Is emulating human behavior really a valuable end goal though? Humans exist as the evolutionary endpoint of exhaustion hunting large prey and organic tool-making. We've built loads of industrial and residential automation tools in the last 100 years and none of them are humanoid. I'd imagine a household robot butler would be more like R2D2 with lots and lots of arms.

jeffbee•2h ago

Showing the murder dog reading a gauge using $$$ worth of model time is kinda not an amazing demo. We already know how to read gauges with machine vision. We also know how to order digital gauges out of industrial catalogs for under $50.

snickmy•1h ago

Agree. I'm unclear what the highlight of this post is. Is it the multimodality of the model (that can replace computer vision), is it the reasoning part, or is it the overall wrapper that makes it very easy to develop on top?

readams•1h ago

I think that where this gets interesting is when you can just drop these robotic systems into an environment that wasn't necessarily set up specifically to handle them. The $50 for your gauge isn't really the cost: it's engineering time to go through the whole environment and set it up so that the robotic system can deal with each of the specific tasks, each of which will require some bespoke setup.

gallerdude•2h ago

I’ve been thinking about AI robotics lately… if internally at labs they have a GPT-2, GPT-3 “equivalent” for robotics, you can’t really release that. If a robot unloading your dishwasher breaks one of your dishes once, this is a massive failure.

So there might be awesome progress behind the scenes, just not ready for the general public.

monkeydust•1h ago

I ended up watching Bicentennial Man (1999) with Robin Williams over the weekend. If you haven't seen it, I thought it was a good and timely thing to watch, and it's kid friendly. Without giving away the plot, the scene where it was unloading the dishwasher...take my money!

spwa4•1h ago

It's called "VLA" (vision-language-action) models: https://huggingface.co/models?pipeline_tag=robotics

VLA models essentially take a webcam screenshot + some text (think "put the red block in the right box") and output motor control instructions to achieve that.

Note: "Gemini Robotics-ER" is not a VLA, though Gemini does have a VLA model too: "Gemini Robotics".

A demo: https://www.youtube.com/watch?v=DeBLc2D6bvg

NitpickLawyer•1h ago

> If a robot unloading your dishwasher breaks one of your dishes once, this is a massive failure.

That's a bit exaggerated, no? Early Roombas would get tangled in socks, drag pet poop all over the floor, break glass stuff and so on, and yet the market accepted that, evolved, and now we have plenty of cleaning robots from various companies, including cheap spying ones from China.

I actually think that there's a lot of value in being the first to deploy bots into homes, even if they aren't perfect. The amount of data you'd collect is invaluable, and by the looks of it, can't be synth generated in a lab.

I think the "safer" option is still the "bring them to factories first, offices next and homes last", but anyway I'm sure someone will jump straight to home deployments.

doubled112•48m ago

I have broken dishes loading and unloading the dishwasher. Am I a massive failure?

My non-AI dishwasher can't even always keep the water inside. Nothing is perfect.

Rekindle8090•31m ago

If someone paid 100 grand for you to load and unload the dishwasher, and the research to be able to do it cost hundreds of billions, decades of research, hundreds of thousands of researchers, and that was the ONLY thing you could do, yes, you WOULD be a massive failure.

skybrian•1h ago

Pointing a camera at a pressure gauge and recording a graph is something that I would have found useful and have thought about writing. Does software like that exist that’s available to consumers?

gunalx•1h ago

Look into opencv.

vessenes•51m ago

I'm pretty sure claude will one shot this for you, including making you a home assistant dashboard item if you ask it.

nickthegreek•46m ago

Frigate can be set up to do this, I believe, but it's overkill. Openclaw could do it, slightly less overkill.
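
For the gauge-logging idea in this sub-thread, here is a minimal sketch of the step that comes after the vision part: turning a needle angle into a reading. It assumes the angle has already been measured from the camera frame (for example with a line detector such as OpenCV's Hough transform, not shown here) and that the dial is linear. The function name and calibration parameters are hypothetical, for illustration only.

```python
def needle_angle_to_value(angle_deg, min_angle_deg, max_angle_deg,
                          min_value, max_value):
    """Linearly map a needle angle to a gauge reading.

    Assumes a linear dial: min_angle_deg is where the needle points at
    min_value, and max_angle_deg is where it points at max_value.
    """
    span = max_angle_deg - min_angle_deg
    if span == 0:
        raise ValueError("calibration angles must differ")
    # Fraction of the way along the dial, clamped so noisy angle
    # estimates cannot produce readings outside the printed scale.
    fraction = (angle_deg - min_angle_deg) / span
    fraction = min(1.0, max(0.0, fraction))
    return min_value + fraction * (max_value - min_value)


# Hypothetical calibration: a 0-100 psi dial that sweeps 270 degrees.
reading = needle_angle_to_value(135, 0, 270, 0, 100)
```

Appending each reading with a timestamp to a CSV file would then give the graph over time; the hard part in practice is likely the angle detection and per-gauge calibration, not this mapping.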

vessenes•49m ago

Nice. I couldn't find the part that I'm most interested in though, latency. This beats their frontier vision model for some identification tasks -- for a robotics model, I'm interested in hz. Since this is an "Embodied Reasoning" model, I'm assuming it's fairly slow - it's designed to match with on-robot faster cycle models.

Anyway, cool.

WarmWash•44m ago

In my quick image recognition testing on AI Studio, its performance seems similar to 3.1 Pro, but it is much, much faster. It "thinks" but only for a few seconds.

Of course this is for counting animal legs while giving coordinates and reading analog clocks. Not coding or solving puzzles. I imagine the image performance to model weight of this model is very high.

vibe42•48m ago

A parcel of land.

A few robot legs and arms, big battery, off-the-shelf GPU. Solar panels.

Prompt: "Take care of all this land within its limits and grow some veggies."

jayd16•36m ago

Yeah I'm not sure how that's currently working out. https://proofofcorn.com/

jonas21•21m ago

That's the opposite problem -- the agent doesn't have robot arms or legs or a parcel of land. It has to rely on people to get access to land and plant and harvest the corn, and those people are ignoring it.

jerf•4m ago

Are you saying that has failed? It isn't obvious to me from that page that anything in particular is going wrong. I don't think anyone is daft enough to claim that AI solves the "Iowa remains unplantable due to winter conditions" problem.

culi•20m ago

What if it turns out that "take care of this land" means the traditional way California was taken care of, with regular small slow burns? After over 10k years of this type of management there are many important native species that won't even germinate without the presence of ash.

Or it could turn out to look like satoyama (Japanese peasant forests) or it could be more similar to the crop rotation that was traditionally practiced in many parts of Central Africa where roots were important.

In Russia before the Soviets forced "modern scientific agriculture" on peasants to modernize, they practiced things like contour farming (where they interplanted rows of crops along the contours of the land to slow water down) and maslins (where they intermixed multiple varieties of wheat and barley in the same patch). Now contour farming is an active area of research for its ability to prevent topsoil loss and build soil health, while maslins provide superior yield stability and use little to no pesticides.

That's not even getting into the 40,000 to 120,000 varieties of rice we've documented, most of which are hyper-adapted to a very specific location, often even a single village.

My point is there is no one way to take care of a plot of land. It's all relative to a number of factors beyond just the abiotic characteristics of the land itself. Your goals and intentions matter and you will always find localized unique adaptations.

