π0.5: A VLA with open-world generalization

95•lachyg•4h ago

Comments

beklein•3h ago

This is amazing! As someone working with industrial robots, normally under strict environmental constraints and control, witnessing such real-world robotics progress truly excites me about the future!

By the way, they’ve open-sourced their π0 model (code and model weights). More information can be found here: https://github.com/Physical-Intelligence/openpi

UltraSane•3h ago

It seems robotics has advanced more in the last 3 years than the previous 20.

gs17•3h ago

Is the robot platform they're using something they've developed themselves? The paper doesn't seem to mention any details outside of sensors and actuators.

lachyg•3h ago

Off-the-shelf robots -- we've got our models running on a dozen+ different robot types (and have this specific generalization demo working on multiple platforms too).

gs17•3h ago

Great, would you happen to know what's used in this video?

modeless•3h ago

Here are some of the suppliers for things seen in the videos:

https://arx-x.com/

https://x.com/GalaxeaDynamics

https://www.youtube.com/@HEXMOVEHexmove_Robotic

https://www.trossenrobotics.com/

meisel•3h ago

These variable-length arrays are getting quite advanced

matthewfcarlson•2h ago

Ignore the haters. This is hilarious

layer8•36m ago

Precisely my thoughts.

djoldman•2h ago

I'm genuinely asking (not trying to be snarky)... Why are these robots so slow?

Is it a throughput constraint given too much data from the environment sensors?

Is it processing the data?

I'm curious about where the bottleneck is.

robopolicy•2h ago

Part of it is that training of these VLAs currently happens on human teleop data which limits speed (both for safety reasons and because of actual physical speed constraints in the teleoperation pipeline).

Let’s see how it changes once these pipelines follow the LLM recipes to use more than just human data…

dheera•1h ago

Not a PI employee, but diffusion policies are like diffusion models for image generation: they generate actions from noise in multiple steps. With current compute you can't run 100+ Hz control loops with that kind of architecture.

Some combination of distillation, new architectures, faster compute, can eventually attack these problems. Historically as long as something in tech has been shown to be possible, speed has almost always been a non-issue in the years afterwards.

For now even getting a robot to understand what to do in the physical world is a major leap from before.
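
To put rough numbers on the point above about multi-step generation and control rates, here is a minimal sketch. Every figure in it (step count, per-step time, overhead, chunk size) is a hypothetical placeholder chosen for round arithmetic, not a measurement of π0 or any other model, and the function names are made up for illustration. Returning a chunk of future actions per call is one commonly discussed workaround; whether it applies to a given system is an assumption here.

```python
# Back-of-the-envelope latency model for a multi-step (diffusion-style) policy.
# All numbers used below are hypothetical, not measurements of any real model.


def policy_call_latency_ms(denoise_steps: int, ms_per_step: float,
                           overhead_ms: float = 0.0) -> float:
    """Time for one policy call: fixed overhead plus one network pass per step."""
    return overhead_ms + denoise_steps * ms_per_step


def policy_rate_hz(latency_ms: float) -> float:
    """How many policy calls fit into one second when run back to back."""
    return 1000.0 / latency_ms


def action_rate_hz(latency_ms: float, actions_per_call: int) -> float:
    """Actions available per second if each call returns a chunk of actions."""
    return actions_per_call * 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical: 10 denoising steps at 8 ms each, plus 20 ms of overhead
    # (camera capture, encoding, network hop to the robot).
    latency = policy_call_latency_ms(denoise_steps=10, ms_per_step=8.0,
                                     overhead_ms=20.0)
    print(f"latency per call: {latency:.0f} ms")
    print(f"closed-loop policy rate: {policy_rate_hz(latency):.0f} Hz")
    print(f"action rate with 50-action chunks: {action_rate_hz(latency, 50):.0f} Hz")
```

With these made-up numbers one call takes 100 ms, so the policy can only react to new observations about 10 times per second. Chunking raises the rate at which actions are available to the low-level controller, but each action in a chunk is still based on an observation that is at least one call old.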

michaelt•1h ago

When you're operating your robot around humans, you want to be very confident it won't injure anyone. It'd be pretty bad if a bug in your code meant instead of putting the cast iron frying pan in the dishwasher, it sent it flying across the room.

One way of doing that is to write code with no bugs or unpredictable behaviour, a nigh-impossible feat - especially once you've got ML models in the mix.

Another option is to put a guard cage around your robot so nobody can enter pan-throwing distance without deactivating the robot first. But obviously that's not practical in a home environment.

Another option is just to go slowly all the time. The pan won't fly very far if the robot only moves 6 inches per second.
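
The "won't fly very far" claim can be checked with basic projectile arithmetic. This is a rough sketch under stated assumptions: the pan leaves the gripper horizontally at the arm's speed, air resistance is ignored, and the release height and pan mass are hypothetical values picked for illustration.

```python
import math

G = 9.80665          # standard gravity, m/s^2
INCH_M = 0.0254      # metres per inch


def landing_distance_m(speed_m_s: float, release_height_m: float) -> float:
    """Horizontal distance travelled by an object released horizontally."""
    fall_time_s = math.sqrt(2.0 * release_height_m / G)
    return speed_m_s * fall_time_s


def kinetic_energy_j(mass_kg: float, speed_m_s: float) -> float:
    """Kinetic energy of the object at the moment of release."""
    return 0.5 * mass_kg * speed_m_s ** 2


if __name__ == "__main__":
    height_m = 1.0   # hypothetical counter-height release
    mass_kg = 2.0    # hypothetical cast iron pan
    for label, speed in [("6 in/s", 6 * INCH_M), ("3 m/s", 3.0)]:
        d = landing_distance_m(speed, height_m)
        e = kinetic_energy_j(mass_kg, speed)
        print(f"{label}: lands {d:.2f} m away, {e:.3f} J at release")
```

Under these assumptions a pan released at 6 inches per second lands less than 10 cm from the release point, while one released at 3 m/s travels more than a metre. Distance grows linearly with speed and energy grows with its square, which is why capping speed is such a blunt but effective safety measure.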

ethan_smith•48m ago

The primary bottleneck is typically the motion planning system that must continuously solve complex optimization problems to ensure safe trajectories while avoiding collisions in dynamic environments.

vhartman•20m ago

These models typically predict actions directly; there is no motion planning going on here.

airstrike•2h ago

I'm just a layman, but I can't see this design scaling. It's way too slow and "hard" for fine motor tasks like cleaning up a kitchen or being anywhere around humans, really.

I think the future is in a "softer" type of robot that can sense whether its fingers are pushing a cabinet door (or facing resistance) and adjust accordingly. A quick Google search shows this example (animated render), which is closer to what I imagine the ultimate solution will be: https://compliance-robotics.com/compliance-industry/

Human flesh is way too squishy for us to allow hard tools to interface with it, unless the human is in control. The difference between a blunt weapon and the robot from TFA is that the latter is very slow and on wheels.

nullc•58m ago

The development here is primarily in the model. If someone invents the 'brains' a robot needs to do useful domestic tasks then there will suddenly be a lot of incentive to build the right body for it.

huydotnet•1h ago

Amazing! On a fun note, I believe if a human kid were cleaning up the spill and threw the sponge into the sink like that, the kid would be in trouble. XD

th0ma5•1h ago

Do the general laws of demos apply here? That any automation shown is the extent of the capabilities, not the start?

fwip•48m ago

One thing I notice is that they specify that the robot has never seen the homes before, but certain objects, like the laundry baskets, are identical.

Doing your demo is significantly easier if you've already programmed/trained the robot to recognize the specific objects it has to interact with, even if those items are in different locations.

horhay•18m ago

They also got these things working in corners of a location instead of stacking tasks across different areas of the same location. And even on these "one-area" task groups it can fail a good amount. Kudos to them for showing the failures, though.

desertmonad•44m ago

Finally, machines doing the work we don't want to do

bytesandbits•28s ago

Most of it is open source. Their VLAs are based on Gemma models + vision encoders, plus their own action experts. You can download and play around with or fine-tune their Pi0 VLAs from their servers directly (JAX format) or from the Hugging Face LeRobot safetensors port. They also have notebooks and code in their repo to get started with fine-tuning. Inference runs on a single RTX 4090, streamed over WiFi to the robot.
