We forked LeRobot to run a bimanual setup of two SO-ARM100s on a single computer, recorded 85 teleoperated folding sessions, and trained an Action Chunking Transformer (ACT) on them.
The model learns to fold a shirt by predicting joint configurations from the camera inputs! The robot gets a working (if slightly crumpled) fold about 70% of the time; the remaining runs split evenly between dropping one of the shirts (15%) and failing to let go of the shirt at the end (15%).
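At inference time, ACT doesn't predict one joint configuration per step: it predicts a whole chunk of future actions at once, and overlapping chunks are blended with the exponential temporal-ensembling scheme from the ACT paper (older predictions for a given timestep get higher weight). Here's a minimal NumPy sketch of that blending loop; the chunk size, joint count, and the stand-in `fake_policy` are illustrative assumptions, not our actual model.

```python
import numpy as np

CHUNK = 100   # actions predicted per forward pass (assumed)
DOF = 12      # joint dimensions across both follower arms (assumed)
EP_LEN = 300  # episode length in control steps (assumed)

def fake_policy(rng):
    """Stand-in for the trained ACT model: one (CHUNK, DOF) action chunk."""
    return rng.normal(size=(CHUNK, DOF))

def temporal_ensemble(ep_len=EP_LEN, m=0.01, seed=0):
    """Execute ep_len steps, averaging every chunk that covers the current
    timestep with weights w_i = exp(-m * i), i = 0 for the oldest chunk."""
    rng = np.random.default_rng(seed)
    # buffers[t] collects every action ever predicted for timestep t
    buffers = [[] for _ in range(ep_len + CHUNK)]
    executed = []
    for t in range(ep_len):
        chunk = fake_policy(rng)
        for i, a in enumerate(chunk):
            buffers[t + i].append(a)  # predicted now, for timestep t + i
        preds = buffers[t]            # appended in time order: index 0 is oldest
        w = np.exp(-m * np.arange(len(preds)))
        w /= w.sum()
        executed.append(sum(wi * ai for wi, ai in zip(w, preds)))
    return np.stack(executed)

actions = temporal_ensemble()
print(actions.shape)  # one blended joint target per control step
```

With `m` small, old and new predictions are weighted nearly equally, which smooths the arm motion at the cost of reacting more slowly to new observations.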
This crazy setup involved 6 power adapters, 5 USB-A and 3 USB-C connections to power 4 arms (2 leader arms for teleoperation and 2 follower arms) and 3 cameras (one on each arm plus an overhead webcam).
Shoutout to my insane team: Spencer Kee, Advait Patel, Leo Lin, and Anuj Sesha!