I've been working on reproducing the UMI paper (https://umi-gripper.github.io/) and their code. I've been relatively successful so far (see attached videos): most of the time the arm is able to pick up the cup, but it drops it from a higher-than-desired height over the saucer. I'm using their published code and model checkpoint.
I've tried several approaches to address the issue, including:
Adjusting lighting.
Tweaking latency configurations (see the measurement sketch after this list).
Enabling/disabling image processing from the mirrors.
I still haven’t been able to solve it.
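For the latency angle, here's roughly how I've been sanity-checking my settings: cross-correlate the commanded gripper width against the measured width and see how many control steps the measurement lags. This is a minimal sketch over synthetic data, not UMI's actual calibration code, and the 10 Hz control rate is an assumption about my own setup:

    import numpy as np

    # Minimal latency sanity check (not UMI's calibration code).
    # Cross-correlate commanded vs. measured gripper width to find the
    # lag, in control steps, that best aligns the two signals.
    CONTROL_HZ = 10  # assumption about my control rate

    def estimate_latency_steps(commanded, measured):
        c = commanded - commanded.mean()
        m = measured - measured.mean()
        xcorr = np.correlate(m, c, mode="full")
        return int(np.argmax(xcorr) - (len(c) - 1))

    # Synthetic example: measurement trails command by 3 steps (~300 ms).
    t = np.linspace(0, 10, 100)
    cmd = np.sin(t)
    meas = np.roll(cmd, 3)
    lag = estimate_latency_steps(cmd, meas)
    print(f"estimated latency: {lag} steps ({lag / CONTROL_HZ * 1000:.0f} ms)")

If the lag this prints disagrees with the latency values set in the eval config, the policy is effectively acting on stale observations, which could plausibly shift the release point.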
My intuition is that the problem might be one of the following:
Model overfitting to the training cups. The exact list of cups used in training isn't published. After reviewing the dataset, I see a red cup/saucer set, but I suspect its relative size differs from mine, so the model may be misjudging the moment to release the cup.
The model might need fine-tuning with episodes recorded in my own environment using my specific cup/saucer set (rough sketch after this list).
My gripper might lack the precision the original system had.
Residual jitter in the arm or gripper could also be contributing (also sketched below).
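On the fine-tuning hypothesis, the plan would be to warm-start from the published checkpoint and continue training on a handful of episodes recorded with my cup/saucer set. Below is only a shape-of-the-idea sketch in plain PyTorch: ToyPolicy, the checkpoint path, and the MSE behavior-cloning loss are all stand-ins (the real UMI policy is a diffusion policy with its own training loss); none of these names come from the UMI codebase.

    import torch
    import torch.nn as nn

    # Warm-start sketch. ToyPolicy is a stand-in for the real
    # diffusion-policy network; no names here come from the UMI repo.
    class ToyPolicy(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 7))

        def forward(self, obs):
            return self.net(obs)

    policy = ToyPolicy()
    # Real workflow: load the published checkpoint here, e.g.
    # policy.load_state_dict(torch.load("umi_ckpt.ckpt")["state_dict"])

    # Small learning rate so my few episodes nudge, not overwrite, the policy.
    optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)
    criterion = nn.MSELoss()  # plain BC loss for brevity, not UMI's diffusion loss

    obs = torch.randn(16, 32)  # stand-ins for (observation, action) batches
    act = torch.randn(16, 7)
    for step in range(100):
        loss = criterion(policy(obs), act)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()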
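To put a number on the jitter hypothesis: command the arm to hold still above the saucer, log TCP positions for a while, and compare the spread against the tolerance the release needs. The pose array below is synthetic stand-in data; in practice it would come from the robot's state stream.

    import numpy as np

    # Jitter check: hold the arm still, log TCP positions, measure spread.
    # The pose array below is synthetic stand-in data.
    rng = np.random.default_rng(0)
    poses_xyz = 0.0005 * rng.standard_normal((500, 3))  # 500 samples, meters

    std_mm = poses_xyz.std(axis=0) * 1000
    p2p_mm = (poses_xyz.max(axis=0) - poses_xyz.min(axis=0)) * 1000
    print("per-axis std (mm):         ", std_mm.round(3))
    print("per-axis peak-to-peak (mm):", p2p_mm.round(3))
    # If peak-to-peak motion is within a millimeter or two, jitter probably
    # isn't what's causing a release several centimeters too high.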
Other thoughts:
Depth estimation may be a bottleneck. Adding a depth camera or a secondary camera for stereo vision might help, but would likely require retraining the model from scratch (a cheap monocular-depth probe is sketched after this list).
Adding contact information could also improve performance, either via touch sensors or by borrowing ideas from ManiWAV (https://mani-wav.github.io/), which uses a microphone mounted on the finger (audio sketch below).
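On the depth idea: before buying hardware, a cheap probe is to run an off-the-shelf monocular depth model on saved wrist-camera frames and eyeball whether the cup/saucer depth ordering is even recoverable. Here's a sketch using MiDaS via torch.hub; "frame.png" is a placeholder for a saved frame, this is exploratory only (nothing the UMI pipeline consumes), and the wrist camera's fisheye distortion will degrade it.

    import cv2
    import torch

    # Exploratory probe: relative (not metric) depth from a single frame
    # using MiDaS. "frame.png" is a placeholder for a saved wrist-cam image.
    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    midas.eval()
    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

    img = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
    batch = transforms.small_transform(img)
    with torch.no_grad():
        pred = midas(batch)
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().numpy()
    # Inspect depth at cup vs. saucer pixels; MiDaS output is relative,
    # so only the ordering/ratio is meaningful, not absolute distance.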
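And on the ManiWAV direction: even without retraining, a finger-mounted microphone gives a crude contact signal that could gate the release (only open the gripper once contact with the saucer is heard). A short-time-energy sketch with the sounddevice library; the sample rate, block size, and threshold are guesses for my setup, not values from ManiWAV.

    import numpy as np
    import sounddevice as sd

    # Crude contact detector inspired by ManiWAV's finger microphone:
    # flag contact when short-time RMS energy jumps above a threshold.
    # SAMPLE_RATE, BLOCK, and THRESHOLD are guesses for my setup.
    SAMPLE_RATE = 16000
    BLOCK = 512       # ~32 ms per block
    THRESHOLD = 0.02  # tune against ambient noise

    def on_block(indata, frames, time, status):
        rms = float(np.sqrt(np.mean(indata[:, 0] ** 2)))
        if rms > THRESHOLD:
            print(f"possible contact, rms={rms:.4f}")

    with sd.InputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK,
                        channels=1, callback=on_block):
        sd.sleep(5000)  # listen for 5 seconds

The threshold would need tuning against the arm's own motor noise.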
If anyone has been more successful with this setup, I’d love to exchange notes.