Can we collect robot data without any robots?
Introducing Universal Manipulation Interface (UMI)
An open-source $400 system from
@Stanford
designed to democratize robot data collection
0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
What if the form of the visuomotor policy has been the bottleneck for robotic manipulation all along? Diffusion Policy achieves a 46.9% improvement vs the prior SoTA on 11 tasks from 4 benchmarks + 4 real-world tasks! (1/7)
website:
paper:
Weights drop ⚠️
We released our pre-trained model for the cup arrangement task trained on 1400 demos! We aim to enable anyone to deploy UMI on their robot to arrange any "espresso cup with saucer" they buy on Amazon.
UMI x ARX
@YihuaiGao
just got our in-the-wild cup policy working with ARX5
@ARX_Zhang
! We are still tuning the controller and latency matching for smoother tracking. Lots of potential in these low-cost lightweight arms!
With UMI, you can go to any home, any restaurant and start data collection within 2 minutes.
With a diverse in-the-wild cup manipulation dataset, we can train a diffusion policy that generalizes to the top of a water fountain – clearly unseen environments and objects. 2/9
Can we use wearable devices to collect robot data without actual robots?
Yes! With a pair of gloves🧤!
Introducing DexCap, a portable hand motion capture system that collects 3D data (point cloud + finger motion) for training robots with dexterous hands
Everything open-sourced
UMI data is robot agnostic. Here we can deploy the same policy on both UR5e and Franka robots. In fact, you can deploy it on any robot with a parallel jaw stroke > 85mm. 3/9
I love how with just parallel jaw grippers and a visuomotor policy, you can do really dexterous and precise tasks, often exceeding the mechanical accuracy of the robot arms themselves 🦾
Introducing 𝐀𝐋𝐎𝐇𝐀 𝐔𝐧𝐥𝐞𝐚𝐬𝐡𝐞𝐝 🌋 - Pushing the boundaries of dexterity with low-cost robots and AI.
@GoogleDeepMind
Finally got to share some videos after a few months. Robots are fully autonomous, filmed in one continuous shot. Enjoy!
Please also check out our epic fails compilation! We achieve a 70-90% success rate on most tasks, which still doesn’t hit the bar for commercial deployment. However, we think getting a larger in-the-wild dataset will get us a lot closer! 6/9
Enabled by our unique wrist-only camera configuration and camera-centric action representation, our robot systems are calibration-free (works even with base movement) and robust against distractors and lighting changes. 4/9
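To make the calibration-free claim concrete, here is a minimal sketch of a relative, gripper-frame action representation (my simplification, not the released UMI code): future end-effector poses are expressed relative to the current pose, so no fixed base/world calibration is needed.

```python
# Sketch: relative (gripper/camera-frame) actions, simplified illustration only.
import numpy as np

def relative_actions(T_current, T_futures):
    """Express future end-effector poses relative to the current one: T_rel = T_current^-1 @ T_future."""
    T_inv = np.linalg.inv(T_current)           # 4x4 homogeneous transform of the current gripper pose
    return [T_inv @ T_f for T_f in T_futures]  # each T_rel is robot- and world-frame independent

# At deployment, the only robot-specific step is mapping back into the robot's own frame:
#   T_cmd = T_current_robot @ T_rel
```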
Congrats on LeRobot’s release! Both the Diffusion Policy and UMI project have benefited tremendously from
@huggingface
libraries such as diffusers and accelerate. I hope to see more organized robotics open source efforts!
Meet LeRobot, my first library at
@huggingface
robotics 🤗
The next step of AI development is its application to our physical world. Thus, we are building a community-driven effort around AI for robotics, and it's open to everyone!
Take a look at the code:
Incredible results from simple hardware + behavior cloning! I’m glad that
@tonyzzhao
also found combining generative models + action sequence prediction to be effective at capturing multimodal actions!
Introducing ALOHA 🏖: 𝐀 𝐋ow-cost 𝐎pen-source 𝐇𝐀rdware System for Bimanual Teleoperation
After 8 months of iterating at
@stanford
and 2 months working with beta users, we are finally ready to release it!
Here is what ALOHA is capable of:
@GoPro
technologies (GPMF, QR control, voice control, media mod, max lens …) have been indispensable for this project. Shout out to
@David_Newman
who personally responded to my questions related to timecodes, which is critical for bimanual UMI. 9/9
This model is far from perfect. For example, it doesn't work well under direct sunlight since it constantly rained at Stanford during our data collection effort. Please share your failure cases! Hopefully, we can have a community-based effort to train an even more robust model!
LLMs swept the world by predicting discrete tokens. But what’s the right tool to model continuous, multi-modal, and high-dimensional behaviors?
Meet Vector Quantized Behavior Transformer (VQ-BeT), beating or matching diffusion-based models in speed, quality, and diversity. 🧵
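For intuition only, here is what "vector quantizing" a continuous action chunk looks like in its most basic form: a toy nearest-codebook lookup. VQ-BeT itself learns the codebook with a residual VQ-VAE and predicts codes with a transformer; the names and sizes below are hypothetical.

```python
# Toy vector quantization of an action chunk (illustration, not VQ-BeT's actual residual VQ).
import numpy as np

rng = np.random.default_rng(0)
K, D = 16, 14                          # codebook size, flattened action-chunk dimension (hypothetical)
codebook = rng.normal(size=(K, D))     # in VQ-BeT this codebook is learned, not random

def quantize(action_chunk):
    """Map a continuous action chunk to its nearest codebook entry (discrete code + vector)."""
    dists = np.linalg.norm(codebook - action_chunk, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

chunk = rng.normal(size=D)
code, quantized_chunk = quantize(chunk)    # the transformer then only has to predict `code`
```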
By learning the gradient field of action distribution, and generating action trajectories via “gradient descent” during inference, Diffusion Policy addressed 3 key challenges of robotics by inheriting advantages from Diffusion Models. (2/7)
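A minimal sketch of what that inference loop looks like, assuming a diffusers-style scheduler and a stand-in noise-prediction network (the real policy uses a conditional 1D U-Net; all names here are placeholders):

```python
# Sketch of Diffusion Policy-style inference: start from Gaussian noise and iteratively
# denoise an action trajectory, conditioned on the observation.
import torch
from diffusers import DDPMScheduler

HORIZON, ACTION_DIM, OBS_DIM = 16, 7, 32

def noise_pred_net(noisy_actions, timestep, obs_cond):
    # Stand-in for the learned noise-prediction network (a conditional U-Net in the paper).
    return torch.zeros_like(noisy_actions)

scheduler = DDPMScheduler(num_train_timesteps=100)
scheduler.set_timesteps(100)

obs_cond = torch.zeros(1, OBS_DIM)
actions = torch.randn(1, HORIZON, ACTION_DIM)              # a^K ~ N(0, I)
for k in scheduler.timesteps:                              # K ... 1
    eps = noise_pred_net(actions, k, obs_cond)             # predict the noise at step k
    actions = scheduler.step(eps, k, actions).prev_sample  # one denoising ("gradient") step
```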
Simulation envs/data are indispensable tools for rapid development iterations and reproducible benchmarks. A small but ambitious team led by
@zhou_xian_
is pushing the boundary of robotic simulators by unifying multiple material representations and solvers. Please check it out!
Can GPTs generate infinite and diverse data for robotics?
Introducing RoboGen, a generative robotic agent that keeps proposing new tasks, creating corresponding environments and acquiring novel skills autonomously!
code:
👇🧵
(better with audio)
Not only is this amazing research, it has all the elements of a great robotics (manipulation) paper that I wish was common practice in the field.
Quick Thread:
We found Diffusion Policy to perform surprisingly well on real-world tasks by simply training its vision-encoder end-to-end. The resulting policy solves under-actuated, multi-stage tasks with a 6DoF action space and is robust against perturbations and kinematic constraints. (6/7)
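A rough sketch of the end-to-end setup, with a plain MLP standing in for the diffusion head just to show that a single optimizer updates the vision encoder and the action head together (the paper's encoder is a ResNet-18; everything else below is simplified):

```python
# Sketch: vision encoder trained end-to-end with the policy head under one optimizer.
import torch
import torch.nn as nn
import torchvision

encoder = torchvision.models.resnet18(weights=None)
encoder.fc = nn.Identity()                                  # 512-d visual features
head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))  # stand-in for the diffusion head

opt = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)

img = torch.randn(8, 3, 224, 224)                           # dummy batch of camera images
target = torch.randn(8, 7)                                  # dummy action targets
loss = nn.functional.mse_loss(head(encoder(img)), target)
loss.backward()                                             # gradients flow into the encoder too
opt.step()
```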
I spent a couple months at the beginning of this year learning about GPU programming through trying to optimize inference for
@chichengcc
awesome Diffusion Policy paper. I was able to improve inference time for the denoising U-Net by ~3.4x over PyTorch eager mode and ~2.65x over
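The thread doesn't say which optimizations were used; as a generic, hedged baseline (my assumption, not necessarily what was done here), torch.compile plus fp16 inference is a common first step beyond eager mode:

```python
# Generic starting point for speeding up a denoising network over eager mode (illustrative only).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(64, 256), nn.Mish(), nn.Linear(256, 64))  # stand-in for the U-Net
net = net.eval()
if torch.cuda.is_available():
    net = net.cuda().half()                                 # fp16 inference
net = torch.compile(net, mode="reduce-overhead")            # kernel fusion / CUDA graphs where possible

x = torch.randn(1, 64,
                dtype=next(net.parameters()).dtype,
                device=next(net.parameters()).device)
with torch.inference_mode():
    y = net(x)
```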
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀 🏄 -- Learning!
With 50 demos, our robot can autonomously complete complex mobile manipulation tasks:
- cook and serve shrimp🦐
- call and take elevator🛗
- store a 3 lbs pot in a two-door cabinet
Open-sourced!
Co-led
@tonyzzhao
,
@chelseabfinn
@keerthanpg
@Stanford
I think wrist fisheye cams are sufficient for a surprisingly wide range of tasks. I do think there are tasks that could benefit from more views. For those cases, the UMI data pipeline supports an unlimited number of non-gripper GoPros (e.g. head-mounted)
Imitation learning works™ – but you need good data 🥹 How to get high-quality visuotactile demos from a bimanual robot with multifingered hands, and learn smooth policies?
Check our new work “Learning Visuotactile Skills with Two Multifingered Hands”! 🙌
② Being used for image generation, Diffusion Models are no strangers to predicting high-dimensional tensors. Diffusion Policy’s action-space scalability affords action sequence prediction, which is surprisingly important for maintaining temporal mode-consistency of the actions. (4/7)
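A toy, runnable sketch of receding-horizon execution of predicted action sequences (dummy policy/env stand-ins, not the paper's code): the policy predicts a whole chunk and the robot commits to part of it before re-planning, which is what keeps the modes temporally consistent.

```python
# Receding-horizon execution of predicted action sequences (toy stand-ins for policy and env).
import numpy as np

PRED_HORIZON, ACTION_HORIZON, ACTION_DIM = 16, 8, 7

def predict_action_sequence(obs):
    # Stand-in for a learned policy: returns a (PRED_HORIZON, ACTION_DIM) trajectory.
    return np.zeros((PRED_HORIZON, ACTION_DIM))

def step_env(action):
    # Stand-in for the robot/environment step; returns the next observation.
    return np.zeros(10)

obs = np.zeros(10)
for _ in range(5):                                   # outer re-planning loop
    action_seq = predict_action_sequence(obs)
    for action in action_seq[:ACTION_HORIZON]:       # commit to a chunk before re-planning,
        obs = step_env(action)                       # avoiding per-step mode switching
```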
① Diffusion Policy can express arbitrary normalizable distributions, which include multimodal action distributions - a well-known challenge for policy learning. (3/7)
③ Training stability is particularly important for robotics due to the high cost of real-world evaluation. By side-stepping the intractable normalization-constant approximation and learning the gradient instead, Diffusion Policy is significantly more stable and easier to train than IBC. (5/7)
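For reference, the comparison in equations (notation roughly follows the paper): IBC has to approximate a normalization constant during training, while Diffusion Policy trains with a plain MSE on the predicted noise.

```latex
% IBC models the policy as an energy-based model; training requires the intractable
% normalization constant Z(o, \theta), typically approximated with negative samples:
p_\theta(\mathbf{a}\mid\mathbf{o}) \;=\; \frac{e^{-E_\theta(\mathbf{o},\mathbf{a})}}{Z(\mathbf{o},\theta)}
% Diffusion Policy instead learns the noise (gradient/score) directly, with no Z to estimate:
\mathcal{L} \;=\; \big\lVert \boldsymbol{\varepsilon}^{k} - \varepsilon_\theta\big(\mathbf{o},\, \mathbf{a}^{0} + \boldsymbol{\varepsilon}^{k},\, k\big) \big\rVert^{2}
```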
Visual Nav Transformer 🤝 Diffusion Policy
Works really well and ready for deployment on your robot today! We will also be demoing this
@corl_conf
🤖
Videos, code and checkpoints:
Work led by
@ajaysridhar0
in collaboration with
@CatGlossop
@svlevine
3 mo. ago we released the Open X-Embodiment dataset, today we’re doing the next step:
Introducing Octo 🐙, a generalist robot policy, trained on 800k robot trajectories, stronger than RT-1X, flexible observation + action spaces, fully open source!
💻:
/🧵
@abhishekunique7
@breadli428
@zipengfu
@tonyzzhao
@chelseabfinn
I think the key to real-world robustness is closed-loop policies and visual feedback. Due to its simplicity, BC is perfect for studying policy representation, HW interface design, and data collection methods. I hope the lessons from BC can quickly propagate to the rest of robotics!
#MuJoCo
2.3.0 is out. We've taken our first steps towards improving fast simulations of flexible materials. Props to
@TheSmallQuail
for the new cable model!
[2/3] Robotics is hard, and I focus my time mostly on generative AI these days. Because in 5 years, we likely still won't be able to match the motor control of a 3-year-old (generalization, adaptability, smoothness, dexterity)... The last remaining AI challenge would be mastering dexterity.
We are thrilled to announce LIBERO, a lifelong robot learning benchmark to study knowledge transfer in decision-making and robotics at scale! 🤖 LIBERO paves the way for prototyping algorithms that allow robots to continually learn! More explanations and links are in the 🧵
@ericjang11
One issue I often encounter in practice is that the “moment matching” behavior of L2 regression (which implies a Gaussian distribution) fails to capture multimodal distributions. On the other hand, the categorical distribution implied by classification handles this quite well.
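A tiny synthetic illustration of the point (made-up data, not from any paper): on a bimodal target, the L2-optimal constant is the mean, a value the data never takes, whereas a categorical distribution over bins keeps both modes.

```python
# Moment-matching failure of L2 regression on a bimodal target vs a categorical (binned) model.
import numpy as np

rng = np.random.default_rng(0)
targets = np.concatenate([rng.normal(-1.0, 0.05, 500), rng.normal(+1.0, 0.05, 500)])

# The constant minimizing L2 / MSE is the mean: ~0, a value the data never takes.
l2_optimum = targets.mean()

# A categorical distribution over bins keeps both modes.
hist, edges = np.histogram(targets, bins=21, range=(-1.5, 1.5), density=True)
top_bins = edges[np.argsort(hist)[-2:]]       # left edges of the two most likely bins
print(l2_optimum, top_bins)                   # ~0.0 vs bins near -1 and +1
```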
First, if you enjoyed the video above, we have a live demo that runs MuJoCo in your browser using Javascript and Web Assembly! You can accompany the robot (drag the keys down), or be adversarial and tug at the fingers 🙃
@kevin_zakka
Their SDF-based collision handling for nuts and bolts will be available in the next Isaac Sim release! You can already try it in the latest Omniverse Create
@danfei_xu
I can’t thank you and the coauthors of robomimic enough! Getting good results on robomimic was the reason we were confident enough to start working on real-world tasks. The hyperparameters tuned on robomimic also transfer to the real world very well
@AlexanderDerve
@Stanford
The IMU data recorded by GoPro is key for robust SLAM. I’m not aware of other action cameras with the same feature. The max lens mod (fisheye) is critical for learning as well
@kevin_zakka
Isaac Gym is still half-baked, with missing advertised features. Nvidia has a bad track record for maintaining non-revenue-generating software, especially given its recent pivot to the Metaverse (Omniverse). On the other hand, MuJoCo has withstood the test of time
@AlperCanberk1
Direct sunlight makes extremely high-contrast shadows. The highlights also saturate the image sensor. It's actually a very different distribution from what color jitter produces.
@BoyuanChen0
@chris_j_paxton
I think they highlighted their teleop capabilities on their website as well as in this article. Still really impressive even for teleop
@lab_ho
@keerthanpg
@Stanford
I think more cameras and sensors are absolutely helpful! With UMI, we tried to answer a different question: what's the absolute minimum set of cameras we need? We found that you can do a lot with surprisingly few (one per arm).
@HongKongFP
I have respected HKFP for its independent reporting on recent HK issues. However, I could not find any image in this post of protesters actually being crushed by the van. I guess it’s just a metaphor? Please don’t become yet another clickbait generator!
@syam64
@shapoco
I had exactly this problem before! It can be repaired at an Apple Store by replacing the camera module. I remember it only cost USD $60
@Silas_Artist
@cgmastersnet
The rendering uses several light-path-based tricks, so it will not work out of the box in EEVEE. I will take a look once I have time.
@PatrickMoorhead
@dylan522p
Numerous counterexamples can be made to your argument, including the newly opened Tesla factory in Shanghai and Samsung’s factory in Tianjin. Apple also operates in China, and has far less than 50% of its stock owned by Chinese investors.
@raghavaupp
@AlperCanberk1
We use GoPro’s auto exposure for both data collection and inference. During training, additional color-jitter augmentation is applied.
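For concreteness, this is the kind of color-jitter augmentation meant here; the exact parameters below are my assumption, not the released training config.

```python
# Example color-jitter augmentation (illustrative parameters, applied only at training time).
import torch
from torchvision import transforms

jitter = transforms.ColorJitter(brightness=0.3, contrast=0.4, saturation=0.5, hue=0.08)
aug_img = jitter(torch.rand(3, 224, 224))   # randomly perturbed copy of a camera image
```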
@tculpan
No offense, but Hu’s statement is consistent with my observation of my Chinese friends. There is a huge media-bias problem on both sides. HK protestors will not lose support from the West even if they kill a couple of innocent people; the same can be said for HK police and CN people
@muddywatersre
@elonmusk
@ritholtz
Due to various constraints, the ability to sublease houses/cars across multiple levels is limited, yet you can short a stock multiple times, >100% of what’s available, which is one of the important factors in why this big squeeze worked.