Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware!
A low-cost, open-source, mobile manipulator.
One of the highest-effort projects of my past 5 years! Not possible without co-lead
@zipengfu
and
@chelseabfinn
.
At the end, what's better than cooking yourself a meal with the 🤖🧑🍳
Introducing ALOHA 🏖: 𝐀 𝐋ow-cost 𝐎pen-source 𝐇𝐀rdware System for Bimanual Teleoperation
After 8 months of iterating at
@stanford
and 2 months working with beta users, we are finally ready to release it!
Here is what ALOHA is capable of:
China's progress in humanoid robots deserves more attention.
The video below has <300 views on YouTube, while the robot appears to be
- more agile than
@Tesla
's Optimus
- more dexterous than
@agilityrobotics
's Digit
- (likely) a lot cheaper than both
Introducing 𝐀𝐋𝐎𝐇𝐀 𝐔𝐧𝐥𝐞𝐚𝐬𝐡𝐞𝐝 🌋 - Pushing the boundaries of dexterity with low-cost robots and AI.
@GoogleDeepMind
Finally got to share some videos after a few months. Robots are fully autonomous, filmed in one continuous shot. Enjoy!
Robots are not ready to take over the world yet!
@zipengfu
and I just compiled a video of the dumbest mistakes 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 made in autonomous mode 🤣
We are also planning to organize some live demos after taking a break. Stay tuned!
How can robots acquire fine-grained manipulation skills?
Introducing ACT: Action Chunking with Transformers 🤖
Key idea: Imitation, but predict actions in chunks instead of one at a time.
Here are results with only ~15min of demonstrations, running on low-cost arms:
Led by
@GoogleDeepMind
, we present ALOHA 2 🤙: An Enhanced Low-Cost Hardware for Bimanual Teleoperation.
ALOHA 2 🤙 significantly improves the durability of the original ALOHA 🏖️, enabling fleet-scale data collection on more complex tasks.
As usual, everything is open-sourced!
Not just cooking! We made another video showing what 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 is capable of in a real home, inspired by the famous PR1 video.
2024 will be the year of robotics, and this is just the beginning!
Mobile ALOHA's hardware is very capable. We brought it home yesterday and tried more tasks! It can:
- do laundry👔👖
- self-charge⚡️
- use a vacuum
- water plants🌳
- load and unload a dishwasher
- use a coffee machine☕️
- obtain drinks from the fridge and open a beer🍺
- open
With the advent of AGI, humans will soon be the weakest link in the software industry. How can we have better coding buddies that *enhance* humans?
Introducing 𝐁ug 𝐀nalysis and 𝐈dentification with enhanced 𝐓oads (BAIT), where we fit toads with contact lenses to better catch bugs
I wish we could come back to this tweet in a decade and be like "Hey, here is when we finally cracked data collection".
Low-cost, portable, hardware agnostic. I could not ask for more!
Can we collect robot data without any robots?
Introducing Universal Manipulation Interface (UMI)
An open-source $400 system from
@Stanford
designed to democratize robot data collection
0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
Just built another ALOHA🏖️
@Stanford
! Rumor has it there are now more than 20 ALOHAs in the world 👀
Notice the new grippers? Folks at
@GoogleDeepMind
actually redesigned them and were kind enough to open-source the design. We will be announcing it shortly!
One time
@tonyzzhao
took off his sweater to try it with the model. The policy was never trained on an adult-sized shirt or any type of sweater, but we found it was able to generalize.
How to build ALOHA? We open-sourced everything about the setup and prepared a detailed tutorial. In short: it's built from off-the-shelf robots + 3D-printed components.
We also contacted
@trossenrobotics
, who agreed to manufacture and sell the whole ALOHA kit that you can buy now!
To achieve these goals, we mount ALOHA on a mobile base designed for warehouses: the Tracer AGV
It can carry 100 kg and move at up to 1.6 m/s, while costing only $7k
To allow simultaneous arm and base control, we simply tether the operator to the mobile base, i.e., the operator backdrives the wheels.
Mobile ALOHA 🏄 is coming soon!
Special thanks to
@tonyzzhao
for throwing random objects into the scene, and
@chelseabfinn
for the heavy pot (>3 lbs)!
Stay tuned!
@Stanford
We built ALOHA to be maximally user-friendly for researchers: it is simple, dependable and performant.
The whole system costs <$20k, yet it is more capable than setups with 5-10x the price.
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Learning!
With 50 demos, our robot can autonomously complete complex mobile manipulation tasks:
- cook and serve shrimp🦐
- call and take an elevator🛗
- store a 3 lbs pot in a two-door cabinet
Open-sourced!
Co-led by
@tonyzzhao
,
@chelseabfinn
Thrilled to announce that I will be joining
@StanfordAILab
as a PhD student! Since I started coding in my freshman year, it has been a wild ride: I'm fortunate to be part of both
@svlevine
's lab and
@BerkeleyNLP
. For my PhD, I want to explore the synergy of Robotics, Language and ML!
1. Moves fast. Similar to the human walking speed of 1.42 m/s.
2. Stable. Manipulates heavy pots, a vacuum, etc.
3. Whole-body. All DoFs teleoperated simultaneously.
4. Untethered. Onboard power and compute.
Seems to be quite a significant improvement over the original ALOHA 🏖️! Just from this video:
- Smooth active gravity comp
- Larger payload and gripper opening
- The bottle throw and catch demo is 🔥
Excited to see ALOHA applied to another new robot!
It is so inspiring to see researchers outside of academia being able to replicate ALOHA🏖️ and ACT.
This is really the best-case scenario I can hope for, to democratize access to robotics and AI research!
Ben Katz's thesis is full of golden nuggets. In particular, I discovered today he had a really cool bilateral teleoperation system using two Mini Cheetah legs.
How does it work? ALOHA has two leader & two follower arms, and syncs the joint positions from leaders to followers at 50Hz. The user teleops by simply moving the leader robots.
This takes 10 lines to implement, yet is intuitive and responsive anywhere within the joint limits.
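That loop is simple enough to sketch in full. Below is a minimal version in Python; the `read_joint_positions` / `command_joint_positions` helpers are hypothetical stand-ins for whatever servo API the arms expose, so treat this as an illustration of the idea rather than the actual ALOHA code:

```python
import time

SYNC_HZ = 50  # leader-to-follower sync rate

def teleop(leader_arms, follower_arms):
    """Mirror leader joint positions onto the follower arms at 50 Hz."""
    period = 1.0 / SYNC_HZ
    while True:
        t0 = time.monotonic()
        for leader, follower in zip(leader_arms, follower_arms):
            q = leader.read_joint_positions()    # where the human moved the leader
            follower.command_joint_positions(q)  # follower tracks those angles
        time.sleep(max(0.0, period - (time.monotonic() - t0)))
```

Because the mapping is joint-to-joint, there is no IK to solve and no singularities to handle, which is where the responsiveness comes from.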
At test time when the robot is autonomous, the backdriving structure and the leader arms can be easily detached. This reduces the robot's footprint by 45% and shaves off 15kg in weight.
The robot can reach 65cm to 200cm vertically, and 100cm away from its base.
Curious about deploying robot learning solutions in the real world? 🤖
Join us and our amazing lineup of speakers at
#CoRL2023
this year. We will be holding a debate on the future of robot learning, in addition to talks and poster sessions!
CfP:
Is Silicon Valley too obsessed with pure software businesses? Do we still have a chance to disrupt DJI? Will Unitree be the next DJI but with a much much larger scope?
I have so many questions.
Introducing Unitree H1: its first general-purpose humanoid robot | Embodied AI, priced below $90k
A preview of half a year's work
The highest power performance among robots with similar specifications in the world; weighs ~47 kg, with a maximum joint torque of 360 N·m
This simple idea + proper mechanical design allows ALOHA to perform precise tasks like RAM insertion, dynamic tasks like juggling a ping pong ball, and contact-rich tasks like putting on a shoe.
It is reliable: there were no motor failures throughout the 8 months of testing.
Before diving into the hardware, we also release a *proper* ALOHA sim model with SysID, thanks to
@kevin_zakka
@the_real_btaba
@ayzwah
.
Even if you don’t have the hardware, there is now a way to perform complex tasks with ALOHA in MuJoCo!
#RSS2023
I am unable to present in-person because of visa issues😢 But the amazing
@siddkaramcheti
was kind enough to help me present it, on Tue at 11am!
I will be at the poster session to answer any questions (through an iPad on a tripod). Thanks
@du_maximilian
for setting it up!
It is worth noting, however, that this robot has not been publicly demoed like Optimus or Digit. Additionally, its payload capacity is likely smaller. Nevertheless, it deserves more than 300 views 🙂
Original video:
Product page:
Check out our new waypoint extraction method led by
@lucy_x_shi
@archit_sharma97
! It’s a plug-and-play module that boosts imitation learning performance 🤖
Very impressed by Lucy’s execution in this project. She will also be applying for PhDs this cycle!
Unitree H1 breaking the humanoid robot speed world record [full-size humanoid], Evolution V3.0 🥰
The humanoid robot, driven by a robot AI world model, unlocks many new skills!
Strong performance is waiting for you to develop!
#Unitree
#AI
#subject3
#BlackTech
Super neat system! It seems that Chinese robotics startups have everything they need to quickly iterate on capable & low-cost hardware. Will US startups be able to compete? Chaining together Dynamixels/off-the-shelf motors likely won’t cut it…
Maybe joint space teleop is all you need? 👀
Amazing project from
@philippswu
making teleoperation more accessible on a series of cobots. It's also awesome to see more hardware advances optimized for robot learning use cases!
🎉Excited to share a fun little hardware project we’ve been working on. GELLO is an intuitive and low-cost teleoperation device for robot arms that costs less than $300. We've seen the importance of data quality in imitation learning. Our goal is to make this more accessible
1/n
@heskelbalas
@Stanford
Thanks for pointing it out Heskel: it is indeed my video. There has been some misinformation that associates it with OpenAI's investment in
@1x__tech
It takes a lot of effort to not only build something that "works", but also document the process and make it available to the community. Kudos to
@kenny__shaw
and the team!
We have easy-to-follow assembly videos with step-by-step instructions on the website. All the parts are easily available off-the-shelf, and the CAD files are open-source. Our design is stronger and more robust than other hands. Takes 3 hours to assemble.
2/
For the past year we've been working on ALOHA Unleashed 🌋
@GoogleDeepmind
- pushing the scale and dexterity of tasks on our ALOHA 2 fleet. Here is a thread with some of the coolest videos!
The first task is hanging a shirt on a hanger (autonomous 1x)
This project would not be possible without the support from my advisor
@chelseabfinn
and
@svlevine
@Vikashplus
.
But so far, we’ve only covered *half* of the project! In a second thread, I will show how ALOHA can *autonomously* perform these tasks!
🤖Joint-level control + portability = robot data in the wild! We present AirExo, low-cost hardware, and showcase how in-the-wild data enhances robot learning, even in contact-rich tasks. A promising tool for large-scale robot learning & teleop, now at !
@AiBreakfast
We
@Stanford
will be releasing the research next week. Silver lining: *Everything* you saw in that video will be open-sourced to everyone.
Stay tuned!
"Insert anything into anything!"
New paper applying offline RL to industrial insertion. Tested on 12 new tasks: 100/100 success rate on all of them, with only 6 minutes of fine-tuning time on average!
📝
🌎
We start by improving the grippers, making them grasp better and more robustly.
We use a low-friction rail design that transmits 2x more force to the gripper tips. We also change the grip tape layout to improve grasping of small objects.
Led by
@SpencerGoodric6
and Thinh Nguyen
It is refreshing to see highly creative, open-source works like DexCap building on top of another highly creative, open-source work (LEAP hand by
@kenny__shaw
).
This is the best way forward for the community. Congratulations
@chenwang_j
!
Can we use wearable devices to collect robot data without actual robots?
Yes! With a pair of gloves🧤!
Introducing DexCap, a portable hand motion capture system that collects 3D data (point cloud + finger motion) for training robots with dexterous hands
Everything open-sourced
#𝗥𝗼𝗯𝗼𝗔𝗴𝗲𝗻𝘁 -- A universal multi-task agent on a data budget
💪 with 12 non-trivial skills
💪 can generalize them across 38 tasks
💪& 100s of novel scenarios!
🌐
w/
@mangahomanga
@jdvakil
@m0hitsharma
, Abhinav Gupta,
@shubhtuls
Similar to ALOHA, we open-source ACT together with 2 simulated environments for reproducibility. You can find it on the project website:
We hope ALOHA+ACT will be a helpful resource for advancing fine-grained manipulation!
Introducing Yell At Your Robot (YAY Robot!) 🗣️- a fun collaboration b/w
@Stanford
and
@UCBerkeley
🤖
We enable robots to improve on-the-fly from language corrections: robots rapidly adapt in real-time and continuously improve from human verbal feedback.
YAY Robot enables
Are simple grippers limited to simple motions such as pick-and-place? Our work to be presented at
#CoRL2022
demonstrates that RL can be used to enable a parallel gripper to find interesting strategies to exploit the environment to enhance its “dexterity”.
A thread:
How to chain multiple dexterous skills to tackle complex long-horizon manipulation tasks?
Imagine retrieving a LEGO block from a pile, rotating it in-hand, and inserting it at the desired location to build a structure.
Introducing our new work - Sequential Dexterity 🧵👇
Always love seeing people's first reaction using ALOHA. This is Koko's first try closing the lid of that small cup, pretty much no learning curve!
Learn more about ALOHA here 👉
Introducing Open-World Mobile Manipulation 🦾🌍
– A full-stack approach for operating articulated objects in open-ended unstructured environments:
Unlocking doors with lever handles/ round knobs/ spring-loaded hinges 🔓🚪
Opening cabinets, drawers, and refrigerators 🗄️
👇
You should definitely chat with Phil & folks from Tesla if you are excited about large-scale vision, robotics and more! They've got a ton of data and compute to test your newest algorithm 👀
(Also wonderful people! Had a great time there last summer)
@Tesla
AI team is at
@CVPR
in Vancouver this week! If you are also here, stop by and check out what we have been working on for Autopilot, Optimus, and Dojo!
#CVPR2023
With all of the above, ACT obtains 64%, 96%, 84%, and 92% success on the 4 tasks shown, with objects randomized along the 15 cm line.
It does not just memorize the training data; it is able to react to external disturbances:
(1) Predict action sequence
Standard BC predicts one action at a time, while a fine manipulation task can have >1000 steps easily.
Predicting actions in chunks slows down compounding errors and can better model non-stationary human behavior.
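To make the idea concrete, here is a minimal rollout loop with chunking plus temporal ensembling, in the spirit of ACT; `policy(obs)` returning a `(chunk, act_dim)` array and the gym-style `env` are assumptions, and the exponential weight schedule is one reasonable choice:

```python
import numpy as np

def rollout(policy, env, horizon=400, chunk=100, m=0.01):
    """Chunked control with temporal ensembling (illustrative sketch)."""
    obs = env.reset()
    live = []  # (start_step, predicted_chunk) pairs, oldest first
    for t in range(horizon):
        live.append((t, policy(obs)))                   # predict `chunk` steps ahead
        live = [(s, a) for s, a in live if t - s < chunk]
        preds = np.stack([a[t - s] for s, a in live])   # every prediction made for step t
        w = np.exp(-m * np.arange(len(preds)))          # older predictions weighted higher
        action = (w[:, None] * preds).sum(axis=0) / w.sum()
        obs, *_ = env.step(action)
```

Querying the policy every step and averaging the overlapping chunks trades extra compute for smoother, more robust motion.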
Salute to the failure compilations of DARPA Robotics Challenge back in 2015 and of course the Boston Dynamics Atlas
I am secretly hoping to see
@Tesla_Optimus
fall 🙈
Is scaling all we need to deploy general-purpose robots? We have an exciting lineup of speakers and a debate session tomorrow at
@corl_conf
. Look forward to seeing everyone in Atlanta! 🏙️
We are organizing
@corl_conf
2023 workshop on Reliable and Deployable Learning-Based Robotic Systems with an exciting list of invited speakers, looking towards the future of robot learning systems: . Please don't hesitate to submit your work here!
Fine manipulation is difficult: either from RL, Sim2Real, or Imitation.
- Hard exploration and sparse reward
- Large Sim2Real gap
- Compounding error for BC
- No large dataset
We introduce three important design choices behind ACT, an efficient imitation learning method:
(3) Transformer
We modernize the VAE by using a BERT-like encoder and a DETR-like decoder, training end-to-end from scratch.
This transformer architecture benefits more from chunking than ConvNets and non-parametric methods.
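For a rough sense of the shape of this architecture, here is a PyTorch skeleton; the layer counts and dimensions are illustrative, not the exact published configuration:

```python
import torch
import torch.nn as nn

class ActionChunkTransformer(nn.Module):
    """ACT-style skeleton: an encoder over observation tokens, plus a
    DETR-like decoder where each learned query emits one action."""

    def __init__(self, d_model=512, chunk=100, act_dim=14):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model, nhead=8, num_encoder_layers=4,
            num_decoder_layers=7, batch_first=True)
        self.queries = nn.Parameter(torch.randn(chunk, d_model))  # one query per action step
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, img_tokens, state_token, z_token):
        # img_tokens: (B, N, d) camera features; state/z tokens: (B, 1, d) each
        src = torch.cat([img_tokens, state_token, z_token], dim=1)
        tgt = self.queries.unsqueeze(0).expand(src.size(0), -1, -1)
        return self.head(self.transformer(src, tgt))  # (B, chunk, act_dim)
```

The learned queries play the same role as DETR's object queries: each one is responsible for a fixed position in the output sequence, here a timestep in the action chunk.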
Personally, this is a challenging project to work on, spanning from hardware to ML.
It would certainly not be possible without my amazing advisor
@chelseabfinn
and collaboration from
@svlevine
@Vikashplus
!
Next, we improve the gravity compensation of the leader arm. With a constant-force retractor and a spring-pulley system, the arm can "float" in most places.
It is also much more durable than the original rubber bands!
Some friends working in robotics expressed interest in a gripper I was designing for my senior capstone project, so I've decided to make it open-source! It's extremely simple and cheap, but doesn't sacrifice on performance 🦾
check it out:
(2) Generative model policy
The policy is trained as the decoder of a VAE, reconstructing action chunks from latent z, 4 RGB images, and proprioception.
Intuitively, z extracts the “style” of the action chunk.
This is crucial when learning from human demos.
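A compressed sketch of that training objective, with hypothetical `encode`/`decode` methods standing in for the real model (the L1 reconstruction term and KL weight follow the choices described in the ACT paper):

```python
import torch
import torch.nn.functional as F

def cvae_loss(model, images, proprio, action_chunk, kl_weight=10.0):
    """CVAE objective for an ACT-style policy (illustrative sketch)."""
    # Posterior over the "style" latent z, inferred from the demo actions.
    mu, logvar = model.encode(action_chunk, proprio)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
    # Decoder = the policy: reconstruct the chunk from z + observations.
    pred = model.decode(z, images, proprio)
    recon = F.l1_loss(pred, action_chunk)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl_weight * kl
```

At test time, z is simply set to the prior mean (all zeros), making the policy deterministic.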
We use the same rail design on the leader side. To further improve ergonomics, we replace the original servo with a lower gear ratio one that is easier to backdrive.
This results in a 10x reduction in the friction that the operator needs to overcome when opening the grippers!
Last but not least: we simplify the frame surrounding the workcell while maintaining the rigidity of the camera mounting points. This opens up the space for both human-robot collaborators and props for the robot to interact with.
[2/3] Robotics is hard, and I focus my time mostly on generative AI these days. Because in 5 years, we likely still won't be able to match the motor control of a 3-year-old (generalization, adaptability, smoothness, dexterity)... The last remaining AI challenge would be mastering dexterity.
The deployable workshop
@corl_conf
is starting in 30 min! We are located on the second floor (follow the sign for "robot demo"), and will be hosting a debate, a panel, and invited talks.
Streaming:
Just played with it in
@SvLevine
's lab. ALOHA is FAR more intuitive and responsive than I expected, esp. at that price. Thanks for the demo
@jianlanluo
and hats off to
@tonyzzhao
for outstanding engineering!
Diffusion policy from
@chichengcc
: also uses a generative model for policy. Great for fitting multi-modal data and made large progress on the RoboMimic benchmark. Also very impressive real-world experiments!
What if the form of the visuomotor policy has been the bottleneck for robotic manipulation all along? Diffusion Policy achieves a 46.9% improvement vs. prior SotA on 11 tasks from 4 benchmarks + 4 real-world tasks! (1/7)
Website:
Paper:
@chenwang_j
Thanks Chen!! We actually released all the commit history for now (might remove it later haha), but the first commit is from
@zipengfu
on Oct 16, when we had just received all the hardware and started putting things together!
Here are some really cool related works you should also know about!
Chopstick-holding cherry-picking robot from
@xkelym
, trained with RL in the real world. The motion is very reactive and precise!
Let’s do 🍒 Cherry Picking with Reinforcement Learning
- 🥢 Dynamic fine manipulation with chopsticks
- 🤖 Only 30 minutes of real world interactions
- ⛔️ Too lazy for parameter tuning = off-the-shelf RL algo + default params + 3 seeds in real world
@ShikharMurty
I've seen something similar for our transformer (the ACT robot policy). While the validation loss seems to have largely plateaued, real-world performance keeps improving. Not sure if it is for the same reason though!
Excited to release OK-Robot, an open-vocabulary mobile-manipulator for homes. Simply tell the robot what to pick and where to drop it in natural language, and it will do it. Like:
Me: "OK Robot, move the Takis from the desk to the nightstand"
Robot: ⬇️