Got a taste of
@Tesla
's FSD v12.3.4 last night. By no means flawless, but the human-like driving maneuvers (with no interventions) delivered a magical experience. Excited to witness the recipe of scaling law and data flywheel for full autonomy show signs of life in real products.…
The game of tenure-track faculty job:
ℍ𝕒𝕣𝕕 𝕞𝕠𝕕𝕖: 1st year
ℍ𝕖𝕝𝕝 𝕞𝕠𝕕𝕖: 1st year + COVID-19
𝕀𝕟𝕗𝕖𝕣𝕟𝕠 𝕞𝕠𝕕𝕖: 1st year + COVID-19 + No Power/Internet in freezing Texas
P.S. It has been great fun to play. What's next?
My Robot Learning class
@UTCompSci
is updated with the latest advances and trends, such as implicit representations, attention architectures, offline RL, human-in-the-loop, and synthetic data for AI. All materials will be public. Enjoy!
#RobotLearning
New work: we built a meta-learning algorithm for an agent to discover cause-and-effect relations from its visual observations and use such causal knowledge to perform goal-directed tasks. Paper:
Joint work w/
@SurajNair_1
@drfeifei
@silviocinguetta
📢Update announced in today’s
#GTC2024
Keynote📢
We are working on Project GR00T, a general-purpose foundation model for humanoid robots. GR00T will enable robots to follow natural language instructions and learn new skills from human videos and demonstrations.
Generalist…
Heard students say WFH lowers productivity. In 1665, a Cambridge college student had to WFH during a pandemic. He got away from professors and worked on math alone. When he returned, the world knew him as Isaac Newton! Good time to think hard in pajamas.
Thrilled to co-lead this new team with my long-time collaborator
@DrJimFan
. We are on a mission to build transformative breakthroughs in the landscape of Robotics and Embodied Agents. Come join us and shape the future together!
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof.
@yukez
. GEAR stands for Generalist Embodied Agent Research.
We believe in a future where every machine that moves will be autonomous, and robots and…
Life update: I will be joining
@UTAustin
as an Assistant Professor in
@UTCompSci
starting Fall 2020. I am thrilled to continue my research on robot learning and perception as a faculty and look forward to collaborating with the exceptional faculty, researchers, and students at UT.
Honored to receive the NSF CAREER award titled "Intelligent Manipulation in the Real World via Modularity and Abstraction" to advance our lab's research on building an autonomy stack for general-purpose robot manipulation in the wild!
Excited to share our latest progress on legged manipulation with humanoids. We created a VR interface to remote control the Draco-3 robot 🤖, which cooks ramen for hungry graduate students at night. We can't wait for the day it will help us at home in the real world!
#humanoid
Releasing my Stanford Ph.D. dissertation and talk slides "Closing the Perception-Action Loop: Towards Building General-Purpose Robot Autonomy", a summary of my work on robot perception and control
@StanfordSVL
Slides:
Dissertation:
Congratulations to
@snasiriany
and
@huihan_liu
on winning the
#ICRA2022
Outstanding Learning Paper award for their first paper
@UTCompSci
“Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks”!
Today we are on the road to Austin, TX.
I feel a pleasant melancholy moving out of the Bay Area, a place where we have lived for seven years, leaving behind many fond memories and long-time friends. Meanwhile, I'm thrilled to start a new life. Tons of exciting things to come!
Taught my first (online) class
@UTCompSci
. Super pumped to teach a grad-level Robot Learning seminar this fall. Great to see UT students from all kinds of backgrounds passionate about learning what’s going on at the forefront of AI + Robotics🤘Syllabus:
Introducing MineDojo for building open-ended generalist agents!
✅Massive benchmark: 1000s of tasks in Minecraft
✅Open access to internet-scale knowledge base of 730K YouTube videos, 7K Wiki pages, 340K Reddit posts
✅First step towards a general agent
🧵
Very impressed by the new
@Tesla_Optimus
end2end skill learning video!
Our TRILL work () spills some secret sauce: 1. VR teleoperation, 2. deep imitation learning, 3. real-time whole-body control. It's all open-source! Dive in if you're into humanoids! 👾
Some of my proudest memories of my PhD are working with people from different countries and being advised by a stellar all-women thesis committee. I encourage students from diverse backgrounds to apply for my future lab
@UTCompSci
where diversity and inclusion will be valued.
We are unlikely to create an “ImageNet for Robotics”. In retrospect, ImageNet is such a homogeneous dataset. Labeled images w/ boxes.
Generalist robot models will be fueled by the Data Pyramid, blending diverse data sources from web and synthetic data to real-world experiences.
Dear academics, check out our 6 pack!! 💪 Ok... I meant 6-PACK, our new 6DoF Pose Anchor-based Category-level Keypoint tracker, real-time tracking of novel objects without known 3D models!
We present 6-PACK, an RGB-D category-level 6D pose tracker that generalizes between instances of classes based on a set of anchors and keypoints. No 3D models required! Code+Paper: w/ Chen Wang
@danfei_xu
Jun Lv
@cewu_lu
@silviocinguetta
@drfeifei
@yukez
One of RL's most future-proof ideas is that adaptation is just a memory problem in disguise. Simple in theory, scaling is hard!
Our
#ICLR2024
spotlight work AMAGO shows the path to training long-context Transformer models with pure RL. Open-source here:
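The "adaptation is memory" idea can be sketched in a few lines: a fixed policy that reads its own interaction history behaves differently in each new environment, with no weight updates at test time. Here is a toy bandit illustration, with a simple ε-greedy rule standing in for a learned sequence model (this is a hypothetical sketch, not the AMAGO code):

```python
import random

def history_policy(history, n_arms=2, eps=0.1):
    """A fixed decision rule that 'adapts' purely by reading its own
    (arm, reward) memory -- no weight updates. Toy stand-in for a
    learned sequence model; not the AMAGO algorithm."""
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in history:
        totals[arm] += reward
        counts[arm] += 1
    for a in range(n_arms):
        if counts[a] == 0:      # pull each arm once before going greedy
            return a
    if random.random() < eps:   # keep a little exploration going
        return random.randrange(n_arms)
    return max(range(n_arms), key=lambda a: totals[a] / counts[a])

def run_episode(arm_probs, steps=200, seed=0):
    """A new environment = new reward probabilities; same frozen policy."""
    random.seed(seed)
    history, wins = [], 0.0
    for _ in range(steps):
        arm = history_policy(history, n_arms=len(arm_probs))
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        history.append((arm, reward))
        wins += reward
    return wins / steps

# The same fixed policy homes in on whichever arm happens to pay off.
print(run_episode([0.9, 0.1]), run_episode([0.1, 0.9]))
```

Scaling the same principle up is where it gets hard: the hand-written rule becomes a long-context Transformer over the trajectory, trained end-to-end with RL.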
Just wrapped up my
#CoRL2023
early-career keynote on 𝐏𝐚𝐭𝐡𝐰𝐚𝐲 𝐭𝐨 𝐆𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐬𝐭 𝐑𝐨𝐛𝐨𝐭𝐬 on Wed. In case you missed it, here's a brief summary. Check out the slide deck for more detail: 🧵1/N
Future driverless cars will talk with each other! We introduce Coopernaut, a cooperative driving model that uses vehicle-to-vehicle (V2V) communication for robust driving in challenging traffic conditions.
#CVPR2022
Paper:
Project:
We are organizing a (virtual) workshop on Visual Learning and Reasoning for Robotic Manipulation at
#RSS2020
. We invite extended abstract submissions that address the research problems at the intersection of perception and manipulation:
Loved the Slow Science Manifesto (). We were told, "slow down to go faster." Oh boy, this is so much easier said than done. As a young academic, seeing fellow scholars churning out dozens of papers a year, it takes guts to hit the pause button and think!
As much as I'd like to tweet positivity and focus on
#AcademicChatter
, I know how difficult this moment is for the Asian community when my wife and I feel anxious about going out for shopping & errands, hearing recent news about hate crimes. Hatred is NOT a solution to a virus.
Implicit neural representations have pushed the envelope of 3D Vision and Graphics in recent years. How will they be useful for Robot Manipulation?
Our work GIGA demonstrated that they can bridge geometry reasoning and affordance learning for 6-DoF grasping in cluttered scenes.
🔥robosuite updates🦾After eight months of dev effort, excited to release our v1.3 version! We integrate advanced graphics renderers with our simulation framework and provide vision APIs to bridge robot perception and decision-making research. Try it out!
Can't wait to attend
#CoRL2023
for the next two days and give an early career keynote titled "𝐏𝐚𝐭𝐡𝐰𝐚𝐲 𝐭𝐨 𝐆𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐬𝐭 𝐑𝐨𝐛𝐨𝐭𝐬: 𝐒𝐜𝐚𝐥𝐢𝐧𝐠 𝐋𝐚𝐰, 𝐃𝐚𝐭𝐚 𝐅𝐥𝐲𝐰𝐡𝐞𝐞𝐥, 𝐚𝐧𝐝 𝐇𝐮𝐦𝐚𝐧𝐥𝐢𝐤𝐞 𝐄𝐦𝐛𝐨𝐝𝐢𝐦𝐞𝐧𝐭" on Wed!
Thanks Fei-Fei
@drfeifei
for being such an amazing advisor, mentor, role model, and friend! Finishing a PhD is the end of the beginning. And greater things have yet to come!
Very proud of my PhD student
@yukez
for passing his PhD thesis defense with flying colors! His work is on perception, learning and robotics. Thank you thesis committee members
@leto__jean
@EmmaBrunskill
@silviocinguetta
& Dan Yamins!
First time attending
@HumanoidsConf
(on the
@UTAustin
campus!) Feeling pumped to see the lightning-fast progress in this space. I expect this community to grow rapidly in the next few years --- generalist robot intelligence can't be achieved without general-purpose hardware!
My
#CoRL2023
keynote talk on Pathway to Generalist Robots is on YouTube now:
I discussed the three key ingredients for building general-purpose robot autonomy: scaling law, data flywheel, and human-like embodiment.
If you want to learn more about our…
I won't be at NeurIPS next week. But our team is seeking interns to work on exciting and ambitious new projects on Large Language Models for Agents (starting early next year). Please fill out the Application Form below if you're interested.
📢Release note📢 We are pleased to release *robosuite* v1.4 and migrate its backend to
@DeepMind
's MuJoCo binding for long-term support and feature extensibility, solidifying our commitment to building open-source research software. Try it out at
We are releasing our
#ICCV2019
work on goal-directed visual navigation. We introduced a method that harnesses different perception skills based on situational awareness. It makes a robot reach its goals more robustly and efficiently in new environments.
100% agreed! I also felt extremely lucky to have some of the kindest and smartest advisors
@Stanford
and colleagues
@UTCompSci
"We're all smart. Distinguish yourself by being kind." This quote is one of the first principles I will teach to my students as a scholar.
All the technically strongest people I know are *kind* people.
My advisors/profs at
@WisconsinCS
, my colleagues at
@UTCompSci
, they are all competent, caring, empathetic human beings.
Sure, there are some jerks, but they are the minority -- there is no need to hire them.
Check out a new blog post of our work on long-horizon planning for robot manipulation. We also released RoboVat, our learning framework that unifies
#BulletPhysics
simulation and Sawyer robot control interfaces. Sim2real has never been easier.
How can a robot solve complex sequential problems?
In our newest blog post,
@KuanFang
introduces CAVIN, an algorithm that hierarchically generates plans in learned latent spaces.
ICML deadline tonight, RSS deadline tomorrow, and CVPR rebuttals due next Monday. For researchers working on robot learning and perception, life is goooood 😌
Roomba builds a static map of your home by moving around. Can a robot create articulated models of indoor scenes through its physical interaction?
Ditto in the House builds digital twins of articulated objects in everyday environments.
#ICRA2023
Website:
How can robot manipulators perform in-home tasks such as making coffee for us? We introduce VIOLA, an imitation learning model for end-to-end visuomotor policies that leverages object-centric priors to learn from only 50 demonstrations!
Before the coronavirus outbreak, I almost decided to name my new lab VIRAL, which stands for Visual Intelligence & Robot Autonomy Lab. Now I have to change it 😅 Epidemics make us think harder.
Foundation Models in Robotics: Applications, Challenges, and the Future
paper page:
We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific…
Heading to
@CVPR
today! We are organizing a 3D Vision and Robotics workshop tomorrow with a great line-up of speakers:
Also, I am recruiting a postdoc on vision + robotics for my group. Come to chat with me if interested - DMs are open!
2 years ago I was shopping for a coffee machine at Target. I found a perfect Keurig not for me but for my robot:
- Round tray to insert a K-cup;
- Lid open/close w/ weak forces;
- Coffee out w/ one button click.
There's no magic. Human ingenuity is behind every robot's success.
If you want to learn more about how the task has motivated a line of research in manipulation, see the list:
- VIOLA:
- HYDRA:
- AWE:
- HITL-TAMP:
- MimicGen:
We are organizing a
#CVPR2021
Workshop on 3D Vision and Robotics to promote the cross-pollination of ideas between these two research fields. CfP is open. We look forward to your contributions!
Rewriting a classical robot controller as a physics-informed neural network, plugging it into the data-driven autonomy stack as a learnable module, and training it with large-scale GPU-accelerated simulation ➡️ Adaptivity & robustness to the next level💡
How can we enable robot controllers to better adapt to changing dynamics? Idea: learn a data-driven controller implemented with physics-informed neural networks, and finetune on task-specific dynamics.
Website:
Paper:
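A minimal sketch of the recipe described above: a classical control law supplies the physics prior, and a small network learns a residual correction for task-specific dynamics. Everything here (the PD gains, the tiny two-layer net) is a hypothetical illustration, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def pd_controller(q, qd, q_target, kp=20.0, kd=4.0):
    """Classical PD law: the physics-informed part of the controller."""
    return kp * (q_target - q) - kd * qd

class ResidualNet:
    """Tiny learned residual on top of the classical law. Hypothetical
    illustration of 'classical controller + learnable module', not the
    paper's architecture."""
    def __init__(self, dim_in=3, hidden=16):
        self.W1 = rng.normal(0.0, 0.1, (dim_in, hidden))
        self.w2 = rng.normal(0.0, 0.1, hidden)

    def __call__(self, q, qd, q_target):
        h = np.tanh(np.array([q, qd, q_target]) @ self.W1)
        return float(h @ self.w2)  # correction for unmodeled dynamics

residual = ResidualNet()

def hybrid_controller(q, qd, q_target):
    # The physics prior supplies the bulk of the command; the network
    # (finetuned on task data in the real pipeline) adds a correction.
    return pd_controller(q, qd, q_target) + residual(q, qd, q_target)

u = hybrid_controller(q=0.0, qd=0.0, q_target=1.0)
print(u)  # close to the pure PD command of 20.0
```

In the actual pipeline the residual would be trained and finetuned on task-specific dynamics from GPU-accelerated simulation rollouts, rather than left at random initialization as here.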
Excited to share my recent talk at the Stanford Robotics Seminar on “Objects, Skills, and the Quest for Compositional Robot Autonomy” featuring projects from my first year
@UTCompSci
and our lab’s vision of building the next generation of autonomy stack.
Delighted to present our recent work on hierarchical Scene Graphs for neuro-symbolic manipulation planning. We use 3D Scene Graphs as an object-centric abstraction to reason about long-horizon tasks. w/
@yifengzhu_ut
, Jonathan Tremblay, Stan Birchfield
We have just released our new work on 6D pose estimation from RGB-D data -- real-time inference with end-to-end deep models for real-world robot grasping and manipulation! Paper: Code: w/
@danfei_xu
@drfeifei
@silviocinguetta
Excited to share VIMA, our latest work on building generalist robot manipulation agents with multimodal prompts. Massive transformer model + unified task specification interface for the win!
We trained a transformer called VIMA that ingests *multimodal* prompts and outputs controls for a robot arm. A single agent is able to solve visual goal, one-shot imitation from video, novel concept grounding, visual constraint, etc. Strong scaling with model capacity and data!🧵
I feel fortunate to have attended all four CoRL conferences so far and to have served as an AC for the first time.
@corl_conf
is hands down my favorite conference - focused Robot Learning community, high-quality (<200) papers, YouTube live stream, inclusion events. I couldn't ask for more!
Texas is a booming state for robotics research and industry. We are bringing together robotics researchers across the state this Friday at the Texas Regional Robotics Symposium (TEROS) 2022. Great line-up of speakers and live stream for all talks. Join us at !
We'll witness more and more demos of humanoid robots doing the same tasks the robotics community has mastered with simpler systems. Yet people will still be awed.
It speaks more to human psychology than to technology. Humanoids make sense in domains requiring social interaction.
We've released an updated version of ACID, our
#RSS2022
paper on volumetric deformable manipulation, with real-robot experiments.
ACID predicts dynamics, 3d geometry, and point-wise correspondence from partial observations. It learns to maneuver a cute teddy bear into any pose.
All talk recordings of our
#CVPR2023
3D Vision and Robotics Workshop are now available on the YouTube playlist: . Check them out in case you missed the event!
Spot-on! Top AI researchers and institutes have the magic power of pushing a research field years back, simply by publishing initial papers and inadvertently creating a vicious cycle of worthless publications. With great power comes great responsibility.
A lot of machine learning research has detached itself from solving real problems and created its own "benchmark islands".
How does this happen? And why are researchers not escaping this pattern?
A thread 🧵
Sharing the slides of my talk "Learning Keypoint Representations for Robot Manipulation" presented at the Workshop on Learning Representations for Planning and Control
@IROS2019MACAU
Slides:
Workshop:
We have six papers to be presented at
#ICRA2021
this week, spanning the topics of imitation learning for manipulation, neuro-symbolic planning, multimodal perception, uncertainty quantification, and morphological computation. A thread /5
I will attend ICML in Hawaii next week to present VIMA () and meet friends. Our NVIDIA team is seeking new talent for AI Agents, LLMs, and Robotics. Reach out via DMs if interested!
I'm going to ICML in Hawaii!
My team pushes the research frontier in AI agents, multimodal LLMs, game AI, and robotics. If you're interested in joining NVIDIA or collaborating with me, please reach out by email! My contact info is at
If applicable,…
Ajay gave a great talk on our RoboTurk project
#IROS2019
, nominated for Best Paper on Cognitive Robotics. Large-scale real robot dataset through crowdsourced teleoperation! More information can be found at
Uploading physical objects to the virtual world (metaverse) by observing and interacting with them in the real world. Exciting new work on sim2real via real2sim with articulated objects
#CVPR2022
#Ditto
robosuite v1.2 released: new sensor simulation APIs, visual/dynamics/sensor randomization for sim2real, enhanced operational space controllers, and human demonstrations! Check it out from here:
Pleased to be invited by
@SamsungUS
to talk about my research on robot perception and learning. Covered our latest work on self-supervised sensorimotor learning, hierarchical planning, and cognitive learning and reasoning in the open world. Video:
Excited to introduce 𝚛𝚘𝚋𝚘𝚖𝚒𝚖𝚒𝚌, a new framework for Robot Learning from Demonstration. This open-source library is a sister project of 𝚛𝚘𝚋𝚘𝚜𝚞𝚒𝚝𝚎 in our ARISE Initiative. Try it out!
Robot learning from human demos is powerful yet difficult due to a lack of standardized, high-quality datasets.
We present the robomimic framework: a suite of tasks, large human datasets, and policy learning algorithms.
Website:
1/
A nice summary of our recent works on imitation learning from visual demonstration. Compositionality and abstraction are key to scaling up IL algorithms to long-horizon manipulation tasks.
What if we could teach robots to do a new task just by showing them one demonstration?
In our newest blog post,
@deanh_tw
and
@danfei_xu
show us three approaches that leverage compositionality to solve long-horizon one-shot imitation learning problems.
Pleased to see our Sirius paper nominated for the Best Paper Award
#RSS2023
: Join our presentation in Daegu, Korea on July 11th!
Exciting times ahead as our lab explores the new frontier of 𝗥𝗟𝗢𝗽𝘀 (Robot Learning + Operations) in long-term deployment.
Like the best chess players are human-AI teams (centaurs), trustworthy deployment of robot learning models needs such a partnership! Sirius is our first milestone toward Continuous Integration and Continuous Deployment (CI/CD) for robot autonomy during long-term deployments👇
Join us on July 12th at
#RSS
. This workshop will introduce the end-to-end GPU accelerated training pipeline in
#NVIDIA
Isaac Gym, demonstrate
#robotics
applications, and answer questions in breakout sessions. Register here:
#AI
#robots
#nvidiaisaac
I will give a talk at
#SXSW2024
on How to Train a Humanoid Robot tomorrow from 10 to 11:30 a.m. Come check out our ramen-cooking DRACO 3 robot developed at
@texas_robotics
and learn the technical stories behind it!
We introduce APT-Gen to procedurally generate tasks of rich variations as curricula for reinforcement learning in hard-exploration problems.
Webpage:
Paper:
w/
@yukez
@silviocinguetta
@drfeifei
At
#IROS2019
next week, I will be giving invited talks:
1. "Learning How-To Knowledge from the Web" at AnSWeR (Mon)
2. "Learning Keypoint Representations for Robot Manipulation" at LRPC (Fri)
Come to learn about our recent work!
Revisiting
@DavidEpstein
's book Range: Why Generalists Triumph in a Specialized World as a roboticist unveils a compelling insight: elite athletes usually start broad and embrace diverse experiences as generalists before delayed specialization.
Given this prevailing pathway…
This is a fantastic initiative and exciting collaboration between industry and academia toward unleashing the future of Robot Learning as a Big Science! We must join forces in the quest for the north-star goal of generalist robot autonomy.
Introducing 𝗥𝗧-𝗫: a generalist AI model to help advance how robots can learn new skills. 🤖
To train it, we partnered with 33 academic labs across the world to build a new dataset with experiences gained from 22 different robot types.
Find out more:
Excited about coming back to Vancouver, the beautiful city where I attended college, for
#NeurIPS2019
Looking forward to catching up with the latest research and hanging out with friends!
Deep learning for robotics is hard to perfect. How do we harness existing models for trustworthy deployment, and make them continue to learn and adapt?
Presenting Sirius🌟, a human-in-the-loop framework for continuous policy learning & deployment!
🌐:
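The human-in-the-loop deployment loop above can be sketched as: run the policy, let a human override when needed, and flag the corrected samples (the most informative labels) for the next training round. A hypothetical toy version, not the Sirius code (the policy, override rule, and sample weights below are all made up for illustration):

```python
import random

def deploy_and_collect(policy, human_override, steps=100):
    """Run the policy; a human may intervene on risky states.
    Intervened samples get upweighted for the next training round.
    Hypothetical sketch of a human-in-the-loop loop, not Sirius code."""
    dataset = []
    for _ in range(steps):
        obs = random.random()
        robot_action = policy(obs)
        action = human_override(obs, robot_action)
        intervened = action != robot_action
        # Human corrections are the most informative labels, so they
        # receive a larger (illustrative) weight in the next round.
        dataset.append({"obs": obs, "action": action,
                        "weight": 5.0 if intervened else 1.0})
    return dataset

# Toy policy, and a human who corrects it on "hard" states (obs > 0.8).
policy = lambda obs: 0
human = lambda obs, a: 1 if obs > 0.8 else a

random.seed(0)
data = deploy_and_collect(policy, human)
n_interventions = sum(d["weight"] > 1.0 for d in data)
print(n_interventions, len(data))
```

Retraining on this reweighted dataset and redeploying closes the loop, which is what makes the CI/CD analogy apt.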
What's the best way for humans to teach robots?
I'm excited to announce MimicPlay, an imitation learning algorithm that extracts the most signals from unlabeled human motions.
MimicPlay combines the best of 2 data sources:
1) Human "play data": a person uses their hands to…
Debates abound on the necessity of a human-like form factor for generalist robots. Humanoid robots are overkill now, but they make sense from first principles. A deep tech will always be overkill until we put hard work into it. Why not stop debating and shape the future together?
Yet another manifestation of the power of hybrid Imitation + Reinforcement learning! Imitating high-level cognitive reasoning 🧠 from humans + reinforcing agile motor actions 🦿 in parallel simulation = quadrupedal locomotion in dynamic, human-centered environments
#PRELUDE
Introducing PRELUDE, a hierarchical learning framework that allows a quadruped to traverse dynamically moving crowds. The robot learns gaits from trial and error in simulation and decision-making from human demonstration.
Paper, Code, Videos:
In case you missed our 3D Vision and Robotics (3DVR) Workshop at
#CVPR2021
, we have released the invited talks, contributed papers, and panel discussions online. Check them out to learn about the latest progress in this booming research area.
Website:
We're excited to announce that our 3D Vision and Robotics workshop will be returning to
@CVPR
2023! We can't wait to connect with researchers from both communities and exchange ideas and insights.
Imitation learning often involves significant human effort to collect a large dataset for robust policy learning. How can we train robust policies in low-data regimes?
Our imitation learning framework PRIME scaffolds manipulation tasks with behavior primitives, breaking down…
Fun project with
@NVIDIAAI
collaborators on visual cognition and concept learning turned into a
@NeurIPSConf
spotlight. Looking forward to seeing more works on meta-learning and neuro-symbolic AI in this space.
Finally got a chance to watch
@NandoDF
's
#CoRL2021
keynote on Data-Driven Robotics. Loved the 10 lessons from his robot learning research
@DeepMind
and great to see the research progress over the last few years! Check it out in case you missed it.
I see the word “foundation” in “foundation models” as a closer analog of foundation in cosmetics (a base for the rest of the makeup) than an underlying principle. The problem is that people mix it up with "foundational," and the name is so catchy that I can’t stop saying it...
Learning vision-based manipulation on a shoestring budget of time and human effort. Bottom-up principle to discover skills from demonstrations + skills as building blocks to scaffold long-horizon behaviors = solving multi-stage tasks with 30 minutes of data collection on a real robot.
Vision-based long-horizon manipulation is challenging. We introduce BUDS, a bottom-up method to extract reusable sensorimotor skills from hierarchical task structures. It can compose these skills to solve multi-stage manipulation tasks with demonstrations collected in 30min.
Brilliant performance art of hacking Google Maps illustrating why we should keep a critical eye on our reliance on technology and algorithmic governance in our daily lives
99 smartphones are transported in a handcart to generate a virtual traffic jam in Google Maps. This turns a green street red, which has an impact in the physical world by routing cars elsewhere!
#googlemapshacks
It’s Commencement
@Stanford
today, one of my favorite days of the year! I’m a proud advisor of many wonderful students receiving undergrad, master's, or PhD degrees! Best wishes to you all - make all of us proud! And be a technologist with both smarts and heart! <3
I am super excited to attend
#CoRL2019
in Osaka this week and
#IROS2019
in Macau next week, presenting our recent works on Hierarchical Planning and RoboTurk. Looking forward to catching up with friends and colleagues! Drop me a message if you'd like to have coffee and chat!
We have two papers on learning keypoint representations for robot manipulation accepted in
#ICRA2020
.
6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
KETO: Learning Keypoint Representations for Tool Manipulation
I will be at
#CVPR2022
next week and
#RSS2022
the week after. Looking forward to catching up with vision and robotics friends in the Big Easy and the Big Apple. Please reach out if you want to chat😃
robosuite v1.1 released:
Highlights:
1. Major refactoring to simplify procedural generation of objects and environments for fast prototyping
2. Contributing guideline:
Try it: pip install robosuite
Finally on my way to
#CoRL2022
in Auckland, NZ! Super pumped to have my first conference in Oceania and catch up with robot learning friends. Reach out if you'd like to chat!
Attending
#ICRA2019
in Montreal for the next three days. Will be presenting our best paper-nominated work on multimodal perception for contact-rich manipulation. Looking forward to syncing up with friends and colleagues!