Can we use wearable devices to collect robot data without actual robots?
Yes! With a pair of gloves🧤!
Introducing DexCap, a portable hand motion capture system that collects 3D data (point cloud + finger motion) for training robots with dexterous hands
Everything open-sourced
How to teach robots to perform long-horizon tasks efficiently and robustly🦾?
Introducing MimicPlay - an imitation learning algorithm that uses "cheap human play data". Our approach unlocks both real-time planning through raw perception and strong robustness to disturbances!🧵👇
How to chain multiple dexterous skills to tackle complex long-horizon manipulation tasks?
Imagine retrieving a LEGO block from a pile, rotating it in-hand, and inserting it at the desired location to build a structure.
Introducing our new work - Sequential Dexterity 🧵👇
Can robots learn hand-eye coordination simply from teleoperated human demonstrations? Our new
#IROS2021
paper presents a novel action space to enable this!
Website:
1/9
1/ Can we improve the generalization capability of a vision-based task planner with representation pretraining?
Check out our RAL paper on learning to plan with pre-trained object-level representation.
Website:
The combination of LLMs and VLMs shows great potential in grounding “Where” and “How” to act in 3D observation space. Such capability allows the robot to perform visuomotor manipulation in a zero-shot fashion! Check out VoxPoser, amazing work led by
@wenlong_huang
at
@StanfordSVL
How to harness foundation models for *generalization in the wild* in robot manipulation?
Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world!
🌐
🧵👇
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Learning!
With 50 demos, our robot can autonomously complete complex mobile manipulation tasks:
- cook and serve shrimp🦐
- call and take elevator🛗
- store a 3 lbs pot in a two-door cabinet
Open-sourced!
Co-led
@tonyzzhao
,
@chelseabfinn
Finally, let's talk about the learned low-level bimanual manipulation.
All behaviors are driven by neural network visuomotor transformer policies, mapping pixels directly to actions. These networks take in onboard images at 10 Hz and generate 24-DOF actions (wrist poses and…
Motion capture gloves, unlike vision-based tracking, are not affected by occlusions during hand-object interactions, making them perfect for mocap in daily activities. With an RGB-D camera, DexCap reconstructs 3D scenes and aligns motion data, all powered by a mini-PC in the backpack. 2/
We then retarget the mocap data to the robot embodiment. This includes (1) Observation retargeting by switching the camera system from human to robot. (2) Action retargeting by matching fingertip positions with IK. (3) Bridging the visual gap by including robot point clouds. 3/
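To make step (2) concrete, here is a minimal sketch of fingertip-position matching via damped least-squares IK, on a hypothetical 2-joint planar finger. The link lengths, starting pose, and solver settings are illustrative assumptions; the real system solves this against the robot hand's full kinematics.

```python
import numpy as np

def fingertip_fk(q, link_lengths=(0.04, 0.03)):
    """Forward kinematics of a toy 2-joint planar finger (meters)."""
    l1, l2 = link_lengths
    return np.array([
        l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
        l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1]),
    ])

def retarget_fingertip(target, q0, iters=100, damping=1e-6):
    """Damped least-squares IK: find joint angles whose fingertip
    matches the human fingertip position captured by the glove."""
    q = q0.astype(float).copy()
    eps = 1e-6
    for _ in range(iters):
        err = fingertip_fk(q) - target
        # numerical Jacobian of fingertip position w.r.t. the joints
        J = np.column_stack([
            (fingertip_fk(q + eps * np.eye(2)[i]) - fingertip_fk(q)) / eps
            for i in range(2)
        ])
        q -= np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ err)
    return q

# "human" fingertip position from mocap (here generated from known angles)
target = fingertip_fk(np.array([0.6, 0.4]))
q = retarget_fingertip(target, q0=np.array([0.3, 0.3]))
```

The damping term keeps the solve stable near kinematic singularities, which matters when the glove pose is briefly outside the robot hand's workspace.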
DexCap: a $3,600 open-source hardware stack that records human finger motions to train dexterous robot manipulation. It's like a very "lo-fi" version of Optimus, but affordable to academic researchers. This isn't teleoperation: data collection is decoupled from the robot…
We hope Sequential Dexterity paves the path for future research on long-horizon dexterous manipulation. Feel free to check out our code!
Website & Paper:
Code:
Work done w/ Yuanpei Chen,
@drfeifei
, and Karen Liu at
@StanfordAILab
.
🤖Joint-level control + portability = robot data in the wild! We present AirExo, a low-cost hardware system, and showcase how in-the-wild data enhances robot learning, even in contact-rich tasks. A promising tool for large-scale robot learning & TeleOP, now at !
Thanks, Jim, for the amazing summary of our key insight - the bi-directional optimization for skill chaining. Check out our code () and video () to see how we make it work!
This is "Sequential Dexterity", a neural network that controls a robot arm to build LEGO given a manual 🤖
To do this task, the robot needs to chain together multiple skills (grasping, re-orienting, pushing, etc.) and execute without compounding error.
I find some very simple…
The core of the system is a learning-based transition feasibility function that progressively fine-tunes the RL-learned sub-policies to improve chaining success. It can also be used during skill selection, to re-plan from failures and to bypass redundant stages.
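As a rough illustration of how a transition feasibility function can gate skill selection (not the paper's actual architecture), here is a toy sketch where a logistic model with hand-picked weights stands in for the learned network:

```python
import numpy as np

def feasibility(state, W, b):
    """Toy transition feasibility score: probability that a sub-policy
    can succeed when started from `state` (a logistic model standing
    in for the learned network)."""
    return 1.0 / (1.0 + np.exp(-(W @ state + b)))

def select_next_skill(state, skills, threshold=0.5):
    """Pick the highest-feasibility skill; signal re-planning if none
    clears the threshold (e.g., re-grasp instead of inserting)."""
    scores = {name: feasibility(state, W, b) for name, (W, b) in skills.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores  # trigger re-planning from failure
    return best, scores

# illustrative skills with hand-tuned weights (2-dim toy state)
skills = {
    "insert":   (np.array([ 2.0, -1.0]), -0.5),
    "reorient": (np.array([-1.0,  2.0]),  0.0),
}
skill, scores = select_next_skill(np.array([1.0, 0.2]), skills)
```

Returning `None` when every score falls below the threshold is what lets the executor re-plan from failures rather than blindly running the next stage.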
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware!
A low-cost, open-source, mobile manipulator.
One of the highest-effort projects of my past 5 years! Not possible without co-lead
@zipengfu
and
@chelseabfinn
.
In the end, what's better than cooking yourself a meal with the 🤖🧑🍳
We train a point cloud-based Diffusion Policy with retargeted human mocap data only. The robot controls both hands (46-dim action space) to perform tasks including collecting tennis balls🎾 and packaging objects🎁. All the policies are learned without any teleoperation data. 4/
However, DexCap is not yet ready for tasks that require applying force, as positional data alone is insufficient. Therefore, we extend DexCap with human-in-the-loop correction during rollouts. Within 30 trials of corrections, our robot can prepare tea🍵 and use scissors✂️. 7/
Using a domain definition language to set multi-task evaluation goals is very scalable and saves a lot of engineering effort! Kudos to
@yifengzhu_ut
and the team! If you missed it, we've released MimicPlay code and tested on LIBERO a while ago. Check out
We are thrilled to announce LIBERO, a lifelong robot learning benchmark to study knowledge transfer in decision-making and robotics at scale! 🤖 LIBERO paves the way for prototyping algorithms that allow robots to continually learn! More explanations and links are in the 🧵
How to acquire generalizable robotic skills has garnered much attention recently. We invite you to join our CoRL 2023 workshop - Towards Generalist Robots. Don't miss the chance to hear from our amazing speakers and share your insights through paper submission (Before Oct. 16)!
🤖How far are we from 𝐠𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐬𝐭 𝐫𝐨𝐛𝐨𝐭𝐬?
𝐀𝐧𝐧𝐨𝐮𝐧𝐜𝐢𝐧𝐠 the 1st Workshop on "Towards Generalist Robots" at
#CoRL2023
!
Join us to discuss how to scale up robotic skill learning, with an amazing lineup of speakers!
CfP:
Details 👇
Our workshop on "Overlooked Aspects of Imitation Learning" is happening this Monday June 27 from 11:00am - 12:30pm EST at
#RSS2022
. You're welcome to join us virtually or in person, and don't forget it is the EST time zone!
Another amazing work shows the tremendous potential of "collecting robot data without a robot". These portable low-cost systems really pave the way for scaling up data collection. Super looking forward to what further advancements will emerge.
Can we collect robot data without any robots?
Introducing Universal Manipulation Interface (UMI)
An open-source $400 system from
@Stanford
designed to democratize robot data collection
0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
We observe that human play data is fast and easy to collect, and it also covers diverse behaviors and situations. On the other hand, although robot data is slow and limited, it does not have an embodiment gap. MimicPlay is a method designed to combine the best of both worlds. (2/N)
Despite being trained only in simulation with a few task objects, our system demonstrates generalization capability to novel object shapes and is able to zero-shot transfer to a real-world robot equipped with a dexterous hand.
Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation
paper page:
Many real-world manipulation tasks consist of a series of subtasks that are significantly different from one another. Such long-horizon, complex tasks highlight…
I've been waiting for this thread for a long time! Fantastic work by
@tonyzzhao
showing the great potential of this bimanual teleoperation system. The conditional action synthesis makes a lot of sense in handling human demos for long-horizon tasks!
Introducing ALOHA 🏖: 𝐀 𝐋ow-cost 𝐎pen-source 𝐇𝐀rdware System for Bimanual Teleoperation
After 8 months of iterating at
@stanford
and 2 months working with beta users, we are finally ready to release it!
Here is what ALOHA is capable of:
DexCap is fully portable and can scale up data collection in the wild. By collecting data with multiple objects in diverse environments, the learned policy can generalize to unseen objects for the same task. 5/
MimicPlay is a hierarchical imitation learning algorithm that leverages cheap and non-labeled human play data (10 minutes) for learning the high-level planner and a small amount of robot data (20 demonstrations ~ 20 minutes) for learning a plan-guided low-level controller. (3/N)
Just dropped sim_web_visualizer! 🚀 Transform the way you view simulation environments, right in your web browser (e.g., Chrome).
Dive into more examples on our Github:
Great to see such natural power grasping and tool-use motions, and super smooth! Now, robots can even play video games with a joystick controller 🤣 Amazing results highlighting the strength of vision+tactile. Congrats
@ToruO_O
!
Imitation learning works™ – but you need good data 🥹 How to get high-quality visuotactile demos from a bimanual robot with multifingered hands, and learn smooth policies?
Check our new work “Learning Visuotactile Skills with Two Multifingered Hands”! 🙌
DexCap enables fast data collection, approximating the speed of natural human motion. Moreover, the collection process does not require costly robot hardware. 6/
Pretty cool results with diffusion model + visuomotor imitation learning! It is always painful to learn from multi-modal demos. Seems like iterative diffusion policy is a promising direction! Congrats
@chichengcc
@SongShuran
What if the form of visuomotor policy has been the bottleneck for robotic manipulation all along? Diffusion Policy achieves 46.9% improvement vs. prior SoTA on 11 tasks from 4 benchmarks + 4 real-world tasks! (1/7)
website :
paper:
Introducing 𝐀𝐋𝐎𝐇𝐀 𝐔𝐧𝐥𝐞𝐚𝐬𝐡𝐞𝐝 🌋 - Pushing the boundaries of dexterity with low-cost robots and AI.
@GoogleDeepMind
Finally got to share some videos after a few months. Robots are fully autonomous, filmed in one continuous shot. Enjoy!
Considering how fragile the Allegro hand is, the sample efficiency of the online learning is truly amazing! Plier cutting is a cool result highlighting the potential of dexterous hands using human tools.
We just released TAVI -- a robotics framework that combines touch and vision to solve challenging dexterous tasks in under 1 hour.
The key? Use human demonstrations to initialize a policy, followed by tactile-based online learning with vision-based rewards.
Details in🧵(1/7)
We especially thank
@kenny__shaw
@anag004
@pathak2206
for open-sourcing the LEAP Hand project. Having a customizable and low-cost dexterous hand benefits our project a lot!
We found MimicPlay significantly outperforms prior methods in performance and sample efficiency. With only 20 robot demonstrations and a planner learned with 10 minutes of human play data (shared across tasks), MimicPlay can perform long-horizon tasks such as baking food. (6/N)
Visual correspondence paves the way for numerous downstream tasks. Learning from internet-scale video further reveals its full strength! Very interested in trying it out for manipulation and more! Nice work from
@agrimgupta92
🚀
How should we leverage internet videos for learning visual correspondence?
In our latest work we introduce SiamMAE: Siamese Masked Autoencoders for self-supervised representation learning from videos.
web:
paper: 👇🧵
Huge congrats to
@DrJimFan
and
@yukez
on the exciting move! Really enjoyed working with you last year and can't wait to see what this team will achieve!🦾
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof.
@yukez
. GEAR stands for Generalist Embodied Agent Research.
We believe in a future where every machine that moves will be autonomous, and robots and…
I’ll present our work on learning human-robot collaboration in simulation today
@corl_conf
(Wed 5:00-6:00pm GMT, 9:00-10:00am PST). Drop by the poster (Session V Booth 4) and check out our paper () with code () to learn more!
robomimic v0.3 released - the most major upgrade yet! New features:
🧠 New Algorithms (BC-Transformer, IQL)
🤖 Full compatibility with robosuite v1.4 and
@DeepMind
's MuJoCo bindings
👁️ Pre-trained image reps
📈 wandb logging
@weights_biases
try it out:
We further test MimicPlay in more challenging multi-task learning settings, where we found that MimicPlay has the smallest performance drop compared to prior methods. This result highlights MimicPlay's capability to handle diverse tasks within one model. (8/N)
More importantly, after training multiple tasks within one model, MimicPlay is able to generalize to new tasks with unseen temporal compositions. (7/N)
🦾 Our robot hand can rotate objects over 6+ axes in the real world!
Introducing RotateIt (CoRL 2023), a Sim-to-Real policy that can rotate many objects over many axes, using vision and touch!
Check it out: .
Paper: .
#CoRL2023
Introducing Open-World Mobile Manipulation 🦾🌍
– A full-stack approach for operating articulated objects in open-ended unstructured environments:
Unlocking doors with lever handles/ round knobs/ spring-loaded hinges 🔓🚪
Opening cabinets, drawers, and refrigerators 🗄️
👇…
The high-level planner is first trained as a goal-conditioned policy. It takes the current and goal images from human play data and outputs a latent plan. We also use a KL-loss to minimize the visual gap between human and robot data. (4/N)
OpenAI ChatGPT is excellent at creating PyBullet scripts:
"can you create a pybullet script with a ground plane, a box and a sphere on top."+"can you add 10 more boxes on top?"+"can you move the 5th box 0.3 units along the x axis?"+"can you add a quadruped robot next to the boxes?"
In the second step, we freeze the weights of the trained latent planner. The latent planner takes the current and goal images from the robot data and generates a latent plan to train the low-level controller with a plan-guided imitation learning algorithm. (5/N)
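For intuition on the KL term that aligns human-play and robot latent plans (from step 4/N), here is its closed form for diagonal Gaussians. The 8-dim latent and the Gaussian parameterization are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians --
    the kind of term used to pull human-play and robot latent plans
    toward a shared distribution."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# hypothetical latent-plan distributions from the two encoders (8-dim)
mu_human, logvar_human = np.zeros(8), np.zeros(8)
mu_robot, logvar_robot = np.full(8, 0.5), np.zeros(8)

gap = kl_diag_gaussians(mu_human, logvar_human, mu_robot, logvar_robot)
```

Minimizing this gap during planner training is what lets the frozen planner later produce usable latent plans from robot images despite the visual gap.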
@Stone_Tao
Great question! The primary issue at hand is whether we have sufficient data to support end-to-end learning. This becomes particularly challenging for tasks with longer horizons, as the accumulation of errors requires an abundance of data to cover diverse scenarios.
Imitation learning for real-world problems takes more than new algorithms - check out our RSS22 workshop on Overlooked Aspects of Imitation Learning - consider submitting by May 7th!
Applying imitation learning to real-world problems takes more than new algorithms. We are organizing a workshop "Overlooked Aspects of Imitation Learning: Systems, Data, Tasks, and Beyond” at RSS22! Exciting speakers & more to come. Submit by May 7th!
We trained a transformer called VIMA that ingests *multimodal* prompts and outputs controls for a robot arm. A single agent is able to solve visual goal, one-shot imitation from video, novel concept grounding, visual constraint, etc. Strong scaling with model capacity and data!🧵
We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
How can robot manipulators perform in-home tasks such as making coffee for us? We introduce VIOLA, an imitation learning model for end-to-end visuomotor policies that leverages object-centric priors to learn from only 50 demonstrations!
Humanoids with legs and anthropomorphic hands will never beat this robot in cost, speed, maintenance, or reliability in the multi-trillion-dollar warehouse/manufacturing market, where climbing stairs is unnecessary.
Customers will choose faster, lower cost & more reliable every time.
Humans learn to collaborate with others through experiences. However, it would take countless human hours to teach robots how to collaborate through trial-and-error. We ask: is it possible to teach human-robot collaboration skills through human-human collaboration demos?
3/8
Having a robot assistant at home that could seamlessly assist us with daily activities is a long-sought dream. To achieve this goal, the robot needs to recognize and react to its human partners' intentions on the fly.
2/8
In this work, we take a step forward to improve the generalization capability of traditional imitation learning algorithms by introducing a novel human-like hand-eye coordination action space. We hope it can inspire future studies on generalizable offline policy learning.
8/9
@kevin_zakka
Great question! We tried using human demos as task video prompts, which works for short-horizon subgoals such as opening an oven or turning off a lamp. However, it encountered difficulty with longer-horizon tasks due to a mismatch in motion speed between human and robot.
@hameleoned
Awesome catch! We find lots of cases where the human hand and the robot gripper need to be used in very different ways to accomplish the same task (e.g., opening an oven). Such an embodiment gap between human and robot makes it hard to learn low-level control solely from human data.
@AndreTI
@olivercameron
Great point! We are paying close attention to human/robot motion synthesis with diffusion. The key challenge is how to ground such generated behavior to the physical world/objects. Let's say the hand needs to reach the 3D location of a door handle in the correct pose to open it.
6/ With the pre-trained representation, a state transition model is learned to predict the skill effects. The key insight is: the representation summarizes the common features of the objects from the same category (e.g. foods are cookable), which could transfer to new instances.
@allenzren
Thanks for bringing this up! Q_H and Q_R represent the feature embedding space produced by the image encoders. Yes, we treat all the data as a single distribution, and the human play data we utilized does not have any task labeling.
@allenzren
The KL is trying to minimize the distribution gap between human and robot images. I like your idea of dividing the distribution by clustering the data with 3D trajectories and employing KL based on the clustered features. This seems reasonable and might enhance the results!
7/ Finally, we can use a simple tree search algorithm to find the task plans for different symbolic goals. Experiments show that the task planner based on the pre-trained representation could successfully transfer to new objects and new scene layouts.
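A toy sketch of the idea in 6/ and 7/: a symbolic transition model (standing in here for the learned state-transition model that predicts skill effects) plus breadth-first search over those predicted effects. The skills and predicates below are made up for illustration.

```python
from collections import deque

def apply_skill(state, skill):
    """Predict the effect of a skill: (preconditions, add, delete)
    over a frozenset of symbolic predicates."""
    pre, add, delete = skill
    if not pre <= state:
        return None  # preconditions not met in this state
    return (state - delete) | add

def plan(start, goal, skills):
    """Breadth-first search over predicted skill effects, returning the
    shortest skill sequence whose final state satisfies the goal."""
    start = frozenset(start)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, seq = queue.popleft()
        if frozenset(goal) <= state:
            return seq
        for name, skill in skills.items():
            nxt = apply_skill(state, skill)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + [name]))
    return None

# illustrative cooking domain: (preconditions, add effects, delete effects)
skills = {
    "pick":  (frozenset({"on_table"}), frozenset({"in_hand"}), frozenset({"on_table"})),
    "place": (frozenset({"in_hand"}),  frozenset({"in_pot"}),  frozenset({"in_hand"})),
    "cook":  (frozenset({"in_pot"}),   frozenset({"cooked"}),  frozenset()),
}
seq = plan({"on_table"}, {"cooked"}, skills)
```

Because the transition model generalizes across object instances with the pre-trained representation, the same search can be reused on new objects and layouts without re-engineering the domain.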