Xiaolong Wang Profile
Xiaolong Wang

@xiaolonw

11,244 Followers · 986 Following · 287 Media · 953 Statuses

Assistant Professor @UCSDJacobs Postdoc @berkeley_ai PhD @CMU_Robotics

San Diego, CA
Joined March 2016
Pinned Tweet
@xiaolonw
Xiaolong Wang
12 days
This is accepted to RSS! Congratulations to @xuxin_cheng and @JiYandong on the first work of their PhDs.
@xiaolonw
Xiaolong Wang
3 months
Let’s think about humanoid robots beyond carrying boxes. How about having the humanoid come out the door, interact with humans, and even dance? Introducing Expressive Whole-Body Control for Humanoid Robots: see how our robot performs rich, diverse,
@xiaolonw
Xiaolong Wang
1 year
Stable Diffusion generates beautiful images, but can it be used for open-world recognition? Try the demo! Our #CVPR2023 paper shows that the pre-trained diffusion model is indeed a good image parser, allowing for open-vocabulary segmentation and detection.
@xiaolonw
Xiaolong Wang
3 months
Is 3D scene generation much closer to being solved all of a sudden? It has been a few days since the release of @OpenAI Sora. We ran our COLMAP-Free 3D Gaussian Splatting on the released videos. Our method does not need to pre-process cameras, and it seems we can directly just get
@xiaolonw
Xiaolong Wang
2 months
I have been cleaning up my daughter's mess for more than two years now. Last weekend our robot came home to do the job for me. 🤖 Our new work on visual whole-body control learns a policy to coordinate the robot's legs and arms for mobile manipulation. See
@xiaolonw
Xiaolong Wang
2 years
Introducing #CVPR2022 GroupViT: Semantic Segmentation Emerges from Text Supervision 👨‍👩‍👧 Without any pixel labels ever, our Grouping ViT can group pixels bottom-up into open-vocabulary semantic segments. The only training data is 30M noisy image-text pairs.
@xiaolonw
Xiaolong Wang
2 months
Our humanoid dancing outside #GTC2024
@xiaolonw
Xiaolong Wang
3 years
New work with Yinbo Chen, one of my first PhD students: Learning Continuous Image Representation with Local Implicit Image Function. Check out our video showing images at arbitrary resolutions. proj: code: @YinboChen @SifeiL (1/n)
@xiaolonw
Xiaolong Wang
2 years
Introducing Category-Level 6D Object Pose Estimation in the Wild.🏞️ We release Wild6D: an in-the-wild object-centric RGBD dataset with 5000+ videos of 1700+ objects. We perform semi-supervised 6D object pose estimation on it without manual annotations.
@xiaolonw
Xiaolong Wang
19 days
Tesla Optimus can arrange batteries in their factories, ours can do skincare (on @QinYuzhe )! We open-source Bunny-VisionPro, a teleoperation system for bimanual hand manipulation. Users can control the robot hands in real time using VisionPro, flexible like a bunny. 🐇
@xiaolonw
Xiaolong Wang
1 year
The robot climbs stairs🏯, steps over stones 🪨, and runs in the wild🏞️, all in one policy, without any remote control! Our #CVPR2023 Highlight paper achieves this by using RL + a 3D Neural Volumetric Memory (NVM) trained with view synthesis!
@xiaolonw
Xiaolong Wang
2 months
Our humanoid robot attending #GTC2024 today with @xuxin_cheng
@xiaolonw
Xiaolong Wang
5 months
3D Gaussian Splatting is great, but can it work without the pre-computed camera poses? Introducing: COLMAP-Free 3D Gaussian Splatting Our recent work shows that not only can it, but 3D Gaussians also make camera pose estimation easy (compared to NeRF) along with reconstruction. 👇🧵
@xiaolonw
Xiaolong Wang
8 months
Introducing our CoRL work on Dynamic Handover. We humans often pass objects along, a baseball, a bottle of water, by throwing and catching. 🫴⚾️ We now enable robot hands to throw and catch different unseen objects using RL and Sim2Real transfer!
@xiaolonw
Xiaolong Wang
2 years
Object 6D pose estimation in the wild can now be achieved by only self-supervision! 🏞️ No 3D GTs are needed in sim or real (camera, pose, shape). Multiple cycles across 2D-3D, instance, and time are utilized. Project page: Led by @kaiwynd (1/n
@_akhaliq
AK
2 years
Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild abs: project page:
@xiaolonw
Xiaolong Wang
5 years
We released our CVPR'19 paper on Learning Correspondence from the Cycle-Consistency of Time () with code (). We hope this opens up new opportunities for unsupervised learning with videos. Check out our result for long-range tracking!
@xiaolonw
Xiaolong Wang
2 months
We have seen a lot of legged robots doing navigation in the wild. But how about mobile manipulation in the wild? I have been pushing the direction of learning a unified, efficient, and dynamic 3D representation of scenes (for navigation) and objects (for manipulation) for the
@xiaolonw
Xiaolong Wang
4 months
Tired of seeing synthetic objects go round and round, round and round, round and round? Introducing the WildRGB-D dataset! We collect a dataset of real-world RGB-D objects captured in 360° in cluttered scenes. To download: Website: ✅
@xiaolonw
Xiaolong Wang
1 year
Imagine you have an object in hand: you can rotate it by feel without even looking. This is what we enable the robot to do now: Rotating without Seeing. Our multi-finger robot hand learns to rotate diverse objects using only touch sensing.
@xiaolonw
Xiaolong Wang
2 years
Dense Correspondences found in StyleGAN! #CVPR2022 CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs. We learn a disentanglement of category-level correspondence map and style with StyleGAN, and show various applications. 🧵👇 (1/n)
@xiaolonw
Xiaolong Wang
3 months
Walking in the morning @UCSanDiego Operators: @xuxin_cheng @JiYandong
@xiaolonw
Xiaolong Wang
6 months
Can a machine solve diverse computer vision tasks, even ones it is not trained on? Introducing IMProv: It performs multimodal in-context learning for solving generic computer vision tasks. It formulates all tasks as an image inpainting problem.
@xiaolonw
Xiaolong Wang
5 years
I am happy to announce that I will be joining UC San Diego as an assistant professor in the ECE department in fall 2020. I am looking forward to working with great colleagues and students there! I am very grateful for all the support from my advisors, friends, and family.
@xiaolonw
Xiaolong Wang
3 months
Since Sora is out, I have been thinking about our role in academia. One thing we can do at school is fast prototyping with very talented students, showing the potential, the possibility. Of course, the future will always be scaling up.
@EpisodeYang
Ge Yang
3 months
That 675M is 24% of the company -- 675M round at 2.8B valuation. In the meantime, enjoy what we achieved with a nimble team of just six people :) paper link: w/ @xiaolonw @xuxin_cheng @JiYandong @RchalYang
@xiaolonw
Xiaolong Wang
4 years
How to train very deep ConvNets without residual blocks? Our ICML paper on Deep Isometric Learning successfully trains 100-layer ConvNets without any shortcut connections or normalization layers (BN/GN) on ImageNet. Paper: Code:
@xiaolonw
Xiaolong Wang
3 years
Is the tracking task necessary for learning to track? Our new work on self-supervised correspondence learning shows it is not! By simply comparing two video frames, without negative pairs, correspondence can emerge for tracking objects and pixels. (1/n)
@xiaolonw
Xiaolong Wang
2 years
The code and model for GroupViT are released. We also host a demo on @huggingface Spaces for open-category segmentation. Try them out!
@xiaolonw
Xiaolong Wang
2 years
Introducing #CVPR2022 GroupViT: Semantic Segmentation Emerges from Text Supervision 👨‍👩‍👧 Without any pixel labels ever, our Grouping ViT can group pixels bottom-up into open-vocabulary semantic segments. The only training data is 30M noisy image-text pairs.
@xiaolonw
Xiaolong Wang
4 years
Self-Supervised Policy Adaptation during Deployment (). We propose to adapt an RL policy to new environments without any reward, by using self-supervision during deployment. Joint work with @ncklashansen , Yu Sun, @pabbeel , Alyosha Efros, @LerrelPinto (1/3)
@xiaolonw
Xiaolong Wang
3 years
The website of Video Autoencoder is online: We have also released our code here:
@_akhaliq
AK
3 years
Video Autoencoder: self-supervised disentanglement of static 3D structure and motion pdf: abs:
@xiaolonw
Xiaolong Wang
10 months
Presenting MonoNeRF at #ICML2023 We train a generalizable NeRF from: ✅Large-scale monocular videos instead of one scene ✅No GT camera poses.📷🚫 Without per-scene optimization, the model can do view synthesis, depth estimation, camera pose estimation.
@xiaolonw
Xiaolong Wang
9 months
Vision-language foundation models should go to 3D for robotics!🤖 CoRL23 Oral: GNFactor learns Generalizable Neural Feature Fields for language-conditioned manipulation on diverse scenes. It unifies 3D➕Stable Diffusion features using generalizable NeRFs.
@_akhaliq
AK
9 months
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields paper page: It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world
@xiaolonw
Xiaolong Wang
3 years
We propose A-SDF, a manipulable and differentiable implicit representation for articulated objects. Given an input depth image or point cloud of an object, we can infer its shape and articulation angles, and manipulate the object parts to any angle. (1/3)
@xiaolonw
Xiaolong Wang
2 years
We propose to train a generalizable NeRF on large-scale videos without camera ground truth! 📽️❌📷 This leads to multiple applications in monocular depth estimation, camera pose estimation, and view synthesis. Project page: w @yangfu0817 @imisra_
@_akhaliq
AK
2 years
Multiplane NeRF-Supervised Disentanglement of Depth and Camera Pose from Videos abs: project page:
@xiaolonw
Xiaolong Wang
1 month
Another bird video. We can reconstruct the details of the wings and obtain correspondences at the same time. 🕊️
@Isabella__Liu
Isabella Liu
1 month
Want to obtain time-consistent dynamic mesh from monocular videos? Introducing: Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos We reconstruct meshes with flexible topology change and build the corresp. across meshes. 🧵(1/n)
@xiaolonw
Xiaolong Wang
2 years
What is out there? Let’s look outside the room! #CVPR2022 We synthesize a consistent long-term 3D scene video to navigate out of a room, given a single image as input: This is highly inspired by "Infinite Images [2010]" and "Infinite Nature [2021]". (1/n
@xiaolonw
Xiaolong Wang
4 years
We are hosting a CVPR tutorial on Learning Representations via Graph-structured Networks. We are having live presentations. Website: Time: Sunday 1:00pm-4:30pm Speakers: Chen Sun, Han Hu, Shubham Tulsiani, Saining Xie, Sifei Liu, and me.
@xiaolonw
Xiaolong Wang
2 years
It got into ECCV! It is not easy to convince vision people... DexMV: Imitation Learning for Dexterous Manipulation from Human Videos A platform to collect and convert real-world human videos into 3D demos for imitation👩🤖 Code and data:
@xiaolonw
Xiaolong Wang
3 years
Introducing our ICCV'21 Oral: Video Autoencoder. We disentangle the 3D structure and camera pose from a video of a static scene, in a self-supervised manner. The training objective is solely reconstructing the input video frames. page with code: (1/n
@xiaolonw
Xiaolong Wang
3 years
We train the network with a self-supervised super-resolution task. As the image representation is continuous, we can visualize and zoom into the image at an arbitrary resolution. arxiv: (3/n)
@xiaolonw
Xiaolong Wang
4 months
Introducing Texture UV Radiance Fields (TUVF) #ICLR2024 , a new generalizable NeRF that generates textures given a 3D shape input. The generation happens in a learnable UV sphere space. That is, it learns the correspondence across shapes! This allows the texture to be
@xiaolonw
Xiaolong Wang
2 years
New work on hands: DexPoint #CoRL2022 ! It is all about geometry and contact👆🤖👇 We perform RL with raw point cloud inputs for manipulation: door opening and grasping. This brings generalization across diverse objects and much easier sim2real transfer. 1/n
@xiaolonw
Xiaolong Wang
2 years
We are presenting VideoINR in #CVPR2022 , a new continuous Video Implicit Neural Representation that allows any-scale super-resolution / interpolation in both space and time. Paper: Project page: 1/n)
@xiaolonw
Xiaolong Wang
3 years
Our quadrupedal robot is finally running in the wild, on grass, in the forest, around campus, day and night, avoiding boxes, trees, and humans, using end-to-end visual RL! Work done by Chieko Sarah Imai, @Mehooz2 , Yuchen Zhang, Marcin Kierebinski, @RchalYang , @QinYuzhe
@_akhaliq
AK
3 years
Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization pdf: abs: project page:
@xiaolonw
Xiaolong Wang
2 months
If you look at the best image/video diffusion models, they are still not able to get the hands quite right, especially when interacting with objects ✍️ We present HOIDiffusion #CVPR2024 , a way to generate accurate and realistic hand-object interaction images, in diverse poses,
@xiaolonw
Xiaolong Wang
2 years
Excited to share our imitation learning work for dexterous manipulation, using human demos collected with a single iPad camera for teleoperation. We provide a new system using a customized hand in sim, and transfer to multiple hands and a real Allegro hand. 1/n
@xiaolonw
Xiaolong Wang
1 month
Great work! But a reminder that our work released last year "COLMAP-Free 3D Gaussian Splatting" tackles this exact problem, and there are many others as well. In the FlowMap pdf it is mentioned "Concurrent work [22] somewhat accelerates optimization compared to earlier
@vincesitzmann
Vincent Sitzmann
1 month
Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM like Colmap! IMO this solves a major missing piece for internet-scale training of 3D Deep Learning methods. 1/n
@xiaolonw
Xiaolong Wang
2 years
Implicit function for dexterous hand grasping, learned from 3D human demonstrations! 🤟🤖✌️ We introduce the Continuous Grasping Function (CGF), which takes continuous time as input and generates a smooth grasping plan for the real Allegro Hand. 1/n)
@xiaolonw
Xiaolong Wang
1 year
Spring Group Retreat
@xiaolonw
Xiaolong Wang
2 years
Our work on combining vision and proprioceptive state for locomotion control is accepted to ICLR 2022! We use a transformer to incorporate the two modalities, and train the policy end2end with RL, then directly deploy it to the real robot. Project page:
@xiaolonw
Xiaolong Wang
1 year
🏗️ Policy Adaptation from Foundation Model Feedback #CVPR2023 Instead of using a foundation model as a pre-trained encoder (generator), we use it as a Teacher (discriminator) to tell us where our policy went wrong and help it adapt to new envs and tasks.
@xiaolonw
Xiaolong Wang
11 months
Elastic Decision Transformer (EDT): It is not always optimal to use all history states as inputs for a decision transformer. A shorter history allows transitions to diverse and better future states, implicitly stitching training trajectories into a better one.
@_akhaliq
AK
11 months
Elastic Decision Transformer paper page: paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical
@xiaolonw
Xiaolong Wang
1 year
I am hiring a postdoc at UCSD as well. Please let me know if you are interested in working on vision and robotics. Our building for collaborative robotics labs 👇
@xiaolonw
Xiaolong Wang
3 months
The plan was to collect some demos among people in the university center; it led to some unexpected donations from the crowd 💵
@xiaolonw
Xiaolong Wang
11 months
Test-Time Training on Video Streams: Every frame in a test video can be used for training, even without a ground truth label. When deploying your model in a video stream, train it online with self-supervised learning (e.g. MAE).
@xiaolonw
Xiaolong Wang
10 months
ICML was nice. See you my friends.
@xiaolonw
Xiaolong Wang
1 year
Another robotics paper to present at #CVPR2023 ! DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects We continue pushing dexterous manipulation using point cloud RL, studying 3D pre-training, and generalizing to unseen objects.
@xiaolonw
Xiaolong Wang
3 years
We are hosting a CVPR tutorial on Graph-structured Networks tomorrow. We will cover topics over Transformers, graph networks, and applications on 3D scene understanding, physical interaction prediction, RL and control. Sunday, 9:00am-12:30 pm PDT
@xiaolonw
Xiaolong Wang
1 year
Wait, foundation models are actually not helping that much for visuo-motor control? Who would have thought that? We conduct a detailed study on a learning-from-scratch baseline and find it performs similarly to recent methods using large pre-trained models:
@xiaolonw
Xiaolong Wang
2 years
New #CVPR2022 work GIFS: We reconstruct non-watertight and multi-layer shapes using an implicit function. It is simple: instead of classifying a point as inside/outside the surface, we classify whether there is a surface between any two points in 3D space.
@xiaolonw
Xiaolong Wang
2 months
Spring break group retreat again.
@xiaolonw
Xiaolong Wang
2 years
What is the right thing to predict in egocentric videos that will be useful for robotics? In #CVPR2022 , we propose that one possibility is to predict Joint Hand Motion and Interaction Hotspots. We provide an automatic way to generate GT and a new model.
@xiaolonw
Xiaolong Wang
2 years
We open-source the visual locomotion code! If you are interested in RL for locomotion with a REAL robot, we provide a full package: * Diverse environments in PyBullet; * End2end visual-RL training code; * Sim2Real @UnitreeRobotics A1 robot deployment code.
@xiaolonw
Xiaolong Wang
2 years
Our work on combining vision and proprioceptive state for locomotion control is accepted to ICLR 2022! We use a transformer to incorporate the two modalities, and train the policy end2end with RL, then directly deploy it to the real robot. Project page:
@xiaolonw
Xiaolong Wang
1 year
Self-supervised 6D pose estimation is accepted at #ICLR2023. It trains directly on real data; no simulated data is required: We also release the code:
@xiaolonw
Xiaolong Wang
2 years
Object 6D pose estimation in the wild can now be achieved by only self-supervision! 🏞️ No 3D GTs are needed in sim or real (camera, pose, shape). Multiple cycles across 2D-3D, instance, and time are utilized. Project page: Led by @kaiwynd (1/n
@xiaolonw
Xiaolong Wang
11 months
Our RSS work is covered in Scientific American!
@punarpuli
Ananya (ಅನನ್ಯ)
11 months
Manipulating objects in hand is a tough task for robots, what with having to coordinate their fingers and make sure the object doesn't simply slide off the palm. My latest story for @sciam on what you can do with simple sensors and no vision data.
@xiaolonw
Xiaolong Wang
1 year
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation When ViT takes high-resolution inputs, the memory explodes easily. We develop a grouping mechanism inside ViT that efficiently propagates information and reduces memory.
@xiaolonw
Xiaolong Wang
1 year
Our lab at UCSD is hiring PhD students for next fall. Applications to ECE are due today, and applications to CSE are due on the 21st. Check here for more information:
@xiaolonw
Xiaolong Wang
10 months
Very proud of this engineering work from Yuzhe. Robotics students typically require a desktop of their own to run and visualize their env at the same time. With this tool, you can now run your code on a remote server, and visualize the simulation in real-time on your laptop.
@QinYuzhe
Yuzhe Qin
10 months
Just dropped sim_web_visualizer! 🚀 Transform the way you view simulation environments right from your web browser like Chrome. Dive into more examples on our Github:
@xiaolonw
Xiaolong Wang
10 months
“Correspondence, correspondence, and correspondence!” -- Takeo I still remember quoting Takeo in my PhD defense. Now we have correspondence between image and sketch. A great collaboration with the psychology side @judyefan @XuanchenLu I will be presenting this today #ICML2023
@XuanchenLu
Xuanchen Lu @ NeurIPS 2023
10 months
Humans effortlessly grasp the fine-grained correspondences between sketches and real-world objects. How well can current vision algorithms do the same? To find out, check out our #ICML2023 paper w/ @xiaolonw @judyefan !
@xiaolonw
Xiaolong Wang
3 years
Our recent project :)
@xiaolonw
Xiaolong Wang
29 days
A peek at what we are working on lately for our humanoid robot system. ☑️ Control bimanual hands from VisionPro in real time. ☑️ An active camera is installed on the robot head, following human head motion and streaming egocentric view observations in real time to VisionPro. Interfacing
@xuxin_cheng
Xuxin Cheng
30 days
 🤖Introducing 📺𝗢𝗽𝗲𝗻-𝗧𝗲𝗹𝗲𝗩𝗶𝘀𝗶𝗼𝗻: a web-based teleoperation software!  🌐Open source, cross-platform (VisionPro & Quest) with real-time stereo vision feedback.  🕹️Easy-to-use hand, wrist, head pose streaming. Code:
@xiaolonw
Xiaolong Wang
3 years
Introducing DexMV: Imitation Learning for Dexterous Manipulation from Human Videos. A new platform for recording human hand demonstration videos, and a new imitation learning pipeline to leverage the videos for 5-finger robot hand manipulation. (1/n)
@xiaolonw
Xiaolong Wang
7 months
The robot can now backflip, jump, and run in different ways, all in one policy. A generalized animal imitator is now here enabling new agility for legged robots:
@RchalYang
Ruihan Yang
8 months
🏅In the future, there will be Olympics for legged robots, showcasing unmatched agility. We bring diverse agile locomotion skills to legged robots with a single controller! Introducing VIM - a single policy unlocking diverse agile locomotion skills.🐾
@xiaolonw
Xiaolong Wang
2 years
In the coming NeurIPS week, we will introduce our work on predicting multi-person motion and interaction using Transformers. Check out our demo on scaling up the interaction predictions among 10-15 people! With: @JiashunWang @medhini_n @HarryXu12 (1/2)
@xiaolonw
Xiaolong Wang
3 years
Let's add vision to legged robots! New work on: Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers. By using depth image inputs, our legged robot can navigate through moving obstacles, and random chairs/desks. (1/n)
@xiaolonw
Xiaolong Wang
1 year
The way you name your jobs before the deadline.
@xiaolonw
Xiaolong Wang
2 years
6 months!
@xiaolonw
Xiaolong Wang
3 years
Our group is presenting 6 papers at this ICCV, with 3 Orals. We have released the code for: Video Autoencoder: Video Frame Similarity Learning: Grasp Generation: Articulated SDF:
@xiaolonw
Xiaolong Wang
7 months
CoRL group photo, with current, new and past students. Have to run but please come to our oral talk on Thursday on generalizable feature fields.
@xiaolonw
Xiaolong Wang
10 days
Very excited to have Xueyan join our group. My first postdoc! Let’s push VLMs for robotics together!
@yong_jae_lee
Yong Jae Lee
10 days
@xueyanzou1 had a number of impactful works in visual recognition, starting w/ a BMVC best paper award on Anti-Aliasing in CNNs to her recent Segment Everything Everywhere All At Once work . She's off to UCSD to do a postdoc in CV+robotics w/ @xiaolonw ! 4/
@xiaolonw
Xiaolong Wang
8 months
When working on Sim2Real for robotics, designing the simulation environment always requires a big effort. We release GenSim, which uses LLMs to automatically help us generate code for simulation tasks. We show a large-scale task study and Sim2Real transfer at:
@LiruiWang1
Lirui Wang (Leroy)
8 months
🚀How to scale up robotic simulation tasks? We announce GenSim: Generating Robotic Simulation Tasks via Large Language Models. Website (), arxiv (), and Gradio demo () are all out! Try out the Gradio demo🤗!
@xiaolonw
Xiaolong Wang
3 months
New work on combining Generalizable NeRF and diffusion loss for manipulation. The cool thing is that we are actually NOT training a diffusion policy, but using diffusion loss to train the representation and another feed-forward network on top of the representation for action
@GeYan_21
Ge Yan
3 months
Introducing DNAct: Diffusion Guided Multi-Task 3D Policy Learning. We combine neural rendering pre-training and diffusion models to learn a generalizable policy with a strong 3D semantic scene understanding.
@xiaolonw
Xiaolong Wang
2 years
Our #ECCV2022 work trains Transformers as optimizers for MLPs of implicit neural representations. The Transformer jointly parameterizes a learnable initialization and update rules for the column vectors of weight matrices in MLPs. Paper link:
@_akhaliq
AK
2 years
Transformers as Meta-Learners for Implicit Neural Representations abs: project page: github:
@xiaolonw
Xiaolong Wang
3 years
My group at UCSD is presenting 3 papers at this CVPR, on Local Implicit Image Function: 3D Hand-Object Pose: 3D Human Motion Synthesis: We provide a short 90s video to introduce these 3 works.
@xiaolonw
Xiaolong Wang
1 year
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation, is presented as a spotlight at #ICLR2023
@xiaolonw
Xiaolong Wang
1 year
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation When ViT takes high-resolution inputs, the memory explodes easily. We develop a grouping mechanism inside ViT that efficiently propagates information and reduces memory.
@xiaolonw
Xiaolong Wang
14 days
Got my visa today, will be in ICRA from Tuesday. Happy to chat!
@xiaolonw
Xiaolong Wang
3 years
Check out our latest long-term prediction results on the ICLR'21 paper (). We can predict complex interactions on PHYRE given only the 𝗙𝗜𝗥𝗦𝗧 image as the input.
@HaozhiQ
Haozhi Qi
3 years
Our work on learning visual dynamics is accepted by #ICLR2021 . We obtained state-of-the-art results on multiple prediction tasks as well as the #PHYRE physical reasoning benchmark. Check our latest results at
@xiaolonw
Xiaolong Wang
4 years
Our recent work on video synthesis with graph networks. We construct an Action Graph which takes objects as nodes and actions as edges. Given an initial frame, we can generate future frames by following the action graph. Our method generalizes to unseen objects and new actions.
@_amirbar
Amir Bar
4 years
Our new work "Compositional Video Synthesis with Action Graphs" is out! We focus on goal-oriented #video #generation and introduce the new task of *Action Graph to Video*. Project page: Abstract: @BAIR @NVIDIAAI @TAU @Nvidia
@xiaolonw
Xiaolong Wang
2 years
One big challenge for transferring image-based RL to a real robot is that the pixels are too small to capture interaction details. We combine an active egocentric view on the robot arm with a 3rd-person view as inputs for the RL policy, and deploy it to a real robot. 1/n
@xiaolonw
Xiaolong Wang
1 year
We are organizing the RSS workshop on Learning Dexterous Manipulation. Please consider submitting your work on hands to our workshop! The deadline is June 2nd!
@xiaolonw
Xiaolong Wang
11 months
Our Learning Dexterous Manipulation workshop () at RSS was a hit! Thank you to the speakers @pulkitology @Vikashplus @t_hellebrekers @abhishekunique7 Carmelo, @haoshu_fang , and the organizers, especially the in-person ones @QinYuzhe @LerrelPinto @notmahi @ericyi0124
@xiaolonw
Xiaolong Wang
4 years
State-Only Imitation Learning for Dexterous Manipulation (). We propose a simple imitation learning method with state-only demonstrations, generalizing better in novel dynamics/environments for dexterous manipulation. @ir413 @LerrelPinto and Jitendra Malik.
@xiaolonw
Xiaolong Wang
3 years
Cycle comes again: we propose contrastive learning with cross-video cycle-consistency. Instead of learning by augmenting a single image, our method forms a cycle across different videos to provide positive training pairs from different instances. (1/n)
@xiaolonw
Xiaolong Wang
1 year
Accepted to #RSS2023 !
@xiaolonw
Xiaolong Wang
1 year
Imagine you have an object in hand: you can rotate it by feel without even looking. This is what we enable the robot to do now: Rotating without Seeing. Our multi-finger robot hand learns to rotate diverse objects using only touch sensing.
@xiaolonw
Xiaolong Wang
2 years
You cannot really trust a sim env until you know the bottom of it. Most seemingly fancy simulations will not actually transfer to the real world. Our tutorial tomorrow will unfold not just the beauty, but mostly the dirty parts of making a simulation for embodied AI.
@HaoSuLabUCSD
Hao Su Lab
2 years
Ever thought of applying your vision algorithm to embodied tasks but cannot find one? Why not make one yourself? Our #CVPR2022 tutorial is starting Monday at 13:00 CT! We'll show you, hands-on, how to build Embodied AI environments from scratch!
@xiaolonw
Xiaolong Wang
2 months
This project looks super fun! And it is also wonderful to see that TD-MPC2 works so well on this benchmark compared to PPO/SAC/DreamerV3!
@carlo_sferrazza
Carlo Sferrazza
2 months
Humanoids 🤖 will do anything humans can do. But are state-of-the-art algorithms up to the challenge? Introducing HumanoidBench, the first-of-its-kind simulated humanoid benchmark with 27 distinct whole-body tasks requiring intricate long-horizon planning and coordination. 🧵👇
@xiaolonw
Xiaolong Wang
4 years
We are presenting our ICML paper on "Test-Time Training with Self-Supervision for Generalization under Distribution Shifts" today at 10am and 11pm (PDT). The link for the virtual conference is: Project page:
@xiaolonw
Xiaolong Wang
8 months
Presenting FeatureNeRF #ICCV2023 tomorrow morning. Exploring representation learning with NeRF. I will be there as well. Come and say hi!
@xiaolonw
Xiaolong Wang
7 months
Going to CoRL, in red eye style, arriving at 5 am and presenting at 10am. Hope this can work.
@xiaolonw
Xiaolong Wang
3 years
After adding time in cycles, it is time to add dynamics in cycles (). We add a forward dynamics model in CycleGAN to learn correspondence and align dynamic robot behavior across two domains differing in observed representation, physics, and morphology.
@xiaolonw
Xiaolong Wang
3 years
New work on human grasp generation given a single 3D object as input. We reason about the consistency between the prior hand contact points and the object's common contact regions for better and more diverse grasp generation. @hanwenjiang1 @stevenpg8 @JiashunWang (1/2)