Xiaolong Wang Profile
Xiaolong Wang

@xiaolonw

11,244 Followers · 986 Following · 287 Media · 953 Statuses

Assistant Professor @UCSDJacobs Postdoc @berkeley_ai PhD @CMU_Robotics

San Diego, CA
Joined March 2016
Pinned Tweet
@xiaolonw
Xiaolong Wang
12 days
This is accepted to RSS! Congratulations to @xuxin_cheng and @JiYandong on the first work of their PhDs.
@xiaolonw
Xiaolong Wang
3 months
Let’s think about humanoid robots beyond carrying boxes. How about having the humanoid come out the door, interact with humans, and even dance? Introducing Expressive Whole-Body Control for Humanoid Robots: see how our robot performs rich, diverse,
@xiaolonw
Xiaolong Wang
1 year
Stable Diffusion generates beautiful images, but can it be used for open-world recognition? Try the demo! Our #CVPR2023 paper shows that the pre-trained diffusion model is indeed a good image parser, allowing for open-vocabulary segmentation and detection.
@xiaolonw
Xiaolong Wang
3 months
Is 3D scene generation much closer to being solved all of a sudden? It has been a few days since the release of @OpenAI Sora. We ran our COLMAP-Free 3D Gaussian Splatting on the released videos. Our method does not need to pre-process cameras, and it seems we can directly just get
@xiaolonw
Xiaolong Wang
2 months
I have been cleaning up my daughter's mess for more than two years now. Last weekend our robot came home to do the job for me. 🤖 Our new work on visual whole-body control learns a policy to coordinate the robot's legs and arms for mobile manipulation. See
@xiaolonw
Xiaolong Wang
2 years
Introducing #CVPR2022 GroupViT: Semantic Segmentation Emerges from Text Supervision 👨‍👩‍👧 Without any pixel labels ever, our Grouping ViT can group pixels bottom-up into open-vocabulary semantic segments. The only training data is 30M noisy image-text pairs.
@xiaolonw
Xiaolong Wang
2 months
Our humanoid dancing outside #GTC2024
@xiaolonw
Xiaolong Wang
3 years
New work with Yinbo Chen, one of my first PhD students: Learning Continuous Image Representation with Local Implicit Image Function. Check out our video showing images at arbitrary resolutions. proj: code: @YinboChen @SifeiL (1/n)
@xiaolonw
Xiaolong Wang
2 years
Introducing Category-Level 6D Object Pose Estimation in the Wild.🏞️ We release Wild6D: an in-the-wild object-centric RGBD dataset with 5000+ videos of 1700+ objects. We perform semi-supervised 6D object pose estimation on it without manual annotations.
@xiaolonw
Xiaolong Wang
19 days
Tesla Optimus can arrange batteries in their factories, ours can do skincare (on @QinYuzhe )! We open-source Bunny-VisionPro, a teleoperation system for bimanual hand manipulation. Users can control the robot hands in real time using VisionPro, flexible like a bunny. 🐇
@xiaolonw
Xiaolong Wang
1 year
The robot climbs stairs🏯, steps over stones 🪨, and runs in the wild🏞️, all in one policy, without any remote control! Our #CVPR2023 Highlight paper achieves this by using RL + a 3D Neural Volumetric Memory (NVM) trained with view synthesis!
@xiaolonw
Xiaolong Wang
2 months
Our humanoid robot attending #GTC2024 today with @xuxin_cheng
@xiaolonw
Xiaolong Wang
5 months
3D Gaussian Splatting is great, but can it work without the pre-computed camera poses? Introducing: COLMAP-Free 3D Gaussian Splatting Our recent work shows that not only can it, but 3D Gaussians also make camera pose estimation easy (compared to NeRF) along with reconstruction. 👇🧵
@xiaolonw
Xiaolong Wang
8 months
Introducing our CoRL work on Dynamic Handover. We humans often pass objects along, a baseball, a bottle of water, by throwing and catching. 🫴⚾️ We now enable robot hands to throw and catch different unseen objects using RL and Sim2Real transfer!
@xiaolonw
Xiaolong Wang
2 years
Object 6D pose estimation in the wild can now be achieved by only self-supervision! 🏞️ No 3D GTs are needed in sim or real (camera, pose, shape). Multiple cycles across 2D-3D, instance, and time are utilized. Project page: Led by @kaiwynd (1/n
@_akhaliq
AK
2 years
Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild abs: project page:
@xiaolonw
Xiaolong Wang
5 years
We released our CVPR'19 paper on Learning Correspondence from the Cycle-Consistency of Time () with code (). We hope this opens up new opportunities for unsupervised learning with videos. Check out our result for long-range tracking!
@xiaolonw
Xiaolong Wang
2 months
We have seen a lot of legged robots doing navigation in the wild. But how about mobile manipulation in the wild? I have been pushing the direction of learning a unified, efficient, and dynamic 3D representation of scenes (for navigation) and objects (for manipulation) for the
@xiaolonw
Xiaolong Wang
4 months
Tired of seeing synthetic objects go round and round, round and round, round and round? Introducing the WildRGB-D dataset! We collect a dataset of real-world RGB-D objects captured in 360° in cluttered scenes. To download: Website: ✅
@xiaolonw
Xiaolong Wang
1 year
Imagine you have an object in hand: you can rotate it by feel without even looking. This is what we enable the robot to do now: Rotating without Seeing. Our multi-finger robot hand learns to rotate diverse objects using only touch sensing.
@xiaolonw
Xiaolong Wang
2 years
Dense Correspondences found in StyleGAN! #CVPR2022 CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs. We learn a disentanglement of category-level correspondence map and style with StyleGAN, and show various applications. 🧵👇 (1/n)
@xiaolonw
Xiaolong Wang
3 months
Walking in the morning @UCSanDiego Operators: @xuxin_cheng @JiYandong
@xiaolonw
Xiaolong Wang
6 months
Can a machine solve diverse computer vision tasks, even ones it is not trained on? Introducing IMProv: It performs multimodal in-context learning for solving generic computer vision tasks. It formulates all tasks as an image inpainting problem.
@xiaolonw
Xiaolong Wang
5 years
I am happy to announce that I will be joining UC San Diego as an assistant professor in the ECE department in fall 2020. I am looking forward to working with great colleagues and students there! I am very grateful for all the support from my advisors, friends, and family.
@xiaolonw
Xiaolong Wang
3 months
Since Sora is out, I have been thinking about our role in academia. One thing we can do at school is fast prototyping with very talented students, showing the potential, the possibility. Of course, the future will always be scaling up.
@EpisodeYang
Ge Yang
3 months
That 675M is 24% of the company -- 675M round at 2.8B valuation. In the meantime, enjoy what we achieved with a nimble team of just six people :) paper link: w/ @xiaolonw @xuxin_cheng @JiYandong @RchalYang
@xiaolonw
Xiaolong Wang
4 years
How to train very deep ConvNets without residual blocks? Our ICML paper on Deep Isometric Learning successfully trains 100-layer ConvNets without any shortcut connections or normalization layers (BN/GN) on ImageNet. Paper: Code:
@xiaolonw
Xiaolong Wang
3 years
Is the tracking task necessary for learning to track? Our new work on self-supervised correspondence learning shows it is not! By simply comparing two video frames, without negative pairs, correspondence can emerge for tracking objects and pixels. (1/n)
@xiaolonw
Xiaolong Wang
2 years
The code and model for GroupViT are released. We also host a demo on @huggingface Spaces for open-category segmentation. Try them out!
@xiaolonw
Xiaolong Wang
2 years
Introducing #CVPR2022 GroupViT: Semantic Segmentation Emerges from Text Supervision 👨‍👩‍👧 Without any pixel labels ever, our Grouping ViT can group pixels bottom-up into open-vocabulary semantic segments. The only training data is 30M noisy image-text pairs.
@xiaolonw
Xiaolong Wang
4 years
Self-Supervised Policy Adaptation during Deployment (). We propose to adapt an RL policy to new environments without any reward, by using self-supervision during deployment. Joint work with @ncklashansen , Yu Sun, @pabbeel , Alyosha Efros, @LerrelPinto (1/3)
@xiaolonw
Xiaolong Wang
3 years
The website of Video Autoencoder is online: We have also released our code here:
@_akhaliq
AK
3 years
Video Autoencoder: self-supervised disentanglement of static 3D structure and motion pdf: abs:
@xiaolonw
Xiaolong Wang
10 months
Presenting MonoNeRF at #ICML2023 We train a generalizable NeRF from: ✅Large-scale monocular videos instead of one scene ✅No GT camera poses.📷🚫 Without per-scene optimization, the model can do view synthesis, depth estimation, camera pose estimation.
@xiaolonw
Xiaolong Wang
9 months
Vision-language foundation models should go to 3D for robotics!🤖 CoRL23 Oral: GNFactor learns Generalizable Neural Feature Fields for language-conditioned manipulation on diverse scenes. It unifies 3D➕Stable Diffusion features using generalizable NeRFs.
@_akhaliq
AK
9 months
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields paper page: It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world
@xiaolonw
Xiaolong Wang
3 years
We propose A-SDF, a manipulable and differentiable implicit representation for articulated objects. Given an input depth image or point cloud of an object, we can infer its shape and articulation angles, and manipulate the object parts to any angle. (1/3)
@xiaolonw
Xiaolong Wang
2 years
We propose to train a generalizable NeRF on large-scale videos without camera ground truth! 📽️❌📷 This leads to multiple applications in monocular depth estimation, camera pose estimation, and view synthesis. Project page: w @yangfu0817 @imisra_
@_akhaliq
AK
2 years
Multiplane NeRF-Supervised Disentanglement of Depth and Camera Pose from Videos abs: project page:
@xiaolonw
Xiaolong Wang
1 month
Another bird video. We can reconstruct the details of the wings and obtain correspondences at the same time. 🕊️
@Isabella__Liu
Isabella Liu
1 month
Want to obtain time-consistent dynamic mesh from monocular videos? Introducing: Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos We reconstruct meshes with flexible topology change and build the corresp. across meshes. 🧵(1/n)
@xiaolonw
Xiaolong Wang
2 years
What is out there? Let’s look outside the room! #CVPR2022 We synthesize a consistent long-term 3D scene video to navigate out of a room, given a single image as input: This is highly inspired by "Infinite Images [2010]" and "Infinite Nature [2021]". (1/n
@xiaolonw
Xiaolong Wang
4 years
We are hosting a CVPR tutorial on Learning Representations via Graph-structured Networks. We are having live presentations. Website: Time: Sunday 1:00pm-4:30pm Speakers: Chen Sun, Han Hu, Shubham Tulsiani, Saining Xie, Sifei Liu, and me.
@xiaolonw
Xiaolong Wang
2 years
It got into ECCV! It is not easy to convince vision people... DexMV: Imitation Learning for Dexterous Manipulation from Human Videos A platform to collect and convert real-world human videos into 3D demos for imitation👩🤖 Code and data:
@xiaolonw
Xiaolong Wang
3 years
Introducing our ICCV'21 Oral: Video Autoencoder. We disentangle the 3D structure and camera pose from a video of a static scene, in a self-supervised manner. The training objective is solely reconstructing the input video frames. page with code: (1/n
@xiaolonw
Xiaolong Wang
3 years
We train the network with a self-supervised super-resolution task. As the image representation is continuous, we can visualize and zoom into the image at an arbitrary resolution. arxiv: (3/n)
@xiaolonw
Xiaolong Wang
4 months
Introducing Texture UV Radiance Fields (TUVF) #ICLR2024 , a new generalizable NeRF that generates textures given a 3D shape input. The generation happens in a learnable UV sphere space. That is, it learns the correspondence across shapes! This allows the texture to be
@xiaolonw
Xiaolong Wang
2 years
New work on hands: DexPoint #CoRL2022 ! It is all about geometry and contact👆🤖👇 We perform RL with raw point cloud inputs for manipulation: door opening and grasping. This brings generalization across diverse objects and much easier sim2real transfer. 1/n
@xiaolonw
Xiaolong Wang
2 years
We are presenting VideoINR in #CVPR2022 , a new continuous Video Implicit Neural Representation that allows any-scale super-resolution / interpolation in both space and time. Paper: Project page: 1/n)
@xiaolonw
Xiaolong Wang
3 years
Our quadrupedal robot is finally running in the wild, on grass, in the forest, around campus, day and night, avoiding boxes, trees, and humans, using end-to-end visual RL! Work done by Chieko Sarah Imai, @Mehooz2 , Yuchen Zhang, Marcin Kierebinski, @RchalYang , @QinYuzhe
@_akhaliq
AK
3 years
Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization pdf: abs: project page:
@xiaolonw
Xiaolong Wang
2 months
If you look at the best image/video diffusion models, they are still not able to get the hands quite right, especially when interacting with objects ✍️ We present HOIDiffusion #CVPR2024 , a way to generate accurate and realistic hand-object interaction images, in diverse poses,
@xiaolonw
Xiaolong Wang
2 years
Excited to share our imitation learning work for dexterous manipulation, using human demos collected with a single iPad camera for teleoperation. We provide a new system using a customized hand in sim, and transfer to multiple hands and a real Allegro hand. 1/n
@xiaolonw
Xiaolong Wang
1 month
Great work! But a reminder that our work released last year "COLMAP-Free 3D Gaussian Splatting" tackles this exact problem, and there are many others as well. In the FlowMap pdf it is mentioned "Concurrent work [22] somewhat accelerates optimization compared to earlier
@vincesitzmann
Vincent Sitzmann
1 month
Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM like Colmap! IMO this solves a major missing piece for internet-scale training of 3D Deep Learning methods. 1/n
@xiaolonw
Xiaolong Wang
2 years
Implicit function for dexterous hand grasping, learned from 3D human demonstrations! 🤟🤖✌️ We introduce the Continuous Grasping Function (CGF), which takes continuous time as input and generates a smooth grasping plan for the real Allegro Hand. 1/n)
@xiaolonw
Xiaolong Wang
1 year
Spring Group Retreat
@xiaolonw
Xiaolong Wang
2 years
Our work on combining vision and proprioceptive state for locomotion control is accepted to ICLR 2022! We use a transformer to incorporate the two modalities, and train the policy end2end with RL, then directly deploy it to the real robot. Project page:
@xiaolonw
Xiaolong Wang
1 year
🏗️ Policy Adaptation from Foundation Model Feedback #CVPR2023 Instead of using a foundation model as a pre-trained encoder (generator), we use it as a Teacher (discriminator) to tell us where our policy went wrong and help it adapt to new envs and tasks.
@xiaolonw
Xiaolong Wang
11 months
Elastic Decision Transformer (EDT): It is not always optimal to use all history states as inputs for a decision transformer. A shorter history allows transitions to diverse and better future states, implicitly stitching training trajectories into a better one.
@_akhaliq
AK
11 months
Elastic Decision Transformer paper page: paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical
@xiaolonw
Xiaolong Wang
1 year
I am hiring a postdoc at UCSD as well. Please let me know if you are interested in working on vision and robotics. Our building for collaborative robotics labs 👇
@xiaolonw
Xiaolong Wang
3 months
The plan was to collect some demos among people in the university center; it led to some unexpected donations from the crowd 💵
@xiaolonw
Xiaolong Wang
11 months
Test-Time Training on Video Streams: Every frame in a test video can be used for training, even without a ground truth label. When deploying your model in a video stream, train it online with self-supervised learning (e.g. MAE).
@xiaolonw
Xiaolong Wang
10 months
ICML was nice. See you my friends.
@xiaolonw
Xiaolong Wang
1 year
Another robotics paper to present at #CVPR2023 ! DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects We continue pushing dexterous manipulation using point cloud RL, studying 3D pre-training, and generalizing to unseen objects.
@xiaolonw
Xiaolong Wang
3 years
We are hosting a CVPR tutorial on Graph-structured Networks tomorrow. We will cover topics over Transformers, graph networks, and applications on 3D scene understanding, physical interaction prediction, RL and control. Sunday, 9:00am-12:30 pm PDT
@xiaolonw
Xiaolong Wang
1 year
Wait, foundation models are actually not helping that much for visuo-motor control? Who would have thought that? We conduct a detailed study on a learning-from-scratch baseline and find it performs similarly to recent methods using large pre-trained models:
@xiaolonw
Xiaolong Wang
2 years
New #CVPR2022 work GIFS: We reconstruct non-watertight and multi-layer shapes using an implicit function. It is simple: instead of classifying a point as inside/outside the surface, we classify whether there is a surface between any two points in 3D space.
@xiaolonw
Xiaolong Wang
2 months
Spring break group retreat again.
@xiaolonw
Xiaolong Wang
2 years
What is the right thing to predict in egocentric videos that will be useful for robotics? In #CVPR2022 , we propose that one possibility is to predict Joint Hand Motion and Interaction Hotspots. We provide an automatic way to generate GT and a new model.
@xiaolonw
Xiaolong Wang
2 years
We open-source the visual locomotion code! If you are interested in RL for locomotion with a REAL robot, we provide a full package: * Diverse environments in PyBullet; * End2end visual-RL training code; * Sim2Real @UnitreeRobotics A1 robot deployment code.
@xiaolonw
Xiaolong Wang
2 years
Our work on combining vision and proprioceptive state for locomotion control is accepted to ICLR 2022! We use a transformer to incorporate the two modalities, and train the policy end2end with RL, then directly deploy it to the real robot. Project page:
@xiaolonw
Xiaolong Wang
1 year
Self-supervised 6D pose estimation is accepted at #ICLR2023. It trains directly on real data; no simulated data is required: We also release the code:
@xiaolonw
Xiaolong Wang
2 years
Object 6D pose estimation in the wild can now be achieved by only self-supervision! 🏞️ No 3D GTs are needed in sim or real (camera, pose, shape). Multiple cycles across 2D-3D, instance, and time are utilized. Project page: Led by @kaiwynd (1/n
@xiaolonw
Xiaolong Wang
11 months
Our RSS work is covered in Scientific American!
@punarpuli
Ananya (ಅನನ್ಯ)
11 months
Manipulating objects in hand is a tough task for robots, what with having to coordinate their fingers and make sure the object doesn't simply slide off the palm. My latest story for @sciam on what you can do with simple sensors and no vision data.
@xiaolonw
Xiaolong Wang
1 year
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation When ViT takes high-resolution inputs, the memory explodes easily. We develop a grouping mechanism inside ViT that efficiently propagates information and reduces memory.
@xiaolonw
Xiaolong Wang
1 year
Our lab at UCSD is hiring PhD students for next fall. Applications to ECE are due today, and applications to CSE are due on the 21st. Check here for more information:
@xiaolonw
Xiaolong Wang
10 months
Very proud of this engineering work from Yuzhe. Robotics students typically require a desktop of their own to run and visualize their env at the same time. With this tool, you can now run your code on a remote server, and visualize the simulation in real-time on your laptop.
@QinYuzhe
Yuzhe Qin
10 months
Just dropped sim_web_visualizer! 🚀 Transform the way you view simulation environments right from your web browser like Chrome. Dive into more examples on our Github:
@xiaolonw
Xiaolong Wang
10 months
“Correspondence, correspondence, and correspondence!” -- Takeo I still remember quoting Takeo in my PhD defense. Now we have correspondence between image and sketch. A great collaboration with the psychology side @judyefan @XuanchenLu I will be presenting this today #ICML2023
@XuanchenLu
Xuanchen Lu @ NeurIPS 2023
10 months
Humans effortlessly grasp the fine-grained correspondences between sketches and real-world objects. How well can current vision algorithms do the same? To find out, check out our #ICML2023 paper w/ @xiaolonw @judyefan !
@xiaolonw
Xiaolong Wang
3 years
Our recent project :)
@xiaolonw
Xiaolong Wang
29 days
A peek at what we are working on lately for our humanoid robot system. ☑️ Control bimanual hands from VisionPro in real time. ☑️ An active camera is installed on the robot head, following human head motion and streaming egocentric view observations in real time to VisionPro. Interfacing
@xuxin_cheng
Xuxin Cheng
30 days
 🤖Introducing 📺𝗢𝗽𝗲𝗻-𝗧𝗲𝗹𝗲𝗩𝗶𝘀𝗶𝗼𝗻: a web-based teleoperation software!  🌐Open source, cross-platform (VisionPro & Quest) with real-time stereo vision feedback.  🕹️Easy-to-use hand, wrist, head pose streaming. Code:
@xiaolonw
Xiaolong Wang
3 years
Introducing DexMV: Imitation Learning for Dexterous Manipulation from Human Videos. A new platform for recording human hand demonstration videos, and a new imitation learning pipeline to leverage the videos for 5-finger robot hand manipulation. (1/n)
@xiaolonw
Xiaolong Wang
7 months
The robot can now backflip, jump, and run in different ways, all in one policy. A generalized animal imitator is now here enabling new agility for legged robots:
@RchalYang
Ruihan Yang
8 months
🏅In the future, there will be Olympics for legged robots, showcasing unmatched agility. We bring diverse agile locomotion skills to legged robots with a single controller! Introducing VIM - a single policy unlocking diverse agile locomotion skills.🐾
@xiaolonw
Xiaolong Wang
2 years
In the coming NeurIPS week, we will introduce our work on predicting multi-person motion and interaction using Transformers. Check out our demo on scaling up the interaction predictions among 10-15 people! With: @JiashunWang @medhini_n @HarryXu12 (1/2)
@xiaolonw
Xiaolong Wang
3 years
Let's add vision to legged robots! New work on: Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers. By using depth image inputs, our legged robot can navigate through moving obstacles, and random chairs/desks. (1/n)
@xiaolonw
Xiaolong Wang
1 year
The way you name your jobs before the deadline.
@xiaolonw
Xiaolong Wang
2 years
6 months!
@xiaolonw
Xiaolong Wang
3 years
Our group is presenting 6 papers at this ICCV, with 3 Orals. We have released the code for: Video Autoencoder: Video Frame Similarity Learning: Grasp Generation: Articulated SDF:
@xiaolonw
Xiaolong Wang
7 months
CoRL group photo, with current, new and past students. Have to run but please come to our oral talk on Thursday on generalizable feature fields.
@xiaolonw
Xiaolong Wang
10 days
Very excited to have Xueyan join our group. My first postdoc! Let’s push VLMs for robotics together!
@yong_jae_lee
Yong Jae Lee
10 days
@xueyanzou1 had a number of impactful works in visual recognition, starting w/ a BMVC best paper award on Anti-Aliasing in CNNs to her recent Segment Everything Everywhere All At Once work . She's off to UCSD to do a postdoc in CV+robotics w/ @xiaolonw ! 4/
@xiaolonw
Xiaolong Wang
8 months
When working on Sim2Real for robotics, designing the simulation environment always requires a big effort. We release GenSim, which uses LLMs to automatically help us generate code for simulation tasks. We show a large-scale task study and Sim2Real transfer at:
@LiruiWang1
Lirui Wang (Leroy)
8 months
🚀How to scale up robotic simulation tasks? We announce GenSim: Generating Robotic Simulation Tasks via Large Language Models. Website (), arxiv (), and Gradio demo () are all out! Try out the Gradio demo🤗!
@xiaolonw
Xiaolong Wang
3 months
New work on combining Generalizable NeRF and diffusion loss for manipulation. The cool thing is that we are actually NOT training a diffusion policy, but using diffusion loss to train the representation and another feed-forward network on top of the representation for action
@GeYan_21
Ge Yan
3 months
Introducing DNAct: Diffusion Guided Multi-Task 3D Policy Learning. We combine neural rendering pre-training and diffusion models to learn a generalizable policy with a strong 3D semantic scene understanding.
@xiaolonw
Xiaolong Wang
2 years
Our #ECCV2022 work trains Transformers as optimizers for MLPs of implicit neural representations. The Transformer jointly parameterizes a learnable initialization and update rules for the column vectors of weight matrices in MLPs. Paper link:
@_akhaliq
AK
2 years
Transformers as Meta-Learners for Implicit Neural Representations abs: project page: github:
@xiaolonw
Xiaolong Wang
3 years
My group at UCSD is presenting 3 papers at this CVPR, on Local Implicit Image Function: 3D Hand-Object Pose: 3D Human Motion Synthesis: We provide a short 90s video to introduce these 3 works.
@xiaolonw
Xiaolong Wang
1 year
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation, is presented as a spotlight at #ICLR2023
@xiaolonw
Xiaolong Wang
1 year
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation When ViT takes high-resolution inputs, the memory explodes easily. We develop a grouping mechanism inside ViT that efficiently propagates information and reduces memory.
@xiaolonw
Xiaolong Wang
14 days
Got my visa today, will be in ICRA from Tuesday. Happy to chat!
@xiaolonw
Xiaolong Wang
3 years
Check out our latest long-term prediction results on the ICLR'21 paper (). We can predict complex interactions on PHYRE given only the 𝗙𝗜𝗥𝗦𝗧 image as the input.
@HaozhiQ
Haozhi Qi
3 years
Our work on learning visual dynamics is accepted by #ICLR2021 . We obtained state-of-the-art results on multiple prediction tasks as well as the #PHYRE physical reasoning benchmark. Check our latest results at
@xiaolonw
Xiaolong Wang
4 years
Our recent work on video synthesis with graph networks. We construct an Action Graph which takes objects as nodes and actions as edges. Given an initial frame, we can generate future frames by following the action graph. Our method generalizes to unseen objects and new actions.
@_amirbar
Amir Bar
4 years
Our new work "Compositional Video Synthesis with Action Graphs" is out! We focus on goal-oriented #video #generation and introduce the new task of *Action Graph to Video*. Project page: Abstract: @BAIR @NVIDIAAI @TAU @Nvidia
@xiaolonw
Xiaolong Wang
2 years
One big challenge for transferring image-based RL to a real robot is that the pixels are too small to capture interaction details. We combine an active egocentric view on the robot arm with a 3rd-person view as inputs for the RL policy, and deploy it to a real robot. 1/n
@xiaolonw
Xiaolong Wang
1 year
We are organizing the RSS workshop on Learning Dexterous Manipulation. Please consider submitting your work on hands to our workshop! The deadline is June 2nd!
@xiaolonw
Xiaolong Wang
11 months
Our Learning Dexterous Manipulation workshop () at RSS was a hit! Thank you to the speakers @pulkitology @Vikashplus @t_hellebrekers @abhishekunique7 Carmelo, @haoshu_fang , and the organizers, especially the in-person ones @QinYuzhe @LerrelPinto @notmahi @ericyi0124
@xiaolonw
Xiaolong Wang
4 years
State-Only Imitation Learning for Dexterous Manipulation (). We propose a simple imitation learning method with state-only demonstrations, generalizing better in novel dynamics/environments for dexterous manipulation. @ir413 @LerrelPinto and Jitendra Malik.
@xiaolonw
Xiaolong Wang
3 years
Cycle comes again: we propose contrastive learning with cross-video cycle-consistency. Instead of learning by augmenting a single image, our method forms a cycle across different videos to provide positive training pairs from different instances. (1/n)
@xiaolonw
Xiaolong Wang
1 year
Accepted to #RSS2023 !
@xiaolonw
Xiaolong Wang
1 year
Imagine you have an object in hand: you can rotate it by feel without even looking. This is what we enable the robot to do now: Rotating without Seeing. Our multi-finger robot hand learns to rotate diverse objects using only touch sensing.
@xiaolonw
Xiaolong Wang
2 years
You cannot really trust a sim env until you know the bottom of it. Most seemingly fancy simulations will not actually transfer to the real world. Our tutorial tomorrow will unfold not just the beauty, but mostly the dirty parts of making a simulation for embodied AI.
@HaoSuLabUCSD
Hao Su Lab
2 years
Ever thought of applying your vision algorithm to embodied tasks but cannot find one? Why not make one yourself? Our #CVPR2022 tutorial is starting Monday at 13:00 CT! We'll show you, hands-on, how to build Embodied AI environments from scratch!
@xiaolonw
Xiaolong Wang
2 months
This project looks super fun! And it is also wonderful to see that TD-MPC2 works so well on this benchmark compared to PPO/SAC/DreamerV3!
@carlo_sferrazza
Carlo Sferrazza
2 months
Humanoids 🤖 will do anything humans can do. But are state-of-the-art algorithms up to the challenge? Introducing HumanoidBench, the first-of-its-kind simulated humanoid benchmark with 27 distinct whole-body tasks requiring intricate long-horizon planning and coordination. 🧵👇
@xiaolonw
Xiaolong Wang
4 years
We are presenting our ICML paper on "Test-Time Training with Self-Supervision for Generalization under Distribution Shifts" today at 10am and 11pm (PDT). The link for the virtual conference is: Project page:
@xiaolonw
Xiaolong Wang
8 months
Presenting FeatureNeRF #ICCV2023 tomorrow morning. Exploring representation learning with NeRF. I will be there as well. Come and say hi!
@xiaolonw
Xiaolong Wang
7 months
Going to CoRL, in red eye style, arriving at 5 am and presenting at 10am. Hope this can work.
@xiaolonw
Xiaolong Wang
3 years
After adding time in cycles, it is time to add dynamics in cycles (). We add a forward dynamics model in CycleGAN to learn correspondence and align dynamic robot behavior across two domains differing in observed representation, physics, and morphology.
@xiaolonw
Xiaolong Wang
3 years
New work on human grasp generation given a single 3D object as input. We reason about the consistency between the prior hand contact points and the object's common contact regions for better and more diverse grasp generation. @hanwenjiang1 @stevenpg8 @JiashunWang (1/2)