@GeYan_21
Ge Yan
3 months
Introducing DNAct: Diffusion Guided Multi-Task 3D Policy Learning. We combine neural rendering pre-training and diffusion models to learn a generalizable policy with strong 3D semantic scene understanding.
2
20
89

Replies

@GeYan_21
Ge Yan
3 months
DNAct leverages NeRF as a 3D pre-training approach. By distilling 2D features from foundation models into a 3D space, we pre-train a 3D encoder to learn a unified representation of semantics and geometry via volumetric rendering.
1
0
1
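A rough sketch of the distillation idea in the post above, not DNAct's actual code: per-sample features along a camera ray are composited with standard NeRF-style volumetric rendering weights, and the rendered feature is compared against the 2D foundation-model feature at that pixel. All function names here are hypothetical, and the mean squared error loss is an assumption; the thread does not state which loss is used.

```python
import math

def render_weights(sigmas, deltas):
    """NeRF-style compositing weights for the samples along one ray."""
    weights = []
    transmittance = 1.0  # fraction of light that has survived so far
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def render_feature(sigmas, deltas, features):
    """Composite per-sample feature vectors into one feature for the ray's pixel."""
    weights = render_weights(sigmas, deltas)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def distillation_loss(rendered, target):
    """Assumed loss: mean squared error against the 2D feature at this pixel."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)
```

In a real pipeline the densities and features would come from the 3D encoder, and the loss would be averaged over many rays and backpropagated into it.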
@GeYan_21
Ge Yan
3 months
Our 3D pre-training approach enables out-of-domain generalization! We show this by pre-training on out-of-distribution data from five unseen tasks, denoted as DNAct*. DNAct* outperforms baselines that use in-domain data, with over 20% improvement.
1
0
1
@GeYan_21
Ge Yan
3 months
Another insight is formulating representation learning as an action reconstruction problem with a diffusion model. We optimize the learned representation by adding a diffusion objective. This helps distinguish different modes in the demonstration data and improves robustness.
1
0
1
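To make the diffusion objective in the post above concrete, here is a minimal sketch of a generic DDPM-style noise-prediction loss on an action vector, conditioned on learned features. The thread does not spell out DNAct's exact formulation, so the linear beta schedule, the `denoiser` callable, and every name below are assumptions for illustration only.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) over the first t steps of a linear schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / max(num_steps - 1, 1)
        prod *= 1.0 - beta
    return prod

def add_noise(action, noise, a_bar):
    """Forward process: blend the clean action with Gaussian noise."""
    return [math.sqrt(a_bar) * a + math.sqrt(1.0 - a_bar) * n
            for a, n in zip(action, noise)]

def diffusion_loss(denoiser, action, features, num_steps=100, rng=random):
    """Noise-prediction loss for one demonstration action.

    `denoiser(noisy_action, t, features)` is a hypothetical model that
    returns its estimate of the noise that was added.
    """
    t = rng.randint(1, num_steps)  # random diffusion step
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    noisy = add_noise(action, noise, alpha_bar(t, num_steps))
    predicted = denoiser(noisy, t, features)
    return sum((p - n) ** 2 for p, n in zip(predicted, noise)) / len(noise)
```

Because the denoiser is conditioned on the features, gradients from this loss also flow into the representation, which is one plausible reading of "optimizing the learned representation by adding a diffusion objective".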
@GeYan_21
Ge Yan
3 months
Visualization of our learned 3D features in real-world tasks
1
0
1
@GeYan_21
Ge Yan
3 months
Visit our website for more details: arXiv: Work done by me and @yh_kris, advised by @xiaolonw
0
0
3
@DJiafei
Jiafei Duan@CVPR2024
3 months
@GeYan_21 Good work! Looking forward to seeing how DNAct does on our Colosseum benchmark
0
0
2