Introducing DNAct: Diffusion Guided Multi-Task 3D Policy Learning.
We combine neural rendering pre-training and diffusion models to learn a generalizable policy with strong 3D semantic scene understanding.
DNAct leverages NeRF as a 3D pre-training approach. By distilling 2D features from foundation models into a 3D space, we pre-train a 3D encoder to learn a unified representation of semantics and geometry via volumetric rendering.
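The feature distillation step can be illustrated with a minimal NumPy sketch: per-sample densities and feature vectors along a camera ray are alpha-composited NeRF-style into a rendered 2D feature, which is then matched against a foundation-model feature at that pixel. Function names, shapes, and the MSE distillation loss here are illustrative assumptions, not DNAct's actual interface.

```python
import numpy as np

def render_features_along_ray(sigmas, feats, deltas):
    """Alpha-composite per-sample features along one ray (NeRF-style).

    sigmas: (S,) densities; feats: (S, D) per-sample feature vectors;
    deltas: (S,) spacing between samples. Illustrative shapes only.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                           # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))    # transmittance
    weights = alphas * trans                                          # compositing weights
    return weights @ feats                                            # (D,) rendered feature

S, D = 64, 8
rng = np.random.default_rng(0)
sigmas = rng.uniform(0.0, 2.0, S)
feats = rng.normal(size=(S, D))
deltas = np.full(S, 0.05)

rendered = render_features_along_ray(sigmas, feats, deltas)
target = rng.normal(size=D)              # stand-in for a 2D foundation-model feature
distill_loss = np.mean((rendered - target) ** 2)
```

Supervising the rendered feature with this loss (alongside standard RGB rendering losses) is what pushes the 3D encoder to carry both semantics and geometry.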
Our 3D pre-training approach brings out-of-domain generalization ability! We show this by using only out-of-distribution data from five unseen tasks in the pre-training stage, denoted as DNAct*. It outperforms baselines that rely on in-domain data by over 20%.
Another key insight is formulating representation learning as an action reconstruction problem with a diffusion model. By adding a diffusion objective, we optimize the learned representation to distinguish different modes in the demonstration data, which improves robustness.