The team primarily works on multimodal (image, text, video, and audio) representation learning, zero-shot and few-shot learning, and multimodal generative modeling. We are a highly interdisciplinary team focused on developing cutting-edge methods for real-world problems at large scale.