Lili Yu Profile
Lili Yu

@liliyu_lili

1,165
Followers
204
Following
1
Media
62
Statuses

AI Research Scientist @ Meta AI (FAIR)

Seattle, WA
Joined January 2015
@liliyu_lili
Lili Yu
1 year
When you train on interleaved text-image data via the CM3 objective, the model can handle any combination of text and image on both the input and output sides. Thanks to multitask finetuning, we are able to bake instructpix2pix, ControlNet, OpenFlamingo and more into a single model.
@ArmenAgha
Armen Aghajanyan
1 year
We apply large-scale multitask instruction tuning to CM3leon for both image and text generation, and show that it significantly improves performance on tasks such as image caption generation, visual question answering, text-based editing, and conditional image generation.
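For intuition, here is a minimal sketch of the CM3-style "causal masked" objective the tweet mentions: random spans of an interleaved text-image token sequence are replaced by sentinels and moved to the end, so a plain left-to-right model learns infilling as well as continuation. The function name, sentinel format, and toy sequence below are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of a CM3-style causal-masked transform over an
# interleaved token sequence: span contents move to the end so a
# left-to-right model can still infill them with next-token loss.
import random

def cm3_transform(tokens, mask_prob=0.15, max_spans=2, seed=None):
    """Replace a few random spans with <mask_i> sentinels and append
    the span contents at the end, each prefixed by its sentinel."""
    rng = random.Random(seed)
    n = len(tokens)
    num_spans = rng.randint(1, max_spans)
    span_len = max(1, int(n * mask_prob) // num_spans)

    starts = sorted(rng.sample(range(n - span_len), num_spans))
    body, tail, cursor = [], [], 0
    for i, s in enumerate(starts):
        s = max(s, cursor)                    # keep spans non-overlapping
        body.extend(tokens[cursor:s])
        body.append(f"<mask_{i}>")
        tail += [f"<mask_{i}>"] + tokens[s:s + span_len]
        cursor = s + span_len
    body.extend(tokens[cursor:])
    return body + ["<eos>"] + tail            # train with plain next-token loss

# Interleaved text-image sequence: image spans are discrete VQ codes.
seq = ["A", "photo", "of", "a", "dog", "<img>", "v12", "v87", "v3", "</img>"]
print(cm3_transform(seq, seed=0))
```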
@liliyu_lili
Lili Yu
1 month
🚀 Excited to share our latest work: Transfusion! A new multi-modal generative training recipe combining language modeling and image diffusion in a single transformer! Huge shout-out to @violet_zct @omerlevy_ @michiyasunaga @arunbabu1234 @kushal_tirumala and the other collaborators.
@violet_zct
Chunting Zhou
1 month
Introducing *Transfusion* - a unified approach for training models that can generate both text and images. Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
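A minimal sketch of the combined objective Transfusion describes: one forward pass over a mixed-modality sequence, next-token cross-entropy on text positions, and a diffusion-style noise-prediction loss on image latents. The `model` interface, the linear noise schedule, and the loss weight are placeholder assumptions, not the paper's exact setup.

```python
# Hedged sketch of a Transfusion-style combined loss: text positions get
# cross-entropy, image-latent positions get a noise-prediction MSE loss.
import torch
import torch.nn.functional as F

def transfusion_loss(model, text_ids, image_latents, lambda_img=5.0):
    # Noise the image latents at a random timestep (simple linear schedule,
    # for illustration only).
    t = torch.rand(image_latents.shape[0], 1, 1)          # uniform in [0, 1)
    noise = torch.randn_like(image_latents)
    noisy_latents = (1 - t) * image_latents + t * noise

    # One forward pass over the concatenated mixed-modality sequence;
    # `model` is a hypothetical module returning both output heads.
    text_logits, noise_pred = model(text_ids, noisy_latents, t)

    # Next-token prediction on text, noise prediction on image latents.
    lm_loss = F.cross_entropy(
        text_logits[:, :-1].flatten(0, 1), text_ids[:, 1:].flatten()
    )
    diff_loss = F.mse_loss(noise_pred, noise)
    return lm_loss + lambda_img * diff_loss               # lambda is a placeholder
```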
@liliyu_lili
Lili Yu
1 year
Our Megabyte paper has been accepted to #NeurIPS 2023. Looking forward to meeting old and new friends in New Orleans!
@_akhaliq
AK
1 year
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers abs: paper page:
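Roughly, MEGABYTE's multiscale decomposition works as follows: bytes are grouped into fixed-size patches, a large global transformer runs once per patch, and a small local transformer predicts the bytes inside each patch conditioned on the global output. The sketch below captures only that shape flow; module choices, dimensions, and the causal masking (omitted here) are simplifications.

```python
# Rough sketch of the MEGABYTE global/local decomposition. Module names
# and dims are placeholders; the encoder layers stand in for causal decoders.
import torch
import torch.nn as nn

class MegabyteSketch(nn.Module):
    def __init__(self, patch_size=8, d_global=512, d_local=128, vocab=256):
        super().__init__()
        self.P = patch_size
        self.byte_embed = nn.Embedding(vocab, d_local)
        self.global_in = nn.Linear(patch_size * d_local, d_global)
        self.global_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_global, 8, batch_first=True), 2)
        self.global_out = nn.Linear(d_global, patch_size * d_local)
        self.local_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_local, 4, batch_first=True), 2)
        self.head = nn.Linear(d_local, vocab)

    def forward(self, bytes_in):                     # [B, T] with T % P == 0
        B, T = bytes_in.shape
        x = self.byte_embed(bytes_in)                # [B, T, d_local]
        patches = x.view(B, T // self.P, -1)         # group bytes into patches
        g = self.global_model(self.global_in(patches))
        ctx = self.global_out(g).view(B, T, -1)      # per-byte global context
        h = self.local_model((x + ctx).view(B * T // self.P, self.P, -1))
        return self.head(h).view(B, T, -1)           # byte logits [B, T, 256]

logits = MegabyteSketch()(torch.randint(0, 256, (2, 32)))
print(logits.shape)  # torch.Size([2, 32, 256])
```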
@liliyu_lili
Lili Yu
4 months
🚀 Excited to introduce Chameleon, our latest breakthrough in mixed-modal early-fusion foundation models! 🦎✨ Capable of understanding and generating text and images in any sequence. Check out our paper to learn more about its SOTA performance and versatile capabilities! 🌟
@AIatMeta
AI at Meta
4 months
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️
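Early fusion here means images are quantized into discrete codes and spliced into the same token stream as text, so one autoregressive transformer covers both modalities. A minimal sketch, assuming a hypothetical image tokenizer and placeholder vocabulary sizes:

```python
# Illustrative early-fusion sequence builder: image VQ codes are offset
# past the text vocabulary so everything shares one id space. The
# tokenizers and special-token ids are placeholders.
TEXT_VOCAB = 65536            # text token ids occupy [0, TEXT_VOCAB)
IMG_CODES = 8192              # image codebook entries, offset past text ids
BOI, EOI = TEXT_VOCAB + IMG_CODES, TEXT_VOCAB + IMG_CODES + 1

def fuse(segments, text_tok, image_tok):
    """segments: list of ('text', str) or ('image', array) pairs.
    Returns one flat id sequence for autoregressive training."""
    ids = []
    for kind, payload in segments:
        if kind == "text":
            ids += text_tok(payload)                      # text ids as-is
        else:
            codes = image_tok(payload)                    # e.g. 1024 VQ codes
            ids += [BOI] + [TEXT_VOCAB + c for c in codes] + [EOI]
    return ids
```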
@liliyu_lili
Lili Yu
6 days
@karpathy Totally on board with viewing LLMs as versatile "Autoregressive Transformers"! At FAIR, our Chameleon model initially took this path, fusing tokens from images and text and demonstrating scalability, high-quality tuning, and few-shot learning. () But we
@liliyu_lili
Lili Yu
4 months
Interleaved text-image generation with consistency is a unique feature brought by our early-fusion, end-to-end trained model.
@sriniiyer88
Srini Iyer
4 months
Chameleon can produce full multi-modal documents with interleaved high quality images and text. It’s the one model to rule them all! More shout-outs: @arunbabu1234 @aramHmarkosyan @omerlevy_ (5/n)
@liliyu_lili
Lili Yu
5 months
Thrilled to be part of the #Megalodon team. Megalodon handles unlimited context lengths efficiently in LLMs, for both training and inference. The model was rigorously compared to LLaMA at the *7B* scale with *2T* tokens, derisking it for large-scale training. Congrats to the team!
@violet_zct
Chunting Zhou
5 months
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head
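The constant KV-cache comes from chunk-wise attention: each token attends only within a fixed-size chunk, so the cache never grows with context length (Megalodon carries longer-range information through its CEMA recurrence, which this sketch omits). Shapes and the chunk size are placeholders:

```python
# Hedged illustration of chunk-wise causal attention with a bounded cache.
import torch
import torch.nn.functional as F

def chunked_causal_attention(q, k, v, chunk=4096):
    B, T, D = q.shape
    out = torch.empty_like(q)
    for s in range(0, T, chunk):
        e = min(s + chunk, T)
        causal = torch.ones(e - s, e - s, dtype=torch.bool).tril()
        out[:, s:e] = F.scaled_dot_product_attention(
            q[:, s:e], k[:, s:e], v[:, s:e],
            attn_mask=causal,          # causal within the chunk only
        )
    return out  # at decode time, only the current chunk's K/V are cached
```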
@liliyu_lili
Lili Yu
1 year
CM3leon is the first multimodal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multitask supervised fine-tuning (SFT) stage.
@ArmenAgha
Armen Aghajanyan
1 year
I’m excited to release our most recent work setting a new SOTA FID of 4.88 on text-to-image generation we call CM3Leon (pronounced chameleon)!
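A loose sketch of that two-stage recipe: stage 1 prepends retrieved multimodal documents to each pretraining document; stage 2 is multitask SFT in the same token space. The `retriever` and `encode` helpers and the separator tokens are hypothetical stand-ins.

```python
# Loose sketch of the two-stage CM3leon recipe the tweet describes.
def build_pretrain_sequence(doc, retriever, encode, k=2):
    neighbors = retriever(doc, k=k)          # k related text+image documents
    ids = []
    for nb in neighbors:
        ids += encode(nb) + ["<doc_sep>"]    # retrieved context first
    return ids + encode(doc)                 # then the target document

def build_sft_sequence(task_prompt, target, encode):
    # Stage 2 reuses the same token space; only the data format changes.
    return encode(task_prompt) + ["<sep>"] + encode(target)
```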
@liliyu_lili
Lili Yu
4 months
The team is working very hard to make this happen. @ArmenAgha @sriniiyer88 @gargighosh @LukeZettlemoyer
@yoavartzi
Yoav Artzi (PC-ing COLM)
4 months
@liliyu_lili
Lili Yu
4 months
Such a fun coincidence picking the same name. Before scaling up, we called it CM3leon (pronounced "chameleon", as a twist on the older CM3 paper) last year, in the paper "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning" ().
@lupantech
Pan Lu
4 months
🤔Naming things is hard!! 🦎 #Meta 's new work shares the same name as our NeurIPS 2023 paper from one year ago: Chameleon: Compositional Reasoning with LLMs. Coincidence or great minds thinking alike? 😈 Dive into our work here:
@liliyu_lili
Lili Yu
1 year
Come check out our new work, where we deeply fuse a pretrained text LLM and a pretrained mixed-modal LLM, seamlessly integrating text & image generation. The beauty is that the model can condition on arbitrarily interleaved image and text inputs to continue generating interleaved output.
@_akhaliq
AK
1 year
Jointly Training Large Autoregressive Multimodal Models paper page: In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities
@liliyu_lili
Lili Yu
3 months
Super excited to open-source Chameleon 7B and 34B model weights today. Finally 🎇
@AIatMeta
AI at Meta
3 months
Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and
@liliyu_lili
Lili Yu
12 days
Excited to see what's coming next!! You are an amazing researcher and an amazing team leader.
@alex_conneau
Alexis Conneau
12 days
Career update: After an amazing journey at @OpenAI building #Her , I’ve decided to start a new company.
@liliyu_lili
Lili Yu
1 year
Everything generated is causally related, since it comes from an autoregressive model. Look how good the hands are. :p
@ArmenAgha
Armen Aghajanyan
1 year
We all know how tough hands are :)
@liliyu_lili
Lili Yu
4 months
We are better than all other multimodal baselines on interleaved generation.
@sriniiyer88
Srini Iyer
4 months
We asked humans which response they prefer from Chameleon, GPT-4V w/ Image-gen (GPT-4V+), Gemini-Pro w/ Image-gen (Gemini+). Humans prefer Chameleon outputs to the rest! (7/n)
@liliyu_lili
Lili Yu
5 months
The long-awaited LLAMA3!!
@ml_perception
Mike Lewis
5 months
Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come...
@liliyu_lili
Lili Yu
1 month
@yoavartzi Bidirectional attention is the default in encoder or encoder-decoder models. When the field switched to decoder-dominant models, it took some time for the community to rediscover its importance. Besides us, two other recent works also found it important. (paligemma,
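Concretely, the attention pattern in question is causal over the full sequence but fully bidirectional within each image span. A small sketch with illustrative span boundaries:

```python
# Build a mask that is causal overall but bidirectional inside image spans.
import torch

def mixed_attention_mask(T, image_spans):
    """True = may attend. image_spans: list of (start, end) index pairs."""
    mask = torch.ones(T, T, dtype=torch.bool).tril()   # causal baseline
    for s, e in image_spans:
        mask[s:e, s:e] = True   # image patches see each other both ways
    return mask

print(mixed_attention_mask(6, [(2, 5)]).int())
```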
@liliyu_lili
Lili Yu
1 month
@cccntu @iScienceLuvr We found that when we limit the noise level in the input images, image understanding is as good as with clean images. So there is no need to "fix it by adding un-noised image & masking", which would be more compute-expensive.
@liliyu_lili
Lili Yu
1 year
@ArmenAgha Thanks!! And I am grateful to have you as a collaborator.
@liliyu_lili
Lili Yu
1 month
@cccntu @violet_zct @iScienceLuvr Yes, when you have tons of noise, it is bad. We also experimented with fully replacing the noised images with clean images; in fact, that is slightly worse than the limited-noise version. So limited noise actually helps the model learn robust image understanding.
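One way to read "limiting the noise level": when an image acts as conditioning input rather than a generation target, the diffusion timestep is sampled only from a low range so the image stays mostly clean. The cap value below is an illustrative assumption, not the paper's number.

```python
# Hedged sketch: cap the sampled diffusion timestep for conditioning images.
import torch

def sample_timestep(batch, is_condition, t_max=0.2):
    t = torch.rand(batch)                 # full range for generation targets
    return torch.where(is_condition, t * t_max, t)

t = sample_timestep(4, torch.tensor([True, True, False, False]))
```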
@liliyu_lili
Lili Yu
1 year
Theeee recipe to get a strongly aligned chatbot model!
@violet_zct
Chunting Zhou
1 year
How do you turn a language model into a chatbot without any user interactions? We introduce LIMA: a LLaMa-based model fine-tuned on only 1,000 curated prompts and responses, which produces shockingly good responses. * No user data * No model distillation * No RLHF
@liliyu_lili
Lili Yu
1 month
@Grad62304977 @violet_zct @omerlevy_ @michiyasunaga @arunbabu1234 @kushal_tirumala Chameleon added the following stability techniques: QK-norm, a lower learning rate, Swin-norm, and z-loss. We found QK-norm is harmless, a lower learning rate hurts text performance at the model size we tested (760M), and z-loss + Swin-norm hurts performance. Due to limited
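For reference, QK-norm (the one technique reported here as harmless) normalizes queries and keys before the dot product so attention logits cannot blow up. A minimal sketch using RMS-style normalization with no learnable gain; Chameleon's exact placement may differ.

```python
# Minimal QK-norm sketch: RMS-normalize q and k (no learnable gain), so
# each vector has norm sqrt(d) and attention logits stay bounded.
import torch
import torch.nn.functional as F

def qk_norm_attention(q, k, v, eps=1e-6):
    q = q / (q.norm(dim=-1, keepdim=True) + eps) * (q.shape[-1] ** 0.5)
    k = k / (k.norm(dim=-1, keepdim=True) + eps) * (k.shape[-1] ** 0.5)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```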
@liliyu_lili
Lili Yu
5 months
Besides large-scale language modeling, it also achieves SOTA on many other tasks, with very little tweaking.
@violet_zct
Chunting Zhou
5 months
9/n Results on raw speech classification, ImageNet-1K, WikiText-103 and PG19.