Lili Yu @liliyu_lili Twitter profile

Lili Yu

@liliyu_lili

1 year

When you train on interleaved text-image data with the CM3 objective, the model can handle any combination of text and images on both the input and output sides. Thanks to multitask fine-tuning, we are able to bake InstructPix2Pix, ControlNet, OpenFlamingo, and more into a single model.

Armen Aghajanyan

@ArmenAgha

1 year

We apply large-scale multitask instruction tuning to CM3leon for both image and text generation, and show that it significantly improves performance on tasks such as image caption generation, visual question answering, text-based editing, and conditional image generation.
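The CM3 objective mentioned in the first tweet is, roughly, a causally masked language-modeling objective. A span is cut out of the document and replaced by a mask sentinel. The span is then moved to the end, so a left-to-right model learns to fill it in using context from both sides. Below is a minimal sketch of that transform for a single span. It assumes string tokens and a hypothetical `<mask:0>` sentinel. The original CM3 recipe samples several spans, and its tokenization differs.

```python
import random
from typing import List, Sequence

# Hypothetical sentinel token; the real CM3 vocabulary may name it differently.
MASK = "<mask:0>"


def cm3_causal_mask(tokens: Sequence[str], start: int, end: int) -> List[str]:
    """Cut tokens[start:end] out, leave a sentinel in its place, and append
    the sentinel plus the span at the end of the sequence.

    A left-to-right model trained on the result predicts the span last, so it
    can condition on context from both sides of the hole.
    """
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must be a non-empty slice of tokens")
    span = list(tokens[start:end])
    return list(tokens[:start]) + [MASK] + list(tokens[end:]) + [MASK] + span


def random_cm3_mask(tokens: Sequence[str], rng: random.Random) -> List[str]:
    """Simplification: mask exactly one random span (CM3 samples several)."""
    start = rng.randrange(len(tokens))
    end = rng.randrange(start + 1, len(tokens) + 1)
    return cm3_causal_mask(tokens, start, end)


# Hypothetical interleaved document: caption tokens followed by image tokens.
doc = ["<text>", "a", "red", "car", "<img>", "i17", "i42", "i8", "</img>"]
print(cm3_causal_mask(doc, 1, 4))
```

Masking the caption span (positions 1 to 3) turns the document into a captioning example, because the caption is now predicted after the image tokens. Masking the image span instead gives a text-to-image example. That is one way a single objective can cover both directions.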

Lili Yu

@liliyu_lili

1 month

🚀 Excited to share our latest work: Transfusion! A new multi-modal generative training recipe combining language modeling and image diffusion in a single transformer! Huge shout-out to @violet_zct @omerlevy_ @michiyasunaga @arunbabu1234 @kushal_tirumala and other collaborators.

Chunting Zhou

@violet_zct

1 month

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
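The Transfusion tweets describe one transformer trained with two objectives: next-token prediction and diffusion. The sketch below shows how such a combined loss could be computed. It assumes next-token cross-entropy on text positions and a DDPM-style noise-prediction MSE on image positions. The two terms are summed with a hypothetical balancing weight `lam`. Real training works on batched tensors, not Python lists.

```python
import math
from typing import Sequence


def lm_loss(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean next-token cross-entropy over text positions (one logit row per position)."""
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
    return total / len(targets)


def diffusion_loss(pred_noise: Sequence[float], true_noise: Sequence[float]) -> float:
    """Mean squared error between predicted and actual noise on image positions."""
    return sum((p - e) ** 2 for p, e in zip(pred_noise, true_noise)) / len(true_noise)


def combined_loss(logits, targets, pred_noise, true_noise, lam: float = 1.0) -> float:
    """Text loss plus a weighted diffusion loss; `lam` is a hypothetical balancing weight."""
    return lm_loss(logits, targets) + lam * diffusion_loss(pred_noise, true_noise)


# Toy values: two uniform text predictions and a 4-dimensional "image" noise vector.
print(combined_loss([[0.0, 0.0], [1.0, 1.0]], [0, 1], [0.1, 0.0, -0.2, 0.3], [0.0, 0.0, 0.0, 0.0]))
```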

Lili Yu

@liliyu_lili

1 year

Megabyte paper has been accepted to #NeurIPS 2023. Looking forward to meeting old and new friends in New Orleans!

AK

@_akhaliq

1 year

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Lili Yu

@liliyu_lili

4 months

🚀 Excited to introduce Chameleon, our latest breakthrough in mixed-modal early-fusion foundation models! 🦎✨ Capable of understanding and generating text and images in any sequence. Check out our paper to learn more about its SOTA performance and versatile capabilities! 🌟

AI at Meta

@AIatMeta

4 months

Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️

Lili Yu

@liliyu_lili

6 days

@karpathy Totally on board with viewing LLMs as versatile "Autoregressive Transformers"! At FAIR, our Chameleon model initially took this path, fusing tokens from images and text and demonstrating scalability, high-quality tuning, and few-shot learning. But we

Chameleon: Mixed-Modal Early-Fusion Foundation Models

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training...

arxiv.org
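Early fusion, as described for Chameleon above, means images are turned into discrete tokens. Those tokens are placed in the same sequence as text tokens, so one autoregressive transformer models both. The sketch below builds such a sequence under several assumptions: a hypothetical text vocabulary size, a hypothetical image-codebook size, image codes offset past the text vocabulary, and hypothetical begin/end-of-image ids. Chameleon's actual tokenizers and special tokens differ.

```python
from typing import List, Sequence, Tuple

# All sizes and special ids below are hypothetical, chosen only for illustration.
TEXT_VOCAB = 1000        # text token ids are 0 .. TEXT_VOCAB - 1
IMAGE_CODEBOOK = 8192    # image codes are 0 .. IMAGE_CODEBOOK - 1
BOI = TEXT_VOCAB + IMAGE_CODEBOOK  # begin-of-image marker
EOI = BOI + 1                      # end-of-image marker


def fuse(segments: Sequence[Tuple[str, Sequence[int]]]) -> List[int]:
    """Flatten interleaved ("text", ids) / ("image", codes) segments into one
    token sequence over a shared vocabulary."""
    seq: List[int] = []
    for kind, ids in segments:
        if kind == "text":
            if any(not 0 <= i < TEXT_VOCAB for i in ids):
                raise ValueError("text id out of range")
            seq.extend(ids)
        elif kind == "image":
            if any(not 0 <= c < IMAGE_CODEBOOK for c in ids):
                raise ValueError("image code out of range")
            seq.append(BOI)
            seq.extend(TEXT_VOCAB + c for c in ids)  # shift codes past the text ids
            seq.append(EOI)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return seq


print(fuse([("text", [5, 6]), ("image", [0, 3]), ("text", [7])]))
```

Because every modality ends up as ids in one vocabulary, the same next-token loss applies everywhere. Generating an image then amounts to emitting the begin-of-image id and continuing to sample.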

Lili Yu

@liliyu_lili

4 months

Consistent interleaved text-image generation is a unique feature brought by our early-fusion, end-to-end trained model.

Srini Iyer

@sriniiyer88

4 months

Chameleon can produce full multi-modal documents with interleaved high quality images and text. It’s the one model to rule them all! More shout-outs: @arunbabu1234 @aramHmarkosyan @omerlevy_ (5/n)

Lili Yu

@liliyu_lili

5 months

Thrilled to be part of the #Megalodon team! Megalodon is an effective LLM architecture that handles unlimited context lengths efficiently in both training and inference. The model was rigorously compared to LLAMA at the *7B* scale with *2T* training tokens, de-risking it for large-scale training. Congrats to the team!

Chunting Zhou

@violet_zct

5 months

How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head

Lili Yu

@liliyu_lili

1 year

CM3leon is the first multimodal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multitask supervised fine-tuning (SFT) stage.

Armen Aghajanyan

@ArmenAgha

1 year

I’m excited to release our most recent work setting a new SOTA FID of 4.88 on text-to-image generation we call CM3Leon (pronounced chameleon)!

Lili Yu

@liliyu_lili

4 months

The team is working very hard to make this happen. @ArmenAgha @sriniiyer88 @gargighosh @LukeZettlemoyer

Yoav Artzi (PC-ing COLM)

@yoavartzi

4 months

Lili Yu

@liliyu_lili

4 months

Such a fun coincidence that we picked the same name. Before scaling up, we called it CM3leon (pronounced "chameleon", a twist on the older CM3 paper) last year, in the paper "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning".

Pan Lu

@lupantech

4 months

🤔Naming things is hard!! 🦎 #Meta 's new work shares the same name as our NeurIPS 2023 paper from one year ago: Chameleon: Compositional Reasoning with LLMs. Coincidence or great minds thinking alike? 😈 Dive into our work here:

Lili Yu

@liliyu_lili

1 year

Come check out our new work, where we deeply fuse a pretrained text LLM and a pretrained mixed-modal LLM, seamlessly integrating text and image generation. The beauty is that the model can condition on arbitrarily interleaved image and text inputs and continue generating interleaved output.

AK

@_akhaliq

1 year

Jointly Training Large Autoregressive Multimodal Models. In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities

Lili Yu

@liliyu_lili

3 months

Super excited to open-source Chameleon 7B and 34B model weights today. Finally 🎇

AI at Meta

@AIatMeta

3 months

Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and

Lili Yu

@liliyu_lili

12 days

Excited to see what's coming next!! You are an amazing researcher and an amazing team leader.

Alexis Conneau

@alex_conneau

12 days

Career update: After an amazing journey at @OpenAI building #Her , I’ve decided to start a new company.

Lili Yu

@liliyu_lili

1 year

Everything here is generated causally by an autoregressive model. Look how good the hands are. :p

Armen Aghajanyan

@ArmenAgha

1 year

We all know how tough hands are :)

Lili Yu

@liliyu_lili

1 year

@ArmenAgha CM3leon is the first multimodal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multitask supervised fine-tuning (SFT) stage.

Lili Yu

@liliyu_lili

4 months

We outperform all other multimodal baselines on interleaved generation.

Srini Iyer

@sriniiyer88

4 months

We asked humans which response they prefer from Chameleon, GPT-4V w/ Image-gen (GPT-4V+), Gemini-Pro w/ Image-gen (Gemini+). Humans prefer Chameleon outputs to the rest! (7/n)

Lili Yu

@liliyu_lili

5 months

The long-awaited LLAMA3!!

Mike Lewis

@ml_perception

5 months

Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come...

Lili Yu

@liliyu_lili

1 month

@yoavartzi Bidirectional attention is the default in encoder and encoder-decoder models. Since the shift to decoder-dominant models, it has taken the community some time to rediscover its importance. Besides us, two other recent works also found it important. (paligemma,
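A decoder-only model can be given bidirectional attention over images in a simple way. The causal mask is kept for the sequence as a whole, but every position inside an image may attend to all other positions of that same image. The sketch below builds such a mask. It assumes image spans are given as half-open `(start, end)` index pairs, and it illustrates the idea rather than any particular codebase.

```python
from typing import List, Sequence, Tuple


def mixed_modal_mask(seq_len: int, image_spans: Sequence[Tuple[int, int]]) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if position i may attend to position j.

    Text follows the usual causal rule (j <= i); positions inside the same
    half-open image span [start, end) additionally see each other in both directions.
    """
    mask = [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    for start, end in image_spans:
        for i in range(start, end):
            for j in range(start, end):
                mask[i][j] = True
    return mask


# 6 positions: text at 0, an image at 1..3, text at 4..5.
for row in mixed_modal_mask(6, [(1, 4)]):
    print("".join("1" if allowed else "." for allowed in row))
```

Text after the image still cannot look ahead, so left-to-right generation of text is unaffected. Only the patches within one image see each other fully.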

Lili Yu

@liliyu_lili

1 month

@cccntu @iScienceLuvr We found that when we limit the noise level in the input images, image understanding is as good as with clean images. So there is no need to "fix it by adding un-noised image & masking", which is more compute-expensive.

Lili Yu

@liliyu_lili

1 month

@giffmana @violet_zct @omerlevy_ @michiyasunaga @arunbabu1234 @kushal_tirumala looking forward to your comment and discussion. :)

Lili Yu

@liliyu_lili

1 year

@ArmenAgha Thanks!! And I am grateful to have you as a collaborator.

Lili Yu

@liliyu_lili

1 month

@cccntu @violet_zct @iScienceLuvr Yes, tons of noise is bad. We also experimented with fully replacing the noised images with clean images, and that was actually slightly worse than the limited-noise version. So limited noise actually helps the model learn robust image understanding.
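The limited-noise idea can be sketched as a timestep sampler. It uses the full diffusion range for images the model must generate, but caps the timestep for images that only serve as input (for example, in captioning). The cosine schedule, the number of steps `T`, and the cap `T // 2` are assumptions chosen for illustration. They are not values taken from these tweets.

```python
import math
import random
from typing import List, Sequence

T = 1000  # hypothetical number of diffusion steps


def alpha_bar(t: int, num_steps: int = T, s: float = 0.008) -> float:
    """Fraction of signal kept at step t under a cosine schedule (an assumed choice)."""
    def f(u: float) -> float:
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def noise_image(x0: Sequence[float], t: int, rng: random.Random) -> List[float]:
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps with eps ~ N(0, 1)."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1 - a) * rng.gauss(0.0, 1.0) for x in x0]


def sample_timestep(rng: random.Random, conditioning_only: bool, cap: int = T // 2) -> int:
    """Full-range timestep for images to be generated; capped timestep for
    images that only serve as input (hypothetical cap of T // 2)."""
    upper = cap if conditioning_only else T
    return rng.randrange(upper)


rng = random.Random(0)
t = sample_timestep(rng, conditioning_only=True)
print(t, noise_image([0.5, -0.5, 1.0], t, rng))
```

Capping the timestep keeps conditioning images recognizable while still exposing the model to some noise. This matches the tweet's observation that mild noise helped more than fully clean inputs.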

Lili Yu

@liliyu_lili

1 year

Theeee recipe for a strongly aligned chatbot model!

Chunting Zhou

@violet_zct

1 year

How do you turn a language model into a chatbot without any user interactions? We introduce LIMA: a LLaMa-based model fine-tuned on only 1,000 curated prompts and responses, which produces shockingly good responses. * No user data * No model distillation * No RLHF

Lili Yu

@liliyu_lili

1 month

@Grad62304977 @violet_zct @omerlevy_ @michiyasunaga @arunbabu1234 @kushal_tirumala Chameleon added the following stability techniques: qk-norm, a lower learning rate, swin-norm, and z-loss. We found that qk-norm is harmless, the lower learning rate hurts text performance at the model size we tested (760m), and z-loss + swin-norm hurts performance. Due to limited
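Two of the stability techniques named above have simple definitions. z-loss adds a small penalty on the squared log-normalizer of the output softmax. qk-norm normalizes queries and keys before their dot product. The sketch below assumes a z-loss coefficient of 1e-5 (a hypothetical default) and a parameter-free layer norm; real implementations usually learn a scale and bias. swin-norm and the learning-rate change are not shown.

```python
import math
from typing import List, Sequence


def log_sum_exp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def z_loss(logits: Sequence[float], coef: float = 1e-5) -> float:
    """Auxiliary penalty coef * (log Z)^2 that keeps the softmax normalizer near 1."""
    return coef * log_sum_exp(logits) ** 2


def layer_norm(v: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Parameter-free layer norm (no learned scale or bias, a simplification)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def qk_norm_score(q: Sequence[float], k: Sequence[float]) -> float:
    """Scaled dot-product attention logit computed on normalized query and key."""
    qn, kn = layer_norm(q), layer_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


# With qk-norm, scaling a key by 100x barely changes the attention logit.
print(qk_norm_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
      qk_norm_score([1.0, 2.0, 3.0], [100.0, 200.0, 300.0]))
```

Both tricks bound quantities that can otherwise drift upward during long runs: the softmax normalizer for z-loss and the attention logits for qk-norm.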

Lili Yu

@liliyu_lili

5 months

Besides large-scale language modeling, it also achieves SOTA on many other tasks with very little tweaking.

Chunting Zhou

@violet_zct

5 months

9/n Results on raw speech classification, ImageNet-1K, WikiText-103 and PG19.
