Edouard Grave Profile
Edouard Grave

@EXGRV

2,538 Followers · 149 Following · 7 Media · 118 Statuses

large language models @kyutai_labs

Paris, France
Joined October 2012
@EXGRV
Edouard Grave
3 years
New paper on memory-efficient open-domain question answering. We show that combining dimension reduction, vector quantization and passage filtering greatly reduces the memory footprint of retrieval-based systems, without hurting accuracy too much. Paper:
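To make the idea concrete, here is a minimal sketch of dimension reduction plus vector quantization with faiss; the sizes, the random embeddings and the PCA+PQ index string are illustrative assumptions, not the configuration from the paper.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, d_reduced, n_subvectors = 768, 128, 16  # illustrative sizes, not the paper's
# PCA reduces the embedding dimension, PQ compresses each vector to n_subvectors bytes
index = faiss.index_factory(d, f"PCA{d_reduced},PQ{n_subvectors}")

passages = np.random.rand(20000, d).astype("float32")  # stand-in for real passage embeddings
index.train(passages)
index.add(passages)

queries = np.random.rand(5, d).astype("float32")
distances, ids = index.search(queries, 10)
print(ids.shape)  # (5, 10): top-10 matches per query from the compressed index
```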
@EXGRV
Edouard Grave
2 years
🔎 Can we train dense unsupervised retrievers that are as good as BM25? With the latest contrastive learning techniques, it seems that we are getting there! Our model, the Contriever, outperforms BM25 on NQ, and is competitive on BEIR. Paper:
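As a rough illustration of the contrastive signal used to train such dense retrievers, below is a hedged in-batch InfoNCE sketch in PyTorch; the actual Contriever recipe (positive construction, negative queues, temperature) differs, and the random tensors stand in for encoder outputs.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, p_emb, temperature=0.05):
    # Each query's positive passage sits at the same batch index;
    # every other passage in the batch serves as a negative.
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(p_emb, dim=-1)
    logits = q @ p.t() / temperature                  # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# toy usage with random tensors standing in for encoder outputs
loss = in_batch_contrastive_loss(torch.randn(32, 768), torch.randn(32, 768))
print(loss.item())
```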
@EXGRV
Edouard Grave
7 months
/kyutai has landed! Super excited to build this new research lab. Pure focus on research. As open as it gets.
@kyutai_labs
kyutai
7 months
Announcing Kyutai: a non-profit AI lab dedicated to open science. Thanks to Xavier Niel ( @GroupeIliad ), Rodolphe Saadé ( @cmacgm ) and Eric Schmidt ( @SchmidtFutures ), we are starting with almost 300M€ of philanthropic support. Meet the team ⬇️
@EXGRV
Edouard Grave
5 years
New @ACL2019_Italy paper: Adaptive Attention Span in Transformers, with S. Sukhbaatar ( @tesatory ), P. Bojanowski, @armandjoulin . We scale to large context (up to 8k) and reduce memory footprint by learning attention length for each head and layer. SOTA on text8/enwik8. 1/2
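A hedged sketch of the adaptive-span idea: a learnable per-head span softly masks attention beyond a learned distance, so the effective context length is learned rather than fixed. The module name, shapes and ramp value are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    """Each head learns a span z; attention weights for keys further away
    than z are softly ramped to zero, then renormalized."""
    def __init__(self, n_heads, max_span, ramp=32):
        super().__init__()
        self.max_span, self.ramp = max_span, ramp
        self.z = nn.Parameter(torch.zeros(n_heads, 1, 1))  # learnable span per head

    def forward(self, attn_weights, distance):
        # distance: (1, 1, n_keys) distance of each key position from the query
        span = self.z.clamp(0, 1) * self.max_span
        mask = ((span + self.ramp - distance) / self.ramp).clamp(0, 1)
        masked = attn_weights * mask
        return masked / masked.sum(dim=-1, keepdim=True).clamp(min=1e-8)

# toy usage: 8 heads, 1024-token context, uniform attention weights
m = AdaptiveSpanMask(n_heads=8, max_span=1024)
attn = torch.full((8, 1, 1024), 1.0 / 1024)
distance = torch.arange(1024, 0, -1).float().view(1, 1, -1)
print(m(attn, distance).shape)  # (8, 1, 1024)
```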
@EXGRV
Edouard Grave
5 years
New blogpost about two recent papers on Transformer networks.
@AIatMeta
AI at Meta
5 years
Facebook AI researchers are sharing an all-attention layer to simplify the Transformer model and an adaptive attention span method to make it more efficient. Even with a much simpler architecture, these methods match or improve state-of-the-art results.
@EXGRV
Edouard Grave
3 years
New paper with Gautier Izacard ( @gizacard ), using distillation to train information retrieval systems! We show that attention scores of a model trained on the downstream task can be used as synthetic labels. This makes it possible to train retrievers without document or passage annotations.
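The distillation objective can be sketched as a KL divergence between the retriever's distribution over passages and a target built from the reader's aggregated cross-attention scores; the function below is an illustrative approximation, with random tensors standing in for real scores.

```python
import torch
import torch.nn.functional as F

def attention_distillation_loss(retriever_scores, reader_attention, temperature=1.0):
    # Student: the retriever's distribution over candidate passages.
    # Teacher: a distribution derived from the reader's aggregated cross-attention.
    student = F.log_softmax(retriever_scores / temperature, dim=-1)
    teacher = F.softmax(reader_attention / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

# toy usage: one question, 100 candidate passages
retriever_scores = torch.randn(1, 100)   # query-passage similarity scores
reader_attention = torch.rand(1, 100)    # attention mass received by each passage
print(attention_distillation_loss(retriever_scores, reader_attention).item())
```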
@EXGRV
Edouard Grave
4 years
New work w/ @gizacard (Gautier Izacard): how much do generative models for open-domain QA benefit from retrieval? A lot! Retrieving 100 passages, we get 51.4 EM on NaturalQuestions and 67.6 EM on TriviaQA. 1/3 Paper:
@EXGRV
Edouard Grave
2 years
Very excited to introduce Atlas, a new retrieval-augmented language model which is competitive with larger models on few-shot tasks such as question answering or fact checking. Work led by @gizacard and @PSH_Lewis . Paper:
@PSH_Lewis
Patrick Lewis
2 years
🚨We’ve been working on better retrieval-augmented models & thrilled to present Atlas, led by @gizacard @EXGRV & myself🚨 Atlas is an end2end pretrained "RAG"-like model, beats models 50x its size on fewshot QA, sets numerous SotA on knowledge-intensive NLP
@EXGRV
Edouard Grave
6 months
✈️ I will be attending #NeurIPS2023 : let me know if you want to chat about the future of LLMs, and how to democratize them. 🌐 We are also hiring members of technical staff and interns @kyutai_labs . Happy to talk about the lab and our mission.
@EXGRV
Edouard Grave
1 year
Super excited by the release of LLaMA, a series of large language models, from 7B to 65B parameters. 🎉 By training longer, LLaMA obtains GPT-3-level performance with a 13B model, which can run on a single GPU. Excited to see what the research community will do with these models.
@GuillaumeLample
Guillaume Lample @ ICLR 2024
1 year
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available at 1/n
@EXGRV
Edouard Grave
3 years
We obtain new state-of-the-art results on TriviaQA (+4.5%) and NaturalQuestions (+2.3%). We also used this technique for our winning entry to the 6 GB track of the Efficient QA competition (more on this soon). Paper:
@EXGRV
Edouard Grave
5 years
Hyper-parameter autotuning for fastText: get a 1MB text classifier while having a coffee. With this new feature, it is possible to constrain the size of the final model, and automatically find the hyper-parameters giving the best results on a validation set.
@AIatMeta
AI at Meta
5 years
Facebook AI researchers are releasing a new feature for the fastText library which provides hyper-parameter autotuning for more efficient text classifiers.
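For reference, a hedged usage sketch of the autotuning feature through the fastText Python API; the train.txt/valid.txt file names and the search budget are assumptions.

```python
import fasttext  # pip install fasttext

# Autotune hyper-parameters on a validation set while constraining the final
# (quantized) model to roughly 1 MB. train.txt / valid.txt must be in the usual
# fastText __label__ format.
model = fasttext.train_supervised(
    input="train.txt",
    autotuneValidationFile="valid.txt",
    autotuneModelSize="1M",   # target size of the saved model
    autotuneDuration=300,     # search budget in seconds
)
print(model.test("valid.txt"))   # (n_examples, precision@1, recall@1)
model.save_model("classifier.ftz")
```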
@EXGRV
Edouard Grave
7 years
@sleepinyourhat FastText does something very similar (bag of char-ngrams) and yields good, UNK-free embeddings:
@EXGRV
Edouard Grave
2 years
On BEIR, the Contriever is on par with, or outperforms, BM25 on 11 out of 15 datasets for recall@100. Code & models will be released soon. Joint work w/ @gizacard @mcaron31 @lucas_hosseini @riedelcastro @p_bojanowski @armandjoulin
@EXGRV
Edouard Grave
5 years
Code release for our #acl2019nlp paper "Adaptive Attention Span in Transformers":
@tesatory
Sainbayar Sukhbaatar
5 years
We released our code for adaptive-span! It can train a Transformer with a context size of 8k tokens #ACL2019
@EXGRV
Edouard Grave
3 years
Joint work with @gizacard , @Fabio_Petroni , @lucas_hosseini , @nicola_decao and @riedelcastro . This was part of our winning entry to the 6Gb track of the EfficientQA NeurIPS competition.
@EXGRV
Edouard Grave
2 years
New release of our Contriever project! It includes multilingual models which can perform cross-lingual retrieval (e.g., retrieve English documents to answer a question in Swahili), the code to (pre-)train your own retrievers, and an updated version of the paper with new results.
@gizacard
Gautier Izacard
2 years
Code for Contriever is now available! Code: Paper: Additionally we trained mContriever, a state-of-the-art multilingual neural retriever, by applying a similar contrastive learning method.
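A hedged sketch of how such a retriever is typically used: embed the query and passages by mean pooling token embeddings, then rank passages by dot product. The facebook/contriever checkpoint id and the toy texts are assumptions based on the public release.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("facebook/contriever")   # assumed checkpoint id
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    # mean-pool token embeddings over non-padding positions
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)    # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)

query = embed(["where was marie curie born?"])
passages = embed(["Marie Curie was born in Warsaw.",
                  "The Eiffel Tower is located in Paris."])
print(query @ passages.t())   # higher dot product = more relevant passage
```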
@EXGRV
Edouard Grave
2 years
Introducing PEER, a new language model which makes text generation and editing more collaborative and controllable. It adds a human in the loop by following instructions and providing explanations. Work led by @timo_schick . Paper:
@timo_schick
Timo Schick
2 years
🎉 New paper 🎉 We introduce PEER, a language model trained to incrementally write texts & collaborate w/ humans in a more natural way. It can write drafts, add suggestions, follow instructions, perform edits, correct itself & provide explanations. Link:
@EXGRV
Edouard Grave
3 years
Super happy about this result too! 🚀 And thanks to the organizers for this great competition!
@Fabio_Petroni
Fabio Petroni
3 years
Super happy for winning the 6Gb track at the EfficientQA #NeurIPS competition. Our submission, led by @gizacard w/ Lucas Hosseini, @nicola_decao , @riedelcastro and @EXGRV , achieved top position in both auto and manual evaluation. 🚀
@EXGRV
Edouard Grave
5 years
@OriolVinyalsML Interesting, didn't know about this appendix since it was removed in v4 and v5. Also, all-attention is different since it merges the two sublayers, hence using the same attention over parameters and hidden states.
@EXGRV
Edouard Grave
9 months
@abacaj Yes, we have a couple of papers on that exact topic with @gizacard and @PSH_Lewis . Combining these advances led to the Atlas language model (paper: , code: ).
@EXGRV
Edouard Grave
7 years
Very strong results on word-level language modeling using various regularization techniques, ASGD and a continuous cache
@Smerity
Smerity
7 years
[Weight-dropped LSTM, non-monotonic ASGD, ptr cache] on word level language modeling gives 52.8 on PTB & 52.0 on WT2
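A hedged sketch of the continuous-cache component: past hidden states are scored against the current one, the resulting probability mass is placed on the words that followed them, and this cache distribution is interpolated with the model's own prediction. All sizes and mixing weights below are illustrative.

```python
import torch
import torch.nn.functional as F

def cache_distribution(query_h, cache_h, cache_targets, vocab_size, theta=0.3):
    # Score stored hidden states against the current one, then assign the
    # softmax weights to the words that followed those states.
    scores = theta * cache_h @ query_h              # (cache_len,)
    weights = F.softmax(scores, dim=0)
    p_cache = torch.zeros(vocab_size)
    p_cache.index_add_(0, cache_targets, weights)
    return p_cache

# toy usage: interpolate with the model's own softmax output
vocab_size, hidden = 1000, 64
p_model = F.softmax(torch.randn(vocab_size), dim=0)
cache_h = torch.randn(200, hidden)                  # last 200 hidden states
cache_targets = torch.randint(0, vocab_size, (200,))
query_h = torch.randn(hidden)
lam = 0.1                                           # interpolation weight (illustrative)
p = (1 - lam) * p_model + lam * cache_distribution(query_h, cache_h, cache_targets, vocab_size)
print(p.sum())  # ~1.0
```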
@EXGRV
Edouard Grave
7 years
Word vectors for 90 languages, trained on Wikipedia with fastText:
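The released vectors are plain text in word2vec format, so they load with standard tooling; a hedged sketch (the wiki.fr.vec file name and the query word are assumptions).

```python
from gensim.models import KeyedVectors

# The released .vec files are plain word2vec text format.
vectors = KeyedVectors.load_word2vec_format("wiki.fr.vec")  # file name is an assumption
print(vectors.most_similar("roi", topn=5))                  # nearest neighbours of "roi"
```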
@EXGRV
Edouard Grave
5 years
New long-form question answering dataset and baselines!
@AIatMeta
AI at Meta
5 years
Introducing long-form question answering (), a new challenge that pushes #AI to provide complex explanations rather than just simple facts. #NLP
@EXGRV
Edouard Grave
4 years
Our main finding: generative models are great at combining information from multiple passages, as their performance keeps improving as the number of support documents increases. 2/3
@EXGRV
Edouard Grave
2 years
Our model, with 11B parameters and significantly less training compute, outperforms LLMs on 64-shot question answering (+3 pts wrt SOTA) and 15-shot fact checking (+5 pts wrt SOTA).
@EXGRV
Edouard Grave
4 years
By processing passages independently in the encoder, but jointly in the decoder, our models scale to large numbers of passages, and can combine information from these multiple passages. 3/3
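A minimal, hedged sketch of this encode-independently/decode-jointly pattern with Hugging Face T5; the t5-small checkpoint, toy question and toy passages are assumptions (t5-small is not a trained FiD model, so the output is illustrative only), and the released FiD code is organized differently.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

# Encode each (question, passage) pair on its own, then concatenate the encoder
# outputs so the decoder attends over all passages jointly.
tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "who wrote the iliad?"
passages = ["Homer is traditionally credited with the Iliad.",
            "The Iliad is an ancient Greek epic poem."]

batch = tok([f"question: {question} context: {p}" for p in passages],
            return_tensors="pt", padding=True)
enc = model.encoder(**batch).last_hidden_state   # (n_passages, seq, hidden)
fused = enc.reshape(1, -1, enc.size(-1))         # treat all passages as one long sequence
fused_mask = batch["attention_mask"].reshape(1, -1)

out = model.generate(encoder_outputs=BaseModelOutput(last_hidden_state=fused),
                     attention_mask=fused_mask, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```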
@EXGRV
Edouard Grave
2 years
Previous work showed that retrieval is helpful for knowledge-intensive tasks, but mostly in settings with large training sets. Here, we show how to get the same benefits for few-shot learning.
@EXGRV
Edouard Grave
2 years
@mark_riedl Maybe use membership inference? Give a different example (or set of examples) to each student, check if these examples were used to train the models?
@EXGRV
Edouard Grave
1 year
@yoavgo @_joaogui1 @BlancheMinerva @MetaAI @GuillaumeLample No, it just means that after a certain number of tokens, it's more efficient to increase the model size than the dataset size to improve performance. Personally, I would not use the word "overtrained" to describe these models...
@EXGRV
Edouard Grave
11 months
@yoavgo The idea is cute, but I would not take the experimental results too seriously as the baseline numbers seem to be off.
@EXGRV
Edouard Grave
1 year
@yoavgo @_joaogui1 @BlancheMinerva @MetaAI @GuillaumeLample Another point worth mentioning is that "compute optimality" in that context only considers training, not inference. If the model will be used a lot, it is worth training a smaller model for longer.
@EXGRV
Edouard Grave
1 year
@GuillaumeLample @arthurmensch @tlacroix6 Congrats Guillaume, Arthur and Timothée! Excited to see what you will build!
@EXGRV
Edouard Grave
2 years
@ogrisel @ChrSzegedy @F_Vaggi @_arohan_ @kchonyc @deliprao @VahidK My guess is that it prevents the collapse of embeddings to zero? I'm curious what the impact of this value is, as long as it's larger than 1.0.
@EXGRV
Edouard Grave
11 months
@armandjoulin @giffmana I do like this idea, and more generally drawing links between compression and prediction.
@EXGRV
Edouard Grave
7 years
@sleepinyourhat You can get vectors for OOV words from a trained model using ./fasttext print-word-vectors model.bin
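The Python API exposes the same behaviour; a small hedged sketch (the model path and the made-up query word are assumptions).

```python
import fasttext

# fastText represents a word as a bag of character n-grams, so it can build a
# vector for a word it has never seen, without any UNK token.
model = fasttext.load_model("model.bin")       # path is an assumption
vec = model.get_word_vector("untokenizable")   # out-of-vocabulary word
print(vec.shape)                               # (dim,) numpy array
```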
@EXGRV
Edouard Grave
11 months
@giffmana I think the baseline numbers are off (at least the fastText ones that were easy to run: e.g., on r8/r52, it should be 97%/93%, not 82%/57%).
@EXGRV
Edouard Grave
2 years
@yoavgo Maybe Asymmetric Numeral Systems, by Duda (2013)? Probably not the latest, but definitely widely used.
@EXGRV
Edouard Grave
1 year
@yoavgo @kroscoo @ryandcotterell I believe that with large neural nets, it would not really improve over a canonical segmentation. In the paper though, we did not have a subword vocabulary or canonical segmentation, and used all char n-grams (with a frequency threshold).
@EXGRV
Edouard Grave
4 years
@srush_nlp @_joaogui1 I don't think it does (as you don't need this at test time).
@EXGRV
Edouard Grave
7 months
@soumithchintala Thanks for the kind words Soumith! Really excited by this new lab.
@EXGRV
Edouard Grave
8 years
@deliprao Thanks for the shout-out! Glad you liked our paper.
@EXGRV
Edouard Grave
2 years
@milesosborne @aCraigPfeifer That's a great and fair question! I think that a big advantage of (unsupervised) dense retrievers is that they easily benefit from a few annotated queries.
@EXGRV
Edouard Grave
6 months
@ylecun Thank you, Yann!
@EXGRV
Edouard Grave
1 year
@yoavgo @kroscoo @ryandcotterell This could potentially be used to find (slightly) better segmentations? But overall, yeah, having high-capacity models means these differences don't really matter :/
@EXGRV
Edouard Grave
1 year
@yoavgo @_joaogui1 @BlancheMinerva @MetaAI @GuillaumeLample (I believe that the models could still improve by training even longer)
@EXGRV
Edouard Grave
8 years
@haldaume3 @armandjoulin @yoavgo We'll add vw to the preprint. Datasets are here if you want to run something.
@EXGRV
Edouard Grave
2 years
@sclincha @gizacard @mcaron31 @lucas_hosseini @riedelcastro @p_bojanowski @armandjoulin Thanks for the references! We wrote that at the time of the ICLR submission, and it's probably outdated now ("competitive" is likely a better term anyway). Please note that we don't use distillation, and focus on unsupervised retrievers.
@EXGRV
Edouard Grave
8 years
@srchvrs @haldaume3 "For a more detailed overview of AI research on StarCraft, the reader should consult [23]."
@EXGRV
Edouard Grave
2 years
@balajis Search engines still want to have a word though
@PSH_Lewis
Patrick Lewis
2 years
🚨We’ve been working on better retrieval-augmented models & thrilled to present Atlas, led by @gizacard @EXGRV & myself🚨 Atlas is an end2end pretrained "RAG"-like model, beats models 50x its size on fewshot QA, sets numerous SotA on knowledge-intensive NLP