Training with QuantNoise makes it possible to strongly compress neural networks: 80.0% accuracy on ImageNet in 3.3MB, 82.5% accuracy on MNLI in 14MB.
Blog:
Paper:
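The trick, roughly: at each forward pass only a random subset of the weights is (fake-)quantized while the rest stays in full precision, so gradients keep flowing. A minimal sketch, using simple scalar int8 quantization as a stand-in for the product quantization used in the paper:

```python
import torch

def quant_noise(weight: torch.Tensor, p: float = 0.1, bits: int = 8,
                training: bool = True) -> torch.Tensor:
    """Apply quantization noise: quantize a random fraction p of the weights."""
    if not training:
        return weight
    # Symmetric scalar quantization to `bits` bits (stand-in for product quantization).
    scale = weight.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    quantized = torch.round(weight / scale) * scale
    # Only a random subset of the weights sees the quantization noise.
    mask = (torch.rand_like(weight) < p).float()
    mixed = mask * quantized + (1.0 - mask) * weight
    # Straight-through estimator: forward pass uses `mixed`, backward behaves like identity.
    return weight + (mixed - weight).detach()
```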
New paper on memory efficient open domain question answering. We show that combining dimension reduction, vector quantization and passage filtering greatly reduces the memory footprint of retrieval-based systems, without hurting accuracy too much.
Paper:
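For the curious, the recipe maps almost directly onto faiss (the index classes are real; the dimensions and code sizes below are illustrative, not the paper's settings):

```python
import numpy as np
import faiss

d_in, d_out = 768, 256           # original / reduced embedding dimension (illustrative)
n_subvectors, n_bits = 32, 8     # product quantization: 32 bytes per passage (illustrative)

passages = np.random.rand(10000, d_in).astype("float32")   # stand-in for passage embeddings

pca = faiss.PCAMatrix(d_in, d_out)                # dimension reduction
pq = faiss.IndexPQ(d_out, n_subvectors, n_bits)   # vector quantization
index = faiss.IndexPreTransform(pca, pq)

index.train(passages)
index.add(passages)
distances, ids = index.search(passages[:1], 10)   # top-10 passages for one query
```

Passage filtering (dropping passages that are unlikely to ever be retrieved) then shrinks the number of rows stored in the index itself.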
🔎 Can we train dense unsupervised retrievers that are as good as BM25? With the latest contrastive learning techniques, it seems that we are getting there! Our model, the Contriever, outperforms BM25 on NQ, and is competitive on BEIR.
Paper:
Announcing Kyutai: a non-profit AI lab dedicated to open science. Thanks to Xavier Niel (
@GroupeIliad
), Rodolphe Saadé (
@cmacgm
) and Eric Schmidt (
@SchmidtFutures
), we are starting with almost 300M€ of philanthropic support. Meet the team ⬇️
New
@ACL2019_Italy
paper: Adaptive Attention Span in Transformers, with S. Sukhbaatar (
@tesatory
), P. Bojanowski,
@armandjoulin
. We scale to large context (up to 8k) and reduce memory footprint by learning attention length for each head and layer. SOTA on text8/enwik8. 1/2
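The mechanism, in short: each head learns a span z, and attention weights beyond that span are ramped smoothly to zero. A minimal PyTorch sketch (hyper-parameters are placeholders, and the penalty on z that encourages small spans is omitted):

```python
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    def __init__(self, max_span: int = 8192, ramp: int = 32):
        super().__init__()
        self.max_span, self.ramp = max_span, ramp
        self.z = nn.Parameter(torch.zeros(1))   # learned span (as a fraction of max_span)

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (..., span) attention weights over the last `span` positions.
        span = attn.size(-1)
        # Distance of each attended position from the current query position.
        dist = torch.arange(span - 1, -1, -1, device=attn.device, dtype=attn.dtype)
        z = self.z.clamp(0, 1) * self.max_span
        # Soft mask: 1 within the learned span, linear ramp of width `ramp`, 0 beyond.
        mask = ((z + self.ramp - dist) / self.ramp).clamp(0, 1)
        attn = attn * mask
        return attn / (attn.sum(-1, keepdim=True) + 1e-8)   # re-normalize
```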
Facebook AI researchers are sharing an all-attention layer to simplify the Transformer model and an adaptive attention span method to make it more efficient. Even with a much simpler architecture, these methods match or improve on state-of-the-art results.
New paper with Gautier Izacard (
@gizacard
), using distillation to train information retrieval systems! We show that attention scores of a model trained on the downstream task can be used as synthetic labels. This makes it possible to train retrievers without document or passage annotations.
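A hedged sketch of the objective: the reader's cross-attention mass on each retrieved passage (aggregated over layers, heads and tokens) is treated as a soft relevance label, and the retriever is trained to match that distribution. Shapes and the aggregation are assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def attention_distillation_loss(retriever_scores: torch.Tensor,
                                reader_attention: torch.Tensor) -> torch.Tensor:
    """retriever_scores: (batch, n_passages) question-passage similarities.
    reader_attention: (batch, n_passages) cross-attention mass per passage,
    used as a synthetic relevance label (no gradient flows into the reader)."""
    target = reader_attention / reader_attention.sum(-1, keepdim=True)
    log_pred = F.log_softmax(retriever_scores, dim=-1)
    return F.kl_div(log_pred, target.detach(), reduction="batchmean")
```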
New work w/
@gizacard
(Gautier Izacard): how much do generative models for open domain QA benefit from retrieval? A lot! Retrieving 100 passages, we get 51.4 EM on NaturalQuestions, 67.6 EM on TriviaQA. 1/3
Paper:
Very excited to introduce Atlas, a new retrieval augmented language model which is competitive with larger models on few-shot tasks such as question answering or fact checking.
Work led by
@gizacard
and
@PSH_Lewis
.
Paper:
🚨We’ve been working on better retrieval-augmented models & thrilled to present Atlas, led by
@gizacard
@EXGRV
& myself🚨
Atlas is an end2end pretrained "RAG"-like model, beats models 50x its size on fewshot QA, sets numerous SotA on knowledge-intensive NLP
New paper on unsupervised mapping of word vectors, using Procrustes in Wasserstein distance, available on arxiv: . With
@armandjoulin
and Q. Berthet. More resources to come soon!
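The idea in a toy sketch: alternate between matching the two sets of vectors (the Wasserstein / assignment step) and solving orthogonal Procrustes in closed form (the Procrustes step). The paper optimizes this stochastically on mini-batches; the exact version below only scales to tiny vocabularies:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_procrustes(X: np.ndarray, Y: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Return an orthogonal matrix Q such that X @ Q is aligned with Y."""
    Q = np.eye(X.shape[1])
    for _ in range(n_iter):
        # 1) Assignment step: match rotated source vectors to target vectors.
        cost = -X @ Q @ Y.T                      # negative inner product as matching cost
        rows, cols = linear_sum_assignment(cost)
        # 2) Procrustes step: closed-form orthogonal mapping via SVD.
        U, _, Vt = np.linalg.svd(X[rows].T @ Y[cols])
        Q = U @ Vt
    return Q
```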
✈️ I will be attending
#NeurIPS2023
: let me know if you want to chat about the future of LLMs, and how to democratize them.
🌐 We are also hiring members of technical staff and interns
@kyutai_labs
. Happy to talk about the lab and our mission.
Super excited by the release of LLaMA, a series of large language models, from 7B to 65B parameters. 🎉
By training longer, LLaMA obtains GPT3 level performance with a 13B model, which can run on a single GPU. Excited to see what the research community will do with these models.
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at
1/n
We obtain new state-of-the-art results on TriviaQA (+4.5%) and NaturalQuestions (+2.3%). We also used this technique for our winning entry to the 6 GB track of the Efficient QA competition (more on this soon).
Paper:
Hyper-parameter autotuning for fastText: get a 1MB text classifier while having a coffee.
With this new feature, it is possible to constrain the size of the final model, and automatically find the hyper-parameters giving the best results on a validation set.
Facebook AI researchers are releasing a new feature for the fastText library which provides hyper-parameter autotuning for more efficient text classifiers.
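With the fastText Python bindings, the autotuning is a single call; file names below are placeholders:

```python
import fasttext

model = fasttext.train_supervised(
    input="train.txt",
    autotuneValidationFile="valid.txt",   # hyper-parameters are selected on this set
    autotuneModelSize="1M",               # constrain the final (quantized) model to ~1MB
    autotuneDuration=600,                 # search budget in seconds (about one coffee)
)
model.save_model("model.ftz")
```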
New release of our Contriever project! It includes multi-lingual models which can perform cross-lingual retrieval (e.g., retrieve English documents to answer a question in Swahili), the code to (pre-)train your own retrievers, and an updated version of the paper with new results.
Code for Contriever is now available!
Code:
Paper:
Additionally we trained mContriever, a state-of-the-art multilingual neural retriever, by applying a similar contrastive learning method.
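The contrastive pre-training boils down to an InfoNCE loss with in-batch negatives; a hedged sketch (the temperature and encoders are placeholders, not the released code):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """query_emb, passage_emb: (batch, dim); row i of each forms a positive pair,
    every other row in the batch serves as a negative."""
    scores = query_emb @ passage_emb.T / temperature      # (batch, batch) similarities
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)                # diagonal entries are the positives
```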
Introducing PEER, a new language model which makes text generation and editing more collaborative and controllable. It adds a human in the loop, by following instructions and providing explanations.
Work led by
@timo_schick
.
Paper:
🎉 New paper 🎉 We introduce PEER, a language model trained to incrementally write texts & collaborate w/ humans in a more natural way. It can write drafts, add suggestions, follow instructions, perform edits, correct itself & provide explanations.
Link:
@OriolVinyalsML
Interesting, didn't know about this appendix since it was removed in v4 and v5. Also, the all-attention layer is different since it merges the two sublayers, hence using the same attention over parameters and hidden states.
@abacaj
Yes, we have a couple of papers on that exact topic with
@gizacard
and
@PSH_Lewis
. Combining these advances led to the Atlas language model (paper: , code: ).
@arankomatsuzaki
Nice! By the way, we also have a new way to train retriever systems, by distilling the attention scores of the reader to the retriever:
Our main finding: generative models are great at combining information from multiple passages, as their performance keeps improving as the number of support documents increases. 2/3
Our model, with 11B parameters and significantly less training compute, outperforms LLMs on 64-shot question answering (+3 pts wrt SOTA) or 15-shot fact checking (+5 pts wrt SOTA).
By processing passages independently in the encoder, but jointly in the decoder, our models scale to large numbers of passages, and can combine information from these multiple passages. 3/3
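The architecture in one hedged sketch: encode each (question, passage) pair on its own, then let the decoder attend jointly over the concatenation of all encoder states. Here `encoder` and `decoder` stand for the two halves of any encoder-decoder model; this is not the released code:

```python
import torch

def fusion_in_decoder(encoder, decoder,
                      question_passage_ids: torch.Tensor,
                      decoder_input_ids: torch.Tensor) -> torch.Tensor:
    """question_passage_ids: (n_passages, passage_len) token ids of question+passage pairs."""
    # 1) Encode every passage independently: cost grows linearly with the number of passages.
    encoded = encoder(question_passage_ids)              # (n_passages, passage_len, dim)
    # 2) Concatenate all encoder states into one long sequence.
    fused = encoded.reshape(1, -1, encoded.size(-1))     # (1, n_passages * passage_len, dim)
    # 3) The decoder attends over all passages at once while generating the answer.
    return decoder(decoder_input_ids, encoder_hidden_states=fused)
```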
Previous works showed that retrieval is helpful for knowledge intensive tasks, but mostly in settings with large training sets. Here, we show how to get the same benefits for few-shot learning.
@mark_riedl
Maybe use membership inference? Give a different example (or set of examples) to each student, and check whether these examples were used to train the models?
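Concretely, something like the following, assuming the suspect model exposes a per-token loss (all names are hypothetical):

```python
import numpy as np

def likely_trained_on(model_loss_fn, canary: str, controls: list,
                      margin: float = 1.0) -> bool:
    """model_loss_fn(text) -> average negative log-likelihood under the suspect model.
    `canary` was given to one student; `controls` were never shown to anyone."""
    canary_loss = model_loss_fn(canary)
    control_losses = np.array([model_loss_fn(c) for c in controls])
    # A canary loss far below the control distribution suggests it was in the training data.
    return canary_loss < control_losses.mean() - margin * control_losses.std()
```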
@yoavgo
@_joaogui1
@BlancheMinerva
@MetaAI
@GuillaumeLample
No, it just means that after a number of tokens, it's more efficient to increase the model size than the dataset size to improve performance. Personally, I would not use the word "overtrained" to describe these models...
@yoavgo
@kroscoo
@ryandcotterell
I believe that with large neural nets, it would not really improve over a canonical segmentation. In the paper though, we did not have a subword vocabulary or canonical segmentation, and used all char ngrams (with freq. threshold).
@milesosborne
@aCraigPfeifer
That's a great and fair question! I think that a big advantage of (unsupervised) dense retrievers is that they easily benefit from a few annotated queries.
@yoavgo
@kroscoo
@ryandcotterell
This could potentially be used to find (slightly) better segmentations? But overall, yeah, having high-capacity models means these differences don't really matter :/
@mo_norouzi
@zodiacJRH
@haffari
Awesome work! We had a similar model in , although we did not apply it to segmentation for NMT. Excited to see advances in this research direction.
@yoavgo
@kroscoo
@ryandcotterell
It is also possible to train by marginalizing over segmentations, if you parametrize the conditional distribution using characters.
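For concreteness, a toy dynamic program that sums over all segmentations (up to a maximum segment length), where `segment_logprob` is a placeholder for a character-level model scoring log p(segment | prefix):

```python
import numpy as np

def marginal_logprob(x: str, segment_logprob, max_len: int = 10) -> float:
    """log p(x), marginalized over all segmentations with segments of length <= max_len."""
    n = len(x)
    alpha = np.full(n + 1, -np.inf)   # alpha[j] = log-prob of the prefix x[:j]
    alpha[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            alpha[j] = np.logaddexp(alpha[j], alpha[i] + segment_logprob(x[i:j], x[:i]))
    return float(alpha[n])
```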