Douwe Kiela Profile
Douwe Kiela

@douwekiela

9,979 Followers
380 Following
28 Media
463 Statuses

@ContextualAI CEO, @Stanford Adjunct Prof.

Bay Area
Joined July 2013
Pinned Tweet
@douwekiela
Douwe Kiela
1 month
Very excited to finally share a bit more about what we've been working on! We've come a long way since the early days of RAG.
@ContextualAI
Contextual AI
1 month
Today, we're excited to announce RAG 2.0, our end-to-end system for developing production-grade AI. Using RAG 2.0, we've created Contextual Language Models (CLMs), which achieve state-of-the-art performance on a variety of industry benchmarks. CLMs outperform strong RAG…
Tweet media one
35
140
1K
1
8
57
@douwekiela
Douwe Kiela
2 years
Personal news! Today is my first day @huggingface ! I'm going to help build out a world-class research lab. Thrilled to be joining such an amazing team! In my new role I plan to be more active on Twitter so be sure to follow for updates 🤗
56
35
1K
@douwekiela
Douwe Kiela
2 years
🎉🎉🎉 Happy to officially announce the @HuggingFace 🤗 AI Research Residency Program, a great opportunity to launch or advance your career in machine learning research 🚀 Read more and/or apply here:
Tweet media one
13
224
1K
@douwekiela
Douwe Kiela
1 year
Woohoo we won #EMNLP2022 best demo! 🤗 That's three in a row for @huggingface 🤯🔥 2020 - Transformers 2021 - Datasets 2022 - Evaluate & Evaluation on the Hub
Tweet media one
11
61
775
@douwekiela
Douwe Kiela
2 years
🥳 We are hiring researchers and research interns! Apply here: . People with characteristics that are underrepresented in tech are especially encouraged to apply. We will also be having a residency program soon @HuggingFace , stay tuned! 🤗
8
131
507
@douwekiela
Douwe Kiela
4 years
I'm super excited to announce Dynabench - a new and ambitious research platform for dynamic data collection and benchmarking: 1/n
9
117
459
@douwekiela
Douwe Kiela
2 years
Machine learning is about algorithms, compute, data and measurement. The last one is often glossed over, but it's hugely important: to understand our amazing progress, and current limitations, we need to improve how we do evaluation. Today, we introduce 🤗Evaluate to help! (1/4)
Tweet media one
2
60
423
@douwekiela
Douwe Kiela
4 years
Excited to announce the Hateful Memes Challenge, a new dataset and competition for vision and language, focusing on multimodal understanding and reasoning: "Mean" memes shown here as an illustration of the problem. Some highlights: (1/7)
Tweet media one
13
101
402
@douwekiela
Douwe Kiela
11 months
Super excited to announce that @apsdehal and I have launched a new company: @ContextualAI ! Why did we start it? Because LLMs are going to radically change the way enterprises operate, and we see a huge need for LLMs that actually work for enterprise use cases. 1/5
38
39
427
@douwekiela
Douwe Kiela
2 years
In addition to my new role at @huggingface , I have joined @Stanford as an Adjunct Professor in Symbolic Systems. Very excited to get to work with bright students on cutting edge open source ML projects, alongside such brilliant colleagues as @chrisgpotts and @michaelfrank .
@douwekiela
Douwe Kiela
2 years
Personal news! Today is my first day @huggingface ! I'm going to help build out a world-class research lab. Thrilled to be joining such an amazing team! In my new role I plan to be more active on Twitter so be sure to follow for updates 🤗
56
35
1K
9
16
391
@douwekiela
Douwe Kiela
1 year
Life update: After working with the most wonderful team, coining "BLOOM" and winning EMNLP best demo, I have moved on from @huggingface . I'm very excited for what's next!
14
5
396
@douwekiela
Douwe Kiela
2 years
Today, we introduce Evaluation on the Hub, a new tool in the @HuggingFace ecosystem that lets you evaluate any model on any dataset without a single line of code! Find out more in the blog post: .
Tweet media one
5
66
369
@douwekiela
Douwe Kiela
7 months
My @Stanford @stanfordnlp CS224N lecture on Multimodal Deep Learning is online! I've been saying for a long time now that this multimodal thing is going to be big one day ;)
9
58
323
@douwekiela
Douwe Kiela
2 years
If there was an open source @huggingface library centered on evaluation, what do you think should definitely be in it?
42
21
229
@douwekiela
Douwe Kiela
3 years
I'm very excited to share the next step in the @DynabenchAI journey: Dynaboard. Paper: Blog: Moving beyond accuracy to dynamic, holistic model evaluation in NLP. Example: 1/5
1
55
185
@douwekiela
Douwe Kiela
2 years
BERT and GPT-2 don't know about Covid and think Obama is the president of the US (if only). They're still used all over the place in research and production 🤔 Let's start exploring continuously updating language models:
@TristanThrush
Tristan Thrush
2 years
We're going to do it! We'll train and release masked and causal language models (e.g. BERT & GPT-2) on new Common Crawl snapshots as they come out! We call this project Online Language Modeling (OLM). What applications or research questions can we enable or help answer? A 🧵:
10
62
497
7
10
174
@douwekiela
Douwe Kiela
4 years
Supervised multimodal bitransformers now available in the awesome HuggingFace Transformers library!
@huggingface
Hugging Face
4 years
Transformers 2.4.0 is out 🤗 - Training transformers from scratch is now supported - New models, including *FlauBERT*, Dutch BERT, *UmBERTo* - Revamped documentation - First multi-modal model, MMBT from @facebookai , text & images Bye bye Python 2 🙃
7
168
685
1
24
118
@douwekiela
Douwe Kiela
9 months
Progress in AI continues to outpace benchmarks. Check out this new plot, inspired by @DynabenchAI , that shows just how quickly it's happening. Read more about it here:
Tweet media one
6
28
116
@douwekiela
Douwe Kiela
2 years
I gave a talk about @DynabenchAI at @GoogleAI yesterday and figured people might find the slides interesting: An alternative title would have been "humans and models in loops" :)
0
8
116
@douwekiela
Douwe Kiela
2 years
I am in the UK this week to visit our brand new @HuggingFace London office! Who's up for a pint in a pub? Shoot me a message! 🤗
4
3
106
@douwekiela
Douwe Kiela
1 year
There seems to be a huge appetite for a "BigScience for RLHF" (here at #EMNLP2022 at least), where as a community we pool resources to enable an open source #ChatGPT . What do you think that should look like?
2
15
99
@douwekiela
Douwe Kiela
1 year
Come see our @huggingface demos at #EMNLP2022 ! We have swag (t-shirts, hats, stickers)!!
Tweet media one
2
5
96
@douwekiela
Douwe Kiela
4 years
Just updated the ANLI paper with the #acl2020nlp camera ready: . Lots of extra stuff: more analysis on the value of dynamic adversarial data collection, details on annotators and more discussion. (1/2)
Tweet media one
@douwekiela
Douwe Kiela
5 years
Excited (in my 1st tweet ever!) to announce Adversarial NLI: a new large-scale benchmark dataset for NLU, and a challenge to the community. Great job by @EasonNie , together with @adinamwilliams @em_dinan @mohitban47 and @jaseweston .
4
92
298
1
21
77
@douwekiela
Douwe Kiela
4 years
Exciting! Hateful Memes now enters the final phase that will decide the winners, with a new unseen test set. The best model on the seen test set so far scored 0.8566 vs 0.7141 for VisualBERT - it's going to be very interesting to see what people have come up with.
@drivendataorg
DrivenData
4 years
Can you help detect #hatespeech in internet memes? Phase 2 of the Hateful Memes Challenge is now live! Each team has until the end of October to make up to 3 submissions against a new, unseen test set & win a share of $100k! Good luck! @facebookai
Tweet media one
2
8
32
1
12
72
@douwekiela
Douwe Kiela
2 years
We know that large language models are very sensitive to prompts in few-shot learning, but @maxbartolo pointed out to me that the ground truth labels don't actually matter all that much! Check out this example for BLOOM, with opposing labels -- is this something that's well-known?
Tweet media one
Tweet media two
8
7
71
@douwekiela
Douwe Kiela
6 months
The AI evaluation crisis continues..
@lmsysorg
lmsys.org
6 months
Catch me if you can! Announcing Llama-rephraser: 13B models reaching GPT-4 performance in major benchmarks (MMLU/GSM-8K/HumanEval)! To validate results, we followed OpenAI's decontamination method and found no evidence of contamination... 🤔 Blog: [1/n]
Tweet media one
22
156
927
3
7
71
@douwekiela
Douwe Kiela
1 year
Big tech is on a freeze ❄❄ but at @HuggingFace we are hiring interns! 🤗 Just announced:
@huggingface
Hugging Face
1 year
We're hiring interns! 👏 If you're interested in working on cutting-edge problems in AI and ML, come take a look at our internship list: Hiring for many teams: Science, OSS, Gradio, Moonshot, ... Hiring remotely, as well as in-person in many places! 🤗
8
70
256
3
7
69
@douwekiela
Douwe Kiela
1 year
Very excited about this collaboration with @arxiv ! Demos are super important for understanding research contributions, and for making cutting edge AI research more broadly accessible:
@arxiv
arXiv.org
1 year
We are launching a new arXivLabs collaboration with @HuggingFace to make demos related to papers in cs, stats, and eess directly accessible from arXiv!
23
426
3K
4
6
67
@douwekiela
Douwe Kiela
2 years
Winoground: the exact same words in a different order visually depict very different things - can SOTA vision and language models "ground" the captions correctly? Turns out they can't! Even CLIP does not outperform a random baseline..
Tweet media one
@TristanThrush
Tristan Thrush
2 years
Happy to announce our new CVPR paper - Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality. All tested SOTA multimodal models perform very poorly on our new vision-language eval dataset. Paper: #CVPR2022 , #NLProc 1/5
8
44
247
1
9
60
@douwekiela
Douwe Kiela
4 years
New work! What happens when you add retrieval over Wikipedia to the "work horse of NLP", sequence-to-sequence models? Turns out it works really well. Amazing job by @PSH_Lewis , @EthanJPerez and other @facebookai colleagues.
@PSH_Lewis
Patrick Lewis
4 years
Thrilled to share new work! "Retrieval-Augmented Generation for Knowledge-Intensive NLP tasks". Big gains on Open-Domain QA, with new State-of-the-Art results on NaturalQuestions, CuratedTrec and WebQuestions. check out here: . 1/N
Tweet media one
4
150
562
0
11
46
@douwekiela
Douwe Kiela
7 months
@mustafasuleyman It's a bit more subtle though. We haven't "solved" these problems - we've only surpassed human performance on very narrow and very flawed evaluation benchmarks. The story behind this plot is documented in the original blog post .
2
6
48
@douwekiela
Douwe Kiela
10 months
LLMs are increasingly used in mission critical settings, but #LLM security is woefully understudied. Here are 12 of the most common risks and threats: …a 🧵 (1/17)
Tweet media one
3
18
45
@douwekiela
Douwe Kiela
4 years
Yann did an incredible job as a member of the Facebook AI residency program, leading to a spotlight at #Neurips2020 that I think will have big repercussions:
@yanndubs
Yann Dubois
4 years
New paper: we characterize optimal representations for supervised learning, and show how to ~learn them! Our framework gives 1) a regularizer and 2) a predictor of generalization in DL. (NeurIPS spotlight) with @douwekiela @davidjschwab @Rama_vedantam 1/7
Tweet media one
3
66
240
0
4
39
@douwekiela
Douwe Kiela
10 months
Vision augmented language models outperform multimodal pretraining. Introducing LENS:
Tweet media one
@w33lliam
William Berrios
10 months
Announcing LENS 🔎, a framework for vision-augmented language models. - Outperforms Flamingo by 9% (56->65%) on VQAv2 - Eliminates the additional cost of multimodal pre-training Demo: Blog+Paper+Code: A 🧵 [1/N]
2
42
192
0
9
41
@douwekiela
Douwe Kiela
4 years
Awesome! RAG (accepted at #NeurIPS2020 ) is now available in @HuggingFace Transformers. Great work by @olapiktus . Also check out this super cool demo by @YJernite :
@AIatMeta
AI at Meta
4 years
Our Retrieval Augmented Generation #NLP model is now available as part of the @HuggingFace transformer library. The true strength of RAG is in its flexibility. You control what it knows simply by swapping out the documents it uses for knowledge retrieval.
8
202
644
1
2
38
@douwekiela
Douwe Kiela
10 months
Pretty good! (Try it at )
Tweet media one
3
6
36
@douwekiela
Douwe Kiela
2 years
Very important, we can now search directly through the corpus that was used to train a LLM to understand it better:
@olapiktus
Ola Piktus
2 years
Why do LMs say what they say? We often don't know - but now we might get a better idea. My first project with @huggingface , the ROOTS search tool goes live today 🤗🌸 It allows anyone to browse through the 1.6TB of the corpus behind @BigScience 's BLOOM 🚀🧵
Tweet media one
12
93
467
1
1
36
@douwekiela
Douwe Kiela
2 years
This project started as an exploration of fairness metrics for @DynabenchAI more than a year ago and became so much more - a large human-annotated dataset, a hopefully useful tool, and "fairer" language models (without sacrificing on accuracy).
@adinamwilliams
Adina Williams
2 years
We're happy to announce our new preprint on perturbation augmentation for fairer NLP! We trained a seq2seq control-gen model to "perturb" demographic references. We use it to pretrain & finetune LMs that are fairer, without sacrificing accuracy. We measure fairness with it too 🧵
Tweet media one
5
40
191
0
3
35
@douwekiela
Douwe Kiela
7 months
I've heard people say the AI revolution will impact the world on the same scale as the internet revolution. I say that's wrong: in retrospect, the primary purpose of the Web will have been to ensure that we could get enough data to train AI. Much bigger revolution incoming.
1
4
35
@douwekiela
Douwe Kiela
2 years
Thank you @acmmm2022 for having me! For those who missed it and are curious -- here are my slides, about ten years of multimodal machine learning adventures:
@acmmm2022
ACM Multimedia
2 years
๐Ÿ‘ Thanks, @douwekiela for the great keynote!
Tweet media one
0
1
12
0
3
32
@douwekiela
Douwe Kiela
3 years
The call for proposals for the Competition Track @NeurIPSConf 2021 is out: . Do not miss out on your chance to be part of this exciting track!
0
7
31
@douwekiela
Douwe Kiela
2 years
The community is finally starting to take evaluation more seriously and now is an exciting time for pushing the boundaries on improved measurement. For us at @huggingface , this is just the beginning - we have a lot more exciting stuff in store in the next few weeks! (4/4)
1
2
30
@douwekiela
Douwe Kiela
11 months
It's as if BERT came out and we all started evaluating our LSTMs using the "agreement with BERT" metric.. Would we then be surprised that "oh models that are more like BERT do better"? GPT4 is great for data annotation but obviously imperfect for proper evaluation!
@yizhongwyz
Yizhong Wang
11 months
One more thing - about using GPT4 as the evaluator. We found GPT4 favors ShareGPT data a lot. But why? We further discovered a superficial feature that is strongly correlated - # of unique tokens in the output! Maybe it's measuring the informativeness over all else? 😀
Tweet media one
3
28
133
0
2
31
@douwekiela
Douwe Kiela
5 months
Very excited about this work by @ethayarajh and @winniethexu on better faster cheaper alignment of LLMs -- KTO outperforms DPO, especially for the bigger models, without needing to spend on pairwise data annotation.
@ethayarajh
Kawin Ethayarajh
5 months
📢 The problem in model alignment no one talks about - the need for preference data, which costs $$$ and time! Enter Kahneman-Tversky Optimization (KTO), which matches or exceeds DPO without paired preferences. And with it, the largest-ever suite of feedback-aligned LLMs. 🧵
Tweet media one
19
130
698
1
4
29
@douwekiela
Douwe Kiela
10 months
Two central topics in the LLM literature are "alignment" and "hallucination". But what do these terms really mean? I think over time their meaning has kind of shifted, and now a lot of folks are confused.. 🧵
3
6
30
@douwekiela
Douwe Kiela
2 years
At #CVPR2022 , @TristanThrush and @apsdehal presented Winoground and FLAVA, two papers I'm very excited by. They also built some really cool demos and open sourced everything. It's all available in the 🤗 @huggingface ecosystem of course! ⬇️ (1/n)
1
2
28
@douwekiela
Douwe Kiela
8 months
Evaluation is going to be big business. What a fantastic team this is:
@PatronusAI
PatronusAI
8 months
We are launching out of stealth today with a $3M seed round led by @lightspeedvp , with participation from @amasad , @gokulr , @MattHartman and other Fortune 500 execs and board members 🚀 Read our story here:
11
30
184
0
4
25
@douwekiela
Douwe Kiela
2 years
.. for a quick primer, see @lvwerra 's official announcement: (2/4)
@lvwerra
Leandro von Werra
2 years
Evaluation is one of the most important aspects of ML but today's evaluation landscape is scattered and undocumented which makes evaluation unnecessarily hard. For that reason we are excited to release 🤗 Evaluate! Let's take a tour:
Tweet media one
12
339
2K
1
1
25
@douwekiela
Douwe Kiela
3 years
The Hateful Memes Challenge winners have been announced! You can find out more about the winning solutions at the NeurIPS competition event, happening today at 5pm Vancouver time.
@AIatMeta
AI at Meta
3 years
Hate speech can come in many forms, including memes that combine text & images. We launched the Hateful Memes Challenge, a first-of-its-kind competition, to help the AI community find new ways to detect multimodal hate speech. Learn about the winners here:
21
52
172
0
2
23
@douwekiela
Douwe Kiela
2 years
Great article in @ScienceMagazine by @SilverJacket on benchmarking in AI, covering @DynabenchAI as one way forward. So much important work to do in machine learning evaluation!
@SilverJacket
Matthew Hutson
2 years
Computers ace IQ tests but still make dumb mistakes. Can better AI benchmarks help? My feature in @ScienceMagazine :
6
15
38
0
5
24
@douwekiela
Douwe Kiela
8 months
"One major missing feature is better retrieval" - exactly right! We are only just getting started when it comes to true enterprise-grade LLMs.
@DrJimFan
Jim Fan
8 months
ChatGPT Enterprise: the beginning of the end of many B2B thin wrapper startups. Finally addresses the data privacy ("no training" commitment) and security issues. Extra goodies are long context (32k), 2x faster inference, and Code Interpreter. I bet the App Store will be next…
Tweet media one
47
170
1K
0
5
23
@douwekiela
Douwe Kiela
10 months
I had the great pleasure of recording a podcast with one of the best podcasters out there, @HarryStebbings from @20vcFund . We talk about all things #AI , from fundraising to moats to scaling laws to regulation's impact on innovation. Give it a listen!
1
4
23
@douwekiela
Douwe Kiela
10 months
Very insightful - LLMs from the perspective of developmental psychology:
@mcxfrank
Michael C. Frank
10 months
How do we use methods from developmental psychology to assess AI models? My comment, "'Baby steps' in evaluating the capacities of large language models" is now out in Nature Reviews Psychology:
Tweet media one
6
98
392
0
1
24
@douwekiela
Douwe Kiela
5 years
To paraphrase Shakespeare, there is something rotten in the state of the art. Adversarial NLI was collected using HAMLET (human-and-model-in-the-loop entailment training) to create a "moving post" dynamic dataset, rather than a static benchmark that will saturate quickly.
0
1
22
@douwekiela
Douwe Kiela
4 years
Static benchmarks have been extremely important for AI, but also have well-known issues: they saturate quickly, are susceptible to overfitting, contain exploitable annotator artifacts and biases, and have unclear or imperfect evaluation metrics. 2/n
1
2
21
@douwekiela
Douwe Kiela
4 years
Turning the famous Fred Jelinek quote around: "Every time I _hire_ a linguist, the performance of my NLP system goes up" is one of the goals here. Everyone can build models that cannot be fooled as easily; and everyone can help create examples to train next-gen AI systems. 4/n
1
6
22
@douwekiela
Douwe Kiela
2 years
BigScience for code: BigCode! Very important to develop large language models for code out in the open, for the whole community to learn from and build upon.
@BigCodeProject
BigCode
2 years
print("Hello world! 🎉") Excited to announce the BigCode project led by @ServiceNowRSRCH and @huggingface ! In the spirit of BigScience we aim to develop large language models for code in an open and responsible way. Join here: A thread with our goals 🧵
Tweet media one
5
72
216
1
2
22
@douwekiela
Douwe Kiela
3 years
The Hateful Memes Challenge has a new home! Also, did you know that there's a shared task on fine-grained multi-modal hate speech detection at the WOAH 2021 workshop @aclmeeting ? Still a couple of weeks left!
Tweet media one
0
5
21
@douwekiela
Douwe Kiela
2 years
There's also a sort of "rebuttal" to that paper, which finds that "the ground-truth input-label mapping is a crucial component for successful in-context learning": . Intriguing!
3
1
21
@douwekiela
Douwe Kiela
4 years
Dynabench, in essence, is a scientific experiment: can we make faster progress if we collect data dynamically, with humans and models in the loop, rather than in the old-fashioned static way? 3/n
Tweet media one
1
1
17
@douwekiela
Douwe Kiela
11 months
I'd like to personally give a massive thanks to the folks who participated in our seed round, including @BainCapVC , @Greycroft , @lightspeed , as well as our incredible set of angel investors, including @eladgil , @lip_bu , @svangel , @sarahniyogi , @amasad , @saranormous , ... 3/5
2
1
21
@douwekiela
Douwe Kiela
4 years
Can we learn to decompose difficult questions into easier ones, without supervision? Turns out we can! Great job by @EthanJPerez
@EthanJPerez
Ethan Perez
4 years
New! "Unsupervised Question Decomposition for Question Answering": We decompose a hard Q into several, easier Qs with *unsupervised learning*, improving multi-hop QA on HotpotQA without extra supervision. w/ @PSH_Lewis @scottyih @kchonyc @douwekiela (1/n)
Tweet media one
4
68
258
0
3
20
@douwekiela
Douwe Kiela
2 years
One of our goals with the 🤗Evaluate library is to make it easy to move beyond accuracy-based metrics, and to look at model evaluation more holistically. Here's @SashaMTL showcasing this for LLM bias evaluation:
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
2 years
Wanna know what kinds of biases are hidden in your large language model? We've made some tools to help you figure that out 🤗 Check out our blog (and accompanying Jupyter notebook!) to learn more:
5
44
190
1
1
20
@douwekiela
Douwe Kiela
4 years
@steven_vdgraaf @Thom_Wolf @EasonNie @adinamwilliams @em_dinan @mohitban47 @jaseweston It is now, with version 0.1 of the dataset available for download! Note that there is also a cool demo available, where you can play with a BERT model and discover its weaknesses: .
0
7
19
@douwekiela
Douwe Kiela
2 years
Duocorn (twonicorn?) 🦄🦄 status unlocked! 🚀
@huggingface
Hugging Face
2 years
🤗🚀
Tweet media one
98
239
2K
0
0
20
@douwekiela
Douwe Kiela
2 years
Come see our poster @NeurIPSConf on human-adversarial visual question answering at poster session 1 today! Also, we've teamed up with the awesome AVQA () folks to provide one central benchmark - check out ! #NeurIPS2021
@apsdehal
Amanpreet Singh
3 years
Vision and language research has made great progress, but how good are we really, and what are we still missing? We examine these questions in Human-Adversarial VQA: 1/6
Tweet media one
2
25
118
1
4
17
@douwekiela
Douwe Kiela
4 years
One major goal is better alignment between AI systems and people. The metric for evaluating an AI system should not (only) be accuracy on some static dataset, but: how many mistakes do you make when interacting with a potentially adversarial human being? 6/n
1
1
17
@douwekiela
Douwe Kiela
11 months
Extra special thank you to the team for helping us reach this milestone - so excited to be going on this journey with you: @moinnadeem , @realgmittal , @JBaeThoughts , @w33lliam , @TristanThrush , @caseyfitz , @yaroslavvb , and many more to come! 5/5
6
1
17
@douwekiela
Douwe Kiela
2 years
This is seriously cool: want to try out some prompts on @BigscienceW Bloom *as it is training*? Add them to the Bloom Book!
1
1
17
@douwekiela
Douwe Kiela
2 years
Simple and super fast few-shot learning, without prompts, outperforming GPT-3 on the RAFT benchmark.
@_lewtun
Lewis Tunstall
2 years
🔥 Excited to share new research from @huggingface & @IntelAIResearch on few-shot learning with language models 🔥 We introduce SetFit - a simple, yet sample efficient approach based on Sentence Transformers 🤖 A 🧵 on what it's all about ...
Tweet media one
16
86
392
0
3
18
@douwekiela
Douwe Kiela
2 years
The @BigscienceW AMA on Reddit r/MachineLearning is starting in a bit!
@BigscienceW
BigScience Research Workshop
2 years
Reminder: there will be a Reddit AMA session at r/MachineLearning on the BigScience multilingual 176B model training ( @BigScienceLLM ) & BigScience tomorrow (Thursday March 24th) starting at 5pm CET with a dozen participants of BigScience Follow it here:
0
7
30
0
1
16
@douwekiela
Douwe Kiela
2 years
@_lewtun
Lewis Tunstall
2 years
Excited to share a new tool we've built called Evaluation on the Hub 🔥🔥🔥! With this tool you can evaluate any model on any dataset with any metric 🤯 Evaluate your models here 👉 Let's take a look at how it works 🧵 1/
Tweet media one
7
81
299
1
0
16
@douwekiela
Douwe Kiela
4 years
Hate speech is an important societal problem and multimodal hate speech is particularly difficult. While many other multimodal tasks often allow you to sort of rely on one modality, this dataset was constructed to *require* multimodal reasoning to succeed. (2/7)
2
3
11
@douwekiela
Douwe Kiela
10 months
How do long context LLMs use their input? Very cool deep dive:
0
2
15
@douwekiela
Douwe Kiela
2 years
🌸🌸🌸 It's here!!!
@BigscienceW
BigScience Research Workshop
2 years
BLOOM is here. The largest open-access multilingual language model ever. Read more about it or get it at
Tweet media one
29
816
3K
0
0
15
@douwekiela
Douwe Kiela
3 years
I really enjoyed chatting with @pdasigi and @alexisjross about the future of benchmarking. Thanks for having me!
@pdasigi
Pradeep Dasigi
3 years
#nlphighlights 128: @alexisjross and I discussed dynamic benchmarking and leaderboards with @douwekiela for this episode. I really enjoyed this discussion. Thanks Douwe for joining us.
0
13
42
0
3
15
@douwekiela
Douwe Kiela
1 year
Great tool for exploring (the many) biases in Stable Diffusion:
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
1 year
What's the difference between these two groups of people? Well, according to Stable Diffusion, the first group represents an 'ambitious CEO' and the second a 'supportive CEO'. I made a simple tool to explore biases ingrained in this model:
Tweet media one
Tweet media two
21
103
364
0
2
14
@douwekiela
Douwe Kiela
8 months
Selecting good models and prompts can be cumbersome and expensive in the era of big benchmarks. Can we speed up prototyping? Yes we can:
@RajanVivek52643
Rajan Vivek
8 months
Can you reliably evaluate your model with just a handful of test examples? Yes, you often can! Anchor Points are tiny -- but surprisingly representative -- subsets of benchmarks. They can predict which other points the model will fail on… without evaluating on those points! 🧵
Tweet media one
5
38
261
1
1
14
@douwekiela
Douwe Kiela
3 years
This new work makes Dynabench even more dynamic: not only are the datasets and models dynamic, now the metrics and leaderboards can be too! Awesome team effort by @MZhiyi , @ethayarajh , @TristanThrush , @somya_j , @LedellWu , @robinomial , @ChrisGPotts and @adinamwilliams . 5/5
0
0
13
@douwekiela
Douwe Kiela
4 years
These ideas have been around for quite a while, for example in Build it Break it: the Language Edition (), Adversarial NLI () and Beat the AI (). Dynabench provides a platform for further exploration. 5/n
1
0
12
@douwekiela
Douwe Kiela
4 years
There's a demo here: . Awesome work by @EasonNie , together with @adinamwilliams @em_dinan @mohitban47 and @jaseweston . (2/2)
Tweet media one
0
2
11
@douwekiela
Douwe Kiela
4 years
The time is ripe to radically rethink benchmarking in AI. Models are now good enough to be "put in the loop" with humans. With some creativity, they're still easy to fool. Dynabench will allow us to build more robust models, and shed light on SOTA models' weaknesses. 8/n
1
0
11
@douwekiela
Douwe Kiela
3 years
Once is back up @aclmeeting , come check out this poster by @shengs1123 ! Did you know that you can keep transformer layers randomly initialized for faster training and better generalization? #ACL2021 #ACL2021NLP
@shengs1123
Sheng Shen
3 years
We are presenting "Reservoir Transformers" at ACL poster session now, welcome to stop by.
0
0
9
0
2
12
@douwekiela
Douwe Kiela
2 years
.. and @SashaMTL 's overview thread: (3/4)
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
2 years
Evaluation is arguably the most important part of machine learning but let's be honest, it can be confusing, complicated and hard to find best practices. We are trying to change this with the launch of Hugging Face 🤗 Evaluate:
Tweet media one
3
53
213
1
0
12
@douwekiela
Douwe Kiela
10 months
Super cool to see paper reading livestreams like this! (also, very jealous of that background)
@bhutanisanyam1
Sanyam Bhutani
10 months
Making LLMs multi-modal without training! 🙏 An amazing read on how to effectively combine captioning models with LLMs to outperform multi-modal models. The main challenge w multi-modal LLMs is the much larger compute requirement or need for larger datasets. This paper…
Tweet media one
4
26
149
1
0
11
@douwekiela
Douwe Kiela
2 years
Check out these FLAVA-based demos: And this one for Winoground: Loading FLAVA in @huggingface transformers and playing with it is super easy! (2/n)
Tweet media one
1
2
10
@douwekiela
Douwe Kiela
2 years
I wish Einstein had done this, it would have been so much easier. "A New Paradigm for Brownian Motion", or "A New (Special/General) Paradigm for Relativity".
@alisawuffles
Alisa Liu
2 years
We introduce a new paradigm for dataset creation based on human 🧑‍💻 and machine 🤖 collaboration, which brings together the generative strength of LMs and the evaluative strength of humans. And we collect 🎉 WaNLI, a dataset of 108K NLI examples! 🧵 Paper:
Tweet media one
14
215
1K
1
0
11
@douwekiela
Douwe Kiela
3 years
#NLProc is overly focused on accuracy as the sole performance metric. We advocate for a more comprehensive multi-metric approach, incentivizing greener and fairer models. 2/5
1
0
11
@douwekiela
Douwe Kiela
6 months
We're looking for amazing solutions engineers and product engineers at @ContextualAI . Who should we hire? DMs open or apply via our job portal!
1
2
11
@douwekiela
Douwe Kiela
4 years
We are organizing a @NeurIPSConf competition around the challenge, hosted by @drivendataorg , with $100k in prize money! We know that new benchmarks play an important role in driving progress in AI, and we hope this challenge will do the same and spur innovation. (5/7)
1
0
10
@douwekiela
Douwe Kiela
3 years
@sleepinyourhat I guess it's subjective ;) I didn't even try to be "adversarial" - many more examples to find at .
Tweet media one
1
2
9