Douwe Kiela Profile
Douwe Kiela

@douwekiela

9,979 Followers
380 Following
28 Media
463 Statuses

@ContextualAI CEO, @Stanford Adjunct Prof.

Bay Area
Joined July 2013
Pinned Tweet
@douwekiela
Douwe Kiela
1 month
Very excited to finally share a bit more about what we've been working on! We've come a long way since the early days of RAG.
@ContextualAI
Contextual AI
1 month
Today, we're excited to announce RAG 2.0, our end-to-end system for developing production-grade AI. Using RAG 2.0, we've created Contextual Language Models (CLMs), which achieve state-of-the-art performance on a variety of industry benchmarks. CLMs outperform strong RAG…
Tweet media one
35
140
1K
1
8
57
@douwekiela
Douwe Kiela
2 years
Personal news! Today is my first day @huggingface ! I'm going to help build out a world-class research lab. Thrilled to be joining such an amazing team! In my new role I plan to be more active on Twitter so be sure to follow for updates 🤗
56
35
1K
@douwekiela
Douwe Kiela
2 years
🎉🎉🎉 Happy to officially announce the @HuggingFace 🤗 AI Research Residency Program, a great opportunity to launch or advance your career in machine learning research 🚀 Read more and/or apply here:
Tweet media one
13
224
1K
@douwekiela
Douwe Kiela
1 year
Woohoo we won #EMNLP2022 best demo! 🤗 That's three in a row for @huggingface 🤯🔥 2020 - Transformers 2021 - Datasets 2022 - Evaluate & Evaluation on the Hub
Tweet media one
11
61
775
@douwekiela
Douwe Kiela
2 years
🥳 We are hiring researchers and research interns! Apply here: . People with characteristics that are underrepresented in tech are especially encouraged to apply. We will also be having a residency program soon @HuggingFace , stay tuned! 🤗
8
131
507
@douwekiela
Douwe Kiela
4 years
I'm super excited to announce Dynabench - a new and ambitious research platform for dynamic data collection and benchmarking: 1/n
9
117
459
@douwekiela
Douwe Kiela
2 years
Machine learning is about algorithms, compute, data and measurement. The last one is often glossed over, but it's hugely important: to understand our amazing progress, and current limitations, we need to improve how we do evaluation. Today, we introduce 🤗Evaluate to help! (1/4)
Tweet media one
2
60
423
@douwekiela
Douwe Kiela
4 years
Excited to announce the Hateful Memes Challenge, a new dataset and competition for vision and language, focusing on multimodal understanding and reasoning: "Mean" memes shown here as an illustration of the problem. Some highlights: (1/7)
Tweet media one
13
101
402
@douwekiela
Douwe Kiela
11 months
Super excited to announce that @apsdehal and I have launched a new company: @ContextualAI ! Why did we start it? Because LLMs are going to radically change the way enterprises operate, and we see a huge need for LLMs that actually work for enterprise use cases. 1/5
38
39
427
@douwekiela
Douwe Kiela
2 years
In addition to my new role at @huggingface , I have joined @Stanford as an Adjunct Professor in Symbolic Systems. Very excited to get to work with bright students on cutting edge open source ML projects, alongside such brilliant colleagues as @chrisgpotts and @michaelfrank .
@douwekiela
Douwe Kiela
2 years
Personal news! Today is my first day @huggingface ! I'm going to help build out a world-class research lab. Thrilled to be joining such an amazing team! In my new role I plan to be more active on Twitter so be sure to follow for updates 🤗
56
35
1K
9
16
391
@douwekiela
Douwe Kiela
1 year
Life update: After working with the most wonderful team, coining "BLOOM" and winning EMNLP best demo, I have moved on from @huggingface . I'm very excited for what's next!
14
5
396
@douwekiela
Douwe Kiela
2 years
Today, we introduce Evaluation on the Hub, a new tool in the @HuggingFace ecosystem that lets you evaluate any model on any dataset without a single line of code! Find out more in the blog post: .
Tweet media one
5
66
369
@douwekiela
Douwe Kiela
7 months
My @Stanford @stanfordnlp CS224N lecture on Multimodal Deep Learning is online! I've been saying for a long time now that this multimodal thing is going to be big one day ;)
9
58
323
@douwekiela
Douwe Kiela
2 years
If there was an open source @huggingface library centered on evaluation, what do you think should definitely be in it?
42
21
229
@douwekiela
Douwe Kiela
3 years
I'm very excited to share the next step in the @DynabenchAI journey: Dynaboard. Paper: Blog: Moving beyond accuracy to dynamic, holistic model evaluation in NLP. Example: 1/5
1
55
185
@douwekiela
Douwe Kiela
2 years
BERT and GPT-2 don't know about Covid and think Obama is the president of the US (if only). They're still used all over the place in research and production 🤔 Let's start exploring continuously updating language models:
@TristanThrush
Tristan Thrush
2 years
We're going to do it! We'll train and release masked and causal language models (e.g. BERT & GPT-2) on new Common Crawl snapshots as they come out! We call this project Online Language Modeling (OLM). What applications or research questions can we enable or help answer? A 🧵:
10
62
497
7
10
174
@douwekiela
Douwe Kiela
4 years
Supervised multimodal bitransformers now available in the awesome HuggingFace Transformers library!
@huggingface
Hugging Face
4 years
Transformers 2.4.0 is out 🤗 - Training transformers from scratch is now supported - New models, including *FlauBERT*, Dutch BERT, *UmBERTo* - Revamped documentation - First multi-modal model, MMBT from @facebookai , text & images Bye bye Python 2 🙃
7
168
685
1
24
118
@douwekiela
Douwe Kiela
9 months
Progress in AI continues to outpace benchmarks. Check out this new plot, inspired by @DynabenchAI , that shows just how quickly it's happening. Read more about it here:
Tweet media one
6
28
116
@douwekiela
Douwe Kiela
2 years
I gave a talk about @DynabenchAI at @GoogleAI yesterday and figured people might find the slides interesting: An alternative title would have been "humans and models in loops" :)
0
8
116
@douwekiela
Douwe Kiela
2 years
I am in the UK this week to visit our brand new @HuggingFace London office! Who's up for a pint in a pub? Shoot me a message! 🤗
4
3
106
@douwekiela
Douwe Kiela
1 year
There seems to be a huge appetite for a "BigScience for RLHF" (here at #EMNLP2022 at least), where as a community we pool resources to enable an open source #ChatGPT . What do you think that should look like?
2
15
99
@douwekiela
Douwe Kiela
1 year
Come see our @huggingface demos at #EMNLP2022 ! We have swag (t-shirts, hats, stickers)!!
Tweet media one
2
5
96
@douwekiela
Douwe Kiela
4 years
Just updated the ANLI paper with the #acl2020nlp camera ready: . Lots of extra stuff: more analysis on the value of dynamic adversarial data collection, details on annotators and more discussion. (1/2)
Tweet media one
@douwekiela
Douwe Kiela
5 years
Excited (in my 1st tweet ever!) to announce Adversarial NLI: a new large-scale benchmark dataset for NLU, and a challenge to the community. Great job by @EasonNie , together with @adinamwilliams @em_dinan @mohitban47 and @jaseweston .
4
92
298
1
21
77
@douwekiela
Douwe Kiela
4 years
Exciting! Hateful Memes now enters the final phase that will decide the winners, with a new unseen test set. The best model on the seen test set so far scored 0.8566 vs 0.7141 for VisualBERT - it's going to be very interesting to see what people have come up with.
@drivendataorg
DrivenData
4 years
Can you help detect #hatespeech in internet memes? Phase 2 of the Hateful Memes Challenge is now live! Each team has until the end of October to make up to 3 submissions against a new, unseen test set & win a share of $100k! Good luck! @facebookai
Tweet media one
2
8
32
1
12
72
@douwekiela
Douwe Kiela
2 years
We know that large language models are very sensitive to prompts in few-shot learning, but @maxbartolo pointed out to me that the ground truth labels don't actually matter all that much! Check out this example for BLOOM, with opposing labels -- is this something that's well-known?
Tweet media one
Tweet media two
8
7
71
@douwekiela
Douwe Kiela
6 months
The AI evaluation crisis continues..
@lmsysorg
lmsys.org
6 months
Catch me if you can! Announcing Llama-rephraser: 13B models reaching GPT-4 performance in major benchmarks (MMLU/GSM-8K/HumanEval)! To validate results, we followed OpenAI's decontamination method and found no evidence of contamination... 🤔 Blog: [1/n]
Tweet media one
22
156
927
3
7
71
@douwekiela
Douwe Kiela
1 year
Big tech is on a freeze ❄❄ but at @HuggingFace we are hiring interns! 🤗 Just announced:
@huggingface
Hugging Face
1 year
We're hiring interns! 👏 If you're interested in working on cutting-edge problems in AI and ML, come take a look at our internship list: Hiring for many teams: Science, OSS, Gradio, Moonshot, ... Hiring remotely, as well as in-person in many places! 🤗
8
70
256
3
7
69
@douwekiela
Douwe Kiela
1 year
Very excited about this collaboration with @arxiv ! Demos are super important for understanding research contributions, and for making cutting edge AI research more broadly accessible:
@arxiv
arXiv.org
1 year
We are launching a new arXivLabs collaboration with @HuggingFace to make demos related to papers in cs, stats, and eess directly accessible from arXiv!
23
426
3K
4
6
67
@douwekiela
Douwe Kiela
2 years
Winoground: the exact same words in a different order visually depict very different things - can SOTA vision and language models "ground" the captions correctly? Turns out they can't! Even CLIP does not outperform a random baseline..
Tweet media one
@TristanThrush
Tristan Thrush
2 years
Happy to announce our new CVPR paper - Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality. All tested SOTA multimodal models perform very poorly on our new vision-language eval dataset. Paper: #CVPR2022 , #NLProc 1/5
8
44
247
1
9
60
@douwekiela
Douwe Kiela
4 years
New work! What happens when you add retrieval over Wikipedia to the "work horse of NLP", sequence-to-sequence models? Turns out it works really well. Amazing job by @PSH_Lewis , @EthanJPerez and other @facebookai colleagues.
@PSH_Lewis
Patrick Lewis
4 years
Thrilled to share new work! "Retrieval-Augmented Generation for Knowledge-Intensive NLP tasks". Big gains on Open-Domain QA, with new State-of-the-Art results on NaturalQuestions, CuratedTrec and WebQuestions. check out here: . 1/N
Tweet media one
4
150
562
0
11
46
@douwekiela
Douwe Kiela
7 months
@mustafasuleyman It's a bit more subtle though. We haven't "solved" these problems - we've only surpassed human performance on very narrow and very flawed evaluation benchmarks. The story behind this plot is documented in the original blog post .
2
6
48
@douwekiela
Douwe Kiela
10 months
LLMs are increasingly used in mission critical settings, but #LLM security is woefully understudied. Here are 12 of the most common risks and threats: …a 🧵 (1/17)
Tweet media one
3
18
45
@douwekiela
Douwe Kiela
4 years
Yann did an incredible job as a member of the Facebook AI residency program, leading to a spotlight at #Neurips2020 that I think will have big repercussions:
@yanndubs
Yann Dubois
4 years
New paper: we characterize optimal representations for supervised learning, and show how to ~learn them! Our framework gives 1) a regularizer and 2) a predictor of generalization in DL. (NeurIPS spotlight) with @douwekiela @davidjschwab @Rama_vedantam 1/7
Tweet media one
3
66
240
0
4
39
@douwekiela
Douwe Kiela
10 months
Vision augmented language models outperform multimodal pretraining. Introducing LENS:
Tweet media one
@w33lliam
William Berrios
10 months
Announcing LENS 🔎, a framework for vision-augmented language models. - Outperforms Flamingo by 9% (56->65%) on VQAv2 - Eliminates the additional cost of multimodal pre-training Demo: Blog+Paper+Code: A 🧵 [1/N]
2
42
192
0
9
41
@douwekiela
Douwe Kiela
4 years
Awesome! RAG (accepted at #NeurIPS2020 ) is now available in @HuggingFace Transformers. Great work by @olapiktus . Also check out this super cool demo by @YJernite :
@AIatMeta
AI at Meta
4 years
Our Retrieval Augmented Generation #NLP model is now available as part of the @HuggingFace transformer library. The true strength of RAG is in its flexibility. You control what it knows simply by swapping out the documents it uses for knowledge retrieval.
8
202
644
1
2
38
@douwekiela
Douwe Kiela
10 months
Pretty good! (Try it at )
Tweet media one
3
6
36
@douwekiela
Douwe Kiela
2 years
Very important, we can now search directly through the corpus that was used to train a LLM to understand it better:
@olapiktus
Ola Piktus
2 years
Why do LMs say what they say? We often don't know - but now we might get a better idea. My first project with @huggingface , the ROOTS search tool goes live today 🤗🌸 It allows anyone to browse through the 1.6TB of the corpus behind @BigScience 's BLOOM 🚀🧵
Tweet media one
12
93
467
1
1
36
@douwekiela
Douwe Kiela
2 years
This project started as an exploration of fairness metrics for @DynabenchAI more than a year ago and became so much more - a large human-annotated dataset, a hopefully useful tool, and "fairer" language models (without sacrificing on accuracy).
@adinamwilliams
Adina Williams
2 years
We're happy to announce our new preprint on perturbation augmentation for fairer NLP! We trained a seq2seq control-gen model to "perturb" demographic references. We use it to pretrain & finetune LMs that are fairer, without sacrificing accuracy. We measure fairness with it too 🧵
Tweet media one
5
40
191
0
3
35
@douwekiela
Douwe Kiela
7 months
I've heard people say the AI revolution will impact the world on the same scale as the internet revolution. I say that's wrong: in retrospect, the primary purpose of the Web will have been to ensure that we could get enough data to train AI. Much bigger revolution incoming.
1
4
35
@douwekiela
Douwe Kiela
2 years
Thank you @acmmm2022 for having me! For those who missed it and are curious -- here are my slides, about ten years of multimodal machine learning adventures:
@acmmm2022
ACM Multimedia
2 years
๐Ÿ‘ Thanks, @douwekiela for the great keynote!
Tweet media one
0
1
12
0
3
32
@douwekiela
Douwe Kiela
3 years
The call for proposals for the Competition Track @NeurIPSConf 2021 is out: . Do not miss out on your chance to be part of this exciting track!
0
7
31
@douwekiela
Douwe Kiela
2 years
The community is finally starting to take evaluation more seriously and now is an exciting time for pushing the boundaries on improved measurement. For us at @huggingface , this is just the beginning - we have a lot more exciting stuff in store in the next few weeks! (4/4)
1
2
30
@douwekiela
Douwe Kiela
11 months
It's as if BERT came out and we all started evaluating our LSTMs using the "agreement with BERT" metric.. Would we then be surprised that "oh models that are more like BERT do better"? GPT4 is great for data annotation but obviously imperfect for proper evaluation!
@yizhongwyz
Yizhong Wang
11 months
One more thing - about using GPT4 as the evaluator. We found GPT4 favors ShareGPT data a lot. But why? We further discovered a superficial feature that is strongly correlated - # of unique tokens in the output! Maybe it's measuring the informativeness over all else? 😀
Tweet media one
3
28
133
0
2
31
@douwekiela
Douwe Kiela
5 months
Very excited about this work by @ethayarajh and @winniethexu on better faster cheaper alignment of LLMs -- KTO outperforms DPO, especially for the bigger models, without needing to spend on pairwise data annotation.
@ethayarajh
Kawin Ethayarajh
5 months
📢 The problem in model alignment no one talks about - the need for preference data, which costs $$$ and time! Enter Kahneman-Tversky Optimization (KTO), which matches or exceeds DPO without paired preferences. And with it, the largest-ever suite of feedback-aligned LLMs. 🧵
Tweet media one
19
130
698
1
4
29
@douwekiela
Douwe Kiela
10 months
Two central topics in the LLM literature are "alignment" and "hallucination". But what do these terms really mean? I think over time their meaning has kind of shifted, and now a lot of folks are confused.. 🧵
3
6
30
@douwekiela
Douwe Kiela
2 years
At #CVPR2022 , @TristanThrush and @apsdehal presented Winoground and FLAVA, two papers I'm very excited by. They also built some really cool demos and open sourced everything. It's all available in the 🤗 @huggingface ecosystem of course! ⬇️ (1/n)
1
2
28
@douwekiela
Douwe Kiela
8 months
Evaluation is going to be big business. What a fantastic team this is:
@PatronusAI
PatronusAI
8 months
We are launching out of stealth today with a $3M seed round led by @lightspeedvp , with participation from @amasad , @gokulr , @MattHartman and other Fortune 500 execs and board members 🚀 Read our story here:
11
30
184
0
4
25
@douwekiela
Douwe Kiela
2 years
.. for a quick primer, see @lvwerra 's official announcement: (2/4)
@lvwerra
Leandro von Werra
2 years
Evaluation is one of the most important aspects of ML but today's evaluation landscape is scattered and undocumented which makes evaluation unnecessarily hard. For that reason we are excited to release 🤗 Evaluate! Let's take a tour:
Tweet media one
12
339
2K
1
1
25
@douwekiela
Douwe Kiela
3 years
The Hateful Memes Challenge winners have been announced! You can find out more about the winning solutions at the NeurIPS competition event, happening today at 5pm Vancouver time.
@AIatMeta
AI at Meta
3 years
Hate speech can come in many forms, including memes that combine text & images. We launched the Hateful Memes Challenge, a first-of-its-kind competition, to help the AI community find new ways to detect multimodal hate speech. Learn about the winners here:
21
52
172
0
2
23
@douwekiela
Douwe Kiela
2 years
Great article in @ScienceMagazine by @SilverJacket on benchmarking in AI, covering @DynabenchAI as one way forward. So much important work to do in machine learning evaluation!
@SilverJacket
Matthew Hutson
2 years
Computers ace IQ tests but still make dumb mistakes. Can better AI benchmarks help? My feature in @ScienceMagazine :
6
15
38
0
5
24
@douwekiela
Douwe Kiela
8 months
"One major missing feature is better retrieval" - exactly right! We are only just getting started when it comes to true enterprise-grade LLMs.
@DrJimFan
Jim Fan
8 months
ChatGPT Enterprise: the beginning of the end of many B2B thin wrapper startups. Finally addresses the data privacy ("no training" commitment) and security issues. Extra goodies are long context (32k), 2x faster inference, and Code Interpreter. I bet the App Store will be next…
Tweet media one
47
170
1K
0
5
23
@douwekiela
Douwe Kiela
10 months
I had the great pleasure of recording a podcast with one of the best podcasters out there, @HarryStebbings from @20vcFund . We talk about all things #AI , from fundraising to moats to scaling laws to regulation's impact on innovation. Give it a listen!
1
4
23
@douwekiela
Douwe Kiela
10 months
Very insightful - LLMs from the perspective of developmental psychology:
@mcxfrank
Michael C. Frank
10 months
How do we use methods from developmental psychology to assess AI models? My comment, "'Baby steps' in evaluating the capacities of large language models" is now out in Nature Reviews Psychology:
Tweet media one
6
98
392
0
1
24
@douwekiela
Douwe Kiela
5 years
To paraphrase Shakespeare, there is something rotten in the state of the art. Adversarial NLI was collected using HAMLET (human-and-model-in-the-loop entailment training) to create a "moving post" dynamic dataset, rather than a static benchmark that will saturate quickly.
0
1
22
@douwekiela
Douwe Kiela
4 years
Static benchmarks have been extremely important for AI, but also have well-known issues: they saturate quickly, are susceptible to overfitting, contain exploitable annotator artifacts and biases, and have unclear or imperfect evaluation metrics. 2/n
1
2
21
@douwekiela
Douwe Kiela
4 years
Turning the famous Fred Jelinek quote around: "Every time I _hire_ a linguist, the performance of my NLP system goes up" is one of the goals here. Everyone can build models that cannot be fooled as easily; and everyone can help create examples to train next-gen AI systems. 4/n
1
6
22
@douwekiela
Douwe Kiela
2 years
BigScience for code: BigCode! Very important to develop large language models for code out in the open, for the whole community to learn from and build upon.
@BigCodeProject
BigCode
2 years
print("Hello world! 🎉") Excited to announce the BigCode project led by @ServiceNowRSRCH and @huggingface ! In the spirit of BigScience we aim to develop large language models for code in an open and responsible way. Join here: A thread with our goals 🧵
Tweet media one
5
72
216
1
2
22
@douwekiela
Douwe Kiela
3 years
The Hateful Memes Challenge has a new home! Also, did you know that there's a shared task on fine-grained multi-modal hate speech detection at the WOAH 2021 workshop @aclmeeting ? Still a couple of weeks left!
Tweet media one
0
5
21
@douwekiela
Douwe Kiela
2 years
There's also a sort of "rebuttal" to that paper, which finds that "the ground-truth input-label mapping is a crucial component for successful in-context learning": . Intriguing!
3
1
21
@douwekiela
Douwe Kiela
4 years
Dynabench, in essence, is a scientific experiment: can we make faster progress if we collect data dynamically, with humans and models in the loop, rather than in the old-fashioned static way? 3/n
Tweet media one
1
1
17
@douwekiela
Douwe Kiela
11 months
I'd like to personally give a massive thanks to the folks who participated in our seed round, including @BainCapVC , @Greycroft , @lightspeed , as well as our incredible set of angel investors, including @eladgil , @lip_bu , @svangel , @sarahniyogi , @amasad , @saranormous , ... 3/5
2
1
21
@douwekiela
Douwe Kiela
4 years
Can we learn to decompose difficult questions into easier ones, without supervision? Turns out we can! Great job by @EthanJPerez
@EthanJPerez
Ethan Perez
4 years
New! "Unsupervised Question Decomposition for Question Answering": We decompose a hard Q into several, easier Qs with *unsupervised learning*, improving multi-hop QA on HotpotQA without extra supervision. w/ @PSH_Lewis @scottyih @kchonyc @douwekiela (1/n)
Tweet media one
4
68
258
0
3
20
@douwekiela
Douwe Kiela
2 years
One of our goals with the 🤗Evaluate library is to make it easy to move beyond accuracy-based metrics, and to look at model evaluation more holistically. Here's @SashaMTL showcasing this for LLM bias evaluation:
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
2 years
Wanna know what kinds of biases are hidden in your large language model? We've made some tools to help you figure that out 🤗 Check out our blog (and accompanying Jupyter notebook!) to learn more:
5
44
190
1
1
20
@douwekiela
Douwe Kiela
4 years
@steven_vdgraaf @Thom_Wolf @EasonNie @adinamwilliams @em_dinan @mohitban47 @jaseweston It is now, with version 0.1 of the dataset available for download! Note that there is also a cool demo available, where you can play with a BERT model and discover its weaknesses: .
0
7
19
@douwekiela
Douwe Kiela
2 years
Duocorn (twonicorn?) 🦄🦄 status unlocked! 🚀
@huggingface
Hugging Face
2 years
🤗🚀
Tweet media one
98
239
2K
0
0
20
@douwekiela
Douwe Kiela
2 years
Come see our poster @NeurIPSConf on human-adversarial visual question answering at poster session 1 today! Also, we've teamed up with the awesome AVQA () folks to provide one central benchmark - check out ! #NeurIPS2021
@apsdehal
Amanpreet Singh
3 years
Vision and language research has made great progress, but how good are we really, and what are we still missing? We examine these questions in Human-Adversarial VQA: 1/6
Tweet media one
2
25
118
1
4
17
@douwekiela
Douwe Kiela
4 years
One major goal is better alignment between AI systems and people. The metric for evaluating an AI system should not (only) be accuracy on some static dataset, but: how many mistakes do you make when interacting with a potentially adversarial human being? 6/n
1
1
17
@douwekiela
Douwe Kiela
11 months
Extra special thank you to the team for helping us reach this milestone - so excited to be going on this journey with you: @moinnadeem , @realgmittal , @JBaeThoughts , @w33lliam , @TristanThrush , @caseyfitz , @yaroslavvb , and many more to come! 5/5
6
1
17
@douwekiela
Douwe Kiela
2 years
This is seriously cool: want to try out some prompts on @BigscienceW Bloom *as it is training*? Add them to the Bloom Book!
1
1
17
@douwekiela
Douwe Kiela
2 years
Simple and super fast few-shot learning, without prompts, outperforming GPT-3 on the RAFT benchmark.
@_lewtun
Lewis Tunstall
2 years
🔥 Excited to share new research from @huggingface & @IntelAIResearch on few-shot learning with language models 🔥 We introduce SetFit - a simple, yet sample efficient approach based on Sentence Transformers 🤖 A 🧵 on what it's all about ...
Tweet media one
16
86
392
0
3
18
@douwekiela
Douwe Kiela
2 years
The @BigscienceW AMA on Reddit r/MachineLearning is starting in a bit!
@BigscienceW
BigScience Research Workshop
2 years
Reminder: there will be a Reddit AMA session at r/MachineLearning on the BigScience multilingual 176B model training ( @BigScienceLLM ) & BigScience tomorrow (Thursday March 24th) starting at 5pm CET with a dozen participants of BigScience Follow it here:
0
7
30
0
1
16
@douwekiela
Douwe Kiela
2 years
@_lewtun
Lewis Tunstall
2 years
Excited to share a new tool we've built called Evaluation on the Hub 🔥🔥🔥! With this tool you can evaluate any model on any dataset with any metric 🤯 Evaluate your models here 👉 Let's take a look at how it works 🧵 1/
Tweet media one
7
81
299
1
0
16
@douwekiela
Douwe Kiela
4 years
Hate speech is an important societal problem and multimodal hate speech is particularly difficult. While many other multimodal tasks often allow you to sort of rely on one modality, this dataset was constructed to *require* multimodal reasoning to succeed. (2/7)
2
3
11
@douwekiela
Douwe Kiela
10 months
How do long context LLMs use their input? Very cool deep dive:
0
2
15
@douwekiela
Douwe Kiela
2 years
🌸🌸🌸 It's here!!!
@BigscienceW
BigScience Research Workshop
2 years
BLOOM is here. The largest open-access multilingual language model ever. Read more about it or get it at
Tweet media one
29
816
3K
0
0
15
@douwekiela
Douwe Kiela
3 years
I really enjoyed chatting with @pdasigi and @alexisjross about the future of benchmarking. Thanks for having me!
@pdasigi
Pradeep Dasigi
3 years
#nlphighlights 128: @alexisjross and I discussed dynamic benchmarking and leaderboards with @douwekiela for this episode. I really enjoyed this discussion. Thanks Douwe for joining us.
0
13
42
0
3
15
@douwekiela
Douwe Kiela
1 year
Great tool for exploring (the many) biases in Stable Diffusion:
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
1 year
What's the difference between these two groups of people? Well, according to Stable Diffusion, the first group represents an 'ambitious CEO' and the second a 'supportive CEO'. I made a simple tool to explore biases ingrained in this model:
Tweet media one
Tweet media two
21
103
364
0
2
14
@douwekiela
Douwe Kiela
8 months
Selecting good models and prompts can be cumbersome and expensive in the era of big benchmarks. Can we speed up prototyping? Yes we can:
@RajanVivek52643
Rajan Vivek
8 months
Can you reliably evaluate your model with just a handful of test examples? Yes, you often can! Anchor Points are tiny -- but surprisingly representative -- subsets of benchmarks. They can predict which other points the model will fail on… without evaluating on those points! 🧵
Tweet media one
5
38
261
1
1
14
@douwekiela
Douwe Kiela
3 years
This new work makes Dynabench even more dynamic: not only are the datasets and models dynamic, now the metrics and leaderboards can be too! Awesome team effort by @MZhiyi , @ethayarajh , @TristanThrush , @somya_j , @LedellWu , @robinomial , @ChrisGPotts and @adinamwilliams . 5/5
0
0
13
@douwekiela
Douwe Kiela
4 years
These ideas have been around for quite a while, for example in Build it Break it: the Language Edition (), Adversarial NLI () and Beat the AI (). Dynabench provides a platform for further exploration. 5/n
1
0
12
@douwekiela
Douwe Kiela
4 years
There's a demo here: . Awesome work by @EasonNie , together with @adinamwilliams @em_dinan @mohitban47 and @jaseweston . (2/2)
Tweet media one
0
2
11
@douwekiela
Douwe Kiela
4 years
The time is ripe to radically rethink benchmarking in AI. Models are now good enough to be "put in the loop" with humans. With some creativity, they're still easy to fool. Dynabench will allow us to build more robust models, and shed light on SOTA models' weaknesses. 8/n
1
0
11
@douwekiela
Douwe Kiela
3 years
Once is back up @aclmeeting , come check out this poster by @shengs1123 ! Did you know that you can keep transformer layers randomly initialized for faster training and better generalization? #ACL2021 #ACL2021NLP
@shengs1123
Sheng Shen
3 years
We are presenting "Reservoir Transformers" at ACL poster session now, welcome to stop by.
0
0
9
0
2
12
@douwekiela
Douwe Kiela
2 years
.. and @SashaMTL 's overview thread: (3/4)
@SashaMTL
Sasha Luccioni, PhD 🦋🌎✨🤗
2 years
Evaluation is arguably the most important part of machine learning but let's be honest, it can be confusing, complicated and hard to find best practices. We are trying to change this with the launch of Hugging Face 🤗 Evaluate:
Tweet media one
3
53
213
1
0
12
@douwekiela
Douwe Kiela
10 months
Super cool to see paper reading livestreams like this! (also, very jealous of that background)
@bhutanisanyam1
Sanyam Bhutani
10 months
Making LLMs multi-modal without training! 🙏 An amazing read on how to effectively combine captioning models with LLMs to outperform multi-modal models. The main challenge w multi-modal LLMs is the much larger compute requirement or need for larger datasets. This paper…
Tweet media one
4
26
149
1
0
11
@douwekiela
Douwe Kiela
2 years
Check out these FLAVA-based demos: And this one for Winoground: Loading FLAVA in @huggingface transformers and playing with it is super easy! (2/n)
Tweet media one
1
2
10
@douwekiela
Douwe Kiela
2 years
I wish Einstein had done this, it would have been so much easier. "A New Paradigm for Brownian Motion", or "A New (Special/General) Paradigm for Relativity".
@alisawuffles
Alisa Liu
2 years
We introduce a new paradigm for dataset creation based on human 🧑‍💻 and machine 🤖 collaboration, which brings together the generative strength of LMs and the evaluative strength of humans. And we collect 🎉 WaNLI, a dataset of 108K NLI examples! 🧵 Paper:
Tweet media one
14
215
1K
1
0
11
@douwekiela
Douwe Kiela
3 years
#NLProc is overly focused on accuracy as the sole performance metric. We advocate for a more comprehensive multi-metric approach, incentivizing greener and fairer models. 2/5
1
0
11
@douwekiela
Douwe Kiela
6 months
We're looking for amazing solutions engineers and product engineers at @ContextualAI . Who should we hire? DMs open or apply via our job portal!
1
2
11
@douwekiela
Douwe Kiela
4 years
We are organizing a @NeurIPSConf competition around the challenge, hosted by @drivendataorg , with $100k in prize money! We know that new benchmarks play an important role in driving progress in AI, and we hope this challenge will do the same and spur innovation. (5/7)
1
0
10
@douwekiela
Douwe Kiela
3 years
@sleepinyourhat I guess it's subjective ;) I didn't even try to be "adversarial" - many more examples to find at .
Tweet media one
1
2
9