This is a super cool resource: Papers With Code now includes 950+ ML tasks, 500+ evaluation tables (including SOTA results), and 8,500+ papers with code. Probably the largest collection of NLP tasks I've seen, including 140+ tasks and 100 datasets.
My PhD thesis Neural Transfer Learning for Natural Language Processing is now online. It includes a general review of transfer learning in NLP as well as new material that I hope will be useful to some.
Why You Should Do NLP Beyond English
7000+ languages are spoken around the world but NLP research has mostly focused on English. In this post, I give an overview of why you should work on languages other than English.
10 Exciting Ideas of 2018 in NLP: A collection of 10 ideas that I found exciting and impactful this year—and that we'll likely see more of in the future.
Do you often find it cumbersome to track down the best datasets or the state-of-the-art for a particular task in NLP? I've created a resource (a GitHub repo) to make this easier.
I'm excited to share some personal news: I've successfully defended my dissertation "Neural Transfer Learning for Natural Language Processing". I'm grateful for my time at @_aylien and @insight_centre and for everyone I got to meet on this journey, both online and offline.
"What are the 3 biggest open problems in NLP?"
We asked experts a few simple but big questions for the NLP session at the @DeepIndaba. We're now happy to share the full responses from Yoshua Bengio, @redpony, @RichardSocher, and many others
I've decided to leave Google DeepMind to pursue a new adventure.
I feel incredibly lucky to have had the chance to work with and learn from so many amazing colleagues and mentors over the last 4 1/2 years.
I'm grateful & excited for what's next!
ML and NLP Research Highlights of 2020
It's been inspiring to look back on all the exciting advances that happened despite such a tumultuous year. Here's a selection of my highlights.
10 Things You Need to Know About BERT and the Transformer Architecture That Are Reshaping the AI Landscape
This super comprehensive post by @cathalhoran covers most things that are important in current NLP, including BERT, transfer learning, and avocado chairs 🥑
NLP Year in Review — 2019
An extensive list of interesting publications, creative and societal applications, tools and datasets, articles, and resources of 2019 by @omarsar0.
10 Tips for Research and a PhD
I've been asked in the past to provide advice on doing research. Here are 10 tips that worked well for me and will hopefully also be useful to others.
Recent Advances in Language Model Fine-tuning
New blog post that takes a closer look at fine-tuning, the most common way large pre-trained language models are used in practice.
I'm excited to announce that I've joined @cohere to help make LLMs more multilingual!
It's crazy how the capabilities of NLP models have evolved over the past few years. I'm thrilled to work with a team full of smart, dedicated, and kind individuals to push the boundaries of LLMs.
New blog post: The State of Transfer Learning in NLP
A review of key insights and takeaways from our NAACL 2019 tutorial with updates based on recent work.
This is a nice diagram by Zhengyan Zhang and @BakserWang that shows how many recent pretrained language models are connected. The GitHub repo contains a full list of relevant papers:
The Transformer encoder visualized
A nice visualization and tutorial of the Transformer encoder layers by @UlfMertens. It incorporates the batch dimension, resulting in 3D tensors.
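To make those shapes concrete, here's a minimal PyTorch sketch (my own, not from the tutorial) of the 3D tensors flowing through an encoder layer; the dimensions are illustrative, and it assumes a recent PyTorch with batch_first support:

```python
import torch
import torch.nn as nn

# Shapes are (batch, sequence length, model dimension) throughout.
batch_size, seq_len, d_model = 32, 128, 512

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=8, batch_first=True
)
x = torch.randn(batch_size, seq_len, d_model)  # embedded input tokens
out = encoder_layer(x)
print(out.shape)  # torch.Size([32, 128, 512]) -- the layer preserves the shape
```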
This is a *really* extensive repo containing ~380 BERT-related papers sorted into downstream tasks, modifications, probes, multilingual models, and more. Nice job, @stomohide!
Some professional news: last week was my last at DeepMind. DeepMind is an amazing place to do impactful, long-term research, and I'm grateful to have had the chance to work alongside so many kind, smart, and inspiring people.
Multi-Task Learning with Deep Neural Networks: A Survey
I learned a lot reading this comprehensive overview by @CrichaelMawshaw. It categorizes recent work into architecture design, optimization methods, and task relationship learning.
Papers with Code now has badges to put on your GitHub repo that indicate that your model is state-of-the-art. 🏅 This seems like a great way to incentivize open-sourcing code! I hope we'll see a lot more badges to highlight useful implementations. 🥇🥈🥉
🎉 New feature: State-of-the-art GitHub badges. Submit evaluation results from your paper to obtain a badge for the official GitHub repository. A new way to highlight your paper's performance!
Our RemBERT model (ICLR 2021) is finally open-source and available in 🤗 Transformers.
RemBERT is a large multilingual Transformer that outperforms XLM-R (and mT5 with similar # of params) in zero-shot transfer.
Docs:
Paper:
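A minimal usage sketch with 🤗 Transformers (assuming the checkpoint is published under the hub ID google/rembert):

```python
from transformers import AutoTokenizer, AutoModel

# Load RemBERT from the Hub (checkpoint ID assumed here).
tokenizer = AutoTokenizer.from_pretrained("google/rembert")
model = AutoModel.from_pretrained("google/rembert")

inputs = tokenizer("RemBERT is multilingual.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```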
If you want to learn about privacy-preserving machine learning, then there is no better resource than this step-by-step notebook tutorial by @iamtrask.
From the basics of private deep learning to building secure ML classifiers using PyTorch & PySyft.
It's been a while…
Here's a new edition of NLP News containing an ML and NLP starter toolkit, a Low-resource NLP toolkit, and discussions of "Can an LM ever understand natural language?" and the next generation of NLP benchmarks.
(via @revue)
New blog post: A Review of the Recent History of Natural Language Processing. The 8 biggest milestones in the last ~15 years of #NLProc. From our NLP session at @DeepIndaba.
The multilingual BERT model is out now (earlier than anticipated). It covers 102 languages and features an extensive README motivating certain preprocessing and modelling choices.
Transfer learning is increasingly going multilingual with language-specific BERT models:
- 🇩🇪 German BERT
- 🇫🇷 CamemBERT , FlauBERT
- 🇮🇹 AlBERTo
- 🇳🇱 RobBERT
If you're doing anything with NLP, this is a great place to start! A PyTorch library of state-of-the-art pretrained Transformer language models featuring BERT, GPT-2, XLNet, and more.
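As a quick illustration, here's a minimal feature-extraction sketch with the library (API as of the pytorch-transformers era; the library was later renamed transformers):

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

# Load the pretrained model and its tokenizer.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Encode a sentence and extract contextual representations.
input_ids = torch.tensor([tokenizer.encode("Transfer learning is here to stay.")])
with torch.no_grad():
    last_hidden_state = model(input_ids)[0]  # (1, seq_len, 768)
```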
The New Era of NLP (SciPy 2019 Keynote): This is a great presentation by @math_rachel that focuses on transfer learning and discusses one of the most important problems of our time: disinformation and information glut.
Code and pretrained weights for BERT are out now.
Includes scripts to reproduce results. BERT-Base can be fine-tuned on a standard GPU; for BERT-Large, a Cloud TPU is required (the maximum batch size that fits in 12-16 GB of memory is too small).
New NLP News: BERT, Transfer learning for dialogue, Deep Learning SOTA 2019, Gaussian Processes, VI, NLP lesson curricula, lessons, AlphaStar, How to manage research teams, and lots more (via @revue)
It's great to see the growing landscape of NLP transfer learning libraries:
- pytorch-transformers by @huggingface:
- spacy-pytorch-transformers by @explosion_ai:
- FARM by @deepset_ai
This is a super intuitive (and well illustrated) guide to state-of-the-art Transfer Learning methods in NLP. From the author of the superb Illustrated Transformer post.
Pretrained language models are not only applicable to natural language but also to other domains where sequences have an underlying structure, such as genomics. We can get better performance with more meaningful token representations (e.g. using k-mers instead of nucleotides).
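For illustration, here's a minimal sketch of k-mer tokenization (a hypothetical helper, not tied to any specific genomics paper; k=3 for readability):

```python
def kmer_tokenize(sequence, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers, yielding more
    meaningful tokens than single nucleotides."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

print(kmer_tokenize("ATGCGTAC"))
# ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
```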
A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
This survey is a great starting point for learning about low-resource NLP, common methods, and open challenges.
Work by @jannikstroetgen, @MicHedderich, and @dklakow.
In our new survey “Modular Deep Learning”, we provide a unified taxonomy of the building blocks of modular neural nets and connect disparate threads of research.
📄
📢
🌐
w/ @PfeiffJo, @licwu, and @PontiEdoardo
New blog post: Unsupervised cross-lingual representation learning
An overview of learning cross-lingual representations without supervision, from the word level to deep multilingual models. Based on our ACL 2019 tutorial.
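As one concrete word-level example, here's a sketch of the orthogonal Procrustes step used to align two embedding spaces once translation pairs have been induced (toy random data; the snippet is mine, not from the tutorial):

```python
import numpy as np

# Toy data: rows are word vectors for induced translation pairs
# (in unsupervised methods the pairs are bootstrapped, not given).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 300))  # source-language embeddings
Y = rng.standard_normal((100, 300))  # target-language embeddings

# Orthogonal Procrustes: find the rotation W minimizing ||X @ W.T - Y||.
U, _, Vt = np.linalg.svd(Y.T @ X)
W = U @ Vt
X_aligned = X @ W.T  # source vectors mapped into the target space
```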
Next week, I'll start as a research scientist at @DeepMindAI in London, where I'll be working on models for general linguistic intelligence. I'm thrilled about what lies ahead and looking forward to continuing to be part of this amazing community.
Here are the slides of my talk on Transfer learning with language models at the Belgium NLP meetup last week. I tried to distill our current understanding of what LMs capture.
Command R+ (⌘ R+) is our most capable model (with open weights!) yet! I’m particularly excited about its multilingual capabilities. It should do pretty well in 10 languages (English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese).
You can…
Today, we’re introducing Command R+: a state-of-the-art RAG-optimized LLM designed to tackle enterprise-grade workloads and speak the languages of global business.
Our R-series model family is now available on Microsoft Azure, and coming soon to additional cloud providers.
From today, I’ll be at Google Research where I’ll be working on NLP for under-represented languages, with a particular focus on languages in Sub-Saharan Africa. I’m looking forward to helping make NLP more accessible together with colleagues at Google.
1/ Our paper Episodic Memory in Lifelong Language Learning with Cyprien de Masson d'Autume, @ikekong, and @DaniYogatama was accepted to @NeurIPSConf. We go beyond MTL and tackle lifelong learning where models need to acquire new information continually:
My new blog post takes a look at the state of multilingual AI.
🌍 How multilingual are current models in NLP, vision, and speech?
🏛 What are the recent contributions in this area?
⛰ What challenges remain and how can we address them?
Thoughts on the 2024 AI Job Market
Some thoughts on AI research jobs in 2024, how the nature of research has changed in the era of LLMs, and why I joined @cohere.
Natural Questions: A new QA dataset consisting of 300,000+ naturally occurring questions (posed to Google search) with human-provided long & short answers based on Wikipedia. Looks like an exciting new benchmark!
Paper:
Competition:
Are you interested in data-to-text generation (generating text based on structured data, e.g. tables or graphs)?
@rvaaau has added a nice overview of standard datasets and recent models to NLP Progress. 👏
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding:
SOTA on 11 tasks. Main additions:
- Bidirectional LM pretraining w/ masking
- Next-sentence prediction aux task
- Bigger model, more data
It seems LM pretraining is here to stay.
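For intuition, here's a simplified sketch of the masked-LM corruption (BERT's actual scheme also replaces 10% of selected tokens with random words and keeps 10% unchanged):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly mask tokens for bidirectional LM pretraining (simplified)."""
    inputs, labels = [], []
    for token in tokens:
        if random.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(token)   # predict the original token
        else:
            inputs.append(token)
            labels.append(None)    # no loss on unmasked positions
    return inputs, labels

tokens = "language model pretraining is here to stay".split()
print(mask_tokens(tokens))
```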
New paper with @mattthemathman & @nlpnoah on adapting pretrained representations: We compare feature extraction & fine-tuning with ELMo and BERT and try to give several guidelines for adapting pretrained representations in practice.
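In PyTorch terms (a sketch of the two regimes, not the paper's code), the difference boils down to whether the pretrained encoder's weights receive gradients:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# Feature extraction: freeze the pretrained encoder and train only a
# task-specific head on top of its fixed representations.
for param in model.parameters():
    param.requires_grad = False

# Fine-tuning: unfreeze everything and update all weights,
# typically with a small learning rate.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```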
This is a super useful paper that we need more of: Better ImageNet models are not necessarily better feature extractors (ResNet is best); but for fine-tuning, ImageNet performance is strongly correlated with downstream performance.
Transfer learning with language models is getting hot! 🔥New state-of-the-art results today by two different research groups: Trinh and Le (Google) on the Winograd challenge and Radford et al. (OpenAI) on a diverse range of tasks.
A Primer on Pretrained Multilingual Language Models
This survey is a great starting point to learn about anything related to state-of-the-art multilingual models in NLP.
New NLP News: NLP Progress, Retrospectives and look ahead, New NLP courses, Independent research initiatives, Interviews, Lots of resources (via @revue)
Microsoft reports that they've achieved human parity on Chinese-to-English translation (27.40 BLEU; 1 BLEU better than the best result of WMT 2017). The model is a Transformer (NIPS 2017) + Dual Learning (NIPS 2016) + Deliberation Nets (NIPS 2017).
A new bigger, better language model by @OpenAI:
- Scaled-up version of their Transformer (10x params)
- Trained on 10x more curated data (40 GB of Reddit outbound links w/ >2 karma)
- SOTA on many LM-like tasks
- Discusses potential for malicious use
Curriculum for Reinforcement Learning
"Learning is probably the best superpower we humans have."
@lilianweng explores four types of curricula that have been used to help RL models learn to solve complicated tasks.
NLP News: ICLR 2021 Outstanding Papers, Char Wars, Speech-first NLP, Virtual conference ideas
Featuring a round-up of @iclr_conf best papers, ideas for fun things to do at virtual conferences, Star Wars references 🛸, and more...
Super interesting tutorial on visualization for ML at #NeurIPS2018 w/ case study on multilingual embedding visualization (at 1:40:29). First evidence I've seen that a multilingual NMT system brings languages together rather than separating them.
The new study by @colinraffel et al. provides a great overview of best practices in the current transfer learning landscape in NLP. Check out page 33 of the paper or below for the main takeaways.
I really like the new Methods section in @paperswithcode to find applications and similar methods.
For language models in NLP, you can see at a glance the most common LMs and explore the papers that employ them.
ACL 2022 Highlights ☘️
My highlights of #acl2022nlp, including language diversity and multimodality, prompting, the next big ideas, and my favorite papers.
Besides the obvious things (ELMo, BERT, etc.), is there anything that we should definitely discuss at the NAACL "Transfer Learning in NLP" tutorial? Anything that is under-appreciated in transfer learning?
Currently working on the upcoming NAACL "Transfer Learning in NLP" tutorial with @seb_ruder, @mattthemathman, and @swabhz. Pretty excited!
And I've discovered you can write a Transformer model like GPT-2 in less than 40 lines of code now!
40 lines of code & 40 GB of data...
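In that spirit, here's a compact sketch (my own, not the referenced 40-line implementation) of the core ingredient of a GPT-2-style decoder, causal self-attention:

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """One masked (causal) self-attention layer."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split heads: (batch, heads, seq, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        att = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        # Causal mask: each position may only attend to itself and the past.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        att = att.masked_fill(mask, float("-inf")).softmax(dim=-1)
        out = (att @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)

x = torch.randn(2, 10, 64)
print(CausalSelfAttention()(x).shape)  # torch.Size([2, 10, 64])
```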
Tutorial on Unsupervised Deep Learning at #NeurIPS2018.
NLP part starts at 1:16:00. Still sizable gap between unsupervised vs. supervised pretraining in CV. Lots of progress in NLP, but not entirely satisfactory. A general principle is still missing.
If you're interested in interpretability and better understanding #NLProc models 🔎, read this excellent TACL '19 survey by @boknilev. Clearly covers important research areas.
Paper:
Appendix (categorizing all methods):
Challenges and Opportunities in NLP Benchmarking
Recent NLP models have outpaced the benchmarks used to test them. I provide an overview of challenges and opportunities in this blog post.
I'm really excited about our new paper with @PfeiffJo, @licwu & IGurevych.
We propose MAD-X, a new adapter-based framework to adapt multilingual models to low-resource languages and languages that were not covered in their training data.
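The core building block is the bottleneck adapter. Here's a generic sketch of such a layer (not the paper's exact code), inserted into an otherwise frozen pretrained Transformer:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """A bottleneck adapter: a small down/up projection with a residual
    connection. MAD-X trains separate per-language and per-task adapters
    of this kind; sizes here are illustrative."""

    def __init__(self, hidden_size=768, bottleneck_size=48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states):
        # Only the adapter's few parameters are trained per language/task.
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))

h = torch.randn(2, 10, 768)
print(Adapter()(h).shape)  # torch.Size([2, 10, 768])
```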
New NLP News: ML on code, Understanding RNNs, Deep Latent Variable Models, Writing Code for NLP Research, Quo vadis, NLP?, Democratizing AI, ML Cheatsheets, Spinning Up in Deep RL, Papers with Code, Unsupervised MT, Multilingual BERT (via @revue)
It's amazing how fast #NLProc is moving these days.
We have now reached super-human performance on SWAG, a commonsense task that will only be introduced at @emnlp2018 in November!
We need even more challenging tasks!
BERT:
SWAG:
NLP News—Reviewing, Taking stock, Theme papers, Poisoning and stealing models, Multimodal generation
This newsletter took a bit longer. Going forward, I'll try to cover some themes more in-depth. (via @revue)
Are you interested in summarization?
@tbsflk compiled the results on the most common datasets (CNN/DailyMail, Gigaword, DUC04 Task 1) from 2015-2018. 👏🏻
🚀 Excited to present a tutorial on "Modular and Parameter-Efficient Fine-Tuning for NLP Models" at #EMNLP2022 with @PfeiffJo & @licwu.
We'll give an overview of common methods, benefits and usage scenarios, and how to adapt pre-trained LMs to real-world low-resource settings.
My AAAI 2019 Highlights—including dialogue, reproducibility, question answering, the Oxford style debate, invited talks, and a diverse set of research papers