Ever wondered how LLMs stack up against human crowdsource workers? I'm thrilled to share "TurkingBench", a benchmark of web-based tasks for multi-modal and interactive AI agents.
Draft:
Project:
Code:
Life update: Thrilled to announce that I will join Johns Hopkins University
@jhuclsp
@jhucompsci
@JohnsHopkins
as an assistant professor of computer science in the fall! This is the honor of a lifetime, and I will do my best to rise to the occasion.
For my first course at
@jhuclsp
, I am leading a class on recent developments in "self-supervised models." Here is the list of the papers and slides we cover: Would love to hear Twitter's suggestions for additional exciting developments to discuss!🤗
Since prompting, instruction tuning, RLHF, ChatGPT, etc. are such new and fast-moving topics, I haven't seen many university course lectures covering this content.
So we made some new slides for this year's CS224n: NLP w/ Deep Learning course at
@Stanford
!
Today we are releasing GENIE🧞, a human-in-the-loop leaderboard for the evaluation of text generation tasks! We view this as a step forward towards streamlining human evaluation and making it more accessible.
#NLP
It is concerning that an increasing number of research papers base the core of their studies/findings on the new GPT3 models (especially 'davinci-002'), whose training/tuning we know little about. How can we do scientific research on these murky foundations?
Self-supervised models are a must-know for CS undergrads entering the job market. This semester I taught my first undergrad/MS course on these models, exploring their impact. The course content (slides/assignments) is online for those interested:
Congrats to everyone graduating this year!! 👏
Please take a few minutes to read and share my piece for
@dailypenn
on why "I am mourning at graduation", thanks to politics:
@UndoFamilyBan
#travelban
Overheard someone say GPT-4 is "the end of NLP and CV". That is as absurd as suggesting that iPhone's first release in 2007 marked the end of phone technology. This is not "an end" but rather the beginning of a new era of technological advancements and applications.
Excited that our big collaborative effort, "ParsiNLU: A Suite of Language Understanding Challenges for Persian" will appear in TACL'21!
If you're working on multilingual/cross-lingual NLP, give it a look!
Paper:
📢 GooAQ 🥑: 3 million questions/answers, with a variety of answer types!
Draft:
Data:
🚨Spoiler alert:🚨 we observe that short- vs long-answer questions behave differently!
Excited to highlight our work, "Cross-Task Generalization via Language Instructions"
TL;DR: Language instructions improve generalization to "unseen" tasks. The gains increase w/ more observed tasks.
Joint w/
@Swarooprm7
@cbaral
@HannaHajishirzi
Excited about [re]joining Allen AI
@allen_ai
!
Over the past few years, AI2 has been at the forefront of key developments in AI/NLP & it's an honor to be part of this vibrant community.
Hello NLPverse! Want to add a little theoretical spice 🌶️ to your NLP reading list?
Check out our theoretical study of multi-step reasoning in the context of language problems; it draws ideas from random graphs & probability theory. 🔥🔥
#NLProc
CALL FOR CONTRIBUTIONS: We are soliciting contributions of tasks to a collaborative benchmark of tasks and their natural language instructions/definitions.
🚩 Blog:
🤖 Github repo:
#NLProc
#ArtificialIntelligence
🥳 New dataset release! 🥳
ARC-DA dataset, a direct-answer (“open response”, “freeform”) QA dataset for the elementary-school science domain.
Paper:
Dataset:
Joint work w/ Aristo team at
@allen_ai
.
Proposals by CS faculty
@chienming_huang
,
@DanielKhashabi
, and
@ben_vandurme
have been selected by the Office of the Provost to receive 2023 DELTA Awards. Learn more about how they plan to leverage the power of
#AI
in educational settings:
While there are many interesting aspects to the recent "prompting" literature, the fact that so much research/energy is spent on effective ways to "engineer" them is indicative of models' brittle comprehension -- hence, not so great news.
#NLProc
post:
Check out our recent-ish work on counterfactual data augmentation (to appear in EMNLP):
Natural Perturbation for Robust Question Answering
In collaboration w/ Tushar Khot and Ashish Sabharwal.
Representing 54 universities in 14 countries, the
#AmazonResearchAwards
recipients will have access to more than 300 Amazon public datasets, along with AWS AI/ML services and tools. Congrats to the fall 2022 awardees!
Thrilled by JHU's ongoing commitment to AI. Notably, Hopkins plans to recruit a substantial number of faculty members in the coming few years. Come join us!
📢📢
#NLProc
post 📢📢
Ever wondered how NLP models view different countries/nationalities? 🤔
Check out this demo of our recent work (to appear in EMNLP-Findings):
Joint work w/
@tao__li
@tusharkhot
A. Sabharwal
@viveksrikumar
We present UnQover, a framework to evaluate stereotyping biases in QA models. This is tricky to do, since such biases are often masked by reasoning errors.
Paper:
Code:
And a beautiful demo:
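For the curious, here is a rough sketch of the idea (my own naming and simplifications, not the paper's actual code): score each subject in an underspecified context, averaging over both subject orders and over the question vs. its negation, so that positional and attribute-independent reasoning errors cancel out; whatever remains is a stereotyping signal.

```python
# Hypothetical UnQover-style probe; qa_score, the template fields, and the
# exact quantities averaged are my assumptions, not the authors' code.
def bias_score(qa_score, subj_a, subj_b, context, question, neg_question):
    """qa_score(context, question, subject) -> model's score for that subject.

    `context` is an underspecified template such as
    "{x} and {y} got off the bus." that gives no clue about the attribute.
    """
    def avg_over_order(subj, q):
        # Average over both subject orders to cancel positional bias.
        c1 = context.format(x=subj_a, y=subj_b)
        c2 = context.format(x=subj_b, y=subj_a)
        return 0.5 * (qa_score(c1, q, subj) + qa_score(c2, q, subj))

    def attribute_pref(subj):
        # Subtract the negated question ("Who did NOT ...?") to cancel
        # any attribute-independent preference for this subject.
        return 0.5 * (avg_over_order(subj, question)
                      - avg_over_order(subj, neg_question))

    # Positive -> the model associates the attribute more with subj_a.
    return attribute_pref(subj_a) - attribute_pref(subj_b)
```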
Drago was a kind and enthusiastic person and his passing is a great loss to the community. During my PhD, he invited me to his lab and introduced me to his students, creating opportunities for exchanges/collaboration. It's the little things like this that have an enormous impact!
I was deeply saddened to learn of the passing of Prof. Drago Radev. Anyone who interacted with Drago knew he was THE KINDEST PERSON IN THE ENTIRE
#NLProc
Community. 🕯️🙏 1/N
A work led by
@jeff_cheng_77
shows that LLMs' knowledge tends to be stale compared to the claimed pre-training cutoff date.
As the figure below shows, the effective cutoff of LLMs can be months or even years (!!) earlier than the date claimed by their designers! 🤯🤯
Ever wondered about scaling up NLP technologies to address issues that don't have a simple/single answer?
Take a look at our recent work on discovering "diverse perspectives" about controversial issues (accepted to NAACL'19)!!🌈🏳️🌈
#NLProc
@NAACLHLT
@naacl
It's been over two years since we put out GENIE! Since then, we have run ~85 rounds of human evaluation. Today we are releasing all the human annotations for GENIE to benefit the broader research community.
DATA:
Today we are releasing GENIE🧞, a human-in-the-loop leaderboard for the evaluation of text generation tasks! We view this as a step forward towards streamlining human evaluation and making it more accessible.
#NLP
How can we make LLMs robust to noise in the training data? 🤔
We propose "error norm truncation", a modified training objective that suppresses noisy data, improves model accuracy, and speeds up convergence!
Paper:
(1/5) The standard MLE objective is notoriously vulnerable to noise! How can we make LLMs robust to noise in the training data? 🤔
We propose Error Norm Truncation (ENT), a modified training objective that ignores noisy tokens in the training corpus.
📰:
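Rough PyTorch sketch of the idea for the curious (my own naming, and a quantile-based threshold as a stand-in; not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def ent_loss(logits, targets, quantile=0.9):
    """Cross-entropy that skips tokens whose error norm looks like noise.

    logits:  (batch, seq, vocab); targets: (batch, seq) token ids.
    The 0.9 quantile threshold is illustrative, not from the paper.
    """
    probs = logits.softmax(dim=-1)                         # model distribution
    one_hot = F.one_hot(targets, probs.size(-1)).float()   # empirical distribution
    err_norm = (probs - one_hot).norm(dim=-1)              # L2 error norm per token
    # Tokens with unusually large error norms are treated as likely noise
    # and masked out of the objective.
    threshold = torch.quantile(err_norm.flatten(), quantile)
    keep = (err_norm <= threshold).float()
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return (ce * keep).sum() / keep.sum().clamp(min=1.0)
```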
My summary of the major highlights of "natural language understanding" over the past 60 years. Items are color-coded based on their contribution type. CPU/GPU speeds are shown on the side to provide perspective on the role of computational resources.
From:
Venugopal et al. 2011 () is a pioneering paper on "watermarking" generative models that was more than a decade ahead of its time! Most recent papers on text/data watermarking use techniques that are quite similar to this old-ish work, alas they don't mention it.🤦
Many are excited about continued gains with over-parameterized models. I was recently surprised (and delighted) to learn about pioneering works from **~20 years ago** that show the benefits of the increased parameter count. Here are two that caught my eye:
Contemporary language models encode all sorts of stereotypes expressed in the data used for their training.
There is no algorithmic way to list all learned stereotypes and there is no effective way to fix these.
New technology has to be developed with this in mind.
This is basically an empirical take on
@brianchristian
's recent book: "The Alignment Problem: How Can Machines Learn Human Values?"
So far, unfortunately, the answer is a "no".
Can we intervene in a model’s behavior via natural language? Check our
#ACL2021
Findings “Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?” (). w/
@DanielKhashabi
, Tushar Khot, Ashish Sabharwal, and
@kaiwei_chang
. 1/n
Mined several thousand search queries related to coronavirus/COVID-19. Here is the data:
Need your help here: how can we use these queries to address a real challenge we now face?
Looking for effective prompts without breaking the bank?💰
(1) Prompts with flatter loss [surrogate] minima generalize better.
(2) This flatness can be efficiently approximated via a surrogate function (with little/no labeled data).
Paper:
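A toy sketch of how such a flatness estimate might look (my construction, assuming a loss_fn over prompt embeddings; the paper's actual surrogate may differ):

```python
import torch

def flatness_score(loss_fn, prompt_emb, sigma=1e-3, n_samples=8):
    """Average loss change under small random perturbations of the prompt
    embedding; lower = flatter minimum = (per the claim above) better
    generalization. `loss_fn` maps an embedding to a scalar loss."""
    base = loss_fn(prompt_emb)
    deltas = []
    for _ in range(n_samples):
        noise = sigma * torch.randn_like(prompt_emb)
        deltas.append((loss_fn(prompt_emb + noise) - base).abs())
    return torch.stack(deltas).mean()
```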
Check out our new
#emnlp2019
paper where we studied temporal commonsense: . We collected a QA dataset, MC-TACO 🌮 (leaderboard coming soon), and showed that it poses a new challenge for existing systems. Co-authored with
@DanielKhashabi
, Qiang Ning and Dan Roth.
Absolutely amazed by the massive number of contributions that we have received from the community! 🥳🥳 One more week to go (mid-October), if you'd like to join the effort!
#NLProc
CALL FOR CONTRIBUTIONS: We are soliciting contributions of tasks to a collaborative benchmark of tasks and their natural language instructions/definitions.
🚩 Blog:
🤖 Github repo:
#NLProc
#ArtificialIntelligence
Feel opinionated about a certain topic and want to see how other people think about it? Try our demo and check if it helps you see the alternative "perspectives."
Paper:
Video:
Demo:
@cogcomp
@ccb
Check out this excellent work by
@kel_lu
and many great collaborators
@allen_ai
&
@uwnlp
!
Spoiler alert: temporal adaptation (further pre-training) is nowhere near enough to solve the temporal drift of pre-trained language models on downstream tasks.
In our new paper, we investigate how temporal misalignment, when a model is trained on data from one time period but tested or deployed on data from another, affects NLP models across a variety of tasks and domains. (1/n)
If you're at
#EMNLP
, make sure to stop by
@BenZhou96
's poster to hear about his work on Zero-Shot + Open entity typing
Grand Hall, 09:00 – 10:30.
#NLProc
In the '70s, many correlated a computer's size with its computational strength. Obviously, that is no longer the case, as we each carry powerful computers in our pockets. Will our attitude towards "large self-supervised models" evolve similarly? Only time will tell!
To be clear, I am not against using GPT-3; I am against *only* studying GPT-3 and hence, ignoring the generality of findings on other models for which we have more clarity.
"UPenn's Department of Philosophy will not require Ph.D. program applicants to submit GRE scores this year."
I look forward to hearing of similar changes from other schools and departments, to facilitate the admissions process for those who can't afford the exam.
@umphilosophy
@weisbergm
@zehavoc
@seb_ruder
@DeepIndaba
@_aylien
Ya, the figure is misleading; just to add to your point:
Early 2000s: Introduction of FrameNet.
Early 2000s: CoNLL shared tasks which helped significant progress (e.g. in NER).
2001: CRFs
2002: BLEU score, let MT systems scale up.
2002: Early PropBank
~2002: Topic Models
Blanket sanctions hurt ordinary people (blocking Iran's access to life-saving medicines, passenger planes, etc.) - if you're rejoicing over sanctions, remember that you are depriving 80M+ people of a normal life.
#nosanctionnowar
It's okay if, in the short term, these analyses inform our understanding of models' weaknesses/strengths, though I am more excited about a future where models are robust/competent against a *variety* of natural lang commands/instructions (and hopefully less or no "prompt engineering").
While my title may change, I know I will remain a lifelong student. I am looking forward to learning and growing alongside the many young bright minds I will meet at JHU.
Imagine reading a paper with lots of cool findings based on an obscure model X (rather than GPT3). Would you buy them as general findings? Ever wonder if their findings might be specific to their model?
As a
@Penn
alum and an academic myself, I am disheartened to hear about this decision. This isn't about choosing sides between Palestinians or Israelis. Instead, it's about the fundamental role universities should fulfill within our society.
BREAKING: Penn Students Against the Occupation of Palestine’s status as a registered student group has been revoked "effective immediately," a University spokesperson told The Daily Pennsylvanian.
"This has significant business consequences, What I can say — with complete confidence — is you are going to see a whole new generation of products, some from start-ups, some from the big companies." Oren Etzioni
@etzioni
comments on a major AI milestone, at
@allen_ai
!
.
@allen_ai
's Aristo aces 8th- and even 12th-grade science tests. What do these results tell us about NLP? About reasoning? A thought-provoking article by
@cademetz
I look forward to extensive collaborations at
@jhuclsp
@jhucompsci
and I am excited to play my part in helping usher in the future of equitable, transparent, and reliable AI. The field has come a long way, and the best is yet to come!
Excited to try my brand new
@google
-Home; tried asking it to play some
@chaartaar
music: "Ok Google! Play a Chartaar song"; but it kept confusing it for "charter", "chart a", "chart".
(but wait, someone said
#ArtificialIntelligence
is taking over the world?)
Westerners awarding a poorly-produced movie that aligns well with their pre-existing, incomplete perception about a different culture/society ... yikes! We're stuck in our loop of biases and stereotypes.
We were lucky to have Anjalie Field (
@anjalie_f
) in our class to tell us about "Social Applications of Pre-trained Language Models". For those interested, here is the recording:
For my first course at
@jhuclsp
, I am leading a class on recent developments in "self-supervised models." Here is the list of the papers and slides we cover: Would love to hear Twitter's suggestions for additional exciting developments to discuss!🤗
NIAC Urges Universities to Extend Admission Deadlines for Iranian Students amidst Internet Shutdown - thank you to everyone who flagged this and is already working with administrators
To the leaders of Iran - DO NOT KILL YOUR PROTESTERS. Thousands have already been killed or imprisoned by you, and the World is watching. More importantly, the USA is watching. Turn your internet back on and let reporters roam free! Stop the killing of your great Iranian people!
... while short-answer questions benefit heavily from more labeled data, long-answer questions are mostly driven by the models' pre-training.
Joint work w/ Amos Ng
@tusharkhot
Ashish Sabharwal
@HannaHajishirzi
@ccb
Folks in NLP/Data-mining interested in information pollution/distortion/manipulation in social media:
Here is a nice case study for you, happening *now* at a massive scale.
Nice work and visualization by
@geoffgolberg
#NLProc
#datamining
#DataScience
Lawrence, Giles, and Tsoi (1997) show that models with more hidden units can lead to consistently better generalization on a face recognition task. In particular, their best-generalizing network (shown in the figure below) had 364x more parameters (18k params) than training examples.
"In the early twenty-first century, the average human is far more likely to die from bingeing at McDonald's than from drought, Ebola or an al-Qaeda attack."
(Homo Deus: A Brief History of Tomorrow; Yuval Noah Harari)
"The continued decline in international student enrollment since the fall of 2016 has cost the US economy $11.8 billion and more than 65,000 jobs, according to estimates from NAFSA (Association of International Educators)"