Do LLMs learn impossible languages (that humans wouldn’t be able to acquire) just as well as they learn possible human languages?
We find evidence that they don’t! Check out our new paper…
💥 Mission: Impossible Language Models 💥
arXiv:
🧵
Excited to finally announce that I am joining Stanford as a CS PhD student! I feel deeply honored to be joining such an amazing group of researchers
@StanfordNLP
. 🌲
ChatGPT: Sorry, I can't draw copyrighted characters like Sonic the Hedgehog.
Also ChatGPT: Wow, Sonic the Hedgehog sounds like a fun and original character!
You may remember Noam Chomsky’s NYT article on ChatGPT from last year.
He and others have claimed that LLMs are equally capable of learning possible and impossible languages. We set out to empirically test this claim.
“The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching,” Noam Chomsky, Ian Roberts and Jeffrey Watumull write in a guest essay.
We create synthetic impossible languages of varying complexity by modifying English with word orders and grammar rules unattested in natural languages. We assess how well GPT-2 models learn each language by running experiments at various stages throughout training.
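For intuition, here's a minimal sketch of two such perturbations, in the spirit of our *Shuffle* and *Reverse* languages (illustrative code, not our actual data pipeline; the function names and fixed seed are assumptions):

```python
# Illustrative sketch of impossible-language perturbations (not the paper's
# exact code). Each function rewrites an English sentence's word order into
# a pattern unattested in natural languages.
import random

def shuffle_sentence(tokens, seed=0):
    """Deterministically shuffle token order, destroying natural word order."""
    rng = random.Random(seed)  # fixed seed is an illustrative assumption
    shuffled = tokens[:]
    rng.shuffle(shuffled)
    return shuffled

def reverse_sentence(tokens):
    """Fully reverse token order: unattested, but perfectly predictable."""
    return tokens[::-1]

tokens = "the dog chased the ball".split()
print(shuffle_sentence(tokens))  # e.g., ['ball', 'the', 'the', 'dog', 'chased']
print(reverse_sentence(tokens))  # ['ball', 'the', 'chased', 'dog', 'the']
```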
📣 The program for the first two days of
@bouncompec
NLP-GenAI series is out! Join us today and tomorrow (10 AM-1 PM EST). Register to receive a link for the talks:
We find that GPT-2 struggles to learn impossible languages. In Experiment 1, models trained on the most complex impossible languages learn least efficiently, while models trained on possible languages learn most efficiently, as measured by perplexity over training steps.
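Concretely, learning efficiency can be tracked as perplexity at successive checkpoints; here's a hedged sketch with Hugging Face GPT-2 (the checkpoint paths and toy test sentence are placeholders, not our actual setup):

```python
# Hedged sketch: perplexity over training checkpoints. Checkpoint paths and
# the test sentence below are placeholders, not the paper's artifacts.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def perplexity(model, texts):
    """Per-token perplexity of `texts` under `model`."""
    model.eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids
            loss = model(ids, labels=ids).loss  # mean NLL over predicted tokens
            n_pred = ids.numel() - 1            # labels are shifted by one
            total_nll += loss.item() * n_pred
            total_tokens += n_pred
    return math.exp(total_nll / total_tokens)

# Hypothetical checkpoint directories saved at increasing training steps.
for step in [100, 1000, 10000]:
    model = GPT2LMHeadModel.from_pretrained(f"checkpoints/step-{step}")
    print(step, perplexity(model, ["the dog chased the ball ."]))
```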
For more experiments targeting specific patterns, take a look at the paper!
We believe that our results challenge Chomsky’s claims, and we hope to open more discussions of LLMs as models of language learning and the possible/impossible distinction for human languages.
Just to make sure it’s clear: the conjunction and the last conjunct of a coordination phrase form a constituent. So we have [Barbie [and Ken]] rather than [[Barbie and] Ken].
@letiepi
@benno_krojer
Thanks for your comment! Yeah, I think it’s clear that totally random sequences would be hard to learn (though there is information to be learned from a bag of words). That’s at the far end of the scale in the figure, and we test a wider variety of languages in the paper.
@stanfordnlp
I’ve also made the difficult decision to leave my job at Meta, after an incredible journey of nearly two years. I am deeply grateful for all I’ve learned about the ML privacy space during my time at the company and all of the great engineers I’ve met along the way.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy…
New paper! 🫡
LM interpretability has made progress in finding feature representations using many methods, but we don’t know which ones are generally performant or reliable.
We (
@jurafsky
@ChrisGPotts
) introduce CausalGym, a benchmark of 29 linguistic tasks for interp!
(1/n)
I’ll be co-instructing
@AndrewLeeMaas
's Applied Machine Learning course on
@corise_
this September! Thanks to our student
@emekdahl
for sharing how the ML foundations track can transform your ML career.
@AndrewLampinen
Great points! We know that the impossible vs. possible distinction is elusive—our methods aim to empirically explore this for LMs. Fully agree with the idea that some 'impossible' languages might not arise simply because they present tougher learning challenges.
@TristanThrush
@ChrisGPotts
@SOPHONTSIMP
@IntuitMachine
My intuition is that locality bias would be less clear in BERT than GPT. The incremental nature of GPT’s causal language modeling induces an implicit notion of word order and locality bias; we even explored GPT-2s without positional encodings and found that our results hold.
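For reference, one way to ablate positional encodings in a Hugging Face GPT-2 is sketched below (an assumption about implementation, not necessarily our exact code): zero the learned position-embedding table and freeze it, so any order sensitivity must come from the causal attention mask.

```python
# Sketch of a positional-encoding ablation (assumed implementation, not
# necessarily the paper's): zero and freeze GPT-2's position embeddings.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
with torch.no_grad():
    model.transformer.wpe.weight.zero_()            # zero position embeddings
model.transformer.wpe.weight.requires_grad = False  # keep them frozen

# With wpe fixed at zero, word-order information can only enter through
# the causal attention mask, which is the point of the ablation.
```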
I had a really good time attending
@corise_
applied ML course taught by
@AndrewLeeMaas
&
@JulieKallini
, the experience is totally different from any other course: you learn a lot from the classes, but you also get direct interaction with your professors and 1/2
@ElliotMurphy91
@ChrisGPotts
Wouldn’t it be more productive to consider both the merits and shortcomings of a new set of tools? Also, the new tools don’t have to invalidate the old ones—tons of work show that LLMs learn syntax from data, confirming what we already know about the structure of language.
@letiepi
@benno_krojer
I would say that Chomsky’s idea of an impossible language is closer to what we call languages with “count-based grammar rules”. We run specific experiments for these unnatural but predictable patterns, showing that GPT-2 struggles to learn these rules.
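To illustrate, here's a toy sketch of a count-based rule in the spirit of our *Hop* languages, where a marker must land a fixed number of tokens after the verb (the marker symbol and hop distance are illustrative assumptions):

```python
# Hedged sketch of a count-based grammar rule: a marker is placed a fixed
# number of tokens after the verb, a predictable but unattested pattern.
# The marker symbol and hop distance are illustrative assumptions.
def apply_count_rule(tokens, verb_index, hop=4, marker="<M>"):
    """Insert `marker` exactly `hop` positions after the verb (or at the end)."""
    insert_at = min(verb_index + hop, len(tokens))
    return tokens[:insert_at] + [marker] + tokens[insert_at:]

tokens = "the dog chase the ball in the park".split()
print(apply_count_rule(tokens, verb_index=2))
# ['the', 'dog', 'chase', 'the', 'ball', 'in', '<M>', 'the', 'park']
```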
Last month, I attended my first in-person international conference to present my paper at TSD2022! This is the second published compling paper based on my thesis work at
@PrincetonCS
, advised by Christiane Fellbaum.
@aryaman2020
Yes! That’s the part that disproves conjunction reduction, which I initially didn’t want to spoil. Now that Barbie has been out for a while, we can enjoy all its funny language jokes.
Julie also told me that the entire movie culminates in a joke pointing out the falsity of the conjunction reduction transformation, but the joke itself is a spoiler.
@TristanThrush
@ChrisGPotts
@SOPHONTSIMP
@IntuitMachine
Since BERT models use masked language modeling, it’s unclear whether they would induce notions of word order without positional encodings (and locality bias requires some notion of word order).