Tom McCoy Profile
Tom McCoy

@RTomMcCoy

3,101 Followers
491 Following
147 Media
1,447 Statuses

Assistant professor @YaleLinguistics . Studying computational linguistics, cognitive science, and AI. He/him.

New Haven, CT
Joined December 2018
Pinned Tweet
@RTomMcCoy
Tom McCoy
8 months
🤖🧠NEW PAPER🧠🤖
Language models are so broadly useful that it's easy to forget what they are: next-word prediction systems
Remembering this fact reveals surprising behavioral patterns:
🔥Embers of Autoregression🔥 (counterpart to "Sparks of AGI")
1/8
Tweet media one
36
298
1K
@RTomMcCoy
Tom McCoy
2 years
It has become acceptable for acronyms to use any letters within a word, not just the first letter. E.g., ORNATE = acrOnyms fRom noN-initial chAracTErs
But why stick with whole letters? In my new paradigm CLIP, an acronym can use any curves or line segments from the base phrase!
Tweet media one
20
92
960
@RTomMcCoy
Tom McCoy
4 years
How am I only learning now that Latvia's prime minister has a PhD in linguistics from Penn?? I've seen many lists of "jobs for linguists outside academia" but they never include Prime Minister of Latvia.
11
120
839
@RTomMcCoy
Tom McCoy
4 years
Linguists: In case you could use a diversion, I've made a phonetic crossword - all the answers must be written in the IPA, one phoneme per square. (Non-linguists: Here's a chance to learn some phonetics!) Puzzle: Answers:
Tweet media one
19
250
652
@RTomMcCoy
Tom McCoy
2 years
🤖🧠NEW PAPER🧠🤖 What explains the dramatic recent progress in AI? The standard answer is scale (more data & compute). But this misses a crucial factor: a new type of computation. Shorter opinion piece: Longer tutorial: 1/5
Tweet media one
8
112
600
@RTomMcCoy
Tom McCoy
1 year
🤖🧠NEW PAPER🧠🤖 Bayesian models can learn rapidly. Neural networks can handle messy, naturalistic data. How can we combine these strengths? Our answer: Use meta-learning to distill Bayesian priors into a neural network! Paper: 1/n
Tweet media one
4
117
551
@RTomMcCoy
Tom McCoy
3 years
*NEW PREPRINT* Neural-network language models (e.g., GPT-2) can generate high-quality text. Are they simply copying text they have seen before, or do they have generalizable linguistic abilities? Answer: Some of both! Paper: 1/n
Tweet media one
9
96
462
@RTomMcCoy
Tom McCoy
4 years
Transformers are the current state of the art, but one day LSTMs may overtake them. That would make LSTMs current again. You could even say…re-current.
5
33
436
@RTomMcCoy
Tom McCoy
4 years
Flying home from #LSA2020 ? Remember to put your liquids in a separate bag!
Tweet media one
6
83
362
@RTomMcCoy
Tom McCoy
2 years
Excited to share some updates, which all still feel surreal:
- Just defended my dissertation advised by @TalLinzen & @Paul_Smolensky !
- Next up: Postdoc w/ Tom Griffiths @cocosci_lab !
- Then joining @YaleLinguistics as an asst prof w 2ndary appt @YaleCompsci !
A thank-you thread:
31
8
307
@RTomMcCoy
Tom McCoy
4 years
Takeaways from #NeurIPS :
1) In-distribution generalization is out
2) Out-of-distribution generalization is in
3) We want compositionality (whatever it is)
4) "GPT-2" is very hard to say
6
31
272
@RTomMcCoy
Tom McCoy
6 months
My colleagues and I are accepting applications for PhD students at Yale. If you think you would be a good fit, consider applying! Most of my research is about bridging the divide between linguistics and artificial intelligence (often connecting to CogSci & large language models)
6
64
253
@RTomMcCoy
Tom McCoy
5 years
The 4 gates of an LSTM:
1) Input gate
2) Output gate
3) Forget gate
4) Backpropa gate
1
26
246
@RTomMcCoy
Tom McCoy
4 years
Thread about five #acl2020nlp papers that haven’t gotten the hype they deserve:
1
63
239
@RTomMcCoy
Tom McCoy
4 years
Instead of writing out "clustering," we should just write "k." It's much shorter, and everyone knows that k means clustering.
4
23
233
@RTomMcCoy
Tom McCoy
3 years
Summarization is ROUGE,
MT is BLEU.
Can automatic metrics
Ever measure NLU?
8
20
231
@RTomMcCoy
Tom McCoy
4 years
“Stop! In the name of love” - A phonologist describing the /k/, /p/, or /d/ in “Cupid”
1
38
205
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab Another example: shift ciphers - decoding a message by shifting each letter N positions back in the alphabet. On the Internet, the most common values for N are 1, 3, and 13. These are the only ones for which GPT-4 performs well! 5/8
Tweet media one
4
19
195
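For concreteness, here is a minimal sketch of the shift-cipher decoding described in the tweet above (the function name and the example string are illustrative, not code from the paper): shift each letter N positions back in the alphabet, wrapping around.

```python
# Minimal sketch of shift-cipher decoding: shift each alphabetic character
# N positions back in the alphabet, wrapping around; leave other characters as-is.
# Illustrative only; not code from the paper.
def decode_shift_cipher(text: str, n: int) -> str:
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base - n) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

# N = 13 is rot-13, one of the few shift values common enough online
# for GPT-4 to handle well, per the thread above.
print(decode_shift_cipher("Rzoref bs Nhgberterffvba", 13))  # Embers of Autoregression
```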
@RTomMcCoy
Tom McCoy
3 years
DEFINITION: the "critical period in linguistics" is the week after a linguist tweets about innateness, when everyone else is critical of them
2
28
188
@RTomMcCoy
Tom McCoy
1 month
I am incredibly honored to receive a Glushko Dissertation Prize! A huge thank-you goes to:
- My dissertation advisors, @TalLinzen and @Paul_Smolensky , for being incredibly supportive throughout my PhD
- (continued in next tweet)
1/2
@cogsci_soc
CogSci Society
1 month
The Cognitive Science Society is thrilled to announce the winners of the 2024 Glushko Dissertation Prize! 🏆 Let’s meet the brilliant minds behind groundbreaking research in Cognitive Science 🧵👇
Tweet media one
1
7
46
20
4
192
@RTomMcCoy
Tom McCoy
5 years
Off to #acl2019nlp - or, as I call it, Florence and the Machine Learning @ACL2019_Italy
3
17
182
@RTomMcCoy
Tom McCoy
3 years
Why is it called "Teaching NLP" instead of "Natural Language Professing"?
1
13
182
@RTomMcCoy
Tom McCoy
2 months
I am hoping to hire a postdoc who would start in Fall 2024. If you are interested in the intersection of linguistics, cognitive science, and AI, I encourage you to apply! Please see this link for details:
Tweet media one
1
52
166
@RTomMcCoy
Tom McCoy
4 years
I’m now halfway through my PhD. One lesson I've learned: Don’t get discouraged comparing yourself to others. Most comparisons are unfair; no two people have the same background. Plus, you get to define what success means to you - it doesn’t have to look like anyone else’s version.
3
14
164
@RTomMcCoy
Tom McCoy
3 years
Phonology: Ain't no party like a fricative party cuz a fricative party don't stop
Syntax: Ain't no recursion like infinite recursion, cuz there ain't no recursion like infinite recursion, cuz...., cuz infinite recursion don't stop
Semantics: Ain't no Partee like Barbara Partee
2
17
157
@RTomMcCoy
Tom McCoy
4 years
A #CompLing proof:
a. Consider these sentences:
1. "How do you get down from a horse?"
2. "How do you get down from a goose?"
b. In (1), “down” is a preposition; i.e., “down” = P
c. In (2), “down” is a noun phrase; i.e., “down” = NP
d. By transitivity: P = NP
7
19
159
@RTomMcCoy
Tom McCoy
4 years
Two recent times when English failed me:
1) Passive form of "let someone know" ("He wants to be let known"?)
2) Adverb form of "hoity-toity" ("hoitily-toitily"?)
We need a flag, like on Wikipedia: "This linguistic phenomenon is incomplete. You can help English by expanding it."
13
16
155
@RTomMcCoy
Tom McCoy
4 years
Human language learning is fast & robust because of the inductive biases that guide it. Neural nets lack these biases, limiting their utility for cognitive modeling. We introduce an approach to address this w/ meta-learning. Demo:
1
36
157
@RTomMcCoy
Tom McCoy
2 years
@katiedimartin For a class I TAed, I made this intro to Python structured around a running example from phonology: It's meant to be gone through in one to two 75-minute lectures, so it might actually be more basic than what you want.
3
8
152
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab Our results show that we should be cautious about applying LLMs in low-probability situations
We should also be careful in how we interpret evaluations. A high score on a test set may not indicate mastery of the general task, esp. if the test set is mainly high-probability 7/8
2
7
145
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab By reasoning about next-word prediction, we make several hypotheses abt factors that'll cause difficulty for LLMs
1st is task frequency: we predict better performance on frequent tasks than rare ones, even when the tasks are equally complex
Eg, linear functions (see img)! 4/8
Tweet media one
1
7
138
@RTomMcCoy
Tom McCoy
11 months
When language models produce text, is the text novel or copied from the training set? For answers, come to our poster today at #acl2023nlp !
Session 1 posters, 11:00 - 12:30 today
Critics* are calling the work "monumental"
Link to paper: 1/2
Tweet media one
3
21
129
@RTomMcCoy
Tom McCoy
4 years
New paper: "Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks" w/ @Bob_Frank & @TalLinzen to appear in TACL Paper Website Interested in syntactic generalization? Read on! 1/
Tweet media one
2
27
122
@RTomMcCoy
Tom McCoy
5 years
Before #acl2019nlp , @TalLinzen gave me some transformative advice: If there are people you would like to meet at a conference, email them to set up a meeting! (1/5)
2
17
121
@RTomMcCoy
Tom McCoy
4 years
"Whatever accidental meaning her *words* might have, she *herself* never meant anything at all." - Lewis Carroll (presumably talking about language models)
1
20
117
@RTomMcCoy
Tom McCoy
3 years
Just realized that the US is a phonetics diagram
Tweet media one
3
12
116
@RTomMcCoy
Tom McCoy
4 years
Paul Smolensky’s class “Foundations of CogSci” now has a 2.5-hr summary on YouTube! This course is the reason I think of myself as a cognitive scientist. Highly recommended.
1
27
116
@RTomMcCoy
Tom McCoy
2 years
The word2vec analogy "king - man ≈ queen - woman" is famous. What other types of vectors, besides word embeddings, have been argued to display additive analogies? (e.g., vector representations of faces? or phonemes? or documents?)
16
13
111
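For readers less familiar with the setup, here is a minimal sketch of what an additive analogy test looks like for any vector representations (the toy 3-d vectors below are made up for illustration; real word2vec embeddings are learned and much higher-dimensional):

```python
# Toy illustration of an additive analogy: king - man + woman should land
# nearest to queen under cosine similarity. Vectors here are invented for the example.
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.1, 0.2, 0.3]),  # distractor
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in {"king", "man", "woman"}]
print(max(candidates, key=lambda w: cosine(target, vectors[w])))  # queen
```

The same test could in principle be run on face embeddings, phoneme features, or document vectors, which is the question the tweet poses.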
@RTomMcCoy
Tom McCoy
5 years
At #LSA2019 , there is one elevator where you speak your first language, and another where you speak your second language.
Tweet media one
0
12
110
@RTomMcCoy
Tom McCoy
10 months
Linguists, I have a terminology proposal:
- When you study competence, you're doing linguistics
- When you study performance, you're doing lingusitics
Of course, the object of study would be language or langauge, respectively
5
15
102
@RTomMcCoy
Tom McCoy
5 months
If you're interested in the NYT lawsuit (about GPT-4 copying from NYT articles), you should check out our paper "How Much Do Language Models Copy From Their Training Data?" TACL link: 1/n
@RTomMcCoy
Tom McCoy
3 years
*NEW PREPRINT* Neural-network language models (e.g., GPT-2) can generate high-quality text. Are they simply copying text they have seen before, or do they have generalizable linguistic abilities? Answer: Some of both! Paper: 1/n
Tweet media one
9
96
462
1
18
99
@RTomMcCoy
Tom McCoy
3 years
Me, young and naive, reading about LSTMs for the first time: "Huh, I have no idea what an LSTM is. Well, I'll just look up what the letters stand for, and that should clear it up!"
5
3
96
@RTomMcCoy
Tom McCoy
2 years
Random grad school tip: Sign up for one-on-one meetings with invited speakers - even if their interests are different from yours 1/4
1
5
97
@RTomMcCoy
Tom McCoy
5 years
I wrote a long criticism of Bill Nye, when I meant to criticize Bill Nighy. It was an ad homonym attack.
1
13
95
@RTomMcCoy
Tom McCoy
3 years
From finite linguistic experience, we can acquire languages that are infinite. How do we make this leap? New preprint on artificial language learning of center embedding: w/ @DrCulbertson , Paul Smolensky, & Geraldine Legendre, to appear @cogsci_soc 1/n
Tweet media one
3
25
94
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab Our big question: How can we develop a holistic understanding of large language models (LLMs)?
One popular approach has been to evaluate them w/ tests made for humans
But LLMs are not humans! The tests that are most informative about them might be different than for us 2/8
Tweet media one
1
6
95
@RTomMcCoy
Tom McCoy
3 years
Someone should use GPT-3 to write a season of Star Trek. We can call it "Star Trek: The Text Generation"
5
4
93
@RTomMcCoy
Tom McCoy
5 years
New tech report with Junghyun Min and @TalLinzen : "BERTs of a feather do not generalize together" Across 100 re-runs, BERT fine-tuned on MNLI has a consistent score on MNLI but extreme variation in syntactic generalization (measured w/ HANS). Link: 1/7
Tweet media one
1
19
90
@RTomMcCoy
Tom McCoy
4 months
Yesterday I went to check out the classroom where I'll be teaching. At first I thought the door was locked, but it turned out that it was just very heavy. It felt like a metaphor for life - often the doors that we think are locked are actually just heavy!
6
4
82
@RTomMcCoy
Tom McCoy
3 years
*NEW RESOURCE* Neural networks can vary dramatically across reruns. As a tool for studying this variation, we've released the weights for 100 instances of BERT fine-tuned on natural language inference (MNLI): w/ Junghyun Min and @TalLinzen
1
18
82
@RTomMcCoy
Tom McCoy
5 years
Excited to have a new @ICLR2019 paper with @TalLinzen , @EwanDun , and Paul Smolensky! We find implicit compositional structure in RNN encodings by approximating them with Tensor Product Representations. Paper: Demo:
Tweet media one
1
10
78
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab So how can we evaluate LLMs on their own terms?
We argue for a *teleological approach*, which has been productive in cognitive science: understand systems via the problem they adapted to solve
For LLMs this is autoregression (next-word prediction) over Internet text 3/8
2
3
76
@RTomMcCoy
Tom McCoy
4 years
Tomorrow I'll be speaking at the new @NLPwithFriends about using meta-learning to improve linguistic generalization in neural networks. See below for details!
@NLPwithFriends
NLP with Friends
4 years
We are very excited to announce our next speaker!!
🗣Tom McCoy ( @RTomMcCoy ), telling us about "Universal Linguistic Inductive Biases via Meta-Learning"
🗓August 12th, 14:00 UTC
📝Sign up:
Keep up to date with talks at
1
14
73
4
8
74
@RTomMcCoy
Tom McCoy
3 years
I'm beyond excited to watch @AlRoker solve a crossword I constructed! Thanks to @NYTimesWordplay for organizing!
@NYTGames
New York Times Games
3 years
Tune in Thursday, March 18, at a special time -- 11 a.m. Eastern -- and help our special guest the TODAY Show's @alroker crush @RTomMcCoy 's tumultuous crossword! Join us on Twitter, YouTube or Twitch. Illustration by James Doane.
Tweet media one
4
4
20
7
4
72
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab The 2nd factor we predict will influence LLM accuracy is output probability
Indeed, across many tasks, LLMs score better when the output is high-probability than when it is low-probability - even though the tasks are deterministic
E.g.: Swapping adjacent words (see img) 6/8
Tweet media one
1
2
71
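As a concrete illustration (one plausible reading of the swap-adjacent-words task, not the paper's exact code): the transformation is fully deterministic, so any accuracy gap between high- and low-probability outputs cannot be blamed on task complexity.

```python
# One plausible reading of the "swap adjacent words" task: exchange each
# consecutive pair of words. Deterministic, so output probability is the
# only thing that varies between examples. Illustrative, not the paper's code.
def swap_adjacent_words(sentence: str) -> str:
    words = sentence.split()
    for i in range(0, len(words) - 1, 2):
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

print(swap_adjacent_words("the cat sat on the mat"))  # "cat the on sat mat the"
```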
@RTomMcCoy
Tom McCoy
11 months
In Toronto for #acl2023nlp - please reach out if you want to meet up! Some interests:
- connecting linguistics & NLP
- interpretability & evaluation
- other things on this list:
- PhDs & postdocs at Yale Linguistics or CS (I'll be recruiting for 2024-2025)
6
5
68
@RTomMcCoy
Tom McCoy
5 years
Paper accepted to @ACL2019_Italy ! "Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference" with @TalLinzen and Ellie Pavlick (building on work done with the JSALT team led by Ellie and @sleepinyourhat ). Link:
Tweet media one
3
10
67
@RTomMcCoy
Tom McCoy
4 years
William Carlos Williams was right: "wheelbarrow" has a whopping 5 dependents!
Tweet media one
3
13
66
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab P.S. - If these results sound interesting, here are two other papers that you should also check out: First is this excellent work by @zhaofeng_wu et al. about task variants:
@zhaofeng_wu
Zhaofeng Wu
11 months
Language models show impressive performance on a wide variety of tasks, but are they overfitting to evaluation instances and specific task instantiations seen in their pretraining? How much of this performance represents general task/reasoning abilities? 1/4
Tweet media one
9
108
466
1
3
65
@RTomMcCoy
Tom McCoy
1 year
This very nice piece by Ted Chiang describes ChatGPT as a lossy compression of the Internet. This idea is helpful for building intuition, but it's easy to miss an important point: Lossiness is not always a problem! In fact, if done right, it is exactly what we want. 1/14
@NewYorker
The New Yorker
1 year
The science-fiction writer Ted Chiang explores how ChatGPT works and what it could—and could not—replace:
7
56
219
2
4
65
@RTomMcCoy
Tom McCoy
3 years
Mechanical Turk is incredibly useful for collecting data, but using it effectively can be tricky. Here is a list of tips that helped me get good data & save money:
4
10
64
@RTomMcCoy
Tom McCoy
2 years
Everyone says "To reach human-level AI, we need to scale up our models." But humans don't have scales! Only reptiles do!
4
1
64
@RTomMcCoy
Tom McCoy
2 years
One perk of academia that doesn’t get enough love is eduroam. So many times when I’ve needed wifi, eduroam has unexpectedly been there - e.g., when I was in a Swedish airport & urgently needed to rebook a flight.
5
2
62
@RTomMcCoy
Tom McCoy
4 years
(1/5) To understand our models, we need to understand how they have been affected by their training data. Methods like this one will help us do that. @XiaochuangHan , @byron_c_wallace , Yulia Tsvetkov.
Tweet media one
4
5
60
@RTomMcCoy
Tom McCoy
4 years
I hope the VP debate will get into double-object constructions, or at least the argument/adjunct distinction.
3
8
60
@RTomMcCoy
Tom McCoy
2 years
Today is my New Yorker crossword debut! (Online-only)
5
5
60
@RTomMcCoy
Tom McCoy
2 years
A minimal pair from Netflix!
Tweet media one
2
5
59
@RTomMcCoy
Tom McCoy
4 years
Excited to have 2 papers accepted to #acl2020nlp ! Both are about syntactic generalization in neural networks (via data augmentation or tree-based architectures), and both are joint work with some fantastic collaborators. Titles are in replies, links are yet to come:
1
4
57
@RTomMcCoy
Tom McCoy
4 years
Want to have a lively discussion about #NLProc ? A tension is all you need
0
1
56
@RTomMcCoy
Tom McCoy
4 years
On Oct. 10, I’ll be giving a virtual talk for the National Museum of Language! It's about the linguistics of crossword puzzles. Registration is free:
Tweet media one
1
8
55
@RTomMcCoy
Tom McCoy
3 years
Machine learning techniques ranked by the sturdiness of the building materials in their names:
1) METALearning
2) reinforCEMENT learning
3) logiSTIC regression
4
3
53
@RTomMcCoy
Tom McCoy
2 years
Good news: The horse raced past the barn has fully recovered from his fall!
3
3
55
@RTomMcCoy
Tom McCoy
5 years
Life hack for TAs: You can condense your email signature into a single character, θ ("the TA")
2
5
55
@RTomMcCoy
Tom McCoy
2 years
Seems like part of the joke went unnoticed... At the risk of ruining humor by explaining it: The sub-letter shenanigans are unnecessary - try reading the first letters of the words in the picture.
@RTomMcCoy
Tom McCoy
2 years
It has become acceptable for acronyms to use any letters within a word, not just the first letter. E.g., ORNATE = acrOnyms fRom noN-initial chAracTErs
But why stick with whole letters? In my new paradigm CLIP, an acronym can use any curves or line segments from the base phrase!
Tweet media one
20
92
960
4
0
53
@RTomMcCoy
Tom McCoy
3 years
Some historical phonetics: The [f] sound was originally made by clenching your teeth together. Only in the past few centuries did we switch to the current approach of lower-lip-against-upper-teeth. The name for this shift: dental f loss
3
2
52
@RTomMcCoy
Tom McCoy
4 years
(3/5) This one’s a twofer: Both papers give hard evidence that evaluating only on English can make us overestimate our models ( #BenderRule in action). Kate McCurdy, Sharon Goldwater, Adam Lopez. Forrest Davis, @marty_with_an_e .
Tweet media one
1
6
53
@RTomMcCoy
Tom McCoy
4 years
Standard evaluations in NLP can mask striking differences between models. To hear more, come to our talk “BERTs of a feather do not generalize together” on Friday at #BlackboxNLP ! w/ Junghyun Min and @TalLinzen Paper:
Tweet media one
3
4
52
@RTomMcCoy
Tom McCoy
3 years
Some English words differing only in stress:
- desert/dessert
- insight/incite
- decent/descent
- reflex/reflects
- discus/discuss
- readout/redoubt
- misery/Missouri
- deepened/depend
- media/Medea
- uprise/apprise
- abbess/abyss
- bellies/Belize
- expos/expose
- trusty/trustee
3
5
51
@RTomMcCoy
Tom McCoy
4 years
(2/5) Many papers ask, “Do language models learn syntax?” I like that this work moves beyond that to “What type of syntax do language models learn?” Artur Kulmizev, @vin_ivar , Mostafa Abdou, @JoakimNivre .
Tweet media one
1
11
52
@RTomMcCoy
Tom McCoy
4 years
This got more likes than I expected, so I guess I should share my SoundCloud:
Tweet media one
1
11
48
@RTomMcCoy
Tom McCoy
1 year
If you're interested in language acquisition and neural networks, check out our new paper!
@AdityaYedetore
Aditya Yedetore
1 year
NEW PREPRINT Excited to release my first first-author paper! We investigate if neural network learners (LSTMs and Transformers) generalize to the hierarchical structure of language when trained on the amount of data children receive. Paper:
Tweet media one
2
21
96
3
3
50
@RTomMcCoy
Tom McCoy
8 months
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab @zhaofeng_wu …as well as this fantastic paper by @yasaman_razeghi et al. about input probability (an effect that is complementary to the output probability effects described above):
@yasaman_razeghi
Yasaman Razeghi
2 years
You've probably seen results showing impressive few-shot performance of very large language models (LLMs). Do those results mean that LLMs can reason? Well, maybe, but maybe not. Few-shot performance is highly correlated with pretraining term frequency.
14
107
588
1
2
48
@RTomMcCoy
Tom McCoy
2 years
My pipeline for typing special characters:
1) Find the Wikipedia page about the character
2) Copy the character from Wikipedia and paste it into my browser's search bar, to remove formatting
3) Copy from the search bar into the document I'm typing
3
1
49
@RTomMcCoy
Tom McCoy
9 months
Prediction: one of these days, someone will announce a new LLM that has an infinite context length - but it will turn out to be a reinvention of the LSTM.
5
1
47
@RTomMcCoy
Tom McCoy
10 months
🌲Interested in language acquisition and/or neural networks? Check out our poster today at #acl2023nlp ! Session 4 posters, 11:00-12:30 🌲
Elevator pitch: Train language models on child-directed speech to test "poverty of the stimulus" claims
Paper:
Tweet media one
2
6
48
@RTomMcCoy
Tom McCoy
4 months
Interesting results about LLMs & meaning! As a bonus, the paper is an excellent example of how to evaluate LLMs fairly:
1. Provide sufficient context & information, to avoid underestimating LLMs
2. Control for spurious correlations in the data, to avoid overestimating LLMs
@kanishkamisra
Kanishka Misra 😶‍🌫️
4 months
Controlled zero-shot evals have revealed holes in LMs’ ability to robustly extract and use meaning. But what happens when you add experimental context (ICL/instructions)? With @AllysonEttinger & @kmahowald , I explore this in the context of semantic property inheritance: 1/13
Tweet media one
1
14
73
1
2
47
@RTomMcCoy
Tom McCoy
4 years
Whenever my family drove by Toys R Us, my dad would say, "It should really be named Toys R We." 20 years later, I'm a linguist. Coincidence? You tell me.
2
1
47
@RTomMcCoy
Tom McCoy
2 months
@swabhz "You might have misread this section: It's meant to be 'limitations', not 'imitations'"
1
0
46
@RTomMcCoy
Tom McCoy
12 days
🧠🤖 Are you interested in linguistics, cognitive science, and Large Language Models? Come join this workshop next Monday & Tuesday over Zoom! I'm really looking forward to it!
@roger_p_levy
Roger Levy
17 days
Join us online May 13–14 for a star-studded #NSF -sponsored workshop: New Horizons in Language Science: Large Language Models, Language Structure, and the Cognitive & Neural Basis of Language! Interdisciplinary talks & discussion on three themes: 1/
4
86
225
0
7
46
@RTomMcCoy
Tom McCoy
3 years
John Firth, 1957, reminding a customer service rep to uphold their warranty: "YOU SHALL KNOW A COMPANY BY THE WORD THAT IT KEEPS!!!"
2
5
44
@RTomMcCoy
Tom McCoy
3 years
"To be a computer scientist, you have to hate computers at least a little. Otherwise you have no motivation to make them better." - Dana Angluin, talking to my first college CS class I think of this often. It's a comforting thought if you're feeling frustrated with your field(s)
0
1
42
@RTomMcCoy
Tom McCoy
2 years
Area chairs and area rugs are much less similar than their names would suggest
2
2
42
@RTomMcCoy
Tom McCoy
4 years
Off to #acl2020nlp - or, as I call it, Seattle Watching-tons
@RTomMcCoy
Tom McCoy
5 years
Off to #acl2019nlp - or, as I call it, Florence and the Machine Learning @ACL2019_Italy
3
17
182
2
4
41
@RTomMcCoy
Tom McCoy
1 year
I will greatly miss Drago. To a very large extent, I owe him my career: Along with @LoriLevinPgh , he introduced me to linguistics via NACLO, a contest that they co-founded. His warmth and enthusiasm got me excited about the field that I have continued to pursue ever since. 1/5
@hmkyale
Harlan Krumholz
1 year
The #AI community, the #computerscience community, the @YaleSEAS community, and humanity have suddenly lost a remarkable person, @dragomir_radev - kind and brilliant, devoted to his family and friends... gone too soon. A sad day @Yale @YINSedge @YaleCompsci #NLP2023
Tweet media one
Tweet media two
41
87
390
2
2
40
@RTomMcCoy
Tom McCoy
3 years
At #CogSci2021 and interested in linguistic generalization? Stop by our poster! We find that people extrapolate center embedding beyond the depths of embedding they've seen.
Wed, July 28, from 11:20 am to 1:00 pm, Eastern time
Poster 2-E-176
Paper:
Tweet media one
1
2
40
@RTomMcCoy
Tom McCoy
2 years
The classic JSTOR trap: Thinking you’ve found a PDF of the book you need, when it’s actually just a review of that book (with the same title as the book)
1
0
40
@RTomMcCoy
Tom McCoy
4 years
New paper: “Representations of syntax [MASK] useful: Effects of constituency and dependency structure in recursive LSTMs” by Michael Lepori, @TalLinzen , & me arXiv: Which will win – dependency or constituency? And what goes in the [MASK]? Read on! 1/9
Tweet media one
2
6
39
@RTomMcCoy
Tom McCoy
3 years
In model-generated text, very few bigrams and trigrams are novel - i.e., most of them appear in the training set. But for 5-grams and larger, the majority are novel! 3/n
Tweet media one
2
3
39
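A minimal sketch of how this kind of novelty measurement works (illustrative only; the paper's actual pipeline is more involved): for each n, count the fraction of n-grams in the generated text that never occur in the training text.

```python
# Illustrative n-gram novelty check: what fraction of the n-grams in a
# generated text are absent from the training text? Not the paper's code.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def novelty_rate(generated_tokens, training_tokens, n):
    train_set = set(ngrams(training_tokens, n))
    gen = ngrams(generated_tokens, n)
    if not gen:
        return 0.0
    return sum(g not in train_set for g in gen) / len(gen)

train = "the cat sat on the mat".split()
gen = "the cat sat on the rug".split()
print(novelty_rate(gen, train, 2))  # 0.2 - only one novel bigram ("the rug")
print(novelty_rate(gen, train, 5))  # 0.5 - half of the 5-grams are novel
```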
@RTomMcCoy
Tom McCoy
2 years
Lit review tip: Start with a foundational paper related to your work and then look over everything that has cited it. If you pick the right seed paper - one so famous that no one could have missed it - this should return everything relevant to you!
2
2
36