NEWSROOM -- a corpus of 1.3M (1,321,995) article-summary pairs for automated summarization. It's big, it's diverse, and it's an open challenge. Oh, and we are pretty excited about it! Joint work with Max Grusky and
@informor
#NLProc
#naacl2018
Just updated our recent paper on BERTScore, a super simple method for evaluating text generation with BERT, with many more experiments. We evaluated with the outputs of 363 MT systems and model selection experiments! --> 41 pages and 29 giant tables :)
Folks, some
@COLM_conf
stats, because looking at these really brightens the mood :)
We received a total of ⭐️1036⭐️ submissions (for the first ever COLM!!!!). What is even more exciting is the nice distribution of topics and keywords. Exciting times ahead! ❤️
We are releasing KiloGram ⚖️, a large-scale resource of tangram images with language annotation, for everyone in
#NLProc
, CogSci, and many other fields to enjoy (and use)! Coming up in EMNLP -> 🧵
📄
Browse it:
Happy to release Touchdown🧸, a natural language navigation and spatial reasoning dataset using Street View. The task: follow the instructions to reach a goal and find a hidden 🧸 named Touchdown. All the hard work by
@howard50b
@alsuhr
and Dipendra Misra
NLVR goes real! Check out NLVR² — complex reasoning with *natural* language and *real* vision. 107K examples, each a caption and a pair of images. Task: predict if the caption is true. A long way to go to human performance of 96%!
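For concreteness, here is the rough shape of a single NLVR² example as a Python record. The field names and filenames below are illustrative assumptions for exposition, not the dataset's exact schema:

```python
# Illustrative shape of one NLVR2 example; field names and filenames are
# hypothetical, not the released schema.
example = {
    "left_image": "pair_0041_left.jpg",
    "right_image": "pair_0041_right.jpg",
    "caption": "One of the images contains exactly two dogs.",
    "label": True,  # is the caption true of the image pair?
}
```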
This has been 2 years and 3 papers in the making: direct mapping of natural language instructions and first-person observations to continuous velocity control. Yep, we learn the entire pipeline with a single interpretable neural model!
#NLProc
❤️
#Robotics
NEW PAPER: w/
@ggaonlp
@hungting_chen
@eunsolc
, we build+deploy QA systems to improve from real human feedback. Over thousands of interactions, we show rapid improvements over time! Human users ask questions, get model-predicted answers, and give feedback🧵
Users engaged with natural language systems can provide feedback in real time, and this feedback is a super duper learning signal! So: deploy, train, repeat!
Last PhD paper w/
@alsuhr
/suhr@sigmoid.social ... 🧵
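A minimal sketch of the deploy-train-repeat loop described above. Every name here (the helpers, the signal encoding) is a hypothetical stand-in for the paper's actual pipeline, not its code:

```python
# Sketch of deploy-train-repeat; all helpers are hypothetical stand-ins.
from typing import Callable, List, Tuple

Interaction = Tuple[str, str, bool]  # (question, predicted answer, helpful?)

def deploy_train_repeat(
    model,
    collect_interactions: Callable[..., List[Interaction]],
    update_model: Callable,
    num_rounds: int = 5,
):
    for _ in range(num_rounds):
        # Deploy: users ask questions, receive model-predicted answers,
        # and mark each answer as helpful or not.
        interactions = collect_interactions(model)
        # Turn binary feedback into a supervision signal: reinforce answers
        # users accepted, down-weight answers they rejected.
        data = [(q, a, 1.0 if ok else -1.0) for q, a, ok in interactions]
        # Train on the feedback, then redeploy the improved model.
        model = update_model(model, data)
    return model
```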
Want to do ML/NLP for science?!
@arxiv
(that small website you visit 100 times a day) is hiring a Lead ML Engineer with focus on NLP! Super exciting data and incomparable impact. Please help distribute and pass around to anyone relevant 🙏
Is it time to reconsider oral sessions
@aclmeeting
? Or is it just me finding them less useful and less attended compared to the lively ongoing poster sessions?
Can we improve a QA system from user feedback *in deployment*? We study how effective this signal is (tldr: very effective) using simulation experiments with existing benchmarks.
@aclmeeting
#ACL2022
Work by Ge Gao in collaboration with
@eunsolc
Distributional semantics? Reminds me of the "florida" example in the
@omerlevy_
and
@yoavgo
paper from 2014. Granted, contemporary LLMs probably do it much better, but the ability is likely not new.
For spatial representations, we run Llama-2 models on the names of tens of thousands of cities, structures, and natural landmarks around the world, the USA, and NYC. We then train linear probes on the last token activations to predict the real latitude and longitude of each place.
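A minimal sketch of this probing recipe using HuggingFace Transformers and scikit-learn. The inputs (`place_names`, `coords`) are assumed to be loaded elsewhere, and probing only the final layer is an illustrative simplification:

```python
# Sketch of linear probing on last-token activations; layer choice and
# input loading are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import Ridge

MODEL = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)

def last_token_activation(name: str) -> torch.Tensor:
    """Hidden state of the final prompt token at the last layer."""
    inputs = tok(name, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[-1][0, -1]

def fit_probe(place_names, coords):
    """Linear probe from activations to (latitude, longitude) targets."""
    X = torch.stack([last_token_activation(n) for n in place_names]).numpy()
    return Ridge().fit(X, coords)  # coords: (n, 2) array of lat/lon
```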
Turing awards for everyone!!!
@yoavgo
@percyliang
@alsuhr
@YejinChoinka
(who got some bonus awards on the way).... presupposition is a wonderful thing!
Do you want a Turing Award? Go to , give yourself one, and send a screenshot to your proud parents 🏆
Yoav Artzi (
@yoavartzi
), Assistant Professor in the Department of Computer Science at Cornell Tech, has received a Google Focused Research Award to fund exploration of spatial language understanding. He will share the $1.5m award evenly with
@mohitban47
.
Completely agree. This counter culture (sorry, had to pun -- lame, I know) has been going on for a few years, and it's not only annoying, but misleading about what makes progress, both on the personal level and globally on research. If anything, slow down!
Interesting Engineering (
@IntEngineering
) features
@cs_cornell
prof Tapomayukh Bhattacharjee's lab, in which robots, including newly devised robotic arms, could become crucial caregivers in the near future.
@TapoBhat
#CornellCIS
Upcoming in EMNLP: Executing Instructions in Situated Collaborative Interactions (). New language collaboration environment and large dataset, modeling and learning methods, and a new evaluation protocol for sequential instructions.
Want a good-ol'-fashioned hard copy of the NEWSROOM summarization dataset ()? Find Max Grusky at
#NAACL2018
and get 1.3M article-summary pairs on a bespoke flash drive, limited supplies! -- TALK ON SUNDAY, 11:06 in Empire B
#NLProc
We keep updating BERTScore, our generation evaluation method, behind the scenes. Been a while so highlights:
- Now supports 53 pre-trained models via
@huggingface
's Transformers
- WMT-16 to-EN correlations here:
--> current best: deberta-xlarge-mnli
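A quick usage sketch with the released bert-score package (`pip install bert-score`), using the current-best model from the tweet; the candidate/reference strings are just examples:

```python
# Minimal bert-score usage; model choice follows the recommendation above.
from bert_score import score

cands = ["the weather is cold today"]   # example system outputs
refs = ["it is freezing today"]         # example references
P, R, F1 = score(cands, refs, model_type="microsoft/deberta-xlarge-mnli")
print(F1.mean().item())
```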
This is absurd. Beyond credit, authorship is responsibility and liability. OpenAI assumes neither, and it is nonsensical to attribute either to ChatGPT or expect it to assume it (whatever that would even mean! 🤯). This practice is actively misleading the public about LLMs.
This is probably the first paper to give ChatGPT coauthor status, and its contact details point to support
@openai
! Giving coauthorship to writing assistants is absurd and this practice has to stop. 🧶
An important aspect of LLM deployment not featured much in current discourse. I wrote a one-pager about this about a month ago for an ISAT workshop. Took the opportunity to edit it a bit:
[very speculative, I will probably regret posting it 😬]
I'm in the top 2% of users on StackOverflow. My content there has been viewed by over 1.7M people. And it's unlikely I'll ever write anything there again.
Which may be a much bigger problem than it seems. Because it may be the canary in the coal mine of our collective knowledge.
A…
Super proud of Anya and the team for the
@emnlpmeeting
best paper, and very appreciative of the hard work of EMNLP and the committee. Maybe it's an opportunity to raise something: best paper awards are fun, but should be replaced with a larger pool of outstanding awards [🧵1/3]
Really nice to see consistent progress on a hard semantic parsing task 😍 -- NLVR -- with solid algorithm improvements! Most recently, work by
@nitish_gup
@sameer_
@nlpmattg
gets 89.5% accuracy, almost 90%, on structured representations. That's up from 67.8% when we released the data in 2017 👏
+1 Interest in human language is what drives much of my work. No community compares. This all makes me very sad. NLP faces two pressures: one from industry LLMs, the other wholly self-inflicted. Not clear ACL will survive as an impactful force if it doesn't get its act together.
To be clear, I love the NLP community. I admire the faculty, it’s a joy to teach the students, the vibe is thoughtful & warm. But *CL faces existential threats & has adopted all the wrong remedies. The house is on fire and we’re furiously installing a labyrinthine koi pond.
Humans learn language by acting in the world. Can RL agents do the same? lilGym is a new benchmark 🏋️ for RL + natural language + visual reasoning
Chief RL trainer:
@anne_youw
, in collaboration with
@noriyuki_kojima
and
@xkianteb
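A minimal sketch of how an agent might interact with a lilGym-style environment through the standard Gym API. The import name and environment id below are placeholder assumptions; see the lilGym release for the real ones:

```python
# Gym-style interaction loop; env id and import name are placeholders.
import gym
import lilgym  # assumed to register the environments on import

env = gym.make("lilgym-example-v0")  # hypothetical id
obs = env.reset()  # visual observation paired with a language statement
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a trained RL policy
    obs, reward, done, info = env.step(action)
env.close()
```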
Answering two quick questions I received:
1. Yes, it will be an in-person conference!
2. The CfP details what is behind the non-exhaustive list of topics of interest -- read how we break out each term! We are taking a VERY VERY broad view of language modeling and its uses.
Introducing COLM (), the Conference on Language Modeling. A new research venue dedicated to the theory, practice, and applications of language models.
Submissions: March 15 (it's pronounced "collum" 🕊️)
Turns out that
@alsuhr
's good ol' fashioned (2017!) NLVR remains pretty challenging for SOTA multimodal LLMs ¯\_(ツ)_/¯ New technical report by
@anne_youw
Particularly striking given the tiny vocabulary size and the simple synthetic images. Why? Not completely sure, but ...
Does anyone have a favorite task where gpt-4 has near chance accuracy when zero or few-shot prompted? I’m looking for recommendations for tasks like this
Media coverage is absurd, serving interests of companies, where appearance of magic and intelligence translates into dollars. Capitulation of top-notch journalists is embarrassing and sad. (+ Sundar's response to the key question is ridiculous - we don't understand humans ... 🙄)
One AI program spoke in a foreign language it was never trained to know. This mysterious behavior, called emergent properties, has been happening – where AI unexpectedly teaches itself a new skill.
Probably the best discussion of image generation so far. Starting strong with phrasal attachment ambiguity, and then diving into compositional semantics, including affordances and selectional preferences. There is a whole
#NLProc
lecture there
Happy we got the Language Grounding 🤖🖼🚀 track going in
@aclmeeting
this year! And glad to have SAC-ed its inaugural round :) despite a slightly lower acceptance rate, Grounding is bigger than Syntax 🌲 --- how times change!
#NLProc
Hey, attending
@iclr_conf
? The BERTScore presentation is online:
While listening, install BERTScore! Just “pip install bert-score” or git the source:
We will be around to chat on Tuesday (5-7pm, 8-10pm GMT / 1-3pm, 4-6pm EDT)
Join us for InterNLP 2022
@NeurIPSConf
on Dec 3 for our workshop on interactive learning for
#NLProc
. We have a fantastic set of speakers and submissions!
Schedule is here:
The videos for our crowdsourcing
@emnlpmeeting
tutorial are online via the link in Underline! ()
We will use the live slot in EMNLP for a 👩⚕️Crowdsourcing Clinic💉 (a what?! 👉🧵+video👇), so please watch the case studies in advance
This is not only the well-intended but borderline suicidal arXiv policy. It's also Findings, ARR, checklists, and other onerous submission requirements. We don't need to be creative. We need simplicity. Reset the system!
@adveisner
@zacharylipton
We really need to stop trying to answer our problems with increased complexity. It's nearly impossible to predict impact over time and at scale. Again and again, and with the best intentions, we have made our lives harder and undermined ACL.
NLP folks, I am thinking of doing paper reproduction projects for a grad-level adv. topics class. What are good papers to look at?
Papers must be well written, with data available, compute limited 😅, and complexity bounded -> 1 semester (students have other [important] obligations).
What multi-modal LLMs are currently publicly available? Specifically, models that take an arbitrary number of images as input, potentially interleaved within the prompt text. Image generation aside (for this query). I guess Flamingo is one relevant design. Thanks!
Two updates on NLVR2 (). First, we analyzed a potential visual bias, enhanced the evaluation protocol to be robust to it, and confirmed that the results of recent work do not take advantage of this potential bias.
#NLProc
We created reviewing guidelines for
@COLM_conf
. Not intended to automate the committee work, or dictate constraints. But, to inspire a thoughtful reviewing process, for an exciting and impactful program of the highest possible quality. We have a wonderful program committee ❤️
We are recruiting PhD research interns
@asapp
for the coming summer! Working on challenging ML/NLP problems, with amazing data, and SOTA models. Apply here:
(also: hiring for full-time research positions, both scientists and engineers!)
Absolutely not. This is wrong, and maybe based on a misunderstanding of what (academic) research is about. Probably should engage more meaningfully, but 🤷‍♂️
As an NLP researcher who has been doing semantic parsing for nearly 5 years, I have to say semantic parsing and grounding are probably also dead. FYI, semantic parsing transforms natural language into formal language (code, self-defined functions, etc.) for execution in the real world.
MiniTorch v0.1 (DIY build-your-own Torch)
New modules on Python GPU programming, pooling, and CNNs, and lots of community fixes.
(DM me if you would like access to the teacher's guide with code.)
It makes no sense that "ridge plots" are not called "little prince plots" or "boa plots" (right: boa plots from upcoming work with
@hawkrobe
-- see the elephants!)
It was great to have
@yoavartzi
with us today telling us about his really exciting new work on grounded semantics: Robot Control and Collaboration in Situated Instruction Following, despite the Dec 9
@aclmeeting
deadline.
#NLProc
Seems like so far in the future, but I will be looking to recruit 1-2 PhDs next year, including with a special focus on hard-core robotics-oriented students for robotics+NLP 🗣🤖 (but not only!)
Considering a PhD in NLP and more specifically Grounded Language. So I thought it might be a good time to try out Twitter's list feature to stay up-to-date with people like
@_jessethomason_
@yoavartzi
@FelixHill84
... Also added some other NLP people for fun.
The
@COLM_conf
reviewing period has started. Reviewers should now receive emails, and all papers are now assigned. Thanks to all our ACs who adjusted assignments in the last few days. Happy reviewing all!
Hey!
@COLM_conf
is recruiting reviewers!
███████░░░ 70% recruited 🚀🚀🚀
Did you get an invite? Please respond NOW!
Didn't get an invite and want to help? ❤️ please fill this form:
@yoavgo
We will make everything so costly and inefficient that everything is possible and valid :)
Seriously though, it's a cool and interesting idea. Nice to see it in an NLP paper
The forcing of ARR next year is dispiriting. The issues go beyond getting the engineering right, as
@chrmanning
succinctly summarized during the ARR session …
A lot of
@cs_cornell
-related movements in the
#NLProc
faculty market this year. Worth summarizing in one place. Need to update our people page ....
Pretty excited about the numbers, especially given that our groups are relatively small :)
thread 1/8
Research as API usage is problematic. Reproducibility is one issue, but not the biggest one. More critical is the opaqueness about what is actually behind the API, and the bounded level of insight due to restricted access (e.g., no distributions, no activations).
OpenAI announced the discontinuation of the Codex API from March 23rd.
With that, a large set of Codex-based code generation papers becomes totally irreproducible. 🥲🥲
@srush_nlp
These popularity-contest lists are flawed from the get-go. They do no good for our field, or for the students now joining it. I am really happy I didn't "grow up" (arguably, still growing up, but you get the point) in this climate.
ACL is supposed to be my intellectual home. Not sure ML venues can really replace that. It's sad beyond the collapse of an important pub venue. And that was before the unfortunate dragging of arXiv (❤️) into the mud most recently 😢
We collect human preference annotations for news summaries generated by current SOTA and zero-shot GPT-3 models. For multiple settings (generic + keyword) and datasets (CNN + BBC), GPT-3 summaries beat prior fine-tuned models!
[2/6]
We put together a list of papers (that is NOT exhaustive of the styles COLM is looking for -- that thread would be truly endless), and
@srush_nlp
made a looong thread out of it. Looking forward to seeing your submission at COLM!
The Conference on Language Modeling 🦙 () has the mission of "creating a community of researchers with expertise in different disciplines, focused on understanding, improving, and critiquing the development of LM technology." 🧵
Here are 17 papers from 17…
New NLP+robotics+vision paper: Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following @ CoRL 2020
Work done by Valts Blukis
Three core contributions make this happen, let's unpack them ...
Surviving every AI wave, two kernels have consistently been the beating hearts of Natural Language Processing:
Datasets and Metrics
Today we release "nlp", a library to easily share & load data/metrics already providing access to 99+ datasets!
Try it👉
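A minimal usage sketch with the released "nlp" library; the dataset and metric names here are just examples:

```python
# One call to load a dataset, one to load a metric.
from nlp import load_dataset, load_metric

dataset = load_dataset("squad")  # returns train/validation splits
metric = load_metric("rouge")    # metrics load through the same interface
print(dataset["train"][0])
```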
.@NAACLHLT
attendees (and those just watching from afar), I am looking for a postdoc next year. Position will be at the
@cornell_tech
campus in NYC. Happy to chat at NAACL (or later on at
@ACL2019_Italy
) -- please DM/email to find a time, and spread the word
New paper: can observational behavioral signal facilitate continual instruction generation learning? Yes! Observe what people do -> they don't do what you want? -> maybe you said it wrong
by
@noriyuki_kojima
in collaboration w/
@alsuhr
and myself.
🧵...
Happy to release the Dynamic Robot Instruction Following (DRIF) framework, including a 3D simulator and data for natural language instruction following with a realistic quadcopter drone
Video for our recent
#CoRL
paper using DRIF:
Great to see what everyone around here has been noticing put through a good-old-fashioned empirical test 👏 LLMs are a wonderful artifact and will be super useful, but search is such a bad choice for current models!
We are pleased to announce that the first Conference on Language Modeling will be held at the University of Pennsylvania in Philadelphia at the Zellerbach Theatre.
Thanks so much to UPenn CS as well as Mark Yatskar and Zachary Ives for facilitating the amazing venue.
Not the one who told Noam this, but I don't filter based on paper counts. I find publication record to often be more of a distraction than a helpful signal ¯\_(ツ)_/¯
Someone on the admissions committee for a top CS PhD program told me they no longer filter based on paper count because too many of the applicants already have multiple publications. Instead, they now filter by citation count. Not sure if he was joking but I believed it.
Some big expansion in schools that were maybe less on the NLP map, such as Waterloo and UChicago (following a giant expansion in recent years by USC). Applicants should update their lists...
For the current fleeting moment, Valts' work tops the ALFRED leaderboard, *but* with a cool twist: only using the high-level instructions! Completely without the low-level ones. More details soon. Work by Valts Blukis,
@chris_j_paxton
,
@animesh_garg
Dieter Fox and myself.
NLP folks, I am looking for a textual similarity dataset where given 2 sentences (with maybe different words), we have word-level similarity judgements between them (not necessarily for all pairs). Is there something like this?
Excited to start COLM! If you are interested (of course you are!), please check out our survey:
We are looking to gauge interest and recruit program committee.
Will be in
#ACL2023NLP
Mon-Fri. Looking forward to catching up and discussing research. So much going on, all over the place, and all at once... not even sure what I am interested in anymore. Well, natural language is a big one :)
Congratulations to Yejin Choi (
@YejinChoinka
)! The 2010
@Cornell
alumna and pioneer in the field of natural language processing has been awarded a 2022 MacArthur Fellowship, or “genius grant.”
Read more:
.@cs_cornell
is (heavily) hiring faculty across dimensions ⟀ (areas and locations: Ithaca and NYC!), including
#NLProc
. Self-filtering is often suspect, so just apply! Feel free to DM with Qs (answers will often be: yes, apply!)