Our latest study measures how persuasive language models like Claude are compared to humans. We find a general scaling trend: newer models tend to be more persuasive, with Claude 3 Opus generating arguments that don't differ statistically from human-written ones.
New Anthropic research: Measuring Model Persuasiveness
We developed a way to test how persuasive language models (LMs) are, and analyzed how persuasiveness scales across different versions of Claude.
Read our blog post here:
It’s great that schools are waiving the GRE requirement. But I remember the main cost that prevented me from applying to more schools was the application fees (~100 USD per school). Given the exchange rate, it was a big challenge. I spent all my savings on PhD applications.
Personal news: Excited to share that after completing my PhD towards the end of Spring 2021, I will be joining
@stanfordnlp
for a Postdoc. I am extremely fortunate to be co-hosted by
@jurafsky
,
@tatsu_hashimoto
and
@chrmanning
. Looking forward to being part of this amazing group!
Join our team! 🙌 The societal impacts team at
@AnthropicAI
is hiring. We design new methods to assess language models for societally or policy-relevant traits. If you feel passionate about this direction, apply to join us! (please retweet)
After two amazing postdoc years at
@StanfordNLP
, thrilled to join
@AnthropicAI
's Societal Impacts team as a Research Scientist! Grateful for everyone who helped me along the way. Feeling lucky to work with this great group on the crucial mission of building safer AI.
We develop a method to test the global opinions represented in language models. We find the opinions represented by the models are most similar to those of participants in the USA, Canada, and some European countries. We also show in separate experiments that the responses are steerable.
@kisacakimdir
I especially want to say to our young women: there is no field in which you cannot succeed. That includes computer science, artificial intelligence, and every technology-related topic. Just want it, don't be afraid, and work hard. We very much need young women's participation in these fields.
I am very excited to co-teach NLP (CS 4740) with my great advisor
@clairecardie
this semester. This class has a special place in my heart since it was the first NLP class I ever took, and it convinced me to do research in this area.
#NLProc
#CornellCIS
#Cornell
My first collaboration at
@AnthropicAI
🎉 I led the experiments on summarization, topic modeling, and political bias. It’s cool to see that long-context models open up great possibilities for long-form summarization.
Opportunities and Risks of LLMs for Scalable Deliberation with Polis
paper page:
Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying…
Excited to share that our paper “Spurious Correlations in Reference-free Evaluation of Text Generation” will appear in
#acl2022nlp
Main Conf. We find that recently proposed metrics in summarization and dialog generation may be exploiting spurious correlations in the benchmarks.
Even after I got accepted, I still had to borrow money to pay for the initial costs (i.e. flight tickets, one month deposit for the apartment) since you don’t get the first paycheck until after you start. A lot of these are prohibitive costs for international students.
@kisacakimdir
Thank you very much for your questions and messages. At Koç, the GPA is out of 4, and many courses don't have an A+ option. But in the courses that do, it's possible to get an A+ and go above 4. I know this is a bit confusing, but that's how it is. :)
Today, we're announcing Claude 3, our next generation of AI models.
The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
Honored that our paper got the Social Impact Award at
#acl2023nlp
. With AI's growing influence, it feels particularly meaningful that our work was recognized in this category.
@aclmeeting
#NLProc
Social Impact Award
📢S7: Ethics & NLP (Poster)
📌Marked Personas: Using Natural Language Prompts to Measure Stereotypes in LMs
🔎 Portrayals by GPT-3.5/GPT-4 contain higher rates of racial stereotypes than human-written ones
🔗
🧵(3/4)
New Anthropic research paper: Many-shot jailbreaking.
We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers.
Read our blog post and the paper here:
Excited about this upcoming work proposing a multilingual benchmark dataset for abstractive summarization. We hope this will encourage further research in languages other than English! It is a unique resource, with parallel article-summary pairs in 17 languages.
#NLProc
Our EMNLP Findings paper presenting WikiLingua — a new multilingual abstractive summarization dataset — is now on arXiv.
It contains 770K article/summary pairs in 17 languages, parallel with English.
Paper:
Dataset:
#NLProc
@WilliamWangNLP
Yes, especially since international students may not have the opportunity to publish papers during their undergraduate studies. Not a very inclusive metric.
"When I'm sometimes asked 'When will there be enough [women on the Supreme Court]?' and I say 'When there are nine,' people are shocked. But there'd been nine men, and nobody's ever raised a question about that." (RBG).
Check out our new ACL paper introducing Marked Personas, an unsupervised way to measure stereotypes in AI models for any intersectional identity. With our method, we show that personas from GPT-4 and GPT-3.5 are more stereotypical than human-written ones!
#ACL2023
New paper (to appear at ACL 2023)! We present Marked Personas, an unsupervised way to measure stereotypes in LLMs for any intersectional identity.
Paper:
Joint work with the wonderful
@esindurmusnlp
@jurafsky
@stanfordnlp
🧵1/6
I am also involved in the GEM (Generation, Evaluation and Metrics) workshop. It will focus on in-depth evaluation of generation models, using both human and automatic evaluation. Highly recommend voting for it!
#NLProc
A whopping 97
#nlproc
workshops to select from for next year!
There are two I am involved in:
• GEM (a new WMT-style evaluation for natural language generation)
• WNUT (NLP for social media and other noisy user-generated text)
Happy to share that this work is going to appear at
#acl2022nlp
Main Conf. 🎉 Check out the updated paper if you are interested in faithfulness in abstractive summarization ⬇️⬇️⬇️
#NLProc
How can we include public input in AI development? 🤔 In our new work, we collectively crowdsourced a constitution from Americans with
@collect_intel
on
@usepolis
. Check out our blog to learn more about our process and the challenges we faced.
#LLMs
What does it mean for AI development to be more democratic? To find out, we partnered with
@collect_intel
to use
@usepolis
to curate an AI constitution based on the opinions of ~1000 Americans. Then we trained a model against it using Constitutional AI.
Excited to leverage your technical skills to shape policy around LLMs? 📜 Want to help build language models that benefit society through sociotechnical alignment and evaluations? 📊 Societal impacts team at
#AnthropicAI
is also hiring:
Want to work at the frontier of AI policy with the most technical policy team in the business? You do? Excellent. Please consider applying
- Special Projects Lead
- Policy Analyst, Product
- Outreach Lead
A bad translation fail! Particularly important to get such translations right given how poorly women are represented in tech.
I don't know Turkish, but this shouldn't be hard (Maybe
@esindurmusnlp
,
@volkancirik
can correct me if I am wrong about the Turkish part).
There are lots of great mentoring sessions at
#acl2020nlp
but feel free to DM me on RocketChat (username: esin) if you think I can be helpful with any questions about PhD applications, surviving grad school, finding internships, or anything NLP-related.
We will be taking questions for our work FEQA today 1-2pm EST and 5-6pm EST. I did part of this work while interning at
@AmazonScience
with great collaborators
@hhexiy
and Mona Diab. Join us if you want to chat about evaluating faithfulness in summarization.
#acl2020nlp
I was very fortunate to have had an amazing advisor in
@clairecardie
. She supported me through everything. Also, I had amazing mentors along the way in Mona Diab and
@hhexiy
. Without their support this would not have been possible.
Happy to have contributed to this extensive project of
@StanfordHAI
led by
@RishiBommasani
and
@percyliang
. In the inequity and fairness section, we discuss intrinsic vs. extrinsic harms of foundation models, sources of these harms, and potential interventions.
#foundationmodels
I want to thank each of my 113 co-authors for their incredible work - I learned so much from all of you,
@StanfordHAI
for providing the rich interdisciplinary environment that made this possible, and everyone who took the time to read this and give valuable feedback!
@emnlp2020
I am afraid this decision may affect the diversity of topics at the main conference, since it incentivizes people to work on “trendy” topics. Also, terms such as “narrow subfield”, “trendy”, and “high impact” only add more subjectivity to the review process...
@emnlp2020
Students who are interested in more specific topics may now feel pressure to switch to trendier topics, because when it comes to the job search, people will give more weight to main-conference papers.
Check out our latest preprint where we show that text-to-image models such as Stable Diffusion and DALL-E amplify dangerous and complex demographic stereotypes. ⬇️
#AI
#nlproc
Text-to-image generation models (like Stable Diffusion and DALL-E) are being used to generate millions of images a day.
We show that these models perpetuate and amplify dangerous stereotypes related to race, gender, crime, poverty, and more.
A thread🧵
🚨 New policy brief: Millions of images are generated each day using text-to-image AI systems. Our latest brief examines how major image generation models encode a wide range of dangerous biases about demographic groups. Read or download here:
You can now use Claude 3 in Google sheets.
It lets you create prompt templates and fill them in with custom data from your sheet.
I'll show you how to set it up and what you can do with it:
At this week's NLP Seminar we are delighted to have Mona Diab from George Washington University! The seminar will be Thursday, 11 am to 12 pm PT. Mona will be talking about Systems and Labeling for Arabic Hate Speech Detection. Registration:
We look at reference-free metrics for dialog generation and text summarization. In most cases, we find that spurious correlates such as perplexity and word overlap achieve correlations with human scores similar to those of the proposed metrics.
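To make the "spurious correlate" idea concrete, here is a minimal, self-contained sketch (not the paper's code): a trivial word-overlap score, correlated against hypothetical human faithfulness ratings. The mini-benchmark below — sources, summaries, and scores — is entirely invented for illustration; the point is only that a shallow surface statistic can track human judgments on extractive-leaning data.

```python
import math

def word_overlap(source: str, summary: str) -> float:
    """Fraction of summary tokens that also appear in the source."""
    src = set(source.lower().split())
    toks = summary.lower().split()
    return sum(t in src for t in toks) / max(len(toks), 1)

def pearson(xs, ys):
    """Plain Pearson correlation (stdlib only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical mini-benchmark: three source/summary pairs with made-up
# human faithfulness ratings (NOT real annotations).
sources = [
    "the cat sat on the mat and purred loudly",
    "stocks fell sharply after the earnings report",
    "the team won the championship final on sunday",
]
summaries = [
    "the cat sat on the mat",          # highly extractive
    "markets dropped after earnings",  # moderately abstractive
    "a surprising underdog victory",   # fully abstractive
]
human_scores = [0.9, 0.7, 0.4]

overlaps = [word_overlap(s, y) for s, y in zip(sources, summaries)]
r = pearson(overlaps, human_scores)
print(f"word-overlap vs. human score: r = {r:.2f}")  # close to 1 on this toy data
```

A learned metric would need to clearly beat such trivial baselines to demonstrate it measures more than surface overlap.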
@whynotyet
@OlgaOvi
@BakerEDMLab
Interesting work! We ran similar experiments in this paper and found that LLMs tend to reflect the opinions of some of the Western countries more closely.
Finally, we propose an adversarially trained faithfulness metric that has much lower correlation with extractiveness. We show that this metric is more robust and achieves better system-level ranking performance.
Two lessons we learned through HELM (Sec 8.5.1): 1. CNN/DM and XSum reference summaries are worse than summaries generated by finetuned LMs and zero-/few-shot large LMs. 2. Instruction tuning, not scale, is the key to “zero-shot” summarization.
We present a framework for evaluating the effective faithfulness of summarization systems by generating a faithfulness–abstractiveness trade-off curve that serves as a control at different operating points on the abstractiveness spectrum. 3/n
@barbara_plank
@perezjotaeme
@IAugenstein
@ojahnn
@dipanjand
@sebgehr
Thanks! I agree that it should not be at the same time as the tutorials. Tutorials are super useful, and unfortunately this affects attendance at WiNLP. When I attended WiNLP workshops, I always wished that more people from non-minority groups attended as well.
@emnlp2020
Also, it seems like most analysis papers (which I find extremely important) could fall into the “findings” category given the criteria, since their main goal is not necessarily to provide methods that are considered sufficiently novel.
We further learn a selector to identify the most faithful and abstractive summary for a given document, and show that this system can attain higher faithfulness scores in human evaluations while being more abstractive than the baseline system on two datasets. 5/n
We further do a system-level analysis of faithfulness in abstractive summarization. We show that faithfulness metrics are not effective in ranking relatively abstractive, faithful systems (current SOTA), potentially due to over-reliance on spurious correlates.
While prior work has proposed models that improve faithfulness, it is unclear whether the improvement comes from an increased level of extractiveness of the model outputs. 2/n
We show that the MLE baseline as well as a recently proposed method for improving faithfulness (loss truncation) are both worse than the control at the same level of abstractiveness. 4/n
@XandaSchofield
@clairecardie
Thank you! It would be great to get your suggestions on how to have a good connection with the students especially in the current setting. You will do great as always ♥️
I'm recruiting Ph.D. students interested in "Implicit Machine Cognition" to study AI Ethics and Bias & Human-AI interaction in ML, NLP, Computer Vision, and Speech. Apply to
@UW_iSchool
&
@uwcse
and join our
@uwnlp
&
@uwdub
communities. Please share!