Esin Durmus

@esindurmusnlp

2,925 Followers
387 Following
2 Media
257 Statuses

Research Scientist @anthropicai. Previously Postdoc @stanfordnlp and PhD @cornellcis. Working on LLMs & evaluating their safety and impact on society. she/her.

Joined January 2017
Pinned Tweet
@esindurmusnlp
Esin Durmus
1 month
Our latest study measures how persuasive language models like Claude are compared to humans. We find a general scaling trend: newer models tend to be more persuasive, with Claude 3 Opus generating arguments that don't differ statistically from human-written ones.
@AnthropicAI
Anthropic
1 month
New Anthropic research: Measuring Model Persuasiveness We developed a way to test how persuasive language models (LMs) are, and analyzed how persuasiveness scales across different versions of Claude. Read our blog post here:
56
118
712
5
15
95
@esindurmusnlp
Esin Durmus
4 years
It’s great that schools are waiving the GRE requirement. But I remember the main cost that prevented me from applying to more schools was the application fees (~100 USD per school). Given the exchange rate, it was a big challenge. I spent all my savings on PhD applications.
19
34
541
@esindurmusnlp
Esin Durmus
4 years
Personal news: Excited to share that after completing my PhD towards the end of Spring 2021, I will be joining @stanfordnlp for a Postdoc. I am extremely fortunate to be co-hosted by @jurafsky, @tatsu_hashimoto and @chrmanning. Looking forward to being part of this amazing group!
18
7
428
@esindurmusnlp
Esin Durmus
7 months
Join our team! 🙌 The societal impacts team at @AnthropicAI is hiring. We design new methods to assess language models for societally or policy-relevant traits. If you feel passionate about this direction, apply to join us! (please retweet)
16
64
278
@esindurmusnlp
Esin Durmus
1 year
After two amazing postdoc years at @StanfordNLP, thrilled to join @AnthropicAI's Societal Impacts team as a Research Scientist! Grateful for everyone who helped me along the way. Feeling lucky to work with this great group on the crucial mission of building safer AI.
9
9
240
@esindurmusnlp
Esin Durmus
11 months
Language models are widely used, but whose views do they reflect? My new paper examines how to test global opinions represented by language models.
@AnthropicAI
Anthropic
11 months
We develop a method to test global opinions represented in language models. We find the opinions represented by the models are most similar to those of the participants in USA, Canada, and some European countries. We also show the responses are steerable in separate experiments.
102
173
786
4
36
179
@esindurmusnlp
Esin Durmus
4 years
@kisacakimdir Thank you very much for featuring me on your page :)
6
0
122
@esindurmusnlp
Esin Durmus
4 years
@kisacakimdir I especially want to tell our young women that there is no field in which you cannot succeed. That includes computer science, artificial intelligence, and every technology-related subject. Just want it, don't be afraid, and work hard. We really need more young women participating in these fields.
3
5
111
@esindurmusnlp
Esin Durmus
4 years
I am very excited to co-teach NLP (CS 4740) with my great advisor @clairecardie this semester. This class has a special place in my heart since it was the first NLP class I have ever taken and it convinced me to do research in this area. #NLProc #CornellCIS #Cornell
4
4
111
@esindurmusnlp
Esin Durmus
11 months
My first collaboration at @AnthropicAI 🎉 I led the experiments on summarization, topic modeling and political bias. It’s cool to see that long-context models open up great possibilities for long-form summarization.
@_akhaliq
AK
11 months
Opportunities and Risks of LLMs for Scalable Deliberation with Polis paper page: Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying…
0
22
78
2
19
99
@esindurmusnlp
Esin Durmus
2 years
That moment when you realize NAACL regular early registration price > 2 times Turkish minimum wage… #NAACL2022 #nlproc
2
10
92
@esindurmusnlp
Esin Durmus
2 years
Excited to share that our paper “Spurious Correlations in Reference-free Evaluation of Text Generation” will appear in #acl2022nlp Main Conf. We find that recently proposed metrics in summarization and dialog generation may be exploiting spurious correlations in the benchmarks.
4
10
84
@esindurmusnlp
Esin Durmus
4 years
Proud moment when the student you advised gets her first main conference paper @emnlp2020. Congrats @JialuLi96! You will do great at @uncnlp!
3
2
83
@esindurmusnlp
Esin Durmus
4 years
Even after I got accepted, I still had to borrow money to pay for the initial costs (i.e. flight tickets, one month deposit for the apartment) since you don’t get the first paycheck until after you start. A lot of these are prohibitive costs for international students.
1
1
80
@esindurmusnlp
Esin Durmus
4 years
@kisacakimdir Thank you very much for your questions and messages. At Koç, GPA is on a 4-point scale, and many courses don't have an A+ option. However, in the courses that do, it is possible to get an A+ and go above 4. I realize it's a bit confusing, but that's how it works. :)
1
3
74
@esindurmusnlp
Esin Durmus
4 years
@kisacakimdir At some schools, the top 1-2 students in a class receive an A+. That's why the GPA can go above 4.
1
0
65
@esindurmusnlp
Esin Durmus
2 months
Excited to share what we’ve built!
@AnthropicAI
Anthropic
2 months
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
568
2K
10K
1
5
62
@esindurmusnlp
Esin Durmus
3 years
Check out our new paper: “Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization”. #NLProc 1/n
3
13
61
@esindurmusnlp
Esin Durmus
10 months
Honored that our paper got the Social Impact Award at #acl2023nlp. With AI's growing influence, it feels particularly meaningful that our work was recognized in this category. @aclmeeting #NLProc
@aclmeeting
ACL 2024
10 months
Social Impact Award 📢S7: Ethics & NLP (Poster) 📌Marked Personas: Using Natural Language Prompts to Measure Stereotypes in LMs 🔎 Portrayals by GPT-3.5/GPT-4 contain higher rates of racial stereotypes than human-written ones 🔗 🧵(3/4)
1
6
18
5
7
57
@esindurmusnlp
Esin Durmus
1 month
Excited to share our new research on a long-context jailbreaking technique that works across a wide range of large language models (w/ @cem__anil).
@AnthropicAI
Anthropic
1 month
New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here:
83
350
2K
1
5
52
@esindurmusnlp
Esin Durmus
4 years
Excited about this upcoming work proposing a multilingual benchmark dataset for abstractive summarization. We hope this will encourage further research in languages other than English! It is a very unique resource as it has parallel article-summary pairs in 17 languages. #NLProc
@faisalladhak
Faisal Ladhak
4 years
Our EMNLP Findings paper presenting WikiLingua — a new multilingual abstractive summarization dataset — is now on arXiv. It contains 770K article/summary pairs in 17 languages, parallel with English. Paper: Dataset: #NLProc
1
11
84
0
4
40
@esindurmusnlp
Esin Durmus
1 year
@WilliamWangNLP Yes, especially international students may not have the opportunity to publish papers during their undergrads. Not a very inclusive metric.
0
0
36
@esindurmusnlp
Esin Durmus
4 years
"When I'm sometimes asked 'When will there be enough [women on the Supreme Court]?' and I say 'When there are nine,' people are shocked. But there'd been nine men, and nobody's ever raised a question about that." (RBG).
0
0
28
@esindurmusnlp
Esin Durmus
1 year
Check out our new ACL paper introducing Marked Personas, an unsupervised way to measure stereotypes in AI models for any intersectional identity. With our method, we prove that personas from GPT-4 and GPT-3.5 are more stereotypical than human-written ones! #ACL2023
@chengmyra1
Myra Cheng
1 year
New paper (to appear at ACL 2023)! We present Marked Personas, an unsupervised way to measure stereotypes in LLMs for any intersectional identity. Paper: Joint work with the wonderful @esindurmusnlp @jurafsky @stanfordnlp 🧵1/6
7
25
137
2
5
28
@esindurmusnlp
Esin Durmus
4 years
I am also involved in GEM (Generation, Evaluation and Metrics) workshop. It will focus on an in-depth evaluation of generation models across both human and automatic evaluation. Highly recommend voting for it! #NLProc
@cocoweixu
Wei Xu
4 years
A whopping 97 #nlproc workshops to select from for next year! There are two I am involved in: • GEM (a new WMT-style evaluation for natural language generation) • WNUT (NLP for social media and other noisy user-generated text)
0
4
16
0
6
28
@esindurmusnlp
Esin Durmus
2 years
Happy to share that this work is going to appear at #acl2022nlp Main Conf. 🎉 Check out the updated paper if you are interested in faithfulness in abstractive summarization ⬇️⬇️⬇️ #NLProc
@esindurmusnlp
Esin Durmus
3 years
Check out our new paper: “Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization”. #NLProc 1/n
3
13
61
0
3
27
@esindurmusnlp
Esin Durmus
7 months
How can we include public input in AI development? 🤔 In our new work, we collectively crowdsourced a constitution from Americans with @collect_intel on @usepolis. Check out our blog to learn more about our process and the challenges we faced. #LLMs
@AnthropicAI
Anthropic
7 months
What does it mean for AI development to be more democratic? To find out, we partnered with @collect_intel to use @usepolis to curate an AI constitution based on the opinions of ~1000 Americans. Then we trained a model against it using Constitutional AI.
21
71
395
1
5
25
@esindurmusnlp
Esin Durmus
2 months
Excited to leverage your technical skills to shape policy around LLMs? 📜 Want to help build language models that benefit society through sociotechnical alignment and evaluations? 📊 The Societal Impacts team at #AnthropicAI is also hiring:
@jackclarkSF
Jack Clark
2 months
Want to work at the frontier of AI policy with the most technical policy team in the business? You do? Excellent. Please consider applying - Special Projects Lead - Policy Analyst, Product - Outreach Lead
4
26
110
0
6
24
@esindurmusnlp
Esin Durmus
2 years
I will be attending in person, let’s grab a coffee ☕️
@WiNLPWorkshop
WiNLP
2 years
Tomorrow (10th) at #NAACL2022 we will host a virtual mentorship session from 12-1 PM PDT. Introducing our super-mentors: @faeze_brh (Postdoc @ AI2), @esindurmusnlp (Postdoc @ Stanford), @VioletNPeng (Asst. Prof. @ UCLA), & @Tetreault_NLP (Sr. Director @ Dataminr).
2
6
25
0
0
22
@esindurmusnlp
Esin Durmus
4 years
Yet another disastrous and biased English-Turkish translation. I am sorry that it is really difficult to follow my Turkish tweets :D
@danish037
Danish Pruthi
4 years
A bad translation fail! Particularly important to get such translations right given how poorly women are represented in tech. I don't know Turkish, but this shouldn't be hard (Maybe @esindurmusnlp, @volkancirik can correct me if I am wrong about the Turkish part).
1
0
9
2
0
21
@esindurmusnlp
Esin Durmus
4 years
There are lots of great mentoring sessions at #acl2020nlp, but feel free to DM me on RocketChat (username: esin) if you think I can be helpful with any questions regarding PhD applications, surviving grad school, finding internships or any NLP-related stuff.
0
1
21
@esindurmusnlp
Esin Durmus
4 years
So, what are you doing with your “EMNLP Findings” acceptances? #emnlp2020 #NLProc #emnlp
Resubmit to another conf: 110
Publish in findings: 227
1
8
19
@esindurmusnlp
Esin Durmus
4 years
We will be taking questions about our work FEQA today, 1-2 pm EST and 5-6 pm EST. I did part of this work while interning at @AmazonScience with great collaborators @hhexiy and Mona Diab. Join us if you want to chat about evaluating faithfulness in summarization. #acl2020nlp
0
4
19
@esindurmusnlp
Esin Durmus
10 months
Looking forward to attending #ICML2023 in Hawaii this year. If you want to meet up to discuss evaluating LLMs and their societal impact, let me know!
0
0
19
@esindurmusnlp
Esin Durmus
4 years
I was very fortunate to have had an amazing advisor in @clairecardie. She supported me through everything. Also, I had amazing mentors along the way in Mona Diab and @hhexiy. Without their support this would not have been possible.
0
1
18
@esindurmusnlp
Esin Durmus
3 years
Happy to have contributed to this extensive project of @StanfordHAI led by @RishiBommasani and @percyliang. In the inequity and fairness section, we discuss intrinsic vs. extrinsic harms of foundation models, sources of these harms, and potential interventions. #foundationmodels
@percyliang
Percy Liang
3 years
I want to thank each of my 113 co-authors for their incredible work - I learned so much from all of you, @StanfordHAI for providing the rich interdisciplinary environment that made this possible, and everyone who took the time to read this and give valuable feedback!
3
30
265
2
2
17
@esindurmusnlp
Esin Durmus
4 years
@emnlp2020 I am afraid this decision may affect the diversity of the topics in the main conference since it incentivizes people to work on “trendy” topics. Also, terms such as “narrow subfield”, “trendy”, “high impact” only add more subjectivity to the review process...
2
2
17
@esindurmusnlp
Esin Durmus
4 years
@emnlp2020 Students who are interested in more specific topics may now feel pressure to switch to more trendy topics because in the end when it comes to job search people will give more weight to main conference papers.
0
2
15
@esindurmusnlp
Esin Durmus
2 years
Check out our latest preprint where we show that text-to-image models such as Stable Diffusion and DALLE amplify dangerous and complex demographic stereotypes. ⬇️ #AI #nlproc
@federicobianchy
Federico Bianchi
2 years
Text-to-image generation models (like Stable Diffusion and DALLE) are being used to generate millions of images a day. We show that these models perpetuate and amplify dangerous stereotypes related to race, gender, crime, poverty, and more () A thread🧵
43
460
1K
0
2
10
@esindurmusnlp
Esin Durmus
6 months
New policy brief discussing our work on the biases of text-to-image models ⬇️
@StanfordHAI
Stanford HAI
6 months
🚨 New policy brief: Millions of images are generated each day using text-to-image AI systems. Our latest brief examines how major image generation models encode a wide range of dangerous biases about demographic groups. Read or download here:
7
22
48
0
2
8
@esindurmusnlp
Esin Durmus
2 months
This is super useful for quickly prototyping and iterating on prompts
@moritzkremb
Moritz Kremb
2 months
You can now use Claude 3 in Google sheets. It lets you create prompt templates and fill it in with your custom data from the sheet. I'll show you how to set it up and what you can do with it:
24
82
738
0
0
7
@esindurmusnlp
Esin Durmus
1 year
These stereotypes are further perpetuated when using these systems for creative generation, like story generation.
1
0
6
@esindurmusnlp
Esin Durmus
2 years
Super excited to have Mona at @stanfordnlp this week! ⭐️
@stanfordnlp
Stanford NLP Group
2 years
At this week's NLP Seminar we are delighted to have Mona Diab from George Washington University! The seminar will be Thursday, 11 am to 12 pm PT. Mona will be talking about Systems and Labeling for Arabic Hate Speech Detection. Registration:
1
8
33
0
0
6
@esindurmusnlp
Esin Durmus
2 years
We look at reference-free metrics for dialog generation and text summarization. In most cases, we find that spurious correlates such as perplexity and word overlap can achieve correlations with human scores similar to those of the metrics themselves.
1
0
6
@esindurmusnlp
Esin Durmus
3 years
With great collaborators: @faisalladhak, @hhexiy, Kathleen McKeown and Claire Cardie.
0
0
6
@esindurmusnlp
Esin Durmus
4 years
@nouhadziri Future work 😀
1
0
6
@esindurmusnlp
Esin Durmus
6 months
@whynotyet @OlgaOvi @BakerEDMLab Interesting work! We ran similar experiments in this paper and found that LLMs tend to reflect the opinions of some of the Western countries more closely.
1
1
6
@esindurmusnlp
Esin Durmus
4 years
She has been studying the relationship between the discourse structure of the arguments and their persuasiveness. The preprint is coming soon! :)
0
0
5
@esindurmusnlp
Esin Durmus
2 years
We finally propose an adversarially trained metric for faithfulness that has much lower correlation with extractiveness. We show that this metric is more robust and achieves better system-level ranking performance.
1
0
5
@esindurmusnlp
Esin Durmus
1 year
It was fun to contribute to HELM summarization benchmarking efforts. See 👇 for lots of cool findings.
@Tianyi_Zh
Tianyi Zhang
1 year
Two lessons we learned through HELM (Sec 8.5.1; ): 1. CNN/DM and XSum reference summaries are worse than summaries generated by finetuned LMs and zero-/few-shot large LMs. 2. Instruction tuning, not scale, is the key to “zero-shot” summarization.
3
27
110
0
1
5
@esindurmusnlp
Esin Durmus
3 years
We present a framework for evaluating the effective faithfulness of summarization systems, by generating a faithfulness-abstractiveness trade-off curve that serves as a control at different operating points on the abstractiveness spectrum. 3/n
1
0
5
@esindurmusnlp
Esin Durmus
4 years
@barbara_plank @perezjotaeme @IAugenstein @ojahnn @dipanjand @sebgehr Thanks! I agree that it should not be at the same time as the tutorials. Tutorials are super useful, and unfortunately this affects attendance at WiNLP. When I attended WiNLP workshops, I always wished that more people from non-minority groups were attending as well.
0
0
5
@esindurmusnlp
Esin Durmus
3 years
@StanfordHAI @RishiBommasani @percyliang Join the discussion around this topic at our Workshop on Foundation Models, Aug. 23-24.
0
1
4
@esindurmusnlp
Esin Durmus
4 years
@emnlp2020 Also it seems like most of the analysis papers (which I find extremely important) could fall into “findings” category given the criteria, since their main goal is not necessarily to provide methods that are thought to be sufficiently novel.
0
0
4
@esindurmusnlp
Esin Durmus
4 years
@danish037 @volkancirik Hahaha, omg! This is a disaster. Here is yet another example of our greatly biased NLP systems :D
0
0
4
@esindurmusnlp
Esin Durmus
1 year
LLM-generated outputs seem positive in sentiment; however, they contain words tied to historical legacies of harm.
1
0
4
@esindurmusnlp
Esin Durmus
4 years
@seymakabaoglu Thanks, dear! You too ♥️
0
0
4
@esindurmusnlp
Esin Durmus
3 years
We further learn a selector to identify the most faithful and abstractive summary for a given document, and show that this system can attain higher faithfulness scores in human evaluations while being more abstractive than the baseline system on two datasets. 5/n
1
0
4
@esindurmusnlp
Esin Durmus
29 days
@anko_979 @M1ndPrison @CCFarre @AnthropicAI Participants were not informed about the source of the arguments.
0
0
3
@esindurmusnlp
Esin Durmus
2 years
We further do a system-level analysis for faithfulness in abstractive summarization. We show that faithfulness metrics are not effective in ranking relatively abstractive, faithful systems (current SOTA), potentially due to over-reliance on the spurious correlates.
1
0
3
@esindurmusnlp
Esin Durmus
3 years
While prior work has proposed models that improve faithfulness, it is unclear whether the improvement comes from an increased level of extractiveness of the model outputs. 2/n
1
0
3
@esindurmusnlp
Esin Durmus
3 years
We show that the MLE baseline as well as a recently proposed method for improving faithfulness (loss truncation) are both worse than the control at the same level of abstractiveness. 4/n
1
0
3
@esindurmusnlp
Esin Durmus
2 years
We further show that the proposed metrics have a higher correlation with these spurious measures than with human judgements.
1
0
3
@esindurmusnlp
Esin Durmus
1 year
Congrats to amazing @chengmyra1!!
0
0
2
@esindurmusnlp
Esin Durmus
3 years
@nouhadziri Amazing 🥰
0
0
2
@esindurmusnlp
Esin Durmus
1 month
@manoelribeiro ofc, i really liked your work :)
0
0
2
@esindurmusnlp
Esin Durmus
4 years
@XandaSchofield @clairecardie Thank you! It would be great to get your suggestions on how to have a good connection with the students especially in the current setting. You will do great as always ♥️
1
0
2
@esindurmusnlp
Esin Durmus
1 year
Paper link:
2
0
2
@esindurmusnlp
Esin Durmus
1 month
@ClementDelangue @huggingface @AnthropicAI Thanks for the shoutout, @ClementDelangue! We discuss the results and share more information about our research here:
@AnthropicAI
Anthropic
1 month
New Anthropic research: Measuring Model Persuasiveness We developed a way to test how persuasive language models (LMs) are, and analyzed how persuasiveness scales across different versions of Claude. Read our blog post here:
56
118
712
0
1
2
@esindurmusnlp
Esin Durmus
7 months
@karinanguyen_ You deserve it and more @karinanguyen_ ❤️
0
0
2
@esindurmusnlp
Esin Durmus
3 years
@mervenoyann @huggingface Congrats!! 🎉 best of luck!
1
0
2
@esindurmusnlp
Esin Durmus
4 years
@asayeed @Tuhin66978276 but I think citation count itself is more correlated with how trendy the topic is rather than the quality of the work.
1
0
1
@esindurmusnlp
Esin Durmus
2 years
Needless to say, @aylin_cim and her work is amazing!! Go work with her!! ✨
@aylin_cim
Aylin Kamelia Caliskan
2 years
I'm recruiting Ph.D. students interested in "Implicit Machine Cognition" to study AI Ethics and Bias & Human-AI interaction in ML, NLP, Computer Vision, and Speech. Apply to @UW_iSchool & @uwcse and join our @uwnlp & @uwdub communities. Please share!
12
92
316
0
0
1
@esindurmusnlp
Esin Durmus
4 years
*are you doing 😬
1
0
1
@esindurmusnlp
Esin Durmus
1 year
@bkhmsi Good luck!! :)
0
0
1
@esindurmusnlp
Esin Durmus
4 years
@bastings_nlp Congratulations 🎉 😊
0
0
1
@esindurmusnlp
Esin Durmus
11 months
@MonaDiab77 Thanks Mona ❤️❤️
0
0
1
@esindurmusnlp
Esin Durmus
2 years
@nouhadziri @allen_ai @YejinChoinka Congratulations ❤️ will definitely visit you in Seattle ☺️
1
0
1
@esindurmusnlp
Esin Durmus
4 years
@JialuLi96 Of course!! Thanks for your hard work 😊
0
0
1
@esindurmusnlp
Esin Durmus
4 years
@dipanjand @CNNTravel @manaalfar Wow. How did I miss these 😀
0
0
1
@esindurmusnlp
Esin Durmus
11 months
@tanyaagoyal @UTCompSci @gregd_nlp @jessyjli Congrats! enjoy Gimme coffee at Gates ☕️
1
0
1
@esindurmusnlp
Esin Durmus
4 years
@TuhinChakr Thank you so much!
0
0
1