Working on or interested in agents based on Large Language Models like GPT-4? Follow @pyautogen to get the latest news and use cases of our rapidly expanding multi-agent framework #AutoGen!
New work on explainable AI!
w/ @tongshuangwu, J. Zhu, R. Fok, @besanushi, @ecekamar, M. Ribeiro, and @dsweld.
When an AI advises people, does an explanation of its reasoning actually help the person? Does it let the human outperform the AI? Does it ...(1/6)
About 60 researchers brainstorming how to define, measure, and shape trust and reliance in human-AI interaction at the #TRAIT2022 hybrid workshop at #CHI2022 @sigchi
Excited to share a draft of our new work on human-centered AI!
w/ @besanushi @ecekamar @erichorvitz @dsweld
When an AI assists human decision-makers, e.g., by recommending its predictions, is the most accurate AI necessarily the best teammate? (1/5)
Agents w/ APIs/tools are cool and popular now, but throwback to one of my favorite classic papers from 1994 on agents by @etzioni and @dsweld.
The first time I read and obsessed over it was when I was an undergraduate and saw it as a reference in Russell and Norvig!
Do you work on AI/ML + HCI?
We invite submissions for our new journal's Special Issue on "AI for (and by) the People".
Wide range of human + AI topics ✅
Open-source ✅
Virtual workshop post publication ✅
cc: @alison_m_smith, @gonzaloworks
Working on interesting problems around humans & #ai? The "AI for (and by) the People" journal special issue explores the opportunities and challenges of designing and developing AI/ML for people. Deadline: Sep 15.
#hcai #hcml #cfp
@bansalg_ @gonzaloworks
I am super excited to see how this will power more research on not just multi-agent workflows but entirely new subtopics in human-AI interaction!
Check out our new paper below ⬇️
Imagine if ✨multiple✨ ChatGPT agents could collaborate to solve complex tasks for you! 🧑🦱🤝🤖🤖🤖
📢 AutoGen: A new framework for building multi-agent LLM applications
It allows creating many agents that converse to solve complex tasks! ...
1/4
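For readers who want a concrete picture of what "agents that converse" means, here is a minimal sketch based on AutoGen's public getting-started examples; the model name, API-key handling, and task are placeholders, not the exact code from the paper.

```python
import os
import autogen

# Placeholder LLM configuration -- swap in your own model and API key.
config_list = [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]

# An LLM-backed agent that plans and writes code.
assistant = autogen.AssistantAgent(name="assistant", llm_config={"config_list": config_list})

# A proxy agent that executes the assistant's code locally and reports the results back.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",        # fully automated for this sketch
    max_consecutive_auto_reply=5,    # bound the back-and-forth
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The two agents converse until the task is solved or the reply limit is hit.
user_proxy.initiate_chat(
    assistant,
    message="Write and run Python code that prints the 10 largest primes below 1000.",
)
```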
My academic Twitter colleagues, I need a tiny favor!
If you've ever made a paper co-authored by me a required reading for a course, can you please DM me the details (esp. the course # and year)?
It would help me with my US immigration app!
(Angy cat from last yr as clickbait)
**Multiple** internship opportunities to work with the #HAX Team @MSFTResearch in 2022! If you're interested in #ResponsibleAI, #AI, #UX, and tools for creating these, apply here:
Learn more about our team here:
Such wonderful reception of our work on understanding programmer-Copilot interaction. It has implications for understanding human-LLM interaction in general.
cc: @HsseinMzannar @adamfourney @erichorvitz
"Our studies revealed that when solving a coding task with Copilot, programmers may spend a large fraction of total session time (34.3%) on just double-checking and editing suggestions, and spend *more than half* of the task time on Copilot-related activities, together indicating ...
As Copilot becomes more popular, we need to understand how programmers interact with it. We built a model of interaction between Copilot and programmers named 'CUPS' and predict programmer behavior in our latest paper.
We received an overwhelming number of submissions on trust and reliance for human-AI interaction at the TRAIT workshop @ #CHI2022
Even if you are not attending the workshop on April 30th, check out the accepted papers below!
Come learn more about agents and AutoGen at the Microsoft booth at #NeurIPS2023!
@Chi_Wang_, @qingyun_wu, and I will be there on Monday 12/11, between 9 am-noon CST and 3:30-4:00 pm CST.
Location: Booth 1003 - Next to entrance Hall D
cc: @pyautogen, @MSFTResearch
✨How to get multiple OpenAI Assistants (#GPTs) *and* #AutoGen agents all working together to solve tasks?✨
To learn how, see the video by AI Jason.
This is a good example of why supporting cross-platform agents will become increasingly important!
New research on improving human-AI interaction! 🌟
LLMs like #CoPilot can be amazing! But they can also suggest erroneous code & verifying their suggestions takes effort.
We show that communicating uncertainty reduces these costs! BUT the notion of uncertainty also matters.
1/4
New compelling evidence for developing explainable AI!
Our user studies on open-domain QA show that explanations help end-users and outperform calibrated confidence (a strong, unbeaten baseline) by a significant margin! That too whilst achieving "complementary" performance!! (1/2)
At Microsoft, we’re expanding AI capabilities by training small language models to achieve the kind of enhanced reasoning and comprehension typically found only in much larger models.
I 100% agree -- it's important to understand when explanations help users, for which tasks, and along what metrics.
E.g., it’s important to understand when explanations do and don’t lead to appropriate reliance...
[1/2]
Just because one user study showed that explanations produced by a method were not helpful for N homogeneous users in a particular context does not imply that the method in question has no utility in any other setting. It is important to appreciate this nuance [4/N]
"The striking difference was that developers who used GitHub Copilot completed the task significantly faster–55% faster than the developers who didn’t use GitHub Copilot."
Very promising, real-world results for human-AI interaction!
Equal parts hard work and exciting work; I'm very glad to be sharing these results!
#GitHubCopilot has had such a strong impact on developers on many levels; it's a privilege to have front-row seats to how we understand and measure that. More to come!
Seattle flu gave me insomnia, so I thought I'd create an example of an #AutoGen feature that I find useful for creating end applications.
Here I wanted the agents to find recent GitHub issues on AutoGen's repo and then render a neat markdown table using @willmcgugan's Rich library.
We just started the #TRAIT2022 workshop at #CHI2022!
Turns out our keynote speaker, John Lee @Jdlee888, wrote his keynote's abstract with AI assistance (blue indicates contributions by an LLM)! And none of us knew until he just told us now!! Amazing!!
I had fun visiting and interacting with colleagues at UCSB! Here are the papers I discussed:
1. Modeling users:
2. Communicating Uncertainty:
3. Metrics:
Data/code:
Last week, we had the @ucsbmmi Summit @ UCSB. It was great to listen to @bansalg_ from @MSFTResearch and understand how Copilot is changing the way we code. Users spend 50% of their time interacting with it and 20% verifying suggestions. Coders are 2x faster!
📢📢More opportunities on our team at #MicrosoftResearch! 📢📢
Now hiring for Senior- and Principal-level Researchers and Software Engineers.
If you want to advance the frontiers of #AI to empower people and AI agents to solve real-world problems, apply below! 👇
Please RT
If you are passionate about human-AI interaction and on the job market this year, we are hiring a full-time researcher🧑🦱+🤖
See the job post for details below.
📢📢We're hiring!📢📢
If you want to shape the future of AI and empower people and AI agents to collaboratively solve real-world problems, apply here:
See also below for more exciting opportunities in #AI at #MSR with our partner teams. 👇👇
In fact, explanations increased the chance that users will accept the AI's recommendation REGARDLESS of its correctness. Such systems seem deeply unsatisfying and fraught with ethical issues. (5/6)
New blog by @adamfourney and @qingyun_wu on measurement tools for complex multi-agent workflows in @pyautogen. AutoGenBench is a command-line tool on PyPI which handles downloading, configuring, running, and reporting supported benchmarks in AutoGen. ➡️
If you work on human-AI interaction and agents, you might find the abstractions introduced in chapter 3 of the new #AutoGen tutorial practical and interesting 👇
🚨We just released a new #AutoGen tutorial
And with that, getting started became even easier! The first 5 chapters are already online to help you learn about
- agents that can converse
- termination
- adding humans in-the-loop
- code executors
- multi-agent patterns
Let us know if ...
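For a concrete feel of the "termination" and "humans in-the-loop" chapters, here is a rough sketch; the option names are the ones I recall from the tutorial and are worth double-checking against it.

```python
import os
from autogen import ConversableAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]}

# "Agents that can converse": an LLM-backed agent, asked to end with a keyword.
writer = ConversableAgent(
    name="writer",
    llm_config=llm_config,
    system_message="Draft short answers. Reply TERMINATE when you are satisfied.",
)

# "Adding humans in-the-loop": this agent has no LLM and asks the human for
# input on every turn. "Termination": it stops when it sees the keyword, and
# max_consecutive_auto_reply provides a second, independent stopping condition.
reviewer = ConversableAgent(
    name="reviewer",
    llm_config=False,
    human_input_mode="ALWAYS",
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
    max_consecutive_auto_reply=3,
)

reviewer.initiate_chat(writer, message="Draft a one-sentence summary of multi-agent LLM workflows.")
```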
Check out our new framework for building LLM agents!
#AutoGen is already open source and growing very rapidly on GitHub. You can start using it today!
More details soon…
#LLMs #Microsoft #AI
👇👇👇
Imagine if ✨multiple✨ ChatGPT agents could collaborate to solve complex tasks for you!
🧑🦰🤝🤖🤖🤖
📢 AutoGen: A new framework for building multi-agent LLM applications
Repo:
Stay tuned for a new AutoGen tech report on 10/5…
#AutoGen #AI #LLMs #ML
Aptly put by @JessicaHullman -- "So the relationship more explanation = more [appropriate] trust should not be assumed when trust is mentioned as in the NIST report, just like it shouldn’t be assumed that more expression of uncertainty = more [appropriate] trust."
On NIST principles for explainable AI, and what's similar about these challenges and those in expressing uncertainty in model predictions. I see a lot of parallels despite the big difference in how much hype each gets
Thankful for colleagues at @MSFTResearch, #MicrosoftAether, and @uwcse for their commitment to develop reliable people-facing #AI systems!
See announcement and link to a new open-source repository on backwards compatible #ML ⬇️
I remember finding @jennwvaughan's advice really useful when attending conferences! Her point #8 "One new friend will often lead to many" is still my favorite!
I am already mind blown by the reception of #AutoGen by the OSS community! But I am also super excited about the numerous human-AI interaction questions that show up when users interact with and use multiple #LLM agents for their tasks...
Are radiologists and IM/EM docs more susceptible to incorrect radiology advice when it's "from an AI"?
Our new paper "Do as AI Say" highlights the potential danger of human/AI advice anchoring.
Blog post by author @harini824!
🚀 Exciting summer internship opportunity for PhD students at @MSFTResearch! Dive into innovative projects like #Orca and @pyautogen, offering thrilling research challenges. Ready to be part of groundbreaking AI work? Apply here:
#AutoGen #LLMs #AgentEval
Many of you couldn't join us at the #HCXAI workshop at #CHI2021. We received tons of requests to make the videos available online.
We always want to broaden participation.
This is for you. 🎁
Another great example of how AI systems are brittle and can fail in unexpected ways, and the need for human agency, control, and feedback in people-facing systems.
Finally caught up with exciting papers on explainable AI at #ICML2020's Workshop on Human Interpretability #WHI2020.
Here is a subset of the many papers I liked with a TLDR: (1/4)
@peterbhase @__Owen___ Very nice resource! You may be interested in our studies from last summer that show that NONE of the prior works (except for one very recent study on open-domain QA) has observed "complementary" performance from explanations!
There's still time to submit to the CHI TRAIT workshop on Trust and Reliance in AI-Assisted Tasks!
We welcome submissions from both researchers (Research Track) and practitioners (Industry Track). Submissions are due next Thursday, Feb 23 (AoE).
CfP:
In our study of fine-grained dog classification (🦮 / 🐩 / 🐕), human-AI teams where humans use heatmaps performed even worse than the AI alone.
Heatmaps often only highlight dog faces, regardless of whether the AI is correct or wrong.
In search of complementary performance, we conducted new studies where human and AI performance was comparable. While we observed benefits from AI augmentation, they were NOT increased by showing state-of-the-art explanations. (4/6)
AutoGen's code execution capabilities have gotten an upgrade! You can use a Jupyter kernel to maintain a stateful session for code execution. 🤖🌎🔧
Learn more here:
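For the curious, here is a rough sketch of what the stateful Jupyter-backed session looks like, pieced together from the code-executor docs; the module paths and the optional-install extra are assumptions to verify against the link above.

```python
# Sketch only: the Jupyter executor ships as an optional extra
# (the exact extra name, e.g. pip install "pyautogen[jupyter-executor]", is an assumption).
from autogen.coding import CodeBlock
from autogen.coding.jupyter import LocalJupyterServer, JupyterCodeExecutor

with LocalJupyterServer() as server:
    executor = JupyterCodeExecutor(server)

    # Because the kernel keeps state, the variable defined in the first block
    # is still visible when the second block runs.
    executor.execute_code_blocks([CodeBlock(language="python", code="data = [1, 2, 3]")])
    result = executor.execute_code_blocks([CodeBlock(language="python", code="print(sum(data))")])
    print(result.output)  # expect "6"
```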
While our novel Adaptive explanations showed promise, we must develop explanation algorithms and interfaces that lead to complementary performance, e.g., by enabling appropriate reliance, and providing significant value over simple baselines such as showing AI confidence. (6/6)
🚀 @pyautogen new release is here with gpt-4-vision-preview multimodal model support!
🛠️ Codebase updates for supporting openai-python v1.
📊 New unstructured data support in RAG & async features for get_human_input.
🔧 Fresh tools & improved docs for devs.
#GPT4V #AI #AutoGen
This is why we need to keep scrutinizing the fairness of toxic or inappropriate content filters, and always let users circumvent the automatic systems.
🚨 New beta feature live! Do you skim through papers trying to get a glimpse in a minute? By turning on #Skimming in Semantic Reader, you can skim faster with automatically highlighted overlays of the key points. Now available for 9k papers on desktop!
...as we suggested in our call to arms paper
And as we showed in our studies with open-domain QA, where explanations significantly improved appropriate reliance
[2/2]
Prior work on XAI only considers the case where the AI by itself was more accurate than both the human and the human-AI team. Explanations raised team performance closer to the AI's, but if accuracy were the sole objective, removing people would have performed even better in their settings! (3/6)
We are hiring senior and principal researchers and engineers to work on generative AI technologies including foundation models, small models and learning agent platforms.
Applications at: and
We show that approaches maximizing AI accuracy (by using Log-loss) may lead to suboptimal team utility. Instead, we propose and optimize a new loss function based on the team's expected utility. (2/5)
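To make "the team's expected utility" concrete, here is a schematic of the idea (an illustrative sketch, not the paper's exact objective):

$$ \mathcal{L}_{\text{team}} \;=\; -\,\mathbb{E}_{(x,y)}\Big[\, r(x)\,u\big(h_{\mathrm{AI}}(x), y\big) \;+\; \big(1 - r(x)\big)\,u\big(h_{\mathrm{H}}(x), y\big) \Big] $$

where $r(x)$ is the chance the person relies on the AI's recommendation for input $x$, $h_{\mathrm{AI}}$ and $h_{\mathrm{H}}$ are the AI's and the human's decisions, $y$ is the ground truth, and the utility $u$ can encode the stakes and the cost of human effort. Optimizing this, rather than the AI's own log-loss, is what optimizing for the team means here.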
We just released a new version of #AutoGen and added compatibility with #OpenAI Assistants!
This means you can now make multiple GPTs collaborate to solve complex tasks 🤖🤖🤖
Check out our new blog post for details:
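As a minimal sketch of what that looks like in code, assuming the GPTAssistantAgent contrib class that the release describes (the import path and parameters here are assumptions to check against the blog post):

```python
import os
import autogen
# Contrib wrapper around the OpenAI Assistants API; path assumed from AutoGen's contrib package.
from autogen.agentchat.contrib.gpt_assistant_agent import GPTAssistantAgent

config_list = [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]

# An agent backed by an OpenAI Assistant ("GPT") rather than a plain chat completion.
analyst = GPTAssistantAgent(
    name="analyst",
    instructions="Solve tasks step by step and explain your reasoning.",
    llm_config={"config_list": config_list},
)

# A regular AutoGen agent that can execute the analyst's code locally.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "assistants_demo", "use_docker": False},
)

# The Assistants-backed agent and the AutoGen agent collaborate in a single chat.
user_proxy.initiate_chat(analyst, message="Compute the first 10 Fibonacci numbers with Python code.")
```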
@pranavrajpurkar Pranav, you may be interested in our 2021 CHI paper where we study the related concepts of complementary performance and appropriate reliance:
@tongshuangwu @dsweld @leavittron @arimorcos @MLRetrospective Especially agree with "under-utilization of user research for human verification." Though I'd say the focus on human-subject studies is increasing rapidly! You may find our recent work relevant :)
What @tmiller_unimelb said! Nice to see more and more researchers and domains ask one of the most important questions in the context of #XAI and #AI in general.
@laura_rieger_de @tongshuangwu @besanushi @ecekamar @dsweld Thank you! We tested w/ non-experts (MTurk), but even w/ experts, deployers should test & ensure explanations don't exacerbate inappropriate reliance or confirmation bias.
@ihsgnef's work shows instances where experts are more immune to bad system suggestions:
@AndrewLBeam @MarzyehGhassemi @DrLukeOR Especially agree with "We should advocate for thorough ..validation of these systems..., showing that patient and health-care outcomes are improved"
Precisely why we argued for carefully measuring the effect of explanations on human-AI team performance:
"Anthropic said its work would be focused on 'large-scale AI models', including making the systems more easy to interpret and 'building ways to more tightly integrate human feedback into the development and deployment of these systems'."
The cup of open #AI runneth over. There is the used to be open AI, there is the wannabe open AI, and now apparently there is the really like-fer-sure-this-time wannabe real open AI..
Among the much inspiring feedback we received from this expert, whom I've admired for more than a decade, it was so fascinating to hear how much @pyautogen has spurred bottom-up creativity and advanced our world's understanding of AI agents!
cc: @ekzhu @jack_gerrits
Had a conversation with an iconic leader + my mentor^2 and was told that he was a fan of #AutoGen! That made my day ❤️🔥
Super inspired by an insight into the uniqueness of #AutoGen 🦄
The Birdwatch algorithm surfaces notes to potentially misleading Tweets. Using survey data, we find notes selected by the algorithm reduce the likelihood of agreeing with the substance of a potentially misleading Tweet by about 26%.
Team-loss accounts for people adjusting their trust in AI based on the stakes and the cost of human effort. Positive effects can be observed in both synthetic and real datasets and the shift in behavior reflects the encoded human-centered properties. (3/5)
Can't find the answers in the docs? No problem 🤖
I built a multi-agent LLM application to query the collective developer knowledge of the @pyautogen Discord server message history.
@chainlit_io @OpenAI @trychroma
🤝 AutoGen + Semantic Kernel!
Devis Lucato shows off how AutoGen can be the basis of a new planner that you can use in your Semantic Kernel applications, unlocking a whole class of interesting scenarios because of conversational multi-agents!