Denied a loan by an ML model? You should be able to change something to get approved!
In a new paper w/
@AlexanderSpangh
&
@yxxxliu
, we call this concept "recourse" & we develop tools to measure it for linear classifiers.
PDF
CODE
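(For concreteness, here is a minimal sketch of what a one-feature recourse check against a linear classifier could look like. The function names, feature bounds, and single-feature search are illustrative assumptions, not the tooling released with the paper.)

```python
import numpy as np

def flip_along_feature(x, w, b, j, lower, upper):
    """Smallest change to feature j that turns a denial (w.x + b < 0) into
    an approval, or None if no value within [lower[j], upper[j]] works."""
    if w[j] == 0:
        return None
    rest = b + np.dot(w, x) - w[j] * x[j]   # score from everything except feature j
    target = -rest / w[j]                   # value of feature j at the decision boundary
    new_val = max(target, lower[j]) if w[j] > 0 else min(target, upper[j])
    return new_val if lower[j] <= new_val <= upper[j] else None

def has_recourse(x, w, b, actionable, lower, upper):
    """True if a denied point x can reach approval by changing one actionable feature."""
    if np.dot(w, x) + b >= 0:
        return True  # already approved
    return any(flip_along_feature(x, w, b, j, lower, upper) is not None
               for j in actionable)
```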
I am thrilled to announce that – after spending the past 16 years in the US, earning a BS+MS+PhD in STEM, and co-founding a startup that employs many Americans – I have finally been granted the right to permanently reside in the United States... as a result of marriage 🎉🗽🇺🇸
📢 Please RT 📢
I am recruiting PhD students to join my group at UCSD!
We develop methods for responsible machine learning - with a focus on fairness, interpretability, robustness, and safety.
Check out for more information.
Personal Update: I'll be starting as an Assistant Professor at
@HDSIUCSD
in Fall 2021! Until then, I will be
@GoogleAI
where I'll be working on fairness and interpretability for machine learning in healthcare!
Key proposal by
@bhecht
in Nature this week:
“The CS community should change its peer-review process to ensure that researchers disclose any possible negative societal consequences of their work in papers, or risk rejection.”
📢 Please RT!📢
We're hiring postdoctoral researchers to work on responsible machine learning at UCSD!
Topics include fairness, explainability, robustness, and safety. For more, see
@randal_olson
If you’re worried about breaking code by switching to Python 3, check out this short guide on GitHub. It has a nice list of the main differences between Python 2/3, which makes switching *much* easier.
I really loved this article and this take.
The more I work in this field, the more I think that the real issue about human-facing ML isn't "bias" but "power" 1/2
“Biased algorithms are easier to fix than biased people”.
But, in some cases, I don't see these as entirely different problems. Sometimes fixing a biased algorithm first requires fixing biased people.
Cool new paper to check out if you’re working on interpretable ML:
“A review of possible effects of cognitive biases on interpretation of rule-based machine learning models.”
What are your best tips for running a graduate research seminar online? Think ~20 PhD students reading and discussing ML research papers.
Looking for any and all recommendations to make it more engaging and rewarding - course policies, tools, etc.
Machine learning models often use group attributes like sex, age, and race for personalization
In our latest work, we show that personalization can lead to worsenalization for certain groups
Joint
@VMSuriyakumar
@MarzyehGhassemi
🧵👇
Personalized models often use group attributes like sex/age/race. In our latest w/
@berkustun
@MarzyehGhassemi
, we show how personalization can lead to worsenalization by reducing performance for some groups
PDF
#ICML23
Oral
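(A rough sketch of the kind of audit this thread describes: compare the per-group accuracy of a "personalized" model that includes the group attribute against a generic one that ignores it. The names and estimator here are placeholders; this is not the paper's code.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def gain_of_personalization(X, y, z, X_test, y_test, z_test):
    """Per-group accuracy of a personalized model (uses group attribute z)
    minus a generic model (ignores z). Negative entries flag groups where
    personalization reduces performance."""
    generic = LogisticRegression(max_iter=1000).fit(X, y)
    personalized = LogisticRegression(max_iter=1000).fit(np.column_stack([X, z]), y)
    gains = {}
    for g in np.unique(z_test):
        idx = (z_test == g)
        acc_generic = accuracy_score(y_test[idx], generic.predict(X_test[idx]))
        acc_personal = accuracy_score(
            y_test[idx],
            personalized.predict(np.column_stack([X_test[idx], z_test[idx]])))
        gains[g] = acc_personal - acc_generic
    return gains
```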
@seanjtaylor
Worth checking out "Why Model?" by Joshua Epstein.
The optimizer approach makes sense if the ultimate purpose of the model is something other than prediction.
Excited for
#NeurIPS2023
and
#ML4H2023
! I'll be in town all of next week. Shoot me an e-mail if you want to catch up or hang out.
Also recruiting postdocs and PhD students this year! If you're looking to work on fairness, explainability, robustness, safety, let's chat 👇
📢 Please RT!📢
We're hiring postdoctoral researchers to work on responsible machine learning at UCSD!
Topics include fairness, explainability, robustness, and safety. For more, see
Algorithms can outperform humans, but algorithmic decision-making can be unjust due to data collection and seemingly harmless modeling decisions. When we automate consequential decisions, ML practitioners become policy-makers. The brouhaha is about how to do it right.
Personalized models should let users consent to the use of their personal information!
In our latest, we describe how to build models that let users consent to the use of group attributes like sex, age, race, HIV status
Spotlight Poster
@NeurIPSConf
: Tues 10:45-12:45 PM
Link:
Models are trained on costly data and require this data at prediction time. We should be able to opt-out and understand the gains of opting in!
In our latest w/
@nagpalchirag
@kat_heller
@berkustun
we introduce models that give users this informed consent
#NeurIPS2023
Spotlight
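(A hedged sketch of what an opt-in interface could look like: one model trained without the optional attribute, one with it, and a reported gain from opting in. Entirely illustrative; not the method or code from the paper.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def fit_consent_models(X, a, y):
    """Two models: one that never sees the optional attribute a, one that does.
    Users who opt out are scored by the first; users who opt in by the second."""
    without_a = LogisticRegression(max_iter=1000).fit(X, y)
    with_a = LogisticRegression(max_iter=1000).fit(np.column_stack([X, a]), y)
    return without_a, with_a

def gain_of_opting_in(without_a, with_a, X_val, a_val, y_val):
    """What opting in is worth: held-out AUC with the attribute minus AUC without it."""
    auc_without = roc_auc_score(y_val, without_a.predict_proba(X_val)[:, 1])
    auc_with = roc_auc_score(
        y_val, with_a.predict_proba(np.column_stack([X_val, a_val]))[:, 1])
    return auc_with - auc_without
```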
15K abstracts
@NeurIPSConf
means
- 10K submissions (30% drop)
- 30K reviews (3 per sub)
- 5-10K reviewers (3 to 6 subs per reviewer)
Do we really have this many reviewers in ML?
Ask yourself - who is making decisions in algorithm design and model selection?
Did I have a say? Were there other reasonable alternative algorithms and models that would have benefitted me?
2/2
Pretend you are a gentle Professor Emeritus of Computer Science.
Write a short, edifying, and kind rebuttal to this sloppy review from an overly confident PhD student: [REVIEW]
Here are the key points to include: [YOUR ANGRY REBUTTAL]
Datasets often admit multiple "competing" models that perform almost equally well
@JamelleWD
's
#AAAI23
paper shows that competing models can assign wildly different risk predictions
We develop methods to fit competing models, and to measure the sensitivity of risk estimates 👇
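(One crude way to probe this, as a sketch: fit several near-optimal classifiers under perturbed training conditions and measure how far their risk estimates spread on each point. Illustrative only; not the procedure from the AAAI paper.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def competing_risk_estimates(X, y, X_test, epsilon=0.01, n_models=20, seed=0):
    """Fit classifiers on bootstrap resamples with varied regularization, keep
    those within (1 + epsilon) of the best training loss, and return the
    per-point spread of their risk estimates on X_test."""
    rng = np.random.RandomState(seed)
    models, losses = [], []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=len(y), replace=True)  # bootstrap resample
        C = 10.0 ** rng.uniform(-2, 2)                       # vary regularization
        clf = LogisticRegression(C=C, max_iter=1000).fit(X[idx], y[idx])
        models.append(clf)
        losses.append(log_loss(y, clf.predict_proba(X)[:, 1]))
    best = min(losses)
    competing = [m for m, l in zip(models, losses) if l <= (1 + epsilon) * best]
    risks = np.stack([m.predict_proba(X_test)[:, 1] for m in competing])
    return risks.max(axis=0) - risks.min(axis=0)  # spread of risk estimates per point
```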
Just arrived in DC for
#AAAI23
excited to present on Predictive Multiplicity in Probabilistic Classification (work with
@berkustun
and David Parkes)
Oral presentation: Feb 11 at 9:30am ET (ML Bias and Fairness session 1)
Poster: Saturday Feb 11 6:15pm ET
As good a time as any to say that I am looking for PhD students and postdocs to join my group
@HDSIUCSD
. If you're interested in fairness and interpretability in machine learning, please apply! More information here:
Folks at
#ICML2023
who are interested in new open problems about fairness, data rights, and healthcare:
@VMSuriyakumar
will be presenting our work on "When Personalization Harms Performance" at 4 PM HST in Ballroom C.
Excited to present our work on actionable recourse at the
#NeurIPS
AI Ethics workshop tomorrow.
Stop by and say hi if you’re around! ⚡️ talk and poster session from 3- 4:30p @ Room 516 AB.
Question for ML folks:
Say two classifiers have equal test error (on average), but they weigh their inputs differently and output conflicting predictions on specific points.
How do you choose which model to deploy?
Denied parole by an ML model? The next best model might have decided otherwise
In our
#ICML20
paper w/
@berkustun
@FlavioCalmon
, we study the ability of an ML problem to admit competing models with conflicting predictions, which we call "predictive multiplicity"
THREAD ⬇️
Checklists are surprisingly effective when they are used to support human decisions (rather than replace them).
We should be learning them from data, and making better use of them in domains like medicine.
My student
@itsvictoriaday
has a great
@FAccTConference
paper that shows synthetic differentially private data *doesn't* solve downstream disparities in prediction. We need *real* diverse data!
Excited to present our work on worsenalization in machine learning
@UCSF_BCHSI
next week!
Come say hi if you're interested in fairness, data privacy, and clinical prediction models!
UCSF Bakar Computational Health Sciences Institute
"When personalization harms performance" seminar via
@UCSF_Epibiostat
Sept. 6, 3pm Berk Ustun
@berkustun
, Assistant Professor, Dept of Computer Science and Engineering, UC San Diego
I forced a bot to watch five faculty meetings (approx. 1,000 hours total) and then forced it to write a faculty meeting script of its own. Here is the first page.
@kamalikac
@rsalakhu
The ICML code policy was great this year!
As reviewers, we could check the code to answer clarifying questions about methods & datasets
As authors, we could afford to leave out tedious details about experiments and point readers to the code
Thank you for making it happen.
@PM_1729
I’d say the points are:
- interpretable model ≠ blackbox model + post hoc explainer
- blackbox model + post hoc explainer = bad
There’s an implication that interpretable ML is a solution. Honestly tho, it has limitations. Sometimes it’s better to build no model at all.
@Aaroth
Some surveys and position papers to check out:
1. Explanation in AI: Insights from the Social Sciences by
@tmiller_unimelb
2. Cognitive Biases on the Interpretation of Rule-Based ML models
1/2
Surprisingly thoughtful white paper from US Congress on the challenges with AI. Nice to see large parts focusing on privacy, fairness, malicious uses, inspectability, and the need for research funding :-)
"it’s critical that the federal government address the different challenges posed by AI, including its current and future applications." - new AI White Paper from the U.S. House of Representatives
Just got into Vancouver in time for the
#NeurIPS2019
workshops! Let's chat! I'll be at the human-centered ML workshop on Friday, and the Fairness in ML for Healthcare workshop on Saturday.
Question for the fair ML crowd:
Do you include protected attributes (e.g. race, gender) as an input variable for a model that is trained under a fairness constraint (e.g. equalized odds)?
Why / why not?
What's the right way to handle an ICML review stating that your paper doesn't sufficiently expand on a previous version at a workshop? Issues:
1. Workshop paper is non-archival so it shouldn't matter.
2. Reviewer must have googled paper (so no longer blinded).
@MilenaAlmagro
@Gaffetheory
Great list! Adding a few more tips I wish I had known:
- Clif Bars to hold you over
- Comfortable shoes so you can walk across campus if needed
- List of 10-20 questions for 1:1s when you run out of things to chat about :-)
- Remember to ask for breaks between 1:1s
ML folks - does anyone know the origin story surrounding adversarial examples? Wondering if we can trace the research interest back to a talk/talks or a paper/papers etc.
@mikarv
@m_sendhil
Cool result! It’s not really relevant to settings where models are learned from data. It’s still a bad idea to fit a complex model that humans cannot understand or validate (e.g. a NN for credit scoring), just because ∃ a complex model that produces fairer allocation decisions.
@AlexGDimakis
One more failure mode :-)
Removing an independent feature could also affect performance by oversimplifying the hypothesis class.
Example: X1 = vector of ones used to represent the intercept in a linear classifier. Removing X1 = no more intercept.
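(A toy reproduction of this failure mode, with made-up numbers: dropping the all-ones column as "uninformative" silently removes the intercept, and accuracy collapses.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)
x = rng.normal(loc=5.0, size=(1000, 1))   # feature values are far from zero
y = (x[:, 0] > 5.0).astype(int)           # the true boundary needs an intercept
X1 = np.ones((1000, 1))                   # constant column encoding the intercept

with_intercept = LogisticRegression(fit_intercept=False).fit(np.hstack([x, X1]), y)
without = LogisticRegression(fit_intercept=False).fit(x, y)  # X1 dropped

print(accuracy_score(y, with_intercept.predict(np.hstack([x, X1]))))  # close to 1.0
print(accuracy_score(y, without.predict(x)))                          # close to 0.5
```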
@geomblog
I use IFTTT to post the arXiv RSS feed into a papers and abstracts Slack channel.
You can then skim quickly and save the ones that seem relevant into folders for each project.
This leads to a pile of PDFs to "read" for each project. And then I go through them again when writing.
Bad ideas for NY teacher hiring score:
- Using “good teacher evals” as an output (bad proxy for student success)
- Using “personality traits” as an input (irrelevant features)
- Using only data from teachers who were hired (bad sample selection)
Also why are we scoring?
Explanations are convenient for large firms in algorithms, AI, ML
They don't give users control
They don't shake-up power relations
They don't shine light on systems as a whole
It's irresponsible of researchers to jump on the explanation bandwagon without being critical of them
On the 🧵 app as
@berkmlr
!
It's not the same yet, but it should get there once they release an API (for accounts like
@StatMLPapers
) and there's a critical mass of AI/ML content (so we don't have to see generic posts).
Interested in causal fairness in ML? Here’s a thought-provoking paper by Issa Kohler-Hausmann.
“Eddie Murphy and the Dangers of Counterfactual Thinking About Detecting Racial Discrimination”
@begusgasper
Speaking from years of experience, Berk means:
- "ew" in France
- "idiot" in Britain
- "resilient" in Turkey,
- and nothing at all in the US :-)
@mikarv
Who is funding these clowns? Complaining about the costs associated with “human review” is ridiculous. Companies are already saving $ by automating decisions. The least they could do is spend a little to provide consumers with recourse.
@aselbst
For datasets:
Dataset Nutrition Labels
()
For models:
Automated Bias and Fairness Reports
()
AI360 Fairness Toolkit:
()
I’m sure I’m missing lots of others... so looking forward to this thread :-)
@angelamczhou
Maybe for jobs... but for research we should think of it as the kernel trick. It's hard to solve the problem when you stick to one discipline. But then you project it into the higher-dimensional interdisciplinary plane, and boom the problem becomes easy.
Rebalance your training data!
You’d be surprised how often models perform differently across groups (z) due to uneven sample sizes (n) or label imbalances (y)
Resampling to equalize n for each (y,z) is a simple fix. It’s perfectly defensible if the training data isn’t iid
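(A minimal sketch of that recipe in pandas: upsample every (y, z) cell to the size of the largest cell. Column names are placeholders; apply it to the training split only.)

```python
import pandas as pd

def rebalance(df, label="y", group="z", seed=0):
    """Resample rows with replacement so every (label, group) cell has as many
    examples as the largest cell."""
    cells = df.groupby([label, group], group_keys=False)
    n_max = cells.size().max()
    return cells.apply(
        lambda cell: cell.sample(n=n_max, replace=True, random_state=seed)
    ).reset_index(drop=True)
```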
@mikarv
@m_sendhil
If we exclusively consider complex models, then fewer stakeholders participate in model selection and we can legitimize decision-making processes that are less just (even if they result in fairer allocations)
@Aaroth
@bhecht
Hmm you may be right. Part of the issue is that some work has no immediate societal consequence.
I do think that papers should include a limitations section tho (which is standard in medical studies).
It would help peer review / avoid misunderstandings by journalists etc.
@KalaiRamea
leaving this here since it always surprises me:
“It’s worth recalling that the word ‘meritocracy’ was coined as a satirical slur in a dystopic novel by a sociologist.”
h/t
@antoniogm
@JessicaHullman
Also shameless plug to some of our work.
We developed methods that let domain experts specify constraints on model form and predictions, and that inform customization by telling them how their constraints affect performance:
2/n
@tdietterich
@roydanroy
@mark_riedl
@nickfrosst
@umangsbhatt
@CynthiaRudin
Here’s a way to define “interpretable” without using “explanation”:
Interpretable = you understand how the model operates.
If a model is interpretable:
1. You can print it on a piece of paper
2. You know how its prediction will change if you change any input
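(Toy example of a model that passes both tests: a points-based scoring sheet with made-up features and weights. You can print it on one page, and you can see exactly how the prediction moves when any input changes.)

```python
# Illustrative numbers only.
SCORE_SHEET = {
    "age >= 75":             2,
    "prior_hospitalization": 3,
    "on_dialysis":           1,
}
THRESHOLD = 4  # predict "high risk" if total points >= 4

def predict(patient):
    points = sum(pts for feature, pts in SCORE_SHEET.items() if patient[feature])
    return int(points >= THRESHOLD)

predict({"age >= 75": True, "prior_hospitalization": True, "on_dialysis": False})  # -> 1
```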
@rajiinio
@geomblog
@Aaron_Horowitz
@ziebrah
@david_madras
Every prediction can be explained. This includes:
- Predictions that cannot be changed
- Predictions that are unfair
- Predictions that are uncertain
Explaining these predictions does more harm than good.
@mikarv
@m_sendhil
In terms of the setting considered in the paper, my opinion is that simpler models shouldn’t be ruled out b/c they can be more easily understood and contested by more people.
“Grungy boots-on-ground work is how we build our intuitions about what kinds of solutions actually work vs. sounding good on paper. It is hard — though not impossible — to skip this step and still do great work.”
@johnregehr
@_beenkim
It’s arguably the best tool that researchers have to share their opinions with others. It’s much easier and way more effective to tweet about something than write a blog post or position paper.
ML PSA: Stop using 20% of the data as an independent test set. Train your ML model with all the data. Use 5-CV to pair this model with an estimate of predictive performance. If method needs parameter tuning, then use nested CV to avoid bias. More info here
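(One way to implement that recipe with scikit-learn, as a sketch: an inner GridSearchCV handles the tuning, an outer 5-fold cross_val_score estimates the performance of the whole tune-then-fit procedure, and the deployed model is refit on all the data. Dataset and hyperparameter grid are just for illustration.)

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
inner = GridSearchCV(model, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)

# Outer 5-fold CV: unbiased estimate of the tune-then-fit procedure.
scores = cross_val_score(inner, X, y, cv=5)
print(f"estimated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Deployed model: tune and train on ALL the data; report the nested-CV estimate above.
final_model = inner.fit(X, y).best_estimator_
```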
@geomblog
Hmm interesting. To be clear, is the argument: it might be possible (possibly easier) to first train a black-box, and then distill this into a white-box by producing an explanation that has 100% fidelity over all inputs?
(1/n) Here's a tweet thread about our paper accepted at
@fatconference
! Link to paper: . Our goal is to understand how we can use machine learning models to enhance human decision making while retaining human agency.
Last week the US Dept of HUD filed a discrimination lawsuit against Facebook: "When FB uses the vast amount of personal data it collects to help advertisers to discriminate, it's the same as slamming the door in someone's face."
“A survey of 400 algorithms presented in papers at two top AI conferences in the past few years… found that only 6% of the presenters shared the algorithm’s code”
Missing data hinder replication of artificial intelligence studies.
ML folks — have you worked with datasets with noisy or corrupted labels?
If so, please share! Trying to get a better sense of applications where they crop up & how people handle them in practice.
GDPR PSA: The “logic” of an SVM model is to maximize the margin on the training data. Generating an “explanation” for this model leads to misleading rationalizations. There are many such explanations, and the explanations are only valid for the training dataset.
#FAT2018