On the 2023-2024 academic job market | CS PhD candidate at Stanford, advised by
@jure
and
@jugander
. Networks, ML, public health, computational social science.
I’m on the academic job market! I’ll have a PhD from
@Stanford
CS in 2024. My research develops ML + network science methods to tackle complex societal challenges, from pandemics to polarization to supply chains. See my website + research statement for details! Highlights below:
So excited to announce that we won the KDD 2021 Best Paper Award in the Applied Data Science track for our paper, “Supporting COVID-19 policy response with large-scale mobility-based modeling”!
Very honored to receive the
@MetaAI
PhD fellowship in Computational Social Science!! Thank you to my incredible advisors
@jure
and
@jugander
, mentors, and collaborators for your brilliance and support :)
Congrats to
@serinachang5
for winning the Meta PhD Fellowship in Computational Social Science!! Serina has done amazing work modeling the spread of COVID on large-scale human mobility networks and applying this research to real-world impact.
📢 Excited to announce a new paper with
@erichorvitz
and
@adamfourney
: “Accurate Measures of Vaccination and Concerns of Vaccine Holdouts from Web Search Logs”
@MSFTResearch
1/
Paper:
Data & code:
Excited to announce new work to appear at
@RealAAAI
'23! We study the problem of geographic spillovers, where people cross county/state borders to circumvent local restrictions (think: COVID lockdowns, gun control). Joint work w
@dvrabac
,
@jure
,
@jugander
:
Excited to attend
@icwsm
in person this week!! I’ll be presenting joint work with
@jugander
on Thu June 9 (Session 4). We often discuss the impacts of recommender systems on society, but relative to what counterfactual - how would user outcomes change in a world without them? 1/5
Just arrived in Seattle for
@MSFTResearch
intern week! Excited to be working with
@erichorvitz
and
@adamfourney
this summer on projects using search queries to investigate the effects of COVID policy interventions on dynamic behavior (eg, vaccine uptake, mobility patterns). 1/2
Our model also predicts that 10% of POIs account for over 80% of infections at POIs. Restaurants, gyms, hotels, and cafes are among the riskiest categories to reopen. These places are riskier because they tend to have higher visitor densities and/or visitors stay longer. 5/9
Very excited to share this new work! We build an epidemiological model on top of a fine-grained (hourly) mobility network, and analyze the impacts of reopening + how socioeconomic / racial disparities in infections can arise from mobility patterns alone.
Such a joy to work w/
@dallascard
and the rest of our amazing interdisciplinary team! Combining NLP + domain expertise, we conduct the first comprehensive quantitative analysis of Congressional speeches about immigration and characterize dramatic changes in discourse over 140 yrs
Excited to be attending
@Cornell
's Future Faculty Symposium this week in Ithaca, NY!! Thanks to Cornell for organizing this fantastic event - I'm looking forward to spending time with the amazing faculty and the other symposium scholars!
Super interesting opinion piece by
@iarynam
which dives into our COVID model’s findings around capacity caps and socioeconomic disparities in the first wave of the pandemic. Love these gorgeous graphics; I’m definitely taking notes for my next data viz 😍
Three fundamental challenges arise from these gaps:
(1) How to infer networks from aggregated, noisy, or incomplete data
(2) How to extract precise behavioral signals from vast unlabeled data
(3) How to estimate causal effects (eg, of policies) from observational network data
1/ New preprint! Excited to introduce follow-up work on our
@nature
paper: we extended our COVID model and turned it into a decision-support tool that can directly help policymakers in the remaining (hopefully few!) months of COVID response:
The deadline is coming up soon: submit your work to the Data Science for Social Good Workshop at KDD'23! Co-organized with
@AmulyaYadav19
@AyanMukhrjee
and Aparna Taneja
Deadline: June 5, 2023
Workshop: August 7, 2023 (Long Beach, CA)
Second: I’ve built ML systems to extract precise behavioral signals from unlabeled data - eg, GNNs to detect vaccine intent from search logs (w/
@adamfourney
@erichorvitz
), LLMs to study immigration attitudes from political speeches (PNAS’22 w/
@dallascard
@jcbecker_econ
et al)
We identify two mechanisms driving these disparities: (1) disadvantaged groups were not able to reduce their mobility as much in Mar 2020, (2) the POIs that disadvantaged groups visit tend to be more crowded, which increases risk of infection. 7/9
I'm attending
#KDD2023
! On Aug 7, I'll be
- running the Data Science for Social Good Workshop:
- presenting work w
@erichorvitz
@adamfourney
on search logs + vaccines at
@EpidamikW
(4:10-4:50pm):
- moderating a panel at KDD EDI day
Third: I’ve developed methods blending causal inference+graph ML to discover causal mechanisms of dynamic networks - eg, RDD-based methods to estimate spillovers in mobility networks (AAAI’23 w/
@dvrabac
@jure
@jugander
), GNNs to learn production functions from supply chains
My work is also recognized by
@NSF
GRFP,
@Meta
PhD Fellowship, EECS Rising Stars, and Rising Stars in Data Science. Beyond research, I’m passionate about teaching, mentorship, service - I’m currently Program Chair for ML for Health 2023 and mentoring four talented undergrads :)
On the first challenge: in our Nature paper (w/
@2plus2make5
@PangWeiKoh
et al), we developed methods to infer mobility networks (5.4B hourly edges!) from aggregated location data and modeled the spread of COVID-19 with unprecedented granularity, informing reopening + disparities
I would love to hear about any opportunities that might be a good fit!! You can find my contact info on my website, along with my CV and research statement:
Based on mobility data *alone*, the model correctly predicts that lower-income and less white neighborhoods are infected at higher rates. These disparities in COVID infection rates are well-reported, but this suggests that mobility may be a strong contributing factor. 6/9
I leverage novel high-frequency data sources to capture human networks + behaviors at the center of societal challenges. These data provide new opportunities to capture networks + behaviors at scale BUT there remain large gaps between real-world systems and data…
We built a novel computational tool for policymakers to assess tradeoffs between thousands of potential mobility measures and predicted COVID-19 infections. Check out our recent blog post for details:
Finally, a central pillar of my research is closing the loop. I’ve deployed decision-support tools, including a dashboard enabling public health officials to assess the potential impacts of thousands of reopening plans - this work (w/
@UVA_BI
) won the Best Paper Award in KDD'21🏆
By closing these gaps, we can derive key insights and build decision-support tools, paving a new way for *human-centered decision-making* powered by large-scale computation and data 🎆
Our results highlight the need for policy-makers to consider the impacts of reopening on different populations. By supporting detailed, data-driven analyses, we hope that our model can help to address this need and inform more effective and equitable responses to COVID-19. 8/9
Had a wonderful time chatting with
@katakeith
and
@lucy3_li
about our work on mobility network models of COVID! Check out the episode here: we discuss the origins of this interdisciplinary project, and the joys and challenges of working on a high-stakes issue in near real-time.
Listen to
@serinachang5
talk about her use of cellphone data to track COVID-19 mobility dynamics and the challenges of communicating this science to the public on Ep. 4 of
@lucy3_li
and my podcast: Diaries of Social Data Research!
#KDD2023
Data Science for Social Good Workshop deadline extended to June 12!! We encourage a wide range of work, from perspectives to methods to applications in health, sustainability, etc. See CFP:
@AmulyaYadav19
@AyanMukhrjee
and Aparna Taneja
Excited to be giving a talk at the
@Stanford
Graph Learning Workshop next week! I'll be presenting with Qi Xiu from
@Hitachi
on developing temporal GNNs to model global supply chains🚢➡️🚚➡️📦
Register here:
🌟 Announcing the speakers for the Stanford Graph Learning Workshop 2023 on Oct 24 2023!🤝 Bringing together academia & industry leaders to delve into advances in
#MachineLearning
&
#AI
in Relational domains, Foundation models, and Multimodal AI. 📢 Register at:
@lilyxu0
is an exceptional researcher, not to mention amazing organizer, mentor, and friend. Her work on AI, conservation, and health has inspired me throughout my PhD. Hire her!!
I'm on the academic job market!
I'm a PhD student at
@Harvard
, where I develop and deploy AI decision-making techniques for planetary health.
My research enables practitioners to take efficient, robust actions necessary in these high-stakes, low-data settings.
We find that reduced occupancy reopening — ie, reopening but capping each POI's max occupancy — is effective. In Chicago, our model predicts that capping at 20% occupancy reduces infections by more than 80% but only loses around 40% of POI visits, compared to fully reopening. 4/9
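To make the capping idea concrete, here is a minimal sketch (not the paper's code). It assumes a POI's visits are given per hour and that its maximum occupancy is known; the function names `cap_visits` and `share_of_visits_kept` are hypothetical. It shows why a cap can cost relatively few visits: only the busiest hours get clipped.

```python
def cap_visits(hourly_visits, max_occupancy, cap_fraction):
    """Clip each hour's visits at cap_fraction * max_occupancy."""
    cap = cap_fraction * max_occupancy
    return [min(v, cap) for v in hourly_visits]


def share_of_visits_kept(hourly_visits, capped_visits):
    """Fraction of the original visits that survive the cap."""
    return sum(capped_visits) / sum(hourly_visits)


# Hypothetical POI: quiet most of the day, one crowded hour.
visits = [10, 20, 100]
capped = cap_visits(visits, max_occupancy=100, cap_fraction=0.5)
kept = share_of_visits_kept(visits, capped)
```

In this toy example only the crowded hour is affected, so most visits are retained while the peak density is halved.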
Using anonymized, aggregated location data from
@SafeGraph
, we construct mobility networks that capture the hourly movements of people from neighborhoods to points-of-interest (POIs) like restaurants and gyms. Our networks cover 553k POIs and 98M people from Mar-May 2020. 2/9
Even though the models are identical in process, this reversal in perspective results in dramatic differences! Through theorems + experiments, we show that these differences are pervasive across a large class of “recommender” and “organic” models. 4/5
A Stanford team has created a model to help identify effective and equitable reopening policies. To do it, they traced the footsteps of 98 million Americans through half a million establishments in 10 of the country's largest metropolitan areas:
#COVID19
New working paper quantifying arXiv publication patterns in the age of LLMs! Joint work with
@rajivmovva
,
@sidhikab1
,
@kennylpeng
,
@gsagostini
, and
@NikhGarg
.
We analyze LLM citation patterns, fastest growing topics, many other things. Some of our findings: 1/N
Come say hi if you're interested in this work or modeling complex social systems at large! Much of my current research also involves modeling diffusion processes (eg, COVID) over large-scale networks and developing methods to effectively intervene on complex systems. 5/5
You can find our code + materials to run your own RDD experiment on the CA Blueprint at . Please reach out if you have thoughts or find me at
@RealAAAI
in Feb! Thanks also to
@2plus2make5
for helpful discussions +
@CAPublicHealth
for great documentation 9/
By overlaying a simple SEIR model on these networks, we're able to capture the broad relationship between mobility and disease spread. This allows us to assess a variety of reopening questions: which places are the riskiest to reopen? when/how should we reopen? 3/9
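As a rough illustration of overlaying SEIR dynamics on a mobility network, here is a simplified, deterministic sketch. It is an assumption-laden toy, not the model from the paper: it ignores POI area, dwell time, and stochasticity, and the names `seir_step`, `beta`, `sigma`, and `gamma` are illustrative choices.

```python
def seir_step(state, visits, beta, sigma, gamma):
    """Advance a toy SEIR model by one time step.

    state:  one dict per neighborhood with keys "S", "E", "I", "R"
    visits: visits[i][j] = visitors from neighborhood i to POI j this step
    beta:   transmission scale (assumed constant across POIs here)
    sigma:  per-step rate of moving from exposed to infectious
    gamma:  per-step rate of moving from infectious to removed
    """
    n_pois = len(visits[0])

    # Infectious visitors at each POI, assuming visitors from a neighborhood
    # are infectious in proportion to that neighborhood's I / N.
    infectious_at_poi = [0.0] * n_pois
    for nb, row in zip(state, visits):
        total = nb["S"] + nb["E"] + nb["I"] + nb["R"]
        frac_i = nb["I"] / total if total else 0.0
        for j, v in enumerate(row):
            infectious_at_poi[j] += v * frac_i

    new_state = []
    for nb, row in zip(state, visits):
        total = nb["S"] + nb["E"] + nb["I"] + nb["R"]
        frac_s = nb["S"] / total if total else 0.0
        # New exposures: susceptible visitors meeting infectious visitors at POIs.
        new_e = sum(beta * v * frac_s * infectious_at_poi[j]
                    for j, v in enumerate(row))
        new_e = min(new_e, nb["S"])  # cannot expose more people than are susceptible
        new_i = sigma * nb["E"]
        new_r = gamma * nb["I"]
        new_state.append({
            "S": nb["S"] - new_e,
            "E": nb["E"] + new_e - new_i,
            "I": nb["I"] + new_i - new_r,
            "R": nb["R"] + new_r,
        })
    return new_state


# Two hypothetical neighborhoods sharing POI 0; only the first has infections.
state = [{"S": 990.0, "E": 0.0, "I": 10.0, "R": 0.0},
         {"S": 500.0, "E": 0.0, "I": 0.0, "R": 0.0}]
visits = [[100.0, 0.0], [50.0, 20.0]]
next_state = seir_step(state, visits, beta=0.001, sigma=0.2, gamma=0.1)
```

Even in this toy version, the second neighborhood picks up exposures purely because its residents visit a POI shared with the first, which is the mechanism the mobility network is meant to capture.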
So: spillovers exist, they do undermine local policies, but intermediate strategies (when optimized) can achieve a good balance btwn policy efficacy and flexibility! Lots more to explore here: eg, other spillover domains (content moderation!) and tradeoff dimensions 8/
@navaneethsan
@Stanford
Thank you!! I'm primarily interested in academic positions since I'd like to establish my own lab and mentor students, but I'm super interested in translating research into industry + policy impact and have really enjoyed past collabs! (eg, with Microsoft and Hitachi)
Yes! We estimate these effects on a large-scale mobility network w billions of dynamic edges (using loss-corrected negative sampling to fit our model). We find *significant spillover movement* from the purple to red tier, with larger effects in retail, eating places, and gyms 5/
In our work, we can make unconfounded estimates of spillovers! We identify an ideal setting: CA Blueprint for a Safer Economy, where counties were assigned each week to "tiers" of varying restrictiveness, eg, in purple (most extreme), restaurants + gyms were outdoors only 3/
We’re now able to quantify how spillovers introduce tradeoffs for policymaking across spatial scales. Contrasting local + global regimes, our spillover estimates suggest that county-level restrictions are only 54% as effective as statewide restrictions at reducing mobility! 6/
The key to our models is a reversal in perspective: the user knows her own interests well but needs to estimate item attributes from noisy samples (eg, movie trailers) while the recsys knows items well but needs to estimate (new) user interests from noisy samples. 3/5
Using the classifier, we can estimate vaccine intent rates down to ZIP code tabulation areas, around 10x the granularity of counties! With search signals, we can also detect vaccine intent in *real time*: we find that the CDC’s time series lags ours by 7-15 days. 7/
Honored to have been named to MIT
@techreview
's 35 Innovators Under 35. Very grateful to the mentors and friends who collaborated on the work this award recognizes!!!
Thanks GPCE/Erin Raymond for the spotlight interview! We chatted about my recent work with
@jugander
on the social impact of rec systems, and more broadly about my research interests in complex systems of human behavior. Stay tuned for part 2! (we’ll be discussing COVID modeling)
Spillovers introduce tradeoffs: while local policies provide flexibility between regions, their efficacy may be seriously undermined by spillovers. However, due to endogenous policymaking, there are few opportunities to reliably estimate causal spillovers or evaluate tradeoffs 2/
The key is that tier was assigned at the *cutoffs* of COVID metrics. We thus develop a regression discontinuity design-based framework to estimate how visits btwn counties change due to tiers. Do visits from county A to B increase when B is in a less restricted tier than A? 4/
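For readers unfamiliar with regression discontinuity, here is a generic textbook sketch of a sharp RDD estimate: fit a line on each side of the cutoff within a bandwidth and take the gap between the two fitted values at the cutoff. This is not the network-based framework from the paper; `rdd_estimate`, `fit_line`, and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff, using a local linear fit on each side."""
    left = [(x - cutoff, y) for x, y in zip(running, outcome)
            if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two points on each side of the cutoff")
    # After centering at the cutoff, each intercept is the fitted value at the cutoff.
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Toy data: a smooth trend (2x) plus a jump of 5 at the cutoff x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [2 * x + (5 if x >= 0 else 0) for x in xs]
effect = rdd_estimate(xs, ys, cutoff=0, bandwidth=3)
```

The estimate is only credible when units just above and just below the cutoff are otherwise comparable, which is why assignment at metric cutoffs matters.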
However, an intermediate strategy of macro-county restrictions—where we optimize county partitions by solving a minimum k-cut problem on a graph weighted by our spillovers—can recover over 90% of statewide reductions, while maintaining substantial flexibility between counties 7/
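To illustrate what a minimum k-cut partition means here, the sketch below brute-forces it on a tiny graph: it tries every assignment of nodes to k groups and keeps the one with the least total edge weight crossing between groups. This is exponential and only for intuition; it is not the solver used in the paper, and `min_k_cut` and the toy weights are hypothetical.

```python
from itertools import product


def min_k_cut(n_nodes, weights, k):
    """Exact minimum k-cut by exhaustive search (tiny graphs only).

    weights: dict mapping an undirected edge (u, v) to its weight,
             e.g. estimated spillover between counties u and v.
    Returns (cut_cost, labels) where labels[u] is the group of node u.
    """
    best_cost, best_labels = float("inf"), None
    for labels in product(range(k), repeat=n_nodes):
        if len(set(labels)) < k:
            continue  # every group must be non-empty
        cost = sum(w for (u, v), w in weights.items() if labels[u] != labels[v])
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


# Four hypothetical counties: strong spillovers within pairs, weak across.
spillovers = {(0, 1): 10, (2, 3): 10, (1, 2): 1}
cost, labels = min_k_cut(4, spillovers, k=2)
```

Counties that exchange a lot of spillover traffic end up in the same macro-county, so restrictions set per group leak less across group borders.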
To answer this question, we introduce two contrasting models: (1) a “recommender” model that captures a generic personalized recommender system, (2) an “organic” model where users search for items without the mediation of any system. 2/5
...however, for holdouts who eventually “converted” (showed vaccine intent), their concerns nearly reverse around conversion time, looking much more like early adopters than their typical selves – aside from a few important differences. 11/
To fill these gaps, we employ anonymized search logs! We discover that they are amazingly powerful at detecting vaccine seeking, concerns, news exposure, etc, right from real-world clicks and queries. BUT how do we make sense of billions of unlabeled + unstructured searches? 3/
Despite never having worked together in person, Pang Wei has helped me become a better researcher, coder, writer, and thinker. I feel so lucky to be able to call him a mentor, collaborator, and friend. 😍
Our solution: graph machine learning. First, we impose structure by representing search logs as a *query-click graph*, then we develop methods to detect user intents and categorize user interests using graph ML + human annotation. 4/
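A minimal sketch of the query-click graph idea, under simplifying assumptions: the log is a list of (query, clicked URL) pairs, a few URLs have been hand-labeled as seeds, and each query is scored by the share of its clicks that land on seeds. The actual methods are more involved (graph ML plus human annotation); the function names and example URLs below are hypothetical.

```python
from collections import defaultdict


def build_query_click_graph(log):
    """Bipartite graph as nested counts: graph[query][url] = number of clicks."""
    graph = defaultdict(lambda: defaultdict(int))
    for query, url in log:
        graph[query][url] += 1
    return graph


def query_intent_scores(graph, seed_urls):
    """Score each query by the fraction of its clicks that go to seed URLs."""
    scores = {}
    for query, clicks in graph.items():
        total = sum(clicks.values())
        hits = sum(count for url, count in clicks.items() if url in seed_urls)
        scores[query] = hits / total
    return scores


# Hypothetical log entries and a single hand-labeled seed URL.
log = [
    ("covid vaccine near me", "pharmacy.example/vaccine-signup"),
    ("covid vaccine near me", "pharmacy.example/vaccine-signup"),
    ("covid vaccine near me", "news.example/story"),
    ("weather", "weather.example"),
]
graph = build_query_click_graph(log)
scores = query_intent_scores(graph, seed_urls={"pharmacy.example/vaccine-signup"})
```

Representing the log as a graph is what lets labels on a small set of URLs inform many unlabeled queries, and vice versa.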
4/ To build this tool, we’ve updated our model substantially — adding + validating new features (like mask-wearing), fitting to smaller metro areas and new time periods, improving our computational infrastructure so that we can run millions of model realizations efficiently.
@PangWeiKoh
Thank you Pang Wei!!! So glad we got to do this project together, and, as always, I had so much fun and learned so much from working with you! 😍
@mantaflight
@ben_golub
@Stanford
I'm broadly interested in TT faculty positions related to my research! Including CS, IS, data science, business schools, etc
Wait, but is Bing representative? Well, coverage isn’t uniform across the US, so we carefully estimate Bing coverage per ZIP and correct for this. See M3 for details: we also decompose sources of bias, evaluate bias from our classifier, and compare Bing + Google trends 12/
Concerns of early adopters vs holdouts differ significantly even within categories: eg, in Vaccine Safety, early adopters are more interested in normal side effects, severe side effects (eg, blood clots); holdouts far more interested in vaccine myths, exemptions, FDA approval 10/
3/ Our tool allows policymakers to explore the impact of fine-grained changes in mobility on predicted infections: eg, what if restaurants were returned to 50% of their pre-pandemic levels of mobility, essential retail to 100%, and other categories retained their current levels?
Thank you for writing this, Emma! I couldn’t agree more. To add on, I’m so grateful to
@PangWeiKoh
for not only his essential contributions to our methods, but also for going above + beyond by joining Emma in becoming the best mentors I could ask for — on this project and in life
He's never even given a talk about it, but his fingerprints are all over this work - in the decades-old optimal transport algorithms applied to a totally novel setting; in the integrals correcting the visit counts; in the winsorization of input data to defend against outliers. 7/
5/ There are limitations to our tool, eg, modeling assumptions, mobility data doesn't cover all populations. Also, we focus on the effects of mobility on transmission, but other work is needed on how policymakers can reach target levels of mobility (we're looking into this now!)
We develop a “vaccine intent classifier” that detects when a user is trying to get the COVID vaccine on search. What’s unique is that our classifier not only detects queries, eg, [covid vaccine near me], but also URL clicks, eg, on the CVS COVID vaccine registration page. 5/