What do great teachers do that makes them good at teaching? And what can this teach us about LMs?
Most of us experience the “front stage” of teaching—as students. Few see the *back stage*: the planning, pedagogical decisions…
🌉 Bridge, at NAACL’24, surfaces these hidden decisions🧵
I’m excited to share “Language modeling via stochastic processes”, an #ICLR2022 oral!
Our work addresses the challenge of generating long _coherent_ sequences with language models by leveraging goal-conditioned latent paths.
Paper:
🧵[1/T]
Excited to share gym-cooking, a *novel multi-agent Gym environment*:
Based on recent work (, #CogSci2020 computational modeling prize winner) with amazing collaborators Sarah Wu, James Evans, Josh Tenenbaum, David Parkes, @maxhkw!
How can we algorithmically figure out what our model doesn’t know, and then construct datasets to improve it?
We tackle this question in “Know thy student: Interactive learning with Gaussian processes” at the #ICLR2022 @cells2societies workshop.
Paper:
[1/N]
Students listening to lectures can go to Google to answer their questions. But…how can the teacher find what part of their lecture *caused* the student’s question in the first place?? 🤔
Introducing *Backtracing*: Retrieving the Cause of the Query!
🧵
There’s a *lot* of insight about how students learn in education data like classroom conversations, but I can tell you it's *painful* to process and analyze.
Introducing *Edu-ConvoKit*, a pipeline that handles the pre-processing, annotation, and analysis for you!
🧵
How do we train language models (LMs) to be good pragmatic conversational partners? We investigate this in our #EMNLP2021 Findings paper: Calibrate your listeners! Robust communication-based training for pragmatic speakers.
📜:
📺:
#EACL2024 is over!! This was a super fun conference & it was great meeting new folks 😃 Malta is so beautifully colorful --- here are some of my favorite pics!! ✨🎨
Ever wonder how experienced math teachers & tutors compare to ChatGPT or GPT4 in teaching students? 🖥️🧑🎓👩🏫 Check out our new paper “Step-by-Step Remediation of Students’ Mathematical Mistakes”!
📜
🖥️
from @stanfordnlp @StanfordEd
Can ChatGPT help teachers by providing effective feedback, like generating helpful pedagogical suggestions? 👩🏫
We answer this question in our work presented @ BEA (co-hosted @ ACL) on Thursday July 13 Harbour A!
Website: w/ @ddemszky!
Come check out “Language modeling via stochastic processes” at #ICLR2022 this Monday! 😄 Looking forward to meeting old and new friends 🥳
Livestream: Apr 25 5-5:15pm PT
Poster: Apr 25 6:30-8:30pm PT
w/ @esindurmusnlp, Noah Goodman & @tatsu_hashimoto
1/N We are excited to introduce our @iclr_conf workshop: A Roadmap to Never-Ending RL. We invite you to submit papers (up to 6 pages, excluding references and appendix) in the @iclr_conf format. Submission Deadline: February 26, 2021
#NERL2021 #ICLR2021
In Malta 🇲🇹 for EACL this week!! Let me know if you’re around and wanna talk about nlp applications esp education, or just hang with me
@chengmyra1 @krisgligoric … or run with me along the coast!! 😎🏃🏻♀️
How will ed tech change w LLMs? What is and isn't possible?
If these Qs have been on your mind, submit your work to a workshop I'm organizing: Leveraging LLMs for Next Gen Ed Tech @ EDM 2024 by May 10th!
➡️
#EDM
#EdTech
Curious about how we can build cooperative, human-like AI systems? 🤖
📜: Poster 10-11am PST Saturday
#NeurIPS2020
Cooperative AI Workshop!
🗣: Spotlight talk 11:45pm PT Saturday
📚: Paper
w/ S. Wu, J. Evans, J. Tenenbaum, D. Parkes, @maxhkw!
Woo, my internship project is out on the @GoogleAI blog!!! How can we get real-world robots to anticipate each other’s behavior and collaborate? Find out more below! 👇
Big thanks to: J. Chase Kew, Dennis Lee, Tsang-Wei Lee, Tingnan Zhang, @brian_ichter, Jie Tan, @AleksandraFaust
Introducing a model-based #RL approach for robot navigation, called hierarchical predictive planning (HPP), that enables agents to align their goals on the fly in order to solve the decentralized rendezvous task. Learn more at
Excited to present our #CoRL2020 paper on model-based RL for multi-robot coordination! Work done at #Google w/ J. Chase Kew, Dennis Lee, Tsang-Wei Lee, Tingnan Zhang, @brian_ichter, Jie Tan, @AleksandraFaust. 😄
Website:
Live session: Tomorrow (11/18) 12:30pm PT
📢 Calling the #EdTech community!
Intrigued by the potential impact of LLMs on education, positive or negative?
Submit your work to this workshop at #EDM2024 🪇👩🏫
➡️
Deadline: May 10th
Looking forward to the discussions!!!
I am thrilled to present at Women in Data Science! Come learn more about my work with @ddemszky on NLP and Education 😄
Topic: "Beyond Right or Wrong: Leveraging Language Models to Enhance the Learning Process"
Aug. 30, 11am-12pm PDT
Register now:
@ccanonne_
This is neat! Curious about what students wrote. One thing that’s been on my mind is how this might change grading rubrics to be less about a student’s generative ability (eg pseudocode, complexity analysis), and more about their discriminative ability (eg debug, critique).
📢 Calling all folks in the AI x Education community: An exciting opportunity!!! 👩🏫👩🎓
Bill and Melinda Gates Foundation RFI on AI-Powered Innovations in Math Teaching & Learning:
Let's build a future where every question leads to deeper understanding…for both students and teachers! 💡
Big thanks to @arankomatsuzaki for featuring our Backtracing work!! 🙏
Backtracing: Retrieving the Cause of the Query
- Proposes a new task called backtracing where the goal is to retrieve the cause of the query from a corpus
- Shows limitations in current retrieval methods for performing backtracing
repo:
abs:…
Our insight is to represent _coherent_ language as a _smooth_ latent trajectory. We turn to Brownian bridge stochastic processes (SP) as a model for smooth trajectories.
[4/T]
We train our encoder with contrastive learning (CL). Why CL? Bc of (a) its striking performance in learning representations (SimCLR for images, @tingchenai) & (b) exciting work in applying it to structured SPs (@BingbinL)!
[5/T]
Understanding why users ask questions is key because it’s a natural source of *content feedback*.
We establish a diverse benchmark for backtracing and show traditional retrieval systems miss the mark in retrieving the cause of queries 😬
📎:
When we want language models to generate long text, they often output meandering text. One potential reason behind this failure mode is the model’s inability to plan ahead or represent long-range text dynamics.
[2/T]
Want to understand how students learn, but not sure where to get started?
🎥 Here's a 2-min demo video of Edu-ConvoKit!
It walks through its GPT-powered, quantitative and qualitative analysis tools.
👋 Happy exploring with Edu-ConvoKit!
Our environment is designed to be easily configurable and lightweight. It’s perfect for folks interested in multi-agent systems or in compositional tasks/environments!
Lectures are a learning experience for students & teachers. Students learn about the subject & teachers learn about refining their instruction. But, online student feedback is unstructured. How can teachers learn from it?
Have questions about the work? Let's trace back to the source...:
📎:
💻:
I’ll also be in Malta for #EACL presenting this work, so come chat with me there too 😉☀️
On long text generation settings, Time Control (TC) preserves the text structure both in terms of ordering (up to +40% better) and text length consistency (up to +17% better). Human evaluators also prefer TC's output 28.6% more than the baselines.
[8/T]
Prior work has explored remedies for this failure mode by using planning-based methods or implicitly learning text dynamics. However, these methods manually specify the text dynamics or sacrifice quality in long-horizon generation.
[3/T]
Yes!!! Make sure to check out @jeffclune + @ruiwang2uiuc’s awesome work at our 2nd poster session at 12:55pm PT! 🥳
GatherTown link can be found on our workshop's ICLR site.
Our work Enhanced POET is an invited poster at this (excellent!) workshop in case you want to come ask questions or ask about any of our work. Thanks to the organizers for putting together such a wonderful event!
👉 Check Edu-ConvoKit out:
It’s an easy, practical way to transform how we conduct research toward improving real student learning outcomes.
Work done at @StanfordEd @stanfordnlp @StanfordAILab with my advisor @ddemszky
Experienced teachers engage their students in critical thinking—whereas novice tutors and LLMs don’t: They frequently give away the answer.
Our work focuses on how experienced teachers do and *think* about remediating student mistakes.
📎:
The intuition is simple: The bridge imposes that a positive triplet (eg. three in-order sentences on Boston) makes up a smooth trajectory. A negative triplet should not construct a smooth trajectory (switching middle sentences with one on New York).
[6/T]
After training the encoder, we finetune GPT2 to decode from past context and the encoded latent plan. At inference, we generate a latent plan by sampling from the bridge and conditionally generate each sentence using the latent plan.
[7/T]
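For intuition, here's a minimal numpy sketch of that inference-time step: sampling a latent plan from a Brownian bridge pinned at a start and goal embedding. (The function name, one-step-per-sentence discretization, and `sigma` are illustrative assumptions, not the paper's code.)

```python
import numpy as np

def sample_brownian_bridge(z0, zT, num_steps, sigma=1.0, rng=None):
    """Sample a latent trajectory pinned at z0 (start) and zT (goal).

    Uses the Brownian bridge transition: each step is pulled toward the
    endpoint zT, with noise that shrinks to zero as t -> T, so the path
    always ends exactly at zT.
    """
    rng = np.random.default_rng(rng)
    z0, zT = np.asarray(z0, float), np.asarray(zT, float)
    T = float(num_steps)
    traj, z = [z0], z0
    for t in range(num_steps):
        remaining = T - t
        mean = z + (zT - z) / remaining          # drift toward the goal
        var = sigma**2 * (remaining - 1.0) / remaining
        z = mean + np.sqrt(max(var, 0.0)) * rng.standard_normal(z.shape)
        traj.append(z)
    return np.stack(traj)  # shape: (num_steps + 1, latent_dim)
```

Each sampled latent z_t would then condition the decoder for sentence t, as described above.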
Lately, I’ve been super excited about teacher-student settings & thinking about how we can enable machines to (one day) reliably interact with & _teach_ humans! If you’re interested in this direction, let's chat at the @cells2societies poster session Fri April 29 8:15-9:05am PT!
[6/N]
Super cool work from @KaitlynZhou --- generating expressions of uncertainty is extremely important in supporting human decision-making (and human reasoning)! My favorite part of the paper is their typology of uncertainty expressions (Table 4) 😀
Thanks so much for sharing our work!
Our paper also discusses additional risks and opportunities that come with integrating expressions of uncertainty into LMs.
Read the paper here:
w/ @jurafsky @tatsu_hashimoto @stanfordnlp
We also show that TC doesn’t sacrifice short/mid-range language modeling performance! Eg. TC matches/outperforms task-specific models like infilling by language modeling (ILM) on text-infilling or local representation methods on discourse coherence.
[9/T]
I am excited about language tools for education at scale because they move us away from an oversimplified view of learning measured by standardized test scores… and toward language-based measures of student thinking and pedagogy.
Repo:
Stop by the @cells2societies poster session tomorrow (Fri) 8:15-9:05am PT if you're interested in teacher-student settings or the problem of "assessing then teaching models". Always excited to meet new folks too! 🙏🙂
What makes backtracing so hard?! Let’s use our Lecture dataset based on MIT OCW lectures as an example.
Challenge 1: The queries don’t explicitly label what they’re caused by in the lecture/source document
@janleike
@BlancheMinerva
@MaksimSTW
It would be great if OpenAI could make a more public blog post (eg. something with as much visibility as ChatGPT on ), clarifying the misunderstandings and also the implications (eg. SL fine-tuning on a good dataset ~= RLHF, and not RLHF>SL)!
I’ll be presenting our work in person at EMNLP. This will be my first NLP & in-person conference, so I’d love to meet folks! Please don’t hesitate to reach out 😄
📜:
📺:
👩💻:
Attending #EACL2024 and passionate about computational social science (CSS) or social applications of NLP?
Join us tomorrow (Wednesday at 2 pm in the Carlson room) for an informal CSS Birds of a Feather gathering!
Let's meet, chat, and share insights!
I'm excited about building NLP algs/systems to empower educators & students and enhance social interactions! ✨🚀🧑🎓👩🏫
Echoing the views of @ddmeyer in education and the innovative NLP x data science approaches of @Diyi_Yang & @timalthoff in mental health
@abeirami
@emnlpmeeting
When can we expect a response from the PCs? We sent an email on Sunday and still haven't heard back. Our soundness/excitement scores were 5/4, 4/4, 4/4... 🙏
@juanmiguelpino
@hbouamor
Totally understand it's a busy time, but having an ETA would be helpful for resubmission! 🙂
Backtracing: Retrieving the Cause of the Query
Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures). While information retrieval (IR) systems may provide answers for such user queries, they do not directly assist
And finally, a huge shoutout to my collaborators from @stanfordnlp @StanfordEd, esp @lateinteraction!
Working with Omar has been an incredible experience: He brings boundless insight and energy to the table, and I’m so grateful to learn from this star in our community! 🌟
We use Cognitive Task Analysis & work with experienced 🧑🏫s to surface their internal decisions. Patterns emerge: 🧑🏫 infer the *error type* -> determine an *intention* -> pick a *remediation strategy*.
We have a lot of other exciting ongoing projects along this direction, so stay tuned!!!
📜
👩💻
🖥️
w/ the amazing Ashley Zhang, Carly Robinson, Susanna Loeb, and @ddemszky ✨
Our work complements recent work on self-improving LLMs with internal decision-making:
But, we focus on eliciting and leveraging thoughts/decisions from real, experienced humans—in this case, teachers!
Language models today are trained to reason either 1) generally, imitating online reasoning data or 2) narrowly, self-teaching on their own solutions to specific tasks
Can LMs teach themselves to reason generally?🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵
We contribute a *unique, real* dataset: It contains the *internal decisions* paired with the response of these experienced teachers.
It also includes real tutoring conversation snippets between novice tutors and students, across 120 math topics.
Dataset:
There are so many cool findings in this work—I could go on for ages, but tweets can only do so much…
For now I’ll leave y’all on a cliffhanger and show how experienced teachers make *diverse* and *complex* decision paths, compared to LLMs.
Isn’t this beautiful!?
Eg. in an offline reinforcement learning setting, the student must navigate to the goal (green). The teacher determines which states (yellow) the student has explored and can accomplish the task from. The teacher can then construct demonstrations from the states (orange) where the student fails.
[3/N]
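As a toy illustration of this diagnose-then-teach loop (my sketch, not the paper's code: the 1-D `student_skill` function, kernel choice, and probe budget are all made-up assumptions), the teacher can fit a Gaussian process to its probes of the student and query the most uncertain state next:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def student_skill(s):
    # Stand-in for the student's unknown competence over states.
    return np.sin(3 * s)

states = np.linspace(0.0, 1.0, 200).reshape(-1, 1)  # candidate probe states
probed = [0.1, 0.9]                                  # states tested so far

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                              alpha=1e-4, optimizer=None)
for _ in range(5):  # each round: probe where the GP is most uncertain
    X = np.array(probed).reshape(-1, 1)
    gp.fit(X, student_skill(X).ravel())
    _, std = gp.predict(states, return_std=True)
    probed.append(float(states[np.argmax(std), 0]))
```

After diagnosis, the teacher would construct demonstrations for the states where the GP's predicted skill is lowest, without having probed every state.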
Please join us for a roundtable panel discussion with Celeste Kidd (@celestekidd), Satinder Singh, Melanie Mitchell (@MelMitchell1), and Jürgen Schmidhuber (@SchmidhuberAI), moderated by Adam White. We are excited to hear this discussion. Don’t miss it!
Looking for a library AND viz tool for making ML training more efficient? Want to understand the trade-offs you would make between cost, time, and model quality? Check out what my cool friends @AveryLamp @abhi_venigalla @moinnadeem at @mosaicML recently released!!! 🥳😎
Hello World! Today we come out of stealth to make ML training more efficient with a mosaic of methods that modify training to improve speed, reduce cost, and boost quality. Read our founders’ blog by @NaveenGRao @hanlintang @mcarbin @jefrankle (1/4)
Our Bridge framework allows for two cool questions to be answered:
- Can we use a human expert’s internal decisions to *improve* LLM responses?
- Can we prompt LLMs to make their own internal decisions and self-improve?
We cast this problem as a teacher-student setup where the teacher must first interact to diagnose 🧪the student (the model), before teaching 👩🏫(constructing the training dataset).
[2/N]
@phillip_isola
This is super cool!!! I've been exploring a similar idea for math education purposes and there are a lot of cool analogies we use 😀. It's interesting (and ominous??) that at age 30, we seem to hit a dark period though with our analogies
Scaling high-quality tutoring is hard. With growing demand, many platforms use novice tutors who struggle to address mistakes and fail to seize learning opportunities.
📚But, turning mistakes into opportunities is key! Effective strategies can transform student understanding🚀
@ben_eysenbach
Hey Ben! Neat work. Using contrastive learning for estimating value functions seems quite similar to the . They study it in symbolic domains + assume env simulator for negatives, so it's slightly different from your domains, but methodologically similar.🙂
@AlexTamkin
@MicahCarroll
Also sharing this on tips for new researchers! Wrote this right after undergrad, so might need to update it with PhD/mentorship reflections 😁
Note: The teacher can’t exhaustively probe the student esp. if (a) the state space is huge [this would take too long!], and (b) with limited communication [eg. if the student were a human, they would get exhausted!]
[5/N]
Let’s see Edu-ConvoKit in action! 👇
What are examples of real student reasoning in the classroom? Or, examples of how a tutor interacts with a student’s contributions?
Boom: with a single Edu-ConvoKit function call, we can view these examples.
*** Deadline Extension Alert ***
We have decided to extend our @iclr_conf workshop submission deadline by 48 hours, until February 28th AOE. Submit your work here: More details can be found on our website:
#ICLR2021 #NERL2021
Big +1!! You learn how to do sanity checks, debug, and find edge cases through research.
Last summer, I heard an undergrad through my office wall bang on the table, then shout: "Ohhh myyy godddd!!!" *silence* "I'M CHANGING THE LIST AS I'M ITERATING THROUGH IT"😆 what a throwback
2) Learn to problem solve
Google your errors / stack traces
Play around with a library in an interactive python session
(Maybe even ask a language model, as long as you can verify the answer it gives you!)
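That office-wall bug is a classic worth showing: mutating a list while iterating over it makes the iterator silently skip elements. A minimal Python illustration:

```python
# Buggy: removing elements from the list you're iterating over.
nums = [1, 2, 2, 3, 2, 4]
for x in nums:
    if x == 2:
        nums.remove(x)
print(nums)  # [1, 3, 2, 4] -- a 2 survives! Each removal shifts later
             # elements left, so the iterator skips the next one.

# Fixed: iterate over a copy, so removals don't disturb the iterator.
nums = [1, 2, 2, 3, 2, 4]
for x in list(nums):
    if x == 2:
        nums.remove(x)
print(nums)  # [1, 3, 4]
```

(Or simply build a new list: `[x for x in nums if x != 2]`.)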
Here’s my favorite qualitative example on a word problem: GPT4 gives away the solution, whereas the teacher suggests an illustrative strategy to help the student understand. Providing the student error and strategy label gets GPT4 to generate pedagogically better responses. 📈
@ddmeyer
@ddemszky
On the other hand, I think this raises Qs abt how we should design collaborative systems/algs when users have very diff preferences in how they want to teach; this diversity Q comes up in other settings too (e.g., the mental health domain). Would love to chat & ideate about this!
Our work sheds light on the potential and current limitations of using LLMs to provide high-quality learning experiences for tutors & students at scale. We hope that it can serve as a valuable resource for providing equitable, high-quality learning experiences using *real* data.
🔥Findings: Expert math teacher > LLMs + expert math teacher’s guidance > LLMs > novice tutors🔥
-The quality of the best LLM’s responses falls short of math teachers.
-Providing LLMs w/ teacher’s strategy leads to a 75% improvement in response quality over models w/o that information.