📢Tasks with >10k classes (e.g. information extraction) are hard for in-context learning: they typically require a fine-tuned retriever or many in-context calls per input ($$$).
Infer-Retrieve-Rank (IReRa) is a SotA program using one frozen retriever with a query predictor and a reranker.
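The control flow behind IReRa can be sketched in a few lines of plain Python. This is an illustrative toy, not the actual DSPy program: `infer` and `rank` stand in for LM modules, and the frozen retriever is a bag-of-words scorer over the label ontology.

```python
# Toy sketch of the Infer-Retrieve-Rank control flow.
# `infer` and `rank` stand in for LM calls (hypothetical);
# the retriever is a frozen term-overlap scorer over the ontology.

def infer(text):
    # LM module: predict free-form query terms from the input.
    return set(text.lower().split())

def retrieve(queries, ontology, k=3):
    # Frozen retriever: score each label by query-term overlap.
    scores = {label: len(queries & set(label.lower().split()))
              for label in ontology}
    return sorted(ontology, key=lambda l: -scores[l])[:k]

def rank(text, candidates):
    # LM module: rerank retrieved candidates (identity here).
    return candidates

def irera(text, ontology):
    queries = infer(text)
    candidates = retrieve(queries, ontology)
    return rank(text, candidates)

labels = ["nausea", "severe headache", "skin rash"]
print(irera("patient reports headache and rash", labels))
# → ['severe headache', 'skin rash', 'nausea']
```

The key idea survives even in this toy: the LM never has to enumerate 10k+ classes; it only writes queries and reranks a short retrieved list.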
Do people do 'research hackathons'? Basically, try to validate a research idea in a couple of days with some peers, possibly geared towards a (short) paper as the eventual outcome.
Easy few-shot classification with ≥10k classes? The Infer-Retrieve-Rank (IReRa) code is now online at !
Optimize an IReRa system on your dataset, configure different student and teacher LMs, use custom retrievers, and pick your optimization logic!
🚨Preprint🚨
Interpretable explanations of NLP models are a prerequisite for numerous goals (e.g. safety, trust).
We introduce Causal Proxy Models, which provide rich concept-level explanations and can even entirely replace the models they explain.
1/7
Extracting and coding adverse drug reactions in biomedical literature is vital for drug safety, but not easy to automate.
This DSPy program combines in-context learning and retrieval to set SOTA on BioDEX (~35% Recall@10), using ~100 samples.
Notebook:
How do you prompt LMs or RAG systems to do classification with over 10,000 classes? What are the core ideas behind the Infer-Retrieve-Rank (IReRa) system?
Hey everyone! I am BEYOND EXCITED to publish an interview with Karel D'Oosterlinck (
@KarelDoostrlnck
) from
@ugent
&
@stanfordnlp
! 🔥
Karel's Infer-Retrieve-Rank is an amazing use of DSPy for Extreme Classification! Learned a ton from this conversation!🤯
Prompt and pipeline engineering need not be brittle.
Modular programs, once automatically optimized, can serve as effective general-purpose solutions. We’re excited to push this towards better and cheaper programs.
Read the preprint:
Code coming ASAP!
Drug monitoring (PharmacoVigilance) is incredibly important for public safety. We set out to improve it using NLP!
Introducing BioDEX, a dataset for Biomedical adverse Drug Event Extraction, containing 19k papers and 256k expert-created drug reports.
📄
With ~50 labeled inputs and a minimal prompt, we bootstrap prompts left-to-right, using different LMs for each module; this is crucial to achieving the cheapest program with the best performance.
This optimization is instantiated directly from the logic, which is *tiny* (thanks DSPy!)
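The left-to-right bootstrapping idea can be sketched as follows. This is an illustrative toy, not the paper's code: each module is optimized on inputs produced by the already-frozen earlier modules, and each module may be assigned its own LM (the module and LM names below are assumptions).

```python
# Illustrative sketch of left-to-right prompt bootstrapping with a
# different LM per module (not the actual optimizer implementation).

def run_prefix(order_prefix, x):
    # Execute the already-optimized prefix of the pipeline;
    # each call is a stand-in for an LM module invocation.
    for name in order_prefix:
        x = f"{name}({x})"
    return x

def bootstrap(order, lms, example):
    prompts = {}
    for i, name in enumerate(order):
        # Inputs for this module come from the frozen prefix,
        # so later modules are tuned on realistic intermediate outputs.
        staged_input = run_prefix(order[:i], example)
        prompts[name] = f"{name} prompt tuned with {lms[name]} on {staged_input!r}"
    return prompts

lms = {"infer": "student-lm", "rank": "teacher-lm"}  # assumed names
prompts = bootstrap(["infer", "rank"], lms, "doc")
print(prompts["rank"])
```

Tuning each module in pipeline order means a cheap LM can serve the early, easy module while an expensive LM is reserved for the module that needs it.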
I’m back at Stanford for 6 months, working with
@stanfordnlp
and specifically professor
@ChrisGPotts
! Let’s meet up to talk about explainable AI, biomedical NLP or interpretable model editing (my new project 👀).
Using
#AI
to invent new courses based on existing
@ugent
courses? Of course you can: this is what "Advanced Procrastination" or "Introduction to Milking Cows" would look like at UGent.
@lateinteraction
This would not be possible without the rapid development cycle DSPy permits!
Can't wait to explore further, so much low-hanging fruit: chunking, ensembling, hierarchical ontologies, etc. -- all backed by bootstrapping and data-driven optimization.
I had a TON of fun discussing Infer-Retrieve-Rank (IReRa) with
@CShorten30
on the
@weaviate_io
podcast!
We talked about extreme classification, DSPy, biomedical NLP, advanced program optimization strategies, and much more!
We optimize the program on 4 datasets.
We set SotA on 3 HR benchmarks (labeling job vacancies) and get meaningful traction on BioDEX (extracting medical reactions from full biomedical papers), despite these being very different in shape.
cost to optimize <<< cost to train.
I just got rate limited trying to search for a tweet. Does anyone remember the one flying around about good multi-label classification performance with LLMs? I feel like it said 10,000 labels or something surprising
(And can twitter *please* add a bookmark search option)
(1/7) As a final-year computer science engineering student at
@ugent
@ugent_fea
, I tracked this past semester how many hours I invested in my studies, thesis, 'passion projects', and summer jobs. The result is this chart 👇.
If I read a paper that is a few months old and the Twitter thread has already died out, is it weird to "revive" it if I have some questions / want to engage in a discussion?
@PhDVoice
#AcademicChatter
#AcademicTwitter
👀Sneak peek of the new study aid application we are developing together with
@GentseStud
for all
@ugent
students. You will be able to easily share notes, documents, and tips & tricks within the confines of the new UGent copyright rules.
💪Beta-release end of September
Excited to share a new model with
@ContextualAI
that tops the AlpacaEval 2.0 leaderboard!
How did we manage to rank higher than models like GPT4, Claude 3 and Mistral Medium? Enter iterative alignment… 🧵
Just had my first
#opensource
contribution merged on
@github
(to ) 🥰🤩. My contribution wasn't much, but I am amazed at how easy the entire process was. I think I might do this more often 😉.
Slightly late announcement, but I'm thrilled that the first paper of my PhD has been accepted to
#NeurIPS2022
! Many thanks to everyone who made this possible!
👀Sneak peek of a weekend project I'm working on: intuitive video editing via language. I hope to add many NLP-based features soon that will allow for insanely quick text-based video editing, without ever opening actual editing software.
How can you optimize RAG-like systems? One way is to have a teacher model generate demonstrations for a student model.
However, you can go much further in this teacher - student interaction. We discussed some advanced optimization ideas in this clip:
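The baseline teacher-student recipe can be sketched as follows. This is an illustrative toy, not DSPy's actual optimizer: a strong "teacher" proposes demonstrations, a metric filters them, and only validated traces survive to be compiled into the cheaper student's prompt.

```python
# Illustrative sketch of teacher-generated, metric-filtered
# demonstrations (not DSPy's real API; names are hypothetical).

def teacher(example):
    # Stand-in for an expensive teacher-LM call.
    return example["text"].upper()

def metric(example, prediction):
    # Only demonstrations that pass the metric are kept.
    return prediction == example["label"]

def bootstrap_demos(trainset, max_demos=2):
    demos = []
    for ex in trainset:
        pred = teacher(ex)
        if metric(ex, pred):          # keep only validated traces
            demos.append((ex["text"], pred))
        if len(demos) == max_demos:
            break
    return demos

trainset = [
    {"text": "ok", "label": "OK"},
    {"text": "bad", "label": "WRONG"},  # teacher fails here; dropped
    {"text": "go", "label": "GO"},
]
print(bootstrap_demos(trainset))
# → [('ok', 'OK'), ('go', 'GO')]
```

The "much further" part is everything beyond this loop: which teacher to use per module, how hard the filtering metric should be, and when to let the student self-bootstrap.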
The people in my building are organizing a French (!) chocolate tasting event. I am deeply offended and have no choice but to retaliate by organizing a Belgian wine & cheese evening.
@YiMaTweets
As a first year PhD student, I feel it’s challenging to get a good grip on the history of the field. How would you advise a PhD student in ML to balance “catching up” with the field on one hand, and having productive output on the other hand?
Professors who livestream their lectures but still refuse to record them and post them online afterwards, why??? I'm genuinely trying to understand. Lecture recordings are worth their weight in gold.
Neural coref is an important step in many NLP pipelines. SOTA coref methods are inefficient, using >= O(n) passes of an LM per document.
We revisit ~efficient~ coref, identify and fix 2 of its routine failure cases and close the gap with SOTA by 34%!
Interpretability claims should be rigorously measured, else we might run the risk of deceiving ourselves. In this work led by Jing Huang, we assessed a recent OpenAI interpretability proposal and found explanations of neurons to generally not align with actual behavior.
Jing Huang, Atticus Geiger,
@KarelDoostrlnck
@ZhengxuanZenWu
& I found this OpenAI proposal inspiring and decided to assess it. We find that the method has low precision and recall, and we find no evidence for causal efficacy. To appear at BlackboxNLP:
I love courses that serve a portion of history alongside the theory: "Who invented something? When? What was the zeitgeist back then, what problem were they trying to solve? What impact did the invention have?". It helps put everything in context and makes it stick.
Awesome and detailed explanation of our Infer-Retrieve-Rank (IReRa) work by
@BotDeepLearning
!
I appreciate how they walk through the actual code and show actual examples of bootstrapped demonstrations. Subscribed!
Proud to be one of the nominees for Student of the Year
@Stadgent
! Thanks to the friends at
@VTKGent
and
@GentseStud
for never shying away from an ambitious project or two😉. Read the interviews with me and the other candidates here:
Loved my time at
#CLeaR2023
. People have been posting many pictures of the conference, so enjoy some shots of Tübingen instead! See you all next year at
@UCLA
?
@CLeaR_2022
Infer-Retrieve-Rank (
@KarelDoostrlnck
et al.) is a simple but powerful paradigm to use LLMs for complex classification problems with thousands of classes. Examples include medical reactions 🥼 and job skills/qualifications 👷.
Use an LLM to infer a set of predictions, use
@trappology
@lateinteraction
1. Follow
@lateinteraction
and read some of his threads, they are great starting points.
2. Check out some of the tutorials and notebooks over at
3. Keep your eyes peeled for a new paper we’re releasing very soon 🤫
Language models today are trained to reason either 1) generally, imitating online reasoning data or 2) narrowly, self-teaching on their own solutions to specific tasks
Can LMs teach themselves to reason generally?🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵
IReRa is a general RAG-type system which can be efficiently optimized towards a range of Information Extraction datasets.
Prompt and pipeline engineering should be scalable. This code provides the IReRa building blocks; use it to build (&optimize) all kinds of RAG-type systems.
Are things going a little too well in your life? Introducing automated demotivation™! Just generated some unmotivational quotes with
#AI
; these are my favorites:
Today, I'm officially a *second-year* PhD student. The first year was even more fun than I ever could have imagined.
Many thanks to all friends, colleagues and mentors along the way!
You should read our new guide to LLM abstractions, a stack with 5 layers!
To randomly help this tweet reach ppl, see how happy DSPy power users feel when they get state-of-the-art scores using DSPy optimizers—so happy, in fact, they make fancy slack emojis. cc:
@KarelDoostrlnck
Want to learn more about Causal Proxy Models and concept based model explanations? Swing by poster
#706
, session 2 (2pm HST) in exhibit hall 1!
#ICML2023
Crucially, we apply discrete prompt optimization to the in-context module, with the grounding in-the-loop.
No human-expert prompting work required!
Easy, intuitive, reproducible, and modular; this is how in-context learning should be done.
We’re working on a more rigorous study; in the meantime enjoy our notebook and feel free to build your own state-of-the-art reaction-extraction systems for biomedical literature!
Get some more information on the BioDEX dataset:
This Monday, I'm giving a talk on how NLP can help parse biomedical documents and how to efficiently bootstrap RAG-like systems for such tasks!
If you're in Ghent, come join! It's at the
@ml6team
office.
@clin_dev_1
Well, not solved yet! I think we still need some work to get stronger and cheaper few-shot performance on a range of biomedical tasks.
If cheap enough, I'd love to run IReRa over every public access biomedical paper for a bunch of ontology-tasks and opensource the results.
@llama_index
This is amazing, thanks for helping make Infer-Retrieve-Rank accessible.
We've also just released our official repository. Exciting times for RAG🤯
We're on the cusp of general RAG-type programs, but a ton of exciting research and engineering still needs to be done.
The design spaces for the program logic and the optimization flow are vastly underexplored. I've listed some ideas below.
Happy to collaborate, DMs are open!
We believe that both training on counterfactual data and localizing hidden representations through intervention training could be a valuable avenue towards the development of more robust, explainable, and malleable neural networks.
📄 Read the paper here:
I've been selected as a highlighted reviewer for the
@XAI_in_Action
workshop at
@NeurIPSConf
!
I'm glad we are putting some spotlight on reviewing, since this is such an important but underappreciated part of our field. Thanks!
Experience with web development? Still looking for a paid summer job? With
@GentseStud
and
@VTKGent
we're looking for a summer hire to work this summer on an ambitious student project with huge impact! See the photos or DM for more info.
Previous fine-tuned and in-context attempts struggled on BioDEX because of long contexts, the biomedical domain, and extreme classification (~26k classes).
Combining grounding and in-context learning is a great first step, but many pipelines are possible!
🧐In our new commentary: we argue that the notion of "illusion" in this paper labels correct explanations as illusory, and that avoiding "illusion" would require unwarranted constraints on NNs. The "illusions" are, though, instructive about how models work.
1/
@hvbris_
Should you use DSPy to scaffold the interaction between LMs and retrievers? I'd say so!
If you have a metric you want to optimize and some data, you can get a lot of value out of DSPy's optimization. If you know GPT-4 can do the task zero-shot, bootstrapping can be strong.
Can AI help drug safety research? Find out January 10, 9am PST (6pm CET). Come listen to my talk, no expert AI-knowledge required to attend!
A healthy mix of Pharma, AI, and Venture Capital people have already signed up.
@tomgoldsteincs
We've just released a benchmark for explanation methods in NLP! While many different kinds of methods should still be added, we hope that our benchmark will lead to a more principled evaluation of explanation methods in AI.