David Dohan Profile
David Dohan

@dmdohan

8,296
Followers
1,420
Following
69
Media
478
Statuses

reducing perplexity @openai | past: probabilistic programs, proteins, science & reasoning @ google brain 🧠

Joined August 2011
Pinned Tweet
@dmdohan
David Dohan
2 years
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper:
Tweet media one
3
98
668
@dmdohan
David Dohan
2 years
“99% of Americans don’t talk about AI at parties. You can too if you try!”
Tweet media one
90
205
3K
@dmdohan
David Dohan
1 year
New chapter: Happy to share that I recently joined @OpenAI ! Thankful for many collaborators, friends, and mentors who made my 6 years of research @Google Brain special🧠 Excited to collaborate toward reliable reasoning & alignment in AI systems and products like #ChatGPT
38
20
1K
@dmdohan
David Dohan
6 months
🩶🫶 Ilya and Sam’s yin/yang was a major reason I joined OpenAI. It is still possible to repair what was shattered.
@ilyasut
Ilya Sutskever
6 months
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
7K
4K
33K
25
32
796
@dmdohan
David Dohan
6 months
OpenAI is nothing without its people
11
25
567
@dmdohan
David Dohan
6 months
🩶🫶
@sama
Sam Altman
6 months
i love the openai team so much
5K
4K
73K
4
14
338
@dmdohan
David Dohan
6 months
language models are superhuman at predicting the next word
try this yourself to see how hard it is
@_jasonwei
Jason Wei
6 months
Like the International Math Olympiad or Spelling Bee, there should be a “language modeling competition” where humans compete to predict the next word in a sequence. The best humans would probably still lose to GPT-2, and we’d have more empathy for how hard it is to be an LLM :)
35
30
471
19
23
294
@dmdohan
David Dohan
1 year
LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi , with @xinyun_chen_ , @kanishkamisra , @nkscales_google , @edchi , Nathanael Schärli, & @denny_zhou
@arankomatsuzaki
Aran Komatsuzaki
1 year
Large Language Models Can Be Easily Distracted by Irrelevant Context
Tweet media one
14
72
387
0
29
203
@dmdohan
David Dohan
6 months
We’re so back (to work)
@OpenAI
OpenAI
6 months
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.
6K
13K
67K
6
0
190
@dmdohan
David Dohan
1 year
Tweet media one
9
3
188
@dmdohan
David Dohan
10 months
At ICML & excited to talk with old and new friends
Message me to chat. A few possible topics:
- Model chains, agents, programs
- Probabilistic programming
- Simulation-based/likelihood-free inference
- AI for science and reasoning
- AI-first Human-Computer interfaces
Tweet media one
7
7
177
@dmdohan
David Dohan
1 year
Tweet media one
2
11
162
@dmdohan
David Dohan
1 year
The C Elegans of GPT
Tweet media one
@karpathy
Andrej Karpathy
1 year
This is a baby GPT with two tokens 0/1 and context length of 3, viewing it as a finite state markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modifies the probabilities on the arrows. E.g. we…
Tweet media one
223
1K
9K
2
23
159
@dmdohan
David Dohan
6 years
Excited to present our work on evolving architectures for translation and image generation in a modular language this afternoon at #GECCO2018 ! Joint work with David So and Quoc Le.
Tweet media one
3
30
121
@dmdohan
David Dohan
1 year
Copilot turning me from code monkey into tab monkey
7
2
109
@dmdohan
David Dohan
6 months
Found the OpenAI tenders
Tweet media one
4
1
115
@dmdohan
David Dohan
2 years
ProtNLM (Protein Natural Language Model) annotates previously "uncharacterised proteins" in @uniprot in English
Instead of a restricted tag set, it predicts function as language: [amino acids] -> "CRISPR-associated endonuclease Cas9"
Collaboration between @GoogleAI and @emblebi
@emblebi
EMBL-EBI
2 years
Ever got a result back saying uncharacterised protein? 😩 @uniprot and @GoogleAI have teamed up to create a natural language processing model that has generated over 40 million protein annotations to address this challenge.
Tweet media one
5
77
186
2
29
112
@dmdohan
David Dohan
1 year
GPT4 feels qualitatively different than models I've used before: like working with a creative partner with vast knowledge. The results on standardized tests will make the rate of progress tangible for many people outside AI
@OpenAI
OpenAI
1 year
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment:
2K
18K
64K
1
7
104
@dmdohan
David Dohan
1 year
Just ask for smaller, better models!
Paper led by @_angie_chen , w/ @david_r_so & me: LMs discover architectures *by directly writing Python Jax code* instead of searching a restricted DSL
With EvoPrompting, we use LMs within an evolutionary algorithm to crossover parent prompts
@_angie_chen
Angelica Chen
1 year
New paper w/ @dmdohan and @david_r_so ! Can LMs be used to design novel model architectures? We propose EvoPrompting, which evolves few-shot prompts to enable a code-pretrained LM to generate novel state-of-the-art architectures. (1/4)
Tweet media one
8
79
401
1
9
97
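A rough sketch of the EvoPrompting-style loop described above, with `lm_generate` (a code-pretrained LM call) and `evaluate` (train-and-score an architecture) as assumed callables; the paper's actual prompt format and selection scheme differ.

```python
def evoprompt(seed_programs, lm_generate, evaluate, generations=10, population=20):
    """Use an LM as the crossover/mutation operator: the fittest parent
    programs become few-shot context, and the LM writes a child program."""
    pool = [(evaluate(p), p) for p in seed_programs]
    for _ in range(generations):
        parents = [p for _, p in sorted(pool, reverse=True)[:2]]      # select fittest
        prompt = "\n\n".join(parents) + "\n\n# An improved architecture:\n"
        child = lm_generate(prompt)                                   # crossover via LM
        pool.append((evaluate(child), child))
        pool = sorted(pool, reverse=True)[:population]                # survival
    return max(pool)[1]
```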
@dmdohan
David Dohan
6 months
@yacineMTB advice I give for short notice interview prep:
- get a copy of "elements of programming interviews in python"
- read through each chapter & for each problem:
a. spend a few minutes thinking of ways you might solve it.
b. Imagine approaches: visualize solution/gestalt,…
4
3
90
@dmdohan
David Dohan
1 year
GPT-4 is in the top 20% of test takers in many of these standardized tests
Tweet media one
6
23
90
@dmdohan
David Dohan
1 year
Declarative langs like SQL let us declare a goal (query), and the system plans how to satisfy constraints
LMQL does this for LMs: can get better results for sampling & tool use in fewer tokens bc it optimizes the decoding
Try it out in the playground:
@lmqllang
LMQL (Language Model Query Language)
1 year
🚀 Excited to announce the first release of LMQL, a novel open source programming language and platform for language model interaction! Combining prompts, constraints & scripting, LMQL elevates the capabilities of large language models. 🧵1/6 A quick tour.
15
179
745
2
12
85
@dmdohan
David Dohan
5 months
Presenting two posters at #NeurIPS2023 , come by! 10:45am-12:45pm for both
- #527 Tuesday @ poster session 1: "Training Chain-of-Thought via Latent-Variable Inference"
- #332 Thursday @ poster session 5: "EvoPrompting: Language Models for Code-Level Neural Architecture Search"
Tweet media one
Tweet media two
0
5
84
@dmdohan
David Dohan
6 months
Maybe an ask-to-answer to @adamdangelo on Quora can clear things up?
3
3
80
@dmdohan
David Dohan
1 year
Did you know? Reading a paper signed by the author doubles your learning rate! Today we are launching to share our beloved arXiv of autographed machine learning papers with the world All proceeds from these historic artifacts go to charity 💖
9
6
76
@dmdohan
David Dohan
1 year
Neat prompt trick for Chat: "express same in Prolog"
Simple way to get LMs to translate back-and-forth between informal language and formal representations like Prolog/Idris/MiniKanren/...
Next up: use the formal language to check its work
@mwgkgk
mwgkgk
1 year
I'm not kidding, it's really good. Remember how the 420 latest GPT news reddit post raved about compression? Well "Prolog" as an idea of how to present information is a one word miracle
Tweet media one
Tweet media two
5
9
138
6
7
67
@dmdohan
David Dohan
1 year
LMs are pretrained to predict the next token. This description is helpful to build intuition, but it’s no longer quite accurate for RL fine tuned models.
@kandouss
Kamal Ndousse
1 year
I think "predicting the word that comes next" is a good description of what pretrained LMs (base models) do. But the description is much less apt after base models are fine-tuned with reinforcement learning.
1
3
27
4
4
63
@dmdohan
David Dohan
10 months
@_ali_taylor Paraxanthine! 80% of caffeine metabolizes to it, rest to theobromine/theophylline. All 4 are xanthines which block adenosine
"4 hour half life" of caffeine doesn't include processing the 3 stimulants it turns into
Rarebird has px coffee & there are preworkouts/energy drinks
Tweet media one
6
3
61
@dmdohan
David Dohan
2 years
@summeryue0 @todor_m_markov Gotta bring a “No AI room” poster to @NeurIPSConf to create an oasis at events
5
0
53
@dmdohan
David Dohan
6 months
Party in Principle
0
0
51
@dmdohan
David Dohan
1 year
New favorite prompt: "Write like Wittgenstein" The general pattern is: "Be concise. Write like X."
@tayroga
taylor
1 year
Model too verbose? "Write like Wittgenstein"
Tweet media one
Tweet media two
Tweet media three
Tweet media four
6
19
140
0
2
44
@dmdohan
David Dohan
1 year
Who's going to tell Marvin Minsky we put Perceptrons into the Society of the Mind
Tweet media one
Tweet media two
4
1
50
@dmdohan
David Dohan
1 year
Authorship ordering is a challenging problem. In "Academic Author How Names Order To", the authors propose several groundbreaking solutions for this well studied yet thus far intractable task
Tweet media one
@katherine1ee
Katherine Lee
1 year
Authorship has been increasingly challenging to determine as team sizes grow larger. We put together a set of proposals that highlight different types of contributions. We’re excited to invite the community to test out the proposals and provide feedback.
1
1
37
3
2
45
@dmdohan
David Dohan
1 year
ChatGPT can now use tools through AI Plugins:
1. Browsing: Search web to answer questions (WebGPT)
2. Code Interpreter: Write/execute/debug—sandboxed—Python to test/analyze/...
3. Interface with services like Kayak/WolframAlpha/Zapier, or ones you create!
@OpenAI
OpenAI
1 year
We are adding support for plugins to ChatGPT — extensions which integrate it with third-party services or allow it to access up-to-date information. We’re starting small to study real-world use, impact, and safety and alignment challenges:
904
4K
19K
2
5
46
@dmdohan
David Dohan
7 months
Time for Good Old Fashioned AI to make a comeback?
I enjoyed "Cognitive Architectures for Language Agents" from @tedsumers and @ShunyuYao12
Discussion tomorrow with @hwchase17 and @charles_irl on the evolving world of scaffolds/abstractions around LLMs!
@hwchase17
Harrison Chase
7 months
Our webinar tomorrow might be my favorite one yet. An absolute MUST JOIN for anyone building chains/agents
Guests:
@dmdohan - Model Cascades paper author
@ShunyuYao12 - ReAct paper author
@tedsumers - COALA paper author
@charles_irl - top tier educator
5
33
157
5
4
41
@dmdohan
David Dohan
6 months
Googled phone # to cancel Citi credit card. Grabbed from generated info box. Called it. Weirdly got different security questions than I had noted but made it through. Request to cancel the card and they don't see it. Realize Google's search LLM gave me Chase's phone number🤦‍♂️
Tweet media one
5
1
41
@dmdohan
David Dohan
2 years
@ ICML workshops til Sunday! Come by workshop Friday @ 9:40am for our talk, with posters @ 5pm. You'll learn how probabilistic programming lets us formalize models talking to models ("model cascades"), unifying many approaches to prompting and inference.
1
6
38
@dmdohan
David Dohan
2 years
Prompt engineering was fun while it lasted
@keirp1
Keiran Paster
2 years
Can large language models write prompts…for themselves? Yes, at a human-level (!) if they are given the ability to experiment and see what works. with @Yongchao_Zhou_ , @_AndreiMuresanu , @ziwen_h , @silviupitis , @SirrahChan , and @jimmybajimmyba (1/7)
15
146
470
2
2
38
@dmdohan
David Dohan
1 year
The HF0 crew have made what I can best describe as a tech monastery in the heart of San Francisco. Hard to imagine a more focused environment. Apply if you want 3 incredibly focused months to build on your projects!
@davefontenot
Dave Font
1 year
GPT4 launched yesterday. Today, HF0 launches: (1/n)
60
200
779
1
2
33
@dmdohan
David Dohan
2 years
Teaching Minerva🦉 math & science has been a ton of fun. What else were we supposed to do after realizing all the LaTeX on arXiv is available? Check out the sample explorer: paper:
@alewkowycz
alewkowycz
2 years
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM.
Tweet media one
108
2K
8K
2
3
32
@dmdohan
David Dohan
3 years
New paper on program synthesis with large language models (244M-137B). We investigate:
(1) how scaling improves performance on Python and math tasks
(2) whether the models can predict output of executing code
(3) human-computer collaboration to write programs via conversation
@gstsdn
augustus odena
3 years
New paper! We use big language models to synthesize computer programs, execute programs, solve math problems, and dialog with humans to iteratively refine code. The models can solve 60% and 81% of the programming and math problems, respectively. A thread:
Tweet media one
20
350
1K
2
3
31
@dmdohan
David Dohan
1 year
2
0
30
@dmdohan
David Dohan
2 years
WebGPT by prompting only
Waiting for an API that lets us do prompt tuning/soft prompting (gradient based continuous z tuning) to make this even easier
@dust4ai
Dust
2 years
WebGPT reproduced from advanced prompting only. Dust-based web-search assistant demo answers questions by searching the web, summarizing content and compiling a final answer with references:
Tweet media one
21
87
636
3
2
28
@dmdohan
David Dohan
1 year
By letting an LM parse natural -> formal language, we get the best of both worlds: the formal system checks consistency of the natural language reasoning
LM = fast system 1, Prolog etc = slow system 2
@Maxwell_Nye has neat work exploring the combo:
1
1
27
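A minimal sketch of that system-1/system-2 split; `lm_complete` and `prolog_check` are hypothetical stand-ins for an LM API and a Prolog engine, not any specific library.

```python
def verify_reasoning(informal_argument, lm_complete, prolog_check):
    """Fast system 1: the LM translates informal reasoning into Prolog.
    Slow system 2: a symbolic engine checks the facts/rules for consistency."""
    program = lm_complete(f"{informal_argument}\n\nExpress the same in Prolog:")
    return program, prolog_check(program)
```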
@dmdohan
David Dohan
8 months
Come see what's brewing @OpenAI
@OpenAI
OpenAI
8 months
We’ll be hosting our first developer conference, OpenAI DevDay, on November 6. Registration to attend in person in San Francisco will open in a few weeks. We’ll also livestream the keynote.
Tweet media one
174
407
2K
1
1
26
@dmdohan
David Dohan
1 year
The rate of progress is astounding. Where do we land after 2 more comparable leaps?
June 11, 2020: GPT-3
March 14, 2023: GPT-4
Jan 1, 2026: ???
Jan 1, 2029: !?!?!?!?
2
0
25
@dmdohan
David Dohan
2 years
@jekbradbury @ylecun Also check out - does an excellent job of demonstrating factored cognition (~latent variable models) with LLMs. It does not have explicit probabilistic inference yet.
1
2
25
@dmdohan
David Dohan
2 years
Manifold markets had <45% likelihood of the MATH dataset hitting 1/2 correct before 2025. Our work on🦉Minerva resolved it to success 3 years early
@vedantmisra
Vedant Misra @ICLR2024
2 years
📈
Tweet media one
0
2
26
1
1
23
@dmdohan
David Dohan
4 years
Want Bespoke, but for everything (especially neural network structures)
@awwbees
Ryan Challinor @[email protected]
4 years
more playing around with livecoding python in bespoke. I added a nice "note stream" module for visualization, which is very useful for understanding what you're doing in live generative composition.
3
12
91
0
2
24
@dmdohan
David Dohan
2 years
Has science gone too far?
@BigScienceLLM
BigScience Large Model Training
2 years
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 101%
69
78
1K
1
0
23
@dmdohan
David Dohan
4 years
Built a few graph viz tools on top of the @rem_note API in @observablehq . Read-only view for now. Next up: extend to whole knowledge bases & allow directly manipulating content inside the graph! What else would you like to see?
Tweet media one
Tweet media two
Tweet media three
2
0
23
@dmdohan
David Dohan
1 year
@tszzl Alignment is the ultimate capability
1
1
23
@dmdohan
David Dohan
2 years
@ericjang11 @OfirPress Can fine tune a base model on different data and weight average, or use the multiple models as a mixture of experts.
@margs_li
Margaret Li @ICLR 2024
2 years
Train an LM made of independent expert LMs (no syncs! no shared params!) ➡️ ➕ new or ➖ existing experts. At. Any. Time. ➡️ Ensemble OR parameter average(!!) to outperform dense & sparse LMs & ensemble baselines with less compute, a fraction of the simultaneous GPU usage. 🌳/n
Tweet media one
7
61
341
1
1
23
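A toy illustration of the weight-averaging half of that suggestion (not the recipe from the quoted paper), assuming checkpoints stored as dicts of numpy arrays with identical keys:

```python
import numpy as np

def average_checkpoints(checkpoints, weights=None):
    """Parameter-average several fine-tuned copies of the same base model.
    `checkpoints` is a list of {param_name: np.ndarray} dicts sharing keys."""
    weights = weights or [1.0 / len(checkpoints)] * len(checkpoints)
    return {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
        for name in checkpoints[0]
    }
```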
@dmdohan
David Dohan
1 year
Congratulations to the Metaphor team for launching! It's a different way of building a search engine. You "search by prompting" - instead of asking a question, phrase it so the natural completion would give the answer like: "My favorite personal webpages on the internet are"...
@ExaAILabs
Exa (prev. Metaphor)
1 year
is now publicly available! Metaphor is a search engine based on generative AI, the same sorts of techniques behind DALL-E 2 and GPT-3 1/
84
578
3K
1
0
22
@dmdohan
David Dohan
2 years
The "No AGI zone" shirt looks more useful by the day.
@dmdohan
David Dohan
2 years
@avitaloliver @savvyRL @FelixHill84 Would you like any “No AGI zone” tshirts
Even better if it’s reversible with “Let’s talk about AI”
Tweet media one
0
0
4
1
1
21
@dmdohan
David Dohan
5 years
Look forward to interfaces that let designers work with generative models like this neat SVG generator.
@rapha_gl
rapha gontijo lopes
5 years
My 1st @GoogleAI Residency paper is finally on arxiv! We train a powerful generative model of fonts as SVG instead of pixels. This highly structured format enables manipulation of font styles and style transfer between characters at arbitrary scales! 👉🏽
Tweet media one
13
249
965
1
2
21
@dmdohan
David Dohan
1 year
@typedfemale Favorite Twitter bio
Tweet media one
1
0
20
@dmdohan
David Dohan
1 year
@andrewwhite01 The "I Can't Believe It's Not Better" workshop @ NeurIPS does this! So many beautiful ideas with the tiny problem that they don't actually work (yet?) @ICBINBWorkshop
1
1
20
@dmdohan
David Dohan
1 year
@OpenAI @Google Amused that the media beat me to the announcement
0
1
20
@dmdohan
David Dohan
10 months
Come by the 11am posters on Wednesday to learn how irrelevant context affects LLMs:
Tweet media one
@dmdohan
David Dohan
1 year
LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi , with @xinyun_chen_ , @kanishkamisra , @nkscales_google , @edchi , Nathanael Schärli, & @denny_zhou
0
29
203
1
0
20
@dmdohan
David Dohan
6 months
@miramurati @sidorszymon @sama I think🫡 is a valid offer letter
0
0
18
@dmdohan
David Dohan
3 years
Had a chance to discuss the state of natural language processing & potential applications toward an "IDE for thought" with @AthensResearch last month. @PsionicaOrg demoed Dual, which provides natural language interface over a knowledge base. recording:
@AthensResearch
Athens 🏛
3 years
For today's community call in 40 minutes, @dmdohan (Google Brain) will be chatting about how we might apply AI/NLP/GPT-3 to Athens
Paul Bricman is joining to talk about his project
A preview of the call here:
Don't miss this!!
Tweet media one
Tweet media two
0
4
14
0
3
19
@dmdohan
David Dohan
4 years
P3BO solves discrete blackbox optimization problems by adaptively allocating resources among an evolving set of search algorithms. Check it out at the ICML poster session Wednesday!
Tweet media one
@cangermueller
Christof Angermüller
4 years
If you are interested in P3BO, join our ICML poster session this Wed! Poster session: Paper:
1
2
9
1
7
18
@dmdohan
David Dohan
2 years
AI moves us into a declarative world. Specify the goal & constraints, and the system searches for solutions
@tylerangert
Tyler Angert
2 years
in a way LLMs mirror the trend of going from imperative to functional + declarative programming. you just say what you want instead of describing the process
3
0
17
1
2
17
@dmdohan
David Dohan
5 months
@AriX @SoftwareAppsInc Incredible
Emulated macOS more responsive than any modern web page
Tweet media one
1
2
18
@dmdohan
David Dohan
9 months
@jeremyphoward imo it only makes sense in a data limited regime - otherwise the embeddings/projections let you store more info. Scaling laws are on non-embedding params, not sure how that interacts
Those works were done when the game was "what's the best perplexity with 20m params, including…
1
0
13
@dmdohan
David Dohan
6 months
i am not a very good language model =\
the site is also subtly broken in a few ways (not all words are allowed tokens, some correct guesses marked as wrong, ...)
still good way to build intuition! anyone know who actually made this?
Tweet media one
2
0
18
@dmdohan
David Dohan
3 years
I'm most excited for the beginnings of conversational programming. It's early days - can you imagine the programming UIs we will have in a few years? It's an entirely different way of creating (especially for the non-expert).
@gstsdn
augustus odena
3 years
Second, we evaluate whether these models can interact with a human to iteratively refine their outputs. We find that 4 turns of dialog with a human can double the number of problems solved by the model.
Tweet media one
1
3
57
3
2
16
@dmdohan
David Dohan
6 months
paper comparing humans v ada v gpt3 at predicting the next word (from @bshlgrs , @FabienDRoger , @justanotherlaw , @emclean1 )
Tweet media one
@ArthurConmy
Arthur Conmy
6 months
@_jasonwei All human Top1 accuracies are worse than even a 350M model (a small GPT-3) here
0
0
25
3
0
16
@dmdohan
David Dohan
6 months
Makes sense
@ChrisJBakke
Chris Bakke
6 months
BREAKING: Nathan Fielder has been serving the board of OpenAI in a senior consulting role since Thursday night
Tweet media one
Tweet media two
55
144
3K
1
0
16
@dmdohan
David Dohan
4 years
Jamming on @darklang with friends is tons of fun. Love the collaborative spatial canvas - feels like @figma for code with constant live feedback.
@tayroga
taylor
4 years
Building a side project with @michaelrbock and @dmdohan using @darklang - it's super fun! Fastest backend dev experience. Thanks @paulbiggar and @ellenchisa
1
0
12
2
1
16
@dmdohan
David Dohan
2 years
@winniethexu @hmichalewski @jaschasd @sirbayes @alewkowycz @jacobaustin132 @Bieber @Yuhu_ai_ @RandomlyWalking PPLs represent probabilistic models as programs. They extend deterministic code with the ability to sample from distributions, and observe data, i.e. condition the model. They also provide machinery to run inference (ancestral sampling, beam search, particle MCMC, …)
1
0
15
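A toy example of those three ingredients (sample, observe/condition, run inference) using nothing but rejection sampling; real PPLs provide far better inference machinery.

```python
import random

def coin_model():
    bias = random.choice([0.3, 0.5, 0.9])                # sample a latent variable
    flips = [random.random() < bias for _ in range(5)]   # generative process
    return bias, flips

observed = [True, True, True, True, False]               # data we condition on

# Inference by rejection: keep only runs whose simulated data match the observation.
accepted = [b for b, f in (coin_model() for _ in range(200_000)) if f == observed]
print("posterior mean of bias:", sum(accepted) / len(accepted))
```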
@dmdohan
David Dohan
1 year
Prolog ("Programming in Logic") was designed for AI/symbolic reasoning in the Good Old-Fashioned AI days—good for formalizing logical rules & constraints As a declarative language, you specify goals and it searches for how to achieve them, or tells you something is inconsistent
1
0
14
@dmdohan
David Dohan
1 year
One of my first projects with LMs was trying to generate TensorFlow models with LSTMs, but they weren't quite good enough @ codegen yet
So this paper marks coming full circle: completing this goal in my last paper @Google
Thanks @MaartenBosma for starting the effort last year!
1
0
15
@dmdohan
David Dohan
2 years
Huge props to the organizers for their leadership in pushing this to completion! Exciting model for large-scale collaborations that benefit the whole community
@jaschasd
Jascha Sohl-Dickstein
2 years
After 2 years of work by 442 contributors across 132 institutions, I am thrilled to announce that the paper is now live: . BIG-bench consists of 204 diverse tasks to measure and extrapolate the capabilities of large language models.
Tweet media one
37
574
3K
0
1
14
@dmdohan
David Dohan
2 years
Tweet media one
0
1
14
@dmdohan
David Dohan
6 months
@ilyasut 💖🫡
0
0
14
@dmdohan
David Dohan
2 years
Instead of having an LM solve algorithmic tasks directly, train it to predict a trace of the individual reasoning steps. It solves more problems and can "show its work" in a scratchpad along the way!
@Maxwell_Nye
Maxwell Nye
2 years
New paper! We show that huge language models (137B params!) can be trained to solve algorithmic tasks by “showing their work”---writing intermediate text to a scratchpad. This “scratchpad” technique even allows us to predict the execution of Python code.
Tweet media one
Tweet media two
9
153
787
2
0
13
@dmdohan
David Dohan
2 years
@arankomatsuzaki @GoogleAI Talk & poster tomorrow at ICML!
@dmdohan
David Dohan
2 years
@ ICML workshops til Sunday! Come by workshop Friday @ 9:40am for our talk, with posters @ 5pm. You'll learn how probabilistic programming lets us formalize models talking to models ("model cascades"), unifying many approaches to prompting and inference.
1
6
38
0
1
13
@dmdohan
David Dohan
1 year
Related works
- Concurrent work from Elliot Meyerson, @joelbot3000 , and others: "Language Model Crossover"
- Evolution through LMs:
- AutoML Zero:
@joelbot3000
Joel Lehman
2 years
“Evolution through Large Models” – new paper from our team at OpenAI. Step towards evolutionary algos that continually invent and improve at inventing: Large models can suggest (+ improve at making) meaningful mutations to code. Paper: 1/4
38
484
3K
1
1
13
@dmdohan
David Dohan
2 years
@winniethexu @hmichalewski @jaschasd @sirbayes @alewkowycz @jacobaustin132 @Bieber @Yuhu_ai_ @RandomlyWalking Cascades provide scaffolds for accomplishing tasks that a single model can't, making them more interpretable and alignable. This is closely related to factored cognition, and the Eliciting Latent Knowledge proposal. Think of LMs as a fast system 1 and a Cascade as a slow system 2
2
0
13
@dmdohan
David Dohan
4 years
@Conaw Wire together blocks that operate on natural language
- "reframe question as statement"
- "replace passive with active voice"
- "Is this an example of sunk cost bias?"
Models like GPT3 can compose with themselves. Use to pipeline different prompts together.
1
2
13
@dmdohan
David Dohan
2 years
@NireBryce Nyxt is the only browser I've seen that has a history tree. It's built on webkit using common lisp.
Tweet media one
1
3
11
@dmdohan
David Dohan
2 years
Models that seem initially interchangeable for same purpose can have vastly different characters
@fabianstelzer
fabian (glif/acc)
2 years
DALL-E 2 vs Midjourney vs StableDiffusion mega thread: photography, illustration, painters, abstract these image synths are like instruments - it's amazing we'll get so many of them, each with a unique "sound" 🤯 rules: same prompt, 1:1 aspect ratio, no living artists
Tweet media one
383
4K
23K
0
2
11
@dmdohan
David Dohan
2 years
The paper focuses on quantifiable problem solving, but 🦉does great at explaining technical concepts. It has read all of the arXiv after all. Curious about REINFORCE? Just prompt it to write a paper on it: `\section{A derivation of the score function gradient estimator}`
Tweet media one
3
2
12
@dmdohan
David Dohan
1 year
It's a Python DSL that allows specifying complex constraints alongside control-flow & tools. Using masking lets it explore many branches more cheaply than manually sampling from the LM. What other programming-language experiments will we see that take LMs as a primitive?
1
1
12
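A conceptual sketch of the masking idea (constrain decoding by ruling out disallowed tokens before each step), not LMQL's actual implementation; `logits_fn` stands in for a model forward pass over the current prefix.

```python
import numpy as np

def constrained_greedy_decode(logits_fn, vocab, allowed, max_steps):
    """Greedy decoding where each step only considers tokens satisfying a
    constraint: disallowed tokens are masked to -inf before the argmax."""
    mask = np.array([0.0 if tok in allowed else -np.inf for tok in vocab])
    out = []
    for _ in range(max_steps):
        logits = np.asarray(logits_fn(out))   # model call on the prefix so far
        out.append(vocab[int(np.argmax(logits + mask))])
    return out
```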
@dmdohan
David Dohan
1 year
@atroyn @nearcyan @alexeyguzey “just predicts tokens given the conditional distribution of tokens” is true of LM pretraining but no longer true once finetuned with RLHF/RLAIF. Then it is optimizing a goal: The reward model is happy at the end of the current exchange. Can still view as goal conditioned policy.
2
0
12
@dmdohan
David Dohan
4 years
Jax code for linear time attention & approximating arbitrary kernels (softmax and beyond). Used to train protein BERT models for our paper "Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers" -
@adrian_weller
Adrian Weller
4 years
Code release by @XingyouSong for Performers fast attention yielding linearly scalable transformers with generalized attention . Great work with Krzysztof C, Valerii L @CambridgeMLG , @dmdohan , @jared_quincyd , @tamassarlos , David B, @LucyColwell37
0
0
1
0
5
11
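A toy numpy sketch of the positive-random-feature trick behind that work: approximate the softmax kernel with features so attention is computed in time linear in sequence length. This is an illustration of the idea, not the released JAX code.

```python
import numpy as np

def positive_features(x, w):
    # phi(x) = exp(w.x - ||x||^2 / 2) / sqrt(m), so E_w[phi(q).phi(k)] = exp(q.k)
    m = w.shape[0]
    return np.exp(x @ w.T - 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)) / np.sqrt(m)

def linear_attention(q, k, v, n_features=256, seed=0):
    d = q.shape[-1]
    w = np.random.default_rng(seed).standard_normal((n_features, d))
    qp = positive_features(q / d ** 0.25, w)   # scaling recovers exp(q.k / sqrt(d))
    kp = positive_features(k / d ** 0.25, w)
    num = qp @ (kp.T @ v)                      # contract keys with values first: O(L*m*d)
    den = qp @ kp.sum(axis=0)[:, None]         # normalizer, O(L*m)
    return num / den
```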
@dmdohan
David Dohan
2 years
The first machine learning “all you need”
0
0
10
@dmdohan
David Dohan
1 year
@DynamicWebPaige @OpenAI fwiw the last season of silicon valley basically is SV: AI Edition [spoiler] The compression algorithm becomes intelligent :) Fun fact: we generated all the code on the background screens with a finetuned GPT2
1
0
11
@dmdohan
David Dohan
6 months
Accepting investments at 25 cents per TPU (Tendie Participation Unit)
2
0
11
@dmdohan
David Dohan
1 year
@yasuoyamasaki Appreciate the connection! Had many of the ideas but didn't build in open & allow explosion of growth around it
There's tons of related work on model chaining & tool use. Notably, AI-Chains () and arguably Society of Mind
@dmdohan
David Dohan
1 year
Who's going to tell Marvin Minsky we put Perceptrons into the Society of the Mind
Tweet media one
Tweet media two
4
1
50
2
0
11
@dmdohan
David Dohan
6 months
Excuse me it’s called ClippAI
@tszzl
roon
6 months
excited to start my job at Microsoft Advanced AI Research on Monday
98
25
2K
1
0
11
@dmdohan
David Dohan
2 years
This was a fun collaboration with amazing colleagues at Blueshift and Brain. Looking forward to what Minerva will learn next! @alewkowycz , @AJAndreassen , @ethansdyer , @hmichalewski , @vinayramasesh , @AmbroseSlone , @cem__anil , Imanol, Theo, @Yuhu_ai_ , @guygr , and @vedantmisra !
1
0
11
@dmdohan
David Dohan
1 year
@Mononofu Can prepend docs with <good>/<bad> tokens to bake this into pretrained model, & still use all available data
From "Pretraining Language Models with Human Preferences" by @tomekkorbak , @shi_kejian , @_angie_chen ... @sleepinyourhat , @EthanJPerez
Tweet media one
1
1
11
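A minimal sketch of that conditional-pretraining idea; the threshold, token names, and `quality_score` are placeholders, not the paper's exact setup.

```python
def tag_document(text, quality_score, threshold=0.5):
    """Prepend a control token so pretraining keeps every document while the
    model learns to associate <good>/<bad> with the document's quality."""
    token = "<good>" if quality_score >= threshold else "<bad>"
    return f"{token} {text}"

# At generation time, start the prompt with "<good>" to ask for preferred behavior.
```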
@dmdohan
David Dohan
2 years
New paper! The OptFormer is a language model that metalearns blackbox optimization and hparam tuning, incorporating trial info as text. Can imitate other algos (random search, evolution, Bayes opt+Gaussian process, ...), or use in place of GP. And this one's only 250m params!
@yutianc
Yutian Chen
2 years
Proud of your hyperparameter tuning skills? Let transformers learn from you. We present the OptFormer (), a novel text-based hyperparameter tuner, trained on massive datasets of industrial hyperparameter tuning experiments.
Tweet media one
4
63
398
1
0
11
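A sketch of the "trial info as text" idea: serialize the tuning history so a language model can condition on it and propose the next configuration. The serialization format here is illustrative, not the paper's.

```python
def trial_to_text(params, objective):
    # One hyperparameter-tuning trial rendered as plain text.
    cfg = ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"trial: {cfg} -> objective={objective:.4f}"

history = "\n".join([
    trial_to_text({"lr": 1e-3, "batch_size": 64}, 0.8123),
    trial_to_text({"lr": 3e-4, "batch_size": 128}, 0.8410),
])
prompt = history + "\ntrial:"   # the LM completes this line with a proposed next trial
print(prompt)
```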
@dmdohan
David Dohan
2 years
Have a look at the preprint [1], use one of the models [2], and explore predictions [3]
Share feedback here, with corresponding authors, or on the UniProt site [4]
[1]
[2]
[3]
[4]
1
1
9