David Dohan Profile
David Dohan

@dmdohan

8,296
Followers
1,420
Following
69
Media
478
Statuses

reducing perplexity @openai | past: probabilistic programs, proteins, science & reasoning @ google brain 🧠

Joined August 2011
Pinned Tweet
@dmdohan
David Dohan
2 years
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper:
Tweet media one
3
98
668
@dmdohan
David Dohan
2 years
“99% of Americans don’t talk about AI at parties. You can too if you try!”
Tweet media one
90
205
3K
@dmdohan
David Dohan
1 year
New chapter: Happy to share that I recently joined @OpenAI ! Thankful for many collaborators, friends, and mentors who made my 6 years of research @Google Brain special🧠 Excited to collaborate toward reliable reasoning & alignment in AI systems and products like #ChatGPT
38
20
1K
@dmdohan
David Dohan
6 months
🩶🫶 Ilya and Sam’s yin/yang was a major reason I joined OpenAI. It is still possible to repair what was shattered.
@ilyasut
Ilya Sutskever
6 months
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
7K
4K
33K
25
32
796
@dmdohan
David Dohan
6 months
OpenAI is nothing without its people
11
25
567
@dmdohan
David Dohan
6 months
🩶🫶
@sama
Sam Altman
6 months
i love the openai team so much
5K
4K
73K
4
14
338
@dmdohan
David Dohan
6 months
language models are superhuman at predicting the next word
try this yourself to see how hard it is
@_jasonwei
Jason Wei
6 months
Like the International Math Olympiad or Spelling Bee, there should be a “language modeling competition” where humans compete to predict the next word in a sequence. The best humans would probably still lose to GPT-2, and we’d have more empathy for how hard it is to be an LLM :)
35
30
471
19
23
294
@dmdohan
David Dohan
1 year
LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi , with @xinyun_chen_ , @kanishkamisra , @nkscales_google , @edchi , Nathanael Schärli, & @denny_zhou
@arankomatsuzaki
Aran Komatsuzaki
1 year
Large Language Models Can Be Easily Distracted by Irrelevant Context
Tweet media one
14
72
387
0
29
203
@dmdohan
David Dohan
6 months
We’re so back (to work)
@OpenAI
OpenAI
6 months
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.
6K
13K
67K
6
0
190
@dmdohan
David Dohan
1 year
Tweet media one
9
3
188
@dmdohan
David Dohan
10 months
At ICML & excited to talk with old and new friends
Message me to chat. A few possible topics:
- Model chains, agents, programs
- Probabilistic programming
- Simulation-based/likelihood-free inference
- AI for science and reasoning
- AI-first Human-Computer interfaces
Tweet media one
7
7
177
@dmdohan
David Dohan
1 year
Tweet media one
2
11
162
@dmdohan
David Dohan
1 year
The C Elegans of GPT
Tweet media one
@karpathy
Andrej Karpathy
1 year
This is a baby GPT with two tokens 0/1 and context length of 3, viewing it as a finite state markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modifies the probabilities on the arrows. E.g. we…
Tweet media one
223
1K
9K
2
23
159
@dmdohan
David Dohan
6 years
Excited to present our work on evolving architectures for translation and image generation in a modular language this afternoon at #GECCO2018 ! Joint work with David So and Quoc Le.
Tweet media one
3
30
121
@dmdohan
David Dohan
1 year
Copilot turning me from code monkey into tab monkey
7
2
109
@dmdohan
David Dohan
6 months
Found the OpenAI tenders
Tweet media one
4
1
115
@dmdohan
David Dohan
2 years
ProtNLM (Protein Natural Language Model) annotates previously "uncharacterised proteins" in @uniprot in English
Instead of a restricted tag set, it predicts function as language: [amino acids] -> "CRISPR-associated endonuclease Cas9"
Collaboration between @GoogleAI and @emblebi
@emblebi
EMBL-EBI
2 years
Ever got a result back saying uncharacterised protein? 😩 @uniprot and @GoogleAI have teamed up to create a natural language processing model that has generated over 40 million protein annotations to address this challenge.
Tweet media one
5
77
186
2
29
112
@dmdohan
David Dohan
1 year
GPT4 feels qualitatively different than models I've used before: like working with a creative partner with vast knowledge. The results on standardized tests will make the rate of progress tangible for many people outside AI
@OpenAI
OpenAI
1 year
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment:
2K
18K
64K
1
7
104
@dmdohan
David Dohan
1 year
Just ask for smaller, better models!
Paper led by @_angie_chen , w/ @david_r_so & me: LMs discover architectures *by directly writing Python Jax code* instead of searching a restricted DSL
With EvoPrompting, we use LMs within an evolutionary algorithm to crossover parent prompts
@_angie_chen
Angelica Chen
1 year
New paper w/ @dmdohan and @david_r_so ! Can LMs be used to design novel model architectures? We propose EvoPrompting, which evolves few-shot prompts to enable a code-pretrained LM to generate novel state-of-the-art architectures. (1/4)
Tweet media one
8
79
401
1
9
97
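A rough sketch of the EvoPrompting-style loop described above, with `lm_generate` (a code-pretrained LM call) and `evaluate` (train-and-score an architecture) as assumed callables; the paper's actual prompt format and selection scheme differ.

```python
def evoprompt(seed_programs, lm_generate, evaluate, generations=10, population=20):
    """Use an LM as the crossover/mutation operator: the fittest parent
    programs become few-shot context, and the LM writes a child program."""
    pool = [(evaluate(p), p) for p in seed_programs]
    for _ in range(generations):
        parents = [p for _, p in sorted(pool, reverse=True)[:2]]      # select fittest
        prompt = "\n\n".join(parents) + "\n\n# An improved architecture:\n"
        child = lm_generate(prompt)                                   # crossover via LM
        pool.append((evaluate(child), child))
        pool = sorted(pool, reverse=True)[:population]                # survival
    return max(pool)[1]
```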
@dmdohan
David Dohan
6 months
@yacineMTB advice I give for short notice interview prep:
- get a copy of "elements of programming interviews in python"
- read through each chapter & for each problem:
a. spend a few minutes thinking of ways you might solve it.
b. Imagine approaches: visualize solution/gestalt,…
4
3
90
@dmdohan
David Dohan
1 year
GPT-4 is in the top 20% of test takers in many of these standardized tests
Tweet media one
6
23
90
@dmdohan
David Dohan
1 year
Declarative langs like SQL let us declare a goal (query), and the system plans how to satisfy constraints
LMQL does this for LMs: can get better results for sampling & tool use in fewer tokens bc it optimizes the decoding
Try it out in the playground:
@lmqllang
LMQL (Language Model Query Language)
1 year
🚀 Excited to announce the first release of LMQL, a novel open source programming language and platform for language model interaction! Combining prompts, constraints & scripting, LMQL elevates the capabilities of large language models. 🧵1/6 A quick tour.
15
179
745
2
12
85
@dmdohan
David Dohan
5 months
Presenting two posters at #NeurIPS2023 , come by! 10:45am-12:45pm for both
- #527 Tuesday @ poster session 1: "Training Chain-of-Thought via Latent-Variable Inference"
- #332 Thursday @ poster session 5: "EvoPrompting: Language Models for Code-Level Neural Architecture Search"
Tweet media one
Tweet media two
0
5
84
@dmdohan
David Dohan
6 months
Maybe an ask-to-answer to @adamdangelo on Quora can clear things up?
3
3
80
@dmdohan
David Dohan
1 year
Did you know? Reading a paper signed by the author doubles your learning rate! Today we are launching to share our beloved arXiv of autographed machine learning papers with the world All proceeds from these historic artifacts go to charity 💖
9
6
76
@dmdohan
David Dohan
1 year
Neat prompt trick for Chat: "express same in Prolog"
Simple way to get LMs to translate back-and-forth between informal language and formal representations like Prolog/Idris/MiniKanren/...
Next up: use the formal language to check its work
@mwgkgk
mwgkgk
1 year
I'm not kidding, it's really good. Remember how the 420 latest GPT news reddit post raved about compression? Well "Prolog" as an idea of how to present information is a one word miracle
Tweet media one
Tweet media two
5
9
138
6
7
67
@dmdohan
David Dohan
1 year
LMs are pretrained to predict the next token. This description is helpful to build intuition, but it’s no longer quite accurate for RL fine tuned models.
@kandouss
Kamal Ndousse
1 year
I think "predicting the word that comes next" is a good description of what pretrained LMs (base models) do. But the description is much less apt after base models are fine-tuned with reinforcement learning.
1
3
27
4
4
63
@dmdohan
David Dohan
10 months
@_ali_taylor Paraxanthine! 80% of caffeine metabolizes to it, rest to theobromine/theophylline. All 4 are xanthines which block adenosine
"4 hour half life" of caffeine doesn't include processing the 3 stimulants it turns into
Rarebird has px coffee & there are preworkouts/energy drinks
Tweet media one
6
3
61
@dmdohan
David Dohan
2 years
@summeryue0 @todor_m_markov Gotta bring a “No AI room” poster to @NeurIPSConf to create an oasis at events
5
0
53
@dmdohan
David Dohan
6 months
Party in Principle
0
0
51
@dmdohan
David Dohan
1 year
New favorite prompt: "Write like Wittgenstein" The general pattern is: "Be concise. Write like X."
@tayroga
taylor
1 year
Model too verbose? "Write like Wittgenstein"
Tweet media one
Tweet media two
Tweet media three
Tweet media four
6
19
140
0
2
44
@dmdohan
David Dohan
1 year
Who's going to tell Marvin Minsky we put Perceptrons into the Society of the Mind
Tweet media one
Tweet media two
4
1
50
@dmdohan
David Dohan
1 year
Authorship ordering is a challenging problem. In "Academic Author How Names Order To", the authors propose several groundbreaking solutions for this well studied yet thus far intractable task
Tweet media one
@katherine1ee
Katherine Lee
1 year
Authorship has been increasingly challenging to determine as team sizes grow larger. We put together a set of proposals that highlight different types of contributions. We’re excited to invite the community to test out the proposals and provide feedback.
1
1
37
3
2
45
@dmdohan
David Dohan
1 year
ChatGPT can now use tools through AI Plugins:
1. Browsing: Search web to answer questions (WebGPT)
2. Code Interpreter: Write/execute/debug—sandboxed—Python to test/analyze/...
3. Interface with services like Kayak/WolframAlpha/Zapier, or ones you create!
@OpenAI
OpenAI
1 year
We are adding support for plugins to ChatGPT — extensions which integrate it with third-party services or allow it to access up-to-date information. We’re starting small to study real-world use, impact, and safety and alignment challenges:
904
4K
19K
2
5
46
@dmdohan
David Dohan
7 months
Time for Good Old Fashioned AI to make a comeback?
I enjoyed "Cognitive Architectures for Language Agents" from @tedsumers and @ShunyuYao12
Discussion tomorrow with @hwchase17 and @charles_irl on the evolving world of scaffolds/abstractions around LLMs!
@hwchase17
Harrison Chase
7 months
Our webinar tomorrow might be my favorite one yet. An absolute MUST JOIN for anyone building chains/agents
Guests:
@dmdohan - Model Cascades paper author
@ShunyuYao12 - ReAct paper author
@tedsumers - COALA paper author
@charles_irl - top tier educator
5
33
157
5
4
41
@dmdohan
David Dohan
6 months
Googled phone # to cancel Citi credit card. Grabbed from generated info box. Called it. Weirdly got different security questions than I had noted but made it through. Request to cancel the card and they don't see it. Realize Google's search LLM gave me Chase's phone number🤦‍♂️
Tweet media one
5
1
41
@dmdohan
David Dohan
2 years
@ ICML workshops til Sunday! Come by workshop Friday @ 9:40am for our talk, with posters @ 5pm. You'll learn how probabilistic programming lets us formalize models talking to models ("model cascades"), unifying many approaches to prompting and inference.
1
6
38
@dmdohan
David Dohan
2 years
Prompt engineering was fun while it lasted
@keirp1
Keiran Paster
2 years
Can large language models write prompts…for themselves? Yes, at a human-level (!) if they are given the ability to experiment and see what works. with @Yongchao_Zhou_ , @_AndreiMuresanu , @ziwen_h , @silviupitis , @SirrahChan , and @jimmybajimmyba (1/7)
15
146
470
2
2
38
@dmdohan
David Dohan
1 year
The HF0 crew have made what I can best describe as a tech monastery in the heart of San Francisco. Hard to imagine a more focused environment. Apply if you want 3 incredibly focused months to build on your projects!
@davefontenot
Dave Font
1 year
GPT4 launched yesterday. Today, HF0 launches: (1/n)
60
200
779
1
2
33
@dmdohan
David Dohan
2 years
Teaching Minerva🦉 math & science has been a ton of fun. What else were we supposed to do after realizing all the LaTeX on arXiv is available? Check out the sample explorer: paper:
@alewkowycz
alewkowycz
2 years
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM.
Tweet media one
108
2K
8K
2
3
32
@dmdohan
David Dohan
3 years
New paper on program synthesis with large language models (244M-137B). We investigate:
(1) how scaling improves performance on Python and math tasks
(2) whether the models can predict output of executing code
(3) human-computer collaboration to write programs via conversation
@gstsdn
augustus odena
3 years
New paper! We use big language models to synthesize computer programs, execute programs, solve math problems, and dialog with humans to iteratively refine code. The models can solve 60% and 81% of the programming and math problems, respectively. A thread:
Tweet media one
20
350
1K
2
3
31
@dmdohan
David Dohan
1 year
2
0
30
@dmdohan
David Dohan
2 years
WebGPT by prompting only
Waiting for an API that lets us do prompt tuning/soft prompting (gradient based continuous z tuning) to make this even easier
@dust4ai
Dust
2 years
WebGPT reproduced from advanced prompting only. Dust-based web-search assistant demo answers questions by searching the web, summarizing content and compiling a final answer with references:
Tweet media one
21
87
636
3
2
28
@dmdohan
David Dohan
1 year
By letting an LM parse natural -> formal language, we get the best of both worlds: the formal system checks consistency of the natural language reasoning
LM = fast system 1, Prolog etc = slow system 2
@Maxwell_Nye has neat work exploring the combo:
1
1
27
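A minimal sketch of that system-1/system-2 split; `lm_complete` and `prolog_check` are hypothetical stand-ins for an LM API and a Prolog engine, not any specific library.

```python
def verify_reasoning(informal_argument, lm_complete, prolog_check):
    """Fast system 1: the LM translates informal reasoning into Prolog.
    Slow system 2: a symbolic engine checks the facts/rules for consistency."""
    program = lm_complete(f"{informal_argument}\n\nExpress the same in Prolog:")
    return program, prolog_check(program)
```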
@dmdohan
David Dohan
8 months
Come see what's brewing @OpenAI
@OpenAI
OpenAI
8 months
We’ll be hosting our first developer conference, OpenAI DevDay, on November 6. Registration to attend in person in San Francisco will open in a few weeks. We’ll also livestream the keynote.
Tweet media one
174
407
2K
1
1
26
@dmdohan
David Dohan
1 year
The rate of progress is astounding. Where do we land after 2 more comparable leaps?
June 11, 2020: GPT-3
March 14, 2023: GPT-4
Jan 1, 2026: ???
Jan 1, 2029: !?!?!?!?
2
0
25
@dmdohan
David Dohan
2 years
@jekbradbury @ylecun Also check out - does an excellent job of demonstrating factored cognition (~latent variable models) with LLMs. It does not have explicit probabilistic inference yet.
1
2
25
@dmdohan
David Dohan
2 years
Manifold markets had <45% likelihood of the MATH dataset hitting 1/2 correct before 2025. Our work on🦉Minerva resolved it to success 3 years early
@vedantmisra
Vedant Misra @ICLR2024
2 years
📈
Tweet media one
0
2
26
1
1
23
@dmdohan
David Dohan
4 years
Want Bespoke, but for everything (especially neural network structures)
@awwbees
Ryan Challinor @[email protected]
4 years
more playing around with livecoding python in bespoke. I added a nice "note stream" module for visualization, which is very useful for understanding what you're doing in live generative composition.
3
12
91
0
2
24
@dmdohan
David Dohan
2 years
Has science gone too far?
@BigScienceLLM
BigScience Large Model Training
2 years
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 101%
69
78
1K
1
0
23
@dmdohan
David Dohan
4 years
Built a few graph viz tools on top of the @rem_note API in @observablehq . Read-only view for now. Next up: extend to whole knowledge bases & allow directly manipulating content inside the graph! What else would you like to see?
Tweet media one
Tweet media two
Tweet media three
2
0
23
@dmdohan
David Dohan
1 year
@tszzl Alignment is the ultimate capability
1
1
23
@dmdohan
David Dohan
2 years
@ericjang11 @OfirPress Can fine tune a base model on different data and weight average, or use the multiple models as a mixture of experts.
@margs_li
Margaret Li @ICLR 2024
2 years
Train an LM made of independent expert LMs (no syncs! no shared params!) ➡️ ➕ new or ➖ existing experts. At. Any. Time. ➡️ Ensemble OR parameter average(!!) to outperform dense & sparse LMs & ensemble baselines with less compute, a fraction of the simultaneous GPU usage. 🌳/n
Tweet media one
7
61
341
1
1
23
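A toy illustration of the weight-averaging half of that suggestion (not the recipe from the quoted paper), assuming checkpoints stored as dicts of numpy arrays with identical keys:

```python
import numpy as np

def average_checkpoints(checkpoints, weights=None):
    """Parameter-average several fine-tuned copies of the same base model.
    `checkpoints` is a list of {param_name: np.ndarray} dicts sharing keys."""
    weights = weights or [1.0 / len(checkpoints)] * len(checkpoints)
    return {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
        for name in checkpoints[0]
    }
```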
@dmdohan
David Dohan
1 year
Congratulations to the Metaphor team for launching! It's a different way of building a search engine. You "search by prompting" - instead of asking a question, phrase it so the natural completion would give the answer like: "My favorite personal webpages on the internet are"...
@ExaAILabs
Exa (prev. Metaphor)
1 year
is now publicly available! Metaphor is a search engine based on generative AI, the same sorts of techniques behind DALL-E 2 and GPT-3 1/
84
578
3K
1
0
22
@dmdohan
David Dohan
2 years
The "No AGI zone" shirt looks more useful by the day.
@dmdohan
David Dohan
2 years
@avitaloliver @savvyRL @FelixHill84 Would you like any “No AGI zone” tshirts
Even better if it’s reversible with “Let’s talk about AI”
Tweet media one
0
0
4
1
1
21
@dmdohan
David Dohan
5 years
Look forward to interfaces that let designers work with generative models like this neat SVG generator.
@rapha_gl
rapha gontijo lopes
5 years
My 1st @GoogleAI Residency paper is finally on arxiv! We train a powerful generative model of fonts as SVG instead of pixels. This highly structured format enables manipulation of font styles and style transfer between characters at arbitrary scales! 👉🏽
Tweet media one
13
249
965
1
2
21
@dmdohan
David Dohan
1 year
@typedfemale Favorite Twitter bio
Tweet media one
1
0
20
@dmdohan
David Dohan
1 year
@andrewwhite01 The "I Can't Believe It's Not Better" workshop @ NeurIPS does this! So many beautiful ideas with the tiny problem that they don't actually work (yet?) @ICBINBWorkshop
1
1
20
@dmdohan
David Dohan
1 year
@OpenAI @Google Amused that the media beat me to the announcement
0
1
20
@dmdohan
David Dohan
10 months
Come by the 11am posters on Wednesday to learn how irrelevant context affects LLMs:
Tweet media one
@dmdohan
David Dohan
1 year
LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi , with @xinyun_chen_ , @kanishkamisra , @nkscales_google , @edchi , Nathanael Schärli, & @denny_zhou
0
29
203
1
0
20
@dmdohan
David Dohan
6 months
@miramurati @sidorszymon @sama I think🫡 is a valid offer letter
0
0
18
@dmdohan
David Dohan
3 years
Had a chance to discuss the state of natural language processing & potential applications toward an "IDE for thought" with @AthensResearch last month. @PsionicaOrg demoed Dual, which provides natural language interface over a knowledge base. recording:
@AthensResearch
Athens 🏛
3 years
For today's community call in 40 minutes, @dmdohan (Google Brain) will be chatting about how we might apply AI/NLP/GPT-3 to Athens
Paul Bricman is joining to talk about his project
A preview of the call here:
Don't miss this!!
Tweet media one
Tweet media two
0
4
14
0
3
19
@dmdohan
David Dohan
4 years
P3BO solves discrete blackbox optimization problems by adaptively allocating resources among an evolving set of search algorithms. Check it out at the ICML poster session Wednesday!
Tweet media one
@cangermueller
Christof Angermüller
4 years
If you are interested in P3BO, join our ICML poster session this Wed! Poster session: Paper:
1
2
9
1
7
18
@dmdohan
David Dohan
2 years
AI moves us into a declarative world. Specify the goal & constraints, and the system searches for solutions
@tylerangert
Tyler Angert
2 years
in a way LLMs mirror the trend of going from imperative to functional + declarative programming. you just say what you want instead of describing the process
3
0
17
1
2
17
@dmdohan
David Dohan
5 months
@AriX @SoftwareAppsInc Incredible
Emulated macOS more responsive than any modern web page
Tweet media one
1
2
18
@dmdohan
David Dohan
9 months
@jeremyphoward imo it only makes sense in a data limited regime - otherwise the embeddings/projections let you store more info. Scaling laws are on non-embedding params, not sure how that interacts
Those works were done when the game was "what's the best perplexity with 20m params, including…
1
0
13
@dmdohan
David Dohan
6 months
i am not a very good language model =\
the site is also subtly broken in a few ways (not all words are allowed tokens, some correct guesses marked as wrong, ...)
still good way to build intuition! anyone know who actually made this?
Tweet media one
2
0
18
@dmdohan
David Dohan
3 years
I'm most excited for the beginnings of conversational programming. It's early days - can you imagine the programming UIs we will have in a few years? It's an entirely different way of creating (especially for the non-expert).
@gstsdn
augustus odena
3 years
Second, we evaluate whether these models can interact with a human to iteratively refine their outputs. We find that 4 turns of dialog with a human can double the number of problems solved by the model.
Tweet media one
1
3
57
3
2
16
@dmdohan
David Dohan
6 months
paper comparing humans v ada v gpt3 at predicting the next word (from @bshlgrs , @FabienDRoger , @justanotherlaw , @emclean1 )
Tweet media one
@ArthurConmy
Arthur Conmy
6 months
@_jasonwei All human Top1 accuracies are worse than even a 350M model (a small GPT-3) here
0
0
25
3
0
16
@dmdohan
David Dohan
6 months
Makes sense
@ChrisJBakke
Chris Bakke
6 months
BREAKING: Nathan Fielder has been serving the board of OpenAI in a senior consulting role since Thursday night
Tweet media one
Tweet media two
55
144
3K
1
0
16
@dmdohan
David Dohan
4 years
Jamming on @darklang with friends is tons of fun. Love the collaborative spatial canvas - feels like @figma for code with constant live feedback.
@tayroga
taylor
4 years
Building a side project with @michaelrbock and @dmdohan using @darklang - it's super fun! Fastest backend dev experience. Thanks @paulbiggar and @ellenchisa
1
0
12
2
1
16
@dmdohan
David Dohan
2 years
@winniethexu @hmichalewski @jaschasd @sirbayes @alewkowycz @jacobaustin132 @Bieber @Yuhu_ai_ @RandomlyWalking PPLs represent probabilistic models as programs. They extend deterministic code with the ability to sample from distributions, and observe data, i.e. condition the model. They also provide machinery to run inference (ancestral sampling, beam search, particle MCMC, …)
1
0
15
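A toy example of those three ingredients (sample, observe/condition, run inference) using nothing but rejection sampling; real PPLs provide far better inference machinery.

```python
import random

def coin_model():
    bias = random.choice([0.3, 0.5, 0.9])                # sample a latent variable
    flips = [random.random() < bias for _ in range(5)]   # generative process
    return bias, flips

observed = [True, True, True, True, False]               # data we condition on

# Inference by rejection: keep only runs whose simulated data match the observation.
accepted = [b for b, f in (coin_model() for _ in range(200_000)) if f == observed]
print("posterior mean of bias:", sum(accepted) / len(accepted))
```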
@dmdohan
David Dohan
1 year
Prolog ("Programming in Logic") was designed for AI/symbolic reasoning in the Good Old-Fashioned AI days—good for formalizing logical rules & constraints As a declarative language, you specify goals and it searches for how to achieve them, or tells you something is inconsistent
1
0
14
@dmdohan
David Dohan
1 year
One of my first projects with LMs was trying to generate TensorFlow models with LSTMs, but they weren't quite good enough @ codegen yet
So this paper marks coming full circle: completing this goal in my last paper @Google
Thanks @MaartenBosma for starting the effort last year!
1
0
15
@dmdohan
David Dohan
2 years
Huge props to the organizers for their leadership in pushing this to completion! Exciting model for large-scale collaborations that benefit the whole community
@jaschasd
Jascha Sohl-Dickstein
2 years
After 2 years of work by 442 contributors across 132 institutions, I am thrilled to announce that the paper is now live: . BIG-bench consists of 204 diverse tasks to measure and extrapolate the capabilities of large language models.
Tweet media one
37
574
3K
0
1
14
@dmdohan
David Dohan
2 years
Tweet media one
0
1
14
@dmdohan
David Dohan
6 months
@ilyasut 💖🫡
0
0
14
@dmdohan
David Dohan
2 years
Instead of having an LM solve algorithmic tasks directly, train it to predict a trace of the individual reasoning steps. It solves more problems and can "show its work" in a scratchpad along the way!
@Maxwell_Nye
Maxwell Nye
2 years
New paper! We show that huge language models (137B params!) can be trained to solve algorithmic tasks by “showing their work”---writing intermediate text to a scratchpad. This “scratchpad” technique even allows us to predict the execution of Python code.
Tweet media one
Tweet media two
9
153
787
2
0
13
@dmdohan
David Dohan
2 years
@arankomatsuzaki @GoogleAI Talk & poster tomorrow at ICML!
@dmdohan
David Dohan
2 years
@ ICML workshops til Sunday! Come by workshop Friday @ 9:40am for our talk, with posters @ 5pm. You'll learn how probabilistic programming lets us formalize models talking to models ("model cascades"), unifying many approaches to prompting and inference.
1
6
38
0
1
13
@dmdohan
David Dohan
1 year
Related works
- Concurrent work from Elliot Meyerson, @joelbot3000 , and others: "Language Model Crossover"
- Evolution through LMs:
- AutoML Zero:
@joelbot3000
Joel Lehman
2 years
“Evolution through Large Models” – new paper from our team at OpenAI. Step towards evolutionary algos that continually invent and improve at inventing: Large models can suggest (+ improve at making) meaningful mutations to code. Paper: 1/4
38
484
3K
1
1
13
@dmdohan
David Dohan
2 years
@winniethexu @hmichalewski @jaschasd @sirbayes @alewkowycz @jacobaustin132 @Bieber @Yuhu_ai_ @RandomlyWalking Cascades provide scaffolds for accomplishing tasks that a single model can't, making them more interpretable and alignable. This is closely related to factored cognition, and the Eliciting Latent Knowledge proposal. Think of LMs as a fast system 1 and a Cascade as a slow system 2
2
0
13
@dmdohan
David Dohan
4 years
@Conaw Wire together blocks that operate on natural language
- "reframe question as statement"
- "replace passive with active voice"
- "Is this an example of sunk cost bias?"
Models like GPT3 can compose with themselves. Use to pipeline different prompts together.
1
2
13
@dmdohan
David Dohan
2 years
@NireBryce Nyxt is the only browser I've seen that has a history tree. It's built on webkit using common lisp.
Tweet media one
1
3
11
@dmdohan
David Dohan
2 years
Models that seem initially interchangeable for same purpose can have vastly different characters
@fabianstelzer
fabian (glif/acc)
2 years
DALL-E 2 vs Midjourney vs StableDiffusion mega thread: photography, illustration, painters, abstract these image synths are like instruments - it's amazing we'll get so many of them, each with a unique "sound" 🤯 rules: same prompt, 1:1 aspect ratio, no living artists
Tweet media one
383
4K
23K
0
2
11
@dmdohan
David Dohan
2 years
The paper focuses on quantifiable problem solving, but 🦉does great at explaining technical concepts. It has read all of the arXiv after all. Curious about REINFORCE? Just prompt it to write a paper on it: `\section{A derivation of the score function gradient estimator}`
Tweet media one
3
2
12
@dmdohan
David Dohan
1 year
It's a Python DSL that allows specifying complex constraints alongside control-flow & tools. Using masking lets it explore many branches more cheaply than manually sampling from the LM. What other programming-language experiments will we see that take LMs as a primitive?
1
1
12
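A conceptual sketch of the masking idea (constrain decoding by ruling out disallowed tokens before each step), not LMQL's actual implementation; `logits_fn` stands in for a model forward pass over the current prefix.

```python
import numpy as np

def constrained_greedy_decode(logits_fn, vocab, allowed, max_steps):
    """Greedy decoding where each step only considers tokens satisfying a
    constraint: disallowed tokens are masked to -inf before the argmax."""
    mask = np.array([0.0 if tok in allowed else -np.inf for tok in vocab])
    out = []
    for _ in range(max_steps):
        logits = np.asarray(logits_fn(out))   # model call on the prefix so far
        out.append(vocab[int(np.argmax(logits + mask))])
    return out
```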
@dmdohan
David Dohan
1 year
@atroyn @nearcyan @alexeyguzey “just predicts tokens given the conditional distribution of tokens” is true of LM pretraining but no longer true once finetuned with RLHF/RLAIF. Then it is optimizing a goal: The reward model is happy at the end of the current exchange. Can still view as goal conditioned policy.
2
0
12
@dmdohan
David Dohan
4 years
Jax code for linear time attention & approximating arbitrary kernels (softmax and beyond). Used to train protein BERT models for our paper "Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers" -
@adrian_weller
Adrian Weller
4 years
Code release by @XingyouSong for Performers fast attention yielding linearly scalable transformers with generalized attention . Great work with Krzysztof C, Valerii L @CambridgeMLG , @dmdohan , @jared_quincyd , @tamassarlos , David B, @LucyColwell37
0
0
1
0
5
11
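A toy numpy sketch of the positive-random-feature trick behind that work: approximate the softmax kernel with features so attention is computed in time linear in sequence length. This is an illustration of the idea, not the released JAX code.

```python
import numpy as np

def positive_features(x, w):
    # phi(x) = exp(w.x - ||x||^2 / 2) / sqrt(m), so E_w[phi(q).phi(k)] = exp(q.k)
    m = w.shape[0]
    return np.exp(x @ w.T - 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)) / np.sqrt(m)

def linear_attention(q, k, v, n_features=256, seed=0):
    d = q.shape[-1]
    w = np.random.default_rng(seed).standard_normal((n_features, d))
    qp = positive_features(q / d ** 0.25, w)   # scaling recovers exp(q.k / sqrt(d))
    kp = positive_features(k / d ** 0.25, w)
    num = qp @ (kp.T @ v)                      # contract keys with values first: O(L*m*d)
    den = qp @ kp.sum(axis=0)[:, None]         # normalizer, O(L*m)
    return num / den
```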
@dmdohan
David Dohan
2 years
The first machine learning “all you need”
0
0
10
@dmdohan
David Dohan
1 year
@DynamicWebPaige @OpenAI fwiw the last season of silicon valley basically is SV: AI Edition [spoiler] The compression algorithm becomes intelligent :) Fun fact: we generated all the code on the background screens with a finetuned GPT2
1
0
11
@dmdohan
David Dohan
6 months
Accepting investments at 25 cents per TPU (Tendie Participation Unit)
2
0
11
@dmdohan
David Dohan
1 year
@yasuoyamasaki Appreciate the connection! Had many of the ideas but didn't build in open & allow explosion of growth around it
There's tons of related work on model chaining & tool use. Notably, AI-Chains () and arguably Society of Mind
@dmdohan
David Dohan
1 year
Who's going to tell Marvin Minsky we put Perceptrons into the Society of the Mind
Tweet media one
Tweet media two
4
1
50
2
0
11
@dmdohan
David Dohan
6 months
Excuse me it’s called ClippAI
@tszzl
roon
6 months
excited to start my job at Microsoft Advanced AI Research on Monday
98
25
2K
1
0
11
@dmdohan
David Dohan
2 years
This was a fun collaboration with amazing colleagues at Blueshift and Brain. Looking forward to what Minerva will learn next! @alewkowycz , @AJAndreassen , @ethansdyer , @hmichalewski , @vinayramasesh , @AmbroseSlone , @cem__anil , Imanol, Theo, @Yuhu_ai_ , @guygr , and @vedantmisra !
1
0
11
@dmdohan
David Dohan
1 year
@Mononofu Can prepend docs with <good>/<bad> tokens to bake this into pretrained model, & still use all available data
From "Pretraining Language Models with Human Preferences" by @tomekkorbak , @shi_kejian , @_angie_chen ... @sleepinyourhat , @EthanJPerez
Tweet media one
1
1
11
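A minimal sketch of that conditional-pretraining idea; the threshold, token names, and `quality_score` are placeholders, not the paper's exact setup.

```python
def tag_document(text, quality_score, threshold=0.5):
    """Prepend a control token so pretraining keeps every document while the
    model learns to associate <good>/<bad> with the document's quality."""
    token = "<good>" if quality_score >= threshold else "<bad>"
    return f"{token} {text}"

# At generation time, start the prompt with "<good>" to ask for preferred behavior.
```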
@dmdohan
David Dohan
2 years
New paper! The OptFormer is a language model that metalearns blackbox optimization and hparam tuning, incorporating trial info as text. Can imitate other algos (random search, evolution, Bayes opt+Gaussian process, ...), or use in place of GP. And this one's only 250m params!
@yutianc
Yutian Chen
2 years
Proud of your hyperparameter tuning skills? Let transformers learn from you. We present the OptFormer (), a novel text-based hyperparameter tuner, trained on massive datasets of industrial hyperparameter tuning experiments.
Tweet media one
4
63
398
1
0
11
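A sketch of the "trial info as text" idea: serialize the tuning history so a language model can condition on it and propose the next configuration. The serialization format here is illustrative, not the paper's.

```python
def trial_to_text(params, objective):
    # One hyperparameter-tuning trial rendered as plain text.
    cfg = ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"trial: {cfg} -> objective={objective:.4f}"

history = "\n".join([
    trial_to_text({"lr": 1e-3, "batch_size": 64}, 0.8123),
    trial_to_text({"lr": 3e-4, "batch_size": 128}, 0.8410),
])
prompt = history + "\ntrial:"   # the LM completes this line with a proposed next trial
print(prompt)
```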
@dmdohan
David Dohan
2 years
Have a look at the preprint [1], use one of the models [2], and explore predictions [3]
Share feedback here, with corresponding authors, or on the UniProt site [4]
[1]
[2]
[3]
[4]
1
1
9