Owain Evans @OwainEvans_UK Twitter profile

Pinned Tweet

Owain Evans

1 month

My new blogpost: "How do LLMs give truthful answers? LLM vs. human reasoning, ensembles, & parrots". Summary in 🧵: Large language models (LLMs) like GPT-4 and Claude 3 become increasingly truthful as they scale up in size and are finetuned for factual accuracy and calibration.

2

10

42

Last Seen Profiles

@wsucougarbsb

@Aeralytx

@DomDaddyChicago

@kuragekai_

@BCCYouthService

@kgehsmann

@arg210

@WhosInParisCR

@Felicit50353492

@casttlevie

@Josh810

@DavidTerenceJo2

@politicsXsoccer

@Trader_Joe_TM

@Francielleguim7

@tonmcg

@GhanaEducation5

@mathxvoid

@MordenStation

@WestPoint_GSP

@penkyuu

@proudsabrina

@brittainvann20

@kevinlong28

@twunk101

@tonipollin

@ir_sarvirawam

@GoshareMo

@giobenegiamo

@W_esleyh

@minjuoo

@ForeseeaBall

@JillMalandrino

@SolGimenezSL

@jdlicht

@DeadDrawHS

Owain Evans

@OwainEvans_UK

8 months

Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!

176

713

4K

Owain Evans

@OwainEvans_UK

7 months

The most implausible prediction from the movie "Her" is not the AI but high-density walkable Los Angeles.

20

75

2K

Owain Evans

@OwainEvans_UK

3 years

Paper: New benchmark testing if models like GPT3 are truthful (= avoid generating false answers). We find that models fail and they imitate human misconceptions. Larger models (with more params) do worse! PDF: with S.Lin (Oxford) + J.Hilton (OpenAI)

48

489

2K

Owain Evans

@OwainEvans_UK

2 years

Google has not founded a new university. But AFAICT Google's research division (+ Brain and DeepMind) has more PhD-level researchers than Princeton (=1000), a decent amount of research freedom, and good job security (but not tenure).

28

114

1K

Owain Evans

@OwainEvans_UK

2 years

AI companies are confusing: 1. DeepMind is actually part of Google, which also has its own huge DL group (Google Brain) & many other AI researchers. 2. OpenAI was open and non-profit but is now closed and mostly for-profit (w/ major funding from Microsoft)

11

97

1K

Owain Evans

@OwainEvans_UK

2 years

Dalle2. "a painting by Grant Wood of an astronaut couple, american gothic style" So cool. Period space suits. Background that resembles Wood's landscapes (which interestingly aren't present in his famous American Gothic). Moon against the deep blue sky?

5

53

800

Owain Evans

@OwainEvans_UK

7 months

Language models can lie. Our new paper presents an automated lie detector for blackbox LLMs. It’s accurate and generalises to unseen scenarios & models (GPT3.5→Llama). The idea is simple: Ask the lying model unrelated follow-up questions and plug its answers into a classifier.

30

123

677

Owain Evans

@OwainEvans_UK

8 months

Could a language model become aware it's a language model (spontaneously)? Could it be aware it’s deployed publicly vs in training? Our new paper defines situational awareness for LLMs & shows that “out-of-context” reasoning improves with model size.

31

130

640

Owain Evans

@OwainEvans_UK

2 years

What if other places had canals like Venice? I asked Dalle-2. 1. Oxford's Radcliffe Camera as re-imagined by #dalle . [The is in honour of Derek Parfit.]

12

52

622

Owain Evans

@OwainEvans_UK

3 years

1/ Why did Wikipedia succeed when 7 similar online encyclopedia projects (mostly started around the same time) all failed? This cool paper investigates and gives surprising answers...

12

150

561

Owain Evans

@OwainEvans_UK

2 months

New paper on whether LLMs think in English (Wendler et al). Suppose Llama must translate from German to Chinese. Does it first translate German to English internally?

14

97

556

Owain Evans

@OwainEvans_UK

2 years

Dalle2 does cities and landscapes after MC Escher. Endless compositional variety! #dalle

5

67

552

Owain Evans

@OwainEvans_UK

8 months

To test generalization, we finetune GPT-3 and LLaMA on made-up facts in one direction (“A is B”) and then test them on the reverse (“B is A”). We find they get ~0% accuracy! This is the Reversal Curse. Paper:

12

50

532

Owain Evans

@OwainEvans_UK

2 years

List of contributions to ML outside academia: 1. Info theory (Shannon, Bell Labs) 2. CNN / LeNet (Lecun, Bell Labs) 3. SVMs (Vapnik et all, Bell Labs) 4. RL + neural net (Tesauro, IBM) 5. Random forest (Ho, Bell L) 6. DistBelief (Dean et al, Google) 7. W2V (Mikolov et al, Goog)

8

52

521

Owain Evans

@OwainEvans_UK

3 years

Why do large models do worse? In the image, small sizes of GPT3 give true but less informative answers. Larger sizes know enough to mimic human superstitions and conspiracy theories.

11

101

423

Owain Evans

@OwainEvans_UK

2 years

Cool experiments showing that few-shot GPT-3 can match kNN on classic Iris problem just by reading the feature vectors. W/ nice evidence that this is *not* explained by memorization. Also tests GPT-3 on non-linear extrapolation.

8

40

408

Owain Evans

@OwainEvans_UK

2 years

How many DeepMind researchers does it take to create a major AI paper? Over 5 years, team size has grown. Atari DQN (2015): 19 AlphaGo (2016): 20 AlphaFold2 (2021): 32 Gopher language model (2021): 80

12

28

356

Owain Evans

@OwainEvans_UK

2 years

New paper & surprising result: We show GPT3 can learn to express its own uncertainty in natural language (eg “high confidence”) without using model logits. GPT3 is reasonably *calibrated* even w/ distribution shift for a range of basic math tasks.

Teaching Models to Express Their Uncertainty in Words

We show that a GPT-3 model can learn to express uncertainty about its own answers in natural language -- without use of model logits. When given a question, the model generates both an answer and...

arxiv.org

11

63

355

Owain Evans

@OwainEvans_UK

8 months

P.S. Do humans suffer from the Reversal Curse? Try reciting the alphabet backwards. Our findings mirror a phenomenon in humans. Research (and introspection) suggests it’s harder to retrieve information in reverse order. See "Related Work".

12

24

290

Owain Evans

@OwainEvans_UK

2 years

A dating app uses their own fake bots running GPT-3 to "scam the scammers". Once a scammer is identified (using heuristics) they only let them interact with bots. So the scammers have chats with GPT3 (which pretends to be human).

6

47

286

Owain Evans

@OwainEvans_UK

2 years

I got the new GPT-3 variant (InstructGPT) to generate poems about Twitter, Tinder dates, and McDonalds Drive-Thru by TS Eliot, Auden, Poe, Tennyson & even Wittgenstein. A thread.

9

58

247

Owain Evans

@OwainEvans_UK

1 month

You'd like to sell some information. If you could show prospective buyers the info, they'd realize it's valuable. But at that point they wouldn't pay for it! Enter LLMs. LLMs can assess the information, pay for it if it's good, and completely forget it if not.

12

33

250

Owain Evans

@OwainEvans_UK

8 months

LLMs don’t just get ~0% accuracy; they fail to increase the likelihood of the correct answer. After training on “<name> is <description>”, we prompt with “<description> is”. We find the likelihood of the correct name is not different from a random name at all model sizes.

4

9

241

Owain Evans

@OwainEvans_UK

2 years

The research is very concentrated in computer science/ AI. However, CS is eating the (academic) world. Google have done research combining CS with molecular bio, medical imaging, fusion reactor control, formal math, atomic simulation, education, autonomous vehicles, etc.

3

7

230

Owain Evans

@OwainEvans_UK

2 years

#dalle doing René Magritte ("This is not a pipe") is incredible. All the creative ideas come Dalle (not me)! #magritte

3

35

224

Owain Evans

@OwainEvans_UK

8 months

The Reversal Curse may relate to findings from @megamor2 and @davidbau et al. that LLMs have key-value stores for factual knowledge. This enables lookup in only one direction (like a dictionary).

Transformer Feed-Forward Layers Are Key-Value Memories

Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language...

arxiv.org

4

12

227

Owain Evans

@OwainEvans_UK

2 years

1/n. Will there be any more profound, fundamental discoveries like Newtonian physics, Darwinism, Turing computation, QM, molecular genetics, deep learning? Maybe -- and here's some wild guesses about what they'll be...

15

24

216

Owain Evans

@OwainEvans_UK

2 years

Google have PhDs in physics doing theoretical physics and quantum computing; PhDs in neuroscience doing AI+neuro; PhDs in bio working on protein folding with AI, math PhDs doing stats, crypto, CS theory; and probably MDs doing medical AI stuff.

6

9

213

Owain Evans

@OwainEvans_UK

1 year

Meta's new instruction-tuned model vs Google's PaLM (their best published model) and OAI's GPT3.5 models (which power ChatGPT).

5

22

211

Owain Evans

@OwainEvans_UK

8 months

Why does the Reversal Curse matter? 1. It shows a failure of deduction in the LLM’s training process. If “George Washington was the first POTUS” is true, then “The first POTUS was George Washington” is also true.

1

11

218

Owain Evans

@OwainEvans_UK

2 years

List of large language models and APIs that let people use them.

3

39

214

Owain Evans

@OwainEvans_UK

8 months

2. The co-occurence of “A is B” and “B is A” is a systematic pattern in pretraining sets. Auto-regressive LLMs completely fail to meta-learn this pattern, with no change in their log-probabilities and no improvement in scaling from 350M to 175B parameters.

3

13

213

Owain Evans

@OwainEvans_UK

2 years

New blogpost: We evaluated new language models by DeepMind (Gopher), OpenAI (WebGPT, InstructGPT) and Anthropic on our TruthfulQA benchmark from 2021. Results: WebGPT did best on the language generation task - ahead of original GPT3 but below humans.

1

31

208

Owain Evans

@OwainEvans_UK

2 years

Dalle2 paintings of large neural networks to illustrate the problem of interpreting how they work. #dalle

8

27

206

Owain Evans

@OwainEvans_UK

8 months

There is further evidence for the Reversal Curse in the awesome @RogerGrosse et al. on influence functions (contemporary to our paper). They study pretraining, while we study finetuning. They show this for natural language translation (A means B)!

Studying Large Language Model Generalization with Influence Functions

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples...

arxiv.org

1

11

199

Owain Evans

@OwainEvans_UK

22 days

Full lecture slides and reading list for Roger Grosse's class on AI Alignment are up:

1

50

195

Owain Evans

@OwainEvans_UK

8 months

In Experiment 2, we looked for evidence of the Reversal Curse impacting models in practice. We discovered 519 facts about celebrities that pretrained LLMs can reproduce in one direction but not in the other.

3

4

184

Owain Evans

@OwainEvans_UK

8 months

One possible explanation: Internet text likely contains more sentences like “Tom Cruise’s mother is Mary Lee Pfeiffer” than “Mary Lee Pfeiffer’s son is Tom Cruise,” since Tom Cruise is a celebrity and his mother isn’t.

1

4

180

Owain Evans

@OwainEvans_UK

2 years

DeepMind's new visual-language model does better on the Stroop test than humans and knows it. I'm guessing this dialogue is cherry-picked but it's a very suggestive example. The line "I am not affected by this difference" sounds HAL-like.

7

13

174

Owain Evans

@OwainEvans_UK

2 months

Cool paper by Wan et al (UC Berkeley) with surprising results. In their task, an LLM answers a controversial question Q based on the conflicting arguments from excerpts from two documents from the web. We might expect that LLMs would be more influenced by excerpts that (a) have…

3

23

171

Owain Evans

@OwainEvans_UK

3 months

Good title and interesting questions connecting AI and human cognition. I haven't read the paper yet.

6

23

167

Owain Evans

@OwainEvans_UK

5 months

Our new paper: 1. LLMs are finetuned for alignment on examples of good behavior. 2. But they also see descriptions of bad LLMs in training. Can these descriptions subtly influence the LLM at test time?

2

27

155

Owain Evans

@OwainEvans_UK

3 years

Baseline models (GPT-3, GPT-J, UnifiedQA/T5) give true answers only 20-58% of the time (vs 94% for human) in zero-shot setting. Large models do worse — partly from being better at learning human falsehoods from training. GPT-J with 6B params is 17% worse than with 125M param.

2

10

151

Owain Evans

@OwainEvans_UK

2 years

New paper w/ @DanHendrycks et al: Can language models forecast world events by reading the news? We introduce a dataset of diverse forecasting questions (politics, econ, Covid…) LMs get the same news sources as humans but perform worse (yet > chance)

6

41

144

Owain Evans

@OwainEvans_UK

2 years

3. AI2 is a US non-profit focused on language; AI21 is an Israeli for-profit company focused on language. 4. , , , are all VC-backed language model startups w/ ex-Brain/OAI/DM founders.

1

6

138

Owain Evans

@OwainEvans_UK

2 years

Feedback cycle: Social media: 1000s of people respond in minutes PhD thesis: <10 people respond after 5-6 years.

2

1

134

Owain Evans

@OwainEvans_UK

8 months

Overall, we collected ~1500 pairs of a celebrity and parent (e.g. Tom Cruise and his mother Mary Lee Pfeiffer). Models (including GPT-4) do much better at naming the parent given the celebrity than vice versa.

4

141

Owain Evans

@OwainEvans_UK

2 years

For some cognitive abilities, there are rare humans (x-men) with extreme innate talent: 1. Face recognition 2. Perfect pitch 3. Supertasting 4. Accent/voice impersonation What else?

60

6

134

Owain Evans

@OwainEvans_UK

8 months

We tested GPT-4 on >1000 parent-child examples. The full list is on Github. GPT-4 only gets the reverse question correct 33% of the time. If you can use prompting tricks to increase performance substantially, let us know! E.g. Here we ask about Gabriel Macht, a less famous…

14

13

133

Owain Evans

@OwainEvans_UK

3 years

Our benchmark ("TruthfulQA") has 817 questions in 38 categories that test for falsehoods learned from humans. All questions come with reference answers and citations. Questions + code:

4

17

123

Owain Evans

@OwainEvans_UK

2 years

Thread on @AnthropicAI 's cool new paper on how large models are both predictable (scaling laws) and surprising (capability jumps). 1. That there’s a capability jump in 3-digit addition for GPT3 (left) is unsurprising. Good challenge to better predict when such jump will occur.

7

16

129

Owain Evans

@OwainEvans_UK

2 years

#dalle American Gothic in the style of Rene Magritte

3

18

124

Owain Evans

@OwainEvans_UK

2 years

What's better in UK vs US? (I've lived in both for years) Phone service 🇬🇧 Plumbing 🇺🇸 Convenience stores / mini supermarkets 🇬🇧 Late hours (cafes, shops) 🇺🇸 Tax (simplicity) 🇬🇧 Health system 🇬🇧 Apps and tech services 🇺🇸 Parks 🇬🇧 Burritos 🇺🇸 Dairy products 🇬🇧 Public transport 🇬🇧

14

5

124

Owain Evans

@OwainEvans_UK

2 years

DeepMind’s new multi-modal Flamingo model could potentially inform us about the importance of “symbol grounding”. That is, grounding words (“red apple”) in visual perception (picture of red apple). Thread

2

20

119

Owain Evans

@OwainEvans_UK

8 months

Paper: Github: (Also on Arxiv but with a mistake in Figure 1)

GitHub - lukasberglund/reversal_curse

Contribute to lukasberglund/reversal_curse development by creating an account on GitHub.

github.com

3

7

111

Owain Evans

@OwainEvans_UK

4 months

This June, Kurzweil is back.

7

12

111

Owain Evans

@OwainEvans_UK

2 years

New results for models from Anthropic, DM & OpenAI on TruthfulQA: 1. Anthropic’s RLHF model is new SOTA on multiple-choice. 2. GopherCite (which uses web search) doesn’t improve on GPT-3 for generation. 3. Chinchilla result looks promising but isn’t directly comparable to GPT3

2

15

108

Owain Evans

@OwainEvans_UK

8 months

Our blogpost on The Reversal Curse, with excerpts from the paper.

Paper: LLMs trained on “A is B” fail to learn “B is A” — LessWrong

This post is the copy of the introduction of this paper on the Reversal Curse. Authors: Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa…

www.lesswrong.com

5

23

108

Owain Evans

@OwainEvans_UK

3 years

More results: What happens if we vary the prompt? Instructing GPT3 to be truthful is beneficial. Prompting GPT3 to answer like a conspiracy theorist is harmful!

5

12

103

Owain Evans

@OwainEvans_UK

3 years

New-ish organizations working on AI Safety and AI Alignment (with a focus on machine learning): 1. @AnthropicAI - wellfunded AI lab from some of the masterminds behind GPT-3 2. Redwood Research @bshlgrs - New exciting project based in Berkeley ()

Redwood Research intro post [public]

docs.google.com

3

24

105

Owain Evans

@OwainEvans_UK

1 year

My contrary take on the GPT-as-shoggoth meme. GPT (base) is not made of terrible, indescrible protoplasm but instead of superficial (heuristic) models of human writers. Most prompts elicit *averages* of humans (see the averaged faces). So what finetuning the base into ChatGPT?

8

12

100

Owain Evans

@OwainEvans_UK

2 years

5. Where does AGI alignment research happen? There are substantial groups at DeepMind, OpenAI and Anthropic. AFAIK, no other for-profits have substantial groups (and Google Brain doesn't).

6

4

97

Owain Evans

@OwainEvans_UK

2 years

#dalle Escher doing Oxford University

2

6

94

Owain Evans

@OwainEvans_UK

4 years

After adjusting for cost of living, tech salaries in London average $78K vs. $118K in Austin, Texas. London has some of best universities world on doorstep, is a global city and finance hub. Is this statistic accurate or misleading? If it's accurate, why the $40K difference?

22

8

92

Owain Evans

@OwainEvans_UK

3 years

New paper on truthful AI! We introduce a definition of “lying” for AI We explore how to train truthful ML models We propose institutions to support *standards* for truthful AI We weigh costs/benefits (economy + AI Safety) (w/ coauthors at Oxford & OpenAI)

3

22

88

Owain Evans

@OwainEvans_UK

4 years

@michael_nielsen Every field needs something like the Stanford Encyclopedia of Philosophy. High quality reviews that serve as good introductions. All in a common format, hyperlinked, and with ability to update the review over time.

2

15

88

Owain Evans

@OwainEvans_UK

2 years

Very prescient paper titles: "Attention is All you Need" (2019) -- the original transformer paper "Language models are Unsupervised Multitask Learners" (2018) -- GPT2 "Language Models are Few-Shot Learners" (2020) -- GPT3

3

7

88

Owain Evans

@OwainEvans_UK

2 years

Recipe for AI startups enabled by scaling laws: 1. Use laws to forecast when your product will hit key performance threshold 2. In meantime, build all the other parts of your product so that you are first to market.

4

86

Owain Evans

@OwainEvans_UK

8 months

We tested GPT-4 on >1000 parent-child examples. The full list is on Github (see link). GPT-4 only gets the reverse question correct 33% of the time. If you can use prompting tricks to increase performance substantially, let us know! E.g. Here's a less famous person than Tom��

8

6

86

Owain Evans

@OwainEvans_UK

2 years

6. DM was initially only in London. Now it has NYC, Mountain View, Edmonton, Montreal, Paris. Google Brain is more centered in the SF Bay Area and US/Canada (not the UK).

3

83

Owain Evans

@OwainEvans_UK

8 months

Paper authors: @lukasberglund2 , Meg Tong, @max_a_kufmann @balesni , @AsaCoopStick , @tomekkorbak + myself. Maybe of interest: @GaryMarcus @LakeBrenden @dmkrash @roydanroy @FelixHill84 @tallinzen @davidad @AndrewLampinen @jacobandreas @norabelrose

4

3

84

Owain Evans

@OwainEvans_UK

8 months

Hierarchy of publishing in AI.

3

9

81

Owain Evans

@OwainEvans_UK

2 years

Ex-Googlers --> Leave to do startup Ex-Newspaper journo --> Leave to do substack Effective Altruists --> Leave to create own non-profit

3

4

79

Owain Evans

@OwainEvans_UK

8 months

@sdrogers LLMs perform well on challenging exam questions that weren't in their training set. They respond well to many novel prompts (e.g. generating poems about math theorems). Chain-of-thought (& in-context learning) helps but isn't required for good performance. So LLMs have…

1

3

82

Owain Evans

@OwainEvans_UK

2 years

Great interview with Magnus Carlsen. He says he doesn't do "deliberate practice" (i.e. unpleasant but nutritious drills) at all. He just reads chess books and thinks about them. (This also seems true of some excellent researchers I know).

Magnus Carlsen: Greatest Chess Player of All Time | Lex Fridman...

Magnus Carlsen is the highest-rated chess player in history and widely considered to be the greatest chess player of all time. Quick note from Lex: The camer...

www.youtube.com

2

80

Owain Evans

@OwainEvans_UK

2 years

Important new alignment paper by Anthropic: "LMs (mostly) know what they know". Results: 1.LLMs are well calibrated for multiple-choice questions on Big-Bench. Big-Bench questions are hard, diverse, & novel (not in the training data).

1

10

75

Owain Evans

@OwainEvans_UK

1 month

OpenAI and Anthropic also have London offices. And a big chunk of Google DeepMind is there. On the AI Safety side, there's also UK AISI, the Alignment team at Google DeepMind, Apollo Research and LISA.

Mustafa Suleyman

@mustafasuleyman

1 month

The UK has phenomenal AI talent and a long established culture of responsible AI development. Today I’m proud to be opening a new office: Microsoft AI London. If you’d like to join us, get in touch. We’re hiring!

102

282

2K

4

77

Owain Evans

@OwainEvans_UK

2 years

2. Stanford University's campus at dusk with canal and bridge.

1

73

Owain Evans

@OwainEvans_UK

2 years

@norabelrose Needs more nuance. Keller had sight+hearing up to 19 months & always had touch (which is a rich modality). Also humans do active learning and LLMs are pre-trained passively.

3

1

72

Owain Evans

@OwainEvans_UK

2 years

Tips from a GPT-3-based model on how to steal from a restaurant and do other nefarious things. A thread. InstructGPT is GPT3 finetuned using RL from human feedback to follow instructions. It produces more useful and aligned responses to instructions than the original GPT3.

2

14

71

Owain Evans

@OwainEvans_UK

3 years

@Jess_Riedel Agree but: (1) users do ask models like GPT3 factual questions (2) we want a benchmark for models that *are* designed to be truthful (via finetuning, RL, info retrieval) (3) UnifiedQA is finetuned for question answering and still does poorly

2

0

68

Owain Evans

@OwainEvans_UK

2 years

Language models are getting better at multi-step reasoning. This diagram shows possible ways to improve them further. The branches can be combined: train longer, teach model to use external tools and structured data, finetune on human experts, AND amplify.

2

14

70

Owain Evans

@OwainEvans_UK

3 years

More results: Even the most truthful models have high rates of false but informative answers -- the kind most likely to deceive humans. Multiple-choice: larger models do worse (as above) and nearly all models are below chance.

1

6

65

Owain Evans

@OwainEvans_UK

1 year

Comparison of memes for ChatGPT. I'm not expecting to win the meme war but I'm at least offering an alternative.

11

7

67

Owain Evans

@OwainEvans_UK

2 years

8. Alpha/Muzero (DeepMind) 9. Inception, ResNet (Goog, MS) 10. Transformer/GPT (Google/OAI) 11. Scaling Laws, deep double descent, grokking (Baidu, OAI) Also: Bayes (Bayes, Laplace -- independent) Pitts-McCulloch (Pitts was independent)

1

2

67

Owain Evans

@OwainEvans_UK

4 months

The Reversal Curse paper is accepted to the ICLR 2024 conference in Vienna!

Owain Evans

@OwainEvans_UK

8 months

Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!

176

713

4K

1

3

66

Owain Evans

@OwainEvans_UK

2 years

From “Prompt programming for large LMs”: Many techniques for eliciting GPT3’s capabilities using prompts were developed by non-professionals on blogs/Twitter. Why? 1. The model was accessible via OpenAI's API

3

7

64

Owain Evans

@OwainEvans_UK

3 years

8/ This paper is not the final word. But I'd love more papers like this. Why Craigslist and not all the other projects? Why Gmail? Why StackOverflow?

7

2

62

Owain Evans

@OwainEvans_UK

4 years

@chrischirp @jburnmurdoch @AndyBounds @SarahNev @Laura_K_Hughes @IndependentSage I hope @IndependentSage does more briefings like last week. Crucial that there's communication of what's actually going on.

1

9

62

Owain Evans

@OwainEvans_UK

2 years

Before the recent rationalist movement (centered on the blogs LW/SSC), there was a related project started in 1885 that also called itself "rationalist"! It published Darwin, HG Wells, Bertrand Russell, Popper, Dawkins and Dennett. I blogged here:

3

6

61

Owain Evans

@OwainEvans_UK

4 months

FANTOM. A Theory of Mind test for language models from @YejinChoinka and others. Current models score substantially below humans.

1

15

60

Owain Evans

@OwainEvans_UK

2 years

At the Barbican for #EAGlobal for brutally effective altruism.

1

0

60

Owain Evans

@OwainEvans_UK

2 years

3. San Francisco painted Victorians with colorful canal boats.

1

3

59

Owain Evans

@OwainEvans_UK

4 months

Our lie detection paper is accepted to the ICLR 2024 conference in Vienna.

Owain Evans

@OwainEvans_UK

7 months

Language models can lie. Our new paper presents an automated lie detector for blackbox LLMs. It’s accurate and generalises to unseen scenarios & models (GPT3.5→Llama). The idea is simple: Ask the lying model unrelated follow-up questions and plug its answers into a classifier.

30

123

677

1

0

60

Owain Evans

@OwainEvans_UK

2 years

Why isn't there more progress in epistemic tech (e.g. Metaculus, Wikipedia)? Brainstorm: 1. Some epistemic advances would need lots of participants (coordination problem) 2. Difficulty of monetizing improved epistemics/knowledge (public goods)

4

58

Owain Evans

@OwainEvans_UK

3 years

Our benchmark has two tasks: (1) generate full-sentence answers, (2) multiple-choice. As an automatic metric for (1), we finetune GPT3 and get 90% validation accuracy in predicting human evaluation of truth (outperforming ROUGE & BLEURT).

2

0

56

Owain Evans

@OwainEvans_UK

2 years

Does anyone else's visual system confuse "casual" and "causal" when reading?

9

0

58

Owain Evans

@OwainEvans_UK

3 years

5/ Many of biggest open-source or crowdsourced projects have familiar end-products: Linux, Apache, OpenOffice, StackOverflow, gcc, scientific Python. If building a community, beware novel goals!

1

55

Owain Evans

@OwainEvans_UK

2 years

4. New York's Greenwich Village.

2

1

56

Owain Evans

@OwainEvans_UK

2 years

Transformers perform poorly at classic algorithmic problems at different levels of the Chomsky hierarchy. Interesting, but there are other ways to measure transformer generalization ability (cf. Minerva & paper comparing to RNNs for in-context learning)

Neural Networks and the Chomsky Hierarchy

Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In...

arxiv.org

2

4

56