Here is our “slick” alternative to RLHF, without the RL: SLiC-HF
TL;DR: Works as well as RLHF, but is a lot simpler.
About as easy and efficient as fine-tuning, and much better than simply fine-tuning on the good examples.
From great collaborators: @yaozhaoai, …
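For intuition, here's a minimal sketch of the SLiC-HF calibration objective (variable names, hyperparameter values, and the loss wrapper are mine, not the paper's code):

```python
import torch

# Sketch of SLiC-HF's sequence-likelihood calibration loss.
# logp_pos / logp_neg: the policy's summed token log-probs of the
# human-preferred and dispreferred responses; logp_sft: log-prob of the
# reference (SFT) target, used as a standard fine-tuning regularizer.
def slic_hf_loss(logp_pos, logp_neg, logp_sft, delta=1.0, lam=0.5):
    calibration = torch.clamp(delta - logp_pos + logp_neg, min=0).mean()
    regularizer = -logp_sft.mean()  # ordinary cross-entropy term
    return calibration + lam * regularizer
```

No reward model, no rollouts, no RL loop: just a ranking hinge on sequence likelihoods, which is why it trains about as cheaply as fine-tuning.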
The GPT-4 tokenizer is open source.
If you look at the code, an interesting finding is the presence of special tokens FIM_*. This is probably for fill-in-the-middle pretraining.
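You can poke at this yourself with OpenAI's tiktoken library; the prefix-suffix-middle (PSM) ordering below follows the FIM paper's convention, and the training example is hypothetical:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the GPT-4 tokenizer
print(enc.special_tokens_set)
# includes '<|fim_prefix|>', '<|fim_middle|>', '<|fim_suffix|>', ...

# Hypothetical FIM training example in prefix-suffix-middle (PSM) order:
prefix, middle, suffix = "def add(a, b):\n    ", "return a + b", "\n"
fim_text = ("<|fim_prefix|>" + prefix +
            "<|fim_suffix|>" + suffix +
            "<|fim_middle|>" + middle)
tokens = enc.encode(fim_text, allowed_special="all")
```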
The greatest, most productive living mathematician is using LLMs to improve his work productivity ... in math. 🤯
"I could feed GPT-4 the first few PDF pages of a recent math preprint and get it to generate a half-dozen intelligent questions that an expert attending a talk on…
Terence Tao, the famous mathematician, on using LLMs to aid in mathematical research:
"2023-level AI can already generate suggestive hints and promising leads to a working mathematician and participate actively in the decision-making process. When integrated with tools such as…
We are hiring for a full-time researcher/engineer in the Brain (Google Research) team who will focus on text generation research and its applications. A wide variety of backgrounds and experiences will be considered. DM if you're interested or have leads.
My team has open-sourced a pure Python implementation of ROUGE (Apache 2 license) that can be used as a replacement for the original Perl version (which also had an ambiguous license). @harvardnlp @stanfordnlp
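For reference, usage of the rouge-score package (assuming that's the released name) looks roughly like this:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="The cat sat on the mat.",
    prediction="A cat was sitting on the mat.",
)
print(scores["rougeL"].fmeasure)  # each entry has precision/recall/fmeasure
```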
Interesting paper recently posted to arXiv: "Arrows of Time for Large Language Models"
TL;DR: it is easier for larger models to predict in the forward direction (next token) than backward (previous token). The larger the model, the more pronounced the…
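Worth spelling out why this is surprising: by the chain rule the two factorizations assign exactly the same probability, P(w_1, ..., w_m) = \prod_{i=1}^m P(w_i | w_{<i}) = \prod_{i=1}^m P(w_i | w_{>i}), so any gap in achievable loss is a property of the learned model and the data, not of the math.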
People are not well-calibrated on AI progress in mathematical reasoning.
GSM8K () is a common task testing basic grade-school math ability, but was only introduced in Oct 2021. Manifold Markets only thought there was a ~50% chance that a system would get…
Sounds like OpenAI got some good numbers on GSM8K, possibly MATH.
Speculating, but there is a 'star' in STaR, a technique that fine-tunes a model on its own (better) outputs, which some people see as 'self-improvement'.
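A rough sketch of one STaR iteration (`generate` and `finetune` are hypothetical helpers; it assumes tasks with checkable answers, e.g. GSM8K):

```python
# Hypothetical helpers: generate() samples a rationale + answer from the
# model; finetune() runs ordinary supervised fine-tuning on the kept traces.
def star_iteration(model, problems):
    keep = []
    for question, gold_answer in problems:
        rationale, answer = generate(model, question)  # sample reasoning trace
        if answer == gold_answer:                      # keep only correct ones
            keep.append((question, rationale, answer))
    return finetune(model, keep)  # train the model on its own good outputs
```

(The actual paper also adds a 'rationalization' step, retrying failed problems with the answer given as a hint.)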
As generative language models hit production, there’s increased risk from bad outputs. It’s useful to know when to *not* show the outputs to the user, or defer to better, larger models (at the cost of compute). A 🧵on an ICLR 2023 paper from Google. (1/n)
People are realizing RLHF can be easy with DPO and SLiC-HF. If you were wondering how they compare, the answer is that they are pretty similar, and our paper (led by @Terenceliu4444) shows the math.
The biggest question is whether you should train a preference…
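For reference, a sketch of the DPO loss (Rafailov et al.): where SLiC-HF puts a hinge on the log-likelihood margin, DPO puts a logistic loss on a reference-normalized version of the same margin (variable names mine; inputs are tensors of summed sequence log-probs):

```python
import torch.nn.functional as F

# logp_* : policy log-probs of preferred/dispreferred responses
# ref_*  : the same quantities under a frozen reference model
def dpo_loss(logp_pos, logp_neg, ref_pos, ref_neg, beta=0.1):
    margin = beta * ((logp_pos - ref_pos) - (logp_neg - ref_neg))
    return -F.logsigmoid(margin).mean()
```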
Aligning LLMs with Human Preferences is one of the most active research areas🧪
RLHF, DPO, and SLiC are all techniques for aligning LLMs, but they come with challenges. 🥷
@GoogleDeepMind proposes a new method, “Statistical Rejection Sampling Optimization (RSO)” 🧶
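The core trick, roughly (my sketch, not the paper's exact algorithm): sample many candidates from the SFT policy and use rejection sampling so the kept ones are distributed approximately like the reward-tilted optimal policy pi*(y|x) ∝ pi_sft(y|x) · exp(r(x, y) / beta):

```python
import math
import random

# Accept candidate y with probability exp((r(y) - r_max) / beta), the
# standard rejection-sampling acceptance rule for the tilted target above.
def statistical_rejection_sample(candidates, rewards, beta=0.5):
    r_max = max(rewards)
    return [y for y, r in zip(candidates, rewards)
            if random.random() < math.exp((r - r_max) / beta)]
```

The accepted samples then feed a DPO/SLiC-style preference loss.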
@karpathy is perhaps the most talented deep learning teacher out there, and his video lectures are always worth watching.
Some minor addenda on the history of tokenization:
While GPT-2 used sub-word tokenization pretty early, it was really shown to be important for handling…
New (2h13m 😅) lecture: "Let's build the GPT Tokenizer"
Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and…
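The heart of BPE training fits in a few lines; a toy sketch over raw bytes (function and variable names mine):

```python
from collections import Counter

def merge(ids, pair, new_id):
    """Replace each adjacent occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))   # start from raw bytes (0..255)
for new_id in range(256, 259):              # e.g. three merges
    pair = Counter(zip(ids, ids[1:])).most_common(1)[0][0]
    ids = merge(ids, pair, new_id)          # mint one new token per merge
```

encode() then applies the learned merges in order; decode() maps token ids back to bytes.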
Had a look at RWKV. It's more like an Attention-Free Transformer (AFT) that can be viewed as an RNN for fast inference. The training code is written like a Transformer.
"Time-mixing" ~ AFT ~ linear attention replacement
"channel-mixing" ~ FFN - not sure this change is needed
GSM8K/MATH are great testbeds for self-improvement because model outputs can be evaluated for correctness more or less automatically (like Go). Thus there is a high-fidelity feedback signal that can improve models without humans.
For more open-ended generation, humans often…
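Concretely, grading GSM8K can be as simple as string-matching the final answer, since gold answers end in a '#### <number>' marker (a minimal sketch; robust graders also normalize fractions, units, etc.):

```python
import re

def final_answer(text):
    # GSM8K gold answers end in '#### <number>'; prompt the model to emit
    # the same marker and grading reduces to a string comparison.
    m = re.search(r"####\s*(-?[\d.,]+)", text)
    return m.group(1).replace(",", "") if m else None

def is_correct(model_output, gold):
    return final_answer(model_output) == final_answer(gold)
```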
All companies will train their own ChatGPT/GPT-4 thanks to open source!
So cool to see this paper from Bloomberg, which is one of @huggingface’s favorite customers :)
Absolutely agree. Many researchers assume that a dataset is good because a lot of people use it, without really knowing the details about its provenance / quality.
One reason academic data is often of poor quality is that high-quality data is expensive to procure, and so data…
In fact, one of my big takeaways from the Ouyang et al. ’22 (InstructGPT) paper was that optimizing for public NLP dataset collections is counterproductive in deployment settings (as measured via human preferences).
It's quite possible that fine-tuning LLaMA () with this instruction-tuning dataset will get you very close to text-davinci-001 (InstructGPT) performance.
Open-source LLMs are going to improve rapidly!
For everyone building ChatGPT at home, there's now a very cool dataset on the Hub that allows you to train instruction models at comparable quality to OpenAI's InstructGPT 🤯
How long before someone trains a certain 🌸 or 🦙 on it?
Download it here 👉:
When you feel the AGI it’s mostly the G, for General. Old AI can easily beat LLMs at chess. The new AIs spend most of their existence/compute just observing the world, without being taught explicit skills, but when you ask them random questions it’s clear they’ve learned a lot of…
@Singularitarian Language models are trained by taking a bunch of text, converting it into sequences of tokens, and learning to predict the next token from the previous ones. This works because P(w_1, w_2, ..., w_m) = \prod_{i=1}^m P(w_i | w_{<i}) (chain rule).
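In code, the training objective is just a shifted cross-entropy; a self-contained sketch with random logits standing in for a model:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50257, 8
tokens = torch.randint(vocab_size, (1, seq_len))  # [batch, seq]
logits = torch.randn(1, seq_len, vocab_size)      # model outputs (random here)

# Position i predicts token i+1, so shift logits and targets by one.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions at positions 1..m-1
    tokens[:, 1:].reshape(-1),               # targets w_2 .. w_m
)
# Minimizing this minimizes sum_i -log P(w_i | w_{<i}), the chain-rule NLL.
```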
New helpful AI-powered features coming to smart canvas in @GoogleWorkspace: automatically generated summaries, email draft + meeting notes templates in Docs, formula corrections in Sheets, and more.
🚨Stop using positional encoding (PE) in Transformer decoders (e.g. GPTs). Our work shows 𝗡𝗼𝗣𝗘 (no positional encoding) outperforms all variants like absolute, relative, ALiBi, Rotary. A decoder can learn PE in its representation (see proof). Time for 𝗡𝗼𝗣𝗘 𝗟𝗟𝗠𝘀🧵[1/n]
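The claim is easy to state in code: drop the position table entirely and let the causal mask carry the order information (a minimal PyTorch sketch of the idea, not the paper's code):

```python
import torch
import torch.nn as nn

class NoPEDecoderStem(nn.Module):
    """Token embeddings feed straight into causally-masked attention;
    no absolute/relative/ALiBi/rotary position signal is added anywhere."""
    def __init__(self, vocab_size, d_model=256, n_head=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # no position table
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)

    def forward(self, ids):
        x = self.embed(ids)  # [B, T, D]
        T = ids.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        out, _ = self.attn(x, x, x, attn_mask=causal)  # mask encodes order
        return out
```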
The juxtaposition of
(a) downturn in general tech
vs
(b) boom in AI
is quite jarring.
"It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness ..."
-- A Tale of Two Cities
Well that was faster than I expected.
"We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. Alpaca behaves similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to…
Had mixed feelings about the term "Foundation Models", but have to admit that "FoMo" (h/t @charles_rqi) is the perfect abbreviation, also capturing the zeitgeist of the ML research community.
Language models (LMs) exhibit harmful biases that can get worse with size. Reinforcement learning from human feedback (RLHF) helps, but not always enough. We show that simple prompting approaches can help LMs trained with RLHF produce less harmful outputs.
I wouldn't be surprised if pretraining with a focus on code confers benefits beyond using mainly natural language. Next token prediction for language is usually very local, whereas code often requires longer dependencies to do things like close brackets or refer to distant defs.
How did the initial #GPT3 evolve into today's #ChatGPT? Where do the amazing abilities of #GPT3.5 come from? What is enabled by #RLHF? In this article with @allen_ai, we trace the emergent abilities of #LLM to their sources from first principles.
I tend to think collecting human feedback is something the open community could excel at relative to big tech players. In particular you don't need a lot of concentrated compute, which is where the open community is most disadvantaged.
I believe that in 6-12 months we'll have an open source GPT-4 replication.
But GPT-5 will be built on immense amounts of human feedback collected as shown here, and I'm not sure how the open community will replicate that.
The pricing of the ChatGPT API makes ChatGPT Plus look expensive at $20/month for most users.
Arbitrage opportunity: build a web-app using the API and charge less than Plus.
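The arithmetic behind that, assuming gpt-3.5-turbo's launch price of $0.002 per 1K tokens:

```python
price_per_1k_tokens = 0.002   # USD, gpt-3.5-turbo at launch (March 2023)
plus_subscription = 20.0      # USD per month for ChatGPT Plus

tokens_for_plus_price = plus_subscription / price_per_1k_tokens * 1000
print(f"{tokens_for_plus_price:,.0f} tokens/month")  # 10,000,000
# ~10M tokens/month -- far more than a typical chat user consumes.
```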
New SOTA results for abstractive summarization just posted to ! We have a new way to pre-train for summarization, and evaluated our PEGASUS model on 12 diverse downstream summarization tasks, achieving SOTA on all, in some cases by a significant margin.
@AlphaSignalAI @gdb Misplaced commas can often be found via unit tests or static checks.
With ML code it's more subtle. If you initialize a param with the wrong distribution, or if your tokenizer doesn't break strings up in the "right" way, you could get much worse results. The devil is really in…
Passing "Needle in a haystack" is not sufficient to say you solved long-context.
Possibly better test: checking the gap in performance between (a) fine-tuning and (b) putting the same number of examples in-context across a variety of datasets/tasks of varying complexity.
Great work by Trieu, and a really nice talk that I had the privilege to see a while ago internally at GDM.
What is interesting is this doesn't even use LLMs. The model is tiny (by today's standards), like small GPT-2. And it is solving problems that GPT-4 cannot. I imagine using…
$235m has been invested into Vector Databases in the past year:
- @qdrant_engine - $7.5m Seed
- @tryChroma - $18m Seed
- @weaviate_io - $50m Series A
- @milvusio - $60m Series B
- @Pinecone - $100m Series B
For reference, MongoDB raised $300m from start to $1.2b IPO.
I keep revisiting this great paper from @andy_l_jones: “Scaling scaling laws with board games”. It shows how the training compute and inference compute of MCTS can be traded off against each other: 10x more MCTS steps is almost the same as training 10x more.
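Schematically (my notation; coefficients are fit per game), the paper's frontier is roughly linear in the logs of both budgets: Elo ≈ α·log10(C_train) + β·log10(C_test) + γ, with α ≈ β near the frontier, so a 10x increase in either budget buys about the same Elo.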
Most parents who enroll their kids in chess class don't actually care about chess performance.
"The GPT-4 pretraining dataset included chess games in the format of move sequence known as Portable Game Notation (PGN). We note that only games with players of Elo 1800 or higher…
While vendor lock-in is an understandable concern with TPUs, if you use Jax it is quite easy to switch between TPU and GPU, e.g. for training language models.
This wasn't always the case, but the excellent Jax team has achieved this with a lot of good work over the last year.
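The same jitted code runs on either backend unchanged; switching hardware is mostly a provisioning question. A trivial sketch:

```python
import jax
import jax.numpy as jnp

@jax.jit  # compiled via XLA for whichever backend is present
def mse(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

w, x, y = jnp.zeros((4,)), jnp.ones((8, 4)), jnp.zeros((8,))
print(mse(w, x, y))   # runs on TPU, GPU, or CPU with the same code
print(jax.devices())  # e.g. TPU cores on a TPU VM, or CudaDevice(id=0)
```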
Incidentally, Google DeepMind recently published a paper in Nature making progress on his favourite open question, the problem of the maximal size of a cap set:
A relevant post from Terry's blog:
Introducing FunSearch in @Nature: a method using large language models to search for new solutions in mathematics & computer science. 🔍
It pairs the creativity of an LLM with an automated evaluator to guard against hallucinations and incorrect ideas. 🧵
While "AI engineers" don't usually publish papers I still think you should cite them somehow if your method is significantly influenced by their work, e.g. open-source code.
Optimus can now sort objects autonomously 🤖
Its neural network is trained fully end-to-end: video in, controls out.
Come join to help develop Optimus (& improve its yoga routine 🧘)
→
Open models climbing AlpacaEval () are probably exploiting the length bias of the auto-annotator.
There is always a challenge in optimizing reward that you're hacking the reward function and not what you want. If length-adjusted, some of these models are not…
Hmm, this seems to be the ChatGPT-4 preamble (system) prompt:
"""
You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.
Knowledge cutoff: 2023-04
Current date: 2024-01-11
Image input capabilities: Enabled
Tools
python
When you send a message…
Apparently some people prefer Waymo to human drivers who can be more unpredictable, and are willing to pay *more* than Uber.
Super-human AI can improve gross margins on both cost and price!
In advance of ICLR 2018 we've open-sourced the code for the tasks described in our paper "Generating Wikipedia by Summarizing Long Sequences" ( ). Go try it out:
Here's an interesting thought experiment to gain intuition on why it is often easier to predict 'forward' given knowledge of causality:
1. Forward: an elaborate ice sculpture (say a fancy castle) is left out on a hot day and melts. It is easy to predict that it'll end up as a…
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐
It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
Glad to see open-community take pre-training data seriously.
Another thing to beware of is de-duplication.
1. within training: to ensure you repeat data only intentionally
2. between training and eval: to ensure your eval is really held-out and you're measuring progress…
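A toy sketch of the exact-match case (real pipelines add near-duplicate detection, e.g. MinHash, and apply the same check between train and eval):

```python
import hashlib

def dedup(docs):
    """Drop exact duplicates after whitespace/case normalization."""
    seen, out = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(doc)
    return out
```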
This take on the FineWeb release is some of the most interesting feedback, and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!)
Surprisingly, the raw size of the 15T-token dataset is not very important; what is much…
Very cool work led by the talented @mitchnw. If you don't have access to huge amounts of compute but still want to contribute to language model research, read it! And stop sulking about the end of research.
🔥'Compression disproportionately impacts model performance on the underrepresented long-tail of the data distribution. Perhaps an explanation of the "bigger is better" race.' 🔥
This is fantastic. Full implementation of pruning identified exemplars and great walkthrough of how to audit the impact of compression techniques like pruning. 🎉🔥
We call this “Selective Generation” and propose a simple/cheap/effective way to do it.
We focus on cases where there is an input/output text, i.e. text2text, although it’s quite general, e.g. prompting (input) a language model for a response (output) is a special case. (2/n)
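In pseudocode terms, the mechanism is a thresholded abstention (scorer and fallback are hypothetical stand-ins; the paper is about how to get a good, cheap quality score):

```python
def selective_generate(model, quality_scorer, prompt, threshold=0.8):
    output = model(prompt)
    # Show the output only if its estimated quality clears the threshold;
    # otherwise abstain or defer to a larger model (at extra compute cost).
    if quality_scorer(prompt, output) >= threshold:
        return output
    return defer_to_fallback(prompt)  # hypothetical fallback path
```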
A few updates on the PEGASUS summarization work:
- Human raters don't prefer human-written summaries over the model's.
- We released code and checkpoints on GitHub.
- Work to appear at ICML2020.
Presenting PEGASUS, an approach to pre-training that uses gap-sentence generation to improve the performance of fine-tuning for #NaturalLanguageUnderstanding tasks, like abstractive summarization. Read more and try the code for yourself ↓
All coding projects have two parts:
1. The fun part: where you get to "create"
2. The pain part: where you have to debug
Code LLMs are "automating" the fun parts while introducing bugs and not helping much with debugging. As a developer, you’re left with more pain to deal with.
A short note on how the way instruction-tuning is often done in open-source can actually encourage hallucination.
TL;DR: Some instruction-tuning needs to be model-specific, which is why you have to get your model in front of users.
I deleted this tweet because the “AI powered drone turns on its operator story” was total nonsense—the Colonel who described it as a simulation now says it was just “a thought experiment.”
😑