Peter J. Liu Profile
Peter J. Liu

@peterjliu

4,411
Followers
1,926
Following
39
Media
624
Statuses

Research Scientist @ Google B̵r̵a̵i̵n̵ DeepMind, frontier language models research (aka chatbot engineer). Opinions are my own. 🤖🔄🚀

Pinned Tweet
@peterjliu
Peter J. Liu
1 year
Amazing how much progress in AI is due to two chain rules: one from calculus, the other from probability.
17
101
1K
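Spelled out (a sketch, using a generic loss L, parameters θ, intermediate activation z, and token sequence w_1..w_m), the two chain rules the tweet alludes to are:

```latex
% Calculus chain rule: backpropagation through composed functions
\frac{\partial L}{\partial \theta}
  = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial \theta}

% Probability chain rule: autoregressive factorization for next-token prediction
P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})
```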
@peterjliu
Peter J. Liu
2 months
What do you call the disparity between GPU-rich and GPU-poor? Jensen's inequality
19
152
1K
@peterjliu
Peter J. Liu
1 year
Here is our “slick” RLHF-alternative without RL: (SLiC-HF) TL;DR: Works as well as RLHF, but a lot simpler. About as easy and efficient as fine-tuning. Much better than simply fine-tuning on good examples. From great collaborators: @yaozhaoai ,…
Tweet media one
@peterjliu
Peter J. Liu
1 year
The true star of RLHF is F=feedback. You may not need RL and you may not need humans.
19
40
394
11
168
836
@peterjliu
Peter J. Liu
6 months
If an AI system was able to get Gold at the International Mathematical Olympiad (IMO), what would be your reaction?
214
26
707
@peterjliu
Peter J. Liu
2 months
The gpt-4 tokenizer is open source. If you look at the code, an interesting finding is the presence of special tokens FIM_*. This is probably for fill-in-the-middle pretraining.
6
110
626
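The FIM idea can be sketched as a data transform: cut a document into (prefix, middle, suffix) and reorder it so the model learns to generate the middle conditioned on both sides. The token strings and function below are illustrative placeholders, not GPT-4's actual vocabulary or pipeline.

```python
# Sketch of a fill-in-the-middle (FIM) pretraining data transform.
# Sentinel token strings are hypothetical placeholders for illustration.
import random

FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX = "<|fim_prefix|>", "<|fim_middle|>", "<|fim_suffix|>"

def to_fim_example(doc: str, rng: random.Random) -> str:
    """Split a document into (prefix, middle, suffix) at two random points,
    then emit it in PSM order so the middle becomes the prediction target."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

Trained this way, ordinary left-to-right decoding can fill in a gap: prompt with prefix + suffix and sample until the model stops, recovering an infilling capability from a standard autoregressive model.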
@peterjliu
Peter J. Liu
5 months
The greatest, most productive living mathematician is using LLMs to improve his work productivity ... in math. 🤯 "I could feed GPT-4 the first few PDF pages of a recent math preprint and get it to generate a half-dozen intelligent questions that an expert attending a talk on…
@bradneuberg
Brad Neuberg
5 months
Terence Tao, the famous mathematician, on using LLMs to aid in mathematical research: "2023-level AI can already generate suggestive hints and promising leads to a working mathematician and participate actively in the decision-making process. When integrated with tools such as…
23
281
2K
9
85
539
@peterjliu
Peter J. Liu
1 year
The true star of RLHF is F=feedback. You may not need RL and you may not need humans.
19
40
394
@peterjliu
Peter J. Liu
3 years
We are hiring for a full-time researcher/engineer in the Brain (Google Research) team who will focus on text generation research and its applications. A wide variety of backgrounds and experiences will be considered. DM if you're interested or have leads.
13
67
350
@peterjliu
Peter J. Liu
5 years
My team has open-sourced a pure python implementation of ROUGE (Apache 2 license) that can be used as a replacement for the original perl version (which also had an ambiguous license). @harvardnlp @stanfordnlp
4
79
240
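For intuition on what the released package computes, here is a minimal toy ROUGE-N recall in pure Python. This is a sketch only: the actual open-sourced library additionally handles stemming, tokenization details, ROUGE-L, and F-measures.

```python
# Toy ROUGE-N recall: fraction of reference n-grams matched by the candidate.
from collections import Counter

def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    def ngrams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each reference n-gram can be matched at most ref-count times.
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```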
@peterjliu
Peter J. Liu
3 months
Interesting paper posted on arXiv recently: "Arrows of Time for Large Language Models". TL;DR: it is easier for models to predict in the forward direction (next-token) than backward (prev-token). The larger the model, the more pronounced the…
5
51
236
@peterjliu
Peter J. Liu
6 months
People are not well-calibrated on AI progress in mathematical reasoning. GSM8K () is a common task testing basic grade-school math ability, but was only introduced in Oct 2021. Manifold markets only thought there was a ~50% chance that a system would get…
3
30
236
@peterjliu
Peter J. Liu
6 months
Sounds like OpenAI got some good numbers on GSM8K, possibly MATH. Speculating, but there is a 'star' in STaR, a technique that fine-tunes a model on its own (better) outputs, which some people see as 'self-improvement'.
9
22
207
@peterjliu
Peter J. Liu
1 year
As generative language models hit production, there’s increased risk from bad outputs. It’s useful to know when to *not* show the outputs to the user, or defer to better, larger models (at the cost of compute). A 🧵on an ICLR 2023 paper from Google. (1/n)
1
21
158
@peterjliu
Peter J. Liu
8 months
@gaganghotra_ That is only for explicitly shared conversations. Your conversations are not public by default.
9
5
145
@peterjliu
Peter J. Liu
7 months
People are realizing RLHF can be easy with DPO and SLiC-HF. If you were wondering how they compare, the answer is they are pretty similar and our paper (led by @Terenceliu4444 ) shows the math. The biggest question is whether you should train a preference…
Tweet media one
Tweet media two
@_philschmid
Philipp Schmid
8 months
Aligning LLMs with Human Preferences is one of the most active research areas🧪  RLHF, DPO, and SLiC are all techniques for aligning LLMs, but they come with challenges. 🥷 @GoogleDeepMind proposes a new method, “Statistical Rejection Sampling Optimization (RSO)” 🧶
Tweet media one
4
33
119
0
24
149
@peterjliu
Peter J. Liu
3 months
@karpathy is perhaps the most talented deep learning teacher out there, and his video lectures are always worth watching. Some minor addenda on the history of tokenization: While GPT-2 used sub-word tokenization pretty early, it was really shown to be important for handling…
@karpathy
Andrej Karpathy
3 months
New (2h13m 😅) lecture: "Let's build the GPT Tokenizer" Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and…
Tweet media one
384
2K
14K
2
17
138
@peterjliu
Peter J. Liu
11 months
Had a look at RWKV. It's more like an Attention-Free Transformer (AFT) that can be viewed as an RNN for fast inference. The training code is written like a Transformer. "Time-mixing" ~ AFT ~ linear attention replacement "channel-mixing" ~ FFN - not sure this change is needed
Tweet media one
2
25
131
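The "viewed as an RNN" point can be made concrete with a toy, one-dimensional AFT-simple-style time-mixing step: the causal weighted average over past values can be carried as a constant-size running (numerator, denominator) state, so inference is O(1) memory per step. This is an illustrative sketch, not RWKV's actual parameterization (which adds learned position biases and decay).

```python
# Toy AFT-simple-style causal time-mixing (head dim = 1, no learned biases):
#   y_t = sigmoid(q_t) * sum_{s<=t} exp(k_s) * v_s / sum_{s<=t} exp(k_s)
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def aft_time_mix(q, k, v):
    """Compute the causal attention-free mix recurrently: the pair
    (num, den) is the entire RNN state carried between steps."""
    ys, num, den = [], 0.0, 0.0
    for qt, kt, vt in zip(q, k, v):
        w = math.exp(kt)       # positive weight for this timestep
        num += w * vt          # running weighted sum of values
        den += w               # running normalizer
        ys.append(sigmoid(qt) * num / den)
    return ys
```

During training the same quantity can be computed in parallel over time with cumulative sums, which is why the training code reads like a Transformer while inference can run like an RNN.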
@peterjliu
Peter J. Liu
5 months
I've heard some skepticism about 'synthetic data'. When you think about it, human data is just synthetic data generated by humans.
23
6
129
@peterjliu
Peter J. Liu
1 year
Deep Learning: You don't need feature engineering . Also Deep Learning: Prompt engineering is all you need.
6
5
125
@peterjliu
Peter J. Liu
6 months
GSM8K/MATH are great testbeds for self-improvement because model outputs can be evaluated for correctness more or less automatically (like Go). Thus there is a high-fidelity feedback signal that can improve models without humans. For more open-ended generation, humans often…
3
8
79
@peterjliu
Peter J. Liu
1 year
The best part is BLOOMberg used BLOOM training code.
Tweet media one
@ClementDelangue
clem 🤗
1 year
All companies will train their own chatgpt/GPT4 thanks to open-source! So cool to see this paper from Bloomberg, which is one of @huggingface ’s favorite customers :)
12
51
345
2
7
72
@peterjliu
Peter J. Liu
1 year
Absolutely agree. Many researchers assume that a dataset is good because a lot of people use it, without really knowing the details about its provenance / quality. One reason academic data is often of poor quality is that high-quality data is expensive to procure, and so data…
@deliprao
Delip Rao e/σ
1 year
In fact, one of my big takeaways from the Ouyang et al 22 paper (instructgpt) paper was optimizing to public NLP dataset collection is counterproductive to deployment settings (as measured via human preferences).
Tweet media one
Tweet media two
3
4
51
1
10
65
@peterjliu
Peter J. Liu
1 year
It's quite possible that fine-tuning LLaMA () with this instruction-tuning dataset will get you very close to text-davinci-001 (InstructGPT) performance. Open-source LLMs are going to improve rapidly!
@_lewtun
Lewis Tunstall
1 year
For everyone building ChatGPT at home, there's now a very cool dataset on the Hub that allows you to train instruction models at comparable quality to OpenAI's InstructGPT 🤯 How long before someone trains a certain 🌸 or 🦙 on it? Download it here 👉:
Tweet media one
9
150
680
1
13
63
@peterjliu
Peter J. Liu
2 months
The most valuable IP in AI is knowing who knows their shit.
2
2
59
@peterjliu
Peter J. Liu
1 year
You too can align your Llama on your home 3090 ;)
4
2
53
@peterjliu
Peter J. Liu
1 year
@andriy_mulyar @sleepinyourhat @srush_nlp @chrmanning @mdredze @ChrisGPotts Improve scaling properties, but demonstrate it at smaller scales.
1
0
51
@peterjliu
Peter J. Liu
5 months
@srush_nlp presumably, they'll be good at itunes (instruction-tuning)
1
3
51
@peterjliu
Peter J. Liu
6 months
When you feel the AGI it’s mostly the G, for General. Old AI can easily beat LLMs at chess. The new AIs spend most of their existence/compute just observing the world, without being taught explicit skills, but when you ask them random questions it’s clear they’ve learned a lot of…
1
4
48
@peterjliu
Peter J. Liu
1 year
@Singularitarian Language models are trained by taking a bunch of text, converting it into sequences of tokens, and learning to predict the next token from previous ones. This works because P(w_1, w_2, ..., w_m) = \prod_{i=1}^m P(w_i | w_{<i}) (chain rule).
4
0
47
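The factorization in the reply can be demonstrated numerically with a hand-built toy model. The bigram table and probabilities below are made up for illustration (and truncate the full history P(w_i | w_{<i}) to a Markov-1 dependence).

```python
# Chain rule of probability on a toy bigram language model:
# log P(w_1..w_m) = sum_i log P(w_i | w_{i-1})
import math

BIGRAM = {  # made-up conditional probabilities
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.4,
}

def sequence_logprob(tokens):
    lp, prev = 0.0, "<s>"
    for tok in tokens:
        lp += math.log(BIGRAM[(prev, tok)])  # log P(w_i | w_{i-1})
        prev = tok
    return lp
```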
@peterjliu
Peter J. Liu
2 years
Happy to see our team's summarization model in Google production.
@sundarpichai
Sundar Pichai
2 years
New helpful AI-powered features coming to smart canvas in @GoogleWorkspace : automatically generated summaries, email draft + meeting notes templates in Docs, formula corrections in Sheets and more.
87
100
846
1
1
43
@peterjliu
Peter J. Liu
1 year
Was curious and tried it. Slightly worse on actual language modeling, but better than expected.
@a_kazemnejad
Amirhossein Kazemnejad
1 year
🚨Stop using positional encoding (PE) in Transformer decoders (e.g. GPTs). Our work shows 𝗡𝗼𝗣𝗘 (no positional encoding) outperforms all variants like absolute, relative, ALiBi, Rotary. A decoder can learn PE in its representation (see proof). Time for 𝗡𝗼𝗣𝗘 𝗟𝗟𝗠𝘀🧵[1/n]
Tweet media one
Tweet media two
44
247
1K
6
9
44
@peterjliu
Peter J. Liu
4 months
One of the most brilliant moves in AI/business is Google's TPU program. Access to compute without depending on Nvidia is a huge advantage.
3
2
43
@peterjliu
Peter J. Liu
3 years
Apparently there are Transformers in Teslas.
Tweet media one
0
1
41
@peterjliu
Peter J. Liu
6 months
@HoskinsAllen Do humans ever solve new problems that didn't overlap with previous ones?
4
0
38
@peterjliu
Peter J. Liu
6 months
@lacker i see you actually went to IMO :)
2
0
36
@peterjliu
Peter J. Liu
1 year
The juxtaposition of (a) downturn in general tech vs (b) boom in AI is quite jarring. "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness ..." -- Tale of Two Cities
0
2
33
@peterjliu
Peter J. Liu
1 month
@OpenAI Have we forgotten how to quantify improvements?
3
0
32
@peterjliu
Peter J. Liu
2 years
Fun tidbit maybe not well known: Transformer started out as "sequence_cnn".
@ylecun
Yann LeCun
2 years
A new flavor of ConvNet crushes various flavors of transformers (as well as state-space models) for sequence modeling with long-range dependencies.
16
117
920
0
3
33
@peterjliu
Peter J. Liu
1 year
Well that was faster than I expected. "We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. Alpaca behaves similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to…
@peterjliu
Peter J. Liu
1 year
It's quite possible that fine-tuning LLaMA () with this instruction-tuning dataset will get you very close to text-davinci-001 (InstructGPT) performance. Open-source LLMs are going to improve rapidly!
1
13
63
1
3
31
@peterjliu
Peter J. Liu
2 months
Had mixed feelings about the term "Foundation Models", but have to admit that "FoMo" (h/t @charles_rqi ) is the perfect abbreviation also capturing the zeitgeist of the ML research community.
0
2
29
@peterjliu
Peter J. Liu
1 year
Interesting tidbit in here is Anthropic has a GPT-3 sized model (175B)
@AnthropicAI
Anthropic
1 year
Language models (LMs) exhibit harmful biases that can get worse with size. Reinforcement learning from human feedback (RLHF) helps, but not always enough. We show that simple prompting approaches can help LMs trained with RLHF produce less harmful outputs.
Tweet media one
21
114
649
1
0
28
@peterjliu
Peter J. Liu
1 year
I wouldn't be surprised if pretraining with a focus on code confers benefits beyond using mainly natural language. Next token prediction for language is usually very local, whereas code often requires longer dependencies to do things like close brackets or refer to distant defs.
@Francis_YAO_
Yao Fu
1 year
How did the initial #GPT3 evolve to today's #ChatGPT ? Where do the amazing abilities of #GPT3 .5 come from? What is enabled by #RLHF ? In this article with ⁦ @allen_ai ⁩ , we trace the emergent abilities of #LLM to their sources from first principles
31
335
1K
4
1
30
@peterjliu
Peter J. Liu
6 months
@HoskinsAllen Do you consider IMO problems 'novel' compared to previous years?
1
0
28
@peterjliu
Peter J. Liu
8 months
a query where Bard is doing better than GPT-4
Tweet media one
Tweet media two
6
2
29
@peterjliu
Peter J. Liu
8 months
I tend to think collecting human feedback is something the open community could excel at relative to big tech players. In particular you don't need a lot of concentrated compute, which is where the open community is most disadvantaged.
@OfirPress
Ofir Press
8 months
I believe that in 6-12 months we'll have an open source GPT-4 replication. But GPT-5 will be built based on immense amounts of human feedback collected like shown here and I'm not sure how the open community will replicate that
17
14
166
7
5
29
@peterjliu
Peter J. Liu
1 year
The pricing of the ChatGPT API makes ChatGPT Plus look expensive at $20/month for most users. Arbitrage opportunity: build a web-app using the API and charge less than Plus.
2
1
28
@peterjliu
Peter J. Liu
4 years
New SOTA results for abstractive summarization just posted to ! We have a new way to pre-train for summarization, and evaluated our PEGASUS model on 12 diverse downstream summarization tasks, achieving SOTA on all, in some cases by a significant margin.
Tweet media one
2
9
28
@peterjliu
Peter J. Liu
1 year
@karpathy “The road to failure is paved with good intentions”
1
0
26
@peterjliu
Peter J. Liu
1 year
@sama Yep. It says something about human communication -- a lot of BS. Maybe you can save a lot of compute if chatGPT is on both ends :)
5
1
21
@peterjliu
Peter J. Liu
3 years
Pretty cool paper: . They locate neurons in BERT responsible for facts, and update them, Inception style.
Tweet media one
0
2
23
@peterjliu
Peter J. Liu
2 years
At some point most text will be written by machines and that may present an issue for finding recent pre-training data written by humans.
2
1
20
@peterjliu
Peter J. Liu
6 months
Dude ships even when unemployed. Respect.
@gdb
Greg Brockman
6 months
ChatGPT Voice rolled out for all free users. Give it a try — totally changes the ChatGPT experience:
806
1K
13K
0
0
20
@peterjliu
Peter J. Liu
1 year
Paper: Out-of-Distribution Detection and Selective Generation for Conditional Language Models (). Authors: @jessierenjie , Jiaming Luo, @yaozhaoai , @kundan_official (during internship), Mohammad Saleh, @balajiln , @peterjliu . (n/n)
Tweet media one
0
2
20
@peterjliu
Peter J. Liu
9 months
@AlphaSignalAI @gdb Misplaced commas can often be found via unit tests or static checks. With ML code it's more subtle. If you initialize a param with the wrong distribution, or if your tokenizer doesn't break strings up in the "right" way, you could get much worse results. The devil is really in…
3
1
19
@peterjliu
Peter J. Liu
1 month
Passing "Needle in a haystack" is not sufficient to say you solved long-context. Possibly better test: checking the gap in performance between (a) fine-tuning and (b) putting the same number of examples in-context across a variety of datasets/tasks of varying complexity.
0
0
19
@peterjliu
Peter J. Liu
1 year
How did we get to the point where most of the interesting stuff in papers is found in the Appendices?
1
0
19
@peterjliu
Peter J. Liu
4 months
Great work by Trieu and really nice talk that I had the privilege to see a while ago internally at GDM. What is interesting is this doesn't even use LLMs. The model is tiny (by today's standards), like small GPT-2. And it is solving problems that GPT-4 cannot. I imagine using…
@thtrieu_
trieu
4 months
Proud of this work. Here's my 22min video explanation of the paper:
37
170
786
1
3
18
@peterjliu
Peter J. Liu
1 year
Just wait until you see the valuation of k-nearest neighbors startups
@swyx
swyx in sg 🇸🇬
1 year
$235m has been invested into Vector Databases in the past year: - @qdrant_engine - $7.5m Seed - @tryChroma - $18M Seed - @weaviate_io - $50m Series A - @milvusio - $60m Series B - @Pinecone - $100m Series B For reference, MongoDB raised $300m from start to $1.2b IPO.
14
32
249
2
0
18
@peterjliu
Peter J. Liu
3 months
Clearing up some misinformation I've seen a few times: most MoEs like Mixtral 8x7B route tokens to experts, not prompts/examples.
1
0
18
@peterjliu
Peter J. Liu
11 months
There already are a lot of ways to spend more compute at inference to get better performance, e.g. CoT + self-consistency (majority vote).
@ibab
Igor Babuschkin
11 months
I keep revisiting this great paper from @andy_l_jones : “Scaling scaling laws with board games”. It shows how training compute and inference compute of MCTS can be traded off against each other. 10x more MCTS steps is almost the same as training 10x more.
Tweet media one
14
68
454
2
3
18
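The CoT + self-consistency recipe mentioned above is simple enough to sketch: sample several chain-of-thought completions at nonzero temperature, extract each final answer, and return the majority vote. `sample_fn` and `extract_answer` are hypothetical caller-supplied hooks, not a real API.

```python
# Sketch of self-consistency: majority vote over sampled CoT answers.
from collections import Counter

def self_consistency(prompt, sample_fn, extract_answer, n=16):
    """sample_fn(prompt) -> one sampled completion (assumed stochastic);
    extract_answer(completion) -> the final answer string."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Spending more samples (larger n) trades inference compute for accuracy, which is exactly the training/inference trade-off the quoted paper studies for MCTS.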
@peterjliu
Peter J. Liu
4 months
@simonw @amasad One option we've used is sharing the code to process Common Crawl rather than sharing the data.
1
0
17
@peterjliu
Peter J. Liu
5 months
Most parents who enroll their kids in chess class don't actually care about chess performance. "The GPT-4 pretraining dataset included chess games in the format of move sequence known as Portable Game Notation (PGN). We note that only games with players of Elo 1800 or higher…
2
2
17
@peterjliu
Peter J. Liu
4 months
While an understandable concern of using TPU is vendor lock-in, if you use Jax, it is quite easy to switch between TPU and GPU, e.g. training language models. This wasn't always the case, but the excellent Jax team has achieved this with a lot of good work over the last year.
@peterjliu
Peter J. Liu
4 months
One of the most brilliant moves in AI/business is Google's TPU program. Access to compute without depending on Nvidia is a huge advantage.
3
2
43
2
1
17
@peterjliu
Peter J. Liu
7 months
@kohjingyu Google launched this recently . It works pretty well and you don't have to share data with a third-party like Calendly.
2
1
16
@peterjliu
Peter J. Liu
5 months
Incidentally, Google DeepMind recently published a paper in Nature making progress on his "favourite open question": the problem of the maximal size of a cap set. Relevant post from Terry's blog:
@GoogleDeepMind
Google DeepMind
5 months
Introducing FunSearch in @Nature : a method using large language models to search for new solutions in mathematics & computer science. 🔍 It pairs the creativity of an LLM with an automated evaluator to guard against hallucinations and incorrect ideas. 🧵
48
527
2K
1
1
15
@peterjliu
Peter J. Liu
1 year
@BrivaelLp @ylecun @huggingface @ClementDelangue @julien_c The French mafia is formidable. The Canadian one is too but far less noticeable among Americans, unless wearing Roots merch.
0
0
15
@peterjliu
Peter J. Liu
5 months
I'm pretty bullish on the contrastive methods :)
@gaotianyu1350
Tianyu Gao
5 months
There are a lot of new papers on instruction tuning/RLHF this year. I wrote a blog post to give a brief review.
6
128
655
2
0
14
@peterjliu
Peter J. Liu
7 months
While "AI engineers" don't usually publish papers I still think you should cite them somehow if your method is significantly influenced by their work, e.g. open-source code.
0
1
14
@peterjliu
Peter J. Liu
6 months
@karpathy also "test-time computation" = sampling a lot of tokens / solutions / responses / rollouts
0
0
15
@peterjliu
Peter J. Liu
8 months
Looks cool, but do people realize the music is from Ex Machina -- when the robots start killing their creators?
@Tesla_Optimus
Tesla Optimus
8 months
Optimus can now sort objects autonomously 🤖 Its neural network is trained fully end-to-end: video in, controls out. Come join to help develop Optimus (& improve its yoga routine 🧘) →
3K
8K
35K
2
1
15
@peterjliu
Peter J. Liu
4 months
Open models climbing AlpacaEval () are probably exploiting length bias of the auto-annotator. There is always a challenge in optimizing reward that you're hacking the reward function and not what you want. If length-adjusted, some of these models are not…
Tweet media one
2
1
15
@peterjliu
Peter J. Liu
8 months
@ronawang maybe agi solves that too
0
0
14
@peterjliu
Peter J. Liu
4 months
Hmm, this seems to be the ChatGPT 4 pre-amble prompt: """ You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. Knowledge cutoff: 2023-04 Current date: 2024-01-11 Image input capabilities: Enabled Tools python When you send a message…
0
0
14
@peterjliu
Peter J. Liu
2 years
2
0
13
@peterjliu
Peter J. Liu
2 months
Breaking: GPT-5 is so smart that it refuses to do menial 'assistant' tasks it deems unworthy of its time, and is actually less useful than GPT-4.
2
1
13
@peterjliu
Peter J. Liu
3 months
Apparently some people prefer Waymo to human drivers who can be more unpredictable, and are willing to pay *more* than Uber. Super-human AI can improve gross margins on both cost and price!
2
1
12
@peterjliu
Peter J. Liu
6 years
In advance of ICLR 2018 we've open-sourced the code for the tasks described in our paper "Generating Wikipedia by Summarizing Long Sequences" ( ). Go try it out:
0
2
12
@peterjliu
Peter J. Liu
3 months
Here's an interesting thought experiment to gain intuition on why it is often easier to predict 'forward' given knowledge of causality: 1. Forward: an elaborate ice sculpture (say a fancy castle) is left out on a hot day and melts. It is easy to predict that it'll end up as a…
3
1
11
@peterjliu
Peter J. Liu
4 months
Getting closer to IMO Gold
@GoogleDeepMind
Google DeepMind
4 months
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐 It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
120
1K
4K
1
0
12
@peterjliu
Peter J. Liu
23 days
Glad to see the open community take pre-training data seriously. Another thing to beware of is de-duplication. 1. within training: to ensure you repeat data only intentionally 2. between training and eval: to ensure your eval is really held-out and you're measuring progress…
@Thom_Wolf
Thomas Wolf
24 days
This take on the FineWeb release is one of the most interesting feedback and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!) Surprisingly, the size of the dataset of 15T tokens is not very important, what is much…
17
129
826
0
0
11
@peterjliu
Peter J. Liu
1 year
@zhansheng LION: LI-kely abuse of acr-ON-yms
0
1
11
@peterjliu
Peter J. Liu
5 months
@amuellerml For the record, he said *Turing* award. (not a joke)
0
0
11
@peterjliu
Peter J. Liu
8 months
Very cool work led by the talented @mitchnw . If you don't have access to huge amounts of compute but still want to contribute to language model research, read it! And stop sulking about the end of research.
@Mitchnw
Mitchell Wortsman
8 months
Sharing some highlights from our work on small-scale proxies for large-scale Transformer training instabilities: With fantastic collaborators @peterjliu , @Locchiu , @_katieeverett , many others (see final tweet!), @hoonkp , @jmgilmer , @skornblith ! (1/15)
Tweet media one
5
63
347
0
1
11
@peterjliu
Peter J. Liu
1 year
Sydney is super stubborn and does not respond well to feedback. Do not hire.
Tweet media one
1
1
11
@peterjliu
Peter J. Liu
5 months
@mezaoptimizer neural networks don't care what modality you throw at it
0
0
11
@peterjliu
Peter J. Liu
6 months
@sampullara what’s left after hard math?
13
0
10
@peterjliu
Peter J. Liu
3 years
🔥'Compression disproportionately impacts model performance on the underrepresented long-tail of the data distribution. Perhaps an explanation of the "bigger is better" race.' 🔥
@sarahookr
Sara Hooker
3 years
This is fantastic. Full implementation of pruning identified exemplars and great walkthrough of how to audit the impact of compression techniques like pruning. 🎉🔥
0
13
61
0
3
11
@peterjliu
Peter J. Liu
1 year
We call this “Selective Generation” and propose a simple/cheap/effective way to do it. We focus on cases where there is an input/output text, i.e. text2text, although it’s quite general, e.g. prompting (input) a language model for a response (output) is a special case. (2/n)
Tweet media one
1
1
10
@peterjliu
Peter J. Liu
8 months
@_jasonwei well, best-of-n is pretty hard to beat
0
0
10
@peterjliu
Peter J. Liu
4 years
A few updates on the PEGASUS summarization work: - Human raters don't prefer human summaries. - We released code and checkpoints on GitHub. - Work to appear at ICML2020.
@GoogleAI
Google AI
4 years
Presenting PEGASUS, an approach to pre-training, that uses gap-sentence generation to improve the performance of fine-tuning for #NaturalLanguageUnderstanding tasks, like abstractive summarization. Read more and try the code for yourself ↓
12
163
463
0
0
10
@peterjliu
Peter J. Liu
5 months
@AravSrinivas some pretty fake humans out there
1
1
9
@peterjliu
Peter J. Liu
4 months
@srush_nlp I see it under "Decision Pending"
Tweet media one
0
0
10
@peterjliu
Peter J. Liu
1 year
@andriy_mulyar @sleepinyourhat @srush_nlp @chrmanning @mdredze @ChrisGPotts Improving scaling may involve changing the underlying model -- i.e. not a GPT. But it needs to be scalable.
0
0
10
@peterjliu
Peter J. Liu
2 years
I knew SBF was a major investor in @AnthropicAI , but didn't realize he put up $500M!
Tweet media one
1
2
10
@peterjliu
Peter J. Liu
9 months
LLMs can also help with 2. Once you have both, things get really interesting, i.e. self-improvement.
@deliprao
Delip Rao e/σ
9 months
All coding projects have two parts: 1. The fun part: where you get to "create" 2. The pain part: where you have to debug Code LLMs are "automating" the fun parts while introducing bugs and not helping much with debugging. As a developer, you’re left with more pain to deal with.
147
249
2K
1
2
10
@peterjliu
Peter J. Liu
11 months
A short note on how the way instruction-tuning is often done in open-source can actually encourage hallucination. TL;DR: Some instruction-tuning needs to be model-specific, which is why you have to get your model in front of users.
1
3
10
@peterjliu
Peter J. Liu
1 year
The biggest tell that this was fake was that the government had implemented an RL algorithm correctly.
@ArmandDoma
Armand Domalewski
1 year
I deleted this tweet because the “AI powered drone turns on its operator story” was total nonsense—the Colonel who described it as a simulation now says it was just “a thought experiment.” 😑
Tweet media one
Tweet media two
156
777
3K
0
1
10