Andrew Trask

@iamtrask

74,242
Followers
190
Following
24
Media
851
Statuses

@openminedorg, @GoogleDeepMind ethics team, @OxfordUni PhD candidate, @UN PET Lab, @GovAI_, creator of #GrokkingDeepLearning, NALU, and sense2vec

Oxford, UK
Joined November 2012
@iamtrask
Andrew Trask
1 year
I wrote a #beginner level book teaching Deep Learning - its goal is to be the easiest intro possible. In the book, each lesson builds a neural component *from scratch* in #NumPy. Each *from scratch* toy code example is in the GitHub below. #100DaysOfMLCode
63
912
5K
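For readers curious what "builds a neural component *from scratch* in #NumPy" looks like in practice, here is a minimal sketch in that spirit (my own toy example, not code from the book or its GitHub repo): a one-hidden-layer network with hand-written backprop learning XOR.

```python
import numpy as np

# Toy "from scratch" network in the spirit of the book's lessons (not the book's code):
# one hidden ReLU layer, linear output, hand-written backprop, trained on XOR.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8))
W2 = rng.normal(size=(8, 1))
lr = 0.01

for _ in range(2000):
    h = np.maximum(0, X @ W1)                      # hidden layer (ReLU)
    pred = h @ W2                                  # output layer
    err = pred - y                                 # gradient of squared error (up to a constant)
    W2 -= lr * h.T @ err                           # backprop into output weights
    W1 -= lr * X.T @ ((err @ W2.T) * (h > 0))      # backprop through the ReLU

print(pred.round(2))                               # typically close to [[0], [1], [1], [0]]
```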
@iamtrask
Andrew Trask
2 years
This series of #Jupyter #Notebooks is a VERY nice step-by-step intro to data science and machine learning. If you're just starting out - I recommend walking through these notebooks as a first primer. Definitely a great #100DaysOfMLCode project
25
325
2K
@iamtrask
Andrew Trask
2 years
Machine Learning in a company is 10% Data Science & 90% other challenges. It's VERY hard. Everything in this guide is ON POINT, and it's stuff you won't learn in an ML book. "Best Practices of ML Engineering". This is a lifesaver #100DaysOfMLCode project
22
290
2K
@iamtrask
Andrew Trask
2 years
Attention is one of the most important breakthroughs in AI - the foundation of Transformers. This @distillpub is the best explanation of it I've seen. For #100DaysOfMLCode / #100DaysOfCode folks - try building an attention mechanism from scratch!
16
255
1K
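For anyone taking up the "build attention from scratch" suggestion, here is one minimal NumPy sketch of scaled dot-product attention (my own toy illustration, not taken from the Distill article): each query produces a weighted average of the values, with weights given by query-key similarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity between each query and each key
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ V                        # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))                   # 4 queries, dimension 8
K = rng.normal(size=(6, 8))                   # 6 keys/values
V = rng.normal(size=(6, 8))
print(attention(Q, K, V).shape)               # (4, 8): one attended output per query
```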
@iamtrask
Andrew Trask
1 year
If you've wondered - "Which Deep Learning optimizer should I use? SGD? Adagrad? RMSProp?" - this blogpost by @seb_ruder is the best explanation I've seen. It's a surprisingly easy read! Definitely a good #100DaysOfMLCode project.
25
276
1K
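As a companion to the blog post, here is a stripped-down sketch of the three update rules mentioned in the tweet (my own simplification: no momentum, mini-batches, or bias correction), run on the 1-D loss f(w) = (w - 3)^2.

```python
import numpy as np

def grad(w):
    return 2 * (w - 3)          # gradient of f(w) = (w - 3)^2, minimum at w = 3

lr, eps = 0.1, 1e-8
w_sgd = w_ada = w_rms = 0.0
ada_cache = rms_cache = 0.0

for _ in range(200):
    # SGD: step proportional to the raw gradient
    w_sgd -= lr * grad(w_sgd)
    # Adagrad: divide by the sqrt of the running *sum* of squared gradients (steps shrink over time)
    g = grad(w_ada); ada_cache += g * g
    w_ada -= lr * g / (np.sqrt(ada_cache) + eps)
    # RMSProp: divide by a *moving average* of squared gradients (steps keep adapting)
    g = grad(w_rms); rms_cache = 0.9 * rms_cache + 0.1 * g * g
    w_rms -= lr * g / (np.sqrt(rms_cache) + eps)

# SGD converges to 3; RMSProp hovers near 3; Adagrad approaches more slowly as its step keeps shrinking.
print(w_sgd, w_ada, w_rms)
```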
@iamtrask
Andrew Trask
4 months
I wrote a #beginner level book teaching #deeplearning. Its goal is to be the easiest intro possible. Each lesson builds a neural component *from scratch* in #NumPy. Each *from scratch* toy code example is in the #Github below. #100DaysOfMLCode
13
207
922
@iamtrask
Andrew Trask
10 months
For anyone who has ever thought - "Can I learn the math needed for Deep Learning all in one place (and maybe skip the other stuff)?" - this is quite a nice resource. "The Matrix Calculus You Need For Deep Learning" (Table of Contents Below)
11
209
1K
@iamtrask
Andrew Trask
8 months
LLMs believe every datapoint they see with 100% conviction. An LLM never says, "this doesn't make sense... let me exclude it from my training data". Everything is taken as truth. It is actually worse than this. Because of how perplexity/SGD/backprop works, datapoints which…
112
146
1K
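One way to see the "100% conviction" point concretely (my own sketch, not part of the thread): the standard next-token training target is a one-hot vector on whatever token the training text happened to contain, so the cross-entropy gradient always pushes the model toward that token, however implausible the datapoint is.

```python
import numpy as np

vocab = ["the", "cat", "sat", "flew"]
logits = np.array([0.1, 0.2, 0.5, 0.2])         # model's scores for the next token
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the vocabulary

observed_next = "flew"                          # whatever the training text says, sensible or not
target = np.zeros(len(vocab))
target[vocab.index(observed_next)] = 1.0        # one-hot target: full conviction in the datapoint

loss = -np.sum(target * np.log(probs))          # cross-entropy (log-perplexity for this token)
grad_logits = probs - target                    # gradient pushes P("flew") toward 1
print(loss, grad_logits)
```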
@iamtrask
Andrew Trask
10 months
"A Beginner's Guide to the Mathematics of Neural Networks" ... a nice gem 🙂
3
188
857
@iamtrask
Andrew Trask
2 years
Machine Learning is WAY more than just picking a model & calling .fit() or .train() on data. It's a process... thinking about your problem in terms of correlation & features. This step-by-step guide is an excellent intro to this process #100DaysOfMLCode
17
164
723
@iamtrask
Andrew Trask
10 months
This series of Jupyter Notebooks is a VERY nice step-by-step introduction to data science and machine learning. If you're just starting out - I recommend walking through these notebooks as a first primer. Definitely a great #100DaysOfMLCode project
13
188
735
@iamtrask
Andrew Trask
2 years
#numpy is an irreplaceable part of every practitioner's Deep Learning toolkit. The best way to learn NumPy that I know of is this crash course. If NumPy is new to you - definitely include this early in your #100DaysOfMLCode - you won't regret it!
13
134
633
@iamtrask
Andrew Trask
2 years
Wow - in 8 tweets I just learned and un-learned more about the mysteries of deep neural networks than I've probably learned or un-learned about them in the last two years. This is the start of something really really big... also a huge door opened for federated learning.
@SamuelAinsworth
Samuel "curry-howard fanboi" Ainsworth
2 years
📜🚨📜🚨 NN loss landscapes are full of permutation symmetries, ie. swap any 2 units in a hidden layer. What does this mean for SGD? Is this practically useful? For the past 5 yrs these Qs have fascinated me. Today, I am ready to announce "Git Re-Basin"!
63
586
3K
7
81
613
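The permutation symmetry the quoted thread describes can be checked in a few lines (a toy sketch, not the Git Re-Basin code): permute an MLP's hidden units, permute the attached weights consistently, and the network computes exactly the same function.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2   # one hidden ReLU layer

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)

perm = rng.permutation(16)                        # arbitrarily reorder the 16 hidden units
x = rng.normal(size=(10, 5))

out_original = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1[:, perm], b1[perm], W2[perm, :], b2)
print(np.allclose(out_original, out_permuted))    # True: different weights, identical function
```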
@iamtrask
Andrew Trask
2 years
Interested in learning Reinforcement Learning? This free course from @dennybritz is the highest quality & most comprehensive collection of online resources I've seen. Prepared in order of difficulty. For #100DaysOfMLCode folks - take 1-2 days per chapter
6
118
561
@iamtrask
Andrew Trask
1 year
My taxi driver just asked me about ChatGPT
46
18
542
@iamtrask
Andrew Trask
7 months
For anyone interested in future LLM development. One of the bigger unsolved deep learning problems: learning of hierarchical structure. Example: we still use tokenizers to train SOTA LLMs. We should be able to feed in bits/chars/bytes and get SOTA. Related: larger context window
19
76
523
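A tiny illustration of the "feed in bits/chars/bytes" point (my own example, not from the tweet): text is already a universal sequence of integers at the byte level, with no learned tokenizer and a fixed alphabet of only 256 values.

```python
# Text is already a byte sequence: no learned vocabulary required.
text = "hot dog"
byte_ids = list(text.encode("utf-8"))     # [104, 111, 116, 32, 100, 111, 103]
print(byte_ids)
print(bytes(byte_ids).decode("utf-8"))    # lossless round trip back to "hot dog"
```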
@iamtrask
Andrew Trask
7 months
This is the 1st rigorous treatment (and 3rd verification) I've seen. IMO - this is great for AI safety! It means that LLMs are doing *exactly* what they're trained to do — estimate next-word probability based on data. Missing data? P(word)==0. So where is the AI logic? 1/🧵
@OwainEvans_UK
Owain Evans
7 months
Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
176
712
4K
14
82
462
@iamtrask
Andrew Trask
2 years
He’s right. Everybody uses the same GPUs, the same frameworks, and the same algorithms. Data is the thing some have and others don’t. Want to know the future of AI? Don’t get distracted. It’s always been about who controls the data. Everything else is rapidly commoditising.
@paulg
Paul Graham
2 years
Data seems to be the limiting factor, rather than model building technique or computing resources. And if you have the app, you often have the data too.
53
53
683
14
64
310
@iamtrask
Andrew Trask
3 months
Fwiw - if you're new to reading AI papers 👇 The tipping point for me was when I spent 4 weeks reading one paper per week. (two or three of them were @RichardSocher 's back in ~2012) For each paper, I read each sentence and wrote 1-2 paragraphs about it, summarising its…
@Suhail
Suhail
4 months
I was thinking about Karpathy's "only compare yourself to younger you" and how I found reading AI research papers intimidating in 2022 because I didn't understand the terminology + math symbols. It really just takes practice reading 100s and then suddenly it's no big deal.
59
124
2K
7
57
292
@iamtrask
Andrew Trask
5 months
Excited to share I've moved to the @GoogleDeepMind ethics research team — and I'm honored to have played my small part in the Gemini release from that new post! Lots of multi-modal features coming to an app near you!
@demishassabis
Demis Hassabis
5 months
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses
356
2K
12K
6
11
284
@iamtrask
Andrew Trask
9 months
For all you *aspiring* @PyTorch users! @KaiLashArul has written a *very* nice fast-track intro! #100DaysOfMLCode #100DaysOfCode
2
56
268
@iamtrask
Andrew Trask
2 years
It just occurred to me - if you zoom out enough - working from home is the norm - not the exception. For a bajillion years people worked in the local vicinity of where they lived. Farming, hunting, and caring for their house and home. Going to an office to work is weird.
18
19
213
@iamtrask
Andrew Trask
2 years
If one professor hadn't decided to issue an override to let me into an already-full CS 101 course after the deadline, I probably wouldn't be in computer science at all, much less AI. One decision from one teacher changed my life.
6
12
182
@iamtrask
Andrew Trask
16 days
After ~5yrs of work, @emmabluemke, Teddy Collins, @bmgarfinkel, @KEricDrexler, @ClaudiaGhezzou, @IasonGabriel, @AllanDafoe, @wsisaac, and I have published an updated paper. This work is co-authored by an extraordinary team I'm grateful to know, and I expect to work off this…
5
44
171
@iamtrask
Andrew Trask
2 years
For anyone suffering from imposter syndrome. Over a decade ago a guy named @chuckainlay told me something I'll never forget: "You're never going to be who you think you want to be until you think you *are* who you want to be" He probably doesn't remember but I'll never forget it
4
11
115
@iamtrask
Andrew Trask
2 months
fun fact in #disinformation research: filtering out disinformation is a solved problem — has been for 1000s of years. However, disinformation’s solution creates major problems for liberalism and democracy. That’s the *real* #disinformation problem. 1/🧵 IMO, @karpathy’s post is…
@karpathy
Andrej Karpathy
2 months
Reading a tweet is a bit like downloading an (attacker-controlled) executable that you instantly run on your brain. Each one elicits emotions, suggests knowledge, nudges world-view. In the future it might feel surprising that we allowed direct, untrusted information to brain.
793
1K
11K
4
22
107
@iamtrask
Andrew Trask
2 months
If you're a US-based academic/student and need access to more #data (e.g., health data, economic data, social media data, education data, etc.) or #compute (e.g. GPU credits) — shoot me a DM in the next 48 hrs. Gotta be affiliated with a US academic institution I'm afraid, but…
8
20
87
@iamtrask
Andrew Trask
1 year
@janleike In 2015 my colleague and I trained a language model on non-aligned English/Spanish text. They ended up creating aligned vector spaces which generalized in this way. We think the vector spaces found the same basis function because of punctuation + similar overall shape in vspace.
3
3
70
@iamtrask
Andrew Trask
5 months
Before getting into AI, I attended undergrad at Belmont University, where I studied commercial music and music business. It's one of the top schools for this. In the "AI threatens art" narrative, something is forgotten. Art's beauty is grounded in its ability to tell stories of…
@jjvincent
James Vincent
5 months
It’s interesting how quickly the basic AI generated aesthetic has become dated
56
117
2K
4
16
56
@iamtrask
Andrew Trask
7 months
Hallucinations are basically when nothing in the training data is similar to the current context... ... but all the training data is voting *anyway*... So it falls back on the most generic template... basic grammar. This to me is the most coherent mental model for LLMs.
1
10
55
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK Now is this all LLMs are doing? It's unclear. The paper I've seen push this the farthest is "Copy is all you need" which gets gpt-2 level performance using only training-data lookups If they had a bigger dataset — would they get to even higher perf?
1
4
52
@iamtrask
Andrew Trask
7 months
Note that solving AI by simply "scaling up" is actually abandoning the aims of machine learning. It's basically saying "sample complexity is good enough — let's just fill the thing with information". Again - could be done with a DB.
4
7
47
@iamtrask
Andrew Trask
5 months
There's a nuance of this take I disagree with. It's not *quite* "more data" -> "more quality". It's actually "more structure" -> "more quality". Ex: I could generate 100 billion petabytes of data about how to convert Celsius to Fahrenheit. But that wouldn't help with driving cars…
@DrJimFan
Jim Fan
5 months
It’s pretty obvious that synthetic data will provide the next trillion high-quality training tokens. I bet most serious LLM groups know this. The key question is how to SUSTAIN the quality and avoid plateauing too soon. The Bitter Lesson by @RichardSSutton continues to guide AI…
146
284
3K
9
2
46
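A quick way to see the Celsius/Fahrenheit example (my own sketch): the conversion has a fixed linear structure, so two examples already determine it completely, and petabytes more of the same mapping would add no new information (and none of it transfers to driving).

```python
import numpy as np

c = np.array([0.0, 100.0])                # two Celsius readings...
f = np.array([32.0, 212.0])               # ...and their Fahrenheit values
slope, intercept = np.polyfit(c, f, 1)    # fit the linear structure F = slope * C + intercept
print(slope, intercept)                   # 1.8 and 32.0, i.e. F = C * 9/5 + 32
print(slope * 37.0 + intercept)           # 98.6: the structure generalizes from just two points
```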
@iamtrask
Andrew Trask
7 months
Current hypothesis: LLMs are a lot like surveys. When they see a context ("The cat and the") they basically conduct a *survey* over every datapoint in a training dataset. It's like asking every datapoint "what do YOU think the next word might be"? And then...
1
3
43
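A toy rendering of that survey picture (my own illustration of the hypothesis, not how a transformer is actually implemented): every stored training context votes on the next word, weighted by a crude similarity to the current context.

```python
# Toy "survey" next-word predictor: training contexts vote, weighted by similarity.
train = [
    ("the cat and the", "dog"),
    ("the girl cleaned her", "plate"),
    ("the boy washed his", "plate"),
    ("pizza tastes", "great"),
]

def similarity(a, b):
    # crude bag-of-words overlap, standing in for learned representations
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def survey(context):
    votes = {}
    for ctx, nxt in train:
        votes[nxt] = votes.get(nxt, 0.0) + similarity(context, ctx)
    total = sum(votes.values()) or 1.0
    return {word: v / total for word, v in votes.items()}

# "plate" dominates even though this exact phrase never appears in the toy data.
print(survey("the boy cleaned his"))
```

The exact phrase is unseen, but the most similar contexts carry the vote, which is the behaviour the later tweets in this thread unpack.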
@iamtrask
Andrew Trask
2 years
@chris_j_paxton Note that this is the lurking limitation of most major AI projects. Building a great demo is much easier than building a robust product. Often they can be entirely different ballgames.
2
1
42
@iamtrask
Andrew Trask
2 years
@chris_j_paxton If you drive for 3 hours on a highway, most of that data isn't useful. There's a very long-tail of quite rare data for very complex situations that they're likely still building. (granny-with-shopping-cart kind of data)
2
0
40
@iamtrask
Andrew Trask
2 years
One of the most beautiful articles I've read in a long time. To me there's something about it that really resonates with tech culture. Where we are. Where we've been. Where we should go. Spoiler: it's not actually about coffee.
1
4
39
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK But the real kicker from Owain's result is that it implies that hallucinations come from incredibly sharp missing gaps in training data. Implies LLMs are like interactive books. They have the info they have... and nothing else. Little-to-no deductive logic. Good for safety!
2
3
36
@iamtrask
Andrew Trask
2 years
Peter was one of the most impactful, capable, humble, and kind people I've ever met. He is and will always be one of my role models. Peter's early encouragement in my life led to much of who I am. He had this kind of impact on a lot of people. I can't believe he's gone.
@evacide
Eva
2 years
Former EFFer Peter Eckersley died very suddenly today. If you have ever used Let's Encrypt or Certbot or you enjoy the fact that transport layer encryption on the web is so ubiquitous it's nearly invisible, you have him to thank for it. Raise a glass.
178
3K
11K
2
4
35
@iamtrask
Andrew Trask
7 months
@bryan_johnson @Code_of_Kai Isn't that like saying that people who use an umbrella are giving up on the free will required to endure the rain?
2
0
30
@iamtrask
Andrew Trask
7 months
Second, DeepMind's RETRO model showed that you can get GPT-3 performance with a 25x reduction in parameter size by... ... you guessed it... ...querying an enormous token store. This to me implies that 24/25ths (or 96%) of a transformer's logic is
2
2
29
@iamtrask
Andrew Trask
1 year
@elonmusk good point
1
0
30
@iamtrask
Andrew Trask
7 months
To summarise the problem — LLMs still struggle with recognizing that a particular sequence is a unique "thing" which needs to be considered as an independent semantic concept. They do this somewhat. Like "hot dog" vs "hot" and "dog". But not well enough for SOTA char LLMs.
1
2
29
@iamtrask
Andrew Trask
7 months
And the result of this survey across datapoints translates into a probability over what word might be next. So sometimes LLMs copy from data. Sometimes they're an average of many locations. Sometimes they hallucinate.
1
3
25
@iamtrask
Andrew Trask
2 years
Bill Hooper. Inspiring CS 101 course and inspiring AI course. Helped me run my first line of code and helped me train my first neural network. Also let me bug him endlessly in his office hours.
2
2
26
@iamtrask
Andrew Trask
7 months
In machine learning literature, this is called improving "sample complexity" — the number of datapoints needed to achieve a certain degree of accuracy. And you could argue that improving sample complexity is the point of ML research.
1
0
25
@iamtrask
Andrew Trask
7 months
This is related to the "symbolic AI" debate, in that solving hierarchy is related to the binding problem. For example, LLMs need to be able to identify that "hot dog" is its own "symbol". They do this ok. But they still struggle with this in a few ways. So tokenizers persist.
1
1
24
@iamtrask
Andrew Trask
7 months
Something I wonder about a lot — why don't 747s flap their wings and eat bugs?
1
2
23
@iamtrask
Andrew Trask
7 months
literally doing a data comparison. Because if you remove that 96% of the transformer (train a model 1/25th the size)... ... you can replace that 96% of the model with a dataset comparison.. ... and it works just as well.
1
1
22
@iamtrask
Andrew Trask
7 months
And this is also where we see the difference between machine/deep learning and AI. Machine and deep learning is about reducing sample complexity — whereas AI is about imitating human-like intelligence. Related: the difference between aeronautical engineering and human flight
1
4
22
@iamtrask
Andrew Trask
7 months
Obviously this gets papered over with a big enough dataset. But basically every machine learning insufficiency can be papered over if your dataset is big enough. Aka - if ChatGPT was a big enough database of input-output pairs we wouldn't know the difference between it and a LLM
1
1
22
@iamtrask
Andrew Trask
7 months
Third, the result @OwainEvans_UK et al has showed. If your training dataset always has the tokens "George Washington" BEFORE the tokens "first", "US", and "president"... ...and NEVER after .... then NO training datapoint will vote for "George" AFTER seeing "first US president"
3
3
22
@iamtrask
Andrew Trask
10 months
REAL LINK:
0
6
19
@iamtrask
Andrew Trask
4 months
@venturetwins Well a certain percentage of it (the color) is AI generated. It's having to pick colors somewhat at random (as a part of a random distribution of what it might be).
2
0
19
@iamtrask
Andrew Trask
7 months
Another recently documented binding issue is that LLMs struggle to predict things in reverse. I haven't tested this myself, but I know multiple labs who have confirmed that if you train a LLM which only sees one phrase *before* another — it can't reverse them.
2
1
20
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK Fifth, it's an interesting context on why increasing dataset size adds so much to the power of the models. It increases the density of datapoints which can vote on new examples that are just like themselves. (And this also reduces hallucinations - it's harder to find gaps)
1
1
18
@iamtrask
Andrew Trask
7 months
P.S. better explanation. LLMs can do deductive logic *in the context window* because they index into data that's doing deductive logic. Training data: "I am a dog. Dogs have fur. Thus I have" Prediction: "I am a cat. Cats have eyes. Thus I have" This kind of thing. :)
2
1
19
@iamtrask
Andrew Trask
7 months
So if an LLM only sees "The president of the USA is Barack Obama" and sentences where "Obama" comes later in the phrase than "president" and "USA"... ... if you ask it "Where is Barack Obama President?" it won't be able to tell you.
4
0
18
@iamtrask
Andrew Trask
7 months
But there's still a strong case to be made for symbolic AI here — or at least for solving the hierarchical structure problem. It means you can have more intelligence with less data — with less training — etc.
2
0
18
@iamtrask
Andrew Trask
7 months
This is the 1st rigorous treatment (and 3rd verification) I've seen. IMO - this is great for AI safety! It means that LLMs are doing *exactly* what they're trained to do — estimate next-word probability based on data. So where is the AI logic? 1/🧵
@OwainEvans_UK
Owain Evans
7 months
Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
176
712
4K
0
3
18
@iamtrask
Andrew Trask
1 year
@thegautamkamath Agreed - the only way to ensure you're not a leader is by following whatever happens to be hot at the time. Geoffrey Hinton is famous for not pivoting to Bayesian stuff when it was big (and it was very big... while NNs truly didn't work).
3
0
17
@iamtrask
Andrew Trask
7 months
To really start to see how it works — consider the power of word embeddings. Word embeddings use co-occurrence statistics to make words like "dog" and "cat" generally more similar to each other than "dog" and "headphone". Something like this weights the survey
1
0
16
@iamtrask
Andrew Trask
7 months
A key-value database has infinite sample complexity but could theoretically describe any problem if you had enough data. The goal of machine/deep learning is to reduce that sample complexity down to.... a low number. It's hard to say "0" because it opens up a few framing debates
1
0
15
@iamtrask
Andrew Trask
7 months
Note that nothing about this thread means that LLMs can't have dangerous or problematic capabilities. It just posits a hypothesis on how those would be encoded in the model — which (if true) is a useful framing to think about what to do about it.
2
0
15
@iamtrask
Andrew Trask
7 months
There are several reasons why I think this best describes the logic of how LLMs learn. First, it's in line with the intuition of "attention is all you need". Yes, I know that transformers attend to weights (not data), but those weights are learning parts of the data.
1
1
15
@iamtrask
Andrew Trask
7 months
...weighting the replies based on similarity to the input context. So input contexts which are really similar (such as an exact string match in the data) — end up getting a really high weighting. But input contexts with really low similarity (e.g. "Pizza tastes great") get low weight
1
0
14
@iamtrask
Andrew Trask
2 years
@jack @Johnnyfriel2 For those interested: Proving you’re a real, unique human != linking and revealing your real identity which may be what Jack is referring to (trust-over-ip/zero knowledge proof kind of thing)
1
2
12
@iamtrask
Andrew Trask
7 months
When they copy from data — it just means that by far the heaviest weightings in the training data were exact matches — and so these dominated the learning signal. When they average from many locations — we can get abstract templates that weren't in the data. Think like...
1
2
14
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK But what about the logic? How are LLMs logical if this is all they're doing? Because "step by step" type logic is actually embedded in data. There are tons of datasets out there that do logic... that give step-by-step instructions. And so an LLM — when you ask it to give you..
1
2
14
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK Then you'd have no idea what comes next... you've never seen this sequence before. So LLMs are really clever and allow you to go, "ok forget the exact sequence..., just find the most similar phrases in general even if they're not exactly the same and let those vote on P(word)."
1
2
14
@iamtrask
Andrew Trask
2 years
Also helped me write my first research paper, and get it accepted in the first undergrad research conference. He's a legend in my book.
2
0
13
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK This means your model can ALWAYS make a prediction. It also means the model can use far more training datapoints to help it make a prediction. This also allows for the abstract "write me a poem" stuff we see — where we get an original poem.
1
0
13
@iamtrask
Andrew Trask
8 months
@gudmvatn I haven't shared my thoughts on a solution yet - but PageRank is a part!
1
0
13
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK You're basically seeing hundreds/thousands of poems vote on what the next word might be based on how a current context is similar to their own internal contexts. Thus... it can generate novel poems.
1
0
13
@iamtrask
Andrew Trask
8 months
@tom_hartvigsen There's a certain "eat from the fruit of the tree" dilemma here which is interesting. I'm not sure that we see this in books though. Is a civil-liberties book better when it's got a few overtly racist chapters in it? Is a science textbook better when it's got some pseudoscience…
3
0
13
@iamtrask
Andrew Trask
2 years
And their aptitude for saying "go for it" to all sorts of students with all sorts of crazy ideas (it's an art-focused school, lots of outside the box folks). It's really a wonderful place. Would do it again. Fwiw I got into NYU and Belmont and went to Belmont. Would do again.
2
0
12
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK ... step by step instructions... indexes into thousands of different step by step-like contexts in its training data and lets them vote on what your next step should be. And this allows models to be logical. It can be logical by weighted-averaging across many logical datapoints
1
2
12
@iamtrask
Andrew Trask
1 year
@StefanFSchubert I find it utterly fascinating that the "moving west" movement is still in progress 200+ years after 1776. Living history.
2
0
11
@iamtrask
Andrew Trask
1 year
@janleike Like we could train a sentiment classifier on one language and it worked in the other - even though we had no alignment information between words.
2
0
12
@iamtrask
Andrew Trask
8 months
@DavidFSWD I'm under the impression that evolve instruct is a fine-tuning technique, not an original-training-data filtering technique. But your point is well taken that there are some approaches leaning in this direction. Curriculum learning would be closer, as would DP, and distillation.
0
0
11
@iamtrask
Andrew Trask
1 year
@janleike Yes - with a 15% loss in accuracy if my memory serves but still non-trivially accurate.
2
0
10
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK Fourth, because this is what language models are supposed to do. Language models take a sequence of words and try to predict the next word. Historically, this was done by *literally* counting words and word sequences to establish — based on the data — the P(next word).
1
1
10
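The "literally counting" approach is easy to write down (a minimal bigram sketch, my own example): estimate P(next word | previous word) purely from counts, and note that an unseen context simply has no estimate at all, which is exactly the missing-context gap the next tweets describe.

```python
from collections import Counter, defaultdict

corpus = "the girl cleaned her plate . the boy cleaned his room .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1                 # literally count which word follows which

def p_next(prev):
    c = counts[prev]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()} if total else {}

print(p_next("cleaned"))    # {'her': 0.5, 'his': 0.5}
print(p_next("plate"))      # {'.': 1.0}
print(p_next("unicorn"))    # {}  (an unseen context gives no estimate at all)
```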
@iamtrask
Andrew Trask
2 years
@ylecun out of curiosity - does FAIR have a mission statement?
2
0
10
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK Why? Because the similarity search for phrases similar to "Who was the first US president?" will return many contexts (in the model)... ... and none of those contexts will point to "George". Thus... we observe the behavior Owain et al. (and others I've spoken with) observed.
1
0
10
@iamtrask
Andrew Trask
7 months
So when you write "The boy cleaned his"... and ask ChatGPT to complete it... The LLM might never have seen that exact phrase in the training data ... but it might have seen "The girl cleaned her plate" And so that weighting for "plate" gets a high score.
1
0
9
@iamtrask
Andrew Trask
18 days
@ylecun Surely there are many important loss functions for qualitative concepts for which we do not have a robust/non-reductive quantitative measure? (e.g., "the joy of all living creatures")
1
0
9
@iamtrask
Andrew Trask
2 years
0
1
9
@iamtrask
Andrew Trask
7 months
@OwainEvans_UK The problem was that your training data would have missing contexts. Like before... your training data might have "The girl cleaned her" but now you're looking at a context that says "The boy cleaned his" If you're just counting words to get probabilities...
1
0
9
@iamtrask
Andrew Trask
1 year
@janleike Still was rejected at ICLR though. lol
1
0
9
@iamtrask
Andrew Trask
2 years
@sama sounds like a seed round
0
0
9
@iamtrask
Andrew Trask
2 years
@chris_j_paxton Such is the long-tail of complex situations with limited training data.
1
0
9
@iamtrask
Andrew Trask
2 years
@chris_j_paxton Example: what was the distance between an AI that played Go and one that actually beat a world champion. And if you remember in the movie there were all sorts of discovered areas where the model would suddenly do something silly because there was a scenario that confused it
2
0
9
@iamtrask
Andrew Trask
2 years
@MUTEMATH Never stop making music
0
0
0
@iamtrask
Andrew Trask
7 months
@bryan_johnson Is "achieve what they care about" giving up or attaining free will?
0
0
8
@iamtrask
Andrew Trask
7 months
... a poem with a certain phrase structure. You can have a really long poem — but if it always has some pattern of words — then the similarity score (even for long documents) can be unusually high and bias the "survey" towards predicting words that are similar to that template
1
1
8
@iamtrask
Andrew Trask
8 months
@tim_tyler Indeed - but baked into that is a high standard for what is allowed to be considered "evidence". For LLMs it's literally any datapoint from any person at any time. For other matters in life, the standard is considerably higher.
2
0
8
@iamtrask
Andrew Trask
8 months
@chrisalbon @ewanmakepeace I think the story here is one of efficiency. There are a finite set of resources available to an employee — and the employer wants to pay for some of them. If the employee uses their time more efficiently — there is more resource available to both employee and employer.
1
0
7
@iamtrask
Andrew Trask
2 years
Here's a video of Peter living his best life:
1
1
7