Invented principles of meta-learning (1987), GANs (1990), Transformers (1991), very deep learning (1991), etc. Our AI is used many billions of times every day.
Thanks @elonmusk for your generous hyperbole! Admittedly, however, I didn't invent sliced bread, just #GenerativeAI and things like that. And of course my team is standing on the shoulders of giants. Original tweet by @elonmusk: …
The GOAT of tennis @DjokerNole said: “35 is the new 25.” I say: “60 is the new 35.” AI research has kept me strong and healthy. AI could work wonders for you, too!
LeCun's "5 best ideas 2012-22” are mostly from my lab, and older: 1 Self-supervised 1991 RNN stack; 2 ResNet = open-gated 2015 Highway Net; 3&4 Key/Value-based fast weights 1991; 5 Transformers with linearized self-attention 1991. (Also GAN 1990.) Details:
Quarter-century anniversary: 25 years ago we received a message from N(eur)IPS 1995 informing us that our submission on LSTM got rejected. (Don't worry about rejections. They mean little.) #NeurIPS2020
In 2020, we will celebrate that many of the basic ideas behind the Deep Learning Revolution were published three decades ago, within less than 12 months, in our "Annus Mirabilis" 1990-1991:
Q*? 2015: reinforcement learning prompt engineer in Sec. 5.3 of “Learning to Think...”. A controller neural network C learns to send prompt sequences into a world model M (e.g., a foundation model) trained on, say, videos of actors. C also learns to…
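To make the controller/world-model split concrete, here is a minimal toy sketch (my own illustration, not the 2015 system): a controller C learns, via a plain REINFORCE policy gradient, which discrete prompt to send to a frozen world model M; `world_model`, the prompt set, and the reward are hypothetical placeholders.

```python
# Toy sketch: controller C (a categorical policy) learns which prompt to send to a
# frozen world model M so that M's answer earns high reward. Not the 2015 system.
import torch

torch.manual_seed(0)
NUM_PROMPTS = 8          # C picks one of 8 discrete prompt sequences (placeholder)
TARGET = 3               # the prompt that happens to make M produce a useful answer

def world_model(prompt_id: int) -> float:
    """Frozen M: maps a prompt to a scalar 'usefulness' of its answer (toy stand-in)."""
    return 1.0 if prompt_id == TARGET else 0.0

logits = torch.zeros(NUM_PROMPTS, requires_grad=True)   # controller C's policy parameters
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(500):
    dist = torch.distributions.Categorical(logits=logits)
    prompt = dist.sample()                       # C queries M with a prompt
    reward = world_model(prompt.item())          # how useful was M's answer?
    loss = -dist.log_prob(prompt) * reward       # REINFORCE: reinforce rewarded prompts
    opt.zero_grad(); loss.backward(); opt.step()

print("learned prompt distribution:", torch.softmax(logits, -1).detach().numpy().round(2))
```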
Meta used my 1991 ideas to train LLaMA 2, but made it insinuate that I “have been involved in harmful activities” and have not made “positive contributions to society, such as pioneers in their field.” @Meta & LLaMA promoter @ylecun should correct this ASAP. See…
Regarding recent work on more biologically plausible "forward-only" backprop-like methods: in 2021, our VSML net already meta-learned backprop-like learning algorithms running solely in forward-mode - no hardwired derivative calculation!
So @ylecun: "I've been advocating for deep learning architecture capable of planning since 2016" vs me: "I've been publishing deep learning architectures capable of planning since 1990." I guess in 2016 @ylecun also picked up the torch. (References attached)…
Train a weight matrix to encode the backpropagation learning algorithm itself. Run it on the neural net itself. Meta-learn to improve it! Generalizes to datasets outside of the meta-training distribution. v4 2022 with @LouisKirschAI
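A heavily simplified toy sketch of the general idea (assumptions: this is not the actual VSML/2022 architecture, just a generic learned-update-rule demo): a tiny meta-network proposes additive weight updates from local signals, and its parameters are meta-trained by backpropagating through a short unrolled inner loop on random linear-regression tasks.

```python
# Toy learned learning rule, meta-trained by unrolled optimization (not VSML itself).
import torch, torch.nn as nn

torch.manual_seed(0)
DIM, INNER_STEPS, BATCH = 5, 10, 32

# Learned update rule, shared across all weights: (input feature, error) -> delta w
rule = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
meta_opt = torch.optim.Adam(rule.parameters(), lr=1e-2)

def sample_task():
    w_true = torch.randn(DIM)
    X = torch.randn(BATCH, DIM)
    return X, X @ w_true

for meta_step in range(300):
    X, y = sample_task()
    w = torch.zeros(DIM)                               # fresh inner weights per task
    for _ in range(INNER_STEPS):                       # inner loop run by the learned rule
        err = X @ w - y                                # per-example errors, shape (BATCH,)
        inp = torch.stack([X, err[:, None].expand(-1, DIM)], dim=-1)  # (BATCH, DIM, 2)
        w = w + rule(inp).squeeze(-1).mean(0)          # additive, rule-proposed update
    meta_loss = ((X @ w - y) ** 2).mean()              # meta-objective: loss after inner loop
    meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()

print("meta-loss after meta-training:", meta_loss.item())
```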
Silly AI regulation hype
One cannot regulate AI research, just like one cannot regulate math.
One can regulate applications of AI in finance, cars, healthcare. Such fields already have continually adapting regulatory frameworks in place.
Don’t stifle the open-source movement!…
25th anniversary of the LSTM at #NeurIPS2021. reVIeWeR 2 - who rejected it from NeurIPS 1995 - was thankfully MIA. The subsequent journal publication in Neural Computation has become the most cited neural network paper of the 20th century:
30 years ago: Transformers with linearized self-attention in NECO 1992, equivalent to fast weight programmers (apart from normalization), separating storage and control. Key/value was called FROM/TO. The attention terminology was introduced at ICANN 1993
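A minimal numpy sketch of the stated equivalence (my own illustration, not the 1991/1992 code): unnormalised linear self-attention produces exactly the same outputs as a fast weight matrix programmed by additive outer-product updates and then queried.

```python
# Unnormalised linear attention == fast weight programmer with additive updates.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                                  # sequence length, head dimension
K, V, Q = rng.normal(size=(3, T, d))         # keys ("FROM"), values ("TO"), queries

# (1) Unnormalised linear attention: y_t = sum_{i<=t} v_i (k_i . q_t)
y_attn = np.stack([sum(V[i] * (K[i] @ Q[t]) for i in range(t + 1)) for t in range(T)])

# (2) Fast weight programmer: program W by additive outer products, then query with q_t.
W = np.zeros((d, d))
y_fwp = []
for t in range(T):
    W += np.outer(V[t], K[t])                # program the fast weights
    y_fwp.append(W @ Q[t])                   # use them
y_fwp = np.stack(y_fwp)

print(np.allclose(y_attn, y_fwp))            # True: the two views coincide
```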
LeCun's (@ylecun) 2022 paper on Autonomous Machine Intelligence rehashes but doesn't cite essential work of 1990-2015. We've already published his “main original contributions”: learning subgoals, predictable abstract representations, multiple time scales…
Machine learning is the science of credit assignment. My new survey (also under arXiv:2212.11279) credits the pioneers of deep learning and modern AI (supplementing my award-winning 2015 deep learning survey): P.S. Happy Holidays!
Yesterday @nnaisense released EvoTorch (), a state-of-the-art evolutionary algorithm library built on @PyTorch, with GPU acceleration and easy training on huge compute clusters using @raydistributed. (1/2)
Best paper award for "Mindstorms in Natural Language-Based Societies of Mind" at #NeurIPS2023 WS Ro-FoMo. Up to 129 foundation models collectively solve practical problems by interviewing each other in monarchical or democratic societies.
Unlike diffusion models, Bayesian Flow Networks operate on the parameters of data distributions, rather than on noisy versions of the data itself. I think this paper by Alex Graves et al. will be influential.
📣 BFNs: A new class of generative models that
- brings together the strengths of Bayesian inference and deep learning
- trains on continuous, discretized or discrete data with simple end-to-end loss
- places no restrictions on the network architecture
AI boom v AI doom: since the 1970s, I have told AI doomers that in the end all will be good. E.g., 2012 TEDx talk: “Don’t think of us versus them: us, the humans, v these future super robots. Think of yourself, and humanity in general, as a small stepping…
As 2022 ends: 1/2 century ago, Shun-Ichi Amari published a learning recurrent neural network (1972) much later called the Hopfield network (based on the original, century-old, non-learning Lenz-Ising recurrent network architecture, 1920-25)
Kunihiko Fukushima was awarded the 2021 Bower Award for his enormous contributions to deep learning, particularly his highly influential convolutional neural network architecture. My laudation of Kunihiko at the 2021 award ceremony is on YouTube:
The most cited neural nets all build on our work: LSTM. ResNet (open-gated Highway Net). AlexNet & VGG (like our DanNet). GAN (an instance of our Artificial Curiosity). Linear Transformers (like our Fast Weight Programmers).
Now on YouTube: “Modern Artificial Intelligence 1980s-2021 and Beyond.” My talk at AIJ 2020 (Moscow), also presented at NVIDIA GTC 2021 (US), ML Summit 2021 (Beijing), Big Data and AI (Toronto), IFIC (China), AI Boost (Lithuania), ICONIP 2021 (Jakarta)
In 2010, we used Jensen Huang's @nvidia GPUs to show that deep feedforward nets can be trained by plain backprop without any unsupervised pretraining. In 2011, our DanNet was the first superhuman CNN. Today, compute is 100+ times cheaper, and NVIDIA 100+ times more valuable. …
Stop crediting the wrong people for inventions made by others. At least in science, the facts will always win in the end. As long as the facts have not yet won, it is not yet the end. No fancy award can ever change that. #selfcorrectingscience #plagiarism
26 March 1991: Neural nets learn to program neural nets with fast weights - like today’s Transformer variants. Deep learning through additive weight changes. 2021: New work with Imanol & Kazuki. Also: fast weights for metalearning (1992-) and RL (2005-)
In 2016, at an AI conference in NYC, I explained artificial consciousness, world models, predictive coding, and science as data compression in less than 10 minutes. I happened to be in town, walked in without being announced, and ended up on their panel. It was great fun.…
1/3: “On the binding problem in artificial neural networks” with Klaus Greff and @vansteenkiste_s. An important paper from my lab that is of great relevance to the ongoing debate on symbolic reasoning and compositional generalization in neural networks:
375th birthday of Leibniz, founder of computer science (just published in FAZ, 17/5/2021): 1st machine with a memory (1673); 1st to perform all arithmetic operations. Principles of binary computers (1679). Algebra of Thought (1686). Calculemus!
Our #GPTSwarm models Large Language Model Agents and swarms thereof as computational graphs reflecting the hierarchical nature of intelligence. Graph optimization automatically improves nodes and edges.
I was invited to write a piece about Alan M. Turing. While he made significant contributions to computer science, their importance and impact are often greatly exaggerated - at the expense of the field's pioneers. It's not Turing's fault, though.
2010 foundations of the recent $NVDA stock market frenzy: our simple but deep neural net on @nvidia GPUs broke the MNIST record. Things are changing fast. Just 7 months ago, I tweeted: compute is 100x cheaper, $NVDA 100x more valuable. Today, replace "100" by "250."…
Instead of trying to defend his paper on OpenReview (where he posted it), @ylecun made misleading statements about me in popular science venues. I am debunking his recent allegations in the new Addendum III of my critique
2023: 20th anniversary of the Gödel Machine, a mathematically optimal, self-referential, meta-learning, universal problem solver making provably optimal self-improvements by rewriting its own computer code
GANs are special cases of Artificial Curiosity (1990) and also closely related to Predictability Minimization (1991). Now published in Neural Networks 127:58-66, 2020. #selfcorrectingscience #plagiarism
Open Access:
Preprint:
Re: more biologically plausible "forward-only” deep learning. 1/3 of a century ago, my "neural economy” was local in space and time (backprop isn't). Competing neurons pay "weight substance” to neurons that activate them (Neural Bucket Brigade, 1989)
30 years ago in a journal: "distilling" a recurrent neural network (RNN) into another RNN. I called it “collapsing” in Neural Computation 4(2):234-242 (1992), Sec. 4. Greatly facilitated deep learning with 20+ virtual layers. The concept has become popular
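A hedged toy sketch of the distillation idea in today's terms (PyTorch, arbitrary placeholder sizes and data; not the 1992 setup): a smaller student RNN is trained to match the outputs of a frozen teacher RNN instead of the original targets.

```python
# Toy RNN-to-RNN distillation ("collapsing"): student learns to mimic the teacher.
import torch, torch.nn as nn

torch.manual_seed(0)
seq = torch.randn(100, 20, 8)                      # (batch, time, features) toy sequences

class TinyRNN(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.rnn = nn.RNN(8, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 3)
    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)                         # per-step outputs

teacher = TinyRNN(hidden=64)                       # pretend this was trained on the real task
student = TinyRNN(hidden=16)                       # smaller net to distill into
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

with torch.no_grad():
    targets = teacher(seq)                         # teacher's outputs become the targets

for epoch in range(200):
    loss = nn.functional.mse_loss(student(seq), targets)
    opt.zero_grad(); loss.backward(); opt.step()

print("distillation loss:", loss.item())
```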
With Kazuki Irie and @robert_csordas at #ICML2022: any linear layer trained by gradient descent is a key-value/attention memory storing its entire training experience. This dual form helps us visualize how neural nets use training patterns at test time
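A quick numpy check of that dual form (my own toy illustration, not the ICML 2022 code): after plain SGD on a squared error, the trained linear layer's output on a test query equals its initial output plus unnormalised attention over the stored training inputs (keys = training inputs, values = -lr * error signals).

```python
# Primal form (weight updates) vs dual form (attention over training experience).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, lr, steps = 5, 3, 0.05, 50

W0 = rng.normal(size=(d_out, d_in))
W = W0.copy()
keys, values = [], []

for t in range(steps):                        # plain SGD on a squared error loss
    x = rng.normal(size=d_in)
    y = rng.normal(size=d_out)
    err = W @ x - y                           # gradient of 0.5*||Wx - y||^2 w.r.t. Wx
    W -= lr * np.outer(err, x)                # primal form: weight update
    keys.append(x)
    values.append(-lr * err)                  # dual form: store (key, value) instead

x_test = rng.normal(size=d_in)
primal = W @ x_test
dual = W0 @ x_test + sum(v * (k @ x_test) for k, v in zip(keys, values))
print(np.allclose(primal, dual))              # True: trained layer = attention memory
```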
KAUST (17 full papers at #NeurIPS2021) and its environment are now offering huge resources to advance both fundamental and applied AI research. We are hiring outstanding professors, postdocs, and PhD students:
KAUST, the university with the highest impact per faculty, has 24 papers at #NeurIPS2022. Visit Booth #415 of the @AI_KAUST Initiative! We are hiring at all levels.
1/3 century anniversary of my thesis on #metalearning (1987). For its cover I drew a robot that bootstraps itself. 1992-: gradient descent-based neural metalearning. 1994-: meta-RL with self-modifying policies. 2003-: optimal Gödel Machine. 2020: new stuff!
We address the two important things in science: (A) Finding answers to given questions, and (B) Coming up with good questions. Learning one abstract bit at a time through self-invented (thought) experiments encoded as neural networks
2021: Directing the AI Initiative at #KAUST, the university with the highest impact per faculty. Keeping current affiliations. Hiring at all levels. Great research conditions. Photographed a dolphin on a snorkeling trip off the coast of KAUST
In 2001, I discovered how to make very stable rings from only rectangular LEGO bricks. Natural tilting angles between LEGO pieces define ring diameters. The resulting low-complexity artworks reflect the formal theory of beauty/creativity/curiosity:
90th anniversary of Kurt Gödel's 1931 paper, which laid the foundations of theoretical computer science, identifying fundamental limitations of algorithmic theorem proving, computing, AI, logic, and math itself (just published in FAZ @faznet, 16/6/2021)
ACM lauds the awardees for work that did not cite the origins of the methods they used. I correct ACM's distortions of deep learning history and mention 8 of our direct priority disputes with Bengio & Hinton. #selfcorrectingscience
10-year anniversary: Deep Reinforcement Learning with Policy Gradients for LSTM. Applications: @DeepMind's StarCraft player; @OpenAI's dexterous robot hand & Dota player. @BillGates called this a huge milestone in advancing AI. #deeplearning
10-year anniversary of our deep multilayer perceptrons trained by plain gradient descent on GPU, outperforming all previous methods on a famous benchmark. This deep learning revolution quickly spread from Europe to North America and Asia. #deeplearning
2021: 10-year anniversary of the deep CNN revolution through DanNet (2011), named after my outstanding postdoc Dan Ciresan. Won 4 computer vision contests in a row before other CNNs joined the party. 1st superhuman result in 2011. Now everybody is using this
At ICANN 1993, I extended my 1991 unnormalised linear Transformer, introduced attention terminology for it, & published the "self-referential weight matrix." 3 decades later, they made me Chair of ICANN 2024 in Lugano. Call for papers (deadline March 25): …