Jürgen Schmidhuber

@SchmidhuberAI

106,529 Followers · 0 Following · 35 Media · 64 Statuses

Invented principles of meta-learning (1987), GANs (1990), Transformers (1991), very deep learning (1991), etc. Our AI is used many billions of times every day.

Switzerland, KSA
Joined August 2019
@SchmidhuberAI
Jürgen Schmidhuber
5 months
Thanks @elonmusk for your generous hyperbole! Admittedly, however, I didn’t invent sliced bread, just #GenerativeAI and things like that: And of course my team is standing on the shoulders of giants: Original tweet by @elonmusk:…
118
475
5K
@SchmidhuberAI
Jürgen Schmidhuber
4 months
The GOAT of tennis @DjokerNole said: “35 is the new 25.” I say: “60 is the new 35.” AI research has kept me strong and healthy. AI could work wonders for you, too!
168
147
2K
@SchmidhuberAI
Jürgen Schmidhuber
1 year
LeCun's "5 best ideas 2012-22” are mostly from my lab, and older: 1 Self-supervised 1991 RNN stack; 2 ResNet = open-gated 2015 Highway Net; 3&4 Key/Value-based fast weights 1991; 5 Transformers with linearized self-attention 1991. (Also GAN 1990.) Details:
32
205
2K
@SchmidhuberAI
Jürgen Schmidhuber
4 years
Quarter-century anniversary: 25 years ago we received a message from N(eur)IPS 1995 informing us that our submission on LSTM got rejected. (Don’t worry about rejections. They mean little.) #NeurIPS2020
6
294
2K
@SchmidhuberAI
Jürgen Schmidhuber
5 years
In 2020, we will celebrate that many of the basic ideas behind the Deep Learning Revolution were published three decades ago within fewer than 12 months in our "Annus Mirabilis" 1990-1991:
23
392
2K
@SchmidhuberAI
Jürgen Schmidhuber
5 months
Q*? 2015: reinforcement learning prompt engineer in Sec. 5.3 of “Learning to Think...”. A controller neural network C learns to send prompt sequences into a world model M (e.g., a foundation model) trained on, say, videos of actors. C also learns to…
47
250
2K
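As a rough illustration of the control loop sketched in the tweet above, here is a minimal, hypothetical PyTorch sketch: a controller C emits a short prompt sequence, a frozen stand-in world model M returns an outcome, and only C is updated from the resulting reward via a simple REINFORCE-style rule. The Controller class, the toy world_model and reward functions, and all hyperparameters are illustrative assumptions, not the method of the cited 2015 report.

import torch
import torch.nn as nn

class Controller(nn.Module):
    # C: maps its recurrent state to a distribution over a small prompt vocabulary.
    def __init__(self, state_dim=16, vocab=8):
        super().__init__()
        self.rnn = nn.GRUCell(vocab, state_dim)
        self.head = nn.Linear(state_dim, vocab)

    def step(self, prev_token_onehot, h):
        h = self.rnn(prev_token_onehot, h)
        return torch.distributions.Categorical(logits=self.head(h)), h

def world_model(prompt_tokens):
    # M: stand-in for a frozen foundation model; returns a toy scalar "answer".
    return sum(prompt_tokens) % 5

def reward(answer):
    # Toy objective: the controller should elicit the answer 3 from M.
    return 1.0 if answer == 3 else 0.0

vocab, state_dim = 8, 16
C = Controller(state_dim, vocab)
opt = torch.optim.Adam(C.parameters(), lr=1e-2)

for episode in range(200):
    h = torch.zeros(1, state_dim)
    prev = torch.zeros(1, vocab)
    log_probs, prompt = [], []
    for _ in range(4):                              # C emits a 4-token prompt sequence
        dist, h = C.step(prev, h)
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        prompt.append(int(tok))
        prev = nn.functional.one_hot(tok, vocab).float()
    r = reward(world_model(prompt))                 # query M, observe the outcome
    loss = -r * torch.stack(log_probs).sum()        # REINFORCE-style update of C only
    opt.zero_grad(); loss.backward(); opt.step()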
@SchmidhuberAI
Jürgen Schmidhuber
9 months
Meta used my 1991 ideas to train LLaMA 2, but made it insinuate that I “have been involved in harmful activities” and have not made “positive contributions to society, such as pioneers in their field.” @Meta & LLaMA promoter @ylecun should correct this ASAP. See…
61
169
1K
@SchmidhuberAI
Jürgen Schmidhuber
1 year
Regarding recent work on more biologically plausible "forward-only" backprop-like methods: in 2021, our VSML net already meta-learned backprop-like learning algorithms running solely in forward-mode - no hardwired derivative calculation!
18
177
1K
@SchmidhuberAI
Jürgen Schmidhuber
5 months
So @ylecun: "I've been advocating for deep learning architecture capable of planning since 2016" vs me: "I've been publishing deep learning architectures capable of planning since 1990." I guess in 2016 @ylecun also picked up the torch. (References attached)…
62
141
1K
@SchmidhuberAI
Jürgen Schmidhuber
2 years
Train a weight matrix to encode the backpropagation learning algorithm itself. Run it on the neural net itself. Meta-learn to improve it! Generalizes to datasets outside of the meta-training distribution. v4 2022 with @LouisKirschAI
13
214
1K
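For readers unfamiliar with the general idea, here is a generic learned-optimizer sketch in PyTorch: a small network g maps per-parameter (gradient, parameter) pairs to updates, and its weights are meta-trained by unrolling a few inner optimization steps on toy tasks and backpropagating through the unroll. This only illustrates meta-learning a learning algorithm in the broad sense; it is not the VSML architecture of the tweet above, and the task and hyperparameters are made-up assumptions.

import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))  # learned update rule
meta_opt = torch.optim.Adam(g.parameters(), lr=1e-3)

def inner_loss(w, target):
    return ((w - target) ** 2).sum()               # toy task: move w to a random target

for meta_step in range(300):
    target = torch.randn(5)
    w = torch.zeros(5, requires_grad=True)
    total = 0.0
    for _ in range(10):                            # unrolled inner optimization
        loss = inner_loss(w, target)
        grad, = torch.autograd.grad(loss, w, create_graph=True)
        inp = torch.stack([grad, w], dim=-1)       # per-coordinate features (grad, param)
        w = w + g(inp).squeeze(-1)                 # apply the learned update rule
        total = total + inner_loss(w, target)
    meta_opt.zero_grad()
    total.backward()                               # backprop through the unroll into g
    meta_opt.step()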
@SchmidhuberAI
Jürgen Schmidhuber
6 months
Silly AI regulation hype: One cannot regulate AI research, just like one cannot regulate math. One can regulate applications of AI in finance, cars, healthcare. Such fields already have continually adapting regulatory frameworks in place. Don’t stifle the open-source movement!…
52
218
1K
@SchmidhuberAI
Jürgen Schmidhuber
2 years
25th anniversary of the LSTM at #NeurIPS2021. reVIeWeR 2 - who rejected it from NeurIPS 1995 - was thankfully MIA. The subsequent journal publication in Neural Computation has become the most cited neural network paper of the 20th century:
14
167
1K
@SchmidhuberAI
Jürgen Schmidhuber
2 years
30 years ago: Transformers with linearized self-attention in NECO 1992, equivalent to fast weight programmers (apart from normalization), separating storage and control. Key/value was called FROM/TO. The attention terminology was introduced at ICANN 1993
26
140
1K
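The equivalence mentioned above (ignoring softmax normalization) can be checked numerically in a few lines. The sketch below, with arbitrary toy dimensions, computes causal unnormalized linear attention in parallel and then reproduces it with a fast weight matrix that is programmed by additive outer-product updates and read out with the query.

import torch

torch.manual_seed(0)
T, d = 6, 4                                  # sequence length, head dimension
q = torch.randn(T, d)                        # queries
k = torch.randn(T, d)                        # keys   ("FROM" in the 1992 terminology)
v = torch.randn(T, d)                        # values ("TO"   in the 1992 terminology)

# 1) Parallel form: causal, unnormalized linear attention.
scores = q @ k.T                             # (T, T) dot products
causal = torch.tril(torch.ones(T, T))        # mask out future positions
y_attention = (scores * causal) @ v          # y_t = sum_{i<=t} (q_t . k_i) v_i

# 2) Recurrent form: a fast weight matrix programmed by additive updates.
W = torch.zeros(d, d)                        # fast weights, start at zero
y_fast = []
for t in range(T):
    W = W + torch.outer(v[t], k[t])          # additive weight change v_t k_t^T
    y_fast.append(W @ q[t])                  # read out with the current query
y_fast = torch.stack(y_fast)

print(torch.allclose(y_attention, y_fast, atol=1e-5))  # True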
@SchmidhuberAI
Jürgen Schmidhuber
2 years
LeCun (@ylecun)'s 2022 paper on Autonomous Machine Intelligence rehashes but doesn’t cite essential work of 1990-2015. We’ve already published his “main original contributions:” learning subgoals, predictable abstract representations, multiple time scales…
33
195
1K
@SchmidhuberAI
Jürgen Schmidhuber
1 year
Machine learning is the science of credit assignment. My new survey (also under arXiv:2212.11279) credits the pioneers of deep learning and modern AI (supplementing my award-winning 2015 deep learning survey): P.S. Happy Holidays!
27
257
1K
@SchmidhuberAI
Jürgen Schmidhuber
2 years
Yesterday @nnaisense released EvoTorch, a state-of-the-art evolutionary algorithm library built on @PyTorch, with GPU acceleration and easy training on huge compute clusters using @raydistributed. (1/2)
10
209
1K
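For context, the sketch below is not the EvoTorch API (see the library's own documentation for that); it is just a minimal, self-contained illustration of the kind of GPU-friendly evolutionary search such a library automates: a simple Gaussian evolution strategy whose whole population is evaluated as one batched tensor operation, on GPU when available. The objective and all hyperparameters are arbitrary toy choices.

import torch

def objective(x):                                  # batched sphere function, lower is better
    return (x ** 2).sum(dim=-1)

dim, popsize, elite, sigma = 20, 256, 32, 0.3
device = "cuda" if torch.cuda.is_available() else "cpu"
mean = torch.randn(dim, device=device)

for _ in range(100):
    noise = torch.randn(popsize, dim, device=device)
    population = mean + sigma * noise              # sample candidates around the mean
    fitness = objective(population)                # evaluate all candidates in parallel
    best = fitness.argsort()[:elite]               # keep the 'elite' best candidates
    mean = population[best].mean(dim=0)            # move the mean toward the elites

print(float(objective(mean.unsqueeze(0))))         # final objective value near zero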
@SchmidhuberAI
Jürgen Schmidhuber
5 months
How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. More than a dozen concrete AI priority disputes under
48
132
960
@SchmidhuberAI
Jürgen Schmidhuber
4 months
Best paper award for "Mindstorms in Natural Language-Based Societies of Mind" at #NeurIPS2023 WS Ro-FoMo. Up to 129 foundation models collectively solve practical problems by interviewing each other in monarchical or democratic societies
25
128
917
@SchmidhuberAI
Jürgen Schmidhuber
8 months
Unlike diffusion models, Bayesian Flow Networks operate on the parameters of data distributions, rather than on noisy versions of the data itself. I think this paper by Alex Graves et al. will be influential.
@nnaisense
NNAISENSE
9 months
📣 BFNs: A new class of generative models that
- brings together the strengths of Bayesian inference and deep learning
- trains on continuous, discretized or discrete data with a simple end-to-end loss
- places no restrictions on the network architecture
6
117
512
11
144
894
@SchmidhuberAI
Jürgen Schmidhuber
5 months
AI boom v AI doom: since the 1970s, I have told AI doomers that in the end all will be good. E.g., 2012 TEDx talk: “Don’t think of us versus them: us, the humans, v these future super robots. Think of yourself, and humanity in general, as a small stepping…
62
126
795
@SchmidhuberAI
Jürgen Schmidhuber
4 years
Congrats to the awesome Sepp Hochreiter for the well-deserved 2021 IEEE Neural Networks Pioneer Award! It was my great honor to be Sepp's nominator.
10
67
756
@SchmidhuberAI
Jürgen Schmidhuber
1 year
As 2022 ends: 1/2 century ago, Shun-Ichi Amari published a learning recurrent neural network (1972) much later called the Hopfield network (based on the original, century-old, non-learning Lenz-Ising recurrent network architecture, 1920-25)
7
140
748
@SchmidhuberAI
Jürgen Schmidhuber
3 years
Kunihiko Fukushima was awarded the 2021 Bower Award for his enormous contributions to deep learning, particularly his highly influential convolutional neural network architecture. My laudation of Kunihiko at the 2021 award ceremony is on YouTube:
6
135
679
@SchmidhuberAI
Jürgen Schmidhuber
3 years
The most cited neural nets all build on our work: LSTM. ResNet (open-gated Highway Net). AlexNet & VGG (like our DanNet). GAN (an instance of our Artificial Curiosity). Linear Transformers (like our Fast Weight Programmers).
31
86
671
@SchmidhuberAI
Jürgen Schmidhuber
2 years
Now on YouTube: “Modern Artificial Intelligence 1980s-2021 and Beyond.” My talk at AIJ 2020 (Moscow), also presented at NVIDIA GTC 2021 (US), ML Summit 2021 (Beijing), Big Data and AI (Toronto), IFIC (China), AI Boost (Lithuania), ICONIP 2021 (Jakarta)
11
109
640
@SchmidhuberAI
Jürgen Schmidhuber
9 months
In 2010, we used Jensen Huang's @nvidia GPUs to show that deep feedforward nets can be trained by plain backprop without any unsupervised pretraining. In 2011, our DanNet was the first superhuman CNN. Today, compute is 100+ times cheaper, and NVIDIA 100+ times more valuable.…
6
54
618
@SchmidhuberAI
Jürgen Schmidhuber
4 years
Stop crediting the wrong people for inventions made by others. At least in science, the facts will always win in the end. As long as the facts have not yet won, it is not yet the end. No fancy award can ever change that. #selfcorrectingscience #plagiarism
20
159
606
@SchmidhuberAI
Jürgen Schmidhuber
3 years
26 March 1991: Neural nets learn to program neural nets with fast weights - like today’s Transformer variants. Deep learning through additive weight changes. 2021: New work with Imanol & Kazuki. Also: fast weights for metalearning (1992-) and RL (2005-)
5
113
598
@SchmidhuberAI
Jürgen Schmidhuber
2 months
In 2016, at an AI conference in NYC, I explained artificial consciousness, world models, predictive coding, and science as data compression in less than 10 minutes. I happened to be in town, walked in without being announced, and ended up on their panel. It was great fun.…
37
92
605
@SchmidhuberAI
Jürgen Schmidhuber
4 years
The 2010s: Our Decade of Deep Learning / Outlook on the 2020s (also addressing privacy and data markets)
0
199
578
@SchmidhuberAI
Jürgen Schmidhuber
2 years
1/3: “On the binding problem in artificial neural networks” with Klaus Greff and @vansteenkiste_s . An important paper from my lab that is of great relevance to the ongoing debate on symbolic reasoning and compositional generalization in neural networks:
5
105
521
@SchmidhuberAI
Jürgen Schmidhuber
3 years
375th birthday of Leibniz, founder of computer science (just published in FAZ, 17/5/2021): 1st machine with a memory (1673); 1st to perform all arithmetic operations. Principles of binary computers (1679). Algebra of Thought (1686). Calculemus!
7
91
507
@SchmidhuberAI
Jürgen Schmidhuber
2 months
Our #GPTSwarm models Large Language Model Agents and swarms thereof as computational graphs reflecting the hierarchical nature of intelligence. Graph optimization automatically improves nodes and edges.
14
90
506
@SchmidhuberAI
Jürgen Schmidhuber
3 years
I was invited to write a piece about Alan M. Turing. While he made significant contributions to computer science, their importance and impact are often greatly exaggerated - at the expense of the field's pioneers. It's not Turing's fault, though.
34
111
493
@SchmidhuberAI
Jürgen Schmidhuber
2 months
2010 foundations of recent $NVDA stock market frenzy: our simple but deep neural net on @nvidia GPUs broke MNIST. Things are changing fast. Just 7 months ago, I tweeted: compute is 100x cheaper, $NVDA 100x more valuable. Today, replace "100" by "250."…
15
52
500
@SchmidhuberAI
Jürgen Schmidhuber
1 year
Instead of trying to defend his paper on OpenReview (where he posted it), @ylecun made misleading statements about me in popular science venues. I am debunking his recent allegations in the new Addendum III of my critique
16
63
477
@SchmidhuberAI
Jürgen Schmidhuber
6 months
2023: 20th anniversary of the Gödel Machine, a mathematically optimal, self-referential, meta-learning, universal problem solver making provably optimal self-improvements by rewriting its own computer code
12
90
459
@SchmidhuberAI
Jürgen Schmidhuber
4 years
GANs are special cases of Artificial Curiosity (1990) and also closely related to Predictability Minimization (1991). Now published in Neural Networks 127:58-66, 2020. #selfcorrectingscience #plagiarism Open Access: Preprint:
10
82
459
@SchmidhuberAI
Jürgen Schmidhuber
1 year
Re: more biologically plausible "forward-only” deep learning. 1/3 of a century ago, my "neural economy” was local in space and time (backprop isn't). Competing neurons pay "weight substance” to neurons that activate them (Neural Bucket Brigade, 1989)
9
59
431
@SchmidhuberAI
Jürgen Schmidhuber
1 year
30 years ago in a journal: "distilling" a recurrent neural network (RNN) into another RNN. I called it “collapsing” in Neural Computation 4(2):234-242 (1992), Sec. 4. Greatly facilitated deep learning with 20+ virtual layers. The concept has become popular
10
63
427
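As a generic illustration of the distillation idea (not the 1992 "collapsing" procedure itself; all sizes, data, and the use of random weights for the teacher are made-up assumptions), the sketch below trains a small student RNN to reproduce the per-timestep outputs of a larger, frozen teacher RNN on the same input sequences.

import torch
import torch.nn as nn

class RNNModel(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.rnn = nn.RNN(input_size=4, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)
    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)                         # one output vector per timestep

teacher = RNNModel(hidden=64).eval()               # frozen teacher (randomly initialized
                                                   # here as a stand-in for a trained net)
student = RNNModel(hidden=8)                       # much smaller student
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(16, 20, 4)                     # batch of sequences (B, T, features)
    with torch.no_grad():
        target = teacher(x)                        # teacher's per-step outputs as targets
    loss = ((student(x) - target) ** 2).mean()     # student imitates the teacher
    opt.zero_grad(); loss.backward(); opt.step()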
@SchmidhuberAI
Jürgen Schmidhuber
2 years
With Kazuki Irie and @robert_csordas at #ICML2022 : any linear layer trained by gradient descent is a key-value/attention memory storing its entire training experience. This dual form helps us visualize how neural nets use training patterns at test time
5
81
395
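The duality stated in the tweet above can be verified directly in a toy setting. The sketch below (assuming batch size 1, plain SGD, and a squared loss; these specifics are illustrative) trains a linear layer for a few steps while recording each training input as a key and each scaled error signal as a value, then checks that the trained layer's output on a test query equals the initial layer's output plus unnormalized attention over the stored key/value pairs.

import torch

torch.manual_seed(0)
d_in, d_out, steps, lr = 5, 3, 20, 0.1
W0 = torch.randn(d_out, d_in)
W = W0.clone()

keys, values = [], []                              # training inputs and update "values"
for _ in range(steps):
    x = torch.randn(d_in)
    target = torch.randn(d_out)
    y = W @ x
    g = y - target                                 # dL/dy for the loss 0.5*||y - target||^2
    W = W - lr * torch.outer(g, x)                 # SGD: rank-1 additive weight change
    keys.append(x)
    values.append(-lr * g)

# Primal form: apply the trained weights to a test query.
x_test = torch.randn(d_in)
primal = W @ x_test

# Dual form: initial layer + attention over (key, value) pairs from training.
dual = W0 @ x_test
for k, v in zip(keys, values):
    dual = dual + v * (k @ x_test)                 # value weighted by key-query dot product

print(torch.allclose(primal, dual, atol=1e-5))     # True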
@SchmidhuberAI
Jürgen Schmidhuber
2 years
KAUST (17 full papers at #NeurIPS2021) and its environment are now offering huge resources to advance both fundamental and applied AI research. We are hiring outstanding professors, postdocs, and PhD students:
6
85
384
@SchmidhuberAI
Jürgen Schmidhuber
1 year
KAUST, the university with the highest impact per faculty, has 24 papers at #NeurIPS2022. Visit Booth #415 of the @AI_KAUST Initiative! We are hiring on all levels.
11
35
381
@SchmidhuberAI
Jürgen Schmidhuber
3 years
1/3 century anniversary of my thesis on #metalearning (1987). For its cover I drew a robot that bootstraps itself. 1992-: gradient descent-based neural metalearning. 1994-: meta-RL with self-modifying policies. 2003-: optimal Gödel Machine. 2020: new stuff!
2
63
371
@SchmidhuberAI
Jürgen Schmidhuber
2 months
Numbers are lining up: one tweet per year of my life; one follower per LSTM citation 👍🙂
17
10
365
@SchmidhuberAI
Jürgen Schmidhuber
1 year
We address the two important things in science: (A) Finding answers to given questions, and (B) Coming up with good questions. Learning one abstract bit at a time through self-invented (thought) experiments encoded as neural networks
8
68
361
@SchmidhuberAI
Jürgen Schmidhuber
3 years
30-year anniversary of #Planning & #ReinforcementLearning with recurrent #WorldModels and #ArtificialCuriosity (1990). Also: high-dimensional reward signals, deterministic policy gradients, #GAN principle, and even simple #Consciousness & #SelfAwareness
2
65
354
@SchmidhuberAI
Jürgen Schmidhuber
3 years
2021: Directing the AI Initiative at #KAUST, the university with the highest impact per faculty. Keeping current affiliations. Hiring on all levels. Great research conditions. Photographed a dolphin on a snorkeling trip off the coast of KAUST
12
51
355
@SchmidhuberAI
Jürgen Schmidhuber
3 years
In 2001, I discovered how to make very stable rings from only rectangular LEGO bricks. Natural tilting angles between LEGO pieces define ring diameters. The resulting low-complexity artworks reflect the formal theory of beauty/creativity/curiosity:
6
39
357
@SchmidhuberAI
Jürgen Schmidhuber
3 years
90th anniversary of Kurt Gödel's 1931 paper which laid the foundations of theoretical computer science, identifying fundamental limitations of algorithmic theorem proving, computing, AI, logics, and math itself (just published in FAZ @faznet 16/6/2021)
3
70
343
@SchmidhuberAI
Jürgen Schmidhuber
4 years
ACM lauds the awardees for work that did not cite the origins of the methods used. I correct ACM's distortions of deep learning history and mention 8 of our direct priority disputes with Bengio & Hinton. #selfcorrectingscience
13
66
314
@SchmidhuberAI
Jürgen Schmidhuber
3 years
10-year anniversary: Deep Reinforcement Learning with Policy Gradients for LSTM. Applications: @DeepMind’s Starcraft player; @OpenAI's dextrous robot hand & Dota player - @BillGates called this a huge milestone in advancing AI #deeplearning
5
58
314
@SchmidhuberAI
Jürgen Schmidhuber
4 years
10-year anniversary of our deep multilayer perceptrons trained by plain gradient descent on GPU, outperforming all previous methods on a famous benchmark. This deep learning revolution quickly spread from Europe to North America and Asia. #deeplearning
3
76
310
@SchmidhuberAI
Jürgen Schmidhuber
3 years
3 decades of artificial curiosity & creativity. Our artificial scientists not only answer given questions but also invent new questions
3
67
297
@SchmidhuberAI
Jürgen Schmidhuber
3 years
15-year anniversary: first paper with "learn deep" in the title (2005). On deep #ReinforcementLearning & #NeuroEvolution solving problems of depth 1000 and more. 1st author: Faustino Gomez! #deeplearning #deepRL
0
56
289
@SchmidhuberAI
Jürgen Schmidhuber
3 years
2021: 10-year anniversary of deep CNN revolution through DanNet (2011), named after my outstanding postdoc Dan Ciresan. Won 4 computer vision contests in a row before other CNNs joined the party. 1st superhuman result in 2011. Now everybody is using this
0
34
232
@SchmidhuberAI
Jürgen Schmidhuber
1 month
At ICANN 1993, I extended my 1991 unnormalised linear Transformer, introduced attention terminology for it, & published the "self-referential weight matrix." 3 decades later, they made me Chair of ICANN 2024 in Lugano. Call for papers (deadline March 25): …
12
17
234
@SchmidhuberAI
Jürgen Schmidhuber
2 years
@hardmaru This was accepted at ICML 2022. Thanks to Kazuki Irie, Imanol Schlag, and Róbert Csordás!
3
2
109
@SchmidhuberAI
Jürgen Schmidhuber
1 year
@yannx0130 sure, see the experiments
1
0
2