More excited than ever to announce $1.7M:
"CBS-NTT Program in Physics of Intelligence at Harvard"! 🧠
With new technology comes new science. The time is ripe to build a better future with "Physics of Intelligence for Trustworthy and Green AI"!
🧵👇
Wow!! Emperor Akihito of Japan, who is now 85 years old, has published his last scientific paper in ichthyology before his abdication. Great respect for pure scientific curiosity.
Q. Can we solve learning dynamics of modern deep learning models trained on large datasets?
A. Yes, by combining symmetry and modified equation analysis!
co-led with
@KuninDaniel
(now on twitter)
&
@jvrsgsty
@SuryaGanguli
@dyamins
Neural Mechanics
1/8
Q. What does Noether’s theorem tell us about the “geometry of deep learning dynamics”?
A. We derive Noether’s Learning Dynamics and show:
"SGD+momentum+BatchNorm+weight decay" = "RMSProp" due to symmetry breaking!
w/
@KuninDaniel
#NeurIPS2021
Paper:
1/
Q. Can we find winning lottery tickets, or sparse trainable deep networks at initialization without ever looking at data?
A. Yes, by conserving "Synaptic Flow" via our new SynFlow algorithm.
co-led with Daniel Kunin
&
@dyamins
,
@SuryaGanguli
paper:
1/
[delayed personal update]
Excited to share that I’ve recently relocated and joined the Center for Brain Science at Harvard as an Associate for an industry-academia collaboration!🧠
We’ll continue to work on problems at the interface of physics, neuroscience, and machine learning.
1/
🧠 Internship openings: NTT Research at Harvard! 🤖
Want to solve cutting-edge problems in deep learning by theory-guided algorithm design? Want to apply tools and ideas in ML to understand the brain?
Come join us this summer at the Center for Brain Science at Harvard!
1/
BatchNorm, LayerNorm, WeightNorm…
Too many normalization layers to choose from? Which one to use, when, and how?
Theory can guide you!
led by
@EkdeepL
, who amazingly bridges theory & practice of deep learning!
A multitude of normalization layers have been proposed recently, but are we ready to replace BatchNorm yet? In our new preprint, we address this question by developing a unified understanding of normalization layers in deep learning.
arXiv link:
📰
#NeurIPS2023
Paper Alert!🥳
Q. How far can generative models generalize?
Through the lens of compositionality, we introduce a "Concept Graph" and propose (i) a notion of the distance of generalization and (ii) the "Multiplicative Emergence" hypothesis.
1/n
co-led
Want to train RNNs on the large-scale, observed activity of tens of thousands of neurons in real-time? We're introducing CORNN! 🎉
Led by
@fatihdin4en
, and A. Shai, with M. Schnitzer from Stanford.
Paper link:
Meet us now at
#NeurIPS2023
: 10:45 AM, Tue, Dec
At
#NeurIPS2023
to share a series of works on "Physics of AI" and "AI for Neuroscience" from NTT Research at Harvard CBS team! 🎉
Find me at the posters and/or DM me to chat & explore collaborations!
Can't thank the Summer 2023 team enough♥️:
@EkdeepL
@fatihdin4en
@MayaOkawa
"Interested in the 'science of modern AI' to make it more trustworthy and efficient?
Seeking fundamental principles of learning and computation in biological and artificial intelligence?
Let's explore them together this summer at Harvard's Center for Brain Science!"
1/
Very excited to be back at Stanford tomorrow to present at the Mind, Brain, Computation and Technology Seminar! 🧠🤖
Please come and join us if you are in the area and interested in physics, neuroscience, and AI.
The best textbook on the theory and phenomenology of neural learning dynamics, and I highly recommend it to anyone fascinated by the subject!
Contributing to the content of this textbook has been a major goal of my research. It has directly inspired a number of past projects,
Two years ago, I taught a topics course on neural net training dynamics. While this isn't about safety/alignment per se, I recommend working through it if you're interested in safety/alignment of LLMs.
1/Our paper
@NeuroCellPress
"Interpreting the retinal code for natural scenes" develops explainable AI (
#XAI
) to derive a SOTA deep network model of the retina and *understand* how this net captures natural scenes plus 8 seminal experiments over >2 decades
In 1992, Emperor Akihito wrote an article in Science titled "Early Cultivators of Science in Japan".
And yes, the Emperor has an official publication list (in Japanese) if you are interested.
Overall, our work provides a first step towards understanding the mechanics of learning in neural networks without unrealistic simplifying assumptions
Check out the paper for more details:
8/8
2
#NeurIPS2021
papers with the amazing first interns at PHI Lab
@NttResearch
!
(i) unified understanding of normalization layers w/
@EkdeepL
, (ii) Noether’s theorem and the geometry of learning dynamics w/
@KuninDaniel
,
both advancing the practice of deep learning through theory
Join us for an exciting summer of research!
Apply here now: “PHI Lab: Research Intern 2023”
If you are interested in working with me, please mention it in the “Accompanying Message” section.
Our monthly Colloquium series kicks off again this Friday (Sept 15)! Check out the exciting lineup we have for Fall 2023 and join us at 2 pm ET on YouTube:
@AIVOInfo
@Hidenori8Tanaka
Every symmetry of a network has a corresponding conserved quantity through training under gradient flow (Noether's theorem for neural networks!)
For translation, scale, and rescale symmetries, the flow is constrained to a hyperplane, sphere, and hyperbola, respectively
4/8
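For concreteness, here is a schematic of the three conserved quantities under gradient flow (my shorthand, not the paper's exact notation):

```latex
% Conserved quantities under gradient flow (schematic):
\frac{d}{dt}\,\mathbb{1}^{\top}\theta = 0
  \quad\text{(translation symmetry, pre-softmax)} \;\Rightarrow\; \text{hyperplane}
\\
\frac{d}{dt}\,\lVert\theta\rVert^{2} = 0
  \quad\text{(scale symmetry, pre-BatchNorm)} \;\Rightarrow\; \text{sphere}
\\
\frac{d}{dt}\big(\lVert\theta_{\mathrm{in}}\rVert^{2}-\lVert\theta_{\mathrm{out}}\rVert^{2}\big) = 0
  \quad\text{(rescale symmetry, ReLU)} \;\Rightarrow\; \text{hyperbola}
```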
Overall, our data-agnostic pruning algorithm challenges the existing paradigm that data must be used to quantify which synapses are important.
Please check out the paper for more details
7/
Overall, understanding not only when symmetries exist but also how they are broken is essential to discovering geometric design principles in neural networks.
For more details see
“Noether’s Learning Dynamics: Role of Symmetry Breaking in Neural Networks":
10/
Looking forward to attending NeurIPS and seeing everyone next week!
I'd love to chat about any combination of deep learning and physics, dynamical systems, symmetry, neuroscience, mechanistic understanding
+ internship/collaboration opportunities!
Please feel free to DM me ;-)
Q. How can we fix the decision mechanism of a pre-trained model efficiently?
A. Mechanistic fine-tuning by driving the model over a barrier on the landscape!
led by
@EkdeepL
, interning at
@NttResearch
at Harvard
w/
@bigel_o
R. Dick
@DavidSKrueger
Really enjoyed presenting and attending the
@iaifi_news
symposium at MIT today!
It was great to learn about the two-way interaction between physics and AI.
Interested in fundamental research at the interface of Physics and Informatics?
Join us at NTT Physics & Informatics Lab! ()
Applications for the internship program are now open.
🧠Final Call for Internships: NTT Research at Harvard!🤖
Interested in the science of modern AI?
Want to explore the principles of natural and artificial intelligence?
Join us for a unique industry internship for academic research!
Recent works:
1/
Is there a theoretical analogy between ring attractor neural networks and Anderson localization in quantum systems?
With David Nelson, we discovered a new class of random matrices whose eigenvectors are quasi-localized even with fully dense connections.
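(Illustrative aside, not the construction from our paper: eigenvector localization is commonly quantified with the inverse participation ratio. A minimal numerical sketch with a generic dense random matrix:)

```python
# Hypothetical sketch: measure how localized the eigenvectors of a dense
# random matrix are via the inverse participation ratio (IPR).
import numpy as np

rng = np.random.default_rng(0)
N = 500
J = rng.normal(size=(N, N)) / np.sqrt(N)   # generic dense random connectivity (illustrative)

eigvals, eigvecs = np.linalg.eig(J)
w = np.abs(eigvecs) ** 2
w /= w.sum(axis=0)                          # normalize each eigenvector's weight
ipr = (w ** 2).sum(axis=0)                  # ~1/N for extended modes, O(1) for localized modes

print(f"mean IPR = {ipr.mean():.4f}, max IPR = {ipr.max():.4f}")
```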
Please join us soon at our poster at
#NeurIPS2023
to discuss compositionality, learning dynamics, and emergence in multimodal diffusion models!
Time: Thu 14, Dec 10:45 am CST - 12:45 pm CST Location: Great Hall & Hall B1+B2 (Level 1)
#2021
As a result, gradient descent becomes Lagrangian dynamics with a finite learning rate, where the learning rule corresponds to the kinetic energy and the loss function corresponds to the potential energy.
5/
@boazbaraktcs
@RylanSchaeffer
@BrandoHablando
@sanmikoyejo
Really enjoyed this blog post!
Sharing our NeurIPS paper on the "Multiplicative Emergence of Compositional Abilities",
where we designed an "interpretable task" and showed that "abilities that require composition of atomic abilities show emergent curves".😀
paper:
To study the complex learning dynamics of neural networks, existing works make major assumptions (e.g., a single hidden layer, linear networks, or infinite width)
Instead, we uncover combinations of parameters with simplified dynamics that we solve exactly, without a single such assumption
2/8
At
#ICML2023
to present our mechanistic take on mode connectivity at Poster Session 4, Exhibit Hall 1 from 2--3:30 PM HST this Wednesday (tomorrow)!
DM me if you want to meet & chat!
The realistic model of SGD breaks the conservation laws of gradient flow, resulting in simple first- and second-order ODEs
We can solve these ODEs exactly, leading to theoretical solutions that we empirically verify on VGG-16 trained on Tiny ImageNet
6/8
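Schematically, for a scale-symmetric layer the (formerly conserved) squared norm now obeys a simple first-order ODE; a rough form, with momentum and constants dropped:

```latex
\frac{d}{dt}\,\lVert\theta\rVert^{2} \;\approx\; -2\lambda\,\lVert\theta\rVert^{2} \;+\; \eta\,\lVert\nabla_{\theta}\mathcal{L}\rVert^{2}
% weight decay \lambda shrinks the norm exponentially, while the finite learning
% rate \eta contributes a growth term; their balance sets the quasi-stationary norm.
```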
These solutions confirm existing phenomena, such as the spherical motion of parameters before batch normalization, while highlighting new phenomena, such as the harmonic motion of parameters before the softmax function
7/8
We develop Lagrangian mechanics of learning by modeling it as the motion of a particle in high-dimensional parameter space. Just like physical dynamics, we can model the trajectory of discrete learning dynamics by continuous-time differential equations.
3/
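In a rough, constants-omitted form (my schematic, not the paper's exact equations):

```latex
% SGD with momentum as a damped particle in the loss landscape:
m\,\ddot{\theta} \;+\; \gamma\,\dot{\theta} \;=\; -\nabla_{\theta}\mathcal{L}(\theta)
% kinetic energy T = \tfrac{1}{2} m \lVert\dot{\theta}\rVert^{2} is set by the learning rule,
% potential energy U = \mathcal{L}(\theta) is the loss; the effective mass m and
% damping \gamma are set by the learning rate and momentum coefficient.
```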
Gradients and Hessians, at all points in training, obey geometric constraints due to symmetry
A network has a symmetry if the loss doesn’t change under a transformation of the parameters (e.g., translation, scale, and rescale for parameters preceding the softmax, BatchNorm, and ReLU, respectively)
3/8
Gradient flow is too simple for realistic SGD training. We construct a more realistic model considering weight decay, momentum, mini-batches, and a finite learning rate
We use modified equation analysis to model the effect of discretization (as also done in recent works)
5/8
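As a familiar example of what modified equation (backward error) analysis produces (this particular formula is Barrett & Dherin's implicit gradient regularization, shown only to illustrate the tool): discrete gradient descent with learning rate η tracks, to first order, a modified flow

```latex
\dot{\theta} \;=\; -\nabla_{\theta}\!\left(\mathcal{L}(\theta) \;+\; \tfrac{\eta}{4}\,\lVert\nabla_{\theta}\mathcal{L}(\theta)\rVert^{2}\right)
```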
However, there is still a gap between Newton’s EOM and gradient flow. Thus, we model the effects of finite learning rate as “implicit acceleration”, a complementary route to the "implicit gradient regularization" by
@dgtbarrett
, Benoit Dherin,
@SamuelMLSmith
,
@sohamde_
.
4/
Interesting topic. On the interaction between biological and artificial intelligence, by Prof. Kunihiko Fukushima, father of the Neocognitron (the inspiration for CNNs), recorded in 2002.
@ylecun
@IRudyak
Yes, agreed. I think it is a classic analogy that has been around for a while. And yes, I also agree that many people are trying to copy how the brain works at too low a level, but we have always believed that systems-level neuroscience has an important part to play in developing AI
We can potentially reduce the cost of training if we can prune neural networks at initialization.
The key challenge is "layer-collapse," the premature pruning of an entire layer, which makes a network untrainable.
2/
NTT has just founded the "Institute for Fundamental Mathematics" to work on, yes, mathematics!
I like how this press release (in Japanese) quotes Eugene Wigner's "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" as the motivation.
We are presenting "Pruning neural networks without any data by iteratively conserving synaptic flow" at
#NeurIPS2020
poster session (B2) today. Please visit us!
Updated manuscript is here:
We derive Noether’s Learning Dynamics (NLD), a unified equality that holds for any combination of symmetry and learning rules. NLD accounts for damping, the unique symmetries of the loss, and the non-Euclidean metric used in learning rules.
8/
We establish an exact analogy between two seemingly unrelated components of modern deep learning: normalization and adaptive optimization.
Benefits of this broken-symmetry-induced “implicit adaptive optimization” are all empirically confirmed!
9/
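A rough intuition for where the adaptivity comes from (schematic, constants omitted): for a scale-invariant layer the gradient magnitude scales as 1/‖θ‖, so the effective step on the unit sphere is

```latex
\eta_{\mathrm{eff}} \;\propto\; \frac{\eta}{\lVert\theta\rVert^{2}}
% and with weight decay the equilibrium value of \lVert\theta\rVert^{2} tracks recent
% gradient magnitudes, yielding an RMSProp-like per-layer adaptive learning rate.
```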
We prove that layer-collapse can be entirely avoided by designing an algorithm with iterative, positive, conservative scoring.
We design SynFlow to satisfy these requirements and show that it reaches the theoretical limit of maximal compression without collapsing the network.
5/
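For concreteness, a minimal sketch of one SynFlow scoring pass, paraphrased in PyTorch (not the reference implementation; the image-shaped input size is just an example):

```python
# Minimal sketch of one SynFlow scoring pass (paraphrase, not the authors' code).
import torch
import torch.nn as nn

def synflow_scores(model: nn.Module, input_shape=(1, 3, 32, 32)):
    """Score each parameter as |dR/dtheta * theta|, where R is the output sum of
    the |theta|-valued network evaluated on an all-ones input (no data needed)."""
    # Temporarily replace every tensor in the model by its absolute value.
    signs = {}
    for name, t in model.state_dict().items():
        signs[name] = torch.sign(t)
        t.abs_()

    model.eval()                       # keep BatchNorm buffers fixed during the pass
    model.zero_grad()
    ones = torch.ones(input_shape)     # all-ones "input": the scores never see data
    R = model(ones).sum()              # scalar "synaptic flow" objective
    R.backward()

    scores = {n: (p.grad * p).detach().abs()
              for n, p in model.named_parameters() if p.grad is not None}

    # Restore the original signs.
    for name, t in model.state_dict().items():
        t.mul_(signs[name])
    return scores

# Iterative pruning then repeats (e.g., ~100 rounds): compute scores, mask the
# lowest-scoring synapses, and re-score, following a compression schedule.
```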
By studying the symmetry properties of the kinetic energy, we define “kinetic symmetry breaking”, where the kinetic energy corresponding to the learning rule explicitly breaks the symmetry of the potential energy corresponding to the loss function.
7/
Symmetry properties of this Lagrangian govern the geometry of learning dynamics.
Indeed, modern deep learning architectures introduce an array of symmetries to the loss function as we previously studied in .
6/
@SamuelAinsworth
@stanislavfort
@siddhss5
Very interesting discussions!
a thought: could the lack of NormLayers be the reason why ResNets with NormLayers can use SGD with momentum, while MLPs can only use Adam?
hope the theory below, showing
"NormLayer+SGD+momentum+WD = RMSProp",
may be helpful here!
Q. What does Noether’s theorem tell us about the “geometry of deep learning dynamics”?
A. We derive Noether’s Learning Dynamics and show:
"SGD+momentum+BatchNorm+weight decay" = "RMSProp" due to symmetry breaking!
w/
@KuninDaniel
#NeurIPS2021
Paper:
1/
Don't miss out on
@EkdeepL
’s poster on "Mechanistic Mode Connectivity" today at the
#NeurIPS2022
Workshop on Distribution Shifts!
Room 388 - 390 (Poster Session: 11:00-12:30)
Preprint time! 🧵
DNNs can use entirely distinct prediction mechanisms to solve a task (e.g., background vs. shape).
Q1: Are such models mode-connected in the landscape?
Q2: Can we change a model’s mechanisms by exploiting such connectivity?
Link:
1/12
We are excited to be organizing a Symposium on the Impact of Generative AI in the Physical Sciences next Thursday, March 14 and Friday, March 15! Join us on the 8th Floor of
@MIT_SCC
for a great lineup of speakers and panelists. Zoom link available soon.
To better understand this phenomenon, we first mathematically formulate and experimentally verify a conservation law.
This conservation law explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse.
3/
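Schematically (paraphrasing the result): any gradient-based saliency of the form S(θ) = ∂R/∂θ ⊙ θ is conserved neuron-wise,

```latex
\Big\langle \frac{\partial R}{\partial \theta_{\mathrm{in}}},\, \theta_{\mathrm{in}} \Big\rangle
\;=\;
\Big\langle \frac{\partial R}{\partial \theta_{\mathrm{out}}},\, \theta_{\mathrm{out}} \Big\rangle
% so total saliency is balanced across layers; wide layers receive low average
% scores, and single-shot pruning at high compression can remove an entire layer.
```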
Notably, SynFlow makes no reference to the training data and consistently outperforms existing state-of-the-art pruning algorithms at initialization on 12 distinct combinations of models and datasets.
6/
I regret leaving the Bay Area during the pandemic, and I'd like to thank my mentors -
@SuryaGanguli
, without whom I wouldn’t be working in this exciting new frontier of science, Daniel Fisher and Stephen Baccus for their guidance and support, and all the friends!
2/
The energy and momentum at Harvard for the science of deep learning, with an interdisciplinary approach that combines physics, neuroscience, and cognitive science, is truly amazing. In the coming years, I hope to contribute in some small way to the vibrant community in the Boston area.
4/
It was a pleasure to speak at
@NttResearch
's
#Upgrade2024
event in San Francisco this month about how a scientific approach can make AI more trustworthy and energy efficient! 🧠💡
The history of industrial revolutions has been a story of understanding and harnessing the emergent abilities of complex systems.
Steam engines, electricity, transistors, & liquid crystals all sparked new fields of physics to control their powers. 🔥💡🌈
Nice article from the Harvard Crimson about our "CBS-NTT Program in Physics of Intelligence"! 🧠
“This is new for all of us. How do you explain intelligent behavior in equations or in physics terms?”
Harvard University’s Center for Brain Science received a gift of more than $300,000 per year for up to five years from the NTT Research Foundation, the foundation announced Thursday.
Eunice S. Chae, Patil Djerdjerian, and Rachel M. Fields report.
We then hypothesize that conservative scoring combined with "iterative" re-evaluation can avoid layer-collapse.
This insight also explains how iterative magnitude pruning avoids layer-collapse to identify "winning lottery ticket" subnetworks at initialization.
4/
Nice detailed coverage of our "CBS-NTT Program in Physics of Intelligence"🧠
"As history teaches us, inventions can lead to new fields in physics. Today, AI is playing that role ... to explore fundamental questions involving the science of intelligence"
Our young group funded by NTT Physics and Informatics Lab () uniquely bridges industry and academia, focusing on the intersection of physics, neuroscience, and machine learning. View our recent works here:
2/
@RogerGrosse
How about "multiplicative emergence" or "compositional abilities"?
This emphasizes how tasks like CoT, achieved through a composition of N atomic abilities, exhibit a "multiplicative" effect on their metrics.
We introduced the term in our NeurIPS paper:
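A toy way to read "multiplicative" (illustration only, not the paper's exact formalism): if atomic ability i is acquired with performance p_i(t) during training, a task composing N of them scores roughly

```latex
P_{\mathrm{composite}}(t) \;\approx\; \prod_{i=1}^{N} p_{i}(t)
% which stays near zero until every factor is high and then rises sharply:
% an emergent-looking curve even when each p_i(t) improves gradually.
```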
@AlexGDimakis
Please also check our NeurIPS paper on "Multiplicative Emergence of Compositional Abilities" ().
We provide a concrete example of this by training text-conditioned diffusion models on interpretable compositional generalization tasks!
An inspiring thread about the beautiful work led by wonderful colleagues
@LoganGWright1
&
@tatsuhiro_onod
at NTT PHI Lab +
@peterlmcmahon
group.
Looking forward to the future of this general approach that fundamentally integrates physics and neural computation!
Our physical neural networks paper was published in
@nature
last week. The main message: Everything can be a neural network (and can be trained efficiently with backpropagation).
In light of common reactions/questions, a commentary 🧵 on key ideas, limitations, and futures. 1/n
ICYMI We've got a new paper out in
@NatureNeuro
and I want to tell you about it. Here's the paper, and here's a short explainer - a thread.
Cerebral organoids at the air–liquid interface generate diverse nerve tracts with functional output
1/11
A thought-provoking article about how science and technology interact and evolve together in a nurturing research environment.
Much to learn from history in exploring the science of deep learning!
Inspiring article: Venky Narayanamurti, our former Dean, had a distinguished career as a scientist and director at Bell Labs, where I was fortunate to have him as a mentor. Research is fragile and needs to be nurtured in a special way in order to thrive!
We take a highly interdisciplinary approach, bringing computer science, brain science, and physics to better understand the mathematical principles that underlie intelligence.
Excited for the future of this program – stay tuned for more to come! 🚀
@NttResearch
@Harvard
Excited to find a thorough & timely review video on our theory of neural network pruning and SynFlow algorithm.
Thank you for the thoughtful summary and feedback.
@ykilcher
Pruning is hard 🙂 Pruning before training is harder 😲 Pruning before training WITHOUT looking at data seems impossible 😱😱 Watch the video to find out how SynFlow achieves this.
@Hidenori8Tanaka
@dyamins
@SuryaGanguli
Our group funded by NTT Physics and Informatics Lab () uniquely bridges industry and academia, focusing on the intersection of physics, machine learning, psychology, and neuroscience. View our recent works here: …
2/
@NttResearch
@EkdeepL
@KuninDaniel
Thanks to my collaborators & more details soon!
We, the Neural Network group at NTT Physics and Informatics Lab, are young and growing. Please feel free to reach out for discussions, collaborations, and internship opportunities.
Kyogo,
@kyogok
, is one of the very best biophysicists I've met (+lived with!).
He does both theoretical physics & experimental biology in an exciting new lab, a great opportunity at RIKEN, Kobe, Japan.
Commencement at Kyoto University (my home!) is famous for students wearing cosplay.
Seems like this year, physics students dressed as the legendary “Course of Theoretical Physics” by Landau and Lifshitz!
Miss the absolute freedom. ⛩️