A quick thread on "How DALL-E 2, Imagen and Parti Architectures Differ," with a breakdown into comparable modules, annotated with sizes 🧵
#dalle2
#imagen
#parti
* figures taken from corresponding papers with slight modification
* parts used for training only are greyed out
In 2020, my life was turned upside down -- I was encountering failure after failure and was in deep depression. But that also turned out to be the most enlightening period. In October I was invited to give a talk at Google (thanks to
@orf_bnw
). It's out today:
ML conferences should be award shows, with categories like
"Best paper"
"Best first author"
"Best supporting author"
"Best Figure 1"
"Best appendix"
Or creative categories like
"I can't believe this worked"
"I can't believe this didn't scale"
"a theory so beautiful I cried"
[PAPER POLICE AT WORK] Pretty cool vis, and solid paper, but do we really have to blow it up by saying LLMs "learn a world model"?? The result basically says that similar words (wrt location or time) are well clustered in the latent space—a finding already known from word2vec.
Do language models have an internal world model? A sense of time? At multiple spatiotemporal scales?
In a new paper with
@tegmark
we provide evidence that they do by finding a literal map of the world inside the activations of Llama-2!
I am meeting ML researchers all the time. There are two worlds. One world's problem is "should I go to Google or Deepmind next summer" and the other is "a small opportunity, however small, please."
All problems are valid. But my heart sinks at this chilling, disorienting divide.
Could it be that RLHF finetuning works not because it's RL, not because it's HF, not even because it's finetuning, but just that it's *rating full sentences* instead of next token? Could it be that in the end, it's just a change of training objectives that made the difference?
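A toy sketch of that contrast, with made-up numbers: next-token training gives every position its own independent loss term, while a sequence-level objective attaches one scalar rating to the whole sentence (REINFORCE-style here, purely for illustration).

```python
import numpy as np

# Per-token log-probs the model assigns to a 4-token sentence (made up).
token_logprobs = np.array([-0.2, -1.5, -0.3, -2.0])

# Next-token objective: every position is supervised independently,
# so the loss is just the average negative log-prob per token.
next_token_loss = -token_logprobs.mean()

# Sequence-level objective (REINFORCE-style sketch): a single scalar
# rating of the *full sentence* scales the whole sequence log-prob.
reward = 0.7  # e.g. a human preference score for the complete sentence
sequence_loss = -reward * token_logprobs.sum()
```

The structural difference is exactly the one the tweet points at: in the first case the supervision signal is per-token; in the second, one rating of the whole output drives the update.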
The phenomenal teams from Google Research’s Brain and
@DeepMind
have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead!
Dear
#reviewer2
:
Thank you for your comments like
"the amount of work in this paper is more like a course project, instead of a paper at CVPR"
"the contribution is not enough to reach the bar of CVPR publication"
But we'd like to point out that this is a submission to
#ICCV
OKAYYY finally have a first version of slides ready, 18h before my talk! Here's a sneak peek.
See you tomorrow, for those of you at
#DeepLearningIndaba
#DLI2023
I joined
@GoogleAI
6 months ago. Today I went to the office for the first time, and ate lunch in a cafeteria.
Easily my happiest day of the year. Easily.
Anyone who's played with generative models knows that *THEY CAN'T SPELL* (at least not until they reach a certain scale). Well turns out there's a simple fix!
Check out our paper that carefully benchmarks and investigates "the spelling miracle"
No one from OpenAI is even pretending to celebrate scientific breakthroughs from places other than OpenAI.
Beware of the culture (or just simply, "vibe") when you consider joining a place.
Some had a baby in 2022. Some published a book. Some got a promotion or landed a dream job. But those are their lives, not yours.
Remember, achievements need to be normalized by circumstances. And you've done great on your own measures. Happy New Year's, my beautiful outsiders.
Now that ML is so much about scaling up on all fronts, and so little about ideas, scientific awards (e.g. "NeurIPS oral") should be resource normalized.
Research led by students and independent researchers should NOT be evaluated the same way as industry-backed research. Period.
Potentially hiring Student Researchers to work on fundamental research, from studying small-scale transformers to better understand training, to performing brain surgery on existing LLMs. Apply, and get in touch!
BS/MS:
PhD:
This is my 3rd year being a DEI chair for ICLR and a DIA chair for NeurIPS. The emotional toll is immense. Seeing daily requests about difficulties I have 0 power to help with (99% around visa appointments & rejections) is dispiriting. You realize how fxxking unfair everything is.
A year ago today I was hitting rock bottom with my mental health; my only hope was time. I placed a calendar invite a year out, asking myself if I'd feel better.
Today I get to answer yes to that question.
I know time might not be your best bet, but it's a reliable one.
Today was my last day at Google Brain/Deepmind.
Really grateful for amazing colleagues.
Learned so much from closely working on Pathways, ViT-22B, PaLI, NaViT and Gemini with
@m__dehghani
,
@neilhoulsby
,
@_basilM
, Xi and others.
The team terminated me pretty well today :)
First day of
#ICLR2021
! My fav conf and its 2nd year running virtually.
Excited to share that I joined Google Brain as a research scientist. It’s a wonderful feeling to start a new job without quitting the old: I will continue to serve as executive director of
@ml_collective
1/
8 years later, I'm still having nightmares about failing grad school—unable to find an internship, unable to publish. So relieved this morning waking up to the universe that I've already graduated, got a job and everything.
But, is the dream me still stuck in the hell universe?
Feeling grumpy today so I’m just gonna say it: it’s a business deal; they paid you, you did some labeling. Lots of business value. Almost zero social impact. A great paper, a wonderful language model, but why make it sound like a bunch of heroes just saved humankind?
Very proud of the
@scale_AI
team to have worked with
@OpenAI
on these new results in alignment.
Long-term, human-AI alignment is one of our time's most important problems as AI becomes more powerful, and enabling alignment is a fundamental long-term mission of Scale.
Hi ML twitter, what's the current state of model distillation? Roughly how much compression can we achieve within the same architecture? What about across architectures, e.g. training a transformer and distilling it into an MLP?
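For reference, the core recipe most distillation setups share is compact: train the student to match the teacher's temperature-softened output distribution. A minimal numpy sketch with made-up logits (a real pipeline would use a framework, data batches, and a gradient step on this loss):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical teacher/student logits for one example over 4 classes.
teacher_logits = np.array([4.0, 1.0, 0.5, 0.1])
student_logits = np.array([2.5, 1.2, 0.8, 0.3])

T = 2.0  # temperature > 1 softens the teacher's distribution
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# Distillation loss: KL(teacher || student) on temperature-scaled outputs.
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
```

Cross-architecture distillation (transformer → MLP) uses the same loss; the open question in the tweet is how much capacity gap it can bridge, not the mechanics.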
While everywhere else in the world AI == ChatGPT, in this small oasis of
#ICLR
we are seeing the vast variety of topics. Turns out you have to come to AI conferences to escape AI hype.
I've been hearing from people that it's a very challenging year for AI/CS grad school application. A few words:
First, I'm sorry many of you have to live through this; it's never fun receiving rejections.
Second, I don't expect the situation to improve in the foreseeable future.
Earlier this summer Krystal and I were appointed as ICLR 2022 DEI chairs. Right away we started planning something tangible and actionable to *really* help broaden the ICLR participation pool.
Hence the birth of CoSubmitting Summer (CSS) program:
1/
If you are on a high-impact paper/project/idea, it has more to do w/ your privilege & luck than your talent & hard work. You absolutely need the latter, but the former plays an outsized role—and we downplay it.
It's this single realization that led to
@ml_collective
last year.
Years ago we used to pretrain each layer of a network before stacking them together. Then "end-to-end" became a big term. Now a model is so big/modular that we again have to pretrain different parts: text encoder, image tokenizer, etc. before assembly.
Fashion is a circle 🔁
I'm happy to promote your
#ICLR2021
*rejections* if you believe your (or others') work is of value, and simply didn't win the lottery this time. I will 1) read and tweet 2) discuss them at our reading group. Send your work in!
One thing people might be missing with the Gemma Open Models release is the section announcing grants given to academics that can help push the boundaries of LLM research. Apply before April 17! All academics (universities + non-profits / independent labs) are eligible!
It's totally okay to not have any research ideas that excite you from time to time. Instead of having a structure that rewards continuous publication of mediocre ideas, researchers should be allowed to have "downtime," where they teach, mentor, write, and freely think.
We are told that this video hasn't been viewed/liked enough, which blocks us from releasing more episodes 😥
We have lots of amazing interviews in stock, w/ e.g.
@_beenkim
@mc_mozer
@hardmaru
@hugo_larochelle
Pls watch & like to help us release more!
Today marks one year of me joining Google Brain.
That's it. That's the whole tweet. There won't be a long thread of me thanking and tagging everyone. That's uncool.
My grandma, 94, passed away in Wuhan, China last Tue. There has since been heaviness that clouds my days, but I haven't been able to take a break. I don't know how, when there are talks that I've committed to give, rec letters I've agreed to write, and unmovable paper deadlines.
The outcome of research might seem clean and cute, but the process is always messy. Normalize this.
What are some of your cringe-making coding or experimenting habits? I'll start.
Sometimes I write
if xxx:
    (50 functions)
else:
    (same 50 functions but with small changes)
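That habit in miniature, with the usual fix (the function names and the tweaked line are invented for illustration): hoist the shared body and parameterize the one thing that differs.

```python
# The duplicated-branch habit: two long bodies that differ by one line.
def process(x, mode):
    if mode == "a":
        y = x * 2   # imagine 50 lines here
        y = y + 1   # the one line that differs
        return y
    else:
        y = x * 2   # the same 50 lines again...
        y = y + 5   # ...except this one
        return y

# One fix: share the common logic, parameterize the difference.
def process_clean(x, mode):
    offset = 1 if mode == "a" else 5
    return x * 2 + offset
```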
What's the real difference between "pre-training" and "finetuning" these days? It used to be that the 2 phases have totally different objectives, tasks, data, and even modifications of model (new heads). But in LLMs it seems to mean just new data? Why not call it "more training"?
You know a field is progressing fast when the first paragraph of the introduction has almost 100% of its citations from less than 2 years ago (2021 & 2022), except for the very first diffusion paper (2015)
#imagenvideo
...and followed with 4 pages of figures.
I don't attend
@RealAAAI
but what's going on with their registration policy??
1. $600 for a virtual conference?
2. "Author with Publication" costs 2x "Co-Author" or "Non-Author"?
3. Is it necessary to add a late penalty when everything is, again, VIRTUAL??
Not unlike BatchNorm, I want anyone entering
@ml_collective
to go through these normalizations:
- normalize failure ("I got rejected")
- normalize goodwill ("I'm just here to help")
- normalize imperfection ("this may be stupid")
- normalize difficulty ("research is hard!")
"We’ve tended to breed the same style of researchers over and over again—people who come from similar backgrounds, have similar interests, read the same books as kids, learn from the same thought leaders, and ultimately do the same kinds of research." -
@orussakovsky
BIG-bench was the single biggest reason I joined
@jaschasd
's team at Google a year ago! Not only is it ground-breaking science—evaluating 200+ diverse tasks on 30+ LMs, it is also a revolutionary way of doing science: inviting the entire community to join from the very start.
I'm beyond grateful to be appointed as ICLR 2022 DEI co-chair with the lovely
@kammitama
. We are working on something really exciting, a summer program to help underrepresented researchers not just attend, but work on, submit to, and hopefully publish at ICLR 2022! Stay tuned!
⚠️Emergency Reviewers⚠️ needed for
#ICLR2024TinyPapers
We are approaching the end of the reviewing period, but sadly <50% of reviewers have completed their assignments. We need emergency reviewers to review tiny (2-page!!) papers! Please reach out to iclr.dei.2023
@gmail
if interested!
As the AI/ML social society starts to mimic show biz, to stand out one can consider taking on an identity. You can be the "scaling guy," the "spicy takes person," the "generalization god" or "gossip king."
Turns out I'm suited as a receiver of "horrible stories in ML." Honored.
Recent paper titles:
"Hey, that's not an ODE" ()
"Oops I took a gradient" ()
Predicted next:
"Ahh I made it work"
"Wow GANs can do this"
"Bah turns out they can't"
"Duh that's obvious"
...
Glad someone did this! We are indeed entering an era where “the field is in danger of being drowned by noise” hence lit survey, benchmarking, cross-validation, and distillation work are more valuable than proposing new methods!
📣
#ICML
2021 Paper 📣
Overwhelmed by the flood of optimizers for deep learning? We felt the same and performed an extensive benchmark. Joint work with
@robinschmidt_
&
@PhilippHennig5
.
Paper:
Results:
Video:
No need to apologize for not replying to emails/texts, no need to apologize for making mistakes, no need to feel bad for dropping tasks, for overcommitting and under-delivering, for disappearing. You are a human. This is the most challenging time ever. I understand. I understand.
Now that we can write Tiny Papers
@iclr_conf
, what should we write about?
I'd like to invite all established researchers to contribute Tiny Ideas as inspirations, seeds for discussions & future collaborations!
#TinyIdeasForTinyPapers
I'll start. Note: bad ideas == good starts.
Congrats to everyone involved in one of the big announcements today! 🎉
To those who are not: remember that your value is NOT whether you find yourself in one of the few in-groups. A healthy scientific field is not supposed to make you feel left out. You are doing great 🌞
Forget about catching up with ml papers, catching up with who's where doing what now is much harder. Someone start an "ml gossips" newsletter and I'll subscribe
Our deep learning reading group is *fully virtual* and *open to all* this year! Join us every Friday noon to hear a live discussion of ideas.
And nominate your favorite speaker too; I'll do my best to have them come present (no guarantee of success🤪)
This was so much fun. I also uncovered that
@savvyRL
has been hosting a regular ml reading group since 2018.
It is currently hosted virtually, so a great thing to join if you want to take a social break for an hour and chat about research.
Was opening YouTube to find some 10m thing to watch before going to bed, and ended up watching this new talk in full:
Good teaching as usual!
@karpathy
I am on a Midwest (so beautiful this time of year🍂🍁🍃) tour, giving talks first at
@jcjohnss
's lab at
@UMich
, then at
@WisconsinCS
on the kind invitation of
@SharonYixuanLi
Slides here for those who can't make it:
Wow, 3k views in 24h! Thank you -- every one of you.
DLCT in 20 minutes! 😂 We are talking about "Simple, Fast, and Flexible Framework for Matrix Completion with Infinite Width Neural Networks" today.
The
@ml_collective
research jam, merely 1 year and 5 instances old, has resulted in 5 papers:
The power of... just giving people a stage and letting 'em shine?
So somewhere between 3B and 20B
#parti
models acquire the ability to spell. Cross-checking with a model that can't seem to spell (
#dalle2
) and one that can (
#imagen
):
Imagen: 4.6B text enc + 2B text-to-image + 1B super-res
DALL-E 2: 250M (?) text enc + 600M image enc + 3.5B dec
#Parti
paper has some really interesting results where they show the improvements in image quality as the number of model parameters is scaled up from 350M to 20B.
You see visible quality enhancement for the hipster kangaroo, and accurate text rendering!
Indaba left me shaken. Before going I thought I'd gotten pretty jaded about conferences; yes, meeting people is nice, but it had started to feel like just another business trip. But INDABA IS MAGIC. Thanks to all the wonderful people that made this happen!
@DeepIndaba
My favorite paper of the year! Why train *one* NN when you can train *a subspace* of infinitely many NNs? Think of it as: instead of moving one dot across the loss landscape, you move a line, an area, a cube... so that the whole space contains trained NNs! Congrats
@Mitchnw
et al.
Sharing some takeaways from Learning Neural Network Subspaces, a fun project we learned a lot from and hopefully you can too. (1/n)
tldr: We train lines, curves, and simplexes of neural networks from scratch
arxiv:
code:
#ICML2021
Thanks for all the attention! The GPUs and I have successfully made it to
#NeurIPS2022
, and have so far dodged crypto miner robbery.
Time for our week-long GPU Giveaway! We will be seeking volunteers to help send them all over the globe. Watch
@ml_collective
for more details!
Favorite
#NeurIPS2020
presentations and posters this year
PS: heavily biased by what I happened to catch and whom I happened to talk to
PPS: still catching up on talks, so the list is rather incomplete and I hope to grow it
PPPS: with contributions from
@ml_collective
members
Good day! Average researcher here going through thousands of NeurIPS financial aid applications trying to roll out all decisions by Monday. (Some of you should have already received decisions; others: thanks for your patience!)
This Friday (since everyone is busy with ICML and I'm pathetically free) I am giving a talk at *my own* reading group -- extreme nepotism or what? 😬
Title: "Unconventional ways of training neural networks and what they teach us about model capacity"
I know how you feel. I know twitter seems brutal right now. I know the more "we're hiring..." "a great opportunity..." "happy to announce..." posts you see, the worse you feel.
I know why all the "great opportunities" only make you feel inadequate and why you skip all of them.
Top tweet content at NeurIPS:
1. Workboat
2. JAX building sign
3. "Just don't build AGI" swag
4. How to get into that party when registration is full
5. Playing with GPT-3.5 and ChatGPT in the hallway
A high schooler reached out wanting to do research with
@ml_collective
. In lieu of interviews I asked him to pick a topic and give a tutorial about it. He picked "continuous hopfield networks" and did a great job. In my mind he's a proper researcher by the end of the talk. 1/2
ICLR reviewers: I know you are all distracted and spending way too much time scrolling here -- it would be massively healing for your soul if you could instead engage with paper authors today and tomorrow! 🥰
We are losing money this year running ICLR in Africa, because many sponsors pulled out. Only 6 company booths were set up, compared to the usual 25-30. Hope it improves in future years, now that we see the overwhelmingly positive turnout!
Let's goooooo! ML conferences are experiencing an identity crisis — they are all the same. Same people, same papers, same talks.
What's distinct about this year's ICLR is that the special location will present the most different demographic from all other ML confs. 🧶 1/5