This paper looks like a big step forward for the Transformer architecture!
A foundational improvement, not as shiny as other things, but a really big step forward nonetheless
Meta researchers just dropped PyTorch Distributed Shampoo 🧴 a few days ago: 💥
Train neural networks with a second order method for better performance.
The underlying work it is based on has been a passion project for the last 5 years while swimming…
It’s been a privilege to work alongside our Gemini leads and team (across Google DeepMind, Research and Alphabet) on one of the most interesting and challenging projects of my career.
We have three versions of Gemini:
(a) Ultra (b) Pro and (c) Nano
We make significant…
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks,…
A new image generation model just dropped.
Great work by the team!
+ Auto-regressive, encoder->decoder Transformer
+ Classifier-free sampling (see the sketch below).
+ ViT-VQGAN
Really amazing results: Image from the website.
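On the classifier-free sampling point: here is a minimal sketch of how classifier-free guidance is typically applied to an autoregressive image-token model at sampling time. The function signature and guidance scale are illustrative assumptions, not this model's actual API.

```python
def guided_logits(model, tokens, text_cond, guidance_scale=2.0):
    """Classifier-free guidance for next-token logits (illustrative sketch).

    `model(tokens, cond)` is assumed to return logits over the image-token
    vocabulary. Because the text condition is randomly dropped during training,
    the same model can be queried with and without it at sampling time.
    """
    cond_logits = model(tokens, text_cond)  # conditioned on the text prompt
    uncond_logits = model(tokens, None)     # unconditional (prompt dropped)
    # Push the prediction toward the conditional distribution.
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```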
Shampoo is out of the bottle!
Preprint: "Second order optimization made practical"
We train certain neural nets faster than before.
How fast? It has shown up to a ~40% reduction in training time for a Transformer.
(@tomerikoriko)
PaLM-2 is Generally available for developers!
“With this update, developers can access our text model powered by PaLM 2, Embeddings API for text, and other foundation models in Model Garden”
Today, we present our paper on the Google Search Ads CTR model at ORSUM @ACMRecSys, Seattle.
We highlight ML techniques suited to *online learning* that go well beyond traditional accuracy improvements.
A short thread:
1/n
Prompt: "A koala bear and grizzly bear playing chess. They are sitting at a table on the beach. You can see the waves crashing into the shores. Bears are very stressed. DSLR camera photo."
#imagen
#googleai
#brain
🐻🐨♟️🏖️
Batch Entropy Regularizer that makes untrainable networks train: remove the skip connections and normalization layers. Published at TMLR; works on PaLM-like Transformers -- thanks to Lucid for the pointer!
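I won't reproduce the paper's exact formulation from memory, but roughly, the regularizer keeps the entropy of each layer's activations across the batch from collapsing. A loose sketch under that assumption; the Gaussian entropy estimate, the target value, and the names are my placeholders, not the paper's:

```python
import torch

def batch_entropy(acts, eps=1e-6):
    # Gaussian estimate of the differential entropy of activations across the
    # batch, averaged over features. acts: [batch, features]. Sketch only.
    var = acts.var(dim=0) + eps
    return (0.5 * torch.log(2 * torch.pi * torch.e * var)).mean()

def batch_entropy_penalty(per_layer_acts, target=0.5):
    # Penalize layers whose batch entropy drifts from a target value, which is
    # what (in spirit) keeps deep nets without skips/normalization trainable.
    return sum((batch_entropy(a) - target) ** 2 for a in per_layer_acts)
```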
PaLM 2 is online: 🌴🌴
Paper:
I learned to code with instructions in Malayalam, so this capability of PaLM 2 instruction-tuned models to explain code makes me quite happy!
Possibilities are endless here!
🌴🌴
Very proud of this work; specifically, not compromising on model quality while being extremely fast for inference, so that we can serve the whole wide world, i.e. bringing technology to everyone!
@karpathy
@giffmana
The team is working hard to bring audio inputs to the AI Studio interface for Gemini 1.5 Pro. We have an internal version that handles audio and video and can sample the video less frequently to increase the length of content that can be handled.
@karpathy
, thanks for the…
L👈: "A Koala bear in a suit standing at a podium to teach. Variational bayesian methods is written on the chalkboard. There are lot of confused cats in the crowd"
R 👉:"Variational bayesian methods is all you need is written on the chalkboard."
🐨🙀
#imagen
#googleai
#brain
Prompt: "A train ride in the monsoon rain in Kerala. With a Koala bear wearing a hat looking out of the window. There is a lot of coconut trees out of the window"
#imagen
#googleai
#brain
(I will host the imagen team at my home in Kerala if they choose to visit 🚀)
GPT-4 can do well on MIT test
Community: oh the methodology is all wrong
🌶️ Introducing a new optimizer that is 2x faster than AdamW
Community: Impressive! Impressive methodology!
Said methodology: use half the steps for the new method and change the learning rate schedule to…
Code for Distributed Shampoo: a scalable second order optimization method
💥
Joint work w/ @GuptaVineetG
State of the art on MLPerf ResNet-50 training to reach 75.9% accuracy at 32,768 batch size
Trains in 1729 steps (not a typo), 284 secs on TPUs.
Code for SM3, a memory-efficient adaptive first-order optimizer, is now open-sourced under the
@GoogleAI
research repository. It's useful for training very large language models, e.g. BERT-Large, GPT-2, etc.
I completely missed the Parallel Layers used in PaLM. It makes training 15% faster at larger scale. Mainly: run the MLP and Attention together! Thanks
@achowdhery
for pointing this out to me! The savings in compute are quite substantial.
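For reference, a minimal sketch of the parallel formulation versus the usual serial block; the module names here are illustrative, not PaLM's actual code:

```python
import torch.nn as nn

class ParallelBlock(nn.Module):
    """PaLM-style parallel block: attention and MLP both read the same
    normalized input and their outputs are summed, instead of running
    attention first and the MLP on its output."""
    def __init__(self, dim, attn, mlp):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn, self.mlp = attn, mlp

    def forward(self, x):
        h = self.norm(x)
        # Serial:   x = x + attn(norm1(x)); x = x + mlp(norm2(x))
        # Parallel: one norm, both branches on the same input -> fusible matmuls.
        return x + self.attn(h) + self.mlp(h)
```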
“For example, if the traditional algorithm taught in school multiplies a 4x5 by 5x5 matrix using 100 multiplications, and this number was reduced to 80 with human ingenuity, AlphaTensor has found algorithms that do the same operation using just 76 multiplications.”
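For intuition on where the 100 comes from: the schoolbook algorithm uses one scalar multiplication per (row, inner index, column) triple. A two-line check:

```python
def schoolbook_mult_count(m, k, n):
    # Scalar multiplications for an (m x k) @ (k x n) product:
    # one per output entry per inner index, i.e. m * k * n.
    return m * k * n

print(schoolbook_mult_count(4, 5, 5))  # 100; AlphaTensor's decomposition uses 76
```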
Today in
@Nature
:
#AlphaTensor
, an AI system for discovering novel, efficient, and exact algorithms for matrix multiplication - a building block of modern computations. AlphaTensor finds faster algorithms for many matrix sizes. 1/
Some excellent work by
@jeankaddour
and colleagues
“We find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate”
☠️
Tinker with this visualization here for training neural networks with noise added in the dataset. Made with tensorflow.js and inspired by neural network playground. 👇
10 years ago I left working on the iOS communicator at MSFT to work on machine learning at Google, without many connections or a doctoral degree for that matter.
Crazy how time flies! And due to a bunch of lucky breaks, very thankful to be doing ML things at Google 🧠
We're introducing an optimizer for deep learning, MADGRAD. This method matches or exceeds the performance of the Adam optimizer across a varied set of realistic large-scale deep learning training problems.
The next big jump in neural network performance is going to happen when the community embraces non-uniformity.
E.g., stacking identical layers has become ingrained in our tools and mindsets.
Gen AI on-device? A foundation model on the phone?
Imagine an entire operating-system-level unlock of capabilities:
Well, the Pixel 8 Pro will have it. Rick announced it here:
The model was trained with several algorithmic breakthroughs by our team to…
Gemini Nano improves on the efficiency frontier. The models are multimodal as well; see results in the paper.
Nano series: at 1.8B and 3.25B parameters, it packs in so much to provide high utility on device.
First foundation model on the device!
Gemini Nano is super efficient for tasks that are on-device. Android developers can sign up for an early access program for Gemini Nano via Android AICore and Pixel 8 Pro users can already see it rolling out in features like Summarize in Recorder and Smart Reply in Gboard + much…
Just tested it on a paragraph from one of my papers, and it does seem like it improves the writing.
Sure, generating whole papers with an LM is not cool, but improving the writing quality seems good for everyone?
People shocked that Stable Diffusion was trained with fewer resources haven't been paying attention to many things, including the Craiyon/DalleMega runs. Scale is not all you need, dear community. Nice to write that in a paper though.
First of all, massive congratulations are in order to
@zacharynado
@GeorgeEDahl
@naman33k
and co-authors on this massive work spanning multiple years on benchmarking neural network training algorithms! 🎉🍾
I have a horse 🐴 in the race and it's called Distributed Shampoo 🦄
Benchmarking Neural Network Training Algorithms
Presents AlgoPerf, a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware.
Augments the Transformer 🤖 architecture with n-grams constructed from a discrete latent representation of the text sequence.
Faster training and inference when it matters the most, as the core operations are (distributed) gather/scatter. 🎇
Code:
N-Grammer: Augmenting Transformers with latent n-grams
abs:
propose modification to the Transformer architecture by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence
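A rough sketch of the idea as I read it; the hashing scheme, table size, and the concatenation at the end are my placeholders rather than the paper's exact construction:

```python
import torch
import torch.nn as nn

class LatentNGram(nn.Module):
    """Sketch: form bigram ids from the discrete latent ids of the token
    sequence, embed them, and combine them with the token embeddings."""
    def __init__(self, ngram_vocab=2**18, ngram_dim=64):
        super().__init__()
        self.ngram_vocab = ngram_vocab
        self.ngram_emb = nn.Embedding(ngram_vocab, ngram_dim)

    def forward(self, token_emb, latent_ids):
        # latent_ids: [batch, seq], e.g. from product-quantizing the embeddings.
        prev = torch.roll(latent_ids, shifts=1, dims=1)
        bigram_ids = (latent_ids * 1_000_003 + prev) % self.ngram_vocab  # cheap hash
        ngram_emb = self.ngram_emb(bigram_ids)           # gather: cheap, shardable
        return torch.cat([token_emb, ngram_emb], dim=-1)
```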
Finishing up slides now.
I will be talking about “Scalable second order optimization for deep learning” at Deep Learning: Classics and Trends, tomorrow.
Called it with some knowledge about the model.
Ultra is going to break ground!
Those quibbling over HellaSwag and MMLU are just showing their misunderstanding of evaluation.
Onwards 🚀
🔥Breaking News from Arena
Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to
@Google
for the remarkable achievement!
The race is heating up like never before! Super excited to see what's next for Bard + Gemini…
1/3 I think we should coin a new term: social media AI researcher, where instead of publishing your work at a rigorous peer-review venue, you tweet about your findings and opinions.
There are huge advantages:
1. It is easy: you don't have to deal with that annoying reviewer 2.
The AGI I want is one that realizes I made a dumb mistake with the batch size which makes it OOM on a supercomputer, and tries a smaller one for me - while I am sleeping, so I don't have to babysit the models - increasing experimentation throughput!
"Ever want to learn how JAX works, but the implementation seemed impenetrable?
Well, you're in luck! By reading this tutorial, you'll learn every big idea in JAX's core system. You'll even get clued into our weird jargon!"
The JAX team keeps exceeding every expectation 😂
Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length
Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long…
AdaGrad and Shampoo with aggregated second moments now work for deep learning.
This is quite similar to the Grafting technique we introduced to disentangle the direction from the step size, where we also found identical results.
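For anyone unfamiliar with grafting: the update takes its direction from one optimizer and its per-step magnitude from another. A minimal layer-wise sketch; the two step arguments are simply whatever your Shampoo and first-order optimizers would have produced:

```python
import numpy as np

def grafted_update(shampoo_step, first_order_step, eps=1e-12):
    # Direction from the Shampoo step, magnitude from the (well-tuned)
    # first-order step: this disentangles direction from step size.
    direction = shampoo_step / (np.linalg.norm(shampoo_step) + eps)
    return np.linalg.norm(first_order_step) * direction
```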
More exciting news today -- Gemini 1.5 Pro result is out!
Gemini 1.5 Pro API-0409-preview now achieves
#2
on the leaderboard, surpassing
#3
GPT4-0125-preview to almost top-1!
Gemini shows even stronger performance on longer prompts, in which it ranks joint
#1
with the latest…
Distributed Shampoo,
ICML: 4 diverse workloads.
AC: do another 7 ablations costing a million dollars 💵 for us to believe you have beaten Adam.
ICLR: Beats every MLPerf workload on wall-clock time.
AC: I am a distributed systems expert and I don’t believe you (charitable…
Fun game. Clocking 17973 citations: "Distilling the knowledge in a neural network"
@geoffreyhinton
,
@OriolVinyalsML
,
@JeffDean
Reviewer 38 (NeurIPS 2014): "This work is incremental and unlikely to have much impact even though it may be technically correct and well executed."
Everyone is looking at metrics and demos out here and debating the nuances of evals, which is fine.
But they're missing the point: these models are in Bard and the Pixel 8 Pro right now and coming to more surfaces.
Everyone is sad that top AI conferences are virtual this year.
Folks, online conferences actually allow a lot more people to attend who otherwise would not be able to, due to weaker passports.
I have attended only one conference abroad, which was in Canada, and I don't want to go through that pain again.
My career so far was built on real-world, deployed ML. So take it with a grain of salt. 🧂
H-index and citation counts are weakly correlated with usefulness or reality. 🤷♂️
So read the papers to judge the work, not the citation counts.
My understanding is that Google Scholar is the pet project of one small team. It's crazy how little design choices (e.g., default sort papers by # of cites, total # of citations prominently displayed) influence all of academia by making citations a default "measuring stick".
Prompt: "A koala bear in a suit at a dining table reading a newspaper and drinking tea contemplating. Photo taken by a DSLR camera."
#imagen
#googleai
#brain
Inspiration from "I should buy a boat" cat meme.
Quite spectacular results. 🐨☕️📰
New phrase learned today from staying up on Twitter: “LLM doping”.
Who wants to make a doping test and an agency that checks LLMs for eval doping?
I would cut a check for starting something along these lines.
I tried this in Malayalam too last night and it blew my mind!
I am very excited by this; studying can be much more effective if there were a personalized tutor who could explain the “how” and “why” of each step along the way. I see this future as immensely positive!
The multimodal and reasoning capabilities of Gemini are quite strong. The benchmark results, which I’ll discuss in a moment are nice, but I’m most excited by demonstrations of what it can do.
Consider the image below. A teacher has drawn a physics problem of a skier going down…
From ICLR, Jorge provides a fast approximation for the inverse 4th roots in Shampoo!
I recommend implementing the stable & fast coupled Newton inverse, but maybe for some problems computing the approximate inverse pth root more often could be useful.
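For the curious, here is a dense, unoptimized sketch of the coupled Newton iteration for the inverse pth root of a symmetric PSD statistics matrix; damping schedules, warm starts, and mixed-precision details from the actual Shampoo implementation are omitted:

```python
import numpy as np

def inverse_pth_root(A, p=4, ridge=1e-6, iters=100, tol=1e-6):
    """Compute A^{-1/p} for symmetric PSD A via a coupled Newton iteration."""
    n = A.shape[0]
    I = np.eye(n)
    A = A + ridge * I                      # damping for numerical stability
    c = np.linalg.norm(A, 2)               # scale so the spectral radius is <= 1
    M, X = A / c, I.copy()
    for _ in range(iters):
        T = ((p + 1) * I - M) / p
        X = X @ T
        M = np.linalg.matrix_power(T, p) @ M
        if np.linalg.norm(M - I) < tol:    # M -> I as X -> (A/c)^{-1/p}
            break
    return X / c ** (1.0 / p)              # undo the scaling: c^{-1/p} (A/c)^{-1/p}
```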
@Sci_j_my
They were several papers who couldn't cite your papers because of TOTAL failure on bibtex, they didn't get to compile twice. BibTex is now switching parts of it code.. have you heard of that? my people tell me that. We had tremendous citation, with certified by reviewers.
Really enjoyed reading Nichol & Dhariwal
This plot was the most interesting.
Optimizing the VLB (variational lower bound) was harder than a simple mean squared error + lambda * L_vlb.
Then they wave a magic wand to fix this: the green curve. 1/2
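To make the notation concrete, the hybrid objective is just the simple epsilon-prediction MSE plus a small weight on the VLB term; the 0.001 default is from my reading of the paper, so treat it as an assumption:

```python
def hybrid_loss(l_simple, l_vlb, lam=0.001):
    # Improved-DDPM-style objective: the MSE term dominates, and the small
    # VLB term trains the learned variances without destabilizing optimization.
    return l_simple + lam * l_vlb
```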
*cracks knuckles*
and thus, we begin the "🌴PaLM v2" drinking game (but with coffee, tea, or your favorite caffeinated beverage of choice, as it's early! 😉)
#GoogleIO2023
#GoogleIO
Has anyone done large-scale profiling of inference speeds for different LLMs of comparable accuracy from different providers?
Gemini Pro seems dramatically faster, from my personal experience, than, say, GPT-3.5. Seeing some numbers w/ error bars on this would be nice.
Prompt: "Photorealistic koala bear wearing a tie dye tshirt. The koala bear is wearing a sunhat and aviator glasses. koala bear is inside a houseboat in Kerala. There is a lot of coconut trees in the background."
#imagen
#googleai
#brain
🚀
I have the same t-shirt!
👕🐨🥥🌴
The frustrating part of deep learning is that almost anything works, so for those wanting to know why something works, it's just an endless pit of misery and unanswered questions.
A bottleneck layer is a layer that has fewer neurons than the layers above/below. ⌛️
Then, what's a layer that has more neurons than the layers below and above called?
A 10x layer? A booster layer? Need help.
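To make the shapes concrete, a tiny sketch of both cases; the widths are just examples:

```python
import torch.nn as nn

dim = 512

# Bottleneck: the hidden layer is narrower than the layers above/below. ⌛️
bottleneck = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, dim))

# The unnamed case: the hidden layer is wider than the layers above/below
# (the standard 4x-wide Transformer MLP already has this shape).
wide_hidden = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
```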
Incredible! Found linear mode connectivity on test loss (right), not on train!
My mind is blown -- this is huge!?
Updates (@stanislavfort's colab)
+ With DistributedShampoo (~0 train loss 🚀)
train/test loss = 0.0002/0.314 vs (0.350/0.333)
train/test accuracy = 0.999/0.982
I reran Stan's colab (thank you for the colab!) with DistributedShampoo instead of Adam or SGD and got this.
Everything looks connected. Hmm? Is this a bug???
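For anyone who wants to poke at this themselves, the check is just evaluating the loss along the straight line between two trained solutions in weight space. A minimal sketch; `eval_loss` and the two parameter dicts are placeholders for whatever the colab produces:

```python
import numpy as np

def interpolation_curve(params_a, params_b, eval_loss, num_points=21):
    """Linear mode connectivity check: loss along the segment between two
    trained weight vectors; no barrier along the path => 'connected'."""
    losses = []
    for alpha in np.linspace(0.0, 1.0, num_points):
        interp = {k: (1 - alpha) * params_a[k] + alpha * params_b[k] for k in params_a}
        losses.append(eval_loss(interp))
    return losses
```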
Want to know more? Well, you have to wait for a bit.
Boris Dayma’s guide to training large models: a must-read.
He is using a second-order method (Distributed Shampoo) for all his training, making him one of a handful of humans on earth who know how to deploy it correctly.
The results speak for themselves:
Check it out!
📉 "A Recipe for Training Large Models"
👉 Report:
I've been working for a while on this guide, sharing practical recommendations with my simple recipe for training models 🧑🍳
Together with @quocleix, we used Gemini Advanced yesterday to brainstorm for an internal research week debate. It was quite an incredible experience and an effective companion in creative brainstorming; nothing else compares.
A small thread on related work on more-than-diagonal optimization, in the context of neural networks.
Kronecker factorization: reducing the cost from (mn)^2 to m^2 + n^2 comes from this sparsely cited paper by Heskes, 2000.
See the MLP section.
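To unpack the (mn)^2 vs m^2 + n^2 claim: a full-matrix preconditioner for an m x n weight's gradient is (mn) x (mn), while the Kronecker-factored version keeps one m x m and one n x n statistic and applies them from the left and right. A rough Shampoo-style sketch; the damping value and the eigendecomposition-based root are illustrative choices:

```python
import numpy as np

m, n = 1024, 4096
print((m * n) ** 2)   # full preconditioner entries: ~1.8e13, infeasible to store
print(m**2 + n**2)    # Kronecker factors: ~1.8e7 entries, easily stored

def kron_preconditioned_step(G, L, R):
    """Precondition gradient G (m x n) with Kronecker factors L (m x m, from
    accumulated G @ G.T) and R (n x n, from G.T @ G): L^{-1/4} @ G @ R^{-1/4}."""
    def inv_fourth_root(A):
        w, V = np.linalg.eigh(A + 1e-6 * np.eye(A.shape[0]))
        return V @ np.diag(np.maximum(w, 1e-12) ** -0.25) @ V.T
    return inv_fourth_root(L) @ G @ inv_fourth_root(R)
```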
PSA: Switch your optimizer to Shampoo!
We recently compared Shampoo to a tuned ensemble of Adam and SM3 at
@HomebrewNLP
and found that the hyperparameter search space contains many more "winning tickets," which also achieve lower losses!