S4, Mamba, and Hawk/Griffin are great – but do we really understand how they work? We fully characterize the power of gated (selective) SSMs mathematically using powerful tools from Rough Path Theory. All thanks to our math magician
@MucaCirone
🧵
Inspired by recent breakthroughs in SSMs, we propose a new architecture, Graph Recurrent Encoding by Distance (GRED), for long-range graph representation learning:
with
@orvieto_antonio
,
@bobby_he
and Thomas Hofmann (1/4)
If you are looking for a PhD position at the intersection of Deep Learning and Optimization, it's not too late to apply to my group at
@MPI_IS
and
@ELLISforEurope
Institute Tübingen!
Send a DM if you are interested :)
🚀 Thrilled to announce: I'm now with ELLIS as a PI & MPI for Intelligent Systems as an Independent Group Leader! 🌟 Tübingen is such an amazing place. On a hunt for PhD candidates passionate about deep learning & optimization! Interested? Slide into my DMs! 🔍
@ELLISforEurope
Also at HiLD
#ICML2023
,
@SamuelMLSmith
@sohamde_
, others, and I will present our work showing that
linear RNN + token-wise MLP = universal nonlinear dynamical system approximator
This is so cool! Explains S4, S5 and the LRU.
preprint:
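For intuition, here is a minimal sketch of such a block in JAX – my simplification with made-up parameter names, not the paper's construction; real LRUs use complex diagonal recurrences and careful parameterization/initialization:

```python
import jax
import jax.numpy as jnp

# Toy "linear RNN + token-wise MLP" block. The recurrence itself is linear;
# the only nonlinearity is the position-wise MLP applied to its outputs.
def linear_rnn_mlp_block(params, x):  # x: (T, d_in)
    a, B, W1, W2 = params  # hypothetical names: diagonal transition a, input map B, MLP weights
    def step(h, x_t):
        h = a * h + B @ x_t  # linear recurrence, no nonlinearity inside
        return h, h
    _, h = jax.lax.scan(step, jnp.zeros_like(a), x)  # h: (T, n)
    return jax.nn.gelu(h @ W1) @ W2  # token-wise MLP, applied per position
```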
CLS offers a unique blend of ETH and MPI; I know so many exceptional graduates!
This year, I am an associate faculty member!
Please apply via our online application portal at . The application deadline is midnight (23:59 CET) on November 15, 2023.
I have always been fascinated by muP. While muP theory is clear, the optimization perspective gives super cool and clear insights that hold at finite width.
Was a super fun project.
Let's make optimization great again!
Why can the learning rate in neural networks transfer from small to large models (in both width and depth)? It turns out that sharpness dynamics can explain it. Check out our new work!
w/
@alexmeterez
(co-first),
@orvieto_antonio
and T. Hofmann
It has been an incredible first month here at the ELLIS Institute
#Tübingen. The freedom we have is unmatched, and the environment is incredibly stimulating. Did you know Hölderlin started writing Hyperion in Tübingen? At the age of 22.
Thanks,
@ELLISforEurope
@MPI_IS
for your vision
Our Next Generation Sequence Modeling Architectures workshop proposal was accepted by ICML! We have an incredible lineup of speakers, please come say hi and consider submitting your works! :)
Feeling very fortunate to co-organize this workshop with an incredible group of researchers, Razvan Pascanu,
@orvieto_antonio
, Carmen Amo Alonso, and Maciej Wołczyk!
It's awesome to be back in Paris!!
Thanks
@BachFrancis
for hosting me this week at
@Inria
– such a wonderful place. Filling the building with thoughts on RNNs 🎃
Fun fact: Paris is the only place in the world where I managed to get my oboe repaired in 5 min... and for free 🥖🥖
🚀 Get ready to dive deep into the captivating world of artificial intelligence with us!
The Cyber Valley Podcast coming soon...
🎙️ Don’t miss our unforgettable episodes, created in collaboration with the ELLIS Institute Tübingen
#AIPodcast
#AIResearch
#ELLIS
#AI
@orvieto_antonio
I am looking for a motivated 3-month intern for a project here at MPI for Intelligent Systems & ELLIS Tübingen! If you are free in the period Nov 15 - Feb 15 and know optimization + how to code in torch/jax, please contact me! Internships are on-site only :)
Stop 2: DeepMind! It's nice to see that the atmosphere has not changed: an amazing place filled with inspiring people, like
@sohamde_
@SamuelMLSmith
– thanks for hosting me!
Very last few days to apply for a PhD at
@ELLISforEurope
. This program is getting more and more awesome every year.
If you'd like to work on unveiling some deep learning mysteries, pls apply!
In "SDEs for Minimax Optimization" we investigate the intriguing training dynamics of minimax games. It is a journey through a complex dance of optimizers, where continuous-time tools simplify the math and provide great insights.
#AISTATS
so nice to finally see what
@sohamde_
@SamuelMLSmith
@caglarml
have been up to! This is INCREDIBLE – the nicest possible read on my Cairo-Milan flight this morning :)
The new Griffin paper is really interesting and contains a lot of implementation details. The implementation is in Pallas, a JAX-like frontend for Triton/TPU lowering. They show that an associative scan is inherently worse than a linear scan in this context.…
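To make the comparison concrete, here is a toy JAX sketch (mine – not the Pallas kernel from the paper) of the two ways to evaluate the diagonal recurrence h_t = a_t * h_{t-1} + b_t; they produce identical outputs but have very different hardware profiles:

```python
import jax
import jax.numpy as jnp

def linear_scan(a, b):
    # Sequential scan: O(T) depth, one cheap step per token.
    def step(h, ab):
        a_t, b_t = ab
        h = a_t * h + b_t
        return h, h
    _, hs = jax.lax.scan(step, jnp.zeros_like(b[0]), (a, b))
    return hs

def parallel_scan(a, b):
    # Associative scan: O(log T) depth, but more total work and memory
    # traffic – which, per the paper, is why it loses to the linear scan
    # on TPU in this setting.
    def combine(c1, c2):
        a1, b1 = c1
        a2, b2 = c2
        return a1 * a2, a2 * b1 + b2
    return jax.lax.associative_scan(combine, (a, b))[1]

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.uniform(k1, (8, 4))
b = jax.random.normal(k2, (8, 4))
assert jnp.allclose(linear_scan(a, b), parallel_scan(a, b), atol=1e-5)
```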
See you at the HiLD
#ICML2023
workshop today! We have 3 posters:
- On the Universality of Linear Recurrences Followed by Nonlinear Projections
- A New Adaptive Method for Minimizing Non-negative Losses
- On the Advantage of Lion Compared to signSGD with Momentum
Pls stop by!
🚀 We show that dense generalizations of Mamba/Hawk/Griffin (Linear CDEs) are able to approximate any nonlinear sequence-to-sequence map – no MLP/GLU layer is required. This is our main technical result.
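Schematically (my notation, not the paper's), a linear CDE evolves the hidden state as

$$\mathrm{d}h_t \;=\; A\,h_t\,\mathrm{d}t \;+\; \sum_{i=1}^{d} B_i\,h_t\,\mathrm{d}x^i_t,$$

where the input path $x$ drives the dynamics; "dense" means the $B_i$ are full matrices, while diagonal structure corresponds to the idealized Mamba/Hawk/Griffin-style recurrences.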
Nesterov's acceleration mechanism is believed to be linked to the geometry of symplectic integration. I can name more than 10 papers about it. Our paper (accepted at AISTATS 2021) shows this is not the case: explicit Euler integration also leads to acceleration.
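For reference, the accelerated iterates in question (constant-momentum Nesterov, standard notation):

$$y_k = x_k + \beta\,(x_k - x_{k-1}), \qquad x_{k+1} = y_k - \eta\,\nabla f(y_k).$$

The takeaway: recovering this behavior from a continuous-time model does not require a symplectic discretization – plain explicit Euler suffices.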
Mamba: The Hard Way (v2 ).
A ton of feedback on v1 – learned a lot. This version produces results identical to CUDA, and should be faster and cleaner than v1. (I had to learn about butterfly register shuffles 🦋)
Unfortunately it is still slower than CUDA. There…
💥 We rigorously prove that Mamba collects input statistics more efficiently than S4. Chaining S6 recurrences with linear pointwise maps allows the computation of higher-order global statistics. As such, Mamba and Hawk/Griffin place less of the compute burden on the MLP.
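A toy illustration of the mechanism (mine, not the paper's construction): a first accumulator computes $h^{(1)}_t = h^{(1)}_{t-1} + x_t = \sum_{s \le t} x_s$; feeding the input multiplicatively into a second accumulator,

$$h^{(2)}_t \;=\; h^{(2)}_{t-1} + x_t\,h^{(1)}_{t-1} \;=\; \sum_{s_1 < s_2 \le t} x_{s_1}\,x_{s_2},$$

yields a second-order global statistic of the input – and chaining further layers gives higher orders.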
Hot off the presses: ResNet hyperparameter transfer across depth and width!
Tl;dr transfer for LR+schedules, momentum, L2 reg., etc. for wide ResNets and ViTs, with and without Batch/LayerNorm
w/
@lorenzo_noci
@mufan_li
@BorisHanin
@CPehlevan
Help us build the ELLIS Institute: the new call for Hector Endowed PI positions is at . The positions come with the possibility for co-appointment at Max Planck & Tübingen AI Center
#ELLISforEurope
#Tuebingen
#AI
@MPI_IS
Following our previous work, we are releasing RecurrentGemma – a fully open-source 2B model based on our Griffin architecture!
Code + weights as everyone has wished for!
Code on Github:
Weights on Kaggle:
🎙 The second episode of the
@Cyber_Valley
Podcast with our Principal Investigator
@jonasgeiping
is now available 🚀 Tune in to learn about the Safety and Efficiency of AI.
👉 Check it out:
Boosting generalization of your deep learning model with just 6 lines of code?
"Explicit Regularization in Overparametrized Models via Noise Injection", just accepted at AISTATS2023
w/
@anantraj94
@HansKersting
@BachFrancis
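The gist, as a hedged JAX sketch (names and details are mine, not the paper's exact recipe): evaluate the gradient at Gaussian-perturbed weights, then apply it to the unperturbed weights.

```python
import jax

def noise_injected_grad(loss_fn, params, batch, key, sigma=1e-2):
    # Perturb every parameter tensor with isotropic Gaussian noise (scale sigma)...
    leaves, tree = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    noisy = jax.tree_util.tree_unflatten(
        tree,
        [p + sigma * jax.random.normal(k, p.shape) for p, k in zip(leaves, keys)],
    )
    # ...and differentiate the loss at the perturbed point; the optimizer then
    # applies this gradient to the *unperturbed* params.
    return jax.grad(loss_fn)(noisy, batch)
```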
At HiLD
#ICML2023
, Lin Xiao (Meta) and I will show an optimizer that is *provably better* than SGD – and works amazingly in Deep Learning and convex optimization. It's also as cheap as SGD, yet almost second order.
Full paper coming soon!
sample:
@giffmana
@HansKersting
@AurelienLucchi
@BachFrancis
Hi! Thanks for the comment. Our objective was not to get state of the art, but to improve over vanilla algorithms – i.e., constant-stepsize SGD and GD – using noise injection.
With schedulers, additional effects might kick in: we want to study just the effects of noise here.
Check out our
#NeurIPS2022
paper:
“Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution”
Joint Work with
@orvieto_antonio
,
@SimonLacosteJ
Paper:
Code:
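For context, the vanilla stochastic Polyak stepsize (from the earlier SPS paper of Loizou et al.) sets

$$\gamma_t \;=\; \frac{f_{i_t}(x_t) - f_{i_t}^*}{c\,\lVert \nabla f_{i_t}(x_t) \rVert^2},$$

where $f_{i_t}$ is the sampled loss and $f_{i_t}^*$ its minimum; the variants in the paper modify this stepsize so that the iterates converge to the exact solution rather than to a neighborhood of it.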
So apparently, according to Gemini, the best way to learn about the paper DAGs with NO TEARS () is to listen to the song Whigfield - No Tears to Cry. Actually, the song is not too bad.
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params!
My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
Here is a *fantastic* PhD program: the International Max Planck Research School for Intelligent Systems (IMPRS-IS).
I am so lucky to be among the faculty this year :) I have room for one practical-minded PhD student in optimization for deep learning. Interested? Please apply on
Quadratic attention has been indispensable for information-dense modalities such as language... until now.
Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly – outperforms Transformers everywhere we've tried.
With
@tri_dao
1/
"Simplifying Transformer Blocks" ranks easily among my favorite research papers that I've read this year.
Here, the authors look into how the standard transformer block, essential to LLMs, can be simplified without compromising convergence properties and downstream task…
Blog post!!
Rumors of the death of RNNs have been greatly exaggerated...
In this post I summarize why and how RNNs are making a comeback in ML, and what this means for theorists of neural computation.
Many thanks to
@NicolasZucchet
for help and corrections!
Now, this is really, really cool stuff. Heavy-ball is beautiful but very nasty to analyze mathematically compared to Nesterov's method. Very nice work.
#Optimization
#MachineLearning
Hey! Yuwen,
@AurelienLucchi
, and I will be hosting an ICML Zoom poster session in 20 minutes for our paper on acceleration for stochastic derivative-free optimization (). Here is the link:
Shadowing Properties of Optimization Algorithms,
#neurips2019
, poster 217 on Thursday evening. We derive a theoretical argument linking ODEs to their corresponding algorithms in optimization.
@PierreMari0n
It's a powerful decision that I fully respect. I was always curious about this, though: is it really necessary to take strict action on flying when it accounts for only ~3% of emissions (some say a bit more, some a bit less)? Flying less is good and definitely helps, but is the stigma worth it?
@giffmana
@HansKersting
@AurelienLucchi
@BachFrancis
Maybe! There are many variations: what if one adds momentum? What if you have an adaptive step? What about warmup? And what if we clip gradients? There is a lot one could explore :) Here we kept it simple.
Want to know more about the acceleration mechanism in convex optimization? Please visit our AISTATS poster in a few hours!
"Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization"
Meta used my 1991 ideas to train LLaMA 2, but made it insinuate that I “have been involved in harmful activities” and have not made “positive contributions to society, such as pioneers in their field.”
@Meta
& LLaMA promoter
@ylecun
should correct this ASAP. See…
@tetraduzione
Dear Grande Antonio, I actually only read the solutions to differential equations from the olive oil pattern in my fish soup. I can teach you any time.
@SamuelAinsworth
@deepcohen
@BachFrancis
@HansKersting
@AurelienLucchi
Nice discussion! Yes, everything is in expectation. But the analysis would be complex in the smoothing approach: grad(x + noise) = smoothed_grad + noise2, where noise2 crucially depends on the gradient scale. The noise will have a non-stationary distribution – harder to analyze.
@KhanovMax
I sort of agree – at inference time, RNNs and all S4 variants can model very similar functions.
Initialization and optimization are important though! We try to explain the main issues and solutions here 🤠.
We're looking for postdoctoral fellows in AI! We offer: an excellent cohort of young researchers, a dedicated GPU cluster with 300 H100s, a $100K salary (+$10K research funds), and a stunning campus. 1 hour from NYC and Philly. Renewable, i.e., it's possible to stay multiple years. Join us!