Mohammed AlQuraishi @MoAlQuraishi Twitter profile | Pikagi

Mohammed AlQuraishi

@MoAlQuraishi

11,022

Followers

359

Following

29

Media

1,711

Statuses

MLing biomolecules en route to structural systems biology. Asst Prof of Systems Biology and CS @Columbia. Prev. @Harvard SysBio; @Stanford Genetics, Stats.

New York, NY

https://t.co/RzUpX2Shjl

Joined November 2012

Pinned Tweet

@MoAlQuraishi

Mohammed AlQuraishi

7 months

We’ve a new review on DL methods for protein-protein interactions, focused on discovering novel interactions, structurally characterizing known interactions, and designing binders. Work by @JuliaRuRogers and Gergő Nikolényi, who’ve done a great job distilling a huge field. More👇

@JuliaRuRogers

Julia Rogers, PhD

7 months

Can we characterize the full diversity of protein interactions that coordinate cell function? Deep learning is a promising way! @MoAlQuraishi, Gergő Nikolényi, and I review the ecosystem of DL models for protein interaction discovery, elucidation & design

4

66

321

0

55

232

@MoAlQuraishi

Mohammed AlQuraishi

2 years

We have successfully trained OpenFold from scratch, our trainable PyTorch implementation of AlphaFold2. The new OpenFold (OF) (slightly) outperforms AlphaFold2 (AF2). I believe this is the first publicly available reproduction of AF2. We learned a lot. A🧵1/12

OpenFold GitHub Repository

54

992

4K

@MoAlQuraishi

Mohammed AlQuraishi

3 years

CASP14 #s just came out and they’re astounding—DeepMind looks to have solved protein structure prediction. Median GDT_TS went from 68.5 (CASP13) to 92.4!!!! Cf. their 2nd best CASP13 struct scored 92.8 (out of 100). Median RMSD is 2.1Å. I think it's over

35

567

2K

@MoAlQuraishi

Mohammed AlQuraishi

3 years

An announcement I’ve been aching to make! After much sweat, we’ve built a trainable version of AlphaFold2, implemented in PyTorch, which we’re calling OpenFold. GitHub: Colab: Why a trainable version of AlphaFold2 you ask? ⬇️

26

483

2K

@MoAlQuraishi

Mohammed AlQuraishi

3 years

My thoughts on AlphaFold2:

AlphaFold2 @ CASP14: “It feels like one’s child has left home.”

The past week was a momentous occasion for protein structure prediction, structural biology at large, and in due time, may prove to be so for the whole of life sciences. CASP14, the conference for …

moalquraishi.wordpress.com

26

240

1K

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Now that the #alphafold hype has completely died down (ha!), I've written a new blog post on the AF2 method paper. This is a technical deep-dive into aspects of AF2 that I find most surprising/innovative and of relevance to broader biomolecular modeling.

The AlphaFold2 Method Paper: A Fount of Good Ideas

Just over a week ago the long-awaited AlphaFold2 (AF2) method paper and associated code finally came out, putting to rest questions that I and many others raised about public disclosure of AF2. Alr…

moalquraishi.wordpress.com

10

199

865

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Some news! After many lovely years at @harvardmed I'm moving to @Columbia fall 2020 to start a new lab as an Assistant Professor in Systems Biology and the Program for Mathematical Genomics--and I'm recruiting students and postdocs! Email/DM me or see . 1/2

AlQuraishi Lab at Columbia University

Machine learning biomolecules and their interactions en route to a bottom-up and emergent systems biology.

58

113

727

@MoAlQuraishi

Mohammed AlQuraishi

14 days

AlphaFold 3 is out! As expected expands coverage to small molecules and nucleic acids. And replaces the structure module with a diffusion-based one. Unfortunately no code or model weights--just a web server for a limited set of ligands:

AlphaFold 3 predicts the structure and interactions of all of life’s molecules

Our new AI model AlphaFold 3 can predict the structure and interactions of all life’s molecules with unprecedented accuracy.

12

209

724

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Baker lab's effort at reproducing AlphaFold2 is out on bioRxiv. Pretty impressive performance gains (relative to original trRosetta), if not quite yet at AlphaFold2 level.

Accurate prediction of protein structures and interactions using a 3-track network

DeepMind presented remarkably accurate protein structure predictions at the CASP14 conference. We explored network architectures incorporating related ideas and obtained the best performance with a...

www.biorxiv.org

4

163

650

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Building on last week’s announcement of OpenFold, an academic-industry consortium is being announced today within the non-profit @openmsf. The OpenFold Consortium will develop open source ML-based molecular modeling tools in a community-driven fashion. 1/3

OpenFold Consortium

9

121

534

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Deep learning has obviously transformed protein structure prediction, but can it do the same for the rest of biology? In a perspective by @sorger_peter and myself out this week in @naturemethods, we begin to try to answer this question:

2

133

481

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Some thoughts on @DeepMindAI’s entry in #CASP13, after being there and chatting with the AlphaFold team and many others. Warning: long, technical (in parts), and vicious! (but not toward DM :-) @LindorffLarsen @david_koes @TorstenSchwede @sokrypton

AlphaFold @ CASP13: “What just happened?”

Update: An updated version of this blogpost was published as a (peer-reviewed) Letter to the Editor at Bioinformatics, sans the “sociology” commentary. I just came back from CASP13, the…

moalquraishi.wordpress.com

15

220

476

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Can we predict protein structure directly from sequence w/o MSAs and w/o intermediate steps like distograms? In a new preprint we show that we can, led by @mrprotein24 @NazimBouatta @SurgeBiswas along w/ @charochereau @geochurch & Peter Sorger: (1/4)

Single-sequence protein structure prediction using language models from deep learning

AlphaFold2 and related systems use deep learning to predict protein structure from co-evolutionary relationships encoded in multiple sequence alignments (MSAs). Despite dramatic, recent increases in...

www.biorxiv.org

7

95

374

@MoAlQuraishi

Mohammed AlQuraishi

11 months

We built a new diffusion protein design model named Genie. We preprinted it a while ago (soon after RFDiffusion and Chroma preprints) but kept mum due to embargo. Final ICML version (major update) with code and paper here (1/7)

Generating Novel, Designable, and Diverse Protein Structures by...

Proteins power a vast array of functional processes in living cells. The capability to create new proteins with designed structures and functions would thus enable the engineering of cellular...

4

95

343

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Nature's write-up:

‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures

Nature - Google’s deep-learning program for determining the 3D shapes of proteins stands to transform biology, say scientists.

2

77

311

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Last week’s OmegaFold and ESMFold present contrasting takes on how to fuse language models (LMs) with structure prediction. A short 🧵1/9

1

42

285

@MoAlQuraishi

Mohammed AlQuraishi

7 months

Interesting status update from DeepMind on AlphaFold (just that, no model, paper, or code). All atom version in the works (similar to RFAA). Meaningful gains on small molecules but far from 'solved' (think AF1 vs AF2). Same w/nucleic acids and antibodies.

@tfgg2

Tim Green

7 months

New! We’ve just put up a note evaluating the latest, in-development version of AlphaFold (“AlphaFold-latest”). This is a preview - development is still in progress - but performance across a wide range of tasks is striking. Highlights in the thread. 1/7

9

270

843

3

62

285

@MoAlQuraishi

Mohammed AlQuraishi

1 month

ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary...

4

72

280

@MoAlQuraishi

Mohammed AlQuraishi

3 years

I have a new review out on machine learning in protein structure prediction in past 2 years (not focused on AlphaFold but obviously mentions it) part of a special issue on "Machine Learning in Chemical Biology" in COCHBI edited by @cwcoley and Xiao Wang.

3

59

247

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Here we go! I must say I'm impressed with how good DeepMind has been about putting everything out there for people to use.

@GoogleDeepMind

Google DeepMind

3 years

Today with @emblebi, we're launching the #AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome, doubling humanity’s accumulated knowledge of high-accuracy human protein structures - for free: 1/

99

3K

7K

3

19

246

@MoAlQuraishi

Mohammed AlQuraishi

3 years

If you’re interested in machine learning and structural biology be sure to check out today’s NeurIPS workshop on the topic:

3

48

238

@MoAlQuraishi

Mohammed AlQuraishi

6 months

Even ~2 years after AlphaFold2's announcement this paper remains my favorite in the post-AF2 realm. To be sure RFDiffusion is a strong contender and arguably has been more immediately useful, but I strongly believe this work will stand the test of time.

State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold

An algorithm that already predicts how proteins fold might also shed light on the physical principles that dictate this folding.

journals.aps.org

4

32

237

@MoAlQuraishi

Mohammed AlQuraishi

4 years

I’m late to my own party but excited to share our new work on predicting SLiM-mediated protein-protein interactions, out today in @naturemethods with Joe Cunningham, @GregKoytiger, and @sorger_peter! A blogpost is forthcoming but for now a tweetstorm (1/8)

Biophysical prediction of protein–peptide interactions and signaling networks using machine learning

Nature Methods - Protein–peptide interactions that underpin cell signaling are accurately predicted by wedding the strengths of machine learning with the interpretability of biophysical...

4

51

223

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Had the pleasure of reviewing this and will write more soon, but fantastic to finally see paper and code out!

@demishassabis

Demis Hassabis

3 years

Last year we presented #AlphaFold v2 which predicts 3D structures of proteins down to atomic accuracy. Today we’re proud to share the methods in @Nature w/open source code. Excited to see the research this enables. More very soon!

89

2K

6K

4

22

220

@MoAlQuraishi

Mohammed AlQuraishi

3 years

DeepMind's blogpost:

AlphaFold: a solution to a 50-year-old grand challenge in biology

Proteins are essential to life, supporting practically all its functions. They are large complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique...

deepmind.google

3

44

194

@MoAlQuraishi

Mohammed AlQuraishi

3 years

These are for single domains-not whole proteins-and there are a few poor predictions. So corner cases remain but core problem appears solved: 88% of predictions are <4Å, 76% <3Å, 46% <2Å. Unlike last time where there was some competition, this time AF2 was best for 88/97 targets.

6

19

171

@MoAlQuraishi

Mohammed AlQuraishi

6 years

Excited to release a preliminary version of ProteinNet, a data set for doing machine learning on protein structure. Aim is to lower the barrier to entry to protein folding, and spur more ML researchers to tackle the problem. More here: (1/3)

GitHub - aqlaboratory/proteinnet: Standardized data set for machine learning of protein structure

Standardized data set for machine learning of protein structure - aqlaboratory/proteinnet

1

74

170

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Put up new preprint on arXiv describing ProteinNet, a dataset for doing ML on protein sequence-structure relationships. ProteinNet is already on GitHub and I hope the preprint sheds greater transparency on how it is constructed.

2

68

159

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Haven't looked at in detail but appears very interesting. Claims that AF2 learns an energy function for proteins independent of MSAs, while MSAs are used primarily (and implicitly) by AF2 to solve the global search problem.

State-of-the-Art Estimation of Protein Model Accuracy using AlphaFold

The problem of predicting a protein’s 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable...

www.biorxiv.org

2

27

160

@MoAlQuraishi

Mohammed AlQuraishi

5 years

The final version of my RGN paper is now online in @CellSystemsCP:

11

54

156

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Glad to see @DeepMindAI’s AlphaFold paper finally out. I had the pleasure of being one of the reviewers and getting to write the accompanying @NatureNV article. The future of protein structure prediction is looking very bright!

@mvicaracal

Victoria Aranda

4 years

Accompanying N&V by @MoAlQuraishi “A watershed moment for protein structure prediction”

0

13

32

4

26

153

@MoAlQuraishi

Mohammed AlQuraishi

3 years

A protein language model for MSAs. Likely relevant for the 'trunk' part of the AlphaFold2 model. Basically just an axial transformer with tied row attention, but they see a rather dramatic jump in performance.

MSA Transformer

Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins. Protein language models studied to date have been trained to perform...

www.biorxiv.org

0

32

143

@MoAlQuraishi

Mohammed AlQuraishi

2 years

This is great news! Our plans for OpenFold won’t change, as having a trainable platform is still incredibly valuable for modifying and building on AF2. The first step ofc is reproducing the AF2 weights independently which is what we’re currently working on.

@sokrypton

Sergey Ovchinnikov 🇺🇦

2 years

"The AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license" 🙂 (thanks to @BrianWeitzner for alerting me)

3

103

373

5

24

140

@MoAlQuraishi

Mohammed AlQuraishi

3 years

New interesting work from DeepMind.

@ORonneberger

Olaf Ronneberger

3 years

Proteins are not static bricks! Feasibility study to infer a continuous distribution of all states using an end-to-end model from Cryo-EM images to atom coordinates. @danrsm, @GarneloMarta, @MichaelZielins, @JonasAAdler, @arkitus, @CarlDoersch, @pushmeet

8

168

646

3

19

132

@MoAlQuraishi

Mohammed AlQuraishi

2 years

We have a faculty search this year (all ranks). If you're a computational biologist I strongly recommend you apply! Lots of fantastic people in dept and at Columbia interested in ML and biomolecules (@NShahLab @helloanum @HarmenBussemkr V. Cornish) 1/5

1

56

129

@MoAlQuraishi

Mohammed AlQuraishi

7 months

All-atom version of RoseTTAFold looks quite exciting!

Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom

Although AlphaFold2 (AF2) and RoseTTAFold (RF) have transformed structural biology by enabling high-accuracy protein structure modeling, they are unable to model covalent modifications or interacti...

www.biorxiv.org

3

27

131

@MoAlQuraishi

Mohammed AlQuraishi

7 months

Looks very interesting: AlphaFold meets flow matching for generating protein ensembles.

1

29

123

@MoAlQuraishi

Mohammed AlQuraishi

3 years

As promised here is our high-level review of #AlphaFold and surrounding events, by @NazimBouatta, Peter Sorger, and myself. This is nearly all @NazimBouatta's work, a string theorist turned protein theorist who has taken a far more expansive view of the field than I could have.

@ActaCrystD

Structural Biology

3 years

Nazim Bouatta et al.: Protein structure prediction by AlphaFold2: are attention and symmetries all you need? #AlphaFold2 #ProteinStructurePrediction #CASP14 ... #IUCr

0

24

84

1

15

119

@MoAlQuraishi

Mohammed AlQuraishi

2 years

The final version of our RGN2 manuscript for single-sequence structure prediction is out in @NatureBiotech! Peer review dramatically improved this work, thanks to @fraser_lab, @thisismadani, and anonymous reviewers. For more on what’s new, see thread below by @NazimBouatta ⬇️

@NazimBouatta

Nazim Bouatta

2 years

Our new approach for predicting protein 3D structure using single sequence + protein language model, w/o MSAs, is out. We combine a protein language model (AminoBERT) with a structural module using a transfer matrix formalism. (1/5)

6

68

341

1

21

112

@MoAlQuraishi

Mohammed AlQuraishi

5 years

This looks interesting: an open-source implementation of the AlphaFold distance prediction NN. I haven't had a chance to look in detail yet but there's an associated preprint:

ProSPr: Democratized Implementation of Alphafold Protein Distance Prediction Network

Deep neural networks have recently enabled spectacular progress in predicting protein structures, as demonstrated by DeepMind’s winning entry with Alphafold at the latest Critical Assessment of...

www.biorxiv.org

2

36

112

@MoAlQuraishi

Mohammed AlQuraishi

2 years

First off: model weights, training code and colab notebook are here. We are also making available a training set of 400K unique MSAs & predicted structures for self-distillation. These live in the Registry of Open Data on AWS. 2/12

2

11

112

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Protein language models scaled up massively (training on ~5600 GPUs!) Unfortunately doesn't seem to have resulted in a meaningful performance improvement yet.

@biorxiv_bioinfo

bioRxiv Bioinfo

4 years

ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing #biorxiv_bioinfo

0

5

25

6

21

104

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Exciting to see the preprint on ESM2/ESMFold from @alexrives group out. Far larger protein LMs than ESM-1b and applied to MSA-less protein structure prediction. Can't wait for code/models!

Language models of protein sequences at the scale of evolution enable accurate structure prediction

Large language models have recently been shown to develop emergent capabilities with scale, going beyond simple pattern matching to perform higher level reasoning and generate lifelike images and...

www.biorxiv.org

2

30

102

@MoAlQuraishi

Mohammed AlQuraishi

1 year

OpenFold preprint is out! Much richer story than expctd 1) AF2 shockingly robust to data elision; train on 1k chains→get AF1 acc; train on helices or sheets→do ok on other 2) it learns 1D→2D→3D proteins. Tweetorial👇incl💯animation of low-D predictions

OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for...

AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train...

www.biorxiv.org

@gahdritz

Gustaf Ahdritz

1 year

Our preprint on OpenFold, our trainable reproduction of AlphaFold2, is finally up! Since we open-sourced parameters in June, we've trained the model to high accuracy more than 50 times, on a variety of datasets. Here's what we learned (a lot) -> (1/19)

17

240

993

3

19

98

@MoAlQuraishi

Mohammed AlQuraishi

5 years

An updated version of my AlphaFold blogpost is now a Letter to the Editor in Bioinformatics: The science part was revised to reflect new information and reviewer feedback. The 'sociology' part was scrubbed to make it a dignified piece of writing :-D

AlphaFold at CASP13

Abstract. Summary: Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the m

academic.oup.com

1

32

93

@MoAlQuraishi

Mohammed AlQuraishi

3 years

To be clear: we have NOT yet trained this new model from scratch but are doing so now and expect to release new model weights shortly. We have however confirmed that OF’s inference is identical to AF2’s by loading it with AF2's weights and predicting identical structures.

1

12

92

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Great work using inter-residue orientations to exceed AlphaFold’s performance on protein structure prediction by Jianyi Yang, Ivan Anishchenko, and others from the Baker lab. First heard about this at RosettaCon and I’m very glad to finally see it out!

Improved protein structure prediction using predicted inter-residue orientations

The prediction of inter-residue contacts and distances from co-evolutionary data using deep learning has considerably advanced protein structure prediction. Here we build on these advances by...

www.biorxiv.org

0

24

91

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Finally, by “we”, I mean the inimitable OpenFold team, led by @gahdritz, @SachinKadyan99, Will Gerecke, and Luna Xia. All were co-supervised by @NazimBouatta and myself (I mostly stayed out of the way to avoid slowing them down.) More very soon.

6

3

90

@MoAlQuraishi

Mohammed AlQuraishi

2 years

A key finding is that AF2/OF accuracy climbs very sharply then tapers off for a long and gradual increase. While total training time took ~100K A100 hours, 90% of final accuracy could be achieved in ~3K hours. This has important implications for training AF2/OF variants. 4/12

5

11

88

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Coming back to my (new) office after four weeks of travel to find it freshly decorated by my lab. I get to work with the best people ❤️. (Also learned that the one letter abbreviation for ornithine is O)

1

1

88

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Lab picnic! Such a privilege to be working with these people every day. BY FAR the best part of the job.

2

0

76

@MoAlQuraishi

Mohammed AlQuraishi

5 months

We have a new faculty position open in my department with a strong focus on machine learning and quantitative biology, broadly defined. We value method development as much as hypothesis- and discovery-driven science. And we keep getting more GPUs!

Faculty All Ranks (Assistant Professor, Associate Professor, Professor) - Tenure-track/Tenured -...

The Program for Mathematical Genomics in the Department of Systems Biology invites candidates for faculty positions

jobs.sciencecareers.org

0

28

73

@MoAlQuraishi

Mohammed AlQuraishi

5 years

From Google AI this time.

@biorxiv_bioinfo

bioRxiv Bioinfo

5 years

Using Deep Learning to Annotate the Protein Universe #biorxiv_bioinfo

2

96

203

3

28

71

@MoAlQuraishi

Mohammed AlQuraishi

3 years

As we saw with the recent AlphaFold-Multimer, some applications can benefit from training new AF2 variants and possibly integrating AF2 within larger models. DeepMind’s JAX version, while excellent, is missing training code. PyTorch is also more widely used, hence OpenFold.

1

5

72

@MoAlQuraishi

Mohammed AlQuraishi

2 years

2nd is memory: we use less due to optimizations and custom CUDA kernels, enabling inference of much longer sequences. In general we get up to ~4,600 residues on a 40GB A100 and we believe we can optimize further. 7/12

1

3

69

@MoAlQuraishi

Mohammed AlQuraishi

2 years

This is far from the end of our OpenFold efforts; in fact it is only the beginning. Stay tuned for an exciting announcement soon! 12/12

3

2

67

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Differentiable protein learning paper is 15th most downloaded bioRxiv preprint of 2018! Woohoo! :-)

1

9

65

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Google's preprint on annotating the protein universe just got an update that includes clustered training/test splits, as well as new timing experiments. Looks like a major revision.

@MarkDePristo

Mark DePristo

5 years

Excited to see our updated and expanded manuscript "Using Deep Learning to Annotate the Protein Universe" now out on BioRxiv .

0

14

59

0

12

65

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Somehow this slipped my radar. Very cool looking work from the @DrorLab: Hierarchical, rotation-equivariant neural networks to predict the structure of protein complexes.

0

11

63

@MoAlQuraishi

Mohammed AlQuraishi

3 years

We combine a new protein language model (AminoBERT) with an improved version of our end-to-end differentiable machinery (RGN2) to directly generate 3D coordinates. On orphan proteins, RGN2 outperforms all major methods, including #AlphaFold, RoseTTAFold, and trRosetta. (2/4)

2

11

64

@MoAlQuraishi

Mohammed AlQuraishi

5 years

I'm not myself when I haven't programmed in a while. I notice this most acutely when I get an uninterrupted block of coding time after a months-long drought, and feel like I am made whole again. Is anyone else this way? Unfortunately the droughts are increasing in intensity.

4

2

63

@MoAlQuraishi

Mohammed AlQuraishi

3 years

I should note that another blog post has been written by @c_outeiral and it’s great and entirely complementary, so be sure to read his for another perspective:

2

3

58

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Preprint coming soon, with more details about what we learned during training and lots of ablation studies. 8/12

1

2

60

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Back to model: as this scatterplot shows (GDT_TS scores on CAMEO-based validation set) accuracy is very comparable to AF2 but slightly higher on average with OF, perhaps because of our slightly larger training set. 3/12

2

3

58

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Our PyTorch implementation has some advantages over the publicly available JAX implementation from DeepMind, beyond the obvious one of being trainable. 5/12

1

3

56

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Been perusing the new CASP14 abstracts: MSR & Baidu entered, and AlphaFold2 is using raw MSAs (cf. extracting pairwise values) and doing self-consistent predictions! RGNs are self-consistent too, but details likely very different. More on our entry soon.

2

7

53

@MoAlQuraishi

Mohammed AlQuraishi

2 years

This was a big effort within the lab and with many external collaborators. Internally credit goes to the OF team led by @gahdritz (w/ @SachinKadyan99, Luna Xia, Will Gerecke) and co-advised by @NazimBouatta and me. 9/12

1

3

54

@MoAlQuraishi

Mohammed AlQuraishi

5 years

I had the pleasure of visiting @broadinstitute last week to give the MIA talk on differentiable protein structure learning. Video of the talk is up now:

MIA: Mohammed AlQuraishi, End-to-end differentiable learning of...

March 6, 2019. MIA Meeting. Mohammed AlQuraishi, HMS. End-to-end differentiable learning of protein structure. Abstract: Predicting protein structure from sequence is ...

www.youtube.com

0

11

53

@MoAlQuraishi

Mohammed AlQuraishi

5 years

RGN latent space is the cover of this month's Cell Systems!

3

4

53

@MoAlQuraishi

Mohammed AlQuraishi

3 years

My post is _not_ a high-level summary of how AF2 works. For that I suggest @c_outeiral's blog post.

2

6

52

@MoAlQuraishi

Mohammed AlQuraishi

5 years

New blogpost up on protein representation learning. I use our recent UniRep preprint in collab with @EthanAlley @grigonomics @SurgeBiswas @geochurch as a springboard for reflecting on the future of the field.

2

16

50

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Well deserved!

0

4

50

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Happy to have this paper finally out in print! I have a lot of hope for semi-supervised learning in protein biology.

@EthanAlley

Ethan C. Alley

5 years

My first paper is finally in (digital) print @naturemethods 🎉🦑! It's been a wild ride with @grigonomics and @SurgeBiswas. I'm immensely grateful to @geochurch and @MoAlQuraishi for taking a chance on a wacky idea and guiding us through the maze of academic publication. (1/4)

8

11

104

0

5

49

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Very cool looking stuff!

@sokrypton

Sergey Ovchinnikov 🇺🇦

3 years

End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman. A fun collaboration with Samantha Petti, Nicholas Bhattacharya, @proteinrosh, @JustasDauparas, @countablyfinite, @keitokiddo, @srush_nlp & @pkoo562 (1/8)

8

203

736

0

3

48

@MoAlQuraishi

Mohammed AlQuraishi

3 years

CASP's official press release:

2

3

45

@MoAlQuraishi

Mohammed AlQuraishi

2 years

1st is speed: OF inference is up to 2x faster on short proteins even when excluding JAX compilation. On longer proteins advantage lessens, until AF2 begins to OOM (see 2nd point). Inference speed is key when coupled with fast MSA schemes like MMseqs2 6/12

@thesteinegger

Martin Steinegger 🇺🇦

2 years

MSA generation is not slow. In ColabFold we generate MSAs in seconds using MMseqs2. This can be tweaked to run in < second using batch. Most of the time of AlphaFold/ColabFold is spent predicting the structure.

3

4

44

1

3

47

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Great review by @KevinKaichuang @ZvxyWu @kadinaj and @francesarnold on generative protein models.

@KevinKaichuang

Kevin K. Yang 楊凱筌

3 years

Final version of our review on deep generative models of protein sequence is out! Was a pleasure to work with editors @cwcoley and @xiaowan38018817 for the special edition on Machine Learning in Chemical Biology. With @ZvxyWu @kadinaj @francesarnold

2

55

294

1

10

47

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Well deserved!

@GoogleDeepMind

Google DeepMind

2 years

Congratulations to @demishassabis and John Jumper who have won the 2023 Breakthrough Prize in Life Sciences for the development of #AlphaFold, our AI system that solved the 50-year-old challenge of protein structure prediction. 1/

18

140

709

0

4

47

@MoAlQuraishi

Mohammed AlQuraishi

7 months

Very excited about the launch of the CZI New York BioHub and what it means for the NYC ecosystem! Congratulations to @califano_lab for leading this effort!

@cziscience

CZI Science

7 months

We’re thrilled to share that we’re launching a new @CZBiohub in New York! Bringing together engineers + scientists at @Columbia, @RockefellerUniv and @Yale, #CZBiohubNY will engineer immune cells for earlier detection & treatment of disease

11

74

336

0

4

46

@MoAlQuraishi

Mohammed AlQuraishi

4 years

A piece of holiday-time reflection: one thing I’m grateful about in science is the existence of a real field-wide community, made more visible by Twitter. I suspect this is less true in other professions and is a genuinely positive feature.

0

3

45

@MoAlQuraishi

Mohammed AlQuraishi

5 years

This looks great. I think it's an idea that's been in the ether for some time but getting it to work is an altogether different matter. Will be interesting to see if it can be translated to animals, especially mammals.

@UWproteindesign

Institute for Protein Design

5 years

Out today: Protein interaction networks revealed by proteome coevolution @sokrypton @sciencemagazine @UWBiochemistry

0

19

51

1

8

43

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Final ProteinNet paper is now in @BMCBioinfo Also quick update: raw MSAs for PN12 are available upon request (4TB), PN13 is in progress, planning on prelim PN14 in time for CASP14, and should have co-evo inputs soon for <=PN12.

0

12

40

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Columbia is hiring! We have tenure-track/tenured positions at all ranks in the Program for Mathematical Genomics (Dept of Systems Biology). We have a special interest in method development but all areas of comp/sys bio are welcome. Come be my colleague!

1

19

39

@MoAlQuraishi

Mohammed AlQuraishi

2 years

Externally our collaborators at @nyuniversity ( @dabkiel1 ), @ArzedaCo (Andrew Ban), @cyrusbiotech ( @lucas_nivon ), @nvidiahealth ( @ItsRor , Abe Stern, Venkatesh Mysore, Marta Stepniewska-Dziubinska and Arkadiusz Nowaczynski), ... 10/12

1

3

39

@MoAlQuraishi

Mohammed AlQuraishi

6 years

Ref. implementation of RGNs is now available on GitHub, along with 6 pre-trained models spanning CASP7 - 12. The code enables training quite a variety of RGN models, including ones I’ve never tried!

GitHub - aqlaboratory/rgn: Recurrent Geometric Networks for end-to-end differentiable learning of...

Recurrent Geometric Networks for end-to-end differentiable learning of protein structure - aqlaboratory/rgn

1

10

38

@MoAlQuraishi

Mohammed AlQuraishi

2 years

… @OutpaceBio ( @BrianWeitzner ) and @PrescientDesign ( @amw_stanford , @RichBonneauNYU ) were pivotal in getting this off the ground and making it a reality. Thank you all! 11/12

1

4

37

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Should say that we will have in a couple of weeks a formal review paper out that is a high-level overview of AF2 and its implications.

1

1

35

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Hearing Gorman recite today reminded me of Teddy Roosevelt’s words that we are “a new nation, based on a mighty continent, of boundless possibilities.” Optimism may not be our birthright but it is our national character, and for the 1st time in at least 10 months, I’m feeling it.

1

2

35

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Looks interesting. Trains a language model (RoBERTa) on protein sequences then finetunes it for (binary) protein-protein interaction prediction.

@biorxiv_bioinfo

bioRxiv Bioinfo

4 years

Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks #biorxiv_bioinfo

0

4

33

0

3

33

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Had an early look at this work and it’s really impressive stuff! Demonstrates the remarkable power of semi-supervised learning in very low N contexts.

@grigonomics

Grigory Khimulya

4 years

Excited to share our latest pre-print🎉 - a framework for low-N protein engineering with data-efficient deep learning! Had a blast working with brilliant @EthanAlley @SurgeBiswas @kesvelt and @geochurch . Thread (1/7)

3

39

140

0

5

33

@MoAlQuraishi

Mohammed AlQuraishi

3 years

@david_prihoda Yes.

0

1

33

@MoAlQuraishi

Mohammed AlQuraishi

6 years

Georgy Derevyanko and @g_lamoureux_ have just made public a very cool PyTorch library for differentiable protein primitives, with optimized CUDA kernels!

GitHub - lupoglaz/TorchProteinLibrary: PyTorch library of layers acting on protein representations

PyTorch library of layers acting on protein representations - lupoglaz/TorchProteinLibrary

1

18

32

@MoAlQuraishi

Mohammed AlQuraishi

6 years

Been a while since I've blogged, but I figured yesterday's paper release deserved some background. In this post I write a little more about the conceptual ideas that led me to end-to-end differentiability for proteins.

Protein Linguistics

For over a decade now I have been working, essentially off the grid, on protein folding. I started thinking about the problem during my undergraduate years and actively working on it from the very …

moalquraishi.wordpress.com

4

15

32

@MoAlQuraishi

Mohammed AlQuraishi

5 years

Been waiting for this to come out--really innovative work in geometric deep learning applied to protein-protein interactions and more.

@befcorreia

Bruno Correia

5 years

Unusual approach for our lab - fantastic work from @pgainza + @Freyer02952299 and fun collaboration with @mmbronstein on using learning techniques for "Deciphering interaction fingerprints from protein molecular surfaces". Take a look.

2

16

78

1

5

31

@MoAlQuraishi

Mohammed AlQuraishi

4 years

Really cool work: incorporate a learnable ODE model of signal transduction within an ML framework to predict cell response to perturbations. I happen to be writing a review in which I speculate that this should be possible. Kudos to the @sandercbio team for actually doing it!

@sandercbio

Chris Sander aka cscbio

4 years

Aiming at more comprehensive computable perturbation/response models of cell biology. Preprint updated: Interpretable Machine Learning for Perturbation Biology

0

17

56

2

6

31

@MoAlQuraishi

Mohammed AlQuraishi

5 years

If you're interested in the latest on drug discovery + ML and QM, go follow @davidlmobley . He's done an amazing job live tweeting #OECUP2019 . I feel like I'm practically there!

0

3

31

@MoAlQuraishi

Mohammed AlQuraishi

6 years

Been working on this for quite a few years! Many thanks to @latentjasper , @champiDicty , and @karengigs for their feedback on early drafts.

@biorxiv_bioinfo

bioRxiv Bioinfo

6 years

End-to-end differentiable learning of protein structure #biorxiv_bioinfo

0

8

11

2

14

30

@MoAlQuraishi

Mohammed AlQuraishi

3 years

Congratulations to @Liu_Changchang for passing her PhD defense with flying colors! Changchang is the first graduate student to be (co-)supervised by me (w/Peter Sorger), and I could not be more proud. Can't wait to see what you do next Changchang!

2

0

29

@MoAlQuraishi

Mohammed AlQuraishi

6 years

@atomadam2 @pollyp1 The lucky ones, yes, but not all (or even most I suspect.) Students in colleges or even universities without strong graduate programs can be quite isolated from academic norms.

1

0

29

@MoAlQuraishi

Mohammed AlQuraishi

5 years

This looks superb!

@soumithchintala

Soumith Chintala

5 years

Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences (from FAIR) - unsupervised learning recovers representations that map to multiple levels of biological granularity

3

74

244

3

5

30