Martin Steinegger 🇺🇦 @thesteinegger Twitter profile

Pinned Tweet

Martin Steinegger 🇺🇦

20 days

Foldseek-Multimer is a protein complex aligner that is up to 10,000 times faster than SOTA methods without sacrificing quality, enabling the comparison of billions of complex pairs per day. 1/5 📄 💾 🌐

8

152

472

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Protein structure prediction with #AlphaFold 2 in the browser using Google Colab. Just paste your protein in the input box and push "Run all". MSAs are generated by an MMseqs2 API call. Work by @sokrypton , @milot_mirdita . Try it out here:

AlphaFold2PredictStructure.ipynb

Colaboratory notebook

colab.research.google.com

40

541

2K

Martin Steinegger 🇺🇦

@thesteinegger

1 year

Foldseek, our fast structural aligner, is now published @NatureBiotech . It allows you to search through large structural databases like #AlphaFold or #ESMatlas in seconds. A long journey since '18! 1/6 📄 💾 🌐

17

379

1K

Martin Steinegger 🇺🇦

@thesteinegger

1 year

We clustered the #AlphaFold structure database with our novel Foldseek algorithm. We identified 2.27M clusters and analyzed them by function, annotation, domains and evolution. Amazing collaboration with @pedrobeltrao lab. 1/n 📄 💾

11

272

840

Martin Steinegger 🇺🇦

@thesteinegger

5 months

Learn how to predict mono-, multi-mers and conformations with our ColabFold/AlphaFold2 protocol. It covers workflows for beginners as well as advanced concepts. By @imGyuriKim @sewonlee_231 Eli Levy Karin @HKgenomics @Ag_smith @sokrypton @milot_mirdita 📄

3

216

713

Martin Steinegger 🇺🇦

@thesteinegger

2 months

I got tenure at Seoul National University! Starting off as a first-gen student with a lower secondary education (Hauptschule) in Germany makes this very meaningful to me. I am grateful to all who supported and believed in me. Excited for what lies ahead!

64

31

668

Martin Steinegger 🇺🇦

@thesteinegger

3 years

ColabFold updated: we sped up #AlphaFold2 's prediction to allow thousands of structures on a single GPU in a day. We added taxonomy-aware paired MSAs for complex predictions, a new metagenomic db and support for running it on your local machines. Updated 📄

3

162

633

Martin Steinegger 🇺🇦

@thesteinegger

3 years

ColabFold makes structure prediction and complex modeling of #AlphaFold 2 and #RoseTTAFold accessible through Google Colab. We show that MSAs produced by MMseqs2 match the accuracy of AF2 (HHblits/HMMer) while being faster. 📄 Code

6

194

621

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Search your protein structures against #AlphaFold DBs and #PDB in seconds using our Foldseek server. Just paste your PDB file and click search. We offer local (SW) and global (TMalign) structural alignments. Server was built by the amazing @milot_mirdita 🚀

10

174

569

Martin Steinegger 🇺🇦

@thesteinegger

2 years

ColabFold makes folding with #AlphaFold & #RoseTTAfold accessible to everyone. Our MSA server processed >1.6 million requests to date. We thank the community for all the help in improving ColabFold. Now published @naturemethods 📄

12

125

550

Martin Steinegger 🇺🇦

@thesteinegger

3 years

AlphaFold2 predicts protein structures at near crystal structure accuracy in less than 10 minutes (~300aa). The animation below shows the prediction of a viral RNA polymerase with >2k residues. I am grateful I could contribute to this huge milestone. 📄

12

151

544

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Igor Tolstoy brought to my attention that the #AlphaFold database contains predictions of nearly identical sequences with large pLDDT differences. For example, the two 99.6% similar sequences below have an avg. pLDDT of 97 and 33. We found over 1 Mio. of these cases in the AFDB.

8

127

528

Martin Steinegger 🇺🇦

@thesteinegger

8 months

Our work on clustering the 214M #AlphaFold protein structures was published in @Nature . We identified 2.3M clusters using our fast structure clustering algorithm and analysed their annotations, evolution and novel domains. 1/4 📄 🌐

12

155

484

Martin Steinegger 🇺🇦

@thesteinegger

3 months

It feels surreal to receive the Overton Prize from @ISCB ! This reflects the incredible support of my mentors ( @SoedingL , @StevenSalzberg1 ), collaborators, postdocs, students and friend @milot_mirdita . Excited to share this journey with you all at #ISMB2024 in Montreal.

63

40

458

Martin Steinegger 🇺🇦

@thesteinegger

10 months

MSA diversity is key for #AlphaFold2 's accuracy. Larger databases == better results. So, we generated MSAs from 22 peta-bytes of SRA data and show that ColabFold could have improved from rank 11 to 3 at CASP15. 1/5 📄 💾

5

117

424

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Our Kraken protocol for microbiome analysis and pathogen detection is out. It also includes KrakenTools, useful helpers for Kraken. Work by @JenniferLu717 Rincon N @DerrickEWood @fbreitw Pockrandt C @BenLangmead @StevenSalzberg1 📄 💾

10

114

417

Martin Steinegger 🇺🇦

@thesteinegger

5 months

Maximum likelihood structural phylogeny beyond the twilight zone by combining Foldseek's 3Di alphabet with AA alignments. ML trees resolve the topology of distantly related proteins where traditional AA methods fall short. 📄 💾

5

106

381

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Foldseek, our local structural aligner, is four orders of magnitude faster than SOTA structural aligners at similar sensitivity, allowing hits in the midnight zone to be detected confidently. Code: 🌐 📄

Foldseek: fast and accurate protein structure search

Highly accurate structure prediction methods are generating an avalanche of publicly available protein structures. Searching through these structures is becoming the main bottleneck in their analys...

www.biorxiv.org

11

116

345

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Predict protein structures in batches using the ColabFold "AlphaFold2_batch" notebook. It will predict all structures for a set of fasta files stored in a Google Drive folder. Try it out here: Thanks to @milot_mirdita @sokrypton

AlphaFold2_batch.ipynb

Run, share, and edit Python notebooks

colab.research.google.com

3

105

327

Martin Steinegger 🇺🇦

@thesteinegger

1 year

Explore our clustered #AlphaFold structural database with our new website by @milot_mirdita @clmgilchrist @jgyyy15 . With it you can find clusters, filter members by taxonomy, browse similar clusters and search with Foldseek. 🌐 📄

5

112

325

Martin Steinegger 🇺🇦

@thesteinegger

2 years

ColabFold now uses the AlphaFold-multimer models paired with MMseqs2 searches for prediction of protein complexes. Just provide chains separated by : and press "Run all" (provide the same sequence multiple times for homooligomers). Check it out at

9

85

326
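
The chain-separator convention described in the ColabFold multimer post above (chains joined by `:`, with a sequence repeated for homooligomers) can be sketched in a few lines of Python. This is only an illustration of how such an input string could be assembled; the helper name `build_complex_query` and its `copies` parameter are hypothetical and are not part of ColabFold.

```python
def build_complex_query(chains, copies=None):
    """Join chain sequences with ':' as described in the post above.

    `copies[i]` says how many times chain i is repeated (homooligomers).
    Hypothetical helper for illustration; not a ColabFold function.
    """
    if copies is None:
        copies = [1] * len(chains)
    if len(copies) != len(chains):
        raise ValueError("copies must have one entry per chain")

    expanded = []
    for seq, n in zip(chains, copies):
        # Remove whitespace/newlines and normalise to upper case.
        seq = "".join(seq.split()).upper()
        if not seq.isalpha():
            raise ValueError("chain must contain only letters")
        expanded.extend([seq] * n)
    return ":".join(expanded)


if __name__ == "__main__":
    # Heterodimer of two toy sequences.
    print(build_complex_query(["MKTAYIAK", "GAVLIPFW"]))
    # Homotrimer: the same sequence provided three times.
    print(build_complex_query(["MKTAYIAK"], copies=[3]))
```

The resulting string is what would be pasted into the notebook's sequence field before pressing "Run all".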

Martin Steinegger 🇺🇦

@thesteinegger

7 months

Our Foldtree notebook allows you to compute and visualize trees from protein structures in the browser. Generate trees from either 1) a set of protein structures, 2) AlphaFoldDB identifiers or 3) an #AlphaFold cluster identifier . 🌐

Foldtree.ipynb

Run, share, and edit Python notebooks

colab.research.google.com

Dave Moi

@steg0s4urus

7 months

In a post #AlphaFold world, we can use protein structures in ways we never could before. Can we build phylogenies with them? Are they any good? Yes! Foldtree () surpasses traditional sequence-based methods, even for closely related proteins.👇

11

255

768

6

99

328

Martin Steinegger 🇺🇦

@thesteinegger

2 years

We predicted the structure of 140k protein isoforms from human using #AlphaFold /ColabFold. When comparing them to their canonical MANE partner we saw that structure predictions can improve genome annotation. Data is available at 📄

3

82

312

Martin Steinegger 🇺🇦

@thesteinegger

26 days

Foldseek got published in 2024 in Volume 42 of @NatureBiotech . Here is the timeline of FS releases:
Source code: 2021/07
Webserver: 2022/01
Preprint: 2022/02
Journal: 2023/05
Print: 2024/02

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Hundreds of millions of protein structures will require new tools. Foldseek is a fast structural aligner that scales to billions of structures. Work by Stephanie Kim, Michel van Kempen and Johannes Söding and me. 📜 Poster at the @ISMB next week 💾

4

51

208

4

49

294

Martin Steinegger 🇺🇦

@thesteinegger

2 years

My postdoc Stephanie Kim presents Foldseek, our fast and accurate protein structure aligner, during today's poster session (P156-T) @ECCBinfo #ECCB2022 . Unfortunately, she has to leave early to catch her flight, so please make sure not to miss her.

4

41

281

Martin Steinegger 🇺🇦

@thesteinegger

6 months

Our Foldseek structural clustering of the #AlphaFold DB is now accessible through the AFDB website and API. It allows the fast discovery of similar structures for @uniprot proteins. It is a pleasure to work with the AlphaFold DB team! Foldseek cluster 📄

EMBL-EBI

@emblebi

6 months

The #AlphaFold Database has levelled up 🚀 🔍 Sequence-based search: Find protein structures in the database using BLAST 🤝With @thesteinegger team, we bring structure similarity clusters for seamless navigation A collaboration with @GoogleDeepMind

6

85

264

5

76

271

Martin Steinegger 🇺🇦

@thesteinegger

3 years

#AlphaFold 2 Colab has processed >10k queries. We now also search against BFD, Mgnify, SMAG( @tomodelmont ), MetaEuk in addition to UniRef. SMAG&MetaEuk have >20M eukaryotic environmental proteins that were not used in AF2 before. @sokrypton @milot_mirdita

7

69

268

Martin Steinegger 🇺🇦

@thesteinegger

2 years

ColabFold now uses a faster MMseqs2 backend server. We switched from BFD/Mgnify to ColabFoldDB, a larger metagenomic database, and reduced rate limits a lot, so batch AlphaFold2 runs should be faster. 💻

0

51

264

Martin Steinegger 🇺🇦

@thesteinegger

1 year

Our #AlphaFold cluster site has new features: ① Search for clusters using a protein structure via #Foldseek ② Filter candidate clusters ③ Explore the cluster using a pavian-style interactive Sankey taxonomy plot 🌐 👏 work by @milot_mirdita @jgyyy15

3

72

250

Martin Steinegger 🇺🇦

@thesteinegger

4 months

Exciting news. ColabFold will soon be accessible through @galaxyproject . @milot_mirdita really made this possible.

Michael Schatz

@mike_schatz

4 months

That’s right! AlphaFold is coming to @galaxyproject ! Soon anyone anywhere will be able to fold and analyze the structure of nearly any protein completely for free! Many thanks to @thesteinegger et al for working with us to deploy their optimized ColabFold implementation.

6

56

224

3

50

234

Martin Steinegger 🇺🇦

@thesteinegger

10 months

At #ISMBECCB2023 , my talented students present their exciting work: 22 petabase search for structure prediction, structural clustering of AFDB, IDP multimer prediction, structural compression & metagenomic classification. Find us at poster B-036, B-038, B-039, B-040, B-114.

2

60

227

Martin Steinegger 🇺🇦

@thesteinegger

2 years

. @MetaAI released ESMfold and structure predictions for most metagenomic MGnify90 sequences. Thanks for early-access @TomSercu @alexrives to 36 mio structures clustered at 30% seq. id. Check them out on our Foldseek search server:

3

52

224

Martin Steinegger 🇺🇦

@thesteinegger

1 year

Foldseek's webserver now allows predicting structures thanks to the great ESMfold API. Just paste an amino acid sequence, click "PREDICT STRUCTURE" and search against structures from ESMatlas, AlphaFoldDB, PDB & more. Work by @milot_mirdita . Check it out at

4

62

216

Martin Steinegger 🇺🇦

@thesteinegger

3 months

Through homology search & pLMs, we identified an effective kynureninase that degrades a key immunosuppressor in cancer, reducing tumor weight in mice by 3.4x. 📄 Seek & rank your own protein based on only a handful of measures. ⛵

3

53

220

Martin Steinegger 🇺🇦

@thesteinegger

10 months

My group and I are excited to join this year's #ISMBECCB2023 in Lyon. We are presenting 11 posters and 4 talks. We are preparing a set of updated stickers of our methods for the poster session. Sneak peek below!

12

18

213

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Our Foldcomp library compressed the #AlphaFold /Uniprot from 23TB to 950GB at an avg. loss of <0.5Å; decompresses ~200 structures per second per core, and has a python interface to download dbs, compress/decompress. Work by: @HKgenomics @milot_mirdita 1/4 💾

3

45

211

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Our Foldseek server now includes the #AlphaFold UniProt DB clustered to 52M structures at ~50% seq. id & 80% cov. The full Foldseek AlphaFoldDB, including Cα, can be downloaded through the Foldseek databases module (~700GB download, ~950GB extracted) 1/6 🌐

2

50

209

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Hundreds of millions of protein structures will require new tools. Foldseek is a fast structural aligner that scales to billions of structures. Work by Stephanie Kim, Michel van Kempen and Johannes Söding and me. 📜 Poster at the @ISMB next week 💾

Google DeepMind

@GoogleDeepMind

3 years

We’re also sharing the proteomes of 20 other biologically-significant organisms, totalling over 350k structures. Soon we plan to expand to over 100 million, covering almost every sequenced protein known to science & the @uniprot reference database. 2/

7

178

758

4

51

208

Martin Steinegger 🇺🇦

@thesteinegger

2 years

ColabFold now supports uploading custom templates. Here is an example of a GPCR (ACM2_HUMAN) modeled with an active and inactive template using no MSA information. The example was taken from @huhlim and @MeikelFeig 's preprint

4

44

208

Martin Steinegger 🇺🇦

@thesteinegger

2 years

OmegaFold is open source. Thank you so much for releasing it. It installs very easily. On Colab a protein with 583 res. ran out of memory (16GB GPU), 320 worked (it took 13min). 583 ran in ~23m on 24GB GPU. Complex prediction by glycine linker seems to work for a toy example.

Jian Peng

@peng_illinois

2 years

OmegaFold's code and model1 is released:

5

78

215

7

34

193

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Taxonomic assignment of contigs 2-18x faster than state-of-the-art. At a glance: we assign each ORF a tax-label considering alignment uncertainty followed by a weighted majority prediction. 📝 Code:

5

45

183
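
The "weighted majority prediction" mentioned in the post above can be illustrated with a small, simplified sketch. It assumes each ORF votes with a root-to-leaf lineage and a numeric weight (for example one derived from its alignment score), and it picks the deepest taxon whose share of the total weight reaches a cutoff. The function name, the weighting and the 0.5 cutoff are assumptions for illustration, not the published method's actual parameters.

```python
from collections import defaultdict


def weighted_majority(orf_votes, min_fraction=0.5):
    """Pick the deepest lineage prefix supported by >= min_fraction of the weight.

    orf_votes: list of (lineage, weight), where lineage is a tuple ordered
    from root to leaf, e.g. ("Bacteria", "Proteobacteria").
    Simplified illustration; the weights and cutoff are assumptions.
    """
    total = sum(weight for _, weight in orf_votes)
    if total <= 0:
        return ()

    # Every vote supports its own taxon and all of its ancestors.
    support = defaultdict(float)
    for lineage, weight in orf_votes:
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight

    best = ()
    for prefix, weight in support.items():
        if weight / total < min_fraction:
            continue
        deeper = len(prefix) > len(best)
        same_depth_more_support = (
            len(prefix) == len(best) and weight > support.get(best, 0.0)
        )
        if deeper or same_depth_more_support:
            best = prefix
    return best


if __name__ == "__main__":
    votes = [
        (("Bacteria", "Proteobacteria", "Escherichia"), 3.0),
        (("Bacteria", "Proteobacteria", "Salmonella"), 2.0),
        (("Bacteria", "Firmicutes"), 1.0),
    ]
    print(weighted_majority(votes, min_fraction=0.6))
```

Raising `min_fraction` makes the contig label more conservative: disagreement among ORFs pushes the assignment up toward a shared ancestor.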

Martin Steinegger 🇺🇦

@thesteinegger

1 year

New Foldcomp release! Our protein structure compression algorithm now supports multiple input/output file types as well as multiple chains/fragments. We updated the Python API & new DBs including #ESMatlas and #AlphaFold cluster. Great work by @HKgenomics

4

48

183

Martin Steinegger 🇺🇦

@thesteinegger

4 years

Conterminator terminates contamination in genomes. @StevenSalzberg1 and I report over 114K/2M contaminations in RefSeq/GenBank and two unexpected ones in a GRCh38 alt. scaffold and the C. elegans ref. genome. Preprint: Code:

8

98

174

Martin Steinegger 🇺🇦

@thesteinegger

2 years

New MMseqs2 release 14-7e284: includes the features to run the ColabFold pipeline, position-specific gap costs/profile-profile Gotoh-Smith-Waterman, speed-ups and more. Thanks a lot to all contributors! 🐍 conda install -c conda-forge -c bioconda mmseqs2

2

38

173

Martin Steinegger 🇺🇦

@thesteinegger

9 months

ProstT5 is a protein LLM with structure-aware embeddings. It was trained on structures (Foldseek’s 3Di) and AA sequences. It translates AA to 3Di for sensitive foldseek search and designs proteins by converting 3Di to AA. 📄 Code:

GitHub - mheinzinger/ProstT5: Bilingual Language Model for Protein Sequence and Structure

Bilingual Language Model for Protein Sequence and Structure - mheinzinger/ProstT5

github.com

Michael Heinzinger

@HeinzingerM

9 months

Our new bilingual protein language model (pLM), ProstT5, translates between protein sequence and structure. Besides producing more structure-aware embeddings that are better at remote homology detection than sequence-pLMs, its translation capability enables inverse folding.

2

32

128

3

40

171

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Foldseek got a 3D structure visualization using NGL thanks to my postdoc @clmgilchrist and @milot_mirdita . We generate missing atoms using pulchra and superpose aligned sequences using TMalign in the browser using #WebAssembly 🌐 📄

3

35

172

Martin Steinegger 🇺🇦

@thesteinegger

2 years

ColabFold preprint update. Two Highlights: colabfold_batch executes MMseqs2+AlphaFold2 in batch and is nearly 100x faster using early-stopping at ≥85pLDDT compared to #AlphaFold 2. ColabFold+AlphaFold-multimer performs similar to AlphaFold-multimer. 📄

2

41

169
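
The early-stopping idea in the preprint update above (stop once a prediction reaches ≥85 pLDDT) can be sketched generically. The snippet assumes each candidate model is a callable returning per-residue pLDDT values and that the mean pLDDT is the stopping score; both are assumptions for illustration, and the names are hypothetical rather than ColabFold internals.

```python
def predict_with_early_stop(model_runs, threshold=85.0):
    """Run models in order and stop as soon as one reaches the pLDDT threshold.

    model_runs: iterable of (name, run) pairs, where run() returns a list of
    per-residue pLDDT values. Hypothetical interface for illustration.
    Returns (best_name, best_mean_plddt) over the models actually run.
    """
    best = None
    for name, run in model_runs:
        plddt = run()
        if not plddt:
            raise ValueError(f"{name} returned no pLDDT values")
        mean_plddt = sum(plddt) / len(plddt)
        if best is None or mean_plddt > best[1]:
            best = (name, mean_plddt)
        if mean_plddt >= threshold:
            # Confident enough: skip the remaining, more expensive runs.
            break
    return best


if __name__ == "__main__":
    # Stub "models" returning fixed per-residue scores.
    runs = [
        ("model_1", lambda: [70.0, 80.0]),
        ("model_2", lambda: [90.0, 88.0]),
        ("model_3", lambda: [95.0, 95.0]),  # never executed
    ]
    print(predict_with_early_stop(runs))
```

The saving comes from the runs that are skipped, so it depends on how often an early model is already confident.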

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Thank you so much for using ColabFold and the acknowledgment! These are crazy looking proteins. It's quite amazing that #AlphaFold2 gets them right.

Brett Collins

@brettcollins100

2 years

Enjoyed making a small contribution to this study by @AlbertPol10 and @LabParton . Quite amazed by how well AlphaFold2 predicts the unusual structures of the caveolin proteins.

1

10

47

3

19

169

Martin Steinegger 🇺🇦

@thesteinegger

1 year

ColabFold is number 2 on this list of most cited AI papers in 2022! Congratulations to everybody who contributed: @milot_mirdita @konstinx @Ag_smith @huhlim @sokrypton and the community.

Zeta Alpha

@ZetaVector

1 year

The 100 most cited AI papers for 2022. A detailed analysis of the most cited papers for the last three years allows good insights into the organisations and countries publishing the most impactful AI research right now. Read here: A thread 🧵

9

165

532

4

28

166

Martin Steinegger 🇺🇦

@thesteinegger

11 months

Metabuli 분리 improves metagenomic read classification through metamers, DNA-AA k-mers, to be sensitive and specific, recovering 99% and 98% of DNA or AA classifiers. Great work @JaebeomKim6 ! 💾 📄 🐍conda install -c bioconda metabuli

3

64

165

Martin Steinegger 🇺🇦

@thesteinegger

1 year

. @arian_jamasb integrated Foldcomp, our structure compression algorithm, into Graphein - a Geometric Deep Learning framework for protein structures. Now, you can train networks on a proteome scale in Colab! Great work. 🎉 View the notebook:

2

45

163

Martin Steinegger 🇺🇦

@thesteinegger

8 months

Foldseek Release 8: supporting searches against clustered databases (with prebuilt DBs for AFDB50 and PDB100) and protein-complexes. HTML output was improved by @clmgilchrist . In the webserver, you can download and (re)upload results. 💾 or bioconda 🐍

2

45

163

Martin Steinegger 🇺🇦

@thesteinegger

2 years

We changed the license of the ColabFoldDB and PDB70 from CC BY-NC to CC BY-4.0. Now, there shouldn’t be any further roadblocks for commercial use of AlphaFold2 or ColabFold.

4

28

162

Martin Steinegger 🇺🇦

@thesteinegger

2 years

The Baker lab's yeast all-against-all protein complex paper is online. They scanned 8.3 million PPIs using a smaller RoseTTAFold model and predicted the complex structure for high scoring PPIs using AlphaFold2. The Science review process is blazing fast (<3 months)

Computed structures of core eukaryotic protein complexes

Proteome-wide coevolution and deep-learning methods identify and build accurate models of eukaryotic protein complexes.

www.science.org

5

48

162

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Our structural aligner Foldseek can now automatically download databases in a single command. We provide the PDB ( #PDB50 ) and Alphafold DB. You can download Foldseek here:
foldseek databases PDB pdb tmp
foldseek easy-search query.pdb pdb aln.m8 tmp

0

45

157
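
The two commands quoted in the post above (`foldseek databases PDB pdb tmp`, then `foldseek easy-search query.pdb pdb aln.m8 tmp`) can be wrapped in a small Python script. This is a minimal sketch that only rebuilds those two command lines from the post and runs them if a `foldseek` binary is on the PATH; the helper names are hypothetical and no other Foldseek options are assumed.

```python
import shutil
import subprocess


def foldseek_commands(query="query.pdb", db_name="PDB", db_prefix="pdb",
                      result="aln.m8", tmp="tmp"):
    """Return the download and search command lines quoted in the post."""
    download = ["foldseek", "databases", db_name, db_prefix, tmp]
    search = ["foldseek", "easy-search", query, db_prefix, result, tmp]
    return download, search


def run_if_available(commands):
    """Run each command in order, but only if foldseek is installed."""
    if shutil.which("foldseek") is None:
        return False
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises if a step fails
    return True


if __name__ == "__main__":
    # Print the commands rather than running them; the database
    # download is large, so execution is left as an explicit choice.
    for cmd in foldseek_commands():
        print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the query path contains spaces.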

Martin Steinegger 🇺🇦

@thesteinegger

10 months

100B parameter protein language model trained on @uniprot and the #ColabFoldDB using 768 NVIDIA A100 GPUs for several months. The LM shows significant improvements in most prediction categories. Note: the model is not open-source; only the training data is currently available.

Diego del Alamo - ddelalamo.bsky.social

@DdelAlamo

10 months

“xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein” A borderline-SOTA antibody structure-prediction method is tucked away in the results section

4

33

146

1

27

155

Martin Steinegger 🇺🇦

@thesteinegger

1 month

Penguin is our new assembler that reconstructs manyfold more accurate strain-level viral genomes and 16S rRNAs from metagenomes through a novel greedy AA/DNA-hybrid bayesian overlap extension strategy. By @AnnikaJochheim et al. 📄 💾

5

49

154

Martin Steinegger 🇺🇦

@thesteinegger

10 months

. @Rosy_Zh presented a new method called Spacedust, which can cluster and detect recurring gene neighborhoods. It finds gene similarities using amino acids & structures. Check out her poster for more details and Marv stickers. Code: #ISMBECCB2023

1

35

155

Martin Steinegger 🇺🇦

@thesteinegger

5 months

Protein-Vec enhances protein function prediction by combining independently contrastively learned protein classifiers for EC, GO, PFAM, Gene3D, and TMscore (Aspect-vecs) into a merged embedding to boost prediction performance. 📄 💾

3

33

146

Martin Steinegger 🇺🇦

@thesteinegger

10 months

My student @HKgenomics talks about Foldcomp, a fast protein structure compression algorithm. Foldcomp compresses the AFDB down from 23TB to 1TB at the speed of gzip. #ISMBECCB2023 📄 Code:

1

21

148

Martin Steinegger 🇺🇦

@thesteinegger

3 years

We released the MMseqs2 ColabFold databases at: . In addition to BFD/Mgnify, we also built a database containing further metagenomic databases: MetaEuk, SMAG, TOPAZ, MGV, GPD and MetaClust2. Thanks @milot_mirdita for getting MMseqs2 ready.

0

50

146

Martin Steinegger 🇺🇦

@thesteinegger

4 years

Agnostos defines a framework to annotate genes beyond the twilight zone using clustering and remote homology detection. It organized over 415 million genes from 1,749 metagenomes. Maybe the dark matter is not so dark after all. Great work @ChiaraVanni5 ! 📄

2

68

139

Martin Steinegger 🇺🇦

@thesteinegger

8 months

"As part of our commitment to releasing our research breakthroughs safely and responsibly, we will not be sharing model weights, to prevent use in potentially unsafe applications." 😂

Eric Topol

@EricTopol

8 months

Just out @ScienceMagazine #AlphaMissense —building on #AlphaFold —based on unsupervised learning #AI , predicts impact of all 71 million human missense variants for disease-causing potential, across entire human proteome; open-source👍 @jun90cheng @Avsecz

7

141

453

5

26

140

Martin Steinegger 🇺🇦

@thesteinegger

4 years

Conterminator, a method to terminate contamination in genome and protein databases, is published in @GenomeBiology . @StevenSalzberg1 and I found >114K/2M likely contaminations in RefSeq/GenBank. 📃 💾 🐍 conda install conterminator

3

70

139

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Here is a blog post about ColabFold by @labriataphd , which summarizes our efforts very well. Thank you so much for writing it. @labriataphd did you try to predict PRTEINSEQENCE?

The hype on AlphaFold keeps growing with this new preprint

Check out this new work democratizing access to the full power of AlphaFold2 by integrating it with a powerful protein sequence matcher…

towardsdatascience.com

1

35

139

Martin Steinegger 🇺🇦

@thesteinegger

5 years

Our protein-level assembler “plass” paper is now published at @naturemethods . Plass recovers manyfold more proteins from complex metagenomes compared to nucleotide assemblers. Paper: Code&Data: @milot_mirdita @virus_x_team

Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold

Nature Methods - The protein-level assembler can assemble protein catalogs from raw metagenomic sequencing data, enabling large-scale metagenomics studies.

www.nature.com

4

66

138

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Awesome. Alphafold2-multimer code and model are open source. We are working on pairing it up with MMseqs2 in ColabFold!

Google DeepMind

@GoogleDeepMind

3 years

The #AlphaFold source code has been updated and now accounts for multi-chain protein complexes - providing a significant improvement in accuracy for predicting protein interactions: Generate predictions from your browser via:

18

742

2K

1

28

137

Martin Steinegger 🇺🇦

@thesteinegger

1 year

For those interested in exploring the structural space using Foldseek, check out the tutorial video from @SBGrid , where I demonstrate the webserver and command line interface of foldseek. Thank you for hosting me. 🎥

Foldseek

Topic: Fast and accurate protein structure search with FoldseekPresenter: Prof. Martin Steinegger, Asst Professor of Bioinformatics, Seoul National Universit...

www.youtube.com

0

33

133

Martin Steinegger 🇺🇦

@thesteinegger

1 year

Annotating viral proteins in environmental samples is challenging. Using #ColabFold / #AlphaFold2 & #Foldseek can boost success rates from 10% to 50%. Excellent work by Henry Say @brjoris , Daniel Giguere & @gbgloor . 📄

3

31

133

Martin Steinegger 🇺🇦

@thesteinegger

4 years

MetaEuk predicts eukaryotic proteins from metagenomes. They extract millions of yet unknown proteins from marine metagenomes @TaraOcean_ . Preprint: Code: The proteins can be searched at

2

60

130

Martin Steinegger 🇺🇦

@thesteinegger

4 years

Yesterday was my last day at the lab of @StevenSalzberg1 . I feel so lucky that I was able to join such a fun and talented group. I'm looking forward to starting my own lab at Seoul National University @SNUnow . I am hiring! Please reach out via DM or email.

18

30

127

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Yesterday, we talked about ColabFold (AlphaFold2/RoseTTAFold in Google Colab) @ProteinBoston . Below are the slides. Including a comparison of the MMseqs2 vs @DeepMind 's jackhmmer version on CASP14-FM targets. We also covered some updates coming soon! Video will be posted soon too

Sergey Ovchinnikov 🇺🇦

@sokrypton

3 years

@ProteinBoston Video still being processed. But here are slides that we shared:

1

34

118

1

38

122

Martin Steinegger 🇺🇦

@thesteinegger

1 year

This paper covers the #ESMatlas , a huge metagenomic protein structure database and the lightning-fast #ESMfold structure predictor, which the authors provide as API, allowing for direct structure predictions. Kudos to @alexrives & the @MetaAI team for this exceptional work!

Science Magazine

@ScienceMagazine

1 year

In a Science study, @MetaAI researchers show the power of a large language model, #ESMFold , to enable protein structure prediction and analysis. Using ESMFold, they generated a database—the ESM Metagenomic Atlas—of over 600 million metagenomic proteins.

6

175

554

0

20

122

Martin Steinegger 🇺🇦

@thesteinegger

3 years

. @sokrypton talked about ColabFold at the @emblebi AlphaFold webinar. Below is a screenshot of its complex modelling possibilities. He also presented a memory resource friendly modeling approach for large complexes using trimming implemented in the AlphaFold2_advanced_beta colab

1

34

121

Martin Steinegger 🇺🇦

@thesteinegger

9 months

Today we present four posters at #ISMBECCB2023 . A fast structural MSA algorithm (FoldMason), a NN to de-noise & select particles from Cryo-ET images, novel fungal core genes, and a benchmark of AA and structure measures for proteome comparison. Poster C-114, C-148, C-238, C-262

4

24

120

Martin Steinegger 🇺🇦

@thesteinegger

5 months

Curious to see how the new PDB identifier, with 5-characters instead of 4, will impact bioinformatics pipelines. This might be a "millennium bug" moment in structure bioinformatics.

rcsb pdb 💉🧬💻🔬💊🌱🧠🦠

@buildmodels

5 months

Today's the day: PDB no longer has 3-character chemical component IDs for incoming depositions. 1st structure with a 5-character CCD has been deposited. Details at wwPDB: PDB Entries w/Novel Ligands Now Distributed Only in PDBx/mmCIF & PDBML File Formats

4

50

119

4

19

116

Martin Steinegger 🇺🇦

@thesteinegger

3 years

The #CASP14 results are out and #AlphaFold2 won. It produces predictions with a margin of error close to crystal structures. Protein structure prediction might be solved. I am happy that I could contribute to this milestone. See you at the conference.

Mohammed AlQuraishi

@MoAlQuraishi

3 years

CASP14 #s just came out and they’re astounding—DeepMind looks to have solved protein structure prediction. Median GDT_TS went from 68.5 (CASP13) to 92.4!!!! Cf. their 2nd best CASP13 struct scored 92.8 (out of 100). Median RMSD is 2.1Å. I think it's over

35

568

2K

3

34

117

Martin Steinegger 🇺🇦

@thesteinegger

1 year

Check out our new foldseek webserver!

Milot Mirdita

@milot_mirdita

1 year

. @clmgilchrist and I have refreshed the Foldseek webserver interface and made searches much quicker. We have also added the @CATHDatabase with the help of @nicolabordin . To explore the updated server, visit:

4

16

63

2

31

118

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Today @milot_mirdita is defending his PhD. I am so excited to hear his talk. It was such a pleasure to work with you. MMseqs2, ColabFold, and many more methods wouldn't have been possible without you. Good luck. :)

9

1

117

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Thank you @DeepMind for making AlphaFold2's weights available for academic as well as commercial usage, thus making AF2 fully open to everybody (who gives proper attribution). We will reflect this change in the ColabFold usage texts soon.

Sergey Ovchinnikov 🇺🇦

@sokrypton

2 years

"The AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license" 🙂 (thanks to @BrianWeitzner for alerting me)

3

102

373

1

27

115

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Today @DeepMind released a colab for #AlphaFold 2 using HMMer for the homology search against a reduced version of Uniprot, BFD, and Mgnify. Thank you for linking our Colab. It’s great to have different flavors available. DeepMind colab

AlphaFold.ipynb

Run, share, and edit Python notebooks

colab.research.google.com

1

16

115

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Prediction of Protein-Protein interaction by "just" adding a long linker in between the two sequences. This is pretty cool!

Yoshitaka Moriwaki

@Ag_smith

3 years

AlphaFold2 can also predict heterocomplexes. All you have to do is input the two sequences you want to predict and connect them with a long linker.

14

187

724

4

15

108
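
The linker trick from the posts above (join the two sequences with a long linker and predict them as one chain) is easy to sketch. The example below assumes a poly-glycine linker, which a later post mentions for OmegaFold, and a linker length of 30; both the length and the helper name `link_chains` are illustrative assumptions rather than recommended settings.

```python
def link_chains(chains, linker_length=30, linker_residue="G"):
    """Concatenate chains with a flexible linker and track chain positions.

    Returns (linked_sequence, spans), where spans[i] is the half-open
    (start, end) index range of chain i inside the linked sequence.
    The linker length and residue are assumptions for illustration.
    """
    if len(linker_residue) != 1:
        raise ValueError("linker_residue must be a single letter")
    if linker_length < 0:
        raise ValueError("linker_length must be non-negative")

    linker = linker_residue * linker_length
    linked = linker.join(chains)

    spans = []
    start = 0
    for seq in chains:
        spans.append((start, start + len(seq)))
        start += len(seq) + linker_length  # skip over the linker
    return linked, spans


if __name__ == "__main__":
    linked, spans = link_chains(["MKTAYIAK", "GAVLIPFW"], linker_length=10)
    print(linked)
    # The spans allow the linker residues to be removed from the
    # predicted model afterwards.
    for start, end in spans:
        print(start, end, linked[start:end])
```

Keeping the spans makes it straightforward to split the predicted single-chain model back into the original chains.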

Martin Steinegger 🇺🇦

@thesteinegger

5 months

Enjoying my ColabFold Marv coffee latte at Okinawa. Thanks @SunjaeLee3 for inviting me to DTMBIO 2023.

3

108

Martin Steinegger 🇺🇦

@thesteinegger

18 days

I am incredibly proud of my two students @imGyuriKim and @JaebeomKim6 for receiving the prestigious Korean Presidential Scholarship '제1기 대학원 대통령과학장학금'. This is a very competitive prize and it is exceptionally rare that two are awarded to the same lab. Congratulations

3

5

107

Martin Steinegger 🇺🇦

@thesteinegger

4 years

SkewIT (Skew Index Test) quantifies the bacterial GC Skew to detect mis-assembled genomes. It detected multiple mis-assemblies of complete RefSeq genomes. Great work @JenniferLu717 and @StevenSalzberg1 Preprint: Code: (not public)

1

46

102
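
The GC skew that the SkewIT post above refers to is commonly defined per window as (G − C) / (G + C). The sketch below computes that quantity over non-overlapping windows; it only illustrates the underlying skew signal and is not the SkewIT index itself, and the window size is an arbitrary assumption.

```python
def gc_skew(sequence, window=1000):
    """Return (G - C) / (G + C) for each non-overlapping window.

    Windows without any G or C get a skew of 0.0. Illustrates the raw
    GC-skew signal only; this is not the SkewIT index.
    """
    if window <= 0:
        raise ValueError("window must be positive")

    sequence = sequence.upper()
    skews = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


if __name__ == "__main__":
    # Toy sequence: G-rich first half, C-rich second half, so the
    # skew flips sign between the two windows.
    toy = "GGGAGGTG" + "CCTCCACC"
    print(gc_skew(toy, window=8))
```

In many bacterial genomes the sign of the skew changes near the replication origin and terminus, which is why unexpected sign changes can hint at a mis-assembly.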

Martin Steinegger 🇺🇦

@thesteinegger

3 years

End-to-end differentiable (vectorized) Smith-Waterman implemented in Jax. A new tool to optimize MSAs based on specific use cases like protein structure quality, phylogeny and many more. Great work by Petti et al. Code:

Sergey Ovchinnikov 🇺🇦

@sokrypton

3 years

End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman A fun collaboration with Samantha Petti, Nicholas Bhattacharya, @proteinrosh , @JustasDauparas , @countablyfinite , @keitokiddo , @srush_nlp & @pkoo562 (1/8)

8

203

735

0

16

102
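As background, the classic (non-differentiable) Smith-Waterman recurrence can be sketched in a few lines. This is plain Python rather than JAX and is not the code from Petti et al.; the scoring values and the linear gap penalty are assumptions. Differentiable variants typically replace the hard `max` below with a smooth approximation so that gradients can flow through the alignment.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the 0 option.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6: the shared "ACG"
```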

Martin Steinegger 🇺🇦

@thesteinegger

9 months

Adieu Lyon! It was an incredible #ISMBECCB2023 ! Immensely grateful for the warm welcome extended to my students - for many, it was their first international conference. Thanks to @BQPMalfoy and his baby for capturing the moment.

2

7

101

Martin Steinegger 🇺🇦

@thesteinegger

3 years

Behind the scenes of AlphaFold2's success at CASP14. The manuscript describes how difficult targets were processed in order to achieve the highest performance. One takeaway: search full-length sequences instead of just single domains. 📄

0

32

98

Martin Steinegger 🇺🇦

@thesteinegger

9 months

. @DrArunimaSingh has written a summary about Foldseek for @naturemethods . It's a great overview of the method and includes information about what we're working on. Arunima is also at the #ISMBECCB2023 right now, so don't miss your chance to talk to her. 📄

Speedier protein structure search

Nature Methods - Speedier protein structure search

www.nature.com

0

29

97

Martin Steinegger 🇺🇦

@thesteinegger

2 years

A bit late, but I just found this tweet interesting. The AlphaFold DB contains a weak prediction that can be predicted well by Deepmind's AF2 Colab. How is this possible?

Konstantin Korotkov 🇺🇦

@korotkov_lab

2 years

@sokrypton Here is an opposite example - Uniprot B2HHE4. #AlphaFold database model is low confidence whereas #OmegaFold models are reasonably good without MSA.

1

7

42

5

17

97

Martin Steinegger 🇺🇦

@thesteinegger

10 months

Our Marv stickers arrived just in time for #ISMBECCB2023 . Stickers are available at our posters. I am looking forward to reconnecting with old friends, making new connections, and learning about the latest in bioinformatics. See you in Lyon.

6

7

97

Martin Steinegger 🇺🇦

@thesteinegger

4 years

spacegraphcats provides a tool to index and query metagenomic sequence diversity. Helps to recover missing content from genome bins and to quantify diversity. Published @GenomeBiology by @ctitusbrown et al. Great work! 📄 💻

1

47

97

Martin Steinegger 🇺🇦

@thesteinegger

2 years

. @Deepmind released the improved AlphaFold-multimer-v2 to reduce the clash problem. We integrated it in ColabFold. It’s still possible to use older complex methods using model_type. Thank you for open sourcing it and John, @tfgg2 and @richevans_dm for answering our questions.

Richard Evans

@richevans_dm

2 years

Happy to announce an update to the AlphaFold-Multimer paper and code! The new models reduce clashes and improve accuracy.

2

20

135

0

22

96

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Reciprocal best structure hit (RBSH) search with Foldseek detects more hits compared to sequence-based methods. Great work by Vivian Angela Monzon, Typhaine Paysan-Lafosse, Valerie Wood and @Alexbateman1 📄 Code:

1

24

95
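The reciprocal-best-hit idea itself is simple to sketch: a pair (a, b) is kept only if b is the top hit for a and a is the top hit for b. The example below is an illustration, not the authors' pipeline; the nested-dictionary input format and the scores are made up, and in practice the scores would come from a search tool's output.

```python
def reciprocal_best_hits(a_vs_b: dict, b_vs_a: dict) -> list:
    """Return sorted (a, b) pairs that are each other's best-scoring hit.

    Each input maps a query ID to a dict of {target ID: score},
    where a higher score means a better hit.
    """
    def best(hits: dict) -> dict:
        # Pick the highest-scoring target for every query that has hits.
        return {q: max(t, key=t.get) for q, t in hits.items() if t}

    best_ab, best_ba = best(a_vs_b), best(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between two small proteomes.
a_vs_b = {"a1": {"b1": 0.9, "b2": 0.4}, "a2": {"b1": 0.8, "b2": 0.3}}
b_vs_a = {"b1": {"a1": 0.9, "a2": 0.8}, "b2": {"a1": 0.4, "a2": 0.3}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```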

Martin Steinegger 🇺🇦

@thesteinegger

1 year

Foldcomp is a protein structure compression algorithm and indexing system. It improves compression by 3x over PIC at similar speed to Gzip and reconstructs at ~0.08Å Cα. AFDB/ESMatlas-HQ dbs for download. 🐍interface over pip. 💾 📄

Foldcomp: a library and format for compressing and indexing large protein structure sets

Highly accurate protein structure predictors have generated hundreds of millions of protein structures; these pose a challenge in terms of storage and processing. Here we present Foldcomp, a novel...

www.biorxiv.org

4

23

91

Martin Steinegger 🇺🇦

@thesteinegger

3 years

AlphaFold2 improves the protein structure model quality by recycling (default 3 times), meaning feeding the prediction x times through the model. @sokrypton figured out that you can fold a de-novo designed protein from a single sequence by increasing recycles.

Sergey Ovchinnikov 🇺🇦

@sokrypton

3 years

Here is an example that took #alphafold ~12 recycles to fold! (denovo designed protein, single sequence input). Colored by predicted LDDT.

6

48

210

0

23

90

Martin Steinegger 🇺🇦

@thesteinegger

2 years

AF2-multimer models monomer complexes by concatenating MSAs. We observed that monomers are best modeled with unpaired ("stair-case") MSAs. In this example the unpaired MSA of ColabFold+AF2-multimer (soon public) picks up an intra-complex signal that AlphaFold-Colab misses.

Roland Dunbrack 🏳️‍🌈 @rolanddunbrack.bsky.social

@RolandDunbrack

2 years

Hmm, Alphafold-multimer went off the rails on this one. Homodimer of BRD2 bromodomains 1 and 2. Even the single chains are a mess with large overlaps and breaks in the chain.

9

8

77

2

22

91

Martin Steinegger 🇺🇦

@thesteinegger

2 years

Foldseek processed over 83 million AFDB structures. If nothing goes wrong we hopefully have a database by tomorrow. @milot_mirdita is on it.

Charles Bayly-Jones

@bj_charles

2 years

Roughly how long will it take for this to be available in #FoldSeek ?? @thesteinegger - I can't wait to dive in...

2

4

17

1

6

90

Martin Steinegger 🇺🇦

@thesteinegger

2 years

We have set up a new ColabFold MSA server provided by the Korean Bioinformation Center. For the switch we will have a short downtime ~8pm KST/1pm CET/7am EST. We accelerated the MSA generation using multiple threads and updated Uniref30 to 2022_02 and PDB to March 2022.

1

15

91

Martin Steinegger 🇺🇦

@thesteinegger

2 years

. @daniel_c0deb0t 's block-aligner is a library to align protein/nucleotide sequences using adaptive banding blocks + SIMD. It's ~9 times faster than Farrar's striped SW, implemented in Rust and available here: 📄

1

23

91