Martin Steinegger 🇺🇦

@thesteinegger

5,812
Followers
673
Following
231
Media
1,864
Statuses

Developing data-intensive computational methods • PI @ Seoul National University 🇰🇷 • #FirstGen • he/him • Hauptschüler • @martinsteinegger@mstdn.science

South Korea
Joined October 2016
Pinned Tweet
@thesteinegger
Martin Steinegger 🇺🇦
20 days
Foldseek-Multimer is a protein complex aligner that is up to 10,000 times faster than SOTA methods without sacrificing quality, enabling the comparison of billions of complex pairs per day. 1/5 📄 💾 🌐
8
152
472
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Protein structure prediction with #AlphaFold 2 in the browser using Google Colab. Just paste your protein in the input box and push "Run all". MSAs are generated by an MMseqs2 API call. Work by @sokrypton , @milot_mirdita . Try it out here:
40
541
2K
@thesteinegger
Martin Steinegger 🇺🇦
1 year
Foldseek, our fast structural aligner, is now published @NatureBiotech. It allows you to search through large structural databases like #AlphaFold or #ESMatlas in seconds. A long journey since '18! 1/6 📄 💾 🌐
17
379
1K
@thesteinegger
Martin Steinegger 🇺🇦
1 year
We clustered the #AlphaFold structure database with our novel Foldseek algorithm. We identified 2.27M clusters and analyzed them by function, annotation, domains and evolution. Amazing collaboration with the @pedrobeltrao lab. 1/n 📄 💾
11
272
840
@thesteinegger
Martin Steinegger 🇺🇦
5 months
Learn how to predict monomers, multimers and conformations with our ColabFold/AlphaFold2 protocol. It covers workflows for beginners as well as advanced concepts. By @imGyuriKim @sewonlee_231 Eli Levy Karin @HKgenomics @Ag_smith @sokrypton @milot_mirdita 📄
3
216
713
@thesteinegger
Martin Steinegger 🇺🇦
2 months
I got tenure at Seoul National University! Starting off as a first-gen with a lower secondary education (Hauptschule) in Germany makes this very meaningful to me. I am grateful to all who supported and believed in me. Excited for what lies ahead!
64
31
668
@thesteinegger
Martin Steinegger 🇺🇦
3 years
ColabFold updated: we sped up #AlphaFold2's prediction to allow thousands of structures on a single GPU in a day. We added taxonomy-aware paired MSAs for complex predictions, a new metagenomic DB and support for running it on your local machine. Updated 📄
3
162
633
@thesteinegger
Martin Steinegger 🇺🇦
3 years
ColabFold makes structure prediction and complex modeling of #AlphaFold 2 and #RoseTTAFold accessible through Google Colab. We show that MSAs produced by MMseqs2 match the accuracy of AF2 (HHblits/HMMer) while being faster. 📄 Code
6
194
621
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Search your protein structures against #AlphaFold DBs and #PDB in seconds using our Foldseek server. Just paste your PDB file and click search. We offer local (SW) and global (TMalign) structural alignments. The server was built by the amazing @milot_mirdita 🚀
10
174
569
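To make the "local (SW)" option above concrete, here is a minimal Python sketch of a Smith-Waterman local alignment score. It is an illustration only: it scores plain sequences with a linear gap penalty, and the function name and scoring values are assumptions, not Foldseek's actual structural scoring.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Hypothetical scoring values; real aligners use substitution matrices
    and affine gap costs.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A global alignment differs by dropping the floor at zero and scoring the full length of both inputs.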
@thesteinegger
Martin Steinegger 🇺🇦
2 years
ColabFold makes folding with #AlphaFold & #RoseTTAfold accessible to everyone. Our MSA server has processed >1.6 million requests to date. We thank the community for all the help to improve ColabFold. Now published @naturemethods 📄
12
125
550
@thesteinegger
Martin Steinegger 🇺🇦
3 years
AlphaFold2 predicts protein structures at near crystal structure accuracy in less than 10 minutes (~300aa). The animation below shows the prediction of a viral RNA polymerase with >2k residues. I am grateful I could contribute to this huge milestone. 📄
12
151
544
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Igor Tolstoy brought to my attention that the #AlphaFold database contains predictions of nearly identical sequences with large pLDDT differences. For example, the two 99.6% similar sequences below have an avg. pLDDT of 97 and 33. We found over 1 million of these cases in the AFDB.
8
127
528
@thesteinegger
Martin Steinegger 🇺🇦
8 months
Our work on clustering the 214M #AlphaFold protein structures was published in @Nature. We identified 2.3M clusters using our fast structure clustering algorithm and analysed their annotations, evolution and novel domains. 1/4 📄 🌐
12
155
484
@thesteinegger
Martin Steinegger 🇺🇦
3 months
It feels surreal to receive the Overton Prize from @ISCB! This reflects the incredible support of my mentors (@SoedingL, @StevenSalzberg1), collaborators, postdocs, students and friend @milot_mirdita. Excited to share this journey with you all at #ISMB2024 in Montreal.
63
40
458
@thesteinegger
Martin Steinegger 🇺🇦
10 months
MSA diversity is key for #AlphaFold2's accuracy. Larger databases == better results. So, we generated MSAs from 22 petabytes of SRA data and show that ColabFold could have improved from rank 11 to 3 at CASP15. ⅕ 📄 💾
5
117
424
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Our Kraken protocol for microbiome analysis and pathogen detection is out. It also includes KrakenTools, useful helpers for Kraken. Work by @JenniferLu717, Rincon N, @DerrickEWood, @fbreitw, Pockrandt C, @BenLangmead, @StevenSalzberg1 📄 💾
10
114
417
@thesteinegger
Martin Steinegger 🇺🇦
5 months
Maximum likelihood structural phylogeny beyond the twilight zone by combining Foldseek's 3Di alphabet with AA alignments. ML trees resolve the topology of distantly related proteins where traditional AA methods fall short. 📄 💾
5
106
381
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Foldseek, our local structural aligner, is four orders of magnitude faster than SOTA structural aligners at similar sensitivity. This allows hits in the midnight zone to be detected confidently. Code: 🌐 📄
11
116
345
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Predict protein structures in batches using the ColabFold "AlphaFold2_batch" notebook. It will predict all structures for a set of fasta files stored in a Google Drive folder. Try it out here: Thanks to @milot_mirdita @sokrypton
3
105
327
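The batch workflow described above (predict everything found in a folder of FASTA files) starts by collecting the queries. The following Python sketch shows one way that step could look. The helper names and the accepted file suffixes are assumptions for illustration, and this is not the notebook's actual code.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from one FASTA file."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collect_queries(folder, suffixes=(".fasta", ".fa")):
    """Gather every record from the FASTA files in `folder`, ordered by file name."""
    queries = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in suffixes:
            queries.extend(read_fasta(path))
    return queries
```

Each collected (header, sequence) pair would then be handed to the structure predictor one at a time.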
@thesteinegger
Martin Steinegger 🇺🇦
1 year
Explore our clustered #AlphaFold structural database with our new website by @milot_mirdita @clmgilchrist @jgyyy15. With it you can find clusters, filter members by taxonomy, browse similar clusters and search with Foldseek. 🌐 📄
5
112
325
@thesteinegger
Martin Steinegger 🇺🇦
2 years
ColabFold now uses the AlphaFold-multimer models paired with MMseqs2 searches for prediction of protein complexes. Just provide chains separated by : and press "Run all" (provide the same sequence multiple times for homo-oligomers). Check it out at
9
85
326
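The input convention above (chains separated by ":", with a sequence repeated for homo-oligomers) can be sketched as a small Python helper. The function name and the `copies` parameter are hypothetical and only illustrate how such a query string is assembled.

```python
def make_complex_query(chains, copies=None):
    """Join chain sequences with ':'; repeat a chain to model a homo-oligomer.

    `chains` is a list of amino-acid strings, `copies` an optional list giving
    how many times each chain appears (defaults to one copy each).
    """
    if not chains:
        raise ValueError("at least one chain is required")
    copies = copies or [1] * len(chains)
    if len(copies) != len(chains):
        raise ValueError("copies must have one entry per chain")
    parts = []
    for seq, n in zip(chains, copies):
        seq = "".join(seq.split()).upper()  # drop whitespace, normalise case
        if not seq.isalpha():
            raise ValueError("sequences may only contain letters")
        if n < 1:
            raise ValueError("copy numbers must be positive")
        parts.extend([seq] * n)
    return ":".join(parts)
```

For example, `make_complex_query(["MKV"], [3])` yields a homotrimer query with the same sequence listed three times.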
@thesteinegger
Martin Steinegger 🇺🇦
7 months
Our Foldtree notebook allows you to compute and visualize trees from protein structures in the browser. Generate trees from either 1) a set of protein structures, 2) AlphaFoldDB identifiers or 3) an #AlphaFold cluster identifier. 🌐
@steg0s4urus
Dave Moi
7 months
In a post #AlphaFold world, we can use protein structures in ways we never could before. Can we build phylogenies with them? Are they any good? Yes! Foldtree () surpasses traditional sequence-based methods, even for closely related proteins. 👇
11
255
768
6
99
328
@thesteinegger
Martin Steinegger 🇺🇦
2 years
We predicted the structures of 140k human protein isoforms using #AlphaFold/ColabFold. When comparing them to their canonical MANE partners we saw that structure predictions can improve genome annotation. Data is available at 📄
3
82
312
@thesteinegger
Martin Steinegger 🇺🇦
26 days
Foldseek got published in 2024 in Volume 42 of @NatureBiotech. Here is the timeline of FS releases:
Source code: 2021/07
Webserver: 2022/01
Preprint: 2022/02
Journal: 2023/05
Print: 2024/02
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Hundreds of millions of protein structures will require new tools. Foldseek is a fast structural aligner that scales to billions of structures. Work by Stephanie Kim, Michel van Kempen, Johannes Söding and me. 📜 Poster at the @ISMB next week 💾
4
51
208
4
49
294
@thesteinegger
Martin Steinegger 🇺🇦
2 years
My postdoc Stephanie Kim presents Foldseek, our fast and accurate protein structure aligner, during today's poster session (P156-T) @ECCBinfo #ECCB2022. Unfortunately, she has to leave early to catch her flight, so please make sure not to miss her.
4
41
281
@thesteinegger
Martin Steinegger 🇺🇦
6 months
Our Foldseek structural clustering of the #AlphaFold DB is now accessible through the AFDB website and API. It allows the fast discovery of similar structures for @uniprot proteins. It is a pleasure to work with the AlphaFold DB team! Foldseek cluster 📄
@emblebi
EMBL-EBI
6 months
The #AlphaFold Database has levelled up 🚀 🔍 Sequence-based search: Find protein structures in the database using BLAST 🤝 With @thesteinegger team, we bring structure similarity clusters for seamless navigation. A collaboration with @GoogleDeepMind
6
85
264
5
76
271
@thesteinegger
Martin Steinegger 🇺🇦
3 years
#AlphaFold 2 Colab has processed >10k queries. We now also search against BFD, Mgnify, SMAG (@tomodelmont), MetaEuk in addition to UniRef. SMAG & MetaEuk have >20M eukaryotic environmental proteins that were not used in AF2 before. @sokrypton @milot_mirdita
7
69
268
@thesteinegger
Martin Steinegger 🇺🇦
2 years
ColabFold now uses a faster MMseqs2 backend server. We switched from BFD/Mgnify to ColabFoldDB, a larger metagenomic database, and reduced rate limits a lot, so batch AlphaFold2 runs should be faster. 💻
0
51
264
@thesteinegger
Martin Steinegger 🇺🇦
1 year
Our #AlphaFold cluster site has new features: ① Search for clusters using a protein structure via #Foldseek ② Filter candidate clusters ③ Explore the cluster using a pavian-style interactive Sankey taxonomy plot 🌐 Work by @milot_mirdita @jgyyy15
3
72
250
@thesteinegger
Martin Steinegger 🇺🇦
4 months
Exciting news. ColabFold will soon be accessible through @galaxyproject. @milot_mirdita really made this possible.
@mike_schatz
Michael Schatz
4 months
That's right! AlphaFold is coming to @galaxyproject! Soon anyone anywhere will be able to fold and analyze the structure of nearly any protein completely for free! Many thanks to @thesteinegger et al for working with us to deploy their optimized ColabFold implementation.
6
56
224
3
50
234
@thesteinegger
Martin Steinegger 🇺🇦
10 months
At #ISMBECCB2023, my talented students present their exciting work: 22 petabase search for structure prediction, structural clustering of AFDB, IDP multimer prediction, structural compression & metagenomic classification. Find us at posters B-036, B-038, B-039, B-040, B-114.
2
60
227
@thesteinegger
Martin Steinegger 🇺🇦
2 years
.@MetaAI released ESMfold and structure predictions for most metagenomic MGnify90 sequences. Thanks @TomSercu @alexrives for early access to 36 million structures clustered at 30% seq. id. Check them out on our Foldseek search server:
3
52
224
@thesteinegger
Martin Steinegger 🇺🇦
1 year
Foldseek's webserver now allows predicting structures thanks to the great ESMfold API. Just paste an amino acid sequence, click "PREDICT STRUCTURE" and search against structures from ESMatlas, AlphaFoldDB, PDB & more. Work by @milot_mirdita. Check it out at
4
62
216
@thesteinegger
Martin Steinegger 🇺🇦
3 months
Through homology search & pLMs, we identified an effective kynureninase that degrades a key immunosuppressor in cancer, reducing tumor weight in mice by 3.4x. 📄 Seek & rank your own protein based on only a handful of measures. ⛵
3
53
220
@thesteinegger
Martin Steinegger 🇺🇦
10 months
My group and I are excited to join this year's #ISMBECCB2023 in Lyon. We are presenting 11 posters and 4 talks. We are preparing a set of updated stickers of our methods for the poster session. Sneak peek below!
12
18
213
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Our Foldcomp library compressed the #AlphaFold/UniProt from 23TB to 950GB at an avg. loss of <0.5Å; it decompresses ~200 structures per second per core and has a Python interface to download DBs and compress/decompress. Work by: @HKgenomics @milot_mirdita 1/4 💾
3
45
211
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Our Foldseek server now includes the #AlphaFold UniProt DB clustered to 52M structures at ~50% seq. id & 80% cov. The full Foldseek AlphaFoldDB, including Cα, can be downloaded through the Foldseek databases module (~700GB download, ~950GB extracted) 1/6 🌐
2
50
209
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Hundreds of millions of protein structures will require new tools. Foldseek is a fast structural aligner that scales to billions of structures. Work by Stephanie Kim, Michel van Kempen, Johannes Söding and me. 📜 Poster at the @ISMB next week 💾
@GoogleDeepMind
Google DeepMind
3 years
We're also sharing the proteomes of 20 other biologically-significant organisms, totalling over 350k structures. Soon we plan to expand to over 100 million, covering almost every sequenced protein known to science & the @uniprot reference database. 2/
7
178
758
4
51
208
@thesteinegger
Martin Steinegger 🇺🇦
2 years
ColabFold now supports uploading custom templates. Here is an example of a GPCR (ACM2_HUMAN) modeled with an active and inactive template using no MSA information. The example was taken from @huhlim and @MeikelFeig's preprint
4
44
208
@thesteinegger
Martin Steinegger 🇺🇦
2 years
OmegaFold is open source. Thank you so much for releasing it. It installs very easily. On Colab a protein with 583 res. ran out of memory (16GB GPU), 320 worked (it took 13min). 583 ran in ~23m on a 24GB GPU. Complex prediction by glycine linker seems to work for a toy example.
@peng_illinois
Jian Peng
2 years
OmegaFold's code and model1 is released:
5
78
215
7
34
193
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Taxonomic assignment of contigs 2-18x faster than state-of-the-art. At a glance: we assign each ORF a tax-label considering alignment uncertainty followed by a weighted majority prediction. 📝 Code:
5
45
183
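The "weighted majority prediction" step above can be illustrated with a simplified Python sketch. It assumes flat labels and ignores the taxonomy tree and the alignment-uncertainty handling the tweet mentions. The function name, the weights and the 50% threshold are hypothetical.

```python
from collections import defaultdict


def weighted_majority(orf_labels, min_fraction=0.5):
    """Pick a contig label from per-ORF (label, weight) votes.

    Returns the label whose summed weight exceeds `min_fraction` of the
    total weight, or None when no label reaches that share.
    """
    totals = defaultdict(float)
    for label, weight in orf_labels:
        totals[label] += weight
    total = sum(totals.values())
    if total <= 0:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] / total > min_fraction else None
```

A tree-aware version would instead move up to a common ancestor when no single label wins, rather than returning None.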
@thesteinegger
Martin Steinegger 🇺🇦
1 year
New Foldcomp release! Our protein structure compression algorithm now supports multiple input/output file types as well as multiple chains/fragments. We updated the Python API and added new DBs, including #ESMatlas and the #AlphaFold clusters. Great work by @HKgenomics
4
48
183
@thesteinegger
Martin Steinegger 🇺🇦
4 years
Conterminator terminates contamination in genomes. @StevenSalzberg1 and I report over 114K/2M contaminations in RefSeq/GenBank and two unexpected ones in a GRCh38 alt. scaffold and the C. elegans ref. genome. Preprint: Code:
8
98
174
@thesteinegger
Martin Steinegger 🇺🇦
2 years
New MMseqs2 release 14-7e284: includes the features to run the ColabFold pipeline, position-specific gap costs/profile-profile Gotoh-Smith-Waterman, speed-ups and more. Thanks a lot to all contributors! 🐍 conda install -c conda-forge -c bioconda mmseqs2
2
38
173
@thesteinegger
Martin Steinegger 🇺🇦
9 months
ProstT5 is a protein LLM with structure-aware embeddings. It was trained on structures (Foldseek's 3Di) and AA sequences. It translates AA to 3Di for sensitive Foldseek search and designs proteins by converting 3Di to AA. 📄 Code:
@HeinzingerM
Michael Heinzinger
9 months
Our new bilingual protein language model (pLM), ProstT5, translates between protein sequence and structure. Besides producing more structure-aware embeddings that are better at remote homology detection than sequence-pLMs, its translation capability enables inverse folding.
2
32
128
3
40
171
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Foldseek got a 3D structure visualization using NGL thanks to my postdoc @clmgilchrist and @milot_mirdita. We generate missing atoms using pulchra and superpose aligned sequences using TMalign in the browser using #WebAssembly 🌐 📄
3
35
172
@thesteinegger
Martin Steinegger 🇺🇦
2 years
ColabFold preprint update. Two highlights: colabfold_batch executes MMseqs2+AlphaFold2 in batch and is nearly 100x faster using early-stopping at ≥85 pLDDT compared to #AlphaFold 2. ColabFold+AlphaFold-multimer performs similarly to AlphaFold-multimer. 📄
2
41
169
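The early-stopping idea above (stop once a prediction reaches pLDDT ≥ 85) can be sketched in Python as follows. This is an assumption-laden illustration: the tweet does not say whether the threshold is checked per model or per recycle, and `predict_with_early_stop` and the `predict` callback are hypothetical names, not ColabFold's API.

```python
def predict_with_early_stop(models, predict, threshold=85.0):
    """Run models in order and stop once a prediction is confident enough.

    `predict(model)` is a caller-supplied function returning a dict with a
    mean "plddt" value. Returns (best_result, number_of_models_run).
    """
    best = None
    runs = 0
    for model in models:
        result = predict(model)
        runs += 1
        if best is None or result["plddt"] > best["plddt"]:
            best = result
        if result["plddt"] >= threshold:
            break  # confident enough: skip the remaining models
    return best, runs
```

The saving comes from the models that never run, so the speed-up depends on how often early predictions already clear the threshold.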
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Thank you so much for using ColabFold and the acknowledgment! These are crazy-looking proteins. It's quite amazing that #AlphaFold2 gets them right.
@brettcollins100
Brett Collins
2 years
Enjoyed making a small contribution to this study by @AlbertPol10 and @LabParton . Quite amazed by how well AlphaFold2 predicts the unusual structures of the caveolin proteins.
1
10
47
3
19
169
@thesteinegger
Martin Steinegger 🇺🇦
1 year
ColabFold is number 2 on this list of most cited AI papers in 2022! Congratulations to everybody who contributed: @milot_mirdita @konstinx @Ag_smith @huhlim @sokrypton and the community.
@ZetaVector
Zeta Alpha
1 year
The 100 most cited AI papers for 2022. A detailed analysis of the most cited papers for the last three years allows good insights into the organisations and countries publishing the most impactful AI research right now. Read here: A thread 🧵
9
165
532
4
28
166
@thesteinegger
Martin Steinegger 🇺🇦
11 months
Metabuli 분리 improves metagenomic read classification through metamers, DNA-AA k-mers, to be sensitive and specific, recovering 99% and 98% of DNA or AA classifiers. Great work @JaebeomKim6! 💾 📄 🐍 conda install -c bioconda metabuli
3
64
165
@thesteinegger
Martin Steinegger 🇺🇦
1 year
.@arian_jamasb integrated Foldcomp, our structure compression algorithm, into Graphein - a Geometric Deep Learning framework for protein structures. Now, you can train networks on a proteome scale in Colab! Great work. 🎉 View the notebook:
2
45
163
@thesteinegger
Martin Steinegger 🇺🇦
8 months
Foldseek Release 8: supporting searches against clustered databases (with prebuilt DBs for AFDB50 and PDB100) and protein-complexes. HTML output was improved by @clmgilchrist. In the webserver, you can download and (re)upload results. 💾 or bioconda 🐍
2
45
163
@thesteinegger
Martin Steinegger 🇺🇦
2 years
We changed the license of the ColabFoldDB and PDB70 from CC BY-NC to CC BY-4.0. Now, there shouldn't be any further roadblocks for commercial use of AlphaFold2 or ColabFold.
4
28
162
@thesteinegger
Martin Steinegger 🇺🇦
2 years
The Baker lab's yeast all-against-all protein complex paper is online. They scanned 8.3 million PPIs using a smaller RoseTTAFold model and predicted the complex structure for high-scoring PPIs using AlphaFold2. The Science review process is blazing fast (<3 months)
5
48
162
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Our structural aligner Foldseek can now automatically download databases in a single command. We provide the PDB ( #PDB50 ) and AlphaFold DB. You can download Foldseek here:
foldseek databases PDB pdb tmp
foldseek easy-search query.pdb pdb aln.m8 tmp
0
45
157
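The two commands in the tweet above can be scripted. The Python sketch below builds them as argument lists and parses one line of the resulting aln.m8 file. The helper names are hypothetical, and the column layout is an assumption that the default BLAST-style tabular output is in use; check the Foldseek documentation for the format of your version.

```python
import shlex

# Assumed default column order of the tab-separated .m8 output.
M8_COLUMNS = ("query", "target", "fident", "alnlen", "mismatch", "gapopen",
              "qstart", "qend", "tstart", "tend", "evalue", "bits")


def foldseek_commands(query="query.pdb", db_name="PDB", db_path="pdb",
                      out="aln.m8", tmp="tmp"):
    """Return the download and search commands from the tweet as argv lists."""
    return [
        ["foldseek", "databases", db_name, db_path, tmp],
        ["foldseek", "easy-search", query, db_path, out, tmp],
    ]


def as_shell(cmd):
    """Render an argv list as a copy-pasteable shell line."""
    return shlex.join(cmd)


def parse_m8_line(line):
    """Split one result line into a dict with numeric fields converted."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(M8_COLUMNS):
        raise ValueError("unexpected number of columns")
    row = dict(zip(M8_COLUMNS, fields))
    for key in ("fident", "evalue", "bits"):
        row[key] = float(row[key])
    for key in ("alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend"):
        row[key] = int(row[key])
    return row
```

The argv lists can be passed to `subprocess.run` unchanged once Foldseek is installed; nothing here executes the tool.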
@thesteinegger
Martin Steinegger 🇺🇦
10 months
100B parameter protein language model trained on @uniprot and the #ColabFoldDB using 768 NVIDIA A100 GPUs for several months. The LM shows significant improvements in most prediction categories. Note: the model is not open-source; only the training data is currently available.
@DdelAlamo
Diego del Alamo - ddelalamo.bsky.social
10 months
"xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein" A borderline-SOTA antibody structure-prediction method is tucked away in the results section
4
33
146
1
27
155
@thesteinegger
Martin Steinegger 🇺🇦
1 month
Penguin is our new assembler that reconstructs manyfold more accurate strain-level viral genomes and 16S rRNAs from metagenomes through a novel greedy AA/DNA-hybrid Bayesian overlap extension strategy. By @AnnikaJochheim et al. 📄 💾
5
49
154
@thesteinegger
Martin Steinegger 🇺🇦
10 months
.@Rosy_Zh presented a new method called Spacedust, which can cluster and detect recurring gene neighborhoods. It finds gene similarities using amino acids and structures. Check out her poster for more details and Marv stickers. Code: #ISMBECCB2023
1
35
155
@thesteinegger
Martin Steinegger 🇺🇦
5 months
Protein-Vec enhances protein function prediction by combining independently contrastively learned protein classifiers for EC, GO, PFAM, Gene3D, and TMscore (Aspect-vecs) into a merged embedding to boost prediction performance. 📄 💾
3
33
146
@thesteinegger
Martin Steinegger 🇺🇦
10 months
My student @HKgenomics talks about Foldcomp, a fast protein structure compression algorithm. Foldcomp compresses the AFDB down from 23TB to 1TB at the speed of gzip. #ISMBECCB2023 📄 Code:
1
21
148
@thesteinegger
Martin Steinegger 🇺🇦
3 years
We released the MMseqs2 ColabFold databases at: . In addition to BFD/Mgnify, we also built a database containing additional metagenomic databases: MetaEuk, SMAG, TOPAZ, MGV, GPD and MetaClust2. Thanks @milot_mirdita for getting MMseqs2 ready.
0
50
146
@thesteinegger
Martin Steinegger 🇺🇦
4 years
Agnostos defines a framework to annotate genes beyond the twilight zone using clustering and remote homology detection. It organized over 415 million genes from 1,749 metagenomes. Maybe the dark matter is not so dark after all. Great work @ChiaraVanni5! 📄
2
68
139
@thesteinegger
Martin Steinegger 🇺🇦
8 months
"As part of our commitment to releasing our research breakthroughs safely and responsibly, we will not be sharing model weights, to prevent use in potentially unsafe applications." 😂
@EricTopol
Eric Topol
8 months
Just out @ScienceMagazine: #AlphaMissense, building on #AlphaFold and based on unsupervised learning #AI, predicts the impact of all 71 million human missense variants for disease-causing potential, across the entire human proteome; open-source @jun90cheng @Avsecz
7
141
453
5
26
140
@thesteinegger
Martin Steinegger 🇺🇦
4 years
Conterminator, a method to terminate contamination in genome and protein databases, is published @GenomeBiology. @StevenSalzberg1 and I found >114K/2M likely contaminations in RefSeq/GenBank. 📃 💾 🐍 conda install conterminator
3
70
139
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Here is a blog post about ColabFold by @labriataphd , which summarizes our efforts very well. Thank you so much for writing it. @labriataphd did you try to predict PRTEINSEQENCE?
1
35
139
@thesteinegger
Martin Steinegger 🇺🇦
5 years
Our protein-level assembler "plass" paper is now published at @naturemethods. Plass recovers many-fold more proteins from complex metagenomes compared to nucleotide assemblers. Paper: Code&Data: @milot_mirdita @virus_x_team
4
66
138
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Awesome. Alphafold2-multimer code and model are open source. We are working on pairing it up with MMseqs2 in ColabFold!
@GoogleDeepMind
Google DeepMind
3 years
The #AlphaFold source code has been updated and now accounts for multi-chain protein complexes - providing a significant improvement in accuracy for predicting protein interactions: Generate predictions from your browser via:
18
742
2K
1
28
137
@thesteinegger
Martin Steinegger 🇺🇦
1 year
For those interested in exploring the structural space using Foldseek, check out the tutorial video from @SBGrid, where I demonstrate the webserver and command line interface of Foldseek. Thank you for hosting me. 🎥
0
33
133
@thesteinegger
Martin Steinegger 🇺🇦
1 year
Annotating viral proteins in environmental samples is challenging. Using #ColabFold / #AlphaFold2 & #Foldseek can boost success rates from 10% to 50%. Excellent work by Henry Say, @brjoris, Daniel Giguere & @gbgloor. 📄
3
31
133
@thesteinegger
Martin Steinegger 🇺🇦
4 years
MetaEuk predicts eukaryotic proteins from metagenomes. They extract millions of yet unknown proteins from marine metagenomes @TaraOcean_. Preprint: Code: The proteins can be searched at
2
60
130
@thesteinegger
Martin Steinegger 🇺🇦
4 years
Yesterday was my last day at the lab of @StevenSalzberg1 . I feel so lucky that I was able to join such a fun and talented group. I'm looking forward to starting my own lab at Seoul National University @SNUnow . I am hiring! Please reach out via DM or email.
18
30
127
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Yesterday, we talked about ColabFold (AlphaFold2/RoseTTAFold in Google Colab) @ProteinBoston. Below are the slides, including a comparison of the MMseqs2 vs @DeepMind's jackhmmer version on CASP14-FM targets. We also covered some updates coming soon! The video will be posted soon too.
@sokrypton
Sergey Ovchinnikov 🇺🇦
3 years
@ProteinBoston Video still being processed. But here are slides that we shared:
1
34
118
1
38
122
@thesteinegger
Martin Steinegger 🇺🇦
1 year
This paper covers the #ESMatlas , a huge metagenomic protein structure database and the lightning-fast #ESMfold structure predictor, which the authors provide as API, allowing for direct structure predictions. Kudos to @alexrives & the @MetaAI team for this exceptional work!
@ScienceMagazine
Science Magazine
1 year
In a Science study, @MetaAI researchers show the power of a large language model, #ESMFold, to enable protein structure prediction and analysis. Using ESMFold, they generated a database (the ESM Metagenomic Atlas) of over 600 million metagenomic proteins.
6
175
554
0
20
122
@thesteinegger
Martin Steinegger 🇺🇦
3 years
.@sokrypton talked about ColabFold at the @emblebi AlphaFold webinar. Below is a screenshot of its complex modelling possibilities. He also presented a memory-friendly modeling approach for large complexes using trimming, implemented in the AlphaFold2_advanced_beta colab.
1
34
121
@thesteinegger
Martin Steinegger 🇺🇦
9 months
Today we present four posters at #ISMBECCB2023: a fast structural MSA algorithm (FoldMason), a NN to de-noise & select particles from Cryo-ET images, novel fungal core genes, and a benchmark of AA and structure measures for proteome comparison. Posters C-114, C-148, C-238, C-262
4
24
120
@thesteinegger
Martin Steinegger 🇺🇦
5 months
Curious to see how the new PDB identifier, with 5 characters instead of 4, will impact bioinformatics pipelines. This might be a "millennium bug" moment in structural bioinformatics.
@buildmodels
rcsb pdb 💉🧬💻🔬💊🌱🧠🦠
5 months
Today's the day: PDB no longer has 3-character chemical component IDs for incoming depositions. 1st structure with a 5-character CCD has been deposited. Details at wwPDB: PDB Entries w/Novel Ligands Now Distributed Only in PDBx/mmCIF & PDBML File Formats
4
50
119
4
19
116
@thesteinegger
Martin Steinegger 🇺🇦
3 years
The #CASP14 results are out and #AlphaFold2 won. It produces predictions with a margin of error close to crystal structures. Protein structure prediction might be solved. I am happy that I could contribute to this milestone. See you at the conference.
@MoAlQuraishi
Mohammed AlQuraishi
3 years
CASP14 #s just came out and they're astounding. DeepMind looks to have solved protein structure prediction. Median GDT_TS went from 68.5 (CASP13) to 92.4!!!! Cf. their 2nd best CASP13 struct scored 92.8 (out of 100). Median RMSD is 2.1Å. I think it's over
35
568
2K
3
34
117
@thesteinegger
Martin Steinegger 🇺🇦
1 year
Check out our new foldseek webserver!
@milot_mirdita
Milot Mirdita
1 year
. @clmgilchrist and I have refreshed the Foldseek webserver interface and made searches much quicker. We have also added the @CATHDatabase with the help of @nicolabordin . To explore the updated server, visit:
4
16
63
2
31
118
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Today @milot_mirdita is defending his PhD. I am so excited to hear his talk. It was such a pleasure to work with you. MMseqs2, ColabFold, and many more methods wouldn't have been possible without you. Good luck. :)
9
1
117
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Thank you @DeepMind for making the AlphaFold2 weights available for academic as well as commercial usage. This makes AF2 fully open to everybody (who gives proper attribution). We will reflect this change in the ColabFold usage texts soon.
@sokrypton
Sergey Ovchinnikov 🇺🇦
2 years
"The AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license" 🙂 (thanks to @BrianWeitzner for alerting me)
3
102
373
1
27
115
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Today @DeepMind released a colab for #AlphaFold 2 using HMMer for the homology search against a reduced version of Uniprot, BFD, and Mgnify. Thank you for linking our Colab. It's great to have different flavors available. DeepMind colab
1
16
115
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Prediction of Protein-Protein interaction by "just" adding a long linker in between the two sequences. This is pretty cool!
@Ag_smith
Yoshitaka Moriwaki
3 years
AlphaFold2 can also predict heterocomplexes. All you have to do is input the two sequences you want to predict and connect them with a long linker.
14
187
724
4
15
108
@thesteinegger
Martin Steinegger 🇺🇦
5 months
Enjoying my ColabFold Marv coffee latte in Okinawa. Thanks @SunjaeLee3 for inviting me to DTMBIO 2023.
3
3
108
@thesteinegger
Martin Steinegger 🇺🇦
18 days
I am incredibly proud of my two students @imGyuriKim and @JaebeomKim6 for receiving the prestigious Korean Presidential Scholarship '제1기 대학원 대통령과학장학금'. This is a very competitive prize and it is exceptionally rare that two are awarded to the same lab. Congratulations!
3
5
107
@thesteinegger
Martin Steinegger 🇺🇦
4 years
SkewIT (Skew Index Test) quantifies the bacterial GC Skew to detect mis-assembled genomes. It detected multiple mis-assemblies of complete RefSeq genomes. Great work @JenniferLu717 and @StevenSalzberg1 Preprint: Code: (not public)
1
46
102
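As background for the GC skew mentioned above, here is a minimal Python sketch of the standard definition, (G - C) / (G + C), computed over sliding windows. It only illustrates the underlying quantity; the function names are hypothetical and this is not the SkewIT index itself.

```python
def gc_skew(seq):
    """GC skew of a sequence: (G - C) / (G + C), or 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed_gc_skew(seq, window, step=None):
    """GC skew for each full window of length `window` along the sequence."""
    step = step or window  # default to non-overlapping windows
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

In many bacterial genomes the sign of the skew flips near the replication origin and terminus, which is why an unexpected skew pattern can hint at a mis-assembly.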
@thesteinegger
Martin Steinegger 🇺🇦
3 years
End-to-end differentiable (vectorized) Smith-Waterman implemented in Jax. A new tool to optimize MSAs based on specific use cases like protein structure quality, phylogeny and many more. Great work by Petti et al. Code:
@sokrypton
Sergey Ovchinnikov 🇺🇦
3 years
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman A fun collaboration with Samantha Petti, Nicholas Bhattacharya, @proteinrosh , @JustasDauparas , @countablyfinite , @keitokiddo , @srush_nlp & @pkoo562 (1/8)
8
203
735
0
16
102
@thesteinegger
Martin Steinegger 🇺🇦
9 months
Adieu Lyon! It was an incredible #ISMBECCB2023! Immensely grateful for the warm welcome extended to my students - for many, it was their first international conference. Thanks to @BQPMalfoy and his baby for capturing the moment.
2
7
101
@thesteinegger
Martin Steinegger 🇺🇦
3 years
Behind the scenes of AlphaFold2's success at CASP14. The manuscript describes how difficult targets were processed in order to achieve the highest performance. One takeaway: search full-length sequences instead of just single domains. 📄
Tweet media one
Tweet media two
0
32
98
@thesteinegger
Martin Steinegger 🇺🇦
9 months
. @DrArunimaSingh has written a summary about Foldseek for @naturemethods . It's a great overview of the method and includes information about what we're working on. Arunima is also at the #ISMBECCB2023 right now, so don't miss your chance to talk to her. 📄
0
29
97
@thesteinegger
Martin Steinegger 🇺🇦
2 years
A bit late, but I just found this tweet interesting. The AlphaFold DB contains a weak prediction that can be predicted well by Deepmind's AF2 Colab. How is this possible?
@korotkov_lab
Konstantin Korotkov 🇺🇦
2 years
@sokrypton Here is an opposite example - Uniprot B2HHE4. #AlphaFold database model is low confidence whereas #OmegaFold models are reasonably good without MSA.
Tweet media one
1
7
42
5
17
97
@thesteinegger
Martin Steinegger 🇺🇦
10 months
Our Marv stickers arrived just in time for #ISMBECCB2023 . Stickers are available at our posters. I am looking forward to reconnecting with old friends, making new connections, and learning about the latest in bioinformatics. See you in Lyon.
Tweet media one
6
7
97
@thesteinegger
Martin Steinegger 🇺🇦
4 years
spacegraphcats provides a tool to index and query metagenomic sequence diversity. Helps to recover missing content from genome bins and to quantify diversity. Published @GenomeBiology by @ctitusbrown et al. Great work! 📄 💻
Tweet media one
Tweet media two
1
47
97
@thesteinegger
Martin Steinegger 🇺🇦
2 years
. @Deepmind released the improved AlphaFold-multimer-v2 to reduce the clash problem. We integrated it in ColabFold. It's still possible to use older complex methods using model_type. Thank you for open sourcing it and John, @tfgg2 and @richevans_dm for answering our questions.
@richevans_dm
Richard Evans
2 years
Happy to announce an update to the AlphaFold-Multimer paper and code! The new models reduce clashes and improved accuracy.
2
20
135
0
22
96
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Reciprocal best structure hit (RBSH) search with Foldseek detects more hits compared to sequence-based methods. Great work by Vivian Angela Monzon, Typhaine Paysan-Lafosse, Valerie Wood and @Alexbateman1 📄 Code:
Tweet media one
Tweet media two
Tweet media three
1
24
95
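The reciprocal-best-hit idea mentioned above can be sketched generically: a pair (a, b) is kept only if b is the best hit of a and a is the best hit of b. This is an illustration of the general principle, not the authors' pipeline; the input format (a dict mapping `(query, target)` to a score where higher is better) and the function names are assumptions.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


a_vs_b = {("a1", "b1"): 90.0, ("a1", "b2"): 40.0, ("a2", "b1"): 70.0}
b_vs_a = {("b1", "a1"): 88.0, ("b1", "a2"): 65.0, ("b2", "a1"): 35.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here `a2` is dropped because its best hit `b1` prefers `a1`.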
@thesteinegger
Martin Steinegger 🇺🇦
1 year
Foldcomp is a protein structure compression algorithm and indexing system. It improves compression by 3x over PIC at similar speed to Gzip and reconstructs at ~0.08 Å Cα. AFDB/ESMatlas-HQ dbs for download. 🐍 interface over pip. 💾 📄
4
23
91
@thesteinegger
Martin Steinegger 🇺🇦
3 years
AlphaFold2 improves protein structure model quality by recycling (default: 3 times), i.e. feeding the prediction back through the model repeatedly. @sokrypton figured out that you can fold a de novo designed protein from a single sequence by increasing the number of recycles.
@sokrypton
Sergey Ovchinnikov 🇺🇦
3 years
Here is an example that took #alphafold ~12 recycles to fold! (denovo designed protein, single sequence input). Colored by predicted LDDT.
6
48
210
0
23
90
@thesteinegger
Martin Steinegger 🇺🇦
2 years
AF2-multimer models monomer complexes by concatenating MSAs. We observed that monomers are best modeled with unpaired ("stair-case") MSAs. In this example the unpaired MSA of ColabFold+AF2-multimer (soon public) picks up an intra-complex signal that AlphaFold-Colab misses.
Tweet media one
@RolandDunbrack
Roland Dunbrack 🏳️‍🌈 @rolanddunbrack.bsky.social
2 years
Hmm, Alphafold-multimer went off the rails on this one. Homodimer of BRD2 bromodomains 1 and 2. Even the single chains are a mess with large overlaps and breaks in the chain.
Tweet media one
Tweet media two
9
8
77
2
22
91
@thesteinegger
Martin Steinegger 🇺🇦
2 years
Foldseek processed over 83 million AFDB structures. If nothing goes wrong we hopefully have a database by tomorrow. @milot_mirdita is on it.
@bj_charles
Charles Bayly-Jones
2 years
Roughly how long will it take for this to be available in #FoldSeek ?? @thesteinegger - I can't wait to dive in...
2
4
17
1
6
90
@thesteinegger
Martin Steinegger 🇺🇦
2 years
We have set up a new ColabFold MSA server provided by the Korean Bioinformation Center. For the switch we will have a short downtime at ~8pm KST/1pm CET/7am EST. We accelerated the MSA generation using multiple threads and updated Uniref30 to 2022_02 and PDB to March 2022.
1
15
91
@thesteinegger
Martin Steinegger 🇺🇦
2 years
. @daniel_c0deb0t 's block-aligner is a library to align protein/nucleotide sequences using adaptive banding blocks + SIMD. It's ~9 times faster than Farrar's striped SW, implemented in Rust and available here: 📄
Tweet media one
1
23
91