amelie_schreiber @amelie_iska Twitter profile

Last Seen Profiles

@CoachLanceGary

@nocontextanhqv

@aiyauwu

@AnjaReschke1

@daisukevolley28

@LeoraLeone

@RDS_official

@CrystalFissure

@AtKmovie

@goto510sinpei

@hc_men

@romasaga_rs

@denryoku_kita

@penfelin

@Matiezequielcge

@bellieraa

@RealGraufuchs

@HollyareH

@junprints

@actuaInun

@BiraziR

@aimeemann

@UPGovt

@Blfire

@Alqadsiah

@wlegendw

@thesun_alone

@buacallate

@JDenaeghel

@CinnamonServal

@CrimsonMk3

@Dentoshi

@Out_Arabs

@shrimpjulia

@Angeles_NF

@DPodocasts

amelie_schreiber

@amelie_iska

2 months

Top 10 ❤️ tools rn, in no particular order: 1. ProteinDT 2. MoleculesSTM 3. RFDiffusion-AA 4. RosettaFold-AA 5. LigandMPNN 6. Distributional Graphormer (DiG) 7. DNA-Diffusion 8. OAReactDiff 9. RFDiffusion (original) 10. EvoDiff ❤️ Evo ❤️ Flow matching ❤️ Boltzmann generators

2

25

182

amelie_schreiber

@amelie_iska

2 months

Protein binding a small molecule designed with RFDiffusion-AA yesterday. I'm such a huge fangirl for these all-atom models. Baker Lab is awesome!

2

17

171

amelie_schreiber

@amelie_iska

2 months

These two together make a really good pair: From this you get conformational ensembles and binding affinity for protein-protein, protein-small molecule, and protein-nucleic acid affinities, reducing the need for expensive MD sims.

GitHub - bjing2016/alphaflow: AlphaFold Meets Flow Matching for Generating Protein Ensembles

AlphaFold Meets Flow Matching for Generating Protein Ensembles - bjing2016/alphaflow

github.com

0

24

139

amelie_schreiber

@amelie_iska

1 month

Found out yesterday some of my @huggingface blogs inspired some undergrads to start studying AI applied to proteins and someone applied to and received an internship based on their interest in replicating and extending some of them. 😎 Feeling very inspired and grateful now. ❤️

4

8

132

amelie_schreiber

@amelie_iska

27 days

In case it is helpful:

RFDiffusion Potentials

huggingface.co

1

13

94

amelie_schreiber

@amelie_iska

7 months

Just thought I would share this new Hugging Face community blog post I wrote as a follow up post to the ESMBind post. It explains how to build an ensemble of Low Rank Adaptations (LoRAs) after you have finetuned multiple ESMBind LoRA models:

ESMBind (ESMB) Ensemble Models

huggingface.co

1

13

56

amelie_schreiber

@amelie_iska

2 months

This just happened.

Atomically accurate de novo design of single-domain antibodies

Despite the central role that antibodies play in modern medicine, there is currently no way to rationally design novel antibodies to bind a specific epitope on a target. Instead, antibody discovery...

www.biorxiv.org

1

6

56

amelie_schreiber

@amelie_iska

1 month

An interesting and novel approach to applying transformers to graph structured data. This never got the attention it deserved and is likely an approach lost to time. It maybe “old” but it’s worth investigating further, especially for biochem/molecules:

GitHub - jw9730/tokengt: [NeurIPS'22] Tokenized Graph Transformer (TokenGT), in PyTorch

[NeurIPS'22] Tokenized Graph Transformer (TokenGT), in PyTorch - jw9730/tokengt

github.com

3

8

55

amelie_schreiber

@amelie_iska

1 month

RNA sequence design analogous to ProteinMPNN, but for RNA:

GitHub - chaitjo/geometric-rna-design: gRNAde: Geometric Deep Learning for RNA Design

gRNAde: Geometric Deep Learning for RNA Design. Contribute to chaitjo/geometric-rna-design development by creating an account on GitHub.

github.com

1

6

55

amelie_schreiber

@amelie_iska

1 month

Damn, another E(3)-equivariant model that should have been SE(3)-equivariant. Molecules have chirality! Still exciting that it works for small molecules AND proteins:

Equivariant Pretrained Transformer for Unified Geometric Learning...

Pretraining on a large number of unlabeled 3D molecules has showcased superiority in various scientific applications. However, prior efforts typically focus on pretraining models on a specific...

arxiv.org

0

7

55

amelie_schreiber

@amelie_iska

2 months

Has anyone else tried grafting two proteins together by first placing the proteins into AlphaFold-Multimer, then linking the proteins together with something like RFDiffusion motif scaffolding (treating the two proteins as though they are in the same chain)?

3

4

51

amelie_schreiber

@amelie_iska

1 month

Equivariant Spatio-Temporal Attentive Graph Networks to Simulate Physical Dynamics: A Replacement for MD? TBD. More comments to come. OpenReview: GitHub:

GitHub - ManlioWu/ESTAG: The source code of "Equivariant Spatio-Temporal Attentive Graph Networks...

The source code of "Equivariant Spatio-Temporal Attentive Graph Networks to Simulate Physical Dynamics" - ManlioWu/ESTAG

github.com

4

9

46

amelie_schreiber

@amelie_iska

5 months

ESM-2 for Generating and Optimizing Peptide Binders for Target Proteins

huggingface.co

1

12

42

amelie_schreiber

@amelie_iska

5 months

Working on a new method to cluster protein-protein complexes so I can finetune ESM-2 on them for predicting PPIs and for generating binders 😊. Also may try to finetune EvoDiff this way for generating binders. I ❤️ proteins so much.

2

1

41

amelie_schreiber

@amelie_iska

6 months

Recently wrote a new blog post on intrinsic dimension of protein language model embeddings and curriculum learning:

Estimating the Intrinsic Dimension of Protein Sequence Embeddings using ESM-2

huggingface.co

1

7

41

amelie_schreiber

@amelie_iska

1 month

Here’s a new method for sampling the equilibrium Boltzmann distribution for proteins using GFlowNets: If you aren’t familiar with GFlowNets, head over to @edwardjhu ’s twitter and watch his video. I’ll also post a link to a related lecture soon.

GitHub - GFNOrg/conf-gfn

Contribute to GFNOrg/conf-gfn development by creating an account on GitHub.

github.com

3

4

40

amelie_schreiber

@amelie_iska

6 months

Just cooked up a new tokenization method for protein language models and large language models. I can't wait to share :)

1

40

amelie_schreiber

@amelie_iska

20 days

Not specifically for proteins or other molecules, but this is a nice intro to flow matching. Thanks for the video @ykilcher any chance you’d ever do something on this applied to proteins?

Flow Matching for Generative Modeling (Paper Explained)

Flow matching is a more general method than diffusion and serves as the basis for models like Stable Diffusion 3.Paper: https://arxiv.org/abs/2210.02747Abstr...

www.youtube.com

0

8

38

amelie_schreiber

@amelie_iska

6 months

QLoRA for ESM-2 and Post Translational Modification Site Prediction

huggingface.co

0

10

35

amelie_schreiber

@amelie_iska

5 months

Clustering Protein Complexes using Persistent Homology and Finetuning ESM-2 for PPI Network...

huggingface.co

3

5

35

amelie_schreiber

@amelie_iska

6 months

In Silico Directed Evolution of Protein Sequences with ESM-2 and EvoProtGrad

huggingface.co

0

10

31

amelie_schreiber

@amelie_iska

2 months

This looks pretty cool! Also helpful for cutting down on expensive MD simulations. Can't believe I'm just now noticing this work.

Towards Predicting Equilibrium Distributions for Molecular Systems...

Valence Labs is a research engine within Recursion committed to advancing the frontier of AI in drug discovery. Learn more about our open roles: https://www....

www.youtube.com

1

6

29

amelie_schreiber

@amelie_iska

2 months

New updates, now we have BioT5+ I'm excited to try this model out.

GitHub - QizhiPei/BioT5: BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowle...

BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations (EMNLP 2023) - QizhiPei/BioT5

github.com

0

5

21

amelie_schreiber

@amelie_iska

4 months

Shouldn't we be able to do something similar to this with LoRA? LoRA and SVD are conceptually very similar. If so, that would likely explain the results in this paper where LoRA turns out to be better than full finetuning Thoughts?

Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning

Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then...

www.biorxiv.org

0

1

18

amelie_schreiber

@amelie_iska

2 months

Apparently you can in fact do flow matching on discrete data, for those interested in diffusion applied to discrete data like language and NLP, this is a good reference for how to do it with the more general flow matching models:

Jason Yim

@json_yim

2 months

Combining discrete and continuous data is an important capability for generative models. To address this for protein design, we introduce Multiflow, a generative model for structure and sequence generation. Preprint: Code: 1/8

2

92

443

0

1

17

amelie_schreiber

@amelie_iska

2 months

Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer

Accurately mapping protein-protein interactions (PPIs) is critical for elucidating cellular functions and has significant implications for health and disease. Conventional experimental approaches,...

www.biorxiv.org

0

1

14

amelie_schreiber

@amelie_iska

21 days

C_4 symmetric motif scaffolding with RFDiffusion.

0

14

amelie_schreiber

@amelie_iska

1 month

Another E(3)-equivariant model that should be SE(3)-equivariant. E(3) doesn’t preserve chirality of molecules. GitHub:

GitHub - mir-group/allegro: Allegro is an open-source code for building highly scalable and...

Allegro is an open-source code for building highly scalable and accurate equivariant deep learning interatomic potentials - mir-group/allegro

github.com

Machine Learning in Chemistry

@ML_Chem

1 month

Transferable Water Potentials Using Equivariant Neural Networks #machinelearning #compchem

0

3

35

0

1

14

amelie_schreiber

@amelie_iska

8 days

This looks pretty amazing:

0

3

13

amelie_schreiber

@amelie_iska

5 months

So nervous about this one.

0

2

13

amelie_schreiber

@amelie_iska

1 month

Interestingly, quantizing state space models like Mamba doesn't seem to work very well, whereas we are now in the era of 1-bit quantization for transformers ~without~ performance degradation; it also isn't clear if Mamba is as expressive as Transformers.

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single...

arxiv.org

2

11

amelie_schreiber

@amelie_iska

6 months

If you have opportunities to work at the intersection of AI and proteins, DM me. I have ideas and I like implementing them :)

2

5

11

amelie_schreiber

@amelie_iska

5 months

Predicting the Effects of Mutations on Protein Function with ESM-2

huggingface.co

0

2

10

amelie_schreiber

@amelie_iska

2 months

Okay, serious question. If you can accomplish the same thing with more general proteins, why restrict yourself to antibodies? Also, what are some problems that really truly require antibodies specifically and that can’t be done with more general proteins?

5

0

10

amelie_schreiber

@amelie_iska

1 month

@TonyTheLion2500 I highly recommend this reference along with his “smooth manifolds” book: Introduction to Riemannian Manifolds (Graduate Texts in Mathematics)

Introduction to Riemannian Manifolds (Graduate Texts in Mathematics)

www.amazon.com

0

10

amelie_schreiber

@amelie_iska

1 month

Seems like an interesting method. I find it very interesting that it works better (SOTA?) if you give it conformational ensembles to work with. Could be very interesting to see how conformational sampling, Distributional Graphormer, or AlphaFlow might yield better results.

Javier Sánchez Utgés

@JavierUtges

1 month

Having a lot of fun visualising the ligand binding site predictions of #IFSitePred with #PyMol ! A new ligand binding site prediction method that uses #ESMIF1 learnt representations to predict where ligands bind! Check it out here: #Q96BI1

0

2

20

2

1

10

amelie_schreiber

@amelie_iska

1 month

(1/n) Even if Sora isn't currently capable of accurately generating simulations of small molecules or proteins, open sourcing it or giving select researcher access to it would allow us to add in equivariance or use components of it such as those that maintain temporal coherence.

4

0

10

amelie_schreiber

@amelie_iska

2 months

This looks pretty interesting! Text guided diffusion model for molecules.

GitHub - Deno-V/tgm-dlm: Code for AAAI24 paper Text-Guided Molecule Generation with Diffusion...

Code for AAAI24 paper Text-Guided Molecule Generation with Diffusion Language Model - GitHub - Deno-V/tgm-dlm: Code for AAAI24 paper Text-Guided Molecule Generation with Diffusion Language Model

github.com

0

9

amelie_schreiber

@amelie_iska

8 days

@olexandr Why not use PAE from RoseTTAFold All Atom to compute the LIS score similar to this:

GitHub - flyark/AFM-LIS: Local Interaction Score (LIS) Calculation from AlphaFold-Multimer (Enhan...

Local Interaction Score (LIS) Calculation from AlphaFold-Multimer (Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer) - flyark/AFM-LIS

github.com

0

1

9

amelie_schreiber

@amelie_iska

1 month

Having solid temporal coherence, or modifying the architecture to be SE(3)-equivariant would allow us to create better versions of things like this: and we might actually be able to replace MD with AI, speeding up drug discovery and solving major problems

Equivariant Spatio-Temporal Attentive Graph Networks to Simulate...

Learning to represent and simulate the dynamics of physical systems is a crucial yet challenging task. Existing equivariant Graph Neural Network (GNN) based methods have encapsulated the symmetry...

openreview.net

0

6

amelie_schreiber

@amelie_iska

1 month

To all those just getting into this stuff: You’re entering one of the most interesting and impactful areas at the most exciting time. Don’t give up, even when it feels impossible. Stay close to the open source biochem AI community. They’re a great crowd. Good luck and have fun!

1

0

7

amelie_schreiber

@amelie_iska

1 month

Selectively modulating PPI networks by designing high affinity and high specificity binders with RFDiffusion and checking that with AF-Multimer LIS score seems like low hanging fruit to me. What reasons might there be for this not being very actively & heavily worked on?

2

0

6

amelie_schreiber

@amelie_iska

1 month

Computational efficiency in equivariant models is often a concern. This model addresses that and creates fast SE(n)-equivariant models for tasks involving molecules:

Fast, Expressive SE$(n)$ Equivariant Networks through...

Based on the theory of homogeneous spaces we derive geometrically optimal edge attributes to be used within the flexible message-passing framework. We formalize the notion of weight sharing in...

arxiv.org

0

1

6

amelie_schreiber

@amelie_iska

5 months

Eeep! It's wooorkiiing! So excited! 😊 I'll write a hf blog post on it once it's all done.

0

1

6

amelie_schreiber

@amelie_iska

2 months

Crowdsourcing suggestion…if you could selectively disrupt or augment a pathway or PPI network, where would you start? Assume you can block any PPI, or augment the PPI network by designing proteins that create intermediary interactions (ex: proteins that bind/link two others)

3

0

5

amelie_schreiber

@amelie_iska

6 months

Persistent Homology Alignment (PHA): Replacing Multiple Sequence Alignments using ESM-2 and...

huggingface.co

0

2

6

amelie_schreiber

@amelie_iska

2 months

@alexrives I have a method for detecting AI generated proteins that I would like to open source at some point if people are interested. It seems to work on proteins generated by most models out right now, although there are a couple models it does not work for, hesitant to say which ones.

2

0

5

amelie_schreiber

@amelie_iska

2 months

@maurice_weiler @erikverlinde @wellingmax Could someone recommend a similar resource for other architectures like equivariant transformers or equivariance in geometric GNN models? Just curious what the go to resources are for people for other architectures.

2

0

4

amelie_schreiber

@amelie_iska

5 months

Now, using persistence landscapes we can cut down clustering time from a full day to less than 30 minutes for 1000 proteins!

Faster Persistent Homology Alignment and Protein Complex Clustering with ESM-2 and Persistence...

huggingface.co

0

2

5

amelie_schreiber

@amelie_iska

4 months

@pratyusha_PS This is awesome. When will the code be available? I would love to try this with a protein language model like ESM-2 and see if it improves performance.

2

0

5

amelie_schreiber

@amelie_iska

2 months

@samswoora You should also check out flow matching models. Flow matching generalizes diffusion (diffusion is a special case of flow matching). They're doing a lot with proteins and flow matching, but there's less buzz about it in vision and language domains.

2

0

5

amelie_schreiber

@amelie_iska

1 month

Attempting to raise my signal to noise ratio today by making some quality posts about AI and biochemistry. 😊

0

4

amelie_schreiber

@amelie_iska

8 days

@310ai__ It might also be good to look into computing the LIS score based on the PAE output of RoseTTAFold All Atom similar to what was done with AlphaFold-Multimeter here. This is a new approach for protein-small molecule complexes.

Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer

Accurately mapping protein-protein interactions (PPIs) is critical for elucidating cellular functions and has significant implications for health and disease. Conventional experimental approaches,...

www.biorxiv.org

1

4

amelie_schreiber

@amelie_iska

2 months

@biorxiv_bioinfo Cool idea, but how was the dataset split into train, test, and validation? Was sequence similarity/homology used to split the protein dataset? If not, this paper's results are unreliable. You have to split your data based on sequence similarity; 30% similarity is pretty standard

0

3

amelie_schreiber

@amelie_iska

5 months

Anyone have any idea why in silico directed evolution might increase perplexity and intrinsic dimension of a protein? Are more fit proteins generally more complicated?

3

0

4

amelie_schreiber

@amelie_iska

3 months

@GabGarrett CLIP but for proteins and small molecules...

1

0

4

amelie_schreiber

@amelie_iska

5 months

This would be cool for proteins I'd love to try and use this for designing protein-protein complexes in sequence space. Too bad the code isn't released.

Meet in the Middle: A New Pre-training Paradigm

Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the...

arxiv.org

2

0

4

amelie_schreiber

@amelie_iska

2 months

@HannesStaerk Still REALLY want to see this done with AlphaFold-Multimer. Maybe there’s a dynamic model of PAE and LIS that comes out of this that helps determine how strong or transient a PPI is.

Enhanced Protein-Protein Interaction Discovery via AlphaFold-Multimer

Accurately mapping protein-protein interactions (PPIs) is critical for elucidating cellular functions and has significant implications for health and disease. Conventional experimental approaches,...

www.biorxiv.org

1

2

4

amelie_schreiber

@amelie_iska

5 months

@andrewwhite01 You can also learn equivariance. I think equivariance is an overrated mathematical concept tbh. It's fancy and neat from a mathematical perspective, but otherwise I think you could have your network learn it and get just as far if not further.

0

4

amelie_schreiber

@amelie_iska

1 month

AlphaFlow-Multimer with the appropriate generalization of the LIS score would more or less solve PPI prediction. LIS alone already mostly solves it. Then the only bottleneck for giant detailed PPI networks is compute. This is a big deal. Explain to me why I might be wrong.

0

4

amelie_schreiber

@amelie_iska

1 month

Hot take for some, obvious to others: GPUs and LLM oriented ASICs along with AI operating systems will make CPUs mostly obsolete. Anyone out there capable of writing CUDA kernels who can explain why this might be an erroneous prediction?

2

0

4

amelie_schreiber

@amelie_iska

1 month

@HannesStaerk @chaitjo @SimMat20 @ADuvalinho Just found this and it seems to address some of the concerns over computational cost of equivariant architectures:

Fast, Expressive SE$(n)$ Equivariant Networks through...

Based on the theory of homogeneous spaces we derive geometrically optimal edge attributes to be used within the flexible message-passing framework. We formalize the notion of weight sharing in...

arxiv.org

0

3

amelie_schreiber

@amelie_iska

8 days

This is pretty cool 😎

Moderna partners with OpenAI to accelerate the development of...

Moderna deployed ChatGPT Enterprise to thousands of employees, seeing remarkable adoption across the company. They’ve created hundreds of custom versions of ...

www.youtube.com

0

1

3

amelie_schreiber

@amelie_iska

23 days

Really cool channel. Maybe we’ll get a video on SE(3)-equivariant neural networks one day🤞This would be great for folks trying to understand new SOTA models for proteins and small molecules. I would totally be down to collaborate @mathemaniacyt 🧬

Mathemaniac

@mathemaniacyt

24 days

Why do we require Jacobi identity to be satisfied for a Lie bracket? In the process, we also understand intuitively why tr(AB) = tr(BA) without matrix components. Watch now:

2

104

613

0

3

amelie_schreiber

@amelie_iska

5 months

@ylecun Also very skeptical of quantum and it's application to deep learning.

Is quantum advantage the right goal for quantum machine learning?

Machine learning is frequently listed among the most promising applications for quantum computing. This is in fact a curious choice: Today's machine learning algorithms are notoriously powerful in...

arxiv.org

0

2

amelie_schreiber

@amelie_iska

1 month

@MIT_CSAIL Using random train/test splits when the data should be split based on some similarity metric, especially for proteins/small molecules, to determine if the model generalizes well to unseen data. Also using E(3)-equivariance instead of SE(3) for small molecules/proteins.

0

3

amelie_schreiber

@amelie_iska

1 month

It would be very interesting and useful to see how this could be used in tandem with the following method for detecting binding sites of conformational ensembles of proteins using ESM-IF1:

GitHub - oxpig/binding-sites

Contribute to oxpig/binding-sites development by creating an account on GitHub.

github.com

0

3

amelie_schreiber

@amelie_iska

26 days

@nomad421 Is this at all related to evolutionary velocity of proteins as described in the paper for this model?

GitHub - brianhie/evolocity: Evolutionary velocity with protein language models

Evolutionary velocity with protein language models - brianhie/evolocity

github.com

0

3

amelie_schreiber

@amelie_iska

2 months

@lexfridman @sama Can something like Sora one day be used for molecular dynamics simulation, perhaps along with Gaussian splatting?

0

3

amelie_schreiber

@amelie_iska

6 months

@biorxivpreprint I'm so fascinated by how geometric compression, information theoretic compression, and LoRA or QLoRA all seem to be closely related. Should we be choosing our ranks based on perplexity or intrinsic dimension? Also, LoRA and QLoRA end up regularizing models! How neat!

0

2

amelie_schreiber

@amelie_iska

1 month

@naterbennett0 Will this be attempted with all atom models, or would that not make much difference? Also, what pain points are blocking progress to better performance? Architecture? Data? Is more physics needed? Something else? Maybe there’s some hairy math in the way I could grapple with?

0

3

amelie_schreiber

@amelie_iska

1 month

Toward AI-Driven Discovery of Electroceuticals - Dr. Michael Levin

Bioelectric networks as targets for regenerative medicine

www.youtube.com

0

3

amelie_schreiber

@amelie_iska

3 months

@KevinKaichuang @WChentong $\exists$ GitHub?

2

0

2

amelie_schreiber

@amelie_iska

2 months

@gallabytes @samswoora Try out these: Frank Noé's work is pretty cool in general. Let me know if you find others related to proteins, small molecules, DNA, or RNA.

GitHub - noegroup/bgflow: Boltzmann Generators and Normalizing Flows in PyTorch

Boltzmann Generators and Normalizing Flows in PyTorch - noegroup/bgflow

github.com

0

3

amelie_schreiber

@amelie_iska

1 month

@FrankNoeBerlin @CecClementi The turtles are adorable as are the others. Kinda makes me wanna play Subnautica 😆

0

2

amelie_schreiber

@amelie_iska

2 months

@soldni @RekaAILabs @YiTayML Apple’s new multimodal model MM1 uses JAX

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful...

arxiv.org

0

3

amelie_schreiber

@amelie_iska

1 month

Claims of superiority of the model don't appear until late in the paper and are completely absent from the abstract and the first part of the paper, which gives off confidence vibes. We're all tired of the SOTA claims appearing in every abstract these days.

0

1

amelie_schreiber

@amelie_iska

1 month

@mmbronstein @BlumLenore 😂nice…I’ve actually read a lot of this…can confirm it is a good read. I need to reread the sections on equivariant GNNs and attention. It’s been a while.

1

0

2

amelie_schreiber

@amelie_iska

6 months

@HWaymentSteele @Nature @ademi_moyin You may find this new work interesting! They do something similar without using MSA, making the method fast:

Evolutionary selection of proteins with two folds

Nature Communications - Most globular proteins are selected to fold into one unique structure. Schafer and Porter demonstrate that some proteins are selected to assume two stable folds; they...

www.nature.com

1

0

2

amelie_schreiber

@amelie_iska

1 month

Nm. LOL.

MambaByte: Token-free Selective State Space Model

Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this...

arxiv.org

0

2

amelie_schreiber

@amelie_iska

5 months

@KevinKaichuang Can you say more about the "better protein training" one?

0

2

amelie_schreiber

@amelie_iska

1 month

Mamba trained on zeros and ones without tokenization when?! Someone REALLY need to do this. Could be a game changer and the long context is perfect for such an experiment.

2

0

2

amelie_schreiber

@amelie_iska

2 months

@jchodera @HannesStaerk @GrantRotskoff @UWproteindesign @loukasa_tweet @KevinKaichuang @fra_grisoni @thesteinegger @marwinsegler @Blendenfleck @open_fold @nickpolizzi_ @inve_michele @AnimaAnandkumar @VStimper @HeinzingerM @simonduerr @__kraemer__ @igashov (not my work) for binding affinity. Maybe the two together would be a good combo.

GitHub - amelie-iska/u_GET: (Unofficial) GET for proteins

(Unofficial) GET for proteins. Contribute to amelie-iska/u_GET development by creating an account on GitHub.

github.com

0

2

amelie_schreiber

@amelie_iska

2 months

@kharlikesticker @xennygrimmato_ People always say the benchmark must be bogus once it is solved. People did the same thing with Hinton and his group when AlexNet did so well on image classification. "Oh, well the benchmark is clearly flawed then if a neural network solved it." In hindsight it looks too easy.

1

0

1

amelie_schreiber

@amelie_iska

6 months

@jinyuansun39143 You should include information about EvoDiff in its knowledge base.

1

0

2

amelie_schreiber

@amelie_iska

3 months

@BoWang87 For function prediction, this looks quite good: Similarly for small molecules: I'm waiting for the big splash this will inevitably make. Thoughts? The use a CLIP based approach and get SOTA (but actually).

0

2

amelie_schreiber

@amelie_iska

2 months

Could be extended to multiple protein fragments too I suppose. I'm curious to know how others are scaffolding various things.

0

2

amelie_schreiber

@amelie_iska

22 days

@Lauren_L_Porter @tiwarylab Have you looked into things like Distributional Graphormer, Timewarp, Boltzmann generators, GFlowNets, AlphaFlow, or other methods based on flow matching for sampling the Boltzmann distribution?

1

0

2

amelie_schreiber

@amelie_iska

1 month

@mmbronstein @BlumLenore This looks really interesting.

0

2

amelie_schreiber

@amelie_iska

28 days

@FrankNoeBerlin @adad8m Agreed. I think Quantum ML doesn’t really work. I think Scott Aaronson has a really sober perspective on this stuff. This is also very indicative of the state of Quantum ML:

Is quantum advantage the right goal for quantum machine learning?

Machine learning is frequently listed among the most promising applications for quantum computing. This is in fact a curious choice: Today's machine learning algorithms are notoriously powerful in...

arxiv.org

0

2

amelie_schreiber

@amelie_iska

4 months

@KevinKaichuang Depends on the class. Maybe this will help. I guess you'd need some kind of OCR conversion from handwritten to PDF though.

GitHub - facebookresearch/nougat: Implementation of Nougat Neural Optical Understanding for...

Implementation of Nougat Neural Optical Understanding for Academic Documents - facebookresearch/nougat

github.com

0

2

amelie_schreiber

@amelie_iska

23 days

@arankomatsuzaki This looks inspired by or related to this paper by @pratyusha_PS

The Truth is in There: Improving Reasoning in Language Models with...

Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. Correspondingly, significant resources are allocated towards research that aims to further advance...

arxiv.org

1

0

2

amelie_schreiber

@amelie_iska

1 month

I have come to realize every tool added to Copilot makes it better and more useful. This platform/product (not a model) is leading the way in natural language interfaces to advanced biochem AI and computational biochemistry.

1

2

amelie_schreiber

@amelie_iska

23 days

@Geronimo_AI @arankomatsuzaki Check out this paper and you’ll get an idea of why this might be better than full fine tuning

The Truth is in There: Improving Reasoning in Language Models with...

Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. Correspondingly, significant resources are allocated towards research that aims to further advance...

arxiv.org

0

2

amelie_schreiber

@amelie_iska

5 months

@EvaSmorodina How do I try it?

1

0

2

amelie_schreiber

@amelie_iska

1 month

Friendly reminder: You can wish those celebrating it a happy Easter AND support transgender people. Just sayin’. I did it, and I’m not even a Christian. Here’s to elevating the conversation, raising our signal to noise ratio, and being a little more chill and supportive.

0

2

amelie_schreiber

@amelie_iska

2 months

@amyxlu I want to see this applied to MD simulation so bad. Perhaps coupled with image-to-3D. I'm curious to know what approach people would take.

1

0

2

amelie_schreiber

@amelie_iska

5 months

@KevinKaichuang You could try ordering the data such that the intrinsic dimension of the embeddings associated to the data gradually increases. This might smooth things out some. With so few examples it may not help much, but it's worth a try.

1

0

2

amelie_schreiber

@amelie_iska

6 months

Is there noise added to the embeddings by QLoRA due to fitting everything into bins (quantization)? If so, I think there is a connection to NEFTune which might explain the improved performance of QLoRA over full finetuning. Thoughts?

1

0

2

amelie_schreiber

@amelie_iska

1 month

@mmbronstein @BlumLenore Would love an updated expanded section on equivariant GNN and transformers. Just putting that out there 😎

1

0

2