We wrapped up the first LLM hackathon for applications in materials and chemistry last week. To me, the results were astounding. We are at the point where some tasks that took years can now be completed in days. Here is a list of the fantastic submissions!
Did you know you can create a Google Scholar profile for your research group?
While GS is mostly used to track individual metrics, this process allows you to track and highlight team and project metrics.
Here are 5 steps to get there:
#AcademicChatter
#AcademicTwitter
It's been a wild week already in
#ML
/
#AI
for science. Advancements in using diffusion models for protein folding, using learned potentials to discover new catalyst materials, a proposed battery data genome to speed energy storage material discovery, and so much more! 🧵 (1/8)
Spent 2 hours this morning with Claude 3, and it's the most intensely I've been shocked yet.
The Claude 3 Opus understanding of complex scientific topics is far ahead of GPT-4 on my self-made qualitative evals. I'd guess mid to advanced PhD level understanding of the topics
Had to recommend rejecting a paper today, which I hate doing. It was an ML application paper and after multiple rounds of asking, authors refused to share ~any~ data or code. How can such a paper be reasonably reviewed??
Looking for a way to easily access the highest quality ML-ready datasets in materials science and chemistry? Look no further!
Foundry-ML
Read the paper or visit the website below.
Paper:
Web:
A great partnership of
@hankgreen
A foul isn't a third strike unless it's a swinging foul tip that goes into the catcher's glove - but a bunt attempt on a second strike that goes foul is a strikeout. Also if a catcher drops the third strike the runner must be tagged out or thrown out at first base.
How can we use LLMs to accelerate scientific discovery? Let's find out! This year, hundreds of people from across the globe worked together in a hackathon to BUILD groundbreaking prototypes, showing the path to breakthroughs in next generation batteries, sustainability,
Post a picture of the
@matplotlib
plot you are most proud of. Extra credit: link to the code to reproduce the plot and any papers it may have appeared in!
Defne Circi and Shruti Badhwar developed GraphInsight to automatically create materials knowledge graphs via entity and relationship extraction with GPT4 backed by
@neo4j
. Their prototype example showed development of a materials knowledge graph for polymer nanocomposites.
I had an amazing time being part of the LLM Materials / Chemistry Hackathon! I learned so much, and it was a pleasure to work with
@yborg_2014
. Together, we worked on a tool that uses
#LLMs
and data visualization to create materials knowledge graphs. Check out our video!
✨Trillion Parameter Models in Science✨
We present an initial vision for a shared ecosystem to take the next step in large language models for scientific research: Trillion Parameter Models (TPMs).
#LLM
are becoming more powerful by the day. But, there is still work done to
@SamCox822
and Mark Caldras developed composable tools that combine structured queries to the Materials Project with LLM context creation, synthesis prediction, property prediction and more.
@LangChainAI
Check out our submission for the LLM March Madness Hackathon! We created some tools for materials to use with LLMs, using LangChain and the Materials Project API. Check it out here!
@Kyam888
It's been quite sad to see so many remote science collaboration and learning opportunities removed this year. I am all for in person, but remote options dramatically increase accessibility and visibility.
For the past few years, I've been tracking the number of AI/ML-related publications in scientific domains. As expected, it's been a record year for AI/ML in the sciences. Here is my final official update covering 2021 publications.
#AI4Science
#MachineLearning
#OpenScience
Unlocking New Frontiers: 2nd LLM Hackathon for Applications in Materials and Chemistry
Join us on May 8-9th for the 2nd Large Language Model Hackathon for Applications in Materials and Chemistry! This hybrid event is designed to connect brilliant minds to explore
@jakublala
and Sean Warren developed a conversational interface to 3dmol.js, showing the power of incorporating language models in future interface design. 🥳
As a part of the
#llmhack
, Sean Warren and I developed MolVerse, a language interface to 3dmol.js: a web app for all chemists and biologists who want to visualise their (bio)molecules without needing to code. Here we load the structure of benzene:
We published a 780 GB dataset of quantum calculations this morning, and a 7.1 TB peptide assembly dataset this afternoon. Technical details on these soon. The Materials Data Facility offers researchers unrivaled capabilities to share materials data!
Excited to announce the speaker list for the LLM Hackathon for Applications in Materials and Chemistry.
This year, we will hear from experts from industry and academia including
Elsa Olivetti (
@OlivettiGroup
,
@MIT
),
Marwin Segler (
@marwinsegler
,
@Microsoft
), Michael
@dhuang26
@MaximZiatdinov
Right, but the thing that shocked me is that it was able to come up with the solution we found that took top tier chemists ~1 year to formulate through various in-lab failures. Claude did this in one shot - for 5 cents. So, it gets potentially much easier to find fruitful paths
The
#LLM
hack kicked off with 3 inspiring speakers
@andrewwhite01
discussed ways to find chem information in 2023, a paper-qa agent extension, direct synthesis to property prediction via LLMs, and the answer to all my questions - it's not 42.
#ML
#AI
I deeply appreciate everyone who is here and engaging. But, I'm feeling quite heavy about the loss of a significant fraction of the Twitter science community and all that we've already collectively missed out on and what we will miss out on.
For drug discovery we need to understand molecular orientation in 3 dimensions. The GEOM dataset (Simon Axelrod and Rafa Gómez-Bombarelli) contains 37M conformations for >450k molecules - quickly becoming a benchmark in the field of molecular generative modeling.
@mit
@Harvard
I'm thrilled to announce the
@NSF
Garden project. The AI4Science community has already seen tremendous successes. Yet, these advances are available to only a few specialists. We will make these advances ethically accessible to all researchers. 🧵
We recently set up a Google Scholar profile for our research group! The idea is to allow the team - and others - to see one type of progress as a cohesive whole rather than only at the individual level.
@ianfoster
@chard_kyle
ATTENTION! Virtual hackathon alert!
Excited by
@openai
's release of GPT-4?
We're bringing together the brightest minds to tackle problems via
#LLM
and create datasets in energy storage, materials/drug discovery and more.
#LLMHackathon
@andrewwhite01
@rachel_l_woods
✨ Yay, I can finally announce Globus Compute! ✨
Globus Compute enables cloud-managed fire-and-forget computation, running functions on computers ranging from laptops to cloud to supercomputers. This sounds abstract, but it's already enabling insane productivity boosts.
Last month was very scary. A series of events and, in particular, people lined up perfectly to very likely save my life. More details on that sometime later, but the good news is that I'm expected to make a complete and durable recovery with no long-term effects.
It's easy
Today I hit 1000 Twitter followers. I know this is a small amount, but it's been amazing getting to know you all on
#AcademicTwitter
. Thank you everyone! With this new network, some amazing things in
#OpenScience
have already happened here...
Attention students! We're a team of scientists, researchers, and software engineers dedicated to creating software and services to accelerate scientific progress.
We build tools used by thousands of researchers to publish datasets and models, create state-of-the-art AI
Very intriguing work by Daniel Schwalbe-Koda et al. where they propose a new framework grounded in information theory that unifies key aspects of:
1. predictions of phase transformations,
2. kinetic events,
3. dataset optimality, and
4. model-free UQ from atomistic
@MaximZiatdinov
It still can't actually "do" anything, so you're safe for a while. But it did correctly guess the solution to what I thought was the hardest unpublished aspect of a very tricky materials/chem problem from my grad school days. It also had two other ideas, but the idea we
We are less than a week from the virtual hackathon for
#LLM
applications in materials science and chemistry. Here is some inspiration from others' applications to drive you!
Register and start teaming today:
#hackathon
#openscience
#ML
#AI
I really dislike large kickoff meetings (>10) that spend half the time with individual intros.
Nobody is listening because they are figuring out how to describe themselves! Instead, set up a doc, spend 5 min having people write/read there. Also this is searchable in the future.
A team led by
@QaiAlex
fine-tuned GPT-3 on examples from the Open Reaction Database (ORD) and applied this to extract structured data from synthesis sections of papers.
Use LLM to parse free text synthesis recipes to structured data! With only 300 training pairs, the fine-tuned model can already pick up chemical identities/amounts and generate valid JSON in ORD schema. Check out our video demo!
For the past few years, I've been tracking the number of AI/ML publications in scientific domains. With months of data yet to come for 2021, materials science, chemistry, and physics have surpassed their ML publications from 2020. Usually the numbers settle in by February...
#AI4Science
Pleased to announce the Materials Research Coordination Network (MaRCN)! MaRCN is an
@NSF
funded project focused on concepts in metadata standards, open science, shared benchmark problems, FAIR as applied to ML models and training protocols, and more.
@kevinmawright
@DataFacility
publishes and hosts multi TB datasets - I think our largest is 10 TB. We see a lot of reuse in materials science and chemistry adjacent applications at least!
For many datasets, the question of what to preserve long term is challenging though.
I'm writing a proposal on making
#machinelearning
in science more accessible. What are the biggest things that keep you from using ML in your research? Finding models, lack of expertise, adjusting for a new task, software and hardware incompatibilities, unknown model quality?
@DrAnneCarpenter
Maybe it's worth considering just posting the preprint and author's copy on your website? Google Scholar makes these links automatically show up in search. 40k is too much!
@_akhaliq
presents NeuralPLexer to predict protein-ligand structures. This will help understand the interaction between small molecules and proteins. Also with diffusion models ✨ (3/8)
I absolutely love the fact that there are people in our group that know more than me about many technical topics. It's not a sign of weakness or insufficiency, it's how you know you're building a real team.
In academia this can be almost impossible to achieve, but I'm thankful every day that my parents live 10 min away and my kids are able to see them almost every day. Do I regret not applying to TT positions at random Unis? Nope. We're building something special in the lab and life
Last year, I had a conversation that changed my life.
It caused me to upend everything and move across the country.
The lesson from it may change yours:
I've been working late nights to help my kid's new school understand and optimize their air quality. With just a few free and simple tweaks, measurements today showed that we reduced CO2 concentration 2.5x already. This will improve attendance, improve learning, and lessen the
Twitter is clearly tweaking the algorithm, and not in a good way. Yesterday, I was getting a lot of replays of tweets I'd already seen. Also, noticed all week that I'm not seeing content from people I respect (that are tweeting), and more from random "influencer" accounts.
I have no idea how this works or how it could be healthy for a specific field. Counted roughly 170 authored publications in the 9.5 months of 2022. Anyone know the story?
Does it feel like you are seeing more impactful
#ML
and
#AI
for science publications? This probably explains it.
We've seen strong continued growth in
#AI
and
#ML
for science across a broad set of domains including materials science, chemistry, physics and more.
@WardLT2
and colleagues at
@argonne
,
@INL
, and more propose a Battery Data Genome to speed development of energy storage materials. Exploring in depth the needs for data, software, models, and community.
@ENERGY
@jam3es
#OpenScience
(7/8)
So excited our paper is published! It describes our on-going discussions started in Nov'19 on how to make battery data science possible and easier for a larger community of researchers
@Deepmind
@AxelrodSimon
show a learned force field performs as well as or better than traditional DFT methods for finding interesting configurations at catalyst surfaces via
#OpenCatalyst
. (5/8)
New research shows that learned force fields are ready for use in catalyst discovery.
Here, "Easy Potentials" outperform classical quantum chemistry tools on the
#OpenCatalyst
challenge and find lower energy structures outside of the training set: 1/2
We are closing in on breakthroughs in fusion (maybe 15 years until deployment). Wind, solar, and energy storage costs are plummeting. AI breakthroughs every day. Technical problems can be solved, but how do we make similar progress on social issues?
Materials Genome Initiative (MGI) 2.0 is released! The MGI effort to speed the discovery and deployment of new materials enters its second decade boasting notable successes.
@NIST
@ENERGY
@NSF
@DARPA
@NASA
@DeptofDefense
PDF:
Web:
Ok here is one example set. Microencapsulation of adhesive materials (e.g., cyanoacrylate and epoxy curing agent). Starting with a general question of how to encapsulate cyanoacrylate, Claude first identifies 3 of the main encapsulation techniques interfacial, in situ,
I miss in-person conferences, but have to admit something. I've made more diverse contacts and started more collaborations in 3 months of
#AcademicTwitter
than I'd expect from 2 yrs of a full slate of conferences. Online communication with the right audience is just so scalable.
Amazing to have
@argonne
called out specifically by
@ericschmidt
as one of the places advancing automated labs in his piece on how
#Ai
will transform science.
Our team will be submitting an expansive article Monday to
@digital_rsc
-
@A_Aspuru_Guzik
reporting our team's efforts
Taking notes: do not call
@arxiv
a cancer
I personally submit everything that I can to ArXiv because it means the important information gets to start percolating MONTHS earlier and perpetually for free to other researchers.
Thank you to all those who tirelessly run these
Kevin Jablonka described his latest work on LLMs applied to chemical discovery. He is building software to make using these models ridiculously simple.
@kmjablonka
#llmhack
@EPFL_en
Recording available later
Exciting News for Open Source Enthusiasts in Science! We're launching an innovative online hub to bridge the gap between talented developers and cutting-edge open source science projects.
Are you leading a project and eager to expand your team of contributors? Connect
Proteins are hot! A protein/structure generative model makes generation possible at previously inaccessible scales - from way back at the end of May. 🥵 (4/8)
Super excited about all the possible applications for these new protein diffusion generative models from
@namrata_anand2
! The contrast between diffusion for 3D structures vs images is fascinating. Great to see this progress on the structural side.
It's amazing what a simple web interface widget can do to change user behavior. As such, Google Scholar has figured out that most researchers will do hours of work just to complete a progress bar...
#OpenScience
Elias Moubarak and team developed ClipDigest to summarize videos and fetch structured information from known databases (e.g., information on specific molecules mentioned) and speed learning.
That's it. Now you can track citations and impact for your research group or shared research projects! I hope this can help encourage focus and celebration of team metrics instead of just individual metrics.
Follow me for more tips and tricks like this
#MaterialsDataMonth
Looking to find materials informatics resources? Evgeny Blokhin and co. maintain an "Awesome List" of software, cloud services, dataset repositories, and standards. What are your favorite resources? Post them, and I will add them!
#MaterialsDataMonth
Zhi Hong et al. have created ScholarBERT, the largest and most diverse scientific language model - trained on a 221B token scientific lit. dataset spanning disciplines. Interestingly, performance was similar among all BERT models across benchmark tasks.
My mother called me today super excited ~3 times~. She wanted to tell me that she found a website that can write letters, do some math, and answer almost any question. So yeah, ChatGPT must be getting pretty close to the top of the saturation curve.
Today I learned that Olympic figure skating gold medalist Nathan Chen's sister is a founder of likely CRISPR unicorn bio startup
@mammothbiosci
. It would be so interesting to hear about families like this in detail.
We start off the month with a quick profile of the Center for Hierarchical Materials and Design (CHiMaD). CHiMaD is a
@NIST
-funded flagship Materials Genome Initiative (MGI) research center, strongly embracing MGI AI/ML/data approaches
#MaterialsDataMonth
Just scripted a task that would have taken me 3h manually or 2h to automate previously. Now, took 15 min of scripting by passing the API documentation to ChatGPT and asking it for the script + minor debug time. Balance of time for writing scripts has drastically changed.
Meet BOLLaMA! If you have ever wondered how to optimize a chemical reaction quickly, Bayesian Optimization might be for you!
It's not very accessible tho: BOLLaMA tackles this by introducing an LLM-powered assistant, running
@6ojaHa
's BO in the backend!
#llmhack
Great interview with Rafa Gomez-Bombarelli (
@MIT
), one of my favorite collaborators, discussing how
#MachineLearning
is used for materials discovery. Some light moments discussing ML-driven zeolite discovery, peptide "invention", drug discovery, and more.
@andrewwhite01
and I have something cooking up for Tuesday next week. If you're interested in applications of large language models for materials science and chemistry you're going to love it. More details as soon as possible!
#LLM
@rachel_l_woods
Updated AI/ML Publication Data for 2023!
It feels like we are hearing about new advancements in
#ML
and
#AI
every day now. But, is that translating to publications in the sciences?
2023 Numbers
Materials Science soared with an 18.5% increase in publications over
@andrewwhite01
An underappreciated aspect of LLMs is that we may potentially solve new _classes_ of problems considered hopelessly complex by being able to connect the dots across millions of publications and other sources. The battle begins against aging, cancer, brain-related illnesses...
I've been here on Twitter for 3 years now. It's been so rewarding getting to know many of you and learn about research I never would have found through other venues.
Thank you all!
As part of the Bayesian Optimization Hackathon today, I've started an "Awesome List" of relevant tutorials, software, tools, datasets, and more. If you have favorite resources, reply with them here, or issue a pull request to the repo
Repo:
I'm pumped for the BO hackathon tomorrow! Nearly 400 registrants and 36 confirmed projects and counting. The hackathon website with all the necessary info is at . See you soon!
Batteries are a cornerstone of future clean energy and transportation.
Today
@argonne
released electrochem data from >600 Li-ion pouch cells w/ varying chemistries. This is a treasure trove of data to speed
#ML
/
#AI
applications in energy storage.
@ENERGY
#OpenScience
February 2023
#ML
and
#AI
for Science thread includes:
Machine learned MOF potentials
Open brain MRI dataset
ML for protein design (Hot topic)
New open solar material database
ChemNLP and
#GPT3
modeling
National AI Research Resource Report
+more
#OpenScience
I'm just getting started with
#AcademicTwitter
. If you are interested in topics of
#DataScience
,
#AI4Science
,
#AI
, and data infrastructure especially as applied to discovery in materials science, physics, chemistry, and more, give me a follow!
If you EVER see the line "Data available upon reasonable request" in a manuscript you are working on, please send the lead researcher this way. There are petabytes of storage available for scientists, and your research is too valuable to remain hidden. We've got you.
#OpenScience
It is thus no surprise that
#ML4PS2022
@NeurIPSConf
is hot sauce. We can only expect these breakthroughs to accelerate in the coming months and years. Submission has passed here, but please consider participating! (8/8)
We got an astounding 253 submissions to our
#ML4PS2022
workshop
@NeurIPSConf
!
Dear reviewers, we are counting on you!
Don't forget the review deadline is October 14.
Thank you to the reviewers, organizers, and authors for contributing to this workshop & fostering the community!
Imagine a future where labs combine human ingenuity, AI, HPC, and robotics with reconfigurable modules to realize endless discovery. From education to materials, to biology, we've prototyped 5 use cases. Let's take a look at some concepts! 🧵
@argonne
@argonne_lcf
@ENERGY
In 2023, we should incentivize and reward data products and software equivalently to journal pubs. As we approach an
#AI
-centric research enterprise, software, services,
#openscience
, and data products are key components that will automate, connect, and unify distributed effort.
I finally convinced Logan Ward (
@WardLT2
) to join Twitter! He does amazing work in applied
#machinelearning
for materials/chemistry, and development of
#openscience
software like Matminer, Colmena, DLHub, and many more. Give him a follow!
#AcademicTwitter
@PeakSquirrel
@CT_Bergstrom
@stripe
@patrickc
It's hard to imagine a reasonable or fair process that would allow for shedding 50% of a company the size of Twitter in a week. It's intentionally antagonistic and capricious to show what is coming.