
Matlantis - Million years of research acceleration with universal neural network potential-based SaaS

Matlantis
February 20, 2024


This is the presentation material for the lecture entitled “Matlantis - Million years of research acceleration with universal neural network potential-based SaaS”, presented by Kosuke Nakago and Rudy Coquet at the CECAM workshop “Perspectives and challenges of future HPC installations for atomistic and molecular simulations” on February 21, 2024.

https://www.cecam.org/workshop-details/perspectives-and-challenges-of-future-hpc-installations-for-atomistic-and-molecular-simulations-1227

The contents include the following:
- Introduction to the paper “Towards Universal Neural Network Potential for Material Discovery”
- The impact made by the cloud service Matlantis: Universal High-speed Atomistic Simulator



Transcript

  1. Kosuke Nakago, Rudy Coquet – Preferred Networks, Inc. / Preferred Computational Chemistry, Inc.
     Matlantis - Million years of research acceleration with universal neural network potential-based SaaS
  2. Topic of this talk
     • Introduce the paper “Towards Universal Neural Network Potential for Material Discovery” https://www.nature.com/articles/s41467-022-30687-9
     • Impact made by the cloud service Matlantis – Universal High-speed Atomistic Simulator https://matlantis.com/
  3. Table of Contents
     • NNP introduction
     • Creating a “universal” NNP, PFP
     • Providing it as SaaS: Matlantis – million years of research acceleration
     • Summary
  4. Neural Network Potential (NNP)
     Goal: predict the energy $E$ of a given molecule from its atomic coordinates $\boldsymbol{r}_i = (x_i, y_i, z_i)$ (e.g., the O and two H atoms of water) with a neural network. Forces then follow by differentiating the energy: $\boldsymbol{F}_i = -\partial E / \partial \boldsymbol{r}_i$.
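     A minimal sketch of this idea in PyTorch, assuming a toy stand-in network rather than a real NNP: the forces come from automatic differentiation of the predicted energy.

```python
# Minimal sketch: forces as the negative gradient of a neural-network energy.
# The tiny MLP here is a toy stand-in for a real NNP such as PFP.
import torch

energy_model = torch.nn.Sequential(
    torch.nn.Linear(9, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1))

positions = torch.randn(3, 3, requires_grad=True)  # 3 atoms, xyz coordinates
energy = energy_model(positions.reshape(1, -1)).sum()

forces = -torch.autograd.grad(energy, positions)[0]  # F_i = -dE/dr_i
print(energy.item(), forces.shape)                   # scalar energy, (3, 3) forces
```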
  5. Neural Network Potential (NNP)
     A. Normal supervised learning for materials informatics (MI): predicts a physical property directly → a new NN must be trained each time.
     B. An NNP learns the internal calculation needed for simulation (energy and forces, in place of solving the Schrödinger equation) → a single NNP can yield various physical properties (elastic constants, viscosity, etc.) through simulation!
  6. Neural Network Potential (NNP)
     Same comparison as above, with the typical cost per evaluation added: DFT and similar methods take hours to months; an NNP takes seconds!
  7. NNP can be used for various simulations
     • Reaction path analysis (NEB): C-O dissociation on a Co+V catalyst
     • Molecular dynamics: thiol dynamics on Cu(111)
     • Structure optimization: fentanyl
     Challenge: create a universal NNP, i.e., one applicable to various systems.
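     These workflows are typically driven through ASE. A hedged sketch of a structure optimization, using ASE's built-in EMT potential purely as a runnable stand-in for an NNP-backed calculator such as the one Matlantis provides:

```python
# Sketch of an ASE optimization workflow. EMT is a runnable stand-in here;
# in practice one would attach an NNP-backed ASE calculator instead.
from ase.build import fcc111
from ase.calculators.emt import EMT
from ase.optimize import LBFGS

slab = fcc111("Pt", size=(2, 2, 3), vacuum=6.0)  # small Pt(111) slab
slab.calc = EMT()
LBFGS(slab).run(fmax=0.05)                       # relax until forces < 0.05 eV/A
print(slab.get_potential_energy())
```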
  8. PFP
     • PFP is a GNN which updates scalar, vector, and tensor features internally – the formulation idea comes from a classical potential force field (EAM) https://arxiv.org/pdf/1912.01398.pdf
     • Satisfies physical requirements: rotational, translational, and permutation invariance; infinitely differentiable; etc.
  9. PFP architecture
     • Evaluation of PFP performance
     • Experimental results on the OC20 dataset – note: not a rigorous comparison, since the data are not completely the same https://arxiv.org/pdf/2106.14583.pdf
  10. PFP Dataset
     • To achieve universality, the dataset is collected over various structure types: molecule, bulk, slab, cluster, adsorption (slab + molecule), disordered https://arxiv.org/pdf/2106.14583.pdf
  11. Disordered structures
     • Obtained by running MD at high temperature
     • Force field: a classical potential or the NNP currently in training can be used, forming a loop of dataset collection → NNP training → MD on the trained NNP
     Example structures taken from the TeaNet paper: https://arxiv.org/pdf/2106.14583.pdf
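     A hedged sketch of the high-temperature sampling step, again with EMT standing in for the classical potential or in-training NNP; the snapshots would later be labeled with DFT energies and forces.

```python
# Sketch: collect disordered structures by high-temperature Langevin MD (ASE).
from ase import units
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.md.langevin import Langevin

atoms = bulk("Cu", cubic=True).repeat((3, 3, 3))
atoms.calc = EMT()  # stand-in for a classical potential / in-training NNP

snapshots = []
dyn = Langevin(atoms, timestep=2 * units.fs, temperature_K=2000, friction=0.002)
dyn.attach(lambda: snapshots.append(atoms.copy()), interval=50)
dyn.run(500)        # snapshots would then be labeled by DFT for training
print(len(snapshots))
```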
  12. PFP Dataset
     • Preferred Networks' in-house cluster is extensively utilized: data collection runs on MN-Cluster & ABCI
     • PFP v4.0.0 used 1,650 GPU-years of computing resources
  13. PFP Dataset
     • To achieve universality, the dataset is collected over various structures https://arxiv.org/pdf/2106.14583.pdf
  14. PFP Dataset
     • PFP v4.0 (released in 2023) is applicable to 72 elements: v0.0 supported 45 elements, v4.0 supports 72
  15. Other universal NNP research: M3GNet
     • A famous and widely used universal NNP
     • Open source, publicly available at https://github.com/materialsvirtuallab/matgl
     • Applicable to 89 elements, trained on the Materials Project dataset
     “A universal graph deep learning interatomic potential for the periodic table” https://www.nature.com/articles/s43588-022-00349-3
  16. Other universal NNP research: GNoME
     • Work from DeepMind
     • Focuses on stable-structure search
     • Discovered 380,000 new stable structures
     https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/
     “Scaling deep learning for materials discovery” https://www.nature.com/articles/s41586-023-06735-9
  17. Matbench Discovery benchmark
     • ROC-AUC (area under the ROC curve) for stable-structure prediction
     • PFP shows good performance compared to existing models
     Figure: comparison of universal NNP performances (higher is better)
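     For reference, a small sketch of how such a stability ROC-AUC is computed; the labels and scores below are invented toy values, while the real benchmark scores predicted energy above the convex hull against DFT stability labels.

```python
# Toy ROC-AUC computation for stability classification (invented values).
from sklearn.metrics import roc_auc_score

is_stable = [1, 0, 1, 1, 0, 0]           # ground-truth DFT stability labels
score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]   # model's predicted stability score
print(roc_auc_score(is_stable, score))   # 1.0 would mean perfect ranking
```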
  18. Computation paradigm shift
     • Instead of consuming huge computational resources in each simulation, we can benefit from a pretrained foundation model (the universal NNP): train the foundation model once using huge resources, then use it with far fewer resources per simulation.
  19. A million years' worth of research done in 1 year – value served in 2023 by Matlantis
     • Users: 400+ use our service globally
     • Atoms: 18.2 trillion simulated in 2023 by users
     • Years: 1 million years' worth of simulations if executed with DFT*
     * Based on the fact that a single-point calculation of 256 Pt atoms took 2 hours on a 36-core CPU using Quantum ESPRESSO (ref)
  20. A million years' worth of research done in 1 year – value served in 2023 by Matlantis
     Same figures as above, with the conclusion: NNP inference speed/cost becomes important!
  21. MN-Core: AI accelerator designed by PFN
     • Won No. 1 on the Green500 in 2021
     • PFP inference on MN-Core is in development
     MN-Core workload speedup (*): Pt 125 atoms, x1.93; Pt 1728 atoms, x2.92
     (*) Compared to an NVIDIA GPU; the floating-point formats differ.
  22. Future work
     • Physical properties that NNPs cannot handle: electronic states
     – Predict the Hamiltonian with an NN?
     – Existing works: SchNOrb, DeepH, etc.
     • How to further scale up in time and length? Currently ~10,000 atoms and nanosecond timescales can be handled by an NNP
     – More lightweight potentials are needed, tuned for each specific system
     – Existing works: DeepMD etc.
     – We are developing LightPFP
  23. Summary
     • An NNP calculates energies and forces very fast
     • PFP is a “universal” NNP which can handle various structures/applications
     • Applications: energy and force calculation; structure optimization; reaction pathway analysis and activation energies; molecular dynamics; IR spectra
     • The next computing paradigm? Utilize foundation models (universal NNPs) in the materials science field as well
     https://matlantis.com/product
  24. Links
     • PFP-related papers
     – “Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements” https://www.nature.com/articles/s41467-022-30687-9
     – “Towards universal neural network interatomic potential” https://doi.org/10.1016/j.jmat.2022.12.007
  25. Follow us
     • Twitter: https://twitter.com/matlantis_en
     • GitHub: https://github.com/matlantis-pfcc
     • YouTube: https://www.youtube.com/c/Matlantis
     • Slideshare: https://www.slideshare.net/matlantis
     • Official website: https://matlantis.com/
  26. NNP vs. quantum chemistry simulation
     Pros:
     • MUCH faster than quantum chemistry simulation (e.g., DFT)
     Cons:
     • Difficult to evaluate its accuracy
     • Data collection is necessary
     Figure from https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
  27. NNP tutorial review: neural network intro 1
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     A linear transform followed by a nonlinear transform is applied in each layer, allowing the network to express various functions: $E = f(G_1, G_2, G_3)$
  28. NNP tutorial review: neural network intro 2
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     An NN can learn the correct function form as data increase. When data are scarce, predictions have high variance and cannot be trusted; when data are plentiful, the variance becomes small.
  29. NNP tutorial review: neural network intro 3
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     Careful evaluation is necessary to check whether the NN works well only on the training data.
     Underfit: the NN's representational power is insufficient; it cannot express the true target function.
     Overfit: the NN's representational power is too strong; it fits the training data but does not work well at other points.
  30. NNP input – descriptors
     Instead of raw coordinate values $\boldsymbol{r}_i = (x_i, y_i, z_i)$, we feed a “descriptor” $(G_1, G_2, G_3)$ to the neural network (a multi-layer perceptron, MLP) that predicts $E = f(G_1, G_2, G_3)$.
     What kind of descriptor can be made? E.g., the distance r between two atoms is translationally and rotationally invariant.
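     A quick numerical check of that invariance claim, with toy coordinates; SciPy's rotation utilities are used only for the test.

```python
# Interatomic distances are invariant under rotation and translation,
# unlike the raw coordinates themselves.
import numpy as np
from scipy.spatial.transform import Rotation

coords = np.random.rand(3, 3)  # toy 3-atom geometry
moved = Rotation.random().apply(coords) + np.array([1.0, -2.0, 0.5])

def pair_distances(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

print(np.allclose(pair_distances(coords), pair_distances(moved)))  # True
```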
  31. NNP data collection
     • The goal is to predict the energy of molecules over various coordinates.
     → Calculate energies by DFT with randomly placed atoms? → No good.
     • In reality, a molecule takes only low-energy coordinates, following the Boltzmann distribution $\propto \exp(-E/k_B T)$: low-energy configurations are likely to occur, while high-energy ones (almost) never occur.
     → We want to predict accurately the energies of configurations that occur in the real world.
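     A back-of-envelope illustration of why high-energy configurations hardly matter: Boltzmann weights at room temperature, with illustrative energy values.

```python
# Relative Boltzmann populations at ~300 K for configurations lying
# 0, 0.1, 0.5, and 1.0 eV above the minimum (illustrative values).
import numpy as np

kB_T = 0.0257                                # eV at ~300 K
delta_E = np.array([0.0, 0.1, 0.5, 1.0])     # energy above the minimum, eV
weights = np.exp(-delta_E / kB_T)
print(weights / weights.sum())               # 1 eV up: essentially never visited
```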
  32. ANI-1 dataset creation
     “ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules” https://www.nature.com/articles/sdata2017193
     • A subset of the GDB-11 database (molecules containing up to 11 C, N, O, F atoms) is used – limited to C, N, O – max 8 heavy atoms
     • Normal mode sampling (NMS): various conformations are generated from one molecule by vibration
     • Tools: RDKit MMFF94; Gaussian 09 default method
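     A hedged sketch of the conformer-generation flavor of this step, using the RDKit + MMFF94 tooling the slide names; this is plain conformer embedding, not ANI-1's exact normal mode sampling procedure.

```python
# Sketch: generate and MMFF94-optimize conformers with RDKit (not ANI-1's
# exact normal mode sampling; just the same tooling for illustration).
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))  # ethanol, hydrogens added
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=42)
for cid in conf_ids:
    AllChem.MMFFOptimizeMolecule(mol, confId=cid)  # MMFF94 force field
print(mol.GetNumConformers())
```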
  33. ANI-1: results
     “ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost” https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
     • Energy prediction over various conformations – it reproduces DFT results well compared to DFTB and PM6 (conventional semi-empirical methods)
     • Molecules bigger than those in the training data can also be predicted
     Figure: one-dimensional potential-surface scans
  34. BPNN: Behler-Parrinello symmetry functions
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     AEV: the Atomic Environment Vector describes a specific atom's surrounding environment; Rc is the cutoff radius.
     1. Radial symmetry functions represent the 2-body term (distance): how many atoms exist within the radius Rc of the center atom i.
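     A small sketch of one common radial symmetry function: a Gaussian of the pair distance, damped by the standard cosine cutoff and summed over neighbors. The eta, R_s, and R_c values are illustrative, not the paper's.

```python
# Behler-Parrinello-style radial symmetry function (illustrative parameters):
# G_i = sum_j exp(-eta * (R_ij - R_s)^2) * f_c(R_ij), with cosine cutoff f_c.
import numpy as np

def cutoff(r, r_c):
    return np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def g_radial(distances, eta=1.0, r_s=0.0, r_c=6.0):
    return np.sum(np.exp(-eta * (distances - r_s) ** 2) * cutoff(distances, r_c))

print(g_radial(np.array([1.0, 1.5, 5.9, 7.2])))  # neighbor beyond R_c adds 0
```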
  35. BPNN: Behler-Parrinello symmetry functions
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     AEV: the Atomic Environment Vector describes a specific atom's surrounding environment; Rc is the cutoff radius.
     2. Angular symmetry functions represent the 3-body term (angle): within the radius-Rc ball around the center atom i, in what positional relation (angle) do atoms j and k sit?
  36. BPNN: neural network architecture
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     Problems with a normal MLP:
     – Fixed number of atoms (zero-padding vectors are necessary; it cannot predict systems with more atoms than seen in training)
     – No invariance to atom-order permutation
     Proposed approach:
     – Predict an atomic energy for each atom separately, then sum them to obtain the total energy Es
     – A different NN is trained for each element (O, H)
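     A minimal PyTorch sketch of this per-element, sum-over-atoms design; the descriptor size and layer widths are made up for illustration.

```python
# Behler-Parrinello-style architecture sketch: one MLP per element, applied
# per atom to its descriptor; the total energy is the sum of atomic energies,
# which makes the prediction permutation-invariant by construction.
import torch

def make_mlp():
    return torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(),
                               torch.nn.Linear(32, 1))

element_nets = torch.nn.ModuleDict({"H": make_mlp(), "O": make_mlp()})

def total_energy(symbols, descriptors):  # descriptors: (n_atoms, 8)
    atomic = [element_nets[s](d) for s, d in zip(symbols, descriptors)]
    return torch.stack(atomic).sum()

print(total_energy(["O", "H", "H"], torch.randn(3, 8)))
```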
  37. Behler-Parrinello type: NNP input – descriptors
     Input the atomic coordinates $(x_1, y_1, \ldots, z_3)$ to the network directly? → No good! It does not satisfy basic physical laws:
     – Translational invariance
     – Rotational invariance
     – Atom-order permutation invariance
  38. Graph Neural Network (GNN)
     • A neural network which accepts a “graph” as input; it learns how the data are connected
     • Graph: consists of vertices v and edges e
     – Social networks (SNS connection graphs), citation networks, product networks
     – Protein-protein association networks
     – Organic molecules, etc.
     Various applications!
  39. Graph Neural Network (GNN)
     • Image convolution → graph convolution
     • Also called graph convolutional networks or message-passing neural networks
     CNN (image convolution): image classification – cat, dog, ...
     GNN (graph convolution): physical properties – energy = 1.2 eV, ...
  40. GNN architecture
     • Similar to a CNN, graph convolution layers are stacked to create a deep neural network: input as a “graph” → features are updated in graph form through the stacked Graph Conv layers → a predicted value is output for each atom (e.g., atomic energy) → a sum yields the prediction for the whole molecule (e.g., total energy)
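     A minimal, illustrative message-passing sketch (not PFP's architecture): two stacked layers that mix each node's feature with the sum of its neighbors, followed by a sum readout.

```python
# Toy graph convolution: update each node with its neighbors' summed features,
# stack two layers, then sum-read-out a molecule-level vector.
import torch

class GraphConv(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = torch.nn.Linear(2 * dim, dim)

    def forward(self, h, edges):            # edges: directed (i, j) pairs
        msg = torch.zeros_like(h)
        for i, j in edges:                  # aggregate neighbor messages
            msg[i] = msg[i] + h[j]
        return torch.relu(self.lin(torch.cat([h, msg], dim=-1)))

h = torch.randn(3, 16)                      # 3 atoms, toy 16-dim features
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]    # O-H bonds of water, both ways
for layer in (GraphConv(16), GraphConv(16)):
    h = layer(h, edges)
prediction = h.sum(dim=0)                   # sum readout over atoms
print(prediction.shape)
```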
  41. Feature for each node (atom)
     A feature vector is assigned to each node, e.g.:
     – C: atom type one-hot (1.0, 0.0, 0.0), atomic number 6.0, chirality 1.0
     – N: atom type one-hot (0.0, 1.0, 0.0), atomic number 7.0, chirality 1.0
     – O: atom type one-hot (0.0, 0.0, 1.0), atomic number 8.0, chirality 1.0
     “Molecular Graph Convolutions: Moving Beyond Fingerprints”, Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley, arXiv:1603.00856
  42. GNNs for molecules and crystals
     • Applicable to molecules → various GNN architectures have been proposed since the late 2010s, drawing big attention to deep learning research on molecules
     – NFP, GGNN, MPNN, GWM, etc.
     • Then applied to positional data and crystal data (with periodic boundary conditions)
     – SchNet, CGCNN, MEGNet, Cormorant, DimeNet, PhysNet, EGNN, TeaNet, etc.
     NFP: “Convolutional Networks on Graphs for Learning Molecular Fingerprints” https://arxiv.org/abs/1509.09292
     GWM: “Graph Warp Module: an Auxiliary Module for Boosting the Power of Graph Neural Networks in Molecular Graph Analysis” https://arxiv.org/pdf/1902.01020.pdf
     CGCNN: “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301
  43. SchNet
     • For each atom pair's distance r, a continuous-filter convolution (cfconv) is applied, so the network can deal with atomic positions; the filters are generated from an RBF (radial basis function) expansion of the distance
     “SchNet: A continuous-filter convolutional neural network for modeling quantum interactions” https://arxiv.org/abs/1706.08566
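     A sketch of the radial-basis expansion such filter-generating networks start from; the number of basis functions, cutoff, and width below are illustrative.

```python
# RBF expansion of a pair distance: one scalar r becomes a smooth feature
# vector of Gaussians centered on a grid (illustrative hyperparameters).
import numpy as np

def rbf_expand(r, n_rbf=32, r_cut=6.0, gamma=10.0):
    centers = np.linspace(0.0, r_cut, n_rbf)
    return np.exp(-gamma * (r - centers) ** 2)

print(rbf_expand(1.2).shape)  # (32,) features for a single distance
```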
  44. GNN application with periodic boundary conditions (pbc)
     • CGCNN proposes how to construct the “graph” for systems with pbc
     • MEGNet reports applying one framework to both isolated systems (molecules) and pbc systems (crystals)
     CGCNN: “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301
     MEGNet: “Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals” https://pubs.acs.org/doi/10.1021/acs.chemmater.9b01294
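     In practice, the pbc-aware edge list can be built with a cutoff-based neighbor search that includes periodic images; a brief sketch with ASE's neighbor list (the cutoff value is illustrative):

```python
# Graph construction under periodic boundary conditions: edges are atom pairs
# within a cutoff, counting periodic images (illustrative 3 A cutoff).
from ase.build import bulk
from ase.neighborlist import neighbor_list

atoms = bulk("Cu", cubic=True)                 # periodic fcc conventional cell
i, j, d = neighbor_list("ijd", atoms, cutoff=3.0)
print(len(i), "directed edges; shortest bond:", d.min())
```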
  45. GNN approach: summary
     With these neural network architecture improvements, we gain the following advantages:
     • No hand-tuned descriptor is necessary – it is learned automatically inside the GNN
     • Generalization over element species – the input dimension does not increase when we add atomic species → avoids combinatorial explosion; generalization to elements with few (or even no) data
     • Accuracy and training efficiency – increased representational power enables potentially higher accuracy; appropriate constraints (inductive biases) make NN training easier
  46. Deep learning trends
     • In 2012, AlexNet won ILSVRC (efficiently using GPUs)
     • With the progress of GPU power, NNs became deeper and bigger:
     – 2012: AlexNet, 8 layers, 62.0M parameters
     – 2014: GoogLeNet, 22 layers, 6.4M parameters
     – 2015: ResNet, 110 layers (max 1202!), 60.3M parameters
     GoogLeNet “Going deeper with convolutions”: https://arxiv.org/pdf/1409.4842.pdf
     ResNet “Deep Residual Learning for Image Recognition”: https://arxiv.org/pdf/1512.03385.pdf
     https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96
  47. Deep learning trends
     • Dataset sizes in the computer vision area grow exponentially – one human could not view this amount in a lifetime → models start to learn a kind of collective intelligence
     • The “pre-training → fine-tuning for a specific task” workflow became the trend
     – MNIST: 60k images, 10 classes
     – CIFAR-100: 60k, 100 classes
     – ImageNet: 1.3M, 1,000 classes
     – ImageNet-21k: 14M, 21,000 classes
     – JFT-300M: 300M (Google, not open), 18,000 classes
  48. A “universal” neural network potential?
     This history of deep learning technology leads to one challenging idea:
     – NNP formulation & proof of conformational generalization: the ANI family of research
     – Support for various elements: GNN node embeddings
     – Dealing with crystals (with pbc): graph construction for periodic systems
     – Big-data training: the success of the DL trend in the CV/NLP fields
     → Universal NNP R&D started! Goal: support various elements, isolated/periodic systems, and various conformations – all use cases.
  49. ANI-1 & the ANI-1 dataset: summary
     “ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost” https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
     • For small molecules consisting of H, C, N, O in various conformations, an NNP can be created that predicts DFT energies well – via massive training data creation: 20 million datapoints
     Issues:
     • Adding another element (F, S, etc.) – a different NN is necessary for each element – the input descriptor dimension increases as O(N^2) in the number of element species, and the necessary training data may scale at this order too
  50. GNN architecture (general)
     • Similar to a CNN, graph convolution layers are stacked to create a deep neural network: input as a “graph” → features are updated in graph form → a graph readout converts the graph to a vector → linear layers update the vector → the prediction is output
  51. Graph readout: feature calculation for the whole graph (molecule)
     Collect the computed node features to obtain a graph-wise feature.
     Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, & Vijay Pande (2017). Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci., 3 (4)
  52. PFP architecture
     • PFP performance evaluation on the PFP benchmark dataset – confirmed that TeaNet (the PFP base model) achieves the best performance https://arxiv.org/pdf/2106.14583.pdf
  53. PFP
     • A “universal” neural network potential developed by Preferred Networks and ENEOS
     • Stands for “PreFerred Potential”
     – Packaged with various physical-property calculation libraries as the SaaS product Matlantis
     – Sold by Preferred Computational Chemistry (PFCC)
  54. PFP
     • Several improvements based on TeaNet, through more than 2 years of research (details in the paper)
     • The GNN edge cutoff is 6 Å – 5 layers with different cutoff lengths [3, 3, 4, 6, 6] Å → in total a 22 Å range can be connected, since the receptive field is the sum of the per-layer cutoffs – the GNN part can be computed in O(N)
     • The energy surface is designed to be smooth (infinitely differentiable)
  55. PFP Dataset
     • PFP is jointly trained with the 3 datasets below; calculation conditions for the MOLECULE and CRYSTAL datasets:
     – PFP MOLECULE: Gaussian; ωB97X-D/6-31G(d); unrestricted DFT
     – PFP CRYSTAL, PFP CRYSTAL_U0: VASP; GGA-PBE; PAW pseudopotentials; 520 eV cutoff energy; U parameter ON/OFF; spin polarization ON
     – OC20: VASP; GGA-RPBE; PAW pseudopotentials; 350 eV cutoff energy; U parameter OFF; spin polarization OFF
  56. TeaNet
     • Physical meaning of using a “tensor” feature: the tensor is related to the classical force field known as the Tersoff potential https://arxiv.org/pdf/1912.01398.pdf
  57. Use Case 1: catalysts for renewable-energy synthetic fuel
     • Search for an effective FT (Fischer-Tropsch) catalyst that accelerates C-O dissociation (the reaction producing fuel, C5+, from H2 and CO)
     • High-throughput screening of promoters → revealed that doping Co with V accelerates the dissociation process
     Figures: C-O dissociation on the Co+V catalyst; effect of promoters on activation energy; activation energies of methanation reactions of synthesis gas on Co(0001); comparison of activation energies
  58. Use Case 2: grain-boundary energy of elemental metals
     Example: Al Σ5 [100](0-21), 38 atoms
     H. Zheng et al., Acta Materialia, 186, 40 (2020); https://materialsvirtuallab.org/2020/01/grain-boundary-database/
  59. Use Case 3: Li-ion batteries
     • Li diffusion activation-energy calculation on LiFeSO4F, along each of the a, b, c directions
     – Consists of various elements
     – Good agreement with DFT results
     Figure: diffusion paths along the [111], [101], [100] directions
  60. Use Case 4: metal-organic frameworks
     • Water-molecule binding energy on the metal-organic framework MOF-74
     – A metal element combined with organic molecules
     – The result matches existing work that used Grimme's D3 correction
  61. Application: nanoparticles
     • “Calculations of Real-System Nanoparticles Using Universal Neural Network Potential PFP” https://arxiv.org/abs/2107.00963
     • PFP can even calculate high-entropy alloys (HEA), which contain various metals
     • Large sizes are difficult to calculate with DFT; multiple elements are difficult to support with classical potentials
  62. Open Catalyst 2020
     • Motivation: new catalyst development for renewable energy storage
     • Overview paper: https://arxiv.org/pdf/2010.09435.pdf
     – Storing solar and wind power is crucial to overcoming global warming
     – Why don't hydroelectric storage or batteries suffice? Their energy storage does not scale
  63. Open Catalyst 2020
     • Motivation: new catalyst development for renewable energy storage
     • Overview paper: https://arxiv.org/pdf/2010.09435.pdf
     – Solar and wind energy can be stored in the form of hydrogen or methane
     – Improving hydrogen and methane reaction processes is the key to renewable energy storage
  64. Open Catalyst 2020
     • Catalyst: a substance that promotes a specific reaction without itself being changed
     • Dataset paper: technical details of the dataset collection https://arxiv.org/pdf/2010.09435.pdf https://opencatalystproject.org/
     Figure: the bottom (pink) atoms are the metal surface = catalyst; the molecule on top = reactants
  65. Open Catalyst 2020
     • Combinations of various molecules on various metals
     • It covers the main reactions related to renewable energy
     • Data size: 130M!
     https://arxiv.org/pdf/2010.09435.pdf
  66. Open Catalyst 2022
     • The subsequent work focuses on oxygen evolution reaction (OER) catalysts
     • 9.8M-datapoint dataset
     https://arxiv.org/abs/2206.08917
  67. Matlantis – providing the universal NNP as SaaS
     • Matlantis is provided as a paid service to maximize the benefit of the universal NNP worldwide
     – User support & success
     – Regular improvements/updates
     – Library maintenance
     – Feedback & improvement loop
  68. Foundation models, generative AI, LLMs
     • Stable Diffusion, ChatGPT, ...
     • Foundation models are used in various tasks (app 1, app 2, app 3, etc.)
     • The model provider alone cannot extract the full potential of a foundation model – users explore and find “new value”
  69. PFP as a foundation model for atomistic simulations
     • We don't yet know the full capability of PFP, the universal NNP
     – Various knowledge can be obtained by utilizing the model (structural relaxation, reaction analysis, molecular dynamics, etc.)
     – We hope someone wins a Nobel Prize for new materials discovered using the PFP system