History towards Universal Neural Network Potential for Material Discovery

Kosuke Nakago, Taku Watanabe (Preferred Computational Chemistry, Inc.)

Motivation
• To accelerate materials discovery for a sustainable future.
• Use atomistic simulation for materials discovery.
Figure sources:
https://pubs.rsc.org/en/content/articlehtml/2019/ee/c8ee02495b
https://matlantis.com/calculation/li-diffusion-in-li10gep2s12-sulfide-solid-electrolyte
https://matlantis.com/calculation/silicon-tma-tel

Today’s Topic
• “Towards Universal Neural Network Potential for Material Discovery” https://www.nature.com/articles/s41467-022-30687-9
• Providing SaaS: “Matlantis” https://matlantis.com/
  – Universal High-speed Atomistic Simulator

Today’s Topic: Universal Atomistic Simulator accelerates Material Discovery
• Reaction path analysis (NEB): C-O dissociation on Co+V catalyst
• Molecular Dynamics: thiol dynamics on Cu(111)
• Opt: fentanyl structure optimization

Today’s Topic: Understand “Towards Universal Neural Network Potential for Material Discovery”
• 1st part introduces NNP research history
• 2nd part explains how to create a universal NNP

Table of Contents
• 1st part: NNP history
  – What’s NNP
  – Behler-Parrinello type MLP
  – Graph Neural Network
• 2nd part: How to create “Universal” NNP, PFP
  – PFP
    • PFP architecture
    • PFP data collection
  – PFP case study (in other slides)

1st part: NNP history

Neural Network Potential (NNP)
Goal: Predict the energy E of a given molecule from its atomic coordinates with a Neural Network
→ The NN is differentiable, so forces can be calculated by differentiating the energy: 𝑭𝑖 = −𝜕E/𝜕𝒓𝑖
Figure: H2O molecule (O, H, H) with coordinates 𝒓0 = (𝑥0, 𝑦0, 𝑧0), 𝒓1 = (𝑥1, 𝑦1, 𝑧1), 𝒓2 = (𝑥2, 𝑦2, 𝑧2) → Neural Network → E
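The force relation above can be sketched in a few lines. This is a minimal illustration, not the method used by any particular NNP: `toy_energy` is a hypothetical stand-in for a trained network (a sum of harmonic pair terms), and the derivative is taken by central finite differences, whereas a real NNP would normally use automatic differentiation.

```python
import math

def toy_energy(coords, r0=1.0, k=1.0):
    """Hypothetical stand-in for a neural network: harmonic pair energy."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            energy += 0.5 * k * (r - r0) ** 2
    return energy

def forces(energy_fn, coords, h=1e-5):
    """F_i = -dE/dr_i, approximated by central finite differences."""
    result = []
    for i in range(len(coords)):
        f_i = []
        for axis in range(3):
            plus = [list(c) for c in coords]
            minus = [list(c) for c in coords]
            plus[i][axis] += h
            minus[i][axis] -= h
            f_i.append(-(energy_fn(plus) - energy_fn(minus)) / (2 * h))
        result.append(f_i)
    return result

# Illustrative H2O-like geometry (O, H, H); the numbers are made up.
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
F = forces(toy_energy, water)
```

Because the toy energy depends only on interatomic distances, the forces on all atoms sum to zero, which is a quick sanity check for any energy-derived force field.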

Neural Network Potential (NNP)
A. Normal supervised learning: predicts a physical property directly
B. NNP learns the internal calculation necessary for simulation
→ Once an NNP is trained, it can be used to calculate various physical properties. A separate database for each physical property is unnecessary.
Figure: H2O coordinates (𝒓0, 𝒓1, 𝒓2) → Schrödinger Eq. → energy, forces → simulation → physical property (elastic constants, viscosity, etc.), with arrows A and B marking the two learning targets above.

NNP vs Quantum Chemistry Simulation
Pros:
• Fast: MUCH faster than quantum chemistry simulation (e.g., DFT)
Cons:
• Difficult to evaluate its accuracy
• Data collection is necessary
  – A quantum chemistry simulation dataset is necessary for training an NNP
  – Accuracy must be evaluated whenever the inference data differs from the training data
Figure from https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract

Behler-Parrinello type: NNP Input - Descriptor
Input the raw atomic coordinates, E = 𝑓(𝑥0, 𝑦0, …, 𝑧2)? → No good! It does not satisfy basic physical laws:
・Translational invariance
・Rotational invariance
・Atom order permutation invariance
Figure: H2O molecule (O, H, H) with coordinates 𝒓0 = (𝑥0, 𝑦0, 𝑧0), 𝒓1 = (𝑥1, 𝑦1, 𝑧1), 𝒓2 = (𝑥2, 𝑦2, 𝑧2) → Neural Network → E

NNP Input - Descriptor
Instead of raw coordinate values, we input a “Descriptor” to the Neural Network: E = 𝑓(𝑮0, 𝑮1, 𝑮2), where 𝑮0, 𝑮1, 𝑮2 are the descriptors and 𝑓 is a Multi Layer Perceptron (MLP).
What kind of Descriptor can be made?
Ex. The distance r between 2 atoms is translationally and rotationally invariant.
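The invariance argument can be checked numerically. The sketch below uses the sorted list of pair distances as a deliberately simple, hypothetical descriptor (real descriptors also track element types); the helper names and the geometry are illustrative assumptions.

```python
import math

def pair_distances(coords):
    """Sorted interatomic distances: a toy invariant descriptor."""
    n = len(coords)
    return sorted(
        math.dist(coords[i], coords[j])
        for i in range(n) for j in range(i + 1, n)
    )

def rotate_z(p, theta):
    """Rotate a point around the z axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def translate(p, t):
    """Shift a point by the vector t."""
    return tuple(a + b for a, b in zip(p, t))

water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
# Rotate, translate, and reverse the atom order.
moved = [translate(rotate_z(p, 0.7), (1.0, -2.0, 3.0)) for p in reversed(water)]

same_descriptor = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pair_distances(water), pair_distances(moved))
)
```

The raw coordinates of `water` and `moved` differ completely, yet `same_descriptor` is true, which is the property a network fed raw coordinates would have to learn from data instead of getting for free.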

NNP data collection
• The goal is to predict the energy of molecules with various coordinates
  → Calculate energy by DFT for randomly placed atoms? → No good
• In reality, a molecule takes only low-energy coordinates, following the Boltzmann distribution exp(−𝐸/𝑘𝐵𝑇)
  → We want to predict accurately the energy of structures that occur in the real world.
Figure: H2O conformations. Low energy: likely to occur. High energy: (almost) never occurs.
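To see how strongly the Boltzmann factor suppresses high-energy structures, here is a small sketch. The three energies and the temperature are made-up illustrative values; only the Boltzmann constant is a physical constant.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def boltzmann_weights(energies_ev, temperature_k):
    """Normalized exp(-E / kB T) weights, shifted by the minimum for stability."""
    e_min = min(energies_ev)
    raw = [math.exp(-(e - e_min) / (K_B_EV * temperature_k)) for e in energies_ev]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical conformations 0.0, 0.1 and 1.0 eV above the minimum, at 300 K.
weights = boltzmann_weights([0.0, 0.1, 1.0], 300.0)
```

At room temperature kB T is only about 0.026 eV, so a structure 1 eV above the minimum has a vanishing weight. This is why sampling near low-energy structures is a better use of DFT budget than placing atoms at random.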

ANI-1 Dataset creation
“ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules” https://www.nature.com/articles/sdata2017193
• A subset of the GDB-11 database (molecules containing up to 11 C, N, O, F atoms) is used
  – Limited to C, N, O
  – Max 8 heavy atoms
• Normal Mode Sampling (NMS): various conformations are generated from one molecule by vibration.
Figure labels: rdkit, MMFF94, Gaussian09, default method

ANI-1: Results
“ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost” https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
• Energy prediction on various conformations
  – It reproduces DFT results well compared to DFTB and PM6 (conventional methods)
• Molecules bigger than those in the training data can be predicted
Figure: one-dimensional potential surface scan

Graph Neural Network (GNN)
• A neural network that accepts “graph” input and learns how the data is connected
• Graph: consists of vertices v and edges e
  – Social Network (SNS connection graph), Citation Network, Product Network
  – Protein-Protein Association Network
  – Organic molecules etc… Various applications!
Figure: example graph with vertices 𝒗𝟎, 𝒗𝟏, 𝒗𝟐, 𝒗𝟑, 𝒗𝟒 and edges 𝑒01, 𝑒12, 𝑒23, 𝑒24, 𝑒34

Graph Neural Network (GNN)
• Image convolution → Graph convolution
• Also called Graph Convolution Network, Message Passing Neural Network
Figure: CNN (image convolution) for image classification (cat, dog…) vs. GNN (graph convolution) for physical property prediction (Energy = 1.2 eV…)

GNN architecture
• Similar to CNN, Graph Convolution layers are stacked to create a Deep Neural Network
Figure: input as “Graph” → Graph Conv × 4 (features are updated in the graph format) → output predicted value for each atom (e.g., energy) → Sum → output prediction for the whole molecule (e.g., energy)
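The stacked architecture can be sketched with a toy message-passing layer. Everything here is an illustrative assumption: the update rule (tanh of a weighted self term plus summed neighbor features), the fixed weights, and the per-atom output are placeholders for learned layers, not any published model.

```python
import math

def graph_conv(features, edges, w_self=0.5, w_nbr=0.25):
    """One toy graph convolution: each node mixes its own and its neighbors' features."""
    n, dim = len(features), len(features[0])
    agg = [[0.0] * dim for _ in range(n)]
    for i, j in edges:  # undirected edge: messages flow both ways
        for d in range(dim):
            agg[i][d] += features[j][d]
            agg[j][d] += features[i][d]
    return [
        [math.tanh(w_self * features[i][d] + w_nbr * agg[i][d]) for d in range(dim)]
        for i in range(n)
    ]

def predict_energy(features, edges, n_layers=4):
    """Stack graph convolutions, read out one value per atom, then sum."""
    h = features
    for _ in range(n_layers):
        h = graph_conv(h, edges)
    atom_energies = [sum(row) for row in h]  # placeholder per-atom output head
    return sum(atom_energies), atom_energies

# H2O as a graph: node 0 = O, nodes 1 and 2 = H, with two O-H edges.
total_a, per_atom_a = predict_energy([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], [(0, 1), (0, 2)])
# Same molecule with the atoms listed in a different order.
total_b, per_atom_b = predict_energy([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [(1, 0), (1, 2)])
```

Since every step either acts per node or sums over neighbors, relabeling the atoms leaves the total unchanged (`total_a` equals `total_b` up to rounding), and the same code handles molecules of any size.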

Feature for each node (atom)
A feature is assigned to each node of the molecular graph:
Atom | atom type (one-hot: C, N, O) | Atomic number | chirality
C | 1.0 0.0 0.0 | 6.0 | 1.0
N | 0.0 1.0 0.0 | 7.0 | 1.0
O | 0.0 0.0 1.0 | 8.0 | 1.0
“Molecular Graph Convolutions: Moving Beyond Fingerprints”, Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley, arXiv:1603.00856
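The node feature rows on this slide can be reproduced with a short helper. This is a sketch under the assumption that each row is a one-hot atom type over (C, N, O), followed by the atomic number and a chirality value; `node_feature` and its default chirality of 1.0 are hypothetical names and choices made to match the numbers shown.

```python
ATOM_TYPES = ["C", "N", "O"]
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8}

def node_feature(symbol, chirality=1.0):
    """One-hot atom type + atomic number + chirality value for one atom."""
    one_hot = [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]
    return one_hot + [float(ATOMIC_NUMBER[symbol]), chirality]

features = [node_feature(s) for s in ["C", "N", "O"]]
```

Hand-built features like these are the input to the first graph convolution; later GNNs often replace them with a learned embedding per element.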

GNN for molecules, crystals
• Applicable to molecules → Various GNN architectures have been proposed since the late 2010s, drawing big attention to Deep Learning research for molecules.
  – NFP, GGNN, MPNN, GWM etc…
• Then applied to positional data and crystal data (with periodic boundary conditions)
  – SchNet, CGCNN, MEGNet, Cormorant, DimeNet, PhysNet, EGNN, TeaNet etc…
NFP: “Convolutional Networks on Graph for Learning Molecular Fingerprints” https://arxiv.org/abs/1509.09292
GWM: “Graph Warp Module: an Auxiliary Module for Boosting the Power of Graph Neural Networks in Molecular Graph Analysis” https://arxiv.org/pdf/1902.01020.pdf
CGCNN: “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301

SchNet
• Applies continuous-filter convolution (cfconv) to the atom pair distance r, so it can deal with atomic positions
Figure label: RBF kernel
“SchNet: A continuous-filter convolutional neural network for modeling quantum interactions” https://arxiv.org/abs/1706.08566
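The “RBF kernel” label refers to expanding a scalar distance into a smooth feature vector before it enters the filter network. A minimal sketch of a Gaussian radial basis expansion follows; the number of centers, their spacing, and `gamma` are illustrative assumptions, not the paper's settings.

```python
import math

def rbf_expand(r, r_max=5.0, n_centers=6, gamma=10.0):
    """Expand a distance r into Gaussian features exp(-gamma * (r - mu_k)^2)."""
    centers = [k * r_max / (n_centers - 1) for k in range(n_centers)]
    return [math.exp(-gamma * (r - mu) ** 2) for mu in centers]

# A 1.0 (arbitrary unit) distance lights up mostly the center located at 1.0.
expanded = rbf_expand(1.0)
```

The expansion turns one number into a vector that varies smoothly with r, so the learned filter, and therefore the energy, changes continuously as atoms move.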

GNN application with periodic boundary condition (pbc)
• CGCNN proposes how to construct a “graph” for systems with pbc.
• MEGNet reports application to both isolated systems (molecules) and pbc systems (crystals)
CGCNN: “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301
MEGNet: “Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals” https://pubs.acs.org/doi/10.1021/acs.chemmater.9b01294
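One way to picture graph construction under pbc is a cutoff-based neighbor search that also visits periodic images. The sketch below is an assumption-laden illustration, not the CGCNN procedure itself: it assumes an orthorhombic cell and a cutoff no larger than the shortest cell length, so only the 26 adjacent images need checking.

```python
import itertools
import math

def periodic_edges(positions, cell_lengths, cutoff):
    """Return (i, j, image_shift, distance) for all pairs within the cutoff,
    including pairs that connect through a periodic image."""
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    edges = []
    for i, p_i in enumerate(positions):
        for j, p_j in enumerate(positions):
            for s in shifts:
                if i == j and s == (0, 0, 0):
                    continue  # skip the atom itself in the home cell
                image = [p_j[a] + s[a] * cell_lengths[a] for a in range(3)]
                d = math.dist(p_i, image)
                if d <= cutoff:
                    edges.append((i, j, s, d))
    return edges

# Simple cubic lattice, one atom per cell, lattice constant 3.0 (arbitrary units).
edges = periodic_edges([(0.0, 0.0, 0.0)], (3.0, 3.0, 3.0), cutoff=3.1)
```

Here the single atom gets six edges, one to each of its own nearest images, which is how a crystal with one atom per cell still yields a non-trivial graph.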

GNN approach: Summary
With the Neural Network architecture improvements, we gain the following advantages:
• Human-tuned descriptors are not necessary
  – They are learned automatically inside the GNN
• Generalization to element species
  – The input dimension does not increase even if we add atomic species → avoids combinatorial explosion
  – Generalization to elements with little (or even no) data
• Accuracy, training efficiency
  – Increased network representation power, possibly higher accuracy
  – Appropriate constraints (inductive bias) make NN training easier

Deep learning ~ trending ~
• 2012: AlexNet won ILSVRC (efficiently used GPU)
• With the progress of GPU power, NNs become deeper and bigger
Year | CNN | Depth | # of Parameters
2012 | AlexNet | 8 layers | 62.0M
2014 | GoogLeNet | 22 layers | 6.4M
2015 | ResNet | 110 layers (Max 1202!) | 60.3M
GoogLeNet “Going deeper with convolutions”: https://arxiv.org/pdf/1409.4842.pdf
ResNet “Deep Residual Learning for Image Recognition”: https://arxiv.org/pdf/1512.03385.pdf
Table source: https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96

Deep learning ~ trending ~
• Dataset size in the computer vision area
  – Grows exponentially; one human cannot view this amount in a lifetime → Starts to learn collective intelligence…
  – The “Pre-training → Fine-tuning for a specific task” workflow becomes the trend
Dataset | Data size | # of classes
MNIST | 60k | 10
CIFAR-100 | 60k | 100
ImageNet | 1.3M | 1,000
ImageNet-21k | 14M | 21,000
JFT-300M | 300M (Google, not open) | 18,000

“Universal” Neural Network Potential?
• This history of deep learning technology leads to one challenging idea…
NNP formulation, proof of conformation generalization
↓ ANI family research
Support for various elements
↓ GNN node embedding
Dealing with crystals (with pbc)
↓ Graph construction for pbc systems
Big data training
↓ Success in the CV/NLP fields, DL trend
→ Universal NNP R&D started!!
Goal: to support various elements, isolated/pbc systems, and various conformations. All use cases.

2nd part: How to create “Universal” NNP, PFP

PFP
• “Universal” Neural Network Potential developed by Preferred Networks and ENEOS
• Stands for “PreFerred Potential”
• Matlantis
  – SaaS product which packages PFP and various physical property calculation libraries
  – Sold by Preferred Computational Chemistry (PFCC)

PFP
• Architecture
• Dataset

TeaNet
• PFP is developed based on the TeaNet work
• TeaNet is a GNN which updates scalar, vector, and tensor features internally
  – The formulation idea comes from the classical force field (EAM)
https://arxiv.org/pdf/1912.01398.pdf

TeaNet
• Physical meaning of using a “tensor” feature: the tensor is related to the classical force field called the Tersoff potential
Figure: Tersoff potential
https://arxiv.org/pdf/1912.01398.pdf

PFP
• Several improvements on TeaNet, through more than 2 years of research (details in the paper)
• GNN edge cutoff is at most 6 Å
  – 5 layers with different cutoff lengths [3, 3, 4, 6, 6] Å
  – → In total, atoms within a 22 Å range can be connected
  – GNN part can be calculated in O(N)
• Energy surface is designed to be smooth (infinitely differentiable)
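The 22 Å figure follows from adding up the per-layer cutoffs: each layer can pass information one cutoff length further, so the reach after all layers is their sum. A short worked check (variable names are illustrative):

```python
# Per-layer edge cutoffs in angstroms, as listed on the slide.
layer_cutoffs = [3.0, 3.0, 4.0, 6.0, 6.0]

# Information can travel one cutoff per layer, so the total reach is the sum.
receptive_field = sum(layer_cutoffs)
largest_cutoff = max(layer_cutoffs)
```

Because every cutoff is a fixed length, the number of neighbors per atom stays bounded at a given density, which is consistent with the O(N) cost stated above.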

PFP architecture
• Evaluation of PFP performance
• Experiment results: OC20 dataset
  – ※ Not a rigorous comparison, since the data is not completely the same
https://arxiv.org/pdf/2106.14583.pdf

PFP Dataset
• To achieve universality, the dataset is collected from various structures
  – Molecule
  – Bulk
  – Slab
  – Cluster
  – Adsorption (Slab+Molecule)
  – Disordered
https://arxiv.org/pdf/2106.14583.pdf

TeaNet: Disordered structure
• Dataset: disordered structures under periodic boundary conditions
• Generated using classical MD or MD run with an NNP that is still in the training phase
Figure: iterative loop of training the NNP, running MD on the trained NNP, and dataset collection. Example structures are taken from the TeaNet paper.
https://arxiv.org/pdf/2106.14583.pdf

PFP Dataset
• PFN’s in-house cluster is extensively utilized
  – Data collection with MN-Cluster & ABCI
  – PFP v4.0.0 used 1650 GPU-years of computing resources

PFP Dataset
• To achieve universality, the dataset is collected from various structures
https://arxiv.org/pdf/2106.14583.pdf

PFP Dataset
• The latest PFP v4.0 (released in 2023) is applicable to 72 elements
• v0.0 supported 45 elements

Summary
• An NNP can calculate energy much faster than quantum calculation
• Quality of data is important for a good model
  – Data versatility
  – Quantum calculation quality/accuracy
• PFP is a “universal” NNP which can handle various structures/applications
• Applications
  – Energy, force calculation
  – Structure optimization
  – Reaction pathway analysis, activation energy
  – Molecular Dynamics
  – IR spectrum
https://matlantis.com/product

Applications

Use Case 1: Renewable energy synthetic fuel catalyst
• Search for an effective FT catalyst that accelerates C-O dissociation
• High-throughput screening of promoters → Revealed that doping V into Co accelerates the dissociation process
Figure captions: C-O dissociation on Co+V catalyst; reaction of fuel (C5+) from H2, CO; effect of promoters on activation energy; activation energies of methanation reactions of synthesis gas on Co(0001); comparison of activation energy

Use Case 2: Grain boundary energy of elemental metals
Figure: Al Σ5 [100](0-21), 38 atoms
H. Zheng et al., Acta Materialia, 186, 40 (2020) https://materialsvirtuallab.org/2020/01/grain-boundary-database/

Use Case 3: Li-ion battery
• Li diffusion activation energy calculation on LiFeSO4F, for each of the a, b, c directions
  – Consists of various elements
  – Good agreement with DFT results
Figure: diffusion paths for the [111], [101], [100] directions

Use Case 4: Metal-organic frameworks
• Water molecule binding energy on the metal-organic framework MOF-74
  – Metal elements combined with organic molecules
  – Results match existing work when Grimme’s D3 correction is applied

Demonstration

Foundation Models, Generative AI, LLM
• Many foundation models are surprising the world: Stable Diffusion, ChatGPT…
• A model provider cannot extract all the potential of the foundation model
  – We want users to explore & find “new value”
Figure: Foundation Model → Application 1, Application 2, Application 3, … and more!?

Matlantis
• The model provider does not know the full capability of PFP, the universal NNP
  – Various knowledge can be obtained by utilizing the model
  – We hope someone wins a Nobel Prize for new materials discovery by utilizing the PFP system
Figure: PFP → Structural Relaxation, Reaction Analysis, Molecular Dynamics, … and more!!

MRS 2023 Fall Exhibition

MRS 2023 Fall Meeting
• We are presenting 6 oral talks & posters at MRS 2023 Fall
• Symposium: [DS06] Integrating Machine Learning with Simulations for Accelerated Materials Modeling
Date | Title | Presenter | Presentation
Nov 27 PM | Applicability of Universal Neural Network Potential to Organic Polymer Materials | Hiroki Iriguchi | Poster
Nov 28 AM | Investigation of Phase Stability and Ionic Conductivity of Solid Electrolytes Li10MP2S12-xOx (M = Ge, Si, or Sn) with Universal Neural Network Potential | Chikashi Shinagawa | Oral
Nov 28 PM | Neural Network Potential for Arbitrary Combination of 72 Elements Trained Against Large Scale Dataset | So Takamoto | Oral
Nov 28 PM | Absorption and Dynamics of Gas Molecules in Metal-Organic Frameworks: Application of a Universal Neural Network Potential | Taku Watanabe | Oral
Nov 29 AM | Analysis of Monolayer to Bilayer Silicene Transformation in CaSi2Fx (x<1) using Universal Neural Network Potential | Akihiro Nagoya | Oral
Nov 29 AM | Efficient Crystal Structure Prediction using Universal Neural Network Potential and Genetic Algorithm | Takuya Shibayama | Oral

Links
• PFP related papers
  – “Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements” https://www.nature.com/articles/s41467-022-30687-9
  – “Towards universal neural network interatomic potential” https://doi.org/10.1016/j.jmat.2022.12.007

Follow us
• Twitter account: https://twitter.com/matlantis_en
• GitHub: https://github.com/matlantis-pfcc
• YouTube channel: https://www.youtube.com/c/Matlantis
• Slideshare account: https://www.slideshare.net/matlantis
• Official website: https://matlantis.com/

Appendix

NNP Tutorial review: Neural Network intro 1
“Constructing high‐dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
A linear transform followed by a nonlinear transform is applied in each layer, which lets the network express various functions: 𝑬 = 𝑓(𝑮0, 𝑮1, 𝑮2)
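The “linear transform → nonlinear transform” pattern can be written out directly. This is a minimal sketch of a multilayer perceptron forward pass with hand-picked weights; the layer sizes, the tanh activation, and the weight values are illustrative assumptions.

```python
import math

def dense(x, weights, biases, activation=None):
    """Linear transform W x + b, optionally followed by a nonlinearity."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def mlp(x, layers):
    """Apply each (weights, biases) layer; the last layer stays linear."""
    for k, (weights, biases) in enumerate(layers):
        is_last = k == len(layers) - 1
        x = dense(x, weights, biases, None if is_last else math.tanh)
    return x

# Two inputs -> two hidden units (tanh) -> one output.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
energy = mlp([0.0, 0.0], layers)
```

With zero inputs the hidden units are tanh(0) = 0, so the output is just the final bias, 0.5. Training adjusts the weights and biases so that the output matches reference energies.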

“Constructing high‐dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
An NN can learn a more accurate functional form as the amount of data increases.
• When data is scarce, the predicted values have large variance and are not trustworthy
• When data is sufficient, the variance becomes small

“Constructing high‐dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
Careful evaluation is necessary to check whether the NN works well only on the training data.
• Underfit: NN representation power is not enough, so it cannot express the true target function
• Overfit: NN representation power is too strong, so it fits the training data but does not work well at other points

BPNN: Behler-Parrinello Symmetry function
“Constructing high‐dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
AEV: Atomic Environment Vector, which describes the surrounding environment of a specific atom
Rc: cutoff radius
1. Radial symmetry functions represent the 2-body term (distance): how many atoms exist within the radius Rc from the center atom i
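A radial symmetry function can be sketched as a cutoff-weighted sum of Gaussians over neighbor distances. The form below is the commonly used Behler-type radial function with a cosine cutoff; the parameter values (`eta`, `r_s`, `rc`) and the neighbor positions are illustrative assumptions.

```python
import math

def cutoff_fn(r, rc):
    """Cosine cutoff: 1 at r = 0, smoothly decaying to 0 at r = rc."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r <= rc else 0.0

def radial_symmetry(center, neighbors, eta, r_s, rc):
    """G_rad = sum_j exp(-eta * (r_ij - r_s)^2) * f_c(r_ij)."""
    total = 0.0
    for p in neighbors:
        r = math.dist(center, p)
        total += math.exp(-eta * (r - r_s) ** 2) * cutoff_fn(r, rc)
    return total

center = (0.0, 0.0, 0.0)
near = [(1.0, 0.0, 0.0), (0.0, 1.2, 0.0)]
far = [(0.0, 0.0, 10.0)]  # outside the cutoff, contributes nothing
g_near = radial_symmetry(center, near, eta=1.0, r_s=1.0, rc=4.0)
g_all = radial_symmetry(center, near + far, eta=1.0, r_s=1.0, rc=4.0)
```

Using several (`eta`, `r_s`) pairs gives a vector that acts as a smooth histogram of neighbor distances around atom i, which is the radial part of the AEV.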

BPNN: Behler-Parrinello Symmetry function
“Constructing high‐dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
AEV: Atomic Environment Vector, which describes the surrounding environment of a specific atom
Rc: cutoff radius
2. Angular symmetry functions represent the 3-body term (angle): within the ball of radius Rc around the center atom i, at what relative positions (angles) do atoms j and k sit?
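An angular symmetry function can be sketched in the same style. The form below follows the commonly used Behler-type angular function; summing over unique neighbor pairs (j < k) is a convention chosen here, and all parameter values are illustrative assumptions.

```python
import math

def cutoff_fn(r, rc):
    """Cosine cutoff: 1 at r = 0, smoothly decaying to 0 at r = rc."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r <= rc else 0.0

def angular_symmetry(center, neighbors, eta, zeta, lam, rc):
    """G_ang = 2^(1-zeta) * sum_{j<k} (1 + lam*cos(theta_ijk))^zeta
               * exp(-eta*(r_ij^2 + r_ik^2 + r_jk^2)) * f_c(r_ij) f_c(r_ik) f_c(r_jk)."""
    total = 0.0
    for a in range(len(neighbors)):
        for b in range(a + 1, len(neighbors)):
            p_j, p_k = neighbors[a], neighbors[b]
            r_ij = math.dist(center, p_j)
            r_ik = math.dist(center, p_k)
            r_jk = math.dist(p_j, p_k)
            dot = sum((p_j[d] - center[d]) * (p_k[d] - center[d]) for d in range(3))
            cos_theta = dot / (r_ij * r_ik)
            angular = max(0.0, 1.0 + lam * cos_theta) ** zeta  # clamp rounding noise
            radial = math.exp(-eta * (r_ij ** 2 + r_ik ** 2 + r_jk ** 2))
            total += angular * radial * cutoff_fn(r_ij, rc) * cutoff_fn(r_ik, rc) * cutoff_fn(r_jk, rc)
    return 2.0 ** (1.0 - zeta) * total

center = (0.0, 0.0, 0.0)
n_j, n_k = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)  # a 90 degree angle at the center
g_jk = angular_symmetry(center, [n_j, n_k], eta=0.0, zeta=1.0, lam=1.0, rc=4.0)
g_kj = angular_symmetry(center, [n_k, n_j], eta=0.0, zeta=1.0, lam=1.0, rc=4.0)
```

With `eta = 0`, `zeta = 1`, and a right angle, the angular factor is exactly 1, so the value reduces to the product of the three cutoff terms. Swapping j and k gives the same result, as required for permutation invariance.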

BPNN: Neural Network architecture
“Constructing high‐dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
Problems of a normal MLP:
・Fixed number of atoms
  – Zero-vector padding is necessary
  – Cannot predict systems with more atoms than in training
・No invariance to atom order permutation
Proposed approach:
・Predict the atomic energy of each atom separately, then sum them up to obtain the final energy Es
・A different NN is trained for each element (O, H)
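The proposed approach can be sketched as one small network per element whose atomic energies are summed. The tiny one-unit “networks”, their weights, and the descriptor values below are hypothetical placeholders for trained per-element MLPs and real symmetry functions.

```python
import math

def make_atomic_net(weights, bias):
    """Build a toy per-element network: descriptor -> atomic energy."""
    def net(descriptor):
        return math.tanh(sum(w * g for w, g in zip(weights, descriptor)) + bias)
    return net

# One network per element, shared by all atoms of that element.
ATOMIC_NETS = {
    "O": make_atomic_net([0.3, -0.2], 0.1),
    "H": make_atomic_net([-0.5, 0.4], 0.0),
}

def total_energy(symbols, descriptors):
    """Es = sum of atomic energies predicted by each atom's element network."""
    return sum(ATOMIC_NETS[s](g) for s, g in zip(symbols, descriptors))

g_o, g_h1, g_h2 = [0.8, 0.1], [0.4, 0.6], [0.5, 0.5]
e_ohh = total_energy(["O", "H", "H"], [g_o, g_h1, g_h2])
e_hoh = total_energy(["H", "O", "H"], [g_h1, g_o, g_h2])
# The same networks handle a larger system without retraining or padding.
e_four = total_energy(["O", "H", "H", "H"], [g_o, g_h1, g_h2, g_h2])
```

Summing per-atom outputs removes both problems listed above: the atom count is free, and reordering atoms only reorders the terms of the sum.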

ANI-1 & ANI-1 Dataset: Summary
“ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost” https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
• For small molecules consisting of H, C, N, O in various conformations, we can create an NNP that predicts DFT energy well
  – Massive training data creation: 20 million data points
Issues
• Adding another element (F, S, etc.)
  – A different NN is necessary for each element
  – The input descriptor dimension increases in O(N^2) with the number of elements N
• The necessary training data may scale in the same order too
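The N^2 growth comes from the angular part, which needs one block per unordered pair of neighbor elements. A sketch of the count follows, assuming one radial block per element and one angular block per element pair; the default block sizes (32 radial, 64 angular) are assumed values for illustration and should be checked against the ANI-1 paper.

```python
def aev_dimension(n_elements, n_radial=32, n_angular=64):
    """Descriptor length: radial blocks per element + angular blocks per element pair."""
    radial = n_elements * n_radial
    element_pairs = n_elements * (n_elements + 1) // 2  # unordered pairs incl. same-element
    return radial + element_pairs * n_angular

# Growth of the per-atom descriptor as more elements are supported.
dims = {n: aev_dimension(n) for n in (4, 8, 16, 72)}
```

Under these assumptions four elements give 768 inputs, while 72 elements would give well over 100,000, which is one motivation for the GNN approach where the input size does not depend on the number of elements.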

GNN architecture (general)
• Similar to CNN, Graph Convolution layers are stacked to create a Deep Neural Network
Figure: input as “Graph” → Graph Conv × 4 (features are updated in the graph format) → Graph Readout (graph → vector) → Linear → Linear (update vector) → output prediction

Graph Readout: feature calculation for total graph (molecule)
Collect the calculated node features to obtain a graph-wise feature
Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, & Vijay Pande (2017). Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci., 3 (4)

PFP architecture
• PFP performance evaluation on the PFP benchmark dataset
  – Confirmed that TeaNet (PFP base model) achieves the best performance
https://arxiv.org/pdf/2106.14583.pdf

PFP Dataset
• Calculation conditions for the MOLECULE and CRYSTAL datasets
• PFP is jointly trained with the 3 datasets below
Dataset name | PFP MOLECULE | PFP CRYSTAL, PFP CRYSTAL_U0 | OC20
Software | Gaussian | VASP | VASP
xc/basis | ωB97xd/6-31G(d) | GGA-PBE | GGA-RPBE
Option | Unrestricted DFT | PAW pseudopotentials; cutoff energy 520 eV; U parameter ON/OFF; spin polarization ON | PAW pseudopotentials; cutoff energy 350 eV; U parameter OFF; spin polarization OFF

Application: Nano Particle
• “Calculations of Real-System Nanoparticles Using Universal Neural Network Potential PFP” https://arxiv.org/abs/2107.00963
• PFP can even calculate high entropy alloys (HEA), which contain various metals
  – Large systems are difficult to calculate with DFT
  – Multiple elements are difficult to support with classical potentials

OC20, OC22 introduction

Open Catalyst 2020
• Motivation: New catalyst development for renewable energy storage
• Overview Paper:
  – Storage of solar and wind power energy is crucial to overcome global warming
  – Why do hydroelectricity or batteries not suffice?
    • Their energy storage does not scale
https://arxiv.org/pdf/2010.09435.pdf

Open Catalyst 2020
• Motivation: New catalyst development for renewable energy storage
• Overview Paper:
  – Solar and wind energy can be stored in the form of hydrogen or methane
  – Improving the hydrogen and methane reaction processes is the key to renewable energy storage
https://arxiv.org/pdf/2010.09435.pdf

Open Catalyst 2020
• Catalyst: a substance that promotes a specific reaction without itself being changed.
• Dataset Paper: technical details of the dataset collection
Figure: bottom pink atoms → metal surface = catalyst; molecule on top = reactants
https://arxiv.org/pdf/2010.09435.pdf https://opencatalystproject.org/

Open Catalyst 2020
• Combinations of various molecules on various metals
• It covers the main reactions related to renewable energy
• Data size: 130M!
https://arxiv.org/pdf/2010.09435.pdf

Open Catalyst 2022
• Subsequent work focuses on Oxygen Evolution Reaction (OER) catalysts
• Data size: 9.8M
https://arxiv.org/abs/2206.08917
