
Matlantis - Million years of research acceleration with universal neural network potential-based SaaS

Matlantis
February 20, 2024


This is the presentation material for the lecture entitled “Matlantis - Million years of research acceleration with universal neural network potential-based SaaS”, presented by Kosuke Nakago and Rudy Coquet at the CECAM workshop “Perspectives and challenges of future HPC installations for atomistic and molecular simulations” on February 21, 2024.

https://www.cecam.org/workshop-details/perspectives-and-challenges-of-future-hpc-installations-for-atomistic-and-molecular-simulations-1227

The contents include the following:
- Introduction to the paper “Towards Universal Neural Network Potential for Material Discovery”
- The impact made by the cloud service Matlantis: Universal High-speed Atomistic Simulator



Transcript

  1. Kosuke Nakago, Rudy Coquet – Preferred Networks, Inc. / Preferred Computational Chemistry, Inc.
     Matlantis - Million years of research acceleration with universal neural network potential-based SaaS
  2. Topic of this talk
     • Introduce the paper “Towards Universal Neural Network Potential for Material Discovery” https://www.nature.com/articles/s41467-022-30687-9
     • Impact made by the cloud service Matlantis – Universal High-speed Atomistic Simulator https://matlantis.com/
  3. Table of Contents
     • NNP introduction
     • Creating a “universal” NNP, PFP
     • Providing it as SaaS: Matlantis – million years of research acceleration
     • Summary
  4. Neural Network Potential (NNP)
     Goal: predict the energy $E$ of a given molecule from its atomic coordinates $\boldsymbol{r}_i = (x_i, y_i, z_i)$ (e.g., the O and two H atoms of water) with a neural network. Forces then follow by differentiating the energy: $\boldsymbol{F}_i = -\partial E / \partial \boldsymbol{r}_i$.
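     A minimal sketch of this idea in PyTorch, assuming a toy stand-in network rather than a real NNP: the forces come from automatic differentiation of the predicted energy.

```python
# Minimal sketch: forces as the negative gradient of a neural-network energy.
# The tiny MLP here is a toy stand-in for a real NNP such as PFP.
import torch

energy_model = torch.nn.Sequential(
    torch.nn.Linear(9, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1))

positions = torch.randn(3, 3, requires_grad=True)  # 3 atoms, xyz coordinates
energy = energy_model(positions.reshape(1, -1)).sum()

forces = -torch.autograd.grad(energy, positions)[0]  # F_i = -dE/dr_i
print(energy.item(), forces.shape)                   # scalar energy, (3, 3) forces
```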
  5. Neural Network Potential (NNP)
     A. Normal supervised learning for materials informatics (MI): predicts a physical property directly → a new NN must be trained each time.
     B. An NNP learns the internal calculation needed for simulation (energy and forces, in place of solving the Schrödinger equation) → a single NNP can yield various physical properties (elastic constants, viscosity, etc.) through simulation!
  6. Neural Network Potential (NNP)
     Same comparison as above, with the typical cost per evaluation added: DFT and similar methods take hours to months; an NNP takes seconds!
  7. NNP can be used for various simulations
     • Reaction path analysis (NEB): C-O dissociation on a Co+V catalyst
     • Molecular dynamics: thiol dynamics on Cu(111)
     • Structure optimization: fentanyl
     Challenge: create a universal NNP, i.e., one applicable to various systems.
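     These workflows are typically driven through ASE. A hedged sketch of a structure optimization, using ASE's built-in EMT potential purely as a runnable stand-in for an NNP-backed calculator such as the one Matlantis provides:

```python
# Sketch of an ASE optimization workflow. EMT is a runnable stand-in here;
# in practice one would attach an NNP-backed ASE calculator instead.
from ase.build import fcc111
from ase.calculators.emt import EMT
from ase.optimize import LBFGS

slab = fcc111("Pt", size=(2, 2, 3), vacuum=6.0)  # small Pt(111) slab
slab.calc = EMT()
LBFGS(slab).run(fmax=0.05)                       # relax until forces < 0.05 eV/A
print(slab.get_potential_energy())
```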
  8. PFP
     • PFP is a GNN which updates scalar, vector, and tensor features internally – the formulation idea comes from a classical potential force field (EAM) https://arxiv.org/pdf/1912.01398.pdf
     • Satisfies physical requirements: rotational, translational, and permutation invariance; infinitely differentiable; etc.
  9. PFP architecture
     • Evaluation of PFP performance
     • Experimental results on the OC20 dataset – note: not a rigorous comparison, since the data are not completely the same https://arxiv.org/pdf/2106.14583.pdf
  10. PFP Dataset
     • To achieve universality, the dataset is collected over various structure types: molecule, bulk, slab, cluster, adsorption (slab + molecule), disordered https://arxiv.org/pdf/2106.14583.pdf
  11. Disordered structures
     • Obtained by running MD at high temperature
     • Force field: a classical potential or the NNP currently in training can be used, forming a loop of dataset collection → NNP training → MD on the trained NNP
     Example structures taken from the TeaNet paper: https://arxiv.org/pdf/2106.14583.pdf
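     A hedged sketch of the high-temperature sampling step, again with EMT standing in for the classical potential or in-training NNP; the snapshots would later be labeled with DFT energies and forces.

```python
# Sketch: collect disordered structures by high-temperature Langevin MD (ASE).
from ase import units
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.md.langevin import Langevin

atoms = bulk("Cu", cubic=True).repeat((3, 3, 3))
atoms.calc = EMT()  # stand-in for a classical potential / in-training NNP

snapshots = []
dyn = Langevin(atoms, timestep=2 * units.fs, temperature_K=2000, friction=0.002)
dyn.attach(lambda: snapshots.append(atoms.copy()), interval=50)
dyn.run(500)        # snapshots would then be labeled by DFT for training
print(len(snapshots))
```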
  12. PFP Dataset
     • Preferred Networks' in-house cluster is extensively utilized: data collection runs on MN-Cluster & ABCI
     • PFP v4.0.0 used 1,650 GPU-years of computing resources
  13. PFP Dataset
     • To achieve universality, the dataset is collected over various structures https://arxiv.org/pdf/2106.14583.pdf
  14. PFP Dataset
     • PFP v4.0 (released in 2023) is applicable to 72 elements: v0.0 supported 45 elements, v4.0 supports 72
  15. Other universal NNP research: M3GNet
     • A famous and widely used universal NNP
     • Open source, publicly available at https://github.com/materialsvirtuallab/matgl
     • Applicable to 89 elements, trained on the Materials Project dataset
     “A universal graph deep learning interatomic potential for the periodic table” https://www.nature.com/articles/s43588-022-00349-3
  16. Other universal NNP research: GNoME
     • Work from DeepMind
     • Focuses on stable-structure search
     • Discovered 380,000 new stable structures
     https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/
     “Scaling deep learning for materials discovery” https://www.nature.com/articles/s41586-023-06735-9
  17. Matbench Discovery benchmark
     • ROC-AUC (area under the ROC curve) for stable-structure prediction
     • PFP shows good performance compared to existing models
     Figure: comparison of universal NNP performances (higher is better)
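     For reference, a small sketch of how such a stability ROC-AUC is computed; the labels and scores below are invented toy values, while the real benchmark scores predicted energy above the convex hull against DFT stability labels.

```python
# Toy ROC-AUC computation for stability classification (invented values).
from sklearn.metrics import roc_auc_score

is_stable = [1, 0, 1, 1, 0, 0]           # ground-truth DFT stability labels
score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]   # model's predicted stability score
print(roc_auc_score(is_stable, score))   # 1.0 would mean perfect ranking
```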
  18. Computation paradigm shift
     • Instead of consuming huge computational resources in each simulation, we can benefit from a pretrained foundation model (the universal NNP): train the foundation model once using huge resources, then use it with far fewer resources per simulation.
  19. A million years' worth of research done in 1 year – value served in 2023 by Matlantis
     • Users: 400+ use our service globally
     • Atoms: 18.2 trillion simulated in 2023 by users
     • Years: 1 million years' worth of simulations if executed with DFT*
     * Based on the fact that a single-point calculation of 256 Pt atoms took 2 hours on a 36-core CPU using Quantum ESPRESSO (ref)
  20. A million years' worth of research done in 1 year – value served in 2023 by Matlantis
     Same figures as above, with the conclusion: NNP inference speed/cost becomes important!
  21. MN-Core: AI accelerator designed by PFN
     • Won No. 1 on the Green500 in 2021
     • PFP inference on MN-Core is in development
     MN-Core workload speedup (*): Pt 125 atoms, x1.93; Pt 1728 atoms, x2.92
     (*) Compared to an NVIDIA GPU; the floating-point formats differ.
  22. Future work
     • Physical properties that NNPs cannot handle: electronic states
     – Predict the Hamiltonian with an NN?
     – Existing works: SchNOrb, DeepH, etc.
     • How to further scale up in time and length? Currently ~10,000 atoms and nanosecond timescales can be handled by an NNP
     – More lightweight potentials are needed, tuned for each specific system
     – Existing works: DeepMD etc.
     – We are developing LightPFP
  23. Summary
     • An NNP calculates energies and forces very fast
     • PFP is a “universal” NNP which can handle various structures/applications
     • Applications: energy and force calculation; structure optimization; reaction pathway analysis and activation energies; molecular dynamics; IR spectra
     • The next computing paradigm? Utilize foundation models (universal NNPs) in the materials science field as well
     https://matlantis.com/product
  24. Links
     • PFP-related papers
     – “Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements” https://www.nature.com/articles/s41467-022-30687-9
     – “Towards universal neural network interatomic potential” https://doi.org/10.1016/j.jmat.2022.12.007
  25. Follow us
     • Twitter: https://twitter.com/matlantis_en
     • GitHub: https://github.com/matlantis-pfcc
     • YouTube: https://www.youtube.com/c/Matlantis
     • Slideshare: https://www.slideshare.net/matlantis
     • Official website: https://matlantis.com/
  26. NNP vs. quantum chemistry simulation
     Pros:
     • MUCH faster than quantum chemistry simulation (e.g., DFT)
     Cons:
     • Difficult to evaluate its accuracy
     • Data collection is necessary
     Figure from https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
  27. NNP tutorial review: neural network intro 1
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     A linear transform followed by a nonlinear transform is applied in each layer, allowing the network to express various functions: $E = f(G_1, G_2, G_3)$
  28. NNP tutorial review: neural network intro 2
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     An NN can learn the correct function form as data increase. When data are scarce, predictions have high variance and cannot be trusted; when data are plentiful, the variance becomes small.
  29. NNP tutorial review: neural network intro 3
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     Careful evaluation is necessary to check whether the NN works well only on the training data.
     Underfit: the NN's representational power is insufficient; it cannot express the true target function.
     Overfit: the NN's representational power is too strong; it fits the training data but does not work well at other points.
  30. NNP input – descriptors
     Instead of raw coordinate values $\boldsymbol{r}_i = (x_i, y_i, z_i)$, we feed a “descriptor” $(G_1, G_2, G_3)$ to the neural network (a multi-layer perceptron, MLP) that predicts $E = f(G_1, G_2, G_3)$.
     What kind of descriptor can be made? E.g., the distance r between two atoms is translationally and rotationally invariant.
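     A quick numerical check of that invariance claim, with toy coordinates; SciPy's rotation utilities are used only for the test.

```python
# Interatomic distances are invariant under rotation and translation,
# unlike the raw coordinates themselves.
import numpy as np
from scipy.spatial.transform import Rotation

coords = np.random.rand(3, 3)  # toy 3-atom geometry
moved = Rotation.random().apply(coords) + np.array([1.0, -2.0, 0.5])

def pair_distances(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

print(np.allclose(pair_distances(coords), pair_distances(moved)))  # True
```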
  31. NNP data collection
     • The goal is to predict the energy of molecules over various coordinates.
     → Calculate energies by DFT with randomly placed atoms? → No good.
     • In reality, a molecule takes only low-energy coordinates, following the Boltzmann distribution $\propto \exp(-E/k_B T)$: low-energy configurations are likely to occur, while high-energy ones (almost) never occur.
     → We want to predict accurately the energies of configurations that occur in the real world.
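     A back-of-envelope illustration of why high-energy configurations hardly matter: Boltzmann weights at room temperature, with illustrative energy values.

```python
# Relative Boltzmann populations at ~300 K for configurations lying
# 0, 0.1, 0.5, and 1.0 eV above the minimum (illustrative values).
import numpy as np

kB_T = 0.0257                                # eV at ~300 K
delta_E = np.array([0.0, 0.1, 0.5, 1.0])     # energy above the minimum, eV
weights = np.exp(-delta_E / kB_T)
print(weights / weights.sum())               # 1 eV up: essentially never visited
```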
  32. ANI-1 dataset creation
     “ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules” https://www.nature.com/articles/sdata2017193
     • A subset of the GDB-11 database (molecules containing up to 11 C, N, O, F atoms) is used – limited to C, N, O – max 8 heavy atoms
     • Normal mode sampling (NMS): various conformations are generated from one molecule by vibration
     • Tools: RDKit MMFF94; Gaussian 09 default method
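     A hedged sketch of the conformer-generation flavor of this step, using the RDKit + MMFF94 tooling the slide names; this is plain conformer embedding, not ANI-1's exact normal mode sampling procedure.

```python
# Sketch: generate and MMFF94-optimize conformers with RDKit (not ANI-1's
# exact normal mode sampling; just the same tooling for illustration).
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))  # ethanol, hydrogens added
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=42)
for cid in conf_ids:
    AllChem.MMFFOptimizeMolecule(mol, confId=cid)  # MMFF94 force field
print(mol.GetNumConformers())
```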
  33. ANI-1: results
     “ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost” https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
     • Energy prediction over various conformations – it reproduces DFT results well compared to DFTB and PM6 (conventional semi-empirical methods)
     • Molecules bigger than those in the training data can also be predicted
     Figure: one-dimensional potential-surface scans
  34. BPNN: Behler-Parrinello symmetry functions
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     AEV: the Atomic Environment Vector describes a specific atom's surrounding environment; Rc is the cutoff radius.
     1. Radial symmetry functions represent the 2-body term (distance): how many atoms exist within the radius Rc of the center atom i.
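     A small sketch of one common radial symmetry function: a Gaussian of the pair distance, damped by the standard cosine cutoff and summed over neighbors. The eta, R_s, and R_c values are illustrative, not the paper's.

```python
# Behler-Parrinello-style radial symmetry function (illustrative parameters):
# G_i = sum_j exp(-eta * (R_ij - R_s)^2) * f_c(R_ij), with cosine cutoff f_c.
import numpy as np

def cutoff(r, r_c):
    return np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def g_radial(distances, eta=1.0, r_s=0.0, r_c=6.0):
    return np.sum(np.exp(-eta * (distances - r_s) ** 2) * cutoff(distances, r_c))

print(g_radial(np.array([1.0, 1.5, 5.9, 7.2])))  # neighbor beyond R_c adds 0
```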
  35. BPNN: Behler-Parrinello symmetry functions
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     AEV: the Atomic Environment Vector describes a specific atom's surrounding environment; Rc is the cutoff radius.
     2. Angular symmetry functions represent the 3-body term (angle): within the radius-Rc ball around the center atom i, in what positional relation (angle) do atoms j and k sit?
  36. BPNN: neural network architecture
     “Constructing high-dimensional neural network potentials: A tutorial review” https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
     Problems with a normal MLP:
     – Fixed number of atoms (zero-padding vectors are necessary; it cannot predict systems with more atoms than seen in training)
     – No invariance to atom-order permutation
     Proposed approach:
     – Predict an atomic energy for each atom separately, then sum them to obtain the total energy Es
     – A different NN is trained for each element (O, H)
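     A minimal PyTorch sketch of this per-element, sum-over-atoms design; the descriptor size and layer widths are made up for illustration.

```python
# Behler-Parrinello-style architecture sketch: one MLP per element, applied
# per atom to its descriptor; the total energy is the sum of atomic energies,
# which makes the prediction permutation-invariant by construction.
import torch

def make_mlp():
    return torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(),
                               torch.nn.Linear(32, 1))

element_nets = torch.nn.ModuleDict({"H": make_mlp(), "O": make_mlp()})

def total_energy(symbols, descriptors):  # descriptors: (n_atoms, 8)
    atomic = [element_nets[s](d) for s, d in zip(symbols, descriptors)]
    return torch.stack(atomic).sum()

print(total_energy(["O", "H", "H"], torch.randn(3, 8)))
```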
  37. Behler-Parrinello type: NNP input – descriptors
     Input the atomic coordinates $(x_1, y_1, \ldots, z_3)$ to the network directly? → No good! It does not satisfy basic physical laws:
     – Translational invariance
     – Rotational invariance
     – Atom-order permutation invariance
  38. Graph Neural Network (GNN)
     • A neural network which accepts a “graph” as input; it learns how the data are connected
     • Graph: consists of vertices v and edges e
     – Social networks (SNS connection graphs), citation networks, product networks
     – Protein-protein association networks
     – Organic molecules, etc.
     Various applications!
  39. Graph Neural Network (GNN)
     • Image convolution → graph convolution
     • Also called graph convolutional networks or message-passing neural networks
     CNN (image convolution): image classification – cat, dog, ...
     GNN (graph convolution): physical properties – energy = 1.2 eV, ...
  40. GNN architecture
     • Similar to a CNN, graph convolution layers are stacked to create a deep neural network: input as a “graph” → features are updated in graph form through the stacked Graph Conv layers → a predicted value is output for each atom (e.g., atomic energy) → a sum yields the prediction for the whole molecule (e.g., total energy)
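     A minimal, illustrative message-passing sketch (not PFP's architecture): two stacked layers that mix each node's feature with the sum of its neighbors, followed by a sum readout.

```python
# Toy graph convolution: update each node with its neighbors' summed features,
# stack two layers, then sum-read-out a molecule-level vector.
import torch

class GraphConv(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = torch.nn.Linear(2 * dim, dim)

    def forward(self, h, edges):            # edges: directed (i, j) pairs
        msg = torch.zeros_like(h)
        for i, j in edges:                  # aggregate neighbor messages
            msg[i] = msg[i] + h[j]
        return torch.relu(self.lin(torch.cat([h, msg], dim=-1)))

h = torch.randn(3, 16)                      # 3 atoms, toy 16-dim features
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]    # O-H bonds of water, both ways
for layer in (GraphConv(16), GraphConv(16)):
    h = layer(h, edges)
prediction = h.sum(dim=0)                   # sum readout over atoms
print(prediction.shape)
```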
  41. Feature for each node (atom)
     A feature vector is assigned to each node, e.g.:
     – C: atom type one-hot (1.0, 0.0, 0.0), atomic number 6.0, chirality 1.0
     – N: atom type one-hot (0.0, 1.0, 0.0), atomic number 7.0, chirality 1.0
     – O: atom type one-hot (0.0, 0.0, 1.0), atomic number 8.0, chirality 1.0
     “Molecular Graph Convolutions: Moving Beyond Fingerprints”, Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley, arXiv:1603.00856
  42. GNNs for molecules and crystals
     • Applicable to molecules → various GNN architectures have been proposed since the late 2010s, drawing big attention to deep learning research on molecules
     – NFP, GGNN, MPNN, GWM, etc.
     • Then applied to positional data and crystal data (with periodic boundary conditions)
     – SchNet, CGCNN, MEGNet, Cormorant, DimeNet, PhysNet, EGNN, TeaNet, etc.
     NFP: “Convolutional Networks on Graphs for Learning Molecular Fingerprints” https://arxiv.org/abs/1509.09292
     GWM: “Graph Warp Module: an Auxiliary Module for Boosting the Power of Graph Neural Networks in Molecular Graph Analysis” https://arxiv.org/pdf/1902.01020.pdf
     CGCNN: “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301
  43. SchNet
     • For each atom pair's distance r, a continuous-filter convolution (cfconv) is applied, so the network can deal with atomic positions; the filters are generated from an RBF (radial basis function) expansion of the distance
     “SchNet: A continuous-filter convolutional neural network for modeling quantum interactions” https://arxiv.org/abs/1706.08566
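     A sketch of the radial-basis expansion such filter-generating networks start from; the number of basis functions, cutoff, and width below are illustrative.

```python
# RBF expansion of a pair distance: one scalar r becomes a smooth feature
# vector of Gaussians centered on a grid (illustrative hyperparameters).
import numpy as np

def rbf_expand(r, n_rbf=32, r_cut=6.0, gamma=10.0):
    centers = np.linspace(0.0, r_cut, n_rbf)
    return np.exp(-gamma * (r - centers) ** 2)

print(rbf_expand(1.2).shape)  # (32,) features for a single distance
```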
  44. GNN application with periodic boundary conditions (pbc)
     • CGCNN proposes how to construct the “graph” for systems with pbc
     • MEGNet reports applying one framework to both isolated systems (molecules) and pbc systems (crystals)
     CGCNN: “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties” https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301
     MEGNet: “Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals” https://pubs.acs.org/doi/10.1021/acs.chemmater.9b01294
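     In practice, the pbc-aware edge list can be built with a cutoff-based neighbor search that includes periodic images; a brief sketch with ASE's neighbor list (the cutoff value is illustrative):

```python
# Graph construction under periodic boundary conditions: edges are atom pairs
# within a cutoff, counting periodic images (illustrative 3 A cutoff).
from ase.build import bulk
from ase.neighborlist import neighbor_list

atoms = bulk("Cu", cubic=True)                 # periodic fcc conventional cell
i, j, d = neighbor_list("ijd", atoms, cutoff=3.0)
print(len(i), "directed edges; shortest bond:", d.min())
```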
  45. GNN approach: summary
     With these neural network architecture improvements, we gain the following advantages:
     • No hand-tuned descriptor is necessary – it is learned automatically inside the GNN
     • Generalization over element species – the input dimension does not increase when we add atomic species → avoids combinatorial explosion; generalization to elements with few (or even no) data
     • Accuracy and training efficiency – increased representational power enables potentially higher accuracy; appropriate constraints (inductive biases) make NN training easier
  46. Deep learning trends
     • In 2012, AlexNet won ILSVRC (efficiently using GPUs)
     • With the progress of GPU power, NNs became deeper and bigger:
     – 2012: AlexNet, 8 layers, 62.0M parameters
     – 2014: GoogLeNet, 22 layers, 6.4M parameters
     – 2015: ResNet, 110 layers (max 1202!), 60.3M parameters
     GoogLeNet “Going deeper with convolutions”: https://arxiv.org/pdf/1409.4842.pdf
     ResNet “Deep Residual Learning for Image Recognition”: https://arxiv.org/pdf/1512.03385.pdf
     https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96
  47. Deep learning trends
     • Dataset sizes in the computer vision area grow exponentially – one human could not view this amount in a lifetime → models start to learn a kind of collective intelligence
     • The “pre-training → fine-tuning for a specific task” workflow became the trend
     – MNIST: 60k images, 10 classes
     – CIFAR-100: 60k, 100 classes
     – ImageNet: 1.3M, 1,000 classes
     – ImageNet-21k: 14M, 21,000 classes
     – JFT-300M: 300M (Google, not open), 18,000 classes
  48. A “universal” neural network potential?
     This history of deep learning technology leads to one challenging idea:
     – NNP formulation & proof of conformational generalization: the ANI family of research
     – Support for various elements: GNN node embeddings
     – Dealing with crystals (with pbc): graph construction for periodic systems
     – Big-data training: the success of the DL trend in the CV/NLP fields
     → Universal NNP R&D started! Goal: support various elements, isolated/periodic systems, and various conformations – all use cases.
  49. ANI-1 & the ANI-1 dataset: summary
     “ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost” https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
     • For small molecules consisting of H, C, N, O in various conformations, an NNP can be created that predicts DFT energies well – via massive training data creation: 20 million datapoints
     Issues:
     • Adding another element (F, S, etc.) – a different NN is necessary for each element – the input descriptor dimension increases as O(N^2) in the number of element species, and the necessary training data may scale at this order too
  50. GNN architecture (general)
     • Similar to a CNN, graph convolution layers are stacked to create a deep neural network: input as a “graph” → features are updated in graph form → a graph readout converts the graph to a vector → linear layers update the vector → the prediction is output
  51. Graph readout: feature calculation for the whole graph (molecule)
     Collect the computed node features to obtain a graph-wise feature.
     Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, & Vijay Pande (2017). Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci., 3 (4)
  52. PFP architecture
     • PFP performance evaluation on the PFP benchmark dataset – confirmed that TeaNet (the PFP base model) achieves the best performance https://arxiv.org/pdf/2106.14583.pdf
  53. PFP
     • A “universal” neural network potential developed by Preferred Networks and ENEOS
     • Stands for “PreFerred Potential”
     – Packaged with various physical-property calculation libraries as the SaaS product Matlantis
     – Sold by Preferred Computational Chemistry (PFCC)
  54. PFP
     • Several improvements based on TeaNet, through more than 2 years of research (details in the paper)
     • The GNN edge cutoff is 6 Å – 5 layers with different cutoff lengths [3, 3, 4, 6, 6] Å → in total a 22 Å range can be connected, since the receptive field is the sum of the per-layer cutoffs – the GNN part can be computed in O(N)
     • The energy surface is designed to be smooth (infinitely differentiable)
  55. PFP Dataset
     • PFP is jointly trained with the 3 datasets below; calculation conditions for the MOLECULE and CRYSTAL datasets:
     – PFP MOLECULE: Gaussian; ωB97X-D/6-31G(d); unrestricted DFT
     – PFP CRYSTAL, PFP CRYSTAL_U0: VASP; GGA-PBE; PAW pseudopotentials; 520 eV cutoff energy; U parameter ON/OFF; spin polarization ON
     – OC20: VASP; GGA-RPBE; PAW pseudopotentials; 350 eV cutoff energy; U parameter OFF; spin polarization OFF
  56. TeaNet
     • Physical meaning of using a “tensor” feature: the tensor is related to the classical force field known as the Tersoff potential https://arxiv.org/pdf/1912.01398.pdf
  57. Use Case 1: catalysts for renewable-energy synthetic fuel
     • Search for an effective FT (Fischer-Tropsch) catalyst that accelerates C-O dissociation (the reaction producing fuel, C5+, from H2 and CO)
     • High-throughput screening of promoters → revealed that doping Co with V accelerates the dissociation process
     Figures: C-O dissociation on the Co+V catalyst; effect of promoters on activation energy; activation energies of methanation reactions of synthesis gas on Co(0001); comparison of activation energies
  58. Use Case 2: grain-boundary energy of elemental metals
     Example: Al Σ5 [100](0-21), 38 atoms
     H. Zheng et al., Acta Materialia, 186, 40 (2020); https://materialsvirtuallab.org/2020/01/grain-boundary-database/
  59. Use Case 3: Li-ion batteries
     • Li diffusion activation-energy calculation on LiFeSO4F, along each of the a, b, c directions
     – Consists of various elements
     – Good agreement with DFT results
     Figure: diffusion paths along the [111], [101], [100] directions
  60. Use Case 4: metal-organic frameworks
     • Water-molecule binding energy on the metal-organic framework MOF-74
     – A metal element combined with organic molecules
     – The result matches existing work that used Grimme's D3 correction
  61. Application: nanoparticles
     • “Calculations of Real-System Nanoparticles Using Universal Neural Network Potential PFP” https://arxiv.org/abs/2107.00963
     • PFP can even calculate high-entropy alloys (HEA), which contain various metals
     • Large sizes are difficult to calculate with DFT; multiple elements are difficult to support with classical potentials
  62. Open Catalyst 2020
     • Motivation: new catalyst development for renewable energy storage
     • Overview paper: https://arxiv.org/pdf/2010.09435.pdf
     – Storing solar and wind power is crucial to overcoming global warming
     – Why don't hydroelectric storage or batteries suffice? Their energy storage does not scale
  63. Open Catalyst 2020
     • Motivation: new catalyst development for renewable energy storage
     • Overview paper: https://arxiv.org/pdf/2010.09435.pdf
     – Solar and wind energy can be stored in the form of hydrogen or methane
     – Improving hydrogen and methane reaction processes is the key to renewable energy storage
  64. Open Catalyst 2020
     • Catalyst: a substance that promotes a specific reaction without itself being changed
     • Dataset paper: technical details of the dataset collection https://arxiv.org/pdf/2010.09435.pdf https://opencatalystproject.org/
     Figure: the bottom (pink) atoms are the metal surface = catalyst; the molecule on top = reactants
  65. Open Catalyst 2020
     • Combinations of various molecules on various metals
     • It covers the main reactions related to renewable energy
     • Data size: 130M!
     https://arxiv.org/pdf/2010.09435.pdf
  66. Open Catalyst 2022
     • The subsequent work focuses on oxygen evolution reaction (OER) catalysts
     • 9.8M-datapoint dataset
     https://arxiv.org/abs/2206.08917
  67. Matlantis – providing the universal NNP as SaaS
     • Matlantis is provided as a paid service to maximize the benefit of the universal NNP worldwide
     – User support & success
     – Regular improvements/updates
     – Library maintenance
     – Feedback & improvement loop
  68. Foundation models, generative AI, LLMs
     • Stable Diffusion, ChatGPT, ...
     • Foundation models are used in various tasks (app 1, app 2, app 3, etc.)
     • The model provider alone cannot extract the full potential of a foundation model – users explore and find “new value”
  69. PFP as a foundation model for atomistic simulations
     • We don't yet know the full capability of PFP, the universal NNP
     – Various knowledge can be obtained by utilizing the model (structural relaxation, reaction analysis, molecular dynamics, etc.)
     – We hope someone wins a Nobel Prize for new materials discovered using the PFP system