by Matlantis

Kosuke Nakago, Rudy Coquet
Preferred Networks, Inc/Preferred Computational Chemistry, Inc.
Matlantis - Million years of research acceleration with
universal neural network potential-based SaaS

Topic of this talk
2
• Introduce paper “Towards Universal Neural Network Potential for Material Discovery”
• Impact made by cloud service Matlantis
– Universal High-speed Atomistic Simulator
https://www.nature.com/articles/s41467-022-30687-9 https://matlantis.com/

Table of Contents
• NNP introduction
• Creating “Universal” NNP, PFP
• Providing as SaaS Matlantis
– Million years of research acceleration
• Summary
3

NNP introduction
4

Neural Network Potential (NNP)
(Figure: water molecule O, H, H with coordinates
r₁ = (x₁, y₁, z₁), r₂ = (x₂, y₂, z₂), r₃ = (x₃, y₃, z₃)
→ Neural Network → energy E, with forces Fᵢ = −∂E/∂rᵢ)
Goal: Predict the energy of a given molecule from its atomic coordinates with a Neural Network
→ forces can then be obtained by differentiating the energy
5
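The force relation above can be sketched with automatic differentiation. A toy example assuming PyTorch; the energy function here is purely illustrative, not PFP:

```python
import torch

# Toy illustration (not the actual NNP): energy is a differentiable function of
# atomic coordinates, and forces follow as the negative gradient F_i = -dE/dr_i.
coords = torch.tensor([[0.0, 0.0, 0.0],
                       [0.96, 0.0, 0.0],
                       [-0.24, 0.93, 0.0]], requires_grad=True)  # water-like geometry

def toy_energy(r):
    # Stand-in pairwise energy: sum of inverse distances over all ordered pairs
    d = torch.cdist(r, r)
    mask = ~torch.eye(len(r), dtype=torch.bool)
    return (1.0 / d[mask]).sum()

energy = toy_energy(coords)
(grad,) = torch.autograd.grad(energy, coords)
forces = -grad  # F = -dE/dr, one 3-vector per atom
```

For any pairwise energy the per-atom forces sum to zero, reflecting translational invariance.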

Neural Network Potential (NNP)
A. Normal supervised learning for MI: predicts a physical property directly
→ a NN must be trained for each property
B. NNP learns the internal calculation necessary for simulation
→ a single NNP can calculate various physical properties!
6
(Figure: water molecule with coordinates r₁, r₂, r₃.
Route A: coordinates → physical property directly.
Route B: coordinates → Schrödinger Eq. → energy, forces → simulation
→ physical properties such as elastic constants, viscosity, etc.)

Neural Network Potential (NNP)
A. Normal supervised learning for MI: predicts a physical property directly
→ a NN must be trained for each property
B. NNP learns the internal calculation necessary for simulation
→ a single NNP can calculate various physical properties!
7
(Figure: same diagram; the Schrödinger-equation step is annotated with timings.
DFT etc.: hours to months → NNP: seconds!)

NNP can be used for various simulations
8
Reaction path analysis (NEB)
C-O dissociation on Co+V Catalyst
Molecular Dynamics
Thiol dynamics on Cu(111)
Opt
Fentanyl structure optimization
Challenge: Create Universal NNP = Applicable to various systems

Creating “Universal” NNP, PFP
9

PFP: PreFerred Potential
• Architecture
• Dataset
10

PFP
• PFP is a GNN which updates scalar, vector, and tensor features internally
– The formulation idea comes from the classical potential force field (EAM)
• Satisfies physical requirements:
rotational, translational, and permutation invariance; infinitely differentiable, etc.
11
https://arxiv.org/pdf/1912.01398.pdf
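These invariance requirements can be checked concretely. A NumPy sketch using sorted pairwise distances as a toy invariant descriptor (not the PFP architecture):

```python
import numpy as np

def descriptor(coords):
    # Sorted pairwise distances: invariant to rotation, translation, and atom permutation
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    iu = np.triu_indices(len(coords), k=1)
    return np.sort(d[iu])

rng = np.random.default_rng(0)
r = rng.normal(size=(4, 3))                       # 4 atoms at random positions

# Apply a random rotation (via QR), a translation, and a permutation
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
transformed = (r @ Q.T + np.array([1.0, -2.0, 0.5]))[rng.permutation(4)]

assert np.allclose(descriptor(r), descriptor(transformed))
```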

PFP architecture
• Evaluation of PFP performance
• Experiment results: OC20 dataset
– ※ Not a rigorous comparison,
since the data are not exactly the same
12
https://arxiv.org/pdf/2106.14583.pdf

PFP Dataset
• To achieve universality, the dataset is collected over various structure types
– Molecule
– Bulk
– Slab
– Cluster
– Adsorption (Slab+Molecule)
– Disordered
13
https://arxiv.org/pdf/2106.14583.pdf

Disordered structure
• Obtained by running MD at high temperature
• Force field: a classical potential or the NNP being trained can be used
14
https://arxiv.org/pdf/2106.14583.pdf
Example structures taken from the TeaNet paper:
(Cycle: train NNP → MD on the trained NNP → dataset collection → retrain)

PFP Dataset
• Preferred Networks’ in-house cluster is extensively utilized
15
Data collection with MN-Cluster & ABCI
PFP v4.0.0 used 1650 GPU-years of computing resources

PFP Dataset
• To achieve universality, the dataset is collected over various structure types
16
https://arxiv.org/pdf/2106.14583.pdf

PFP Dataset
• PFP v4.0 (released in 2023) is applicable to 72 elements
17
v0.0 supported 45 elements
v4.0 supports 72 elements

Applications
18

Applications
19
Visit https://matlantis.com/cases for detail!

Other Universal NNP Research: M3GNet
• Famous & widely used universal NNP
• OSS publicly available at https://github.com/materialsvirtuallab/matgl
• Applicable to 89 elements, trained on Materials Project dataset
20
“A universal graph deep learning interatomic potential for the periodic table”
https://www.nature.com/articles/s43588-022-00349-3

Other Universal NNP Research: GNoME
• Work from DeepMind
• Focuses on stable structure search
• Discovered 380,000 new stable structures
21
https://deepmind.google/discover/blog/millions-of-new-
materials-discovered-with-deep-learning/
“Scaling deep learning for materials discovery”
https://www.nature.com/articles/s41586-023-06735-9

Matbench-discovery benchmark
• ROC-AUC (Area Under the ROC Curve) for stable structure prediction
• PFP shows good performance compared to existing studies
22
Better
Worse
Comparison of Universal NNP Performances

Computation Paradigm shift
• Instead of consuming huge computational resources in each simulation,
we can benefit from a pretrained foundation model (universal NNP)
23
(Figure: before, huge resources are needed for each simulation;
after, a foundation model is trained once using huge resources
and then used with far fewer resources per simulation.
Universal NNP as a foundation model)

Worth Million years of research done in 1 year
Value served in 2023 by Matlantis
24
Users: 400+ (use our service globally)
Atoms: 18.2 trillion (simulated in 2023 by users)
Years: 1 million (worth of simulations if executed with DFT *)
* Based on the fact that a single-point calculation of 256 Pt atoms took 2 hours on a 36-core CPU using Quantum ESPRESSO (ref)

Worth Million years of research done in 1 year
Value served in 2023 by Matlantis
25
Users: 400+ (use our service globally)
Atoms: 18.2 trillion (simulated in 2023 by users)
Years: 1 million (worth of simulations if executed with DFT *)
* Based on the fact that a single-point calculation of 256 Pt atoms took 2 hours on a 36-core CPU using Quantum ESPRESSO (ref)
NNP Inference speed/cost
becomes important!

MN-Core: AI Accelerator designed by PFN
• Won No. 1 on the Green500 list in 2021
• PFP inference on MN-Core is in development
26
MN-Core workload speedup (*): Pt 125 atoms: ×1.93, Pt 1728 atoms: ×2.92
(*) Compared to an NVIDIA GPU; the floating-point formats differ.

Future work
• A physical property that NNP cannot handle: electronic states
– Predict Hamiltonian by NN?
– Existing works: SchNorb, DeepH etc.
• How to further scale up time & length scales?
Currently ~10,000 atoms & nanosecond timescales can be handled by NNP
– A more lightweight potential is needed,
tuned for each specific system.
– Existing works: DeepMD etc.
– We are developing LightPFP
27

Summary
• NNP calculates energy & force very fast
• PFP is a “universal” NNP which can handle
various structures/applications
• Applications
– Energy, force calculation
– Structure optimization
– Reaction pathway analysis, activation energy
– Molecular Dynamics
– IR spectrum
• The next computing paradigm?
– Utilize foundation models
(universal NNPs) even in
the materials science field
28
https://matlantis.com/product

Links
• PFP related papers
– “Towards universal neural network potential for material discovery applicable to arbitrary
combination of 45 elements”
https://www.nature.com/articles/s41467-022-30687-9
– “Towards universal neural network interatomic potential”
https://doi.org/10.1016/j.jmat.2022.12.007
29

Follow us
30
Twitter account
https://twitter.com/matlantis_en
GitHub
https://github.com/matlantis-pfcc
YouTube channel
https://www.youtube.com/c/Matlantis
Slideshare account
https://www.slideshare.net/matlantis
Official website
https://matlantis.com/

Appendix
31

NNP vs Quantum Chemistry Simulation
Pros:
• MUCH faster than quantum
chemistry simulation (e.g., DFT)
Cons:
• Difficult to evaluate its accuracy
• Data collection is necessary
(figure from https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract)
32

NNP Tutorial review: Neural Network intro 1
“Constructing high-dimensional neural network potentials: A tutorial review”
https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
A linear transform followed by a nonlinear transform is applied in each layer,
to express various functions
E = f(G₁, G₂, G₃)
33

NNP Tutorial review: Neural Network intro 2
“Constructing high-dimensional neural network potentials: A tutorial review”
https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
The NN can learn a more correct functional form as the data increases.
When data are few, the predicted values
have high variance and are not trustworthy.
When data are sufficient,
the variance can be small.
34

NNP Tutorial review: Neural Network intro 3
“Constructing high-dimensional neural network potentials: A tutorial review”
https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
Careful evaluation is necessary to check whether the NN only works well on the training data
Underfit: the NN representation power is not
enough; it cannot express the true target function
Overfit: the NN representation power is too strong;
it fits the training data but does not work well at other
points
35

NNP Input - Descriptor
Instead of raw coordinate values, we input a “Descriptor” to the
Neural Network
What kind of descriptor can be made?
Ex. The distance r between 2 atoms is translationally/rotationally invariant
(Figure: water molecule with coordinates
r₁ = (x₁, y₁, z₁), r₂ = (x₂, y₂, z₂), r₃ = (x₃, y₃, z₃)
→ descriptors G₁, G₂, G₃ → Neural Network
(Multi Layer Perceptron, MLP) → E = f(G₁, G₂, G₃))
36

NNP data collection
• The goal is to predict energies for molecules in various coordinates
→ Calculate energies by DFT with randomly placed atoms? → NG
• In reality, a molecule takes only low-energy coordinates
→ We want to accurately predict the energies that occur in the real world.
(Figure: water molecule geometries.
Low energy: likely to occur.
High energy: (almost) never occurs.)
37
exp(−E/k_B T)
Boltzmann Distribution
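The Boltzmann factor above quantifies why high-energy geometries (almost) never occur. A small numeric sketch; the conformer energies are hypothetical:

```python
import numpy as np

kB = 8.617333e-5                       # Boltzmann constant [eV/K]
T = 300.0                              # room temperature [K]
energies = np.array([0.0, 0.1, 1.0])   # hypothetical relative conformer energies [eV]

w = np.exp(-energies / (kB * T))       # Boltzmann factors exp(-E / kB T)
p = w / w.sum()                        # relative occupation probabilities
# The 1 eV structure is essentially never visited at 300 K,
# so sampling it with expensive DFT would be wasted effort.
```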

ANI-1 Dataset creation
“ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules”
https://www.nature.com/articles/sdata2017193
• A subset of the GDB-11 database (molecules containing up to 11 C, N, O, F atoms)
is used
– Limited to C, N, O
– Max 8 heavy atoms
• Normal Mode Sampling (NMS):
various conformations are generated
from one molecule by vibration.
(Pipeline: RDKit MMFF94 → Gaussian09 default method)
38

ANI-1: Results
“ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost”
https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
• Energy prediction on
various conformations
– It reproduces DFT results well
compared to DFTB and other
conventional semi-empirical methods
• Molecules bigger than the training data
can also be predicted
one-dimensional potential surface scan
39

BPNN: Behler-Parrinello Symmetry function
“Constructing high-dimensional neural network potentials: A tutorial review”
https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
AEV: Atomic Environment Vector
describes the environment surrounding
a specific atom
Rc: cutoff radius
1. Radial symmetry functions
represent the 2-body term (distance):
how many atoms exist within the
radius Rc of the center atom i
40
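The radial symmetry function can be sketched directly from this description. A minimal NumPy version with the common cosine cutoff; the parameter values are illustrative:

```python
import numpy as np

def fc(r, rc):
    # Smooth cosine cutoff: decays to zero at r = rc
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def radial_sf(coords, i, eta=1.0, rs=0.0, rc=6.0):
    # Behler-Parrinello radial symmetry function for center atom i:
    # G_i = sum_j exp(-eta * (R_ij - rs)^2) * fc(R_ij)
    rij = np.linalg.norm(coords - coords[i], axis=1)
    rij = np.delete(rij, i)               # exclude the atom itself
    return np.sum(np.exp(-eta * (rij - rs) ** 2) * fc(rij, rc))
```

In practice many (eta, rs) pairs are used, giving a vector of G values per atom.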

BPNN: Behler-Parrinello Symmetry function
“Constructing high-dimensional neural network potentials: A tutorial review”
https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
2. Angular symmetry functions
represent the 3-body term (angle):
within the radius-Rc sphere around center atom i, at what
relative positions (angles) do atoms j and k sit?
41
AEV: Atomic Environment Vector
describes the environment surrounding
a specific atom
Rc: cutoff radius
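Likewise, a minimal sketch of the angular (3-body) symmetry function, following the standard Behler-Parrinello form; the parameters are illustrative:

```python
import numpy as np
from itertools import combinations

def fc(r, rc):
    # Smooth cosine cutoff, zero beyond rc
    return 0.5 * (np.cos(np.pi * r / rc) + 1.0) if r < rc else 0.0

def angular_sf(coords, i, eta=0.1, zeta=1.0, lam=1.0, rc=6.0):
    # G_i = 2^(1-zeta) * sum_{j<k} (1 + lam*cos(theta_jik))^zeta
    #       * exp(-eta*(R_ij^2 + R_ik^2 + R_jk^2)) * fc(R_ij)*fc(R_ik)*fc(R_jk)
    g = 0.0
    others = [j for j in range(len(coords)) if j != i]
    for j, k in combinations(others, 2):
        rij, rik = coords[j] - coords[i], coords[k] - coords[i]
        dij, dik = np.linalg.norm(rij), np.linalg.norm(rik)
        djk = np.linalg.norm(coords[k] - coords[j])
        cos_theta = np.dot(rij, rik) / (dij * dik)
        g += ((1 + lam * cos_theta) ** zeta
              * np.exp(-eta * (dij**2 + dik**2 + djk**2))
              * fc(dij, rc) * fc(dik, rc) * fc(djk, rc))
    return 2 ** (1 - zeta) * g
```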

BPNN: Neural Network architecture
Problems of a normal MLP:
・Fixed number of atoms
ー zero-padding vectors are necessary
ー cannot predict systems with more atoms than in training
・No invariance under atom-order permutation
“Constructing high-dimensional neural network potentials: A tutorial review”
https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24890
Proposed approach:
・Predict an atomic energy for each atom separately,
and sum them up to obtain the final energy Es
・A different NN is trained for each element (O, H)
42
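The proposed architecture (per-element subnetworks whose atomic energies are summed) can be sketched in a few lines, assuming PyTorch; descriptor and layer sizes here are arbitrary:

```python
import torch
import torch.nn as nn

class BPNN(nn.Module):
    # Sketch of the Behler-Parrinello scheme: one small MLP per element,
    # total energy = sum of per-atom energies (permutation invariant by construction)
    def __init__(self, elements, descriptor_dim=8):
        super().__init__()
        self.nets = nn.ModuleDict({
            el: nn.Sequential(nn.Linear(descriptor_dim, 16), nn.Tanh(), nn.Linear(16, 1))
            for el in elements
        })

    def forward(self, species, descriptors):
        # species: list of element symbols; descriptors: (n_atoms, descriptor_dim)
        atomic_e = [self.nets[el](d) for el, d in zip(species, descriptors)]
        return torch.stack(atomic_e).sum()

model = BPNN(["H", "O"])
```

Permuting the atoms (together with their descriptors) leaves the total energy unchanged, which a plain MLP over concatenated inputs would not guarantee.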

Behler-Parrinello type: NNP Input - Descriptor
Input raw atomic coordinates? → NG!
They do not satisfy basic physical laws:
・Translational invariance
・Rotational invariance
・Atom-order permutation invariance
(Figure: water molecule with coordinates
r₁ = (x₁, y₁, z₁), r₂ = (x₂, y₂, z₂), r₃ = (x₃, y₃, z₃)
fed directly to a Neural Network: E = f(x₁, y₁, …, z₃))
43

Graph Neural Network (GNN)
• A neural network which accepts “graph” input;
it learns how the data are connected
• Graph: consists of vertices v and edges e
– Social Network (SNS connection graph), Citation Network, Product Network
– Protein-Protein Association Network
– Organic molecules etc…
44
(Figure: example graph with vertices v₀ … v₄ and edges e_ij connecting them)
Various applications!

Graph Neural Network (GNN)
• Image convolution → graph convolution
• Also called Graph Convolutional Network or Message Passing Neural Network
45
(Figure: CNN image convolution → image classification (cat, dog…);
GNN graph convolution → physical property (Energy = 1.2 eV …))

GNN architecture
• Similar to a CNN, Graph Convolution layers are stacked to create a Deep Neural Network
46
(Figure: input as “Graph” → Graph Conv × 4
(features are updated in the graph format,
outputting a predicted value for each atom, e.g., energy)
→ Sum → output the total molecular prediction, e.g., energy)
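The stacked Graph Conv layers plus Sum readout can be sketched with plain NumPy. This is a toy message-passing step, not any specific published architecture:

```python
import numpy as np

def graph_conv(h, adj, W):
    # One message-passing step: aggregate neighbor features, then linear + nonlinearity
    msg = adj @ h                      # sum of neighbor features for each node
    return np.tanh((h + msg) @ W)

# Tiny molecule graph: 3 nodes, edges 0-1 and 0-2 (like O-H, O-H in water)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))            # initial node features
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

h = graph_conv(graph_conv(h, adj, W1), adj, W2)  # stack two layers
prediction = h.sum(axis=0)             # "Sum" readout: graph-level (molecular) feature
```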

Feature for each node (atom)
Element | atom type (one-hot) | atomic number | chirality
C | 1.0 0.0 0.0 | 6.0 | 1.0
N | 0.0 1.0 0.0 | 7.0 | 1.0
O | 0.0 0.0 1.0 | 8.0 | 1.0
A feature vector is assigned to each node
“Molecular Graph Convolutions: Moving Beyond Fingerprints”
Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley, arXiv:1603.00856

GNN for molecules, crystals
• Applicable to molecules
→ Various GNN architectures have been proposed since the late 2010s,
drawing big attention to deep learning research for molecules.
– NFP, GGNN, MPNN, GWM etc…
• Then applied to positional data and crystal data (with periodic boundary conditions)
– SchNet, CGCNN, MEGNet, Cormorant, DimeNet, PhysNet, EGNN, TeaNet etc…
48
NFP: “Convolutional Networks on Graph for
Learning Molecular Fingerprints”
https://arxiv.org/abs/1509.09292
GWM: “Graph Warp Module: an Auxiliary Module for
Boosting the Power of Graph Neural Networks in Molecular Graph Analysis”
https://arxiv.org/pdf/1902.01020.pdf
CGCNN: “Crystal Graph Convolutional Neural Networks for an
Accurate and Interpretable Prediction of Material Properties”
https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301

SchNet
• For each atom pair's distance r, a continuous-filter convolution (cfconv) is applied,
so it can deal with atom positions r
“SchNet: A continuous-filter convolutional neural network for modeling quantum interactions”
https://arxiv.org/abs/1706.08566
RBF kernel
49
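The RBF kernel step, expanding each interatomic distance into Gaussian basis functions before the filter-generating network, can be sketched as follows; basis count and width are illustrative:

```python
import numpy as np

def rbf_expand(d, n_basis=32, cutoff=6.0, gamma=10.0):
    # Expand a scalar interatomic distance into Gaussian radial basis functions,
    # giving a smooth, fixed-length featurization suitable for cfconv-style filters
    centers = np.linspace(0.0, cutoff, n_basis)
    return np.exp(-gamma * (d - centers) ** 2)

feat = rbf_expand(1.2)   # shape (32,), peaked at the center nearest 1.2
```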

GNN application with periodic boundary condition (pbc)
• CGCNN proposes how to construct a “graph” for systems with pbc.
• MEGNet reports application to both isolated systems (molecules) and pbc systems (crystals)
50
CGCNN: “Crystal Graph Convolutional Neural Networks for an Accurate
and Interpretable Prediction of Material Properties”
https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301
MEGNet: “Graph Networks as a Universal Machine Learning Framework
for Molecules and Crystals”
https://pubs.acs.org/doi/10.1021/acs.chemmater.9b01294

GNN approach: Summary
With improvements in the Neural Network architecture, we gain the following advantages
• Human-tuned descriptors are not necessary
– They are automatically learned internally in the GNN
• Generalization over element species
– The input dimension does not increase even when we add atomic species
→ avoids combinatorial explosion
– Generalization to elements with few (or even no) data
• Accuracy and training efficiency
– Increased network representation power, possibly higher accuracy
– Appropriate constraints (inductive biases) make NN training easier
51
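The element-species generalization works because the element enters as a learned embedding row rather than extra input dimensions. A sketch assuming PyTorch; the dimensions are illustrative:

```python
import torch
import torch.nn as nn

# Element identity as a learned embedding: supporting a new element adds one
# embedding row, not new input dimensions or a new per-element sub-network
embed = nn.Embedding(num_embeddings=118, embedding_dim=16)  # one row per atomic number
z = torch.tensor([8, 1, 1])     # H2O as atomic numbers (O, H, H)
h0 = embed(z)                   # initial GNN node features, shape (3, 16)
```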

Deep learning ~ trending ~
• In 2012, AlexNet won ILSVRC (efficiently using GPUs)
• With the progress of GPU power, NNs became deeper and bigger
52
GoogleNet “Going deeper with convolutions”:
https://arxiv.org/pdf/1409.4842.pdf
ResNet “Deep Residual Learning for Image
Recognition”: https://arxiv.org/pdf/1512.03385.pdf
Year | CNN | Depth | # of Parameters
2012 | AlexNet | 8 layers | 62.0M
2014 | GoogleNet | 22 layers | 6.4M
2015 | ResNet | 110 layers (max 1202!) | 60.3M
https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96

Deep learning ~ trending ~
• Dataset sizes in the computer vision area
– Grow exponentially;
one human cannot view this amount in a lifetime → models start to learn collective intelligence…
– The “pre-training → fine-tuning for a specific task” workflow becomes the trend
Dataset | Data size | # of classes
MNIST | 60k | 10
CIFAR-100 | 60k | 100
ImageNet | 1.3M | 1,000
ImageNet-21k | 14M | 21,000
JFT-300M | 300M (Google, not open) | 18,000

“Universal” Neural Network Potential?
• This history of deep learning technology leads to one challenging idea…
NNP formulation
Proof of conformation generalization
↓
ANI family researches
Support various elements
↓
GNN node embedding
Deal with crystal (with pbc)
↓
Graph construction for pbc system
Big data training
↓
Success in CV/NLP field, DL trend
→Universal NNP R&D started!!
Goal: to support various elements, isolated/pbc systems, and various conformations. All use cases.

ANI-1 & ANI-1 Dataset: Summary
“ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost”
https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a#!divAbstract
• For small molecules consisting of H, C, N, O in various conformations,
we can create an NNP that predicts DFT energies well
– Massive training data creation: 20 million datapoints
Issues
• Adding other elements (F, S, etc.)
– A different NN is necessary for each element
– The input descriptor dimension increases on the order of N² in the number of elements
• The necessary training data may scale with this order too
55

GNN architecture (general)
• Similar to a CNN, Graph Convolution layers are stacked to create a Deep Neural Network
56
(Figure: input as “Graph” → Graph Conv × 4
(features are updated in the graph format)
→ Graph Readout (graph → vector) → Linear → Linear (update vector)
→ output prediction)

Graph Readout: feature calculation for the whole graph (molecule)
Collect the calculated node features to obtain a graph-wise feature
Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, & Vijay Pande (2017). Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci., 3 (4)

PFP architecture
• PFP performance evaluation on the PFP benchmark dataset
– Confirmed that TeaNet (the PFP base model) achieves the best performance
58
https://arxiv.org/pdf/2106.14583.pdf

PFP
• “Universal” Neural Network Potential developed by
Preferred Networks and ENEOS
• Stands for “PreFerred Potential”
– Packaged with various physical-property calculation libraries
in the SaaS product Matlantis
– Sold by Preferred Computational Chemistry (PFCC)
59

PFP
• Several improvements based on TeaNet,
through more than 2 years of research (details in the paper)
• The GNN edge cutoff is taken as 6 Å
– 5 layers with different cutoff lengths [3, 3, 4, 6, 6]
– → In total, a 22 Å range can be connected
– The GNN part can be calculated in O(N)
• The energy surface is designed to be smooth (infinitely differentiable)
60
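The 22 Å figure follows from the per-layer cutoffs: each message-passing layer extends the receptive field by its own cutoff radius, so the ranges add along the stack:

```python
# Receptive field of stacked message-passing layers: information propagates
# one cutoff radius per layer, so the per-layer ranges sum up
cutoffs = [3, 3, 4, 6, 6]   # per-layer edge cutoffs in angstroms (from the slide)
receptive_field = sum(cutoffs)   # 22 angstrom maximum interaction range
```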

PFP Dataset
• Calculation conditions for the MOLECULE and CRYSTAL datasets
• PFP is jointly trained with the 3 datasets below
61
Dataset name | PFP MOLECULE | PFP CRYSTAL, PFP CRYSTAL_U0 | OC20
Software | Gaussian | VASP | VASP
xc/basis | ωB97X-D/6-31G(d) | GGA-PBE | GGA-RPBE
Options | Unrestricted DFT | PAW pseudopotentials, cutoff energy 520 eV, U parameter ON/OFF, spin polarization ON | PAW pseudopotentials, cutoff energy 350 eV, U parameter OFF, spin polarization OFF

TeaNet
• Physical meaning of using “tensor” features:
the tensor formulation is related to the classical force field called the Tersoff potential
62
https://arxiv.org/pdf/1912.01398.pdf
(Figure: correspondence to the Tersoff potential)

Use Case 1: Renewable energy synthetic fuel catalyst
• Search for an effective FT catalyst that accelerates C-O dissociation
• High-throughput screening of promoters
→ Revealed that doping Co with V accelerates the dissociation process
63
C-O dissociation on the Co+V catalyst
Reaction producing fuel (C5+) from H2, CO
Effect of promoters on activation energy
Activation energies of methanation reactions of
synthesis gas on Co(0001).
Comparison of activation energies

Use Case 2: Grain boundary energy of elemental metals
64
Al Σ5 [100](0-21)
38 atoms
H. Zheng et al., Acta Materialia,186, 40, (2020)
https://materialsvirtuallab.org/2020/01/grain-boundary-database/

Use Case 3: Li-ion battery
• Li diffusion activation energy calculation on LiFeSO4F,
along each of the a, b, c directions
– Consists of various elements
– Good agreement with DFT results
65
Diffusion path for [111], [101], [100] direction

Use Case 4: Metal-organic frameworks
• Water molecule binding energy on the metal-organic framework MOF-74
– Metal elements combined with organic molecules
– The result matches existing work using Grimme’s D3 correction
66

Demonstration
67

Application: Nano Particle
• “Calculations of Real-System Nanoparticles Using Universal Neural Network Potential PFP”
https://arxiv.org/abs/2107.00963
• PFP can even calculate high-entropy alloys (HEA), which contain various metals
• Large sizes are difficult to calculate with DFT;
multiple elements are difficult to support with classical potentials
68

OC20, OC22 introduction
69

Open Catalyst 2020
• Motivation: new catalyst development for renewable energy storage
• Overview paper:
– Solar and wind power energy storage is crucial to overcome global warming
– Why don't hydroelectricity or batteries suffice?
• Energy storage does not scale
70
https://arxiv.org/pdf/2010.09435.pdf

Open Catalyst 2020
• Motivation: new catalyst development for renewable energy storage
• Overview paper:
– Solar and wind energy can be stored in the form of hydrogen or methane
– Improving hydrogen and methane reaction processes is the key to renewable energy storage
71
https://arxiv.org/pdf/2010.09435.pdf

Open Catalyst 2020
• Catalyst: a substance that promotes a specific reaction without itself being changed.
• Dataset Paper: Technical details for dataset collection
72
Bottom pink atoms → metal surface = catalyst
Molecule on top = reactant
https://arxiv.org/pdf/2010.09435.pdf https://opencatalystproject.org/

Open Catalyst 2020
• Combinations of various molecules on various metals
• Covers the main reactions related to renewable energy
• Data size: 130M!
73
https://arxiv.org/pdf/2010.09435.pdf

Open Catalyst 2022
• Subsequent work focusing on Oxygen Evolution Reaction (OER) catalysts
• 9.8M datapoints
74
https://arxiv.org/abs/2206.08917

Matlantis – Providing universal NNP on SaaS
• Matlantis is provided as a paid service to
maximize the benefit of the universal NNP worldwide
– User support & success
– Regular improvements / updates
– Library maintenance
– Feedback & improvement loop
75

Foundation Models, Generative AI, LLM
76
Foundation Model
• Stable Diffusion, ChatGPT…
• Foundation models are used in various tasks
• The model provider cannot extract
all the potential of a foundation model
– Users explore & find “new value”
(Figure: foundation model serving App 1, App 2, App 3, etc.)

PFP as foundation models for atomistic simulations
77
• We don't know the full capability
of PFP, a universal NNP
– Various knowledge can be
obtained by utilizing the model
– We hope some users win the
Nobel Prize for new-materials
discovery by utilizing the PFP system
(Figure: PFP at the center, driving structural relaxation,
reaction analysis, molecular dynamics, etc.)

78