Slide 1

Slide 1 text

Aron Walsh Department of Materials Centre for Processable Electronics Machine Learning for Materials 4. Crystal Representations Module MATE70026

Slide 2

Slide 2 text

Module Contents
1. Introduction
2. Machine Learning Basics
3. Materials Data
4. Crystal Representations
5. Classical Learning
6. Artificial Neural Networks
7. Building a Model from Scratch
8. Accelerated Discovery
9. Generative Artificial Intelligence
10. Recent Advances

Slide 3

Slide 3 text

Class Outline Crystal Representations A. Compositional B. Structural C. Graphs

Slide 4

Slide 4 text

Representation of Materials
Model performance depends on the choice of compositional and structural features
Minimal representation: ab initio quantum mechanics (QM)
Input: atomic number, Z; coordinates, R
Output: properties
$\hat{H}|\Psi\rangle = E|\Psi\rangle$, where $\Psi$ is the electronic wavefunction
Effective representation: supervised machine learning (ML)
Input: feature vector, $\mathbf{X}$
Output: properties
$y = f(\mathbf{X}, \boldsymbol{\Theta})$, where $\boldsymbol{\Theta}$ are the learned weights
K. T. Butler et al, Nature 559, 547 (2018)

Slide 5

Slide 5 text

How to Best Represent a Molecule? Networks of atoms (nodes) connected by bonds (edges) J. J. Sylvester, Am. J. Math. 1, 64 (1878)

Slide 6

Slide 6 text

How to Best Represent a Material?
Many possible materials features from atomistic to macroscopic length scales
Electronic: wavefunctions or electron density (Å)
Atomic scale: local atomic connectivity (nm)
Microstructure: grain size and orientation (µm)
Macroscale: shape (cm)
Image after Taylor Sparks (University of Utah)

Slide 7

Slide 7 text

Hot Encoding
We can use an n-dimensional vector to categorise the atomic number of the elements in a compound
Element (one-hot): [100000000...] over H He Li Be B C N O F…
Compound (multi-hot): [000001010...] over H He Li Be B C N O F…
'1' indicates the presence of that specific element and '0' its absence
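
The encoding on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's reference code: the nine-element list is the truncated one shown on the slide, and the function names `one_hot` and `multi_hot` are chosen here for illustration.

```python
# Truncated element list taken from the slide; a real encoder would cover all elements.
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F"]


def one_hot(symbol):
    """Return a vector with a single 1 at the position of `symbol`."""
    vector = [0] * len(ELEMENTS)
    vector[ELEMENTS.index(symbol)] = 1
    return vector


def multi_hot(symbols):
    """Return a vector with a 1 for every element present in the compound."""
    vector = [0] * len(ELEMENTS)
    for symbol in set(symbols):
        vector[ELEMENTS.index(symbol)] = 1
    return vector


hydrogen = one_hot("H")                # element vector from the slide
carbon_oxygen = multi_hot(["C", "O"])  # compound containing C and O
```

Note that a multi-hot vector records only which elements are present, so compounds that share the same elements but differ in stoichiometry map to the same vector.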

Slide 8

Slide 8 text

Hand-Built (Local) Representations
We can define elemental feature vectors based on standard properties of the elements
22-dimensional Magpie representation from L. Ward et al, npj Comp. Mater. 2, 16028 (2016)
https://github.com/WMD-group/ElementEmbeddings

Slide 9

Slide 9 text

Hand-Built (Local) Representations
We can also define compound feature vectors based on standard properties of the elements
X(Fe2O3) = [2X(Fe) + 3X(O)]/5
        X1    X2    X3    …  Xn
Fe      0.52  0.11  0.01  …  0.80
O       0.32  0.23  0.14  …  0.64
Fe2O3   0.40  0.18  0.09  …  0.70
Different types of pooling are possible (e.g. max, min, mean)
https://github.com/WMD-group/ElementEmbeddings
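
The pooling step can be checked numerically. The sketch below reuses the illustrative Fe and O vectors from the slide (only the four columns shown) and a hypothetical helper `pool`; it does not call the ElementEmbeddings package.

```python
# Illustrative element vectors copied from the slide (columns X1, X2, X3, Xn).
ELEMENT_VECTORS = {
    "Fe": [0.52, 0.11, 0.01, 0.80],
    "O": [0.32, 0.23, 0.14, 0.64],
}


def pool(composition, mode="mean"):
    """Pool element vectors into one compound vector.

    composition maps element symbol -> number of atoms in the formula unit.
    """
    vectors = [ELEMENT_VECTORS[el] for el in composition]
    if mode == "mean":
        # Composition-weighted mean, e.g. [2X(Fe) + 3X(O)]/5 for Fe2O3
        total = sum(composition.values())
        return [
            sum(n * ELEMENT_VECTORS[el][k] for el, n in composition.items()) / total
            for k in range(len(vectors[0]))
        ]
    if mode == "max":
        return [max(column) for column in zip(*vectors)]
    if mode == "min":
        return [min(column) for column in zip(*vectors)]
    raise ValueError(f"unknown pooling mode: {mode}")


fe2o3 = {"Fe": 2, "O": 3}
mean_vector = pool(fe2o3, "mean")
max_vector = pool(fe2o3, "max")
```

The mean-pooled vector reproduces the Fe2O3 row of the table after rounding to two decimal places. Max and min pooling ignore stoichiometry, which is one reason the weighted mean is a common default.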

Slide 10

Slide 10 text

Learned (Distributed) Representations
We can learn continuous feature vectors with elemental information as part of model training
SkipSpecies (200 D): structure graph pooling
Mat2Vec (200 D): literature word embedding
https://github.com/WMD-group/ElementEmbeddings

Slide 11

Slide 11 text

Element Embeddings
Toolkit to access and modify elemental and compositional representations for machine learning
Latest embeddings: CrystaLLM, SkipSpecies, CGNF
Dr Anthony Onwuli
https://github.com/WMD-group/ElementEmbeddings

Slide 12

Slide 12 text

Learned Chemical Similarity
Quantify with distance (e.g. Chebyshev), similarity (e.g. Cosine), or correlation (e.g. Pearson) metrics
Cosine similarity: $\cos\theta = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert\mathbf{A}\rVert \, \lVert\mathbf{B}\rVert}$
Name       Dimension  Type
Magpie     22         Element properties
Mat2Vec    200        Chemical abstracts
Skipatom   200        Crystal structure graphs
MegNet     16         Graph neural network
CrystaLLM  512        Crystal structure text
Anthony Onwuli et al, Digital Discovery 2, 1558 (2023)
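
The three kinds of metric named on this slide can be written directly from their definitions. This is a sketch in plain Python; the two short vectors are hypothetical stand-ins for element embeddings, not values from any of the listed representations.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = A.B / (|A| |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def chebyshev_distance(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))


def pearson_correlation(a, b):
    """Pearson r, computed as the cosine similarity of mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centred_a = [x - mean_a for x in a]
    centred_b = [y - mean_b for y in b]
    return cosine_similarity(centred_a, centred_b)


# Hypothetical embedding vectors: vec_b is vec_a scaled by two.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]
similarity = cosine_similarity(vec_a, vec_b)
distance = chebyshev_distance(vec_a, vec_b)
correlation = pearson_correlation(vec_a, vec_b)
```

Because `vec_b` is a scaled copy of `vec_a`, the cosine similarity and Pearson correlation are both 1 while the Chebyshev distance is non-zero: the metrics answer different questions about the same pair of vectors.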

Slide 13

Slide 13 text

Learned Chemical Similarity Dimensionality reduction confirms a natural clustering of elements into “groups” Principal Component Analysis (PCA) Anthony Onwuli et al, Digital Discovery 2, 1558 (2023)

Slide 14

Slide 14 text

Class Outline Crystal Representations A. Compositional B. Structural C. Graphs

Slide 15

Slide 15 text

Many Possible Materials Features Implemented in https://github.com/hackingmaterials/matminer

Slide 16

Slide 16 text

Learn from Crystallography
High symmetry crystal: MgO, cubic, 8 atom unit cell, a = b = c
Low symmetry crystal: BiVO4, monoclinic, 24 atom unit cell, a ≠ b ≠ c

Slide 17

Slide 17 text

Learn from Crystallography
7 crystal systems, 14 Bravais lattices, 230 space groups, 103 prototype structures
Conventional description
Unit cell (ℒ): a, b, c, α, β, γ
Fractional coordinates (𝒳): (x1, y1, z1)…
Atom types (𝒜): Sn, Ti, O…
Problem for ML: conventional description lacks invariance*
*with respect to atomic permutation, unit cell rotations, and translations

Slide 18

Slide 18 text

Unit Cell Transformations
The same structure is described in each case
Two-atom orthorhombic unit cell
a  b  c        4    5    6
x1 y1 z1       0    0    0
x2 y2 z2       0.5  0.5  0.5
Atomic permutation
4    5    6
0.5  0.5  0.5
0    0    0
Unit cell translation
4    5    6
0.0  0.5  0.5
0.5  0    0
Crystal rotation
5    4    6
0.5  0.5  0.5
0    0    0
ML models based on variant representations struggle to generalise
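
A short check makes the problem concrete. The sketch below takes the four descriptions of the two-atom cell from this slide, shows that a naive concatenated vector differs for each, and compares it with one simple invariant alternative: the sorted list of minimum-image interatomic distances. The helper names are hypothetical, the distance code assumes an orthorhombic cell, and atom types are ignored for brevity.

```python
import math


def flat_vector(cell, frac_coords):
    """Naive representation: concatenate cell lengths and fractional coordinates."""
    return list(cell) + [x for site in frac_coords for x in site]


def pair_distances(cell, frac_coords):
    """Sorted minimum-image distances for an orthorhombic cell."""
    distances = []
    for i in range(len(frac_coords)):
        for j in range(i + 1, len(frac_coords)):
            squared = 0.0
            for length, xi, xj in zip(cell, frac_coords[i], frac_coords[j]):
                delta = xj - xi
                delta -= round(delta)  # wrap to the nearest periodic image
                squared += (delta * length) ** 2
            distances.append(math.sqrt(squared))
    return sorted(distances)


# (cell lengths, fractional coordinates) for each description on the slide
descriptions = {
    "original": ((4, 5, 6), [(0, 0, 0), (0.5, 0.5, 0.5)]),
    "permutation": ((4, 5, 6), [(0.5, 0.5, 0.5), (0, 0, 0)]),
    "translation": ((4, 5, 6), [(0.0, 0.5, 0.5), (0.5, 0, 0)]),
    "rotation": ((5, 4, 6), [(0.5, 0.5, 0.5), (0, 0, 0)]),
}
flat = {name: flat_vector(*d) for name, d in descriptions.items()}
invariant = {name: pair_distances(*d) for name, d in descriptions.items()}
```

All four `flat` vectors are different, whereas every entry of `invariant` holds the same single distance. A distance list alone is not a complete descriptor, but it illustrates what invariance buys: the model no longer has to learn that these inputs are the same crystal.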

Slide 19

Slide 19 text

Structural Representations
Many structural descriptors have been developed
• Atom-Centered Symmetry Functions (Behler, 2011) - site expansion of radial and angular terms
• Coulomb Matrix (Rupp et al, 2012) - mimics electrostatic interactions (qi qj / rij)
• Many Body Tensor Representation (Huo et al, 2017) - distribution of local structural motifs
• Atomic Cluster Expansion (Drautz, 2019) - high body-order expansion of atomic environments
Several are implemented in https://singroup.github.io/dscribe

Slide 20

Slide 20 text

Real Space Grid Voxels (three-dimensional pixels) used in computer graphics can describe a unit cell Image courtesy of Taylor Sparks (University of Utah) Used in early materials ML, but not recommended for structure

Slide 21

Slide 21 text

Pairwise Interatomic Distances
The Coulomb matrix is a global descriptor that mimics the electrostatic interaction between nuclei
The sine matrix is a modification that accounts for periodicity
Implemented in https://singroup.github.io/dscribe
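
For a finite set of atoms the Coulomb matrix is short enough to write out by hand. The sketch below follows the usual definition from Rupp et al (2012), with 0.5 Z_i^2.4 on the diagonal and Z_i Z_j / |R_i - R_j| off the diagonal. It is plain Python rather than the DScribe implementation, and the water-like coordinates are approximate values chosen only for illustration.

```python
import math


def coulomb_matrix(atomic_numbers, positions):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / r_ij elsewhere."""
    n = len(atomic_numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * atomic_numbers[i] ** 2.4
            else:
                r_ij = math.dist(positions[i], positions[j])
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / r_ij
    return matrix


# Approximate, illustrative geometry for an O-H-H molecule (arbitrary length unit)
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
cm = coulomb_matrix([8, 1, 1], positions)

# Same molecule with the first two atoms listed in the opposite order
permuted = coulomb_matrix([1, 8, 1], [positions[1], positions[0], positions[2]])
same_entries = sorted(v for row in cm for v in row) == sorted(
    v for row in permuted for v in row
)
```

The matrix depends only on interatomic distances, so it is unchanged by rotation and translation, but `cm` and `permuted` differ because the rows and columns follow the atom ordering. Sorting rows by norm or using the eigenvalue spectrum are common ways to remove that dependence.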

Slide 22

Slide 22 text

Invariant Structural Representations Comprehensive review: F. Musil et al, Chem. Rev. 121, 9759 (2021)

Slide 23

Slide 23 text

Invariant Structural Representations
Atomic Cluster Expansion (ACE) provides a systematic representation of atomic environments through radial (R) and angular (Y) terms
Site basis function: $\phi(\mathbf{r}) = R_l Y_l^m$
Permutation invariance: $\mathbf{A}_i = \sum_{\text{neighbours}} \phi(\mathbf{r})$
Rotation (Q) invariance: $\mathbf{B}_i = \int \mathbf{A}_i \, dQ$
Product basis B forms a body-order expansion
Property = $f(\mathbf{B}, \boldsymbol{\Theta})$, with weights $\boldsymbol{\Theta}$
ACE is used in linear and deep learning models for materials
R. Drautz, Phys. Rev. B 99, 014104 (2019); arXiv:2311.16326 (2023)
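
The two invariances can be illustrated with a deliberately reduced toy, not an ACE implementation. The sketch keeps only radial terms (the l = 0 case, where the angular factor is constant and the rotational average is trivial) and sums them over neighbours. The Gaussian radial functions, their centres and width, and the neighbour positions are all assumptions made here for illustration.

```python
import math

CENTRES = [1.0, 2.0, 3.0]  # hypothetical radial basis centres
WIDTH = 0.5                # hypothetical Gaussian width


def radial_features(centre, neighbours):
    """Sum Gaussian radial functions over neighbours (l = 0 terms only)."""
    features = []
    for mu in CENTRES:
        total = 0.0
        for position in neighbours:
            r = math.dist(centre, position)
            total += math.exp(-((r - mu) ** 2) / (2 * WIDTH ** 2))
        features.append(total)
    return features


def rotate_z(position, angle):
    """Rotate a point about the z axis through the origin."""
    x, y, z = position
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


origin = (0.0, 0.0, 0.0)
neighbours = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]

reference = radial_features(origin, neighbours)
permuted = radial_features(origin, list(reversed(neighbours)))
rotated = radial_features(origin, [rotate_z(p, 0.7) for p in neighbours])
```

The sum over neighbours removes any dependence on atom ordering, and using only distances removes any dependence on orientation, so `reference`, `permuted` and `rotated` agree to floating-point precision. The full ACE goes further by retaining angular terms and forming products of the A basis before symmetrising, which is what gives the body-order expansion.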

Slide 24

Slide 24 text

ML Powered Molecular Dynamics J. D. Morrow, J. L. A. Gardner and V. Deringer, J. Chem. Phys. 158, 121501 (2023) Octahedral tilt correlation Classical models are being complemented by machine learning force fields (MLFF) Three state-of-the-art implementations based on equivariant neural network regression are MACE, Allegro, and SevenNet

Slide 25

Slide 25 text

ML Powered Molecular Dynamics Xia Liang et al, J. Phys. Chem. C 127, 12941 (2023) Octahedral tilt correlation Enable large-scale simulations of complex materials such as organic-inorganic solids 69,120 atom simulation of CsPbI3 perovskite based on the atomic cluster expansion (ACE) Animation by Will Baldwin (Small 20, 2303565, 2024)

Slide 26

Slide 26 text

Class Outline Crystal Representations A. Compositional B. Structural C. Graphs

Slide 27

Slide 27 text

Graphs Graphs are a representation common to many domains and problems Image courtesy of Michael Bronstein (University of Oxford)

Slide 28

Slide 28 text

Graphs P. W. Battaglia et al, arXiv:1806.01261 (2018)

Slide 29

Slide 29 text

Graph Components
Nodes (Vertices), Edges, Global Attributes
Crystal systems:
N – atoms
E – bonds
G – unit cell or materials properties
[Figure: nodes (N), edges, and a global attribute]
Vectors can be associated with each component to encode & exchange information
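
One way to picture the three components is as a small container with a feature vector attached to each node, each edge, and the graph as a whole. The class below is a hypothetical sketch in plain Python, and the `aggregate` method shows the simplest form of information exchange: summing the vectors of neighbouring nodes. All feature values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Container for node, edge, and global feature vectors."""
    nodes: dict                      # node id -> feature vector
    edges: dict                      # (id, id) undirected pair -> feature vector
    global_attrs: list = field(default_factory=list)

    def neighbours(self, node_id):
        """Return ids of nodes that share an edge with node_id."""
        found = []
        for a, b in self.edges:
            if a == node_id:
                found.append(b)
            elif b == node_id:
                found.append(a)
        return found

    def aggregate(self, node_id):
        """Sum the feature vectors of all neighbours of node_id."""
        total = [0.0] * len(self.nodes[node_id])
        for other in self.neighbours(node_id):
            for k, value in enumerate(self.nodes[other]):
                total[k] += value
        return total


# Three nodes joined by three edges, with made-up one-dimensional features
triangle = Graph(
    nodes={0: [1.0], 1: [2.0], 2: [3.0]},
    edges={(0, 1): [1.5], (1, 2): [1.5], (0, 2): [1.5]},
    global_attrs=[0.0],
)
messages = {node: triangle.aggregate(node) for node in triangle.nodes}
```

Graph neural networks build on this pattern by passing the aggregated vectors through learned functions and repeating the exchange over several rounds.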

Slide 30

Slide 30 text

Graph Components Nodes (Vertices), Edges, Global Attributes Image from https://distill.pub/2021/gnn-intro Graphs can be fully connected (every node connected to every other node), but sparse connections are often used

Slide 31

Slide 31 text

Graph Components Nodes (or Vertices), Edges, Global Attributes Image from https://distill.pub/2021/gnn-intro For chemical problems, nearest-neighbour connectivity is common, as used in “ball and stick” representations Three edges Graph (excluding H nodes) Molecule (including H)

Slide 32

Slide 32 text

Crystal Graphs
Standard crystallographic representation of materials: fractional positions xyz of atoms within a unit cell formed of lattice vectors abc. Effective for humans
Crystal graph representation: nodes (atoms) connected by edges (bonds). Multiple edges can describe periodicity. Effective for ML models
T. Xie and J. C. Grossman, Phys. Rev. Lett. 120, 145301 (2018)
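
The conversion from a unit cell to a graph can be sketched with a distance cutoff. The function below is a simplified illustration, not the construction used by Xie and Grossman: it assumes an orthorhombic cell, fractional coordinates in [0, 1), and a hypothetical cutoff radius in place of a proper bonding criterion. Each edge stores the lattice translation of the neighbour, which is how multiple edges between the same pair of nodes encode periodicity.

```python
import itertools
import math


def crystal_graph(cell, frac_coords, cutoff):
    """Return directed edges (i, j, image, distance) for an orthorhombic cell.

    image is the integer lattice translation applied to atom j.
    Assumes fractional coordinates lie in [0, 1).
    """
    edges = []
    # Enough periodic images to cover the cutoff along each axis
    reach = [math.ceil(cutoff / length) for length in cell]
    ranges = [range(-n, n + 1) for n in reach]
    for i, site_i in enumerate(frac_coords):
        for j, site_j in enumerate(frac_coords):
            for image in itertools.product(*ranges):
                if i == j and image == (0, 0, 0):
                    continue  # skip the atom paired with itself
                squared = sum(
                    ((xj + shift - xi) * length) ** 2
                    for xi, xj, shift, length in zip(site_i, site_j, image, cell)
                )
                distance = math.sqrt(squared)
                if distance <= cutoff:
                    edges.append((i, j, image, distance))
    return edges


# Hypothetical simple cubic crystal: one atom per cell, lattice parameter 3.0
edges = crystal_graph((3.0, 3.0, 3.0), [(0.0, 0.0, 0.0)], cutoff=3.1)
images = sorted(edge[2] for edge in edges)
```

With one atom per cell the graph has a single node, yet it carries six edges, one for each nearest-neighbour image along ±a, ±b and ±c. The graph keeps the local connectivity of the infinite crystal without storing lattice vectors or absolute coordinates.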

Slide 33

Slide 33 text

Materials Graphs Nodes can be used to represent larger structural units of a crystal or even entire grains M. Dai et al, npj Comp. Mater. 7, 103 (2021)

Slide 34

Slide 34 text

Multi-Scale Representations Ongoing efforts to combine features that bridge from the micro to macroscale; from atoms to devices S. B. Torrisi et al, APL Machine Learning 1, 020901 (2023)

Slide 35

Slide 35 text

Class Outcomes
1. Describe the ways that chemical composition can be expanded into vectors
2. Explain how the structure of a material can be represented for machine learning
3. Consider the limitations of a graph-based description of a three-dimensional structure
Activity: Navigating crystal space