Slide 1

Slide 1 text

Machine Learning for Materials
6. Artificial Neural Networks
Aron Walsh, Department of Materials, Centre for Processable Electronics
Module MATE70026

Slide 2

Slide 2 text

Module Contents 1. Introduction 2. Machine Learning Basics 3. Materials Data 4. Crystal Representations 5. Classical Learning 6. Artificial Neural Networks 7. Building a Model from Scratch 8. Accelerated Discovery 9. Generative Artificial Intelligence 10. Recent Advances

Slide 3

Slide 3 text

Class Outline Artificial Neural Networks A. From neuron to perceptron B. Network architecture and training C. Convolutional neural networks

Slide 4

Slide 4 text

Artificial Neuron Neurons transmit chemical and electrical signals in the brain. Artificial neurons mimic this behaviour using mathematical functions. Image: BioMed Research International. Biological neuron → artificial neuron: cell nucleus → node; dendrites → input; synapses (interconnects) → weights; axon → output. The human brain has ~10^11 neurons and 10^15 synapses (~10^15 FLOPS)

Slide 5

Slide 5 text

Artificial Neuron The perceptron is a binary neural network classifier: weighted inputs produce an output of 0 or 1. F. Rosenblatt, Cornell Aeronautical Laboratory, Report 85-460-1 (1957). Inputs x1…x5 are combined with weights w1…w5 to give y = f(w·x + b), where f is the activation function, w·x the weighted input, and b a constant bias. if ∑ x_i·w_i + b > threshold: output = 1, else output = 0. Weights are adjusted to minimise the model error

Slide 6

Slide 6 text

The function “zip” pairs the elements of x and w together in tuples
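As an illustration, a minimal perceptron of this kind can be written in a few lines of Python (the input, weight, and bias values below are placeholders):

# Minimal perceptron: weighted sum of inputs plus bias, followed by a step function
def perceptron(x, w, b, threshold=0.0):
    # zip pairs the elements of x and w together in tuples: (x1, w1), (x2, w2), ...
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if weighted_sum > threshold else 0

# Example with five inputs and five weights, as in the previous slide
x = [0.5, 1.0, 0.0, 1.0, 0.2]
w = [0.4, -0.6, 0.9, 0.3, 0.8]
print(perceptron(x, w, b=0.1))  # prints 0 or 1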

Slide 7

Slide 7 text

Class Outline Artificial Neural Networks A. From neuron to perceptron B. Network architecture and training C. Convolutional neural networks

Slide 8

Slide 8 text

Neural Network Architecture Image generator: https://alexlenail.me/NN-SVG Basic neural network: One or two layers Deep neural network: Three or more layers Three layer model (input layer is excluded in counting)

Slide 9

Slide 9 text

Neural Network Architecture Image generator: https://alexlenail.me/NN-SVG Basic neural network: One or two layers Deep neural network: Three or more layers Five layer model Note the layer 2 bottleneck Why? Compressed representation

Slide 10

Slide 10 text

Activation Function w·x+b is simply a linear combination. Activation function f(w·x+b) introduces non-linearity. Image from https://towardsdatascience.com: common activation functions and their derivatives, from the perceptron step function and those popular in early models to those common for deep learning.
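For reference, a minimal sketch of such activation functions and their derivatives in NumPy (illustrative, not taken from the slide):

import numpy as np

def step(z):                      # perceptron activation (derivative is zero almost everywhere)
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):                   # popular in early models
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):              # derivative: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):                      # common for deep learning
    return np.maximum(0.0, z)

def relu_grad(z):                 # derivative: 1 for z > 0, 0 otherwise
    return np.where(z > 0, 1.0, 0.0)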

Slide 11

Slide 11 text

Activation Function Corresponding weights and thresholds are learned (fit) during model training. Image from https://towardsdatascience.com: the same activation functions and derivatives as on the previous slide.

Slide 12

Slide 12 text

Universal Function Approximators Multilayer neural networks can approximate any continuous function to any desired accuracy. K. Hornik, M. Stinchcombe and H. White, Neural Networks 2, 359 (1989). Practical performance will depend on the number of hidden layers, choice of activation function, and training data

Slide 13

Slide 13 text

Universal Function Approximators Multilayer neural networks can approximate any continuous function to any desired accuracy S. J. D. Prince “Understanding Deep Learning” Approximation of a 1D function (dashed line) by a piecewise linear model
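This construction can be reproduced with a single hidden layer of ReLU units, whose weighted sum of hinge functions is exactly a piecewise linear model. A minimal sketch with illustrative weights:

import numpy as np

def shallow_relu_network(x, w1, b1, w2, b2):
    """One hidden layer: y = w2 · relu(w1*x + b1) + b2, piecewise linear in x."""
    hidden = np.maximum(0.0, np.outer(x, w1) + b1)   # shape (n_points, n_hidden)
    return hidden @ w2 + b2

x = np.linspace(0, 1, 101)
w1 = np.array([1.0, 1.0, 1.0])      # three hidden units
b1 = np.array([0.0, -0.3, -0.7])    # joints of the piecewise linear fit at x = 0, 0.3, 0.7
w2 = np.array([1.0, -2.0, 3.0])     # output weights set the slope of each segment
y = shallow_relu_network(x, w1, b1, w2, b2=0.0)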

Slide 14

Slide 14 text

Universal Function Approximators S. J. D. Prince “Understanding Deep Learning” The combination of two single-layer networks with three hidden units each

Slide 15

Slide 15 text

Universal Function Approximators Extrapolation outside the training region is not guaranteed (there is no fixed functional form). Four models with the same performance (in the grey region). Be cautious with out-of-distribution (OOD) applications

Slide 16

Slide 16 text

Types of Layer in Deep Learning Layers are combined to learn representations and capture data patterns effectively
• Dense (fully connected): each neuron connected to every neuron in the previous layer
• Convolutional: filter applied to grid-like input, extracting features
• Pooling: reduce spatial dimensions, retaining key information
• Recurrent: incorporate feedback loops for sequential data flow
• Dropout: randomly zero out inputs to mitigate overfitting in training
• Embedding: map categorical variables into continuous vectors
• Upscaling: increase spatial resolution of feature maps
Self-study is needed if you want to delve deeper into these (a short sketch follows below)
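A sketch in PyTorch (purely illustrative layer sizes) showing how several of these layer types are combined in a small image classifier:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional: filters over grid-like input
    nn.ReLU(),
    nn.MaxPool2d(2),                            # pooling: reduce spatial dimensions
    nn.Flatten(),
    nn.Dropout(p=0.25),                         # dropout: randomly zero inputs during training
    nn.Linear(8 * 14 * 14, 64),                 # dense (fully connected) layer
    nn.ReLU(),
    nn.Linear(64, 2),                           # two output classes
)

x = torch.randn(1, 1, 28, 28)                   # one single-channel 28x28 image
print(model(x).shape)                           # torch.Size([1, 2])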

Slide 17

Slide 17 text

Backpropagation (Backprop) An algorithm used to adjust the weights of a neural network using gradient descent (working backwards from the output layer). I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”. Application of the chain rule at each training iteration
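A minimal sketch of the chain rule at the heart of backprop, for a single sigmoid neuron trained on one data point by gradient descent (illustrative values):

import numpy as np

# Forward pass for one neuron: y_pred = sigmoid(w*x + b), loss = (y_pred - y)^2
x, y = 0.5, 1.0
w, b = 0.2, 0.1
learning_rate = 0.5

for step in range(100):
    z = w * x + b
    y_pred = 1.0 / (1.0 + np.exp(-z))
    loss = (y_pred - y) ** 2

    # Backward pass (chain rule): dL/dw = dL/dy_pred * dy_pred/dz * dz/dw
    dL_dy = 2.0 * (y_pred - y)
    dy_dz = y_pred * (1.0 - y_pred)
    dL_dw = dL_dy * dy_dz * x
    dL_db = dL_dy * dy_dz

    # Gradient descent update of the weight and bias
    w -= learning_rate * dL_dw
    b -= learning_rate * dL_db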

Slide 18

Slide 18 text

Backpropagation (Backprop) Backprop efficiently computes gradients, enabling networks to learn parameters by error minimisation. I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”
Limitations • Slow training • Failure to converge • Local minima
Improvements • Stochastic gradient descent (use a random subset of data) • Batch normalisation (avoid vanishing gradients) • Adaptive learning rates (more robust convergence)

Slide 19

Slide 19 text

Nobel Prize in Physics 2024 https://www.nobelprize.org/prizes/physics “Foundational discoveries and inventions that enable machine learning with artificial neural networks” Hopfield network: deterministic recurrent network of binary nodes (for local minima). Boltzmann machine: stochastic recurrent network of visible/hidden nodes (for global minima)

Slide 20

Slide 20 text

Class Outline Artificial Neural Networks A. From neuron to perceptron B. Network architecture and training C. Convolutional neural networks

Slide 21

Slide 21 text

Quiz What features could you use to separately cluster dogs and cats? Image from https://www.halifaxhumanesociety.org

Slide 22

Slide 22 text

Feature Identification Significant progress has been made for image processing in the field of computer vision. Algorithmic edge detection (e.g. intensity gradients). J. Canny, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8, 679 (1986). Popular packages for classical filters include ImageJ and scikit-image
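For example, a classical edge detector can be applied in a few lines with scikit-image (a sketch using its built-in test image; check the package documentation for current options):

from skimage import data, feature

image = data.camera()                      # built-in grayscale test image
edges = feature.canny(image, sigma=2.0)    # Canny edge detector: boolean edge map
print(edges.shape, edges.dtype)            # same shape as the input, dtype bool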

Slide 23

Slide 23 text

Feature Identification Significant progress has been made for image processing in the field of computer vision Deep learning image and language model (e.g. DenseCap) J. Johnson, A. Karpathy, L. Fei-Fei, arXiv:1511.07571 (2015)

Slide 24

Slide 24 text

Images as Arrays Pixels are a convenient representation, but inefficient, e.g. a 1 MP image = 1,000,000 pixels. Image generated by the DALL·E 3 text-to-image model. Image → array of pixels → feature vector. Decision boundaries are difficult to define, e.g. to distinguish between animals based on pixels alone

Slide 25

Slide 25 text

Images as Arrays Each layer in a deep neural network can improve the representation of the preceding layer. Image generated by the DALL·E 3 text-to-image model. The initial sparse input (a grid of pixel values) is densified as it passes through each layer of the network, ending in a classification model with output {1, 0} → {“dog”, “cat”}

Slide 26

Slide 26 text

Convolutional Filters Small matrices (kernels) that extract features from data by performing localised operations.
2D input data (6×6):
1 0 1 0 1 0
0 1 1 0 1 1
1 0 1 0 1 0
1 0 1 1 1 0
0 1 1 0 1 1
1 0 1 0 1 0
Kernel (filter) of learned weights (3×3):
1 2 3
4 5 6
7 8 9
Input * Kernel = Output. The kernel passes over the input data, capturing patterns at different locations, enabling the network to learn and detect specific features. Filters are translation equivariant and can be tailored for rotational symmetry

Slide 27

Slide 27 text

Convolutional Filters Small matrices (kernels) that extract features from data by performing localised operations. For the same 6×6 input data and 3×3 kernel of learned weights as on the previous slide, the first output element is the sum of element-wise products over the top-left window: 1*1 + 0*2 + 1*3 + 0*4 + 1*5 + 1*6 + 1*7 + 0*8 + 1*9 = 31. Filters are translation equivariant and can be tailored for rotational symmetry
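The value above can be reproduced directly in NumPy using the top-left 3×3 window of the input and the kernel of learned weights:

import numpy as np

patch = np.array([[1, 0, 1],
                  [0, 1, 1],
                  [1, 0, 1]])          # top-left 3x3 window of the 6x6 input

kernel = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])         # kernel of learned weights

print(np.sum(patch * kernel))          # element-wise product then sum -> 31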

Slide 28

Slide 28 text

Convolutional Filters Image: I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”

Slide 29

Slide 29 text

Quiz What would these kernels do to an image of the proposed room-temperature superconductor LK-99?
Kernel A:
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Kernel B:
-1 -1 -1
 2  2  2
-1 -1 -1
Kernel C:
-1 -1 -1
-1  8 -1
-1 -1 -1

Slide 30

Slide 30 text

Quiz What would these kernels do to an image of the proposed room-temperature superconductor LK-99?
Kernel A (blur):
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Kernel B (horizontal lines):
-1 -1 -1
 2  2  2
-1 -1 -1
Kernel C (edge detection):
-1 -1 -1
-1  8 -1
-1 -1 -1
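These effects can be checked numerically by convolving any grayscale array with the three kernels, e.g. with SciPy (a sketch using a placeholder image):

import numpy as np
from scipy import ndimage

blur  = np.full((3, 3), 1 / 9)                               # Kernel A: box blur
lines = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]])    # Kernel B: horizontal lines
edges = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])  # Kernel C: edge detection

image = np.random.rand(64, 64)          # placeholder grayscale image
for kernel in (blur, lines, edges):
    filtered = ndimage.convolve(image, kernel, mode="reflect")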

Slide 31

Slide 31 text

Convolutional Neural Networks (CNN) A type of neural network used for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) US Postal Service Challenge Computer recognition of handwritten zip codes

Slide 32

Slide 32 text

Convolutional Neural Networks (CNN) A type of neural network used for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) Input Direct images rather than feature vectors Output {0,1,2,3,4,5,6,7,8,9} 16x16 pixels Model 1000 neurons 9760 parameters

Slide 33

Slide 33 text

Convolutional Neural Networks (CNN) A type of neural network used for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) Training 7291 examples Testing 2007 examples (included hand-chosen ambiguous or unclassifiable samples)

Slide 34

Slide 34 text

Convolutional Neural Networks (CNN) LeNet-5 was the fifth evolution of this network and became a standard in the field. Y. LeCun et al, Proc. IEEE 86, 2278 (1998). Higher resolution input (MNIST dataset). Cn = convolutional layer n (extract features); Sn = sub-sampling layer n (reduce spatial dimensions)
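For orientation, a LeNet-style stack of convolutional (C), sub-sampling (S), and fully connected layers can be sketched in PyTorch (illustrative, not a faithful reimplementation of LeNet-5):

import torch
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # C1: 32x32 input -> 6 feature maps of 28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                  # S2: sub-sample to 6 x 14x14
    nn.Conv2d(6, 16, kernel_size=5),  # C3: 16 feature maps of 10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                  # S4: sub-sample to 16 x 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # C5 (fully connected)
    nn.Tanh(),
    nn.Linear(120, 84),               # F6
    nn.Tanh(),
    nn.Linear(84, 10),                # ten digit classes
)

x = torch.randn(1, 1, 32, 32)         # one 32x32 grayscale digit
print(lenet(x).shape)                 # torch.Size([1, 10])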

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Towards State of the Art (SOTA) Example from https://towardsdatascience.com: the VGG16 computer vision model. Softmax is an activation function common in the output layer of a neural network for classification tasks. Modern deep learning models combine many layer types with 10^3-10^12 parameters

Slide 37

Slide 37 text

Towards State of the Art (SOTA) Modern deep learning models combine many layer types with 10^3-10^12 parameters. Softmax: σ(z_i) = e^(z_i/T) / Σ_{j=1}^{n} e^(z_j/T), where the denominator acts as a partition function. Example: input vector (3, 6, 7, 11, 4) → class probabilities (0.00, 0.03, 0.04, 0.92, 0.01). Appearance of the Boltzmann distribution (deep learning models often borrow from statistical mechanics)
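A sketch of the softmax with temperature T in NumPy (the input vector matches the slide; the exact probabilities depend on the choice of T):

import numpy as np

def softmax(z, T=1.0):
    """sigma(z_i) = exp(z_i / T) / sum_j exp(z_j / T)"""
    z = np.asarray(z, dtype=float)
    e = np.exp(z / T - np.max(z / T))   # subtract the maximum for numerical stability
    return e / e.sum()

logits = np.array([3, 6, 7, 11, 4])
print(softmax(logits))                  # probabilities summing to 1, dominated by the largest logit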

Slide 38

Slide 38 text

CNN for Ferroelectrics Convolutional network to map between local domain structures and switching behaviour. S. V. Kalinin et al, ACS Appl. Mater. Inter. 13, 1693 (2021). Encoder (2D convolutions) → Decoder (1D convolutions)

Slide 39

Slide 39 text

Application to Materials Images from: https://distill.pub/2021/understanding-gnns Pixel-based convolutions Graph-based convolutions Information is stored on each piece of the graph, i.e. vectors associated with the nodes, edges and global attributes Information exchange Graph Convolutional Neural Networks (GCNN) are designed for graph-structured data

Slide 40

Slide 40 text

Application to Materials Input graphs can be transformed and tailored for regression or classification tasks (V = vertex, E = edge, U = global attribute). Images from: https://distill.pub/2021/understanding-gnns. End-to-end prediction; graph update
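For intuition, one graph-convolution update can be sketched as each node aggregating information from its neighbours before mixing with learned weights (a toy illustration, not any particular library's implementation):

import numpy as np

adjacency = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]])          # 3-node graph: node 0 connected to nodes 1 and 2

node_features = np.random.rand(3, 4)       # one 4-dimensional feature vector per node
weights = np.random.rand(4, 4)             # learned weight matrix (random here)

# Aggregate neighbour information (including self-connections), then mix with the weights
agg = (adjacency + np.eye(3)) @ node_features
updated = np.tanh(agg @ weights)           # updated node representations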

Slide 41

Slide 41 text

Application to Materials Crystal Graph Convolutional Neural Networks (CGCNN) are being used for materials modelling C. Chen et al, Chem. Mater. 31, 3564 (2019) MEGNet (two-body connections); https://github.com/materialsvirtuallab/matgl

Slide 42

Slide 42 text

Application to Materials A new generation of universal force fields that can predict energies and forces for any material C. Chen and S. P. Ong, Nature Computational Science 2, 718 (2022) Input Graph Output Energy Force Stress M3GNet; https://github.com/materialsvirtuallab/matgl Beyond pairwise interactions

Slide 43

Slide 43 text

Application to Materials A new generation of universal force fields that can predict energies and forces for any material I. Batatia et al, arXiv:2401.00096 (2023); pip install mace-torch
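Such a force field can be used as an ASE calculator. A minimal sketch, assuming the mace_mp entry point provided by the mace-torch package (check the MACE documentation for current arguments and model names):

from ase.build import bulk
from mace.calculators import mace_mp   # foundation model trained on Materials Project data

atoms = bulk("Cu", "fcc", a=3.6)       # simple test crystal
atoms.calc = mace_mp(model="medium", device="cpu")

print(atoms.get_potential_energy())    # energy in eV
print(atoms.get_forces())              # forces in eV/Angstrom, one vector per atom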

Slide 44

Slide 44 text

Class Outcomes 1. Describe how a perceptron works 2. Distinguish between different types of deep learning architectures 3. Specify how convolutional neural networks work and can be applied to materials problems Activity: Learning microstructure