
Machine Learning for Materials (Lecture 6)

Aron Walsh
February 06, 2024


Transcript

  1. Aron Walsh Department of Materials Centre for Processable Electronics Machine

    Learning for Materials 6. Artificial Neural Networks
  2. Course Contents 1. Course Introduction 2. Materials Modelling 3. Machine

    Learning Basics 4. Materials Data and Representations 5. Classical Learning 6. Artificial Neural Networks 7. Building a Model from Scratch 8. Recent Advances in AI 9. and 10. Research Challenge
  3. Class Outline Artificial Neural Networks A. From neuron to perceptron

    B. Network architecture and training C. Convolutional neural networks
  4. Artificial Neuron Neurons transmit chemical and electrical signals in the

    brain. Artificial neurons mimic this behaviour using mathematical functions. Image: BioMed Research International. Biological neuron → artificial neuron: cell nucleus → node; dendrites → input; synapse → weights (interconnects); axon → output. The human brain has ~10^11 neurons and 10^15 synapses (~10^15 FLOPS)
  5. Artificial Neuron [Diagram: inputs x1…x5 with weights w1…w5 feeding a single node]

    The perceptron is a binary neural network classifier: weighted inputs produce an output of 0 or 1. F. Rosenblatt, Cornell Aeronautical Laboratory, Report 85-460-1 (1957). y = f(w·x + b), where y is the output, f the activation function, w·x the weighted input, and b the bias (constant). If ∑ x_i w_i + b > threshold: output = 1, else output = 0. Weights are adjusted to minimise the model error
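A minimal sketch of this decision rule in NumPy may help; the input values, weights, bias, and threshold below are made up purely for illustration:

```python
import numpy as np

def perceptron(x, w, b, threshold=0.0):
    """Binary perceptron: weighted input plus bias, compared against a threshold."""
    z = np.dot(w, x) + b              # weighted input w·x + b
    return 1 if z > threshold else 0

# Hypothetical inputs and weights for illustration
x = np.array([0.5, 1.0, 0.0, 1.0, 0.2])   # five inputs x1..x5
w = np.array([0.4, -0.2, 0.1, 0.7, 0.0])  # five weights w1..w5
b = -0.3                                   # bias

print(perceptron(x, w, b))  # outputs 0 or 1
```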
  6. Class Outline Artificial Neural Networks A. From neuron to perceptron

    B. Network architecture and training C. Convolutional neural networks
  7. Neural Network Architecture Image generator: https://alexlenail.me/NN-SVG Basic neural network: One

    or two layers Deep neural network: Three or more layers Three layer model (input layer is excluded in counting)
  8. Neural Network Architecture Image generator: https://alexlenail.me/NN-SVG Basic neural network: One

    or two layers Deep neural network: Three or more layers Five layer model Note the layer 2 bottleneck Why? Compressed representation
  9. Activation Function w·x+b is simply a linear combination. The activation function

    f(w·x+b) introduces non-linearity. Image from https://towardsdatascience.com [Figure: common activation functions and their derivatives, annotated by usage: perceptron model, popular in early models, common for deep learning]
  10. Activation Function Corresponding weights and thresholds are learned (fit) during

    model training. Image from https://towardsdatascience.com [Figure: the same activation functions and derivatives as the previous slide]
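For reference, a minimal sketch of a few common activation functions and their derivatives (the step function corresponds to the original perceptron; the sigmoid was popular in early models; ReLU is common in deep learning):

```python
import numpy as np

def step(z):                 # perceptron activation (not differentiable at z = 0)
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):              # popular in early neural networks
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):         # derivative used during backpropagation
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):                 # common choice for deep learning
    return np.maximum(0.0, z)

def relu_grad(z):            # derivative of ReLU is itself a step function
    return np.where(z > 0, 1.0, 0.0)

z = np.linspace(-3, 3, 7)
print(step(z), sigmoid(z), relu(z))
```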
  11. Universal Function Approximators Multilayer neural networks can approximate any continuous

    function to any desired accuracy K. Hornik, M. Stinchcombe and H. White, Neural Networks 2, 359 (1989) Practical performance will depend on the number of hidden layers, choice of activation function, and training data available
  12. Universal Function Approximators Multilayer neural networks can approximate any continuous

    function to any desired accuracy S. J. D. Prince “Understanding Deep Learning” Approximation of a 1D function (dashed line) by a piecewise linear model
  13. Universal Function Approximators S. J. D. Prince “Understanding Deep Learning”

    The combination of two single-layer networks with three hidden units each
  14. Universal Function Approximators Performance depends on model architecture Example from

    https://writings.stephenwolfram.com Optimal trained model for each network Target function
  15. Universal Function Approximators Extrapolation outside training region is not guaranteed

    (no fixed functional form) Four models with similar training accuracy (in the white region) Example from https://writings.stephenwolfram.com Target function
  16. Types of Layer in Deep Learning Layers are combined to

    learn representations and capture data patterns effectively
    • Dense (fully connected): each neuron connected to every neuron in the adjacent layer
    • Convolutional: filter applied to grid-like input, extracting features
    • Pooling: reduce spatial dimensions, retaining key information
    • Recurrent: incorporate feedback loops for sequential data flow
    • Dropout: randomly zero out inputs to mitigate overfitting in training
    • Embedding: map categorical variables into continuous vectors
    • Upscaling: increase spatial resolution of feature maps
    Self-study is needed if you want to get deeper into these (beyond our scope)
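As a rough sketch of how several of these layer types are combined in practice, here is a hypothetical PyTorch image classifier (the channel counts and layer sizes are arbitrary choices, assuming a 1×28×28 input):

```python
import torch.nn as nn

# Hypothetical model mixing convolutional, pooling, dropout and dense layers
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolutional: extract features
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: reduce spatial dimensions (28 -> 14)
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: 14 -> 7
    nn.Flatten(),
    nn.Dropout(p=0.25),                          # dropout: mitigate overfitting during training
    nn.Linear(16 * 7 * 7, 10),                   # dense (fully connected) output layer
)
```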
  17. Towards State of the Art (SOTA) Example from https://towardsdatascience.com VGG16

    Computer Vision Model Softmax is an activation function common in the output layer of a neural network for classification tasks Modern deep learning models combine many layer types with 10^3–10^12 parameters
  18. Towards State of the Art (SOTA) Modern deep learning models

    combine many layer types with 10^3–10^12 parameters Example from https://towardsdatascience.com Appearance of the Boltzmann distribution (recall your thermodynamics lectures): softmax σ(z_i) = e^{z_i} / Σ_{j=1}^{K} e^{z_j}, where the denominator plays the role of a partition function. Example: input vector (3, 6, 7, 11, 4) → class probabilities (0.00, 0.03, 0.04, 0.92, 0.01)
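A short sketch of the softmax calculation applied to the logits shown on the slide; note that the probabilities depend only on differences between logits, and the largest logit dominates the distribution:

```python
import numpy as np

def softmax(z):
    """Convert a vector of logits into class probabilities."""
    e = np.exp(z - np.max(z))   # subtract the maximum for numerical stability
    return e / e.sum()          # the denominator plays the role of a partition function

logits = np.array([3.0, 6.0, 7.0, 11.0, 4.0])
print(softmax(logits))  # probabilities sum to 1; the class with logit 11 dominates
```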
  19. Backpropagation (Backprop) An algorithm used to adjust the weights of

    a neural network using gradient descent (from the output layer) I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”. The chain rule is applied backwards from the output layer at each training iteration
  20. Backpropagation (Backprop) Backprop efficiently computes gradients, enabling networks to learn

    parameters by error minimisation I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”
    Limitations • Slow training • Failure to converge • Local minima
    Improvements • Stochastic gradient descent (use random subset of data) • Batch normalisation (avoid vanishing gradients) • Adaptive learning rates (more robust convergence)
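A minimal illustration of gradient-descent training with automatic differentiation; PyTorch's autograd applies the chain rule for us, and the toy data, model size, and learning rate below are made up for the example:

```python
import torch

# Toy regression data (hypothetical): y = 2x + 0.5
x = torch.linspace(-1, 1, 50).reshape(-1, 1)
y = 2.0 * x + 0.5

model = torch.nn.Sequential(torch.nn.Linear(1, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)  # stochastic gradient descent
loss_fn = torch.nn.MSELoss()

for step in range(200):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)   # model error
    loss.backward()               # backpropagation: gradients via the chain rule
    optimiser.step()              # adjust weights to minimise the error

print(loss.item())
```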
  21. Class Outline Artificial Neural Networks A. From neuron to perceptron

    B. Network architecture and training C. Convolutional neural networks
  22. Quiz What features could you use to separately cluster dogs

    and cats? Image from https://www.halifaxhumanesociety.org
  23. Images as Arrays Pixels are a convenient representation, but inefficient,

    e.g. 1 MP image = 1,000,000 pixels Image: Eric Eaton and Dinesh Jayaraman (Penn Engineering) Image Array of pixels Feature vector Decision boundaries are difficult to define, e.g. to distinguish between animals based on pixels alone
  24. Feature Identification Significant progress has been made for image processing

    in the field of computer vision Algorithmic edge detection (e.g. intensity gradients) J. Canny, IEEE Trans. Pat. Anal. Mach. Intell. 6, 679 (1986) Popular packages for classical filters include ImageJ and scikit-image
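A brief sketch of Canny edge detection using scikit-image on one of its built-in test images (assuming scikit-image is installed; the sigma value is an arbitrary choice for the smoothing step):

```python
from skimage import color, data, feature

image = color.rgb2gray(data.astronaut())    # built-in example image, converted to greyscale
edges = feature.canny(image, sigma=2.0)     # Canny edge detection from intensity gradients
print(edges.shape, edges.dtype)             # boolean edge map, same size as the input
```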
  25. Feature Identification Significant progress has been made in the field

    of computer vision Deep learning image and language model (e.g. DenseCap) J. Johnson, A. Karpathy, L. Fei-Fei, arXiv:1511.07571 (2015)
  26. Feature Identification Each layer in a deep neural network can

    improve the representation of the preceding layer. Image: Eric Eaton and Dinesh Jayaraman (Penn Engineering) [Figure: an image is converted to an array of pixels and passed through a classification model to give an output {1,0}, mapped to {“dog”, “cat”}]
  27. Convolutional Filters Small matrices (kernels) to extract features from data

    by performing localised operations. 2D input data (6×6): [[1 0 1 0 1 0], [0 1 1 0 1 1], [1 0 1 0 1 0], [1 0 1 1 1 0], [0 1 1 0 1 1], [1 0 1 0 1 0]]. Kernel (filter) with learned weights (3×3): [[1 2 3], [4 5 6], [7 8 9]]. Output: the kernel passes over the input data, capturing patterns at different locations, enabling the network to learn and detect specific features. Filters can be made translation and rotational invariant (equivariant deep learning)
  28. Convolutional Filters Small matrices (kernels) to extract features from data

    by performing localised operations. 2D input data (6×6): [[1 0 1 0 1 0], [0 1 1 0 1 1], [1 0 1 0 1 0], [1 0 1 1 1 0], [0 1 1 0 1 1], [1 0 1 0 1 0]]. Kernel (filter) with learned weights (3×3): [[1 2 3], [4 5 6], [7 8 9]]. Output for the first position is the sum of element-wise products: 1*1 + 0*2 + 1*3 + 0*4 + 1*5 + 1*6 + 1*7 + 0*8 + 1*9 = 31. Filters can be made translation and rotational invariant (equivariant deep learning)
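The element-wise sum on the slide can be reproduced by sliding the kernel over the input. A minimal NumPy sketch, assuming the input is the 6×6 array shown on the slide and using "valid" cross-correlation (no kernel flipping), as is conventional in CNNs:

```python
import numpy as np

X = np.array([[1, 0, 1, 0, 1, 0],
              [0, 1, 1, 0, 1, 1],
              [1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 1, 0],
              [0, 1, 1, 0, 1, 1],
              [1, 0, 1, 0, 1, 0]])        # 2D input data from the slide

K = np.arange(1, 10).reshape(3, 3)         # 3x3 kernel with weights 1..9

out = np.zeros((4, 4), dtype=int)
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(X[i:i+3, j:j+3] * K)   # sum of element-wise products

print(out[0, 0])   # 31, matching the worked example on the slide
```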
  29. Quiz What would these kernels do to an image? Kernel

    A: [[1/9 1/9 1/9], [1/9 1/9 1/9], [1/9 1/9 1/9]]. Kernel B: [[-1 -1 -1], [2 2 2], [-1 -1 -1]]. Kernel C: [[-1 -1 -1], [-1 8 -1], [-1 -1 -1]]. An image of the proposed room-temperature superconductor LK-99
  30. Quiz What would these kernels do to an image? Kernel

    A: [[1/9 1/9 1/9], [1/9 1/9 1/9], [1/9 1/9 1/9]] → blur. Kernel B: [[-1 -1 -1], [2 2 2], [-1 -1 -1]] → horizontal lines. Kernel C: [[-1 -1 -1], [-1 8 -1], [-1 -1 -1]] → edge detection. An image of the proposed room-temperature superconductor LK-99
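The quiz answers can be checked by applying the three kernels with SciPy. A quick sketch, using a built-in scikit-image test image in place of the LK-99 photograph:

```python
import numpy as np
from scipy import ndimage
from skimage import data

img = data.camera().astype(float)                      # example greyscale image

blur = np.full((3, 3), 1.0 / 9.0)                      # Kernel A: averaging blur
horizontal = np.array([[-1, -1, -1],
                       [ 2,  2,  2],
                       [-1, -1, -1]], dtype=float)     # Kernel B: highlights horizontal lines
edges = np.array([[-1, -1, -1],
                  [-1,  8, -1],
                  [-1, -1, -1]], dtype=float)          # Kernel C: edge detection

for name, kernel in [("blur", blur), ("horizontal", horizontal), ("edges", edges)]:
    filtered = ndimage.convolve(img, kernel, mode="nearest")
    print(name, filtered.shape)
```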
  31. Convolutional Neural Networks (CNN) A type of neural network used

    for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) US Postal Service Challenge Computer recognition of handwritten zip codes
  32. Convolutional Neural Networks (CNN) A type of neural network used

    for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) Input Direct images rather than feature vectors Output {0,1,2,3,4,5,6,7,8,9} 16x16 pixels Model 1000 neurons 9760 parameters
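A rough modern sketch of a small CNN in the same spirit (16×16 greyscale input, ten digit classes). This is an illustrative PyTorch model with arbitrary channel counts, not the original 1989 architecture:

```python
import torch
import torch.nn as nn

class DigitCNN(nn.Module):
    """Small CNN for 16x16 digit images with 10 output classes (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5, padding=2), nn.ReLU(),   # extract features
            nn.AvgPool2d(2),                                        # sub-sample to 8x8
            nn.Conv2d(4, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.AvgPool2d(2),                                        # sub-sample to 4x4
        )
        self.classifier = nn.Linear(8 * 4 * 4, 10)                  # one score per digit 0-9

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

x = torch.randn(1, 1, 16, 16)      # one 16x16 image
print(DigitCNN()(x).shape)         # torch.Size([1, 10])
```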
  33. Convolutional Neural Networks (CNN) A type of neural network used

    for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) Training 7291 examples Testing 2007 examples (included hand-chosen ambiguous or unclassifiable samples)
  34. Convolutional Neural Networks (CNN) LeNet-5 was the fifth evolution of

    this network and became a standard in the field Y. LeCun et al, Proc. IEEE 86, 2278 (1998) Higher resolution input (MNIST dataset) Cn = convolutional layer n (extract features) Sn = sub-sampling layer n (reduce spatial dimensions)
  35. Advanced Architecture: ResNet ResNet (Residual Network) is a deep neural

    network with “skip” connections for effective training Images from https://www.baeldung.com/cs/residual-networks Residual Block skip connection for x Approach avoids vanishing gradient problem for training of deep networks
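A minimal sketch of a residual block with a skip connection (illustrative PyTorch; it assumes the input and output have the same number of channels so they can be added directly, and omits the batch normalisation used in the full ResNet):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x : the skip connection lets gradients flow past the block."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)     # skip connection: add the input back

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)     # same shape as the input
```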
  36. Advanced Architecture: ResNet ResNet (Residual Network) is a deep neural

    network with “skip” connections for effective training Images from https://www.baeldung.com/cs/residual-networks Access higher performance deep models
  37. CNN for Ferroelectrics Convolutional network to map between local domain

    structures and switching behaviour S. V. Kalinin et al, ACS Appl. Mater. Inter. 13, 1693 (2021) Encoder Decoder 2D convolutions 1D convolutions
  38. Application to Materials Images from: https://distill.pub/2021/understanding-gnns Pixel-based convolutions Graph-based convolutions

    Information is stored on each piece of the graph, i.e. vectors associated with the nodes, edges and global attributes Information exchange Graph Convolutional Neural Networks (GCNN) are designed for graph-structured data
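The basic information-exchange step can be sketched as simple message passing over an adjacency matrix. A toy NumPy example with made-up node features; real GCNN implementations add learned weight matrices, non-linearities, and edge and global attributes:

```python
import numpy as np

# Toy graph: 4 nodes, undirected edges 0-1, 1-2, 2-3 (hypothetical)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

H = np.random.rand(4, 3)                       # a 3-dimensional feature vector on each node

# One graph-convolution step: each node aggregates its neighbours' features
A_hat = A + np.eye(4)                          # include self-connections
D_inv = np.diag(1.0 / A_hat.sum(axis=1))       # normalise by node degree
H_new = D_inv @ A_hat @ H                      # information exchange between neighbours

print(H_new.shape)   # still (4, 3): updated node representations
```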
  39. Application to Materials Input graphs can be transformed and tailored

    for regression or classification tasks (V = vertex; E = edge, U = global attribute) Images from: https://distill.pub/2021/understanding-gnns End-to-end prediction Graph update
  40. Application to Materials Crystal Graph Convolutional Neural Networks (CGCNN) are

    being used for materials modelling C. Chen et al, Chem. Mater. 31, 3564 (2019) MEGNet (two-body connections); https://github.com/materialsvirtuallab/matgl
  41. Application to Materials A new generation of universal force fields

    that can predict energies and forces for any material C. Chen and S. P. Ong, Nature Computational Science 2, 718 (2022) Input Graph Output Energy Force Stress M3GNet; https://github.com/materialsvirtuallab/matgl Beyond pairwise interactions
  42. Application to Materials A new generation of universal force fields

    that can predict energies and forces for any material I. Batatia et al, arXiv:2401.00096 (2023); pip install mace-torch
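As an illustrative sketch of how such a pretrained force field might be called through ASE (assuming the mace-torch and ase packages are installed; the `mace_mp` entry point and its arguments follow the MACE documentation and may change between versions):

```python
from ase.build import bulk
from mace.calculators import mace_mp   # pretrained MACE-MP force field (assumed API)

atoms = bulk("Si", "diamond", a=5.43)          # example crystal structure
atoms.calc = mace_mp(model="small", device="cpu")

energy = atoms.get_potential_energy()          # eV
forces = atoms.get_forces()                    # eV/Angstrom, one vector per atom
print(energy, forces.shape)
```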
  43. Class Outcomes 1. Describe how a perceptron works 2. Distinguish

    between different types of deep learning architectures 3. Specify how convolutional neural networks work and can be applied to materials problems Activity: Learning microstructure