
Machine Learning for Materials (Lecture 6)

Aron Walsh
February 06, 2024

Slides linked to https://github.com/aronwalsh/MLforMaterials. Updated for 2025.

Transcript

  1. Aron Walsh Department of Materials Centre for Processable Electronics Machine

    Learning for Materials 6. Artificial Neural Networks Module MATE70026
  2. Module Contents 1. Introduction 2. Machine Learning Basics 3. Materials

    Data 4. Crystal Representations 5. Classical Learning 6. Artificial Neural Networks 7. Building a Model from Scratch 8. Accelerated Discovery 9. Generative Artificial Intelligence 10. Recent Advances
  3. Class Outline Artificial Neural Networks A. From neuron to perceptron

    B. Network architecture and training C. Convolutional neural networks
  4. Artificial Neuron Neurons transmit chemical and electrical signals in the

    brain. Artificial neurons mimic this behaviour using mathematical functions Image: BioMed Research International Biological neuron Artificial neuron Cell nucleus Node Dendrites Input Synapse Weights (Interconnects) Axon Output The human brain has ~10¹¹ neurons and ~10¹⁵ synapses (~10¹⁵ FLOPS)
  5. x1 x2 x3 x4 x5 w1 w2 w3 w4 w5

    Artificial Neuron The perceptron is a binary neural network classifier: weighted inputs produce an output of 0 or 1 F. Rosenblatt, Cornell Aeronautical Laboratory, Report 85-460-1 (1957) y = f(w·x + b) Output Activation function Weighted input Bias (constant) if Σᵢ xᵢwᵢ + b > threshold: output = 1, else output = 0 Weights are adjusted to minimise the model error
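A minimal NumPy sketch of this decision rule; the inputs, weights, and bias below are illustrative placeholders rather than values from the slide.

```python
# Minimal perceptron forward pass; inputs, weights, and bias are illustrative
import numpy as np

def perceptron(x, w, b, threshold=0.0):
    """Return 1 if the weighted input w.x + b exceeds the threshold, else 0."""
    return int(np.dot(w, x) + b > threshold)

x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])   # inputs x1..x5
w = np.array([0.5, -0.2, 0.8, 0.1, 0.3])  # weights w1..w5
print(perceptron(x, w, b=-1.0))           # 1, since 1.4 - 1.0 = 0.4 > 0
```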
  6. Class Outline Artificial Neural Networks A. From neuron to perceptron

    B. Network architecture and training C. Convolutional neural networks
  7. Neural Network Architecture Image generator: https://alexlenail.me/NN-SVG Basic neural network: One

    or two layers Deep neural network: Three or more layers Three layer model (input layer is excluded in counting)
  8. Neural Network Architecture Image generator: https://alexlenail.me/NN-SVG Basic neural network: One

    or two layers Deep neural network: Three or more layers Five layer model Note the layer 2 bottleneck Why? Compressed representation
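A sketch of such an architecture, assuming PyTorch; the layer widths (including the four-unit bottleneck in layer 2) are illustrative choices, not taken from the slide.

```python
# Five-layer network with a narrow second hidden layer (bottleneck);
# all layer widths are illustrative placeholders
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),   # layer 1
    nn.Linear(32, 4),  nn.ReLU(),   # layer 2: bottleneck -> compressed representation
    nn.Linear(4, 32),  nn.ReLU(),   # layer 3
    nn.Linear(32, 16), nn.ReLU(),   # layer 4
    nn.Linear(16, 1),               # layer 5: output (input layer not counted)
)
print(model)
```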
  9. Activation Function w·x+b is simply a linear combination. Activation function

    f(w·x+b) introduces non-linearity Image from https://towardsdatascience.com Activation function Derivative Common for deep learning Perceptron model Common for deep learning Popular in early models
  10. Activation Function Corresponding weights and thresholds are learned (fit) during

    model training Image from https://towardsdatascience.com Activation function Derivative Common for deep learning Perceptron model Common for deep learning Popular in early models
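A brief NumPy sketch of common activation functions and their derivatives; which slide label (perceptron model, early models, deep learning) belongs to which function is inferred here, not stated explicitly on the slide.

```python
# Activation functions f(z) and their derivatives; the label-to-function
# mapping in the comments is an assumption based on common usage
import numpy as np

def step(z):                     # perceptron model (not differentiable at 0)
    return (z > 0).astype(float)

def sigmoid(z):                  # popular in early models
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):                     # common for deep learning
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)

z = np.linspace(-3, 3, 7)
print(relu(z), relu_grad(z))
```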
  11. Universal Function Approximators Multilayer neural networks can approximate any continuous

    function to any desired accuracy K. Hornik, M. Stinchcombe and H. White, Neural Networks 2, 359 (1989) Practical performance will depend on the number of hidden layers, choice of activation function, and training data
  12. Universal Function Approximators Multilayer neural networks can approximate any continuous

    function to any desired accuracy S. J. D. Prince “Understanding Deep Learning” Approximation of a 1D function (dashed line) by a piecewise linear model
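A tiny NumPy sketch of why one hidden layer of ReLU units gives a piecewise linear approximation: each unit contributes one kink. The weights below are random (untrained), chosen only to show the functional form.

```python
# A single hidden layer of ReLU units produces a piecewise linear function of x
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=3), rng.normal(size=3)   # hidden weights and biases (3 units)
v, c = rng.normal(size=3), rng.normal()         # output weights and bias

x = np.linspace(-2.0, 2.0, 200)
hidden = np.maximum(0.0, np.outer(x, w) + b)    # (200, 3) ReLU activations
y = hidden @ v + c                              # piecewise linear, with up to 3 kinks
```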
  13. Universal Function Approximators S. J. D. Prince “Understanding Deep Learning”

    The combination of two single-layer networks with three hidden units each
  14. Universal Function Approximators Extrapolation outside training region is not guaranteed

    (no fixed functional form) Four models with the same performance (in grey region) Be cautious with out-of-distribution (OOD) applications
  15. Types of Layer in Deep Learning Layers are combined to

    learn representations and capture data patterns effectively
    • Dense (fully connected): each neuron is connected to every neuron in the adjacent layer
    • Convolutional: filter applied to grid-like input, extracting features
    • Pooling: reduce spatial dimensions, retaining key information
    • Recurrent: incorporate feedback loops for sequential data flow
    • Dropout: randomly zero out inputs to mitigate overfitting in training
    • Embedding: map categorical variables into continuous vectors
    • Upscaling: increase spatial resolution of feature maps
    Self-study is needed if you want to delve deeper into these (a minimal sketch of each layer type follows below)
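One instance of each listed layer type, assuming PyTorch; all dimensions are arbitrary placeholders.

```python
# Illustrative instantiations of the layer types above
import torch.nn as nn

dense     = nn.Linear(128, 64)                       # fully connected
conv      = nn.Conv2d(3, 16, kernel_size=3)          # convolutional filters on a grid
pool      = nn.MaxPool2d(kernel_size=2)              # reduce spatial dimensions
recurrent = nn.LSTM(input_size=32, hidden_size=64)   # feedback loop for sequences
dropout   = nn.Dropout(p=0.5)                        # randomly zero inputs in training
embedding = nn.Embedding(num_embeddings=100, embedding_dim=16)  # categorical -> vector
upscale   = nn.Upsample(scale_factor=2)              # increase spatial resolution
```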
  16. Backpropagation (Backprop) An algorithm used to adjust the weights of

    a neural network using gradient descent (from the output layer) I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning” Application of the chain rule Output layer Training iteration
  17. Backpropagation (Backprop) Backprop efficiently computes gradients, enabling networks to learn

    parameters by error minimisation I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”
    Limitations:
    • Slow training
    • Failure to converge
    • Local minima
    Improvements:
    • Stochastic gradient descent (use a random subset of data)
    • Batch normalisation (avoid vanishing gradients)
    • Adaptive learning rates (more robust convergence)
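A minimal training-loop sketch combining backpropagation with stochastic gradient descent, assuming PyTorch; the data, model size, and learning rate are toy placeholders.

```python
# Backpropagation + stochastic gradient descent on a toy regression task
import torch
import torch.nn as nn

X = torch.randn(256, 4)                       # toy inputs
y = X.sum(dim=1, keepdim=True)                # toy targets
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(100):
    batch = torch.randint(0, len(X), (32,))   # random subset of the data (stochastic)
    loss = loss_fn(model(X[batch]), y[batch])
    optimiser.zero_grad()
    loss.backward()                           # backprop: chain rule from the output layer
    optimiser.step()                          # gradient descent update of the weights
```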
  18. Nobel Prize in Physics 2024 https://www.nobelprize.org/prizes/physics “Foundational discoveries and inventions

    that enable machine learning with artificial neural networks” Deterministic recurrent network of binary nodes (for local minima) Stochastic recurrent network of visible/hidden nodes (for global minima)
  19. Class Outline Artificial Neural Networks A. From neuron to perceptron

    B. Network architecture and training C. Convolutional neural networks
  20. Quiz What features could you use to separately cluster dogs

    and cats? Image from https://www.halifaxhumanesociety.org
  21. Feature Identification Significant progress has been made for image processing

    in the field of computer vision Algorithmic edge detection (e.g. intensity gradients) J. Canny, IEEE Trans. Pattern Anal. Mach. Intell. 8, 679 (1986) Popular packages for classical filters include ImageJ and scikit-image
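A short scikit-image sketch of Canny edge detection; the built-in camera test image and the sigma value are arbitrary choices for illustration.

```python
# Classical edge detection with scikit-image's Canny filter
from skimage import data, feature

image = data.camera()                      # built-in greyscale test image
edges = feature.canny(image, sigma=2.0)    # boolean edge map from intensity gradients
print(edges.shape, edges.sum(), "edge pixels")
```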
  22. Feature Identification Significant progress has been made for image processing

    in the field of computer vision Deep learning image and language model (e.g. DenseCap) J. Johnson, A. Karpathy, L. Fei-Fei, arXiv:1511.07571 (2015)
  23. Images as Arrays Pixels are a convenient representation, but inefficient,

    e.g. 1 MP image = 1,000,000 pixels Image generated by DALL·E 3 text-to-image model Image Feature vector Array of pixels Decision boundaries are difficult to define, e.g. to distinguish between animals based on pixels alone
  24. Images as Arrays Each layer in a deep neural network

    can improve the representation of the preceding layer Image generated by DALL·E 3 text-to-image model Image The initial sparse input is densified as it passes through each layer of the network 1 0 1 0 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 {1,0} {"dog", "cat"} Classification model
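A sketch of the image-as-array idea: flattening a placeholder 1 megapixel greyscale image into a 1,000,000-dimensional feature vector.

```python
# An image is an array of pixels that can be flattened into a feature vector
import numpy as np

image = np.random.randint(0, 256, size=(1000, 1000), dtype=np.uint8)  # placeholder 1 MP image
feature_vector = image.reshape(-1)
print(feature_vector.shape)   # (1000000,)
```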
  25. Convolutional Filters Small matrices (kernels) that extract features from data

    by performing localised operations 2D input data Kernel (filter) Output 1 0 1 0 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 2 3 4 5 6 7 8 9 learned weights * = Kernel passes over the input data, capturing patterns at different locations, enabling the network to learn and detect specific features Filters are translation equivariant and can be tailored for rotational symmetry
  26. Convolutional Filters Small matrices (kernels) that extract features from data

    by performing localised operations 2D input data Kernel (filter) Output 1 0 1 0 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 2 3 4 5 6 7 8 9 learned weights 31 * = Sum of element-wise products: 1*1+0*2+1*3+0*4+1*5+1*6+1*7+0*8+1*9 = 31 Filters are translation equivariant and can be tailored for rotational symmetry
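A NumPy sketch reproducing this worked example, assuming the 36 input values shown on the slide form a 6x6 grid read row by row.

```python
# Element-wise product of the top-left 3x3 patch with the kernel, summed to 31
import numpy as np

image = np.array([[1, 0, 1, 0, 1, 0],
                  [0, 1, 1, 0, 1, 1],
                  [1, 0, 1, 0, 1, 0],
                  [1, 0, 1, 1, 1, 0],
                  [0, 1, 1, 0, 1, 1],
                  [1, 0, 1, 0, 1, 0]])
kernel = np.arange(1, 10).reshape(3, 3)    # learned weights 1..9

patch = image[:3, :3]                      # 3x3 region under the kernel (top-left)
print(np.sum(patch * kernel))              # 31

# Sliding the kernel over the whole input gives a 4x4 output feature map
output = np.array([[np.sum(image[i:i+3, j:j+3] * kernel)
                    for j in range(4)] for i in range(4)])
print(output.shape)                        # (4, 4)
```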
  27. Quiz What would these kernels do to an image? Kernel

    A 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 Kernel B -1 -1 -1 2 2 2 -1 -1 -1 Kernel C -1 -1 -1 -1 8 -1 -1 -1 -1 An image of the proposed room-temperature superconductor LK-99
  28. Quiz What would these kernels do to an image? Kernel

    A 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 Kernel B -1 -1 -1 2 2 2 -1 -1 -1 Kernel C -1 -1 -1 -1 8 -1 -1 -1 -1 Blur Horizontal lines Edge detection An image of the proposed room-temperature superconductor LK-99
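A sketch applying the three quiz kernels as 2D convolutions, assuming SciPy; the image here is a random placeholder rather than the LK-99 photograph.

```python
# Blur, horizontal-line, and edge-detection kernels applied by convolution
import numpy as np
from scipy.ndimage import convolve

kernel_a = np.full((3, 3), 1 / 9)                                 # Kernel A: blur (mean filter)
kernel_b = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]])      # Kernel B: horizontal lines
kernel_c = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])    # Kernel C: edge detection

img = np.random.rand(64, 64)          # placeholder greyscale image
blurred = convolve(img, kernel_a)
lines = convolve(img, kernel_b)
edges = convolve(img, kernel_c)
```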
  29. Convolutional Neural Networks (CNN) A type of neural network used

    for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) US Postal Service Challenge Computer recognition of handwritten zip codes
  30. Convolutional Neural Networks (CNN) A type of neural network used

    for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) Input Direct images rather than feature vectors Output {0,1,2,3,4,5,6,7,8,9} 16x16 pixels Model 1000 neurons 9760 parameters
  31. Convolutional Neural Networks (CNN) A type of neural network used

    for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) Training 7291 examples Testing 2007 examples (included hand-chosen ambiguous or unclassifiable samples)
  32. Convolutional Neural Networks (CNN) LeNet-5 was the fifth evolution of

    this network and became a standard in the field Y. LeCun et al, Proc. IEEE 86, 2278 (1998) Higher resolution input (MNIST dataset) Cn = convolutional layer n (extract features) Sn = sub-sampling layer n (reduce spatial dimensions)
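A LeNet-style sketch assuming PyTorch and 1x32x32 greyscale input (e.g. padded MNIST digits); the layer sizes broadly follow LeNet-5 but are not an exact reproduction of the published architecture.

```python
# LeNet-style convolutional network for digit classification
import torch
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),    # C1: extract features
    nn.AvgPool2d(2),                              # S2: reduce spatial dimensions
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # C3
    nn.AvgPool2d(2),                              # S4
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),        # fully connected layers
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                            # ten digit classes {0..9}
)
print(lenet(torch.randn(1, 1, 32, 32)).shape)     # torch.Size([1, 10])
```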
  33. Towards State of the Art (SOTA) Example from https://towardsdatascience.com VGG16

    Computer Vision Model Softmax is an activation function common in the output layer of a neural network for classification tasks Modern deep learning models combine many layer types with 10³-10¹² parameters
  34. Towards State of the Art (SOTA) Modern deep learning models

    combine many layer types with 10³-10¹² parameters Softmax: σ(zᵢ) = exp(zᵢ/T) / Σⱼ exp(zⱼ/T), summing over j = 1…n; the denominator acts as a partition function Input vector (3, 6, 7, 11, 4) → class probability (0.00, 0.03, 0.04, 0.92, 0.01) Appearance of the Boltzmann distribution (deep learning models often borrow from statistical mechanics)
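A softmax sketch with an explicit temperature T; the input vector is taken from the slide, but T = 1 is an assumption, so the printed probabilities need not match the slide's rounded values exactly.

```python
# Softmax with temperature T; the denominator plays the role of a partition
# function, as in the Boltzmann distribution
import numpy as np

def softmax(z, T=1.0):
    """Convert scores z into class probabilities."""
    w = np.exp(np.asarray(z, dtype=float) / T)
    return w / w.sum()

print(softmax([3, 6, 7, 11, 4]).round(3))
```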
  35. CNN for Ferroelectrics Convolutional network to map between local domain

    structures and switching behaviour S. V. Kalinin et al, ACS Appl. Mater. Inter. 13, 1693 (2021) Encoder Decoder 2D convolutions 1D convolutions
  36. Application to Materials Images from: https://distill.pub/2021/understanding-gnns Pixel-based convolutions Graph-based convolutions

    Information is stored on each piece of the graph, i.e. vectors associated with the nodes, edges and global attributes Information exchange Graph Convolutional Neural Networks (GCNN) are designed for graph-structured data
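A minimal sketch of one graph-convolution (message-passing) step: each node aggregates its neighbours' feature vectors (mean, including a self-connection) and applies a learned linear map. The adjacency matrix, features, and weights are toy placeholders.

```python
# One graph-convolution step on a toy 4-node graph
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)       # adjacency matrix (edges)
H = np.random.rand(4, 8)                        # node feature vectors (4 nodes, 8 features)
W = np.random.rand(8, 8)                        # weight matrix (random here, learned in practice)

A_hat = A + np.eye(4)                           # add self-connections
D_inv = np.diag(1.0 / A_hat.sum(axis=1))        # normalise by node degree
H_next = np.maximum(0.0, D_inv @ A_hat @ H @ W) # information exchange + ReLU update
```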
  37. Application to Materials Input graphs can be transformed and tailored

    for regression or classification tasks (V = vertex; E = edge, U = global attribute) Images from: https://distill.pub/2021/understanding-gnns End-to-end prediction Graph update
  38. Application to Materials Crystal Graph Convolutional Neural Networks (CGCNN) are

    being used for materials modelling C. Chen et al, Chem. Mater. 31, 3564 (2019) MEGNet (two-body connections); https://github.com/materialsvirtuallab/matgl
  39. Application to Materials A new generation of universal force fields

    that can predict energies and forces for any material C. Chen and S. P. Ong, Nature Computational Science 2, 718 (2022) Input Graph Output Energy Force Stress M3GNet; https://github.com/materialsvirtuallab/matgl Beyond pairwise interactions
  40. Application to Materials A new generation of universal force fields

    that can predict energies and forces for any material I. Batatia et al, arXiv:2401.00096 (2023); pip install mace-torch
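A hedged usage sketch of such a universal force field via the mace-torch package named on the slide; the mace_mp loader, the "medium" model name, and the ASE workflow reflect the current mace-torch API as I understand it and may change between releases.

```python
# Predicting energy and forces for a crystal with a pretrained MACE foundation
# model used as an ASE calculator (pip install mace-torch)
from ase.build import bulk
from mace.calculators import mace_mp

atoms = bulk("NaCl", "rocksalt", a=5.64)    # simple rock-salt test crystal
atoms.calc = mace_mp(model="medium")        # universal force field as an ASE calculator

print(atoms.get_potential_energy())         # energy in eV
print(atoms.get_forces())                   # forces in eV/Å
```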
  41. Class Outcomes 1. Describe how a perceptron works 2. Distinguish

    between different types of deep learning architectures 3. Specify how convolutional neural networks work and can be applied to materials problems Activity: Learning microstructure