Slide 1

Slide 1 text

Aron Walsh
Department of Materials
Centre for Processable Electronics

Machine Learning for Materials
2. Machine Learning Basics
Module MATE70026

Slide 2

Slide 2 text

Module Contents
1. Introduction
2. Machine Learning Basics
3. Materials Data
4. Crystal Representations
5. Classical Learning
6. Artificial Neural Networks
7. Building a Model from Scratch
8. Accelerated Discovery
9. Generative Artificial Intelligence
10. Recent Advances

Slide 3

Slide 3 text

Artificial Intelligence

Computational techniques that mimic human intelligence

ARTIFICIAL INTELLIGENCE (AI) – the entire knowledge field
→ MACHINE LEARNING (ML) – data-driven statistical models (supervised, unsupervised, reinforcement)
→ DEEP LEARNING – multi-layered neural networks

Nobel Prizes in Chemistry and Physics (2024)

Slide 4

Slide 4 text

Focus on Machine Learning (ML): statistical techniques that improve with experience

https://vas3k.com/blog/machine_learning

Slide 5

Slide 5 text

Focus on Machine Learning (ML) https://vas3k.com/blog/machine_learning

Slide 6

Slide 6 text

Quiz

What type of learning is this?

Input: crystal structure → ML model (black box of trained parameters) → Output: crystal system {Cubic, Tetragonal, Orthorhombic, Hexagonal, Trigonal, Monoclinic, Triclinic}

“Black box” is a common criticism of ML – with the right tools you can open it up!

Slide 7

Slide 7 text

Class Outline: Machine Learning Basics
A. Terminology
B. Evaluation metrics
C. Learning by example

Slide 8

Slide 8 text

Function Approximation

y = f(x), with the property as output and features as input, e.g. 1. composition, 2. structure

Linear regression: y = β₀ + β₁x₁ + β₂x₂, where y is the property, x₁ (composition) and x₂ (structure) are the features, β₁ and β₂ are learned weights, and β₀ is a constant
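A minimal sketch of fitting such a linear model with scikit-learn (the feature and property values below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented example: two features per material (a composition descriptor x1
# and a structure descriptor x2) and a target property y
X = np.array([[0.1, 1.2], [0.4, 0.8], [0.5, 1.5], [0.9, 0.3]])
y = np.array([1.3, 1.1, 2.0, 1.0])

model = LinearRegression().fit(X, y)
print(model.intercept_)  # constant beta_0
print(model.coef_)       # learned weights beta_1, beta_2
```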

Slide 9

Slide 9 text

Function Approximation

Generalised linear models: y = β₀ + f₁(x₁) + f₂(x₂), with the property y as a function of composition (x₁) and structure (x₂) plus a constant β₀

Non-linear interactions: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂, where the β₃x₁x₂ term couples composition and structure
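The coupling term need not be written by hand; a sketch of generating it with scikit-learn's PolynomialFeatures (same invented data as above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.1, 1.2], [0.4, 0.8], [0.5, 1.5], [0.9, 0.3]])
y = np.array([1.3, 1.1, 2.0, 1.0])

# interaction_only=True adds the x1*x2 coupling term without x1^2 or x2^2;
# the intercept beta_0 is handled by LinearRegression itself
X_int = PolynomialFeatures(degree=2, interaction_only=True,
                           include_bias=False).fit_transform(X)
model = LinearRegression().fit(X_int, y)
print(model.coef_)  # beta_1, beta_2, beta_3 (coupling)
```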

Slide 10

Slide 10 text

Function Approximation

You should recognise the underlying function from your undergraduate classes

Slide 11

Slide 11 text

Function Approximation My reference function to generate data for model training and testing

Slide 12

Slide 12 text

Function Approximation

Underfitting, fitting, and overfitting (default parameters with the scikit-learn Python package)
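One way to reproduce this behaviour, assuming a noisy sinusoidal reference function (the lecture's actual reference function is not specified here), is to sweep the polynomial degree:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 20)  # assumed reference

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)
    # A high training score alone does not rule out overfitting
    print(degree, model.score(x, y))
```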

Slide 13

Slide 13 text

Function Approximation

Standard expansions work in low dimensions (D). Real problems face the “curse of dimensionality”: an exponential increase, O(eᴰ), in the data requirements needed to cover the parameter space effectively.

M. M. Bronstein et al, arXiv:2104.13478 (2021)
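A quick numerical illustration of this scaling: covering a D-dimensional space with k sample points per dimension requires kᴰ points in total (k = 10 is an assumption for illustration):

```python
# Exponential growth of data requirements with dimension D
k = 10  # assumed number of sample points per dimension
for D in (1, 2, 3, 6):
    print(D, k ** D)  # 10, 100, 1000, 1000000
```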

Slide 14

Slide 14 text

Three Components of ML Models

1. Representation – type of data and model architecture
2. Evaluation – objective (or scoring) function to distinguish good from bad models
3. Optimisation – update of model parameters to improve performance

A Few Useful Things to Know about Machine Learning, P. Domingos

Slide 15

Slide 15 text

ML Vocabulary

• Classification model – takes a tensor of feature values as input and outputs a single discrete value (the class)
• Regression model – takes a tensor of feature values as input and outputs a continuous (predicted) value
• Feature – an input variable
• Labelled example – an example (set of features) with its corresponding label (the “answer” or “result”)
• Ground truth – reliable reference value(s)
• Hyperparameter – model variables that can be tuned to optimise performance, e.g. learning rate

See https://developers.google.com/machine-learning/glossary

Slide 16

Slide 16 text

ML Vocabulary

• Bias – systematic error in the average prediction
• Variance – variability around the average prediction

[Dartboard figure: predicted values (purple circles) compared to the ground truth (red centre) for combinations of low/high bias and low/high variance; high bias corresponds to underfitting, high variance to overfitting]

Slide 17

Slide 17 text

ML Vocabulary

• Underfitting – model too simple to describe patterns
• Overfitting – model too complex and fits noise

[Regression and classification examples; image from https://github.com/jermwatt/machine_learning_refined]

Slide 18

Slide 18 text

ML Vocabulary

• Underfitting – model too simple to describe patterns: high training error and high bias
• Overfitting – model too complex and fits noise: high validation error and high variance

[Dataset-split figure from https://github.com/jermwatt/machine_learning_refined]

Slide 19

Slide 19 text

Typical Supervised ML Workflow

Initial dataset (x, y) → data cleaning and feature engineering (human time intensive) → dataset split into train (80%: xtrain, ytrain) and test (20%: xtest, ytest) sets → model training and validation (computer time intensive) → model assessment on the test set → final model in production: xnew → ypredict

The exact workflow depends on the type of problem and available data.
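The 80/20 split in this workflow maps directly onto scikit-learn; a sketch with placeholder random arrays standing in for a cleaned, featurised dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)  # placeholder features
y = np.random.rand(100)     # placeholder property values

# Hold out 20% as a test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))  # assessment on unseen data
```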

Slide 20

Slide 20 text

Class Outline: Machine Learning Basics
A. Terminology
B. Evaluation metrics
C. Learning by example

Slide 21

Slide 21 text

Model Assessment

Consider a linear model with optimal weights w:

ypredicted = model(x, w) = w₀ + w₁x

Mean squared error (MSE) = (1/n) Σᵢ₌₁ⁿ (yᵢ − model(xᵢ, w))² = (1/n) Σᵢ₌₁ⁿ eᵢ²

Squaring the error ensures non-negativity and penalises larger deviations.
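A direct translation of the MSE definition into NumPy (the values are invented):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])  # invented ground-truth values
y_pred = np.array([1.1, 1.9, 3.3])  # invented model predictions

e = y_true - y_pred    # errors e_i
mse = np.mean(e ** 2)  # (1/n) * sum of e_i squared
print(mse)
```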

Slide 22

Slide 22 text

Model Assessment

• Residual – a measure of prediction error: eᵢ = yᵢ − ŷᵢ, where ŷᵢ is the predicted value
• MAE – Mean Absolute Error = (1/n) Σᵢ₌₁ⁿ |eᵢ|
• RMSE – Root Mean Square Error = √[(1/n) Σᵢ₌₁ⁿ eᵢ²]
• Standard Deviation (σ) – a measure of the amount of dispersion in a set of values. Small = close to the mean. Expressed in the same units as the data, e.g. lattice parameters a = 4 Å, 5 Å, 6 Å:
  mean = (4+5+6)/3 = 5 Å
  deviations = −1, 0, 1; deviations squared = 1, 0, 1
  sample variance σ² = (1+0+1)/2 = 1 Å²
  standard deviation σ = 1 Å
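The same quantities via scikit-learn and NumPy, reusing the lattice-parameter numbers above for the standard deviation:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([1.0, 2.0, 3.0])  # invented values
y_pred = np.array([1.1, 1.9, 3.3])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

a = np.array([4.0, 5.0, 6.0])  # lattice parameters in Angstrom
std = a.std(ddof=1)            # ddof=1 gives the sample standard deviation

print(mae, rmse, std)  # std = 1.0, matching the worked example
```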

Slide 23

Slide 23 text

Model Training → Minimising Error

Model weights w are adjusted until a cost function (e.g. RMSE) is minimised. Gradient descent is a popular choice:

wᵢ → wᵢ − α dError/dwᵢ

where α is the learning rate.

Warning: local optimisation algorithms often miss global minima
[Figure: error versus parameter set w, with optimisation steps 0–3 marked]
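A minimal gradient-descent sketch for a one-feature linear model, minimising the MSE with the learning rate and iteration count exposed (the data are invented to follow y ≈ 2x + 1):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])  # invented, roughly y = 2x + 1

w0, w1 = 0.0, 0.0  # initial weights
alpha = 0.05       # learning rate (step size)

for step in range(500):  # number of iterations
    e = y - (w0 + w1 * x)                # residuals e_i
    w0 -= alpha * (-2 * e.mean())        # d(MSE)/dw0 = -2 * mean(e)
    w1 -= alpha * (-2 * (e * x).mean())  # d(MSE)/dw1 = -2 * mean(e*x)

print(w0, w1)  # should approach ~1 and ~2
```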

Slide 24

Slide 24 text

Model Training → Minimising Error

Model weights w are adjusted until a cost function (e.g. RMSE) is minimised. Gradient descent is a popular choice:

wᵢ → wᵢ − α dError/dwᵢ

where α is the learning rate.

[Animation: RMSE versus w, from https://github.com/jermwatt/machine_learning_refined]

Slide 25

Slide 25 text

Model Training → Minimising Error

Optimisation algorithms have their own parameters, e.g. step size (learning rate) and number of iterations.

[Animation: RMSE versus w and step number, from https://github.com/jermwatt/machine_learning_refined]

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Correlation Coefficient (r)

Describes the strength of the relationship between two variables (e.g. “ground truth” vs predicted values).

Pearson correlation* between x and y:

r = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / [√(Σᵢ₌₁ⁿ (xᵢ − x̄)²) √(Σᵢ₌₁ⁿ (yᵢ − ȳ)²)]

r ∊ [−1, 1]
Positive: variables change in the same direction
Zero: no relationship between the variables
Negative: variables change in opposite directions

Reminder: correlation does not imply causation

*Outlined by Auguste Bravais (1844); https://bit.ly/3Kv75GJ
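Pearson's r computed both directly from the formula and with SciPy (invented data):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0])  # e.g. ground-truth values
y = np.array([1.2, 1.9, 3.2, 3.8])  # e.g. predicted values

# Direct translation of the Pearson formula
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

r_scipy, _ = stats.pearsonr(x, y)  # second return value is the p-value
print(r, r_scipy)
```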

Slide 28

Slide 28 text

Coefficient of Determination (r²)

Measure of the goodness of fit for a model. Describes how well the known data are approximated.

Three equivalent definitions:

r² = 1 − SSres/SStot
r² = 1 − Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²
r² = 1 − Σᵢ₌₁ⁿ eᵢ² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²

r² ∊ [0, 1]
Zero: baseline model that always predicts the mean ȳ (no variability)
0.5: 50% of the variability in y is accounted for
One: model matches observed values of y exactly

Note: a unitless metric. Alternative definitions are sometimes used.

S. Wright “Correlation and Causation”, J. Agri. Res. 20, 557 (1921)
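The residual-based definition next to scikit-learn's implementation (invented values):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.7])

ss_res = np.sum((y_true - y_pred) ** 2)         # SS_res
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # SS_tot
print(1 - ss_res / ss_tot, r2_score(y_true, y_pred))  # identical values
```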

Slide 29

Slide 29 text

Correlation, Causation…

Chocolate consumption correlates with Nobel prizes per capita, but the correlation may reflect a confounder that causes both, rather than causation between them (the subject of causal inference).

F. H. Messerli, New England Journal of Medicine 367, 1562 (2012)

Slide 30

Slide 30 text

Classification Metrics

A confusion (or error) matrix provides a summary of classification model performance:

              Predicted +           Predicted −
Actual +      True positive (TP)    False negative (FN)
Actual −      False positive (FP)   True negative (TN)

Example: classifying metals (+) and insulators (−), N = 100

Perfect model: TP = 70, FN = 0, FP = 0, TN = 30
My best model: TP = 66, FN = 4, FP = 8, TN = 22

Accuracy = correct/total
Perfect model: (70+30)/100 = 100%
My best model: (66+22)/100 = 88%

K. Pearson “Mathematical Contributions to the Theory of Evolution” (1904)

Slide 31

Slide 31 text

Classification Metrics

Using the same confusion matrices (perfect model: TP = 70, FN = 0, FP = 0, TN = 30; my best model: TP = 66, FN = 4, FP = 8, TN = 22):

Sensitivity = TP/(TP+FN)
Perfect model: 70/(70+0) = 100%
My best model: 66/(66+4) = 94%

K. Pearson “Mathematical Contributions to the Theory of Evolution” (1904)
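These counts and metrics can be cross-checked with scikit-learn; the label vectors below are constructed to reproduce the "my best model" matrix (TP = 66, FN = 4, FP = 8, TN = 22):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# 1 = metal (positive class), 0 = insulator
y_true = np.array([1] * 70 + [0] * 30)
y_pred = np.array([1] * 66 + [0] * 4 + [1] * 8 + [0] * 22)

print(confusion_matrix(y_true, y_pred))  # layout: [[TN FP], [FN TP]]
print(accuracy_score(y_true, y_pred))    # 0.88
print(recall_score(y_true, y_pred))      # sensitivity = TP/(TP+FN) ~ 0.94
```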

Slide 32

Slide 32 text

Quiz

Fill in “?” for this confusion matrix. There are 20 data points in total.

                          Predicted class
                          Metal    Insulator
Actual class  Insulator    10          2
              Metal         ?          4

Slide 33

Slide 33 text

Class Outline: Machine Learning Basics
A. Terminology
B. Evaluation metrics
C. Learning by example

Slide 34

Slide 34 text

Supervised Regression

Model that maps an input to an output based on example input-output pairs (labelled data). Regression predicts a continuous value, e.g. to extract a reaction rate.

y = f(x) + ε, where y is the target variable, f is the learned function, and ε is the error

We’ll start to cover the model (function) details in Lecture 5.

Slide 35

Slide 35 text

Regression Example

Predict the dielectric constant of a crystal: support vector regression (r² = 0.92)

Note: outliers are often interesting cases (poor or exceptional data)

K. Morita et al, J. Chem. Phys. 153, 024503 (2020)

Slide 36

Slide 36 text

Regression Example

Predict the dielectric constant of a crystal. SHAP (SHapley Additive exPlanations) analysis is a method for interpreting ML models: it quantifies the relative importance of input features in making predictions. A positive SHAP value indicates a feature contributes to an increase in the prediction.

K. Morita et al, J. Chem. Phys. 153, 024503 (2020); https://github.com/slundberg/shap
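A sketch of how such an analysis is typically run with the shap package (not the authors' exact workflow; the model and data below are placeholders):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Placeholder data standing in for crystal features and dielectric constants
X = np.random.rand(200, 5)
y = 2 * X[:, 0] + X[:, 3] + np.random.normal(0, 0.1, 200)

model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # explainer for tree-based models
shap_values = explainer.shap_values(X)  # one contribution per feature per sample
print(shap_values.shape)                # (200, 5)
```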

Slide 37

Slide 37 text

Regression Example

Predict the dielectric constant of a crystal: transparent breakdown of a predicted value (example: CdCN₂)

Note: a physical connection is not implied (only a correlation)

K. Morita et al, J. Chem. Phys. 153, 024503 (2020); https://github.com/slundberg/shap

Slide 38

Slide 38 text

Supervised Classification

Model that maps an input to an output based on example input-output pairs (labelled data). Classification predicts a category, e.g. decision trees for reaction outcomes.

y = f(x), where y is the class label and f is the classifier

Assignment can be absolute or probabilistic (e.g. 90% apple, 10% pear).
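A minimal decision-tree sketch with invented labels, showing both absolute and probabilistic assignment:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Invented two-feature data with binary class labels (e.g. reaction outcome)
X = np.array([[0.1, 0.2], [0.4, 0.8], [0.5, 0.5],
              [0.9, 0.3], [0.2, 0.9], [0.7, 0.7]])
y = np.array([0, 1, 1, 0, 1, 0])

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.predict([[0.3, 0.6]]))        # absolute class assignment
print(clf.predict_proba([[0.3, 0.6]]))  # probabilistic assignment
```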

Slide 39

Slide 39 text

Classification Example

Predict if a material will be stable or unstable. Crystal likeness (CL) is a probabilistic score for the class label.

G. H. Gu et al, npj Computational Materials 8, 71 (2022)

Slide 40

Slide 40 text

Classification Example

ABX₃ perovskite crystals: compared to radius ratio rules, the neural network model (predicting likelihood of formation) shows improved selectivity for promising compositions.

G. H. Gu et al, npj Computational Materials 8, 71 (2022)

Slide 41

Slide 41 text

Unsupervised Learning

Model that can identify trends or correlations within a dataset (unlabelled data). Clustering groups data by similarity, e.g. high-throughput crystallography.

x → f(x): input data transformed to a new representation
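A clustering sketch with k-means (the feature matrix and number of clusters are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 4)  # placeholder: 100 samples, 4 features, no labels

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # cluster assignments found without any labels
```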

Slide 42

Slide 42 text

Unsupervised Example

Map materials space according to their features. Dimensionality reduction techniques:
• PCA: principal component analysis
• t-SNE: t-distributed stochastic neighbour embedding
• UMAP: uniform manifold approximation and projection for dimension reduction

Hyunsoo Park et al, Faraday Discussions 256, 601 (2025)
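A sketch of the first two techniques with scikit-learn (UMAP lives in the separate umap-learn package; the features below are random placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(100, 20)  # placeholder: 100 materials, 20 features

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(X_pca.shape, X_tsne.shape)  # both (100, 2): a 2D map of materials space
```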

Slide 43

Slide 43 text

Reinforcement Learning

Model that performs a series of actions by trial and error to achieve an objective: maximise a reward, e.g. reaction conditions to optimise yield.

Agent ⇌ Environment: the agent selects actions (samples, conditions, etc.); the environment can be automated experiments.

Slide 44

Slide 44 text

Reinforcement Example

Achieve the highest score or outcome (e.g. AlphaGo)

[Animation: a Q-learning model before (untrained) and after (trained) learning; image from https://towardsdatascience.com/how-to-teach-an-ai-to-play-games]

Slide 45

Slide 45 text

Reinforcement Example

This familiar equation is a softmax (Boltzmann) policy. Application: optimisation of metal-organic frameworks.

Hyunsoo Park et al, Digital Discovery 3, 728 (2024)
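A minimal sketch of a softmax (Boltzmann) policy over action values Q, where the temperature T plays the role of kT in the Boltzmann factor (the Q values are invented):

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Action probabilities p(a) proportional to exp(Q(a)/T)."""
    z = q_values / temperature
    z = z - z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 2.0, 0.5])              # invented action values
print(softmax_policy(q, temperature=0.5))  # lower T -> greedier selection
```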

Slide 46

Slide 46 text

Class Outcomes
1. Define machine learning
2. Describe the three components of machine learning with examples
3. Explain the statistical metrics used to assess model performance

Activity: Crystal hardness