Aron Walsh
January 29, 2024

# Machine Learning for Materials (Lecture 3)


## Transcript

1. ### Aron Walsh, Department of Materials, Centre for Processable Electronics. Machine Learning for Materials: 3. Machine Learning Basics
2. ### Course Contents

   1. Course Introduction
   2. Materials Modelling
   3. Machine Learning Basics
   4. Materials Data and Representations
   5. Classical Learning
   6. Artificial Neural Networks
   7. Building a Model from Scratch
   8. Recent Advances in AI
   9. and 10. Research Challenge
3. ### Field of Artificial Intelligence (AI) https://vas3k.com/blog/machine_learning Computational techniques that mimic

human intelligence (knowledge field)
4. ### Focus on Machine Learning (ML) https://vas3k.com/blog/machine_learning Statistical techniques that improve

with experience (input variables)

7. ### Quiz What type of learning is this? Input Output {Cubic,

Tetragonal, Orthorhombic, Hexagonal, Trigonal, Monoclinic, Triclinic} ML Model
8. ### Class Outline: Machine Learning Basics

   A. Terminology B. Evaluation metrics C. Learning by example
9. ### Function Approximation

   $y = f(x)$: output (property) as a function of input features, e.g. 1. composition, 2. structure. Linear regression: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, where $\beta_0$ is a constant and $\beta_1$, $\beta_2$ are learned weights.
10. ### Function Approximation

    Generalised linear models: $y = \beta_0 + f_1(x_1) + f_2(x_2)$. Non-linear interactions: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, where the $\beta_3$ term couples the composition ($x_1$) and structure ($x_2$) features.
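The two model forms above can be sketched directly in code; the weights below are illustrative placeholders, not fitted values from any dataset.

```python
# Sketch of a plain linear model versus one with an interaction
# (coupling) term. All beta values here are illustrative placeholders.

def linear_model(x1, x2, beta):
    """y = b0 + b1*x1 + b2*x2"""
    b0, b1, b2 = beta
    return b0 + b1 * x1 + b2 * x2

def interaction_model(x1, x2, beta):
    """y = b0 + b1*x1 + b2*x2 + b3*x1*x2 (non-linear coupling term)"""
    b0, b1, b2, b3 = beta
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

y_lin = linear_model(2.0, 3.0, (1.0, 0.5, 0.25))            # 1 + 1.0 + 0.75 = 2.75
y_int = interaction_model(2.0, 3.0, (1.0, 0.5, 0.25, 0.1))  # 2.75 + 0.1*6 = 3.35
```

The interaction term lets the effect of composition depend on structure (and vice versa), which a purely additive model cannot express.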
11. ### Function Approximation

    Standard expansions work in low dimensions (D). Many real problems face the “curse of dimensionality”: an exponential increase in the data required to cover the parameter space effectively, $\mathcal{O}(e^D)$. M. M. Bronstein et al, arXiv:2104.13478 (2021)
12. ### Three Components of ML Models 1. Representation Type of data

and model architecture A Few Useful Things to Know about Machine Learning, P. Domingos 2. Evaluation Objective (or scoring) function to distinguish good from bad models 3. Optimisation Update of model parameters to improve performance
13. ### ML Vocabulary

    - Classification model – input a tensor of feature values and output a single discrete value (the class)
    - Regression model – input a tensor of feature values and output a continuous (predicted) value
    - Feature – an input variable
    - Labelled example – features with their corresponding label (the “answer” or “result”)
    - Ground truth – reliable reference value(s)
    - Hyperparameter – a model variable that can be tuned to optimise performance, e.g. learning rate

    See https://developers.google.com/machine-learning/glossary
14. ### ML Vocabulary

    - Bias – systematic error in the average prediction
    - Variance – variability around the average prediction

    [Figure: distributions of predicted values (purple circles) compared to the ground truth for low/high bias and low/high variance; high bias corresponds to underfitting and high variance to overfitting]
15. ### ML Vocabulary

    - Underfitting – model too simple to describe patterns
    - Overfitting – model too complex and fits noise

    [Figure: regression and classification examples; image from https://github.com/jermwatt/machine_learning_refined]
16. ### ML Vocabulary

    - Underfitting – model too simple to describe patterns (high training error and high bias)
    - Overfitting – model too complex and fits noise (high validation error and high variance)

    [Figure: dataset split and error curves; image from https://github.com/jermwatt/machine_learning_refined]
17. ### Model Assessment

    Typical supervised ML workflow: initial dataset (x, y) → data cleaning and feature engineering (human time intensive) → split into train (80%: x_train, y_train) and test (20%: x_test, y_test) sets → model training and validation (computer time intensive) → final model → production (x_new → y_predict). The exact workflow depends on the type of problem and available data.
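The 80/20 dataset split in the workflow above can be sketched with the standard library alone; the function name and toy data are illustrative (a real workflow would typically use a library utility such as scikit-learn's splitter).

```python
import random

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the dataset, then hold out a test portion (here 20%)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)          # fixed seed for reproducibility
    n_test = round(test_fraction * len(X))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    take = lambda data, ids: [data[i] for i in ids]
    return (take(X, train_idx), take(y, train_idx),
            take(X, test_idx), take(y, test_idx))

X = [[float(i)] for i in range(10)]   # ten toy feature vectors
y = [2.0 * i for i in range(10)]      # ten toy labels
X_train, y_train, X_test, y_test = train_test_split(X, y)
# 80/20 split: 8 training examples, 2 held out for final testing
```

Shuffling before splitting matters: if the data are ordered (e.g. by composition), a naive tail split gives a test set that does not represent the training distribution.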
18. ### Class Outline: Machine Learning Basics

    A. Terminology B. Evaluation metrics C. Learning by example
19. ### Model Quality

    Chocolate consumption correlates with Nobel prizes per country, but correlation is not causation: a confounder may drive both variables. Separating the two is the domain of causal inference. F. Messerli, New England Journal of Medicine 367, 1562 (2012)
20. ### Model Evaluation

    Consider a linear model with optimal weights **w**:

    $y_{\mathrm{predicted}} = \mathrm{model}(x, \mathbf{w}) = w_0 + w_1 x$

    The mean squared error (MSE) is

    $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \mathrm{model}(x_i, \mathbf{w})\right)^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2$
21. ### Model Evaluation

    - Residual – a measure of prediction error: $e_i = y_i - y_i^{\mathrm{predicted}}$
    - MAE – Mean Absolute Error $= \frac{1}{n}\sum_{i=1}^{n}|e_i|$
    - RMSE – Root Mean Square Error $= \sqrt{\frac{1}{n}\sum_{i=1}^{n}e_i^2}$
    - Standard deviation – a measure of the amount of dispersion in a set of values (small = close to the mean), expressed in the same units as the data.

    Example with lattice parameters a = 4 Å, 5 Å, 6 Å: mean = (4+5+6)/3 = 5 Å; deviations = −1, 0, 1; squared deviations = 1, 0, 1; sample variance σ² = (1+0+1)/2 = 1; standard deviation σ = 1 Å.
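The metrics above translate directly into code; this sketch uses only the standard library, and the lattice-parameter numbers reproduce the worked example from the slide.

```python
import math
import statistics

def mae(y_true, y_pred):
    """Mean absolute error of the residuals e_i = y_i - y_i_predicted."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error; penalises large residuals more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Lattice-parameter example from the slide: a = 4, 5, 6 Angstrom
a = [4.0, 5.0, 6.0]
sigma = statistics.stdev(a)   # sample standard deviation = 1.0 Angstrom
```

Note that `statistics.stdev` uses the sample (n−1) denominator, matching the σ² = (1+0+1)/2 calculation on the slide; `statistics.pstdev` would give the population version.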
22. ### Model Training → Minimising Error

    Model weights **w** are adjusted until a cost function (e.g. RMSE) is minimised. Gradient descent is a popular choice:

    $w_i \rightarrow w_i - \alpha \frac{\partial\,\mathrm{Error}}{\partial w_i}$

    where α is the learning rate. Warning: local optimisation algorithms often miss global minima.

    [Figure: error against parameter set w, with optimisation steps 0–3 marked]
23. ### Model Training → Minimising Error

    Model weights **w** are adjusted until a cost function (e.g. RMSE) is minimised. Gradient descent is a popular choice: $w_i \rightarrow w_i - \alpha \frac{\partial\,\mathrm{Error}}{\partial w_i}$, where α is the learning rate.

    [Animation of RMSE against w from https://github.com/jermwatt/machine_learning_refined]
24. ### Model Training → Minimising Error

    Optimisation algorithms have their own parameters, e.g. learning rate (step size) and number of iterations.

    [Animation of RMSE against w and step number from https://github.com/jermwatt/machine_learning_refined]
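The gradient-descent update rule can be demonstrated on a one-parameter cost function; the quadratic cost, its minimum at w = 3, and the learning rate below are all illustrative choices, not values from the lecture.

```python
# Minimal gradient-descent sketch for the illustrative cost
# error(w) = (w - 3)**2, using the update w -> w - alpha * d(error)/dw.

def grad(w):
    return 2.0 * (w - 3.0)   # analytic derivative of (w - 3)**2

w = 0.0        # starting guess, far from the minimum
alpha = 0.1    # learning rate (step size)
for step in range(100):
    w -= alpha * grad(w)
# w converges toward the minimum at w = 3
```

With this cost, each step shrinks the distance to the minimum by a factor (1 − 2α); too large a learning rate (α > 1 here) would make the iterates diverge instead, which is why step size is itself a parameter to tune.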
25. ### Correlation Coefficient (r)

    Describes the strength and direction of the relationship between two variables, r ∊ [−1, 1]. Positive: variables change in the same direction. Zero: no relationship between the variables. Negative: variables change in opposite directions. Pearson correlation*:

    $r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\,\sum_{i=1}^{n}(y_i - \bar{y})^2}}$

    Reminder: correlation does not imply causation. *Outlined by Auguste Bravais (1844); https://bit.ly/3Kv75GJ
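The Pearson formula is short enough to implement directly; the toy data below are chosen to show the two extreme cases.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                    sum((yi - my) ** 2 for yi in y))
    return num / den

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly correlated: r = 1
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly anti-correlated: r = -1
```

In practice `numpy.corrcoef` or `scipy.stats.pearsonr` would be used, but the explicit sums make the formula's structure visible.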
26. ### Coefficient of Determination (r²)

    A measure of the goodness of fit for a model, describing how well known data are approximated, r² ∊ [0, 1]. Zero: a baseline model with no variability that always predicts the mean $\bar{y}$. 0.5: 50% of the variability in y is accounted for. One: model matches observed values of y exactly. Note: a unitless metric; alternative definitions are sometimes used. Three equivalent definitions:

    $r^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - y_i^{\mathrm{predicted}}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} = 1 - \frac{\sum_{i=1}^{n}e_i^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$

    S. Wright “Correlation and Causation”, J. Agri. Res. 20, 557 (1921)
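The definition can be checked against its two limiting cases in a few lines; the toy values are illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, r^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

r2_perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # exact match: 1.0
r2_mean = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])      # predicting the mean: 0.0
```

Under some of the alternative definitions mentioned above (e.g. scikit-learn's `r2_score`), a model that predicts worse than the mean can yield a negative value, so the [0, 1] range assumes the model is at least as good as the mean baseline.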
27. ### Classification Metrics

    A confusion (or error) matrix provides a summary of classification model performance:

    | Actual \ Predicted | + | − |
    |---|---|---|
    | + | True positive (TP) | False negative (FN) |
    | − | False positive (FP) | True negative (TN) |

    For classifying metals and insulators (N = 100), Accuracy = Correct/Total. Perfect model (TP = 70, FN = 0, FP = 0, TN = 30): (70+30)/100 = 100%. My best model (TP = 66, FN = 4, FP = 8, TN = 22): (66+22)/100 = 88%.

    K. Pearson “Mathematical Contributions to the Theory of Evolution” (1904)
28. ### Classification Metrics

    For the same metal/insulator confusion matrices, Sensitivity = TP/(TP+FN). Perfect model: 70/(70+0) = 100%. My best model: 66/(66+4) = 94%.
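Both metrics follow directly from the four confusion-matrix counts; the values below are the ones quoted on the slides.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + fp + tn)

def sensitivity(tp, fn):
    """Fraction of actual positives recovered (also called recall)."""
    return tp / (tp + fn)

# Counts from the slides (N = 100 metals/insulators)
acc_perfect = accuracy(70, 0, 0, 30)      # 1.00
acc_best = accuracy(66, 4, 8, 22)         # 0.88
sens_best = sensitivity(66, 4)            # about 0.94
```

Accuracy alone can mislead on imbalanced classes (a model predicting "metal" for everything here would still score 70%), which is why per-class metrics like sensitivity are reported alongside it.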
29. ### Quiz

    Fill in “?” for this confusion matrix:

    | Actual \ Predicted | Metal | Insulator |
    |---|---|---|
    | Insulator | 10 | 2 |
    | Metal | ? | 4 |
30. ### Class Outline: Machine Learning Basics

    A. Terminology B. Evaluation metrics C. Learning by example
31. ### Supervised Regression Model that maps an input to an output

based on example input-output pairs (labelled data) Regression predicts a continuous value, e.g. to extract a reaction rate 𝒚 = 𝑓(𝐱) + ε Target variable Learned function Error We’ll start to cover the model (function) details in Lecture 5
32. ### Regression Example Predict the dielectric constant of a crystal K.

Morita et al, J. Chem. Phys. 153, 024503 (2020) Support vector regression (r2=0.92) Note: outliers are often interesting cases (poor or exceptional data)
33. ### Regression Example Predict the dielectric constant of a crystal K.

Morita et al, J. Chem. Phys. 153, 024503 (2020); https://github.com/slundberg/shap SHAP (SHapley Additive exPlanations) analysis is a method for interpreting ML models Relative importance of input features in making predictions A positive SHAP indicates a feature contributes to an increase in the prediction
34. ### Regression Example Predict the dielectric constant of a crystal K.

Morita et al, J. Chem. Phys. 153, 024503 (2020); https://github.com/slundberg/shap CdCN2 Transparent breakdown of a predicted value Note: a physical connection is not implied (correlation, not causation)
35. ### Supervised Classification Model that maps an input to an output

based on example input-output pairs (labelled data) Classification predicts a category, e.g. decision trees for reaction outcomes 𝒚 = 𝑓(𝐱) Class label Classifier Assignment can be absolute or probabilistic (e.g. 90% apple, 10% pear)
36. ### Classification Example G. H. Gu et al, npj Computational Materials

8, 71 (2022) Predict if a material will be stable or unstable Crystal likeness (CL) Probabilistic score for the class label
37. ### Classification Example G. H. Gu et al, npj Computational Materials

8, 71 (2022) ABX3 perovskite crystals Textbook materials chemistry Neural network model Improved selectivity for promising compositions Likelihood of formation
38. ### Unsupervised Learning Model that can identify trends or correlations within

a dataset (unlabelled data) Clustering groups data by similarity, e.g. high-throughput crystallography 𝐱 → 𝑓(𝐱) Input data Transformation to new representation
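Clustering by similarity can be illustrated with a tiny k-means on one-dimensional data; the values and the naive initialisation are purely illustrative (a real materials workflow would run a library implementation on high-dimensional feature vectors).

```python
import statistics

def kmeans_1d(points, k=2, iters=20):
    """Tiny 1-D k-means: group unlabelled values by similarity."""
    centroids = list(points[:k])   # naive initialisation from the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its assigned points
        centroids = [statistics.mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sorted(centroids)

# Two well-separated groups of values are recovered as two centroids
centroids = kmeans_1d([1.0, 1.2, 0.9, 9.8, 10.1, 10.3])
```

No labels are supplied: the grouping emerges from the data alone, which is the defining feature of unsupervised learning. Note that k-means needs the number of clusters k in advance and is sensitive to initialisation.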
39. ### Unsupervised Example Map the space of materials according to their

structural characteristics Y. Suzuki et al, Mach. Learn.: Sci. Tech. 3, 045034 (2022) t-SNE: “t-distributed stochastic neighbour embedding”
40. ### Unsupervised Example Map model errors across molecular space Shomik Verma

et al, J. Chem. Phys. 156, 134116 (2022) UMAP: “Uniform manifold approximation and projection for dimension reduction” Plots contain 3.5 million molecular datapoints from the PubChemQC molecular database
41. ### Reinforcement Learning

    Model that performs a series of actions by trial and error to achieve an objective: maximise a reward, e.g. reaction conditions that optimise yield. Agent ⇌ Environment (e.g. automated experiments), with actions such as choices of samples and conditions.
42. ### Reinforcement Example

    Achieve the highest score or outcome (e.g. AlphaGo). [Figure: a Q-learning model before and after training; image from https://towardsdatascience.com/how-to-teach-an-ai-to-play-games]
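The tabular Q-learning idea behind such game-playing agents can be sketched in a few lines; the toy environment (a five-cell track with a reward at the right end) and all parameter values below are invented for illustration and are not from the lecture.

```python
import random

# Toy tabular Q-learning: an agent on a 5-cell track learns, by trial
# and error, that moving right reaches the reward at the final cell.
n_states = 5
actions = [-1, +1]                     # move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.2      # learning rate, discount, exploration

rng = random.Random(0)
for episode in range(300):
    s = 0
    while s != n_states - 1:
        if rng.random() < eps:         # explore occasionally
            a = rng.choice(actions)
        else:                          # otherwise act greedily
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy policy from every non-terminal state is "move right"
policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
```

The update rule bootstraps each Q-value toward reward plus the discounted best value of the next state, so the reward at the goal propagates backwards along the track over repeated episodes; the ε-greedy choice balances exploring new actions against exploiting what has been learned.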
43. ### Reinforcement Example Design molecules with specific properties M. Popova et

al, Science Advances 4, eaap7885 (2018) Reward increase for drug design
44. ### Class Outcomes 1. Define machine learning 2. Describe the three

components of machine learning with examples 3. Explain the statistical metrics used to assess model performance Activity: Crystal hardness