Slide 1

Slide 1 text

Aron Walsh Department of Materials Centre for Processable Electronics Machine Learning for Materials

Slide 2

Slide 2 text

Key Concept #1: Machine learning is function approximation. y = f(x), where y is the output, x the input features, and f the model. Learn f(x) from data; trust what generalises.
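
To make the idea of learning f(x) from data concrete, here is a minimal sketch that fits a straight line by ordinary least squares and then checks the fitted model on an input it has not seen. The linear form, the toy data, and the names `fit_line` and `model` are illustrative assumptions, not part of the module material.

```python
# Minimal sketch: learn f(x) = w*x + b from data by ordinary least squares.
# The data, the linear form, and all names here are illustrative assumptions.

def fit_line(xs, ys):
    """Return slope w and intercept b that minimise the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Training data generated from a hidden rule y = 2x + 1 (no noise)
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(x_train, y_train)

def model(x):
    """The learned approximation to f(x)."""
    return w * x + b

# Generalisation check: predict for an input outside the training set
x_new = 10.0
print(f"w = {w}, b = {b}, model({x_new}) = {model(x_new)}")
```

With noise-free data the fit recovers the hidden rule exactly; with real measurements the prediction on unseen inputs is what tells you whether the model can be trusted.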

Slide 3

Slide 3 text

New Era of Materials Research A. Agrawal and A. Choudhary, APL Materials 4, 053208 (2016) The research toolkit for materials science now includes data-driven statistical models

Slide 4

Slide 4 text

Computer Revolution Keith Butler (now: STFC/SciML) Analytical Engine Automated calculations Charles Babbage (1837) “The science of operations has its own truth and value” Ada Lovelace (1840) Multiply two 20-digit numbers in ~3 minutes

Slide 5

Slide 5 text

Computer Revolution “System on a chip” microprocessor from https://www.apple.com

Slide 6

Slide 6 text

Computer Revolution “System on a chip” microprocessor from https://www.apple.com

Slide 7

Slide 7 text

Exascale Supercomputing Exascale computing refers to 10^18 floating point operations per second; https://top500.org

Slide 8

Slide 8 text

Powerful Statistical Techniques Chris Hendon (now: University of Oregon) Keith Butler (now: STFC/SciML) Using GPT-5 via https://github.com/hwchase17/langchain Answers provided included transition metal oxides (V2O5), Chevrel phases (Mo6S8), Prussian blues (Fe4[Fe(CN)6]3)

Slide 9

Slide 9 text

Efficient Research Workflows J. P. Correa-Baena et al., Joule 2, 1410 (2018) Integration of computational techniques to accelerate discovery & development cycles

Slide 10

Slide 10 text

Module Contents 1. Introduction 2. Machine Learning Basics 3. Materials Data 4. Crystal Representations 5. Classical Learning 6. Deep Learning 7. Building a Model from Scratch 8. Accelerated Discovery 9. Generative Artificial Intelligence 10. Future Directions Dense module with time to self-study to explore concepts further

Slide 11

Slide 11 text

Class Outline Introduction A. Overview B. Expectations C. Assessments

Slide 12

Slide 12 text

What is Machine Learning (ML)? Statistical algorithms that learn from training data and build a model to make predictions. Data types: materials features can be binary (e.g. stability), categorical (e.g. symmetry), integer (e.g. stoichiometry), or continuous (e.g. rate). Learning types: unsupervised (identify patterns), supervised (use patterns), reinforcement (maximise reward).
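
As a rough illustration of how the four data types can end up in one numerical feature vector, the sketch below encodes a single made-up material record. The field names, the values, and the `featurise` helper are hypothetical; one-hot encoding is one common choice for categorical features, not the only one.

```python
# Sketch: turning mixed data types into one numerical feature vector.
# The record, its field names, and the helper functions are hypothetical.

material = {
    "is_stable": True,          # binary
    "crystal_system": "cubic",  # categorical
    "n_atoms": 5,               # integer
    "rate": 0.37,               # continuous
}

# The seven crystal systems, used as the category list
CRYSTAL_SYSTEMS = [
    "triclinic", "monoclinic", "orthorhombic", "tetragonal",
    "trigonal", "hexagonal", "cubic",
]

def one_hot(value, categories):
    """Encode a category as a list with a single 1.0 in its position."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1.0 if value == c else 0.0 for c in categories]

def featurise(record):
    """Concatenate the encoded fields into a flat list of floats."""
    features = [float(record["is_stable"])]                       # True -> 1.0
    features += one_hot(record["crystal_system"], CRYSTAL_SYSTEMS)
    features.append(float(record["n_atoms"]))
    features.append(float(record["rate"]))
    return features

x = featurise(material)
print(len(x), x)
```

A categorical feature is one-hot encoded rather than mapped to 1, 2, 3, ... so that the model does not read a spurious ordering into the categories.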

Slide 13

Slide 13 text

What is Machine Learning (ML)? Statistical algorithms that identify and use patterns in multi-dimensional datasets Image from “How Machines Learn” by Helen Edwards

Slide 14

Slide 14 text

What is Machine Learning (ML)? Statistical algorithms that identify and use patterns in multi-dimensional datasets. Predict a category, e.g. decision trees to predict reaction outcome; predict a value, e.g. regression to extract a reaction rate; group by similarity, e.g. high-throughput crystallography; maximise reward, e.g. reaction conditions to optimise yield. Images from https://vas3k.com/blog/machine_learning
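
The difference between predicting a category and predicting a value can be sketched with the same toy data. The example below uses k-nearest neighbours for both tasks, chosen here for brevity rather than the decision trees mentioned above. The temperatures, outcomes, and yields are invented for illustration.

```python
# Sketch: classification vs regression on the same inputs, using
# k-nearest neighbours (an illustrative stand-in; all data are made up).

def nearest_neighbours(x, xs, k):
    """Indices of the k training inputs closest to x."""
    return sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]

def predict_category(x, xs, labels, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    votes = [labels[i] for i in nearest_neighbours(x, xs, k)]
    return max(set(votes), key=votes.count)

def predict_value(x, xs, values, k=3):
    """Regression: mean target value of the k nearest neighbours."""
    idx = nearest_neighbours(x, xs, k)
    return sum(values[i] for i in idx) / k

# Hypothetical experiments: temperature -> reaction outcome and yield (%)
temperatures = [20, 30, 40, 60, 70, 80]
outcomes = ["fail", "fail", "fail", "success", "success", "success"]
yields = [5.0, 10.0, 15.0, 60.0, 70.0, 80.0]

query = 65
category = predict_category(query, temperatures, outcomes)
value = predict_value(query, temperatures, yields)
print(category, value)
```

Both predictions come from the same three neighbours; only the way their targets are combined (vote versus average) changes with the task.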

Slide 15

Slide 15 text

What is Machine Learning (ML)? Statistical algorithms that operate on multi-dimensional arrays of numerical data. Scalar $x$: 7; vector $x_i$: [8 3 1]; matrix $x_{ij}$: [7 2 3; 4 8 6; 7 8 9]; tensor $x_{ijk}$: [[1 7] ⋯ [6 4]; ⋮ ⋱ ⋮; [5 6] ⋯ [2 8]]. Image from http://karlstratos.com; note the physical definitions are more nuanced
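
The scalar, vector, matrix, tensor hierarchy maps onto arrays with zero, one, two, and three indices. The sketch below represents them as nested Python lists and reports their shapes; the `shape` helper is a hypothetical convenience written for this example, and the numerical values are illustrative. In practice a library such as NumPy would normally be used instead of nested lists.

```python
# Sketch: scalar, vector, matrix, and rank-3 tensor as nested lists.
# The shape() helper is hypothetical and assumes regular (non-ragged) nesting.

def shape(a):
    """Return the shape of a regularly nested list; () for a scalar."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return tuple(dims)

scalar = 7                                      # x      (no index)
vector = [8, 3, 1]                              # x_i    (one index)
matrix = [[7, 2, 3], [4, 8, 6], [7, 8, 9]]      # x_ij   (two indices)
tensor = [[[1, 7], [6, 4]], [[5, 6], [2, 8]]]   # x_ijk  (three indices)

for name, array in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    s = shape(array)
    print(f"{name}: number of indices = {len(s)}, shape = {s}")

# Indexing follows the subscripts: x_ijk with i=1, j=0, k=1
print(tensor[1][0][1])
```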

Slide 16

Slide 16 text

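What is Machine Learning (ML)? Statistical algorithms that operate on multi-dimensional arrays of numerical data. $\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & x_{13} & x_{14} & x_{15} \\ x_{21} & x_{22} & x_{23} & x_{24} & x_{25} \\ x_{31} & x_{32} & x_{33} & x_{34} & x_{35} \end{pmatrix} \begin{pmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \\ g_5 \end{pmatrix}$, i.e. (3 × 1 matrix) = (3 × 5 matrix)(5 × 1 matrix). Image from “How Machines Learn” by Helen Edwards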
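
The matrix equation on this slide amounts to $y_i = \sum_j x_{ij} g_j$: each output is a weighted sum of one row of features. A minimal sketch with made-up numbers follows; the `matvec` helper and the values of `X` and `g` are assumptions for illustration.

```python
# Sketch: y = X g for a 3 x 5 feature matrix and a 5 x 1 weight vector.
# matvec() and all numerical values are illustrative assumptions.

def matvec(X, g):
    """Multiply matrix X (list of rows) by vector g: y_i = sum_j X_ij * g_j."""
    if any(len(row) != len(g) for row in X):
        raise ValueError("each row of X must have the same length as g")
    return [sum(x_ij * g_j for x_ij, g_j in zip(row, g)) for row in X]

# Three samples (rows), five features each (columns)
X = [
    [1, 0, 2, 0, 1],
    [0, 1, 0, 1, 0],
    [2, 2, 2, 2, 2],
]
# One weight per feature
g = [1, 2, 3, 4, 5]

y = matvec(X, g)
print(y)  # one output per sample
```

The shapes have to match in the middle: 5 columns in `X` against 5 entries in `g`, leaving one output per row of `X`.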

Slide 17

Slide 17 text

ML ~ Function Approximation. Model selection, training, and testing tune a “complexity dial” for your problem of interest, from a linear model (underfit regime) to a highly non-linear model (overfit regime). Image from https://github.com/jermwatt/machine_learning_refined
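
One way to see the complexity dial in action is to compare training and test errors as a model is made more or less flexible. The sketch below uses k-nearest-neighbour regression, where k plays the role of the dial: k = 1 memorises the training data (overfit regime), while k equal to the number of training points predicts a single average (underfit regime). This is an illustrative substitute for the linear and highly non-linear models in the figure, and the synthetic data are an assumption.

```python
import math

# Sketch: k in k-nearest-neighbour regression as a "complexity dial".
# Synthetic data: a sine curve plus a fixed alternating offset standing in
# for noise. All names and values are illustrative assumptions.

def knn_predict(x, xs, ys, k):
    """Mean target of the k training points closest to x."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs_eval, ys_eval, xs_train, ys_train, k):
    """Mean squared error of the k-NN model on an evaluation set."""
    errors = [(knn_predict(x, xs_train, ys_train, k) - y) ** 2
              for x, y in zip(xs_eval, ys_eval)]
    return sum(errors) / len(errors)

x_train = [0.5 * i for i in range(13)]              # 0.0, 0.5, ..., 6.0
y_train = [math.sin(x) + (0.3 if i % 2 == 0 else -0.3)
           for i, x in enumerate(x_train)]
x_test = [0.5 * i + 0.25 for i in range(12)]        # points between the training inputs
y_test = [math.sin(x) for x in x_test]              # noise-free targets

results = {}
for k in (1, 3, len(x_train)):
    train_error = mse(x_train, y_train, x_train, y_train, k)
    test_error = mse(x_test, y_test, x_train, y_train, k)
    results[k] = (train_error, test_error)
    print(f"k = {k:2d}: train MSE = {train_error:.3f}, test MSE = {test_error:.3f}")
```

A training error of zero at k = 1 is not a sign of a good model; the test error is what indicates where on the dial the model generalises best.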

Slide 18

Slide 18 text

Image from https://vas3k.com/blog/machine_learning ML Model Map

Slide 19

Slide 19 text

A. L. Samuel, IBM Journal, 211 (1959) Brief History of ML Term coined by Arthur Samuel in 1959 “It is now possible to devise learning schemes which will greatly outperform an average person and that such learning schemes may eventually be economically feasible”

Slide 20

Slide 20 text

W. S. McCulloch and W. Pitts, Bull. Math. Biophys. 5, 115 (1943) Brief History of ML An artificial neuron had been proposed in 1943 “Every net, if furnished with a tape, scanners connected to afferents to perform the necessary motor-operations, can compute only such numbers as can a Turing machine”

Slide 21

Slide 21 text

A. M. Turing, Mind 236, 433 (1950) Brief History of ML In 1950, Alan Turing proposed a “Learning Machine” that could become intelligent “I PROPOSE to consider the question, Can machines think?”

Slide 22

Slide 22 text

ML in Materials R&D Growing field combining traditional industry, large technology companies, and start-ups

Slide 23

Slide 23 text

Source Material for Module ML content available from many sources, including blogs, research papers, repositories, and textbooks These slides are a skeleton, fleshed out with lectures, activities, and reading General Specialist

Slide 24

Slide 24 text

Class Outline Introduction A. Overview B. Expectations C. Assessments

Slide 25

Slide 25 text

Active Participation Your engagement is essential. This is a dense course with new concepts, Python coding, and self-study • Attend all lectures to hear the core content • Attend all practical sessions for hands-on coding • Attempt to solve problems yourself and ask course assistants if you need help

Slide 26

Slide 26 text

Creative Solutions There is great flexibility in programming, with no unique solution for any given problem. You may be interested in speed or clarity, but ultimately you want working code • Check package manuals, e.g. https://matplotlib.org & https://scikit-learn.org • Search https://stackexchange.com & https://github.com for ideas

Slide 27

Slide 27 text

Creative Solutions Many AI assistants for coding exist, such as GitHub Copilot, GPT, and Gemini • Most helpful when you know the basics first • Assistants can give poor suggestions with buggy code based on out-of-date libraries/functions • Not a substitute for hands-on coding experience and knowledge of materials

Slide 28

Slide 28 text

Irea Mosquera-Lois Mathilde Franckel Fintan Hardy Masahiro Negeishi Pan D. Amy Liu 2026 Module Assistants Senior GTA

Slide 29

Slide 29 text

Class Outline Introduction A. Overview B. Expectations C. Assessments

Slide 30

Slide 30 text

Module Assessment Aim for working knowledge of ML with practical sessions and coursework. Computer labs (8 × 2%): notebook submitted on Blackboard (due by the end of each session, 15:45). Research assignment (84%): assignment to complete (details after Lecture 9). Registration of absence or mitigation goes via the student office

Slide 31

Slide 31 text

Introductory Quiz http://menti.com Open on your phone, tablet or laptop