
Machine Learning for Materials (Lecture 8)

Aron Walsh
February 12, 2024

Transcript

  1. Course Contents: 1. Course Introduction; 2. Materials Modelling; 3. Machine Learning Basics; 4. Materials Data and Representations; 5. Classical Learning; 6. Artificial Neural Networks; 7. Building a Model from Scratch; 8. Recent Advances in AI; 9 and 10. Research Challenge
  2. “A problem in artificial intelligence is one which is so complex that it cannot be solved using any normal algorithm.” Hugh M. Cartwright, Applications of AI in Chemistry (1993)
  3. Natural Language Processing (NLP): branch of AI that focuses on the interaction between computers and human language. Tasks range from easy to hard: spell checking, text classification, information extraction, question answering, conversational agents.
  4. Natural Language Processing (NLP): branch of AI that focuses on the interaction between computers and human language. Image from https://github.com/practical-nlp
  5. Natural Language Processing (NLP): many statements are ambiguous and require context to be understood. “Let’s eat grandma?” Essen wir Oma? 我们吃奶奶的饭? おばあちゃんを食べようか? Mangeons grand-mère? 할머니랑 같이 먹어요? Does the ambiguity of the English phrase translate? (image: DALL-E 3 model)
  6. What Happens Inside a Chatbot? From text prompt to text response via a large language model (LLM). Prompt: “Write a Limerick about Imperial College London”. Response: “In Imperial College, bright minds convene, / Where knowledge and innovation gleam. / From White City to South Ken, / A hub of brilliance, amen, / Where dreams are born and discoveries esteem!” 1. The LLM interprets the user prompt (encoding: words to vector). 2. The LLM generates a response (decoding: vector to words). Using GPT-3 via https://github.com/hwchase17/langchain
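A minimal sketch of this prompt-to-response loop in Python, assuming the openai client (version 1.x) and an OPENAI_API_KEY environment variable; the lecture used GPT-3 through langchain, and the model name below is an arbitrary choice:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; any chat model works
        messages=[{"role": "user",
                   "content": "Write a limerick about Imperial College London"}],
    )

    # The LLM encodes the prompt (words to vector), transforms it through
    # its layers, and decodes back to words; only the final text is returned
    print(response.choices[0].message.content)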
  7. Language Models: predictive text (using GPT-3 via https://github.com/hwchase17/langchain). Given the prompt “I love materials because …”, the top next words are ranked by probability (e.g. of, they, their, shape, are, like). A “temperature” parameter controls how the distribution of probabilities is sampled (“creativity”). A high-temperature sample: “I love materials because they ignite a symphony of vibrant colors, tantalizing textures, and wondrous possibilities that dance in the realms of imagination, transcending boundaries and embracing the sheer beauty of creation itself.” A low-temperature sample: “I love materials because they are essential.” (top candidates: strong, essential, beautiful)
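How temperature shapes the choice can be seen in a small NumPy sketch; the candidate words echo the slide, while the logit values are invented for illustration:

    import numpy as np

    # Toy next-word scores for "I love materials because they ..."
    words = ["are", "ignite", "their", "shape", "like"]
    logits = np.array([2.0, 0.5, 0.0, -0.5, -1.0])  # invented values
    rng = np.random.default_rng(0)

    def sample_word(logits, temperature):
        """Temperature-scaled softmax sampling: T -> 0 always picks the
        top-ranked word; large T flattens the distribution ("creativity")."""
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())  # subtract max for stability
        probs /= probs.sum()
        return rng.choice(words, p=probs)

    print(sample_word(logits, temperature=0.1))  # almost always "are"
    print(sample_word(logits, temperature=2.0))  # more varied choices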
  8. Language Models: “large” refers to the size and capacity of the model. It must sample a literary combinatorial explosion: 10⁴ common words in English, 10⁸ two-word combinations, 10¹² three-word combinations, 10¹⁶ four-word combinations. Language must be represented numerically for machine learning models. Token: discrete scalar representation of a word (or subword). Embedding: continuous vector representation of tokens.
  9. Text to Tokens. Example: “ZnO is a wide bandgap semiconductor” maps to the token IDs [57, 77, 46, 318, 257, 3094, 4097, 43554, 39290, 40990] (GPT-3: https://platform.openai.com/tokenizer). The model looks up 768-dimensional embedding vectors from the (contextual) embedding matrix.
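The token IDs on the slide can be reproduced with the tiktoken library, assuming GPT-3's r50k_base byte-pair encoding (the same encoding used by the linked online tokenizer):

    import tiktoken

    enc = tiktoken.get_encoding("r50k_base")  # GPT-3 era encoding
    ids = enc.encode("ZnO is a wide bandgap semiconductor")
    print(ids)  # [57, 77, 46, 318, 257, 3094, 4097, 43554, 39290, 40990]

    # Each ID indexes a row of the embedding matrix, mapping the discrete
    # token to a continuous 768-dimensional vector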
  10. Large Language Models: deep learning models trained to generate text, e.g. BERT (370M, 2018) and GPT-3 (175B, 2020). Recent models include Llama 2 (Meta, 2023), Bard (Google, 2023), GPT-4 (OpenAI, 2023), and PanGu-Σ (Huawei, 2023). Image from https://towardsdatascience.com
  11. Large Language Models: GPT = “Generative Pre-trained Transformer” (Generative: generates new content; Pre-trained: trained on a large dataset; Transformer: deep learning architecture). Pipeline: user prompt → encode to a vector → transformer layers analyse relationships between vector components and generate a transformed vector → decode to words → response. Key components of a transformer layer: self-attention heads (smart focus on different parts of the input) and a feed-forward neural network (captures non-linear relationships). T. B. Brown et al, arXiv:2005.14165 (2020)
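A minimal NumPy sketch of one self-attention head (scaled dot-product attention); a real transformer layer adds multiple heads, residual connections, layer normalisation, and the feed-forward network:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """X: (n_tokens, d_model) embeddings. Each output vector is a
        probability-weighted mix of the value vectors, letting every
        token 'focus' on the most relevant parts of the input."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-token relevance
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
        return weights @ V

    rng = np.random.default_rng(0)
    n_tokens, d_model, d_head = 5, 8, 4
    X = rng.normal(size=(n_tokens, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 4)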
  12. Large Language Models: ongoing analysis into transformer architectures, e.g. “the structure of these interacting particle systems allows one to draw concrete connections to established topics in mathematics, including nonlinear transport equations”. B. Geshkovski et al, arXiv:2312.10794 (2023); image: https://pub.aimind.so
  13. Large Language Models: the essential ingredients of GPT are diverse data, a deep learning model, and validation on tasks. T. B. Brown et al, arXiv:2005.14165 (2020)
  14. Large Language Models: what are the potential drawbacks and limitations of LLMs such as GPT?
     • Training data, e.g. not up to date, strong bias
     • Context tracking, e.g. limited short-term memory
     • Hallucination, e.g. generate false information
     • Ownership, e.g. fair use of training data
     • Ethics, e.g. appear human generated
  15. LLMs for Materials: many possibilities, e.g. read a textbook and ask technical questions about the content. “The Future of Chemistry is Language”, A. D. White, Nat. Rev. Chem. 7, 457 (2023)
  16. LLMs for Materials: language models tailored to be fact-based with clear context, applied here to one of my review papers. https://github.com/whitead/paper-qa
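Illustrative usage of the paper-qa package referenced above, following its 2023-era synchronous interface (the filename and question are hypothetical, and the API may differ in newer releases):

    from paperqa import Docs

    docs = Docs()
    docs.add("walsh_review.pdf")  # hypothetical path to a review paper

    # Answers are grounded in, and cited against, the indexed document
    answer = docs.query("What factors limit the stability of halide perovskites?")
    print(answer.formatted_answer)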
  17. LLMs for Materials: CrystaLLM learns to write valid crystallographic information files (CIFs) and generate new structures. L. M. Antunes et al, arXiv:2307.04340 (2023); https://crystallm.com
  18. LLMs for Materials: CrystaLLM learns to write valid crystallographic information files (CIFs) and generate new structures. Training set: 2.2 million CIFs; validation set: 35,000 CIFs; test set: 10,000 CIFs. Tokenisation: space group symbols, element symbols, numeric digits. 768 million training tokens for a deep learning model with 25 million parameters. L. M. Antunes et al, arXiv:2307.04340 (2023); https://crystallm.com
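A simplified sketch in the spirit of this tokenisation scheme (keep space group and element symbols whole, split everything else down to digits); the vocabularies are truncated and the code is illustrative, not the authors' implementation:

    import re

    ELEMENTS = {"Zn", "O", "Si", "Ti", "Ba"}     # truncated for illustration
    SPACE_GROUPS = {"P6_3mc", "Fm-3m", "Pm-3m"}  # truncated for illustration

    def tokenise_cif_line(line: str) -> list[str]:
        tokens = []
        for word in line.split():
            if word in SPACE_GROUPS or word in ELEMENTS:
                tokens.append(word)  # keep known symbols as single tokens
            else:
                # split the rest into individual digits, points, and text runs
                tokens.extend(re.findall(r"\d|\.|[^\d\s]+", word))
        return tokens

    print(tokenise_cif_line("_symmetry_space_group_name_H-M P6_3mc"))
    print(tokenise_cif_line("Zn 0.3333 0.6667 0.0000"))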
  19. LLMs for Materials: integrate a large language model into scientific research workflows. Daniil A. Boiko et al, Nature 624, 570 (2023)
  20. Accelerate Scientific Discovery: research can be broken down into a set of core tasks that can each benefit from acceleration (traditional research workflow). H. S. Stein and J. M. Gregoire, Chem. Sci. 10, 9640 (2019)
  21. Accelerate Scientific Discovery: research can be broken down into a set of core tasks that can each benefit from acceleration, with potential for speedup at each step. H. S. Stein and J. M. Gregoire, Chem. Sci. 10, 9640 (2019)
  22. Accelerate Scientific Discovery: workflow classification of published studies. H. S. Stein and J. M. Gregoire, Chem. Sci. 10, 9640 (2019)
  23. Automation and Robotics: execution of physical tasks to achieve a target using autonomous or collaborative robots. Industrial revolutions from https://transportgeography.org
  24. Automation and Robotics: robots can be tailored for a wide range of materials synthesis and characterisation tasks. B. P. MacLeod et al, Science Advances 6, eaaz8867 (2020)
  25. Automation and Robotics: self-driving labs (SDLs), such as the A-Lab, are now operating. N. J. Szymanski et al, Nature 624, 86 (2023)
  26. Automation and Robotics: robots can be equipped with sensors and artificial intelligence to interact with their environment, e.g. adapting computer vision models for laboratory settings (GT = ground truth, Pred = predicted). S. Eppel et al, ACS Central Science 6, 1743 (2020)
  27. Automation and Robotics: robots can be equipped with sensors and artificial intelligence to interact with their environment. https://www.youtube.com/watch?v=K7I2QJcIyBQ
  28. Automation and Robotics: automation platforms designed to deliver complex research workflows (fixed platform or mobile), usually a mix of proprietary code with a GUI and Python API for user control. Catalysis workflow from https://www.chemspeed.com. Digifab is a dedicated institute within Imperial College London: https://www.imperial.ac.uk/digital-molecular-design-and-fabrication/
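Scripted control of such a platform typically looks something like the following; every class and method name here is invented for illustration, since real vendor APIs are proprietary and platform-specific:

    class SynthesisPlatform:
        """Hypothetical stand-in for a vendor's Python API."""

        def dispense(self, reagent: str, volume_ml: float) -> None:
            print(f"Dispensing {volume_ml} mL of {reagent}")

        def anneal(self, temperature_c: float, minutes: int) -> None:
            print(f"Annealing at {temperature_c} C for {minutes} min")

        def measure_conductivity(self) -> float:
            return 42.0  # placeholder for a real measurement

    platform = SynthesisPlatform()
    platform.dispense("P3HT solution", 1.5)
    platform.dispense("CNT dispersion", 0.5)
    platform.anneal(80, 30)
    print(platform.measure_conductivity())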
  29. Optimisation: algorithms to efficiently achieve a desired research objective. Considerations: Objective function (O): materials properties or device performance criteria, e.g. battery lifetime. Parameter selection: variables that can be controlled, e.g. temperature, pressure, composition. Data acquisition: how the data is collected, e.g. instruments, measurements, automation.
  30. Optimisation Algorithms: local optimisation finds the best solution in a limited region of the parameter space (x). Gradient based: iterate in the direction of the steepest gradient (dO/dx), e.g. gradient descent. Hessian based: use information from the second derivatives (d²O/dx²), e.g. quasi-Newton. The same concepts were discussed for ML model training. (Figure: objective O against parameter x between x₁ and xₙ, marking a local minimum.)
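A minimal sketch of gradient descent on a toy one-dimensional objective, O(x) = (x - 2)², whose minimum sits at x = 2:

    def gradient_descent(x0, learning_rate=0.1, steps=50):
        """Iterate downhill along the steepest gradient dO/dx."""
        x = x0
        for _ in range(steps):
            grad = 2 * (x - 2)         # dO/dx for O(x) = (x - 2)**2
            x -= learning_rate * grad  # step against the gradient
        return x

    print(gradient_descent(x0=10.0))  # converges towards x = 2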
  31. Optimisation Algorithms: global optimisation finds the best solution from across the entire parameter space. Numerical: iterative techniques to explore parameter space, e.g. downhill simplex, simulated annealing. Probabilistic: incorporate probability distributions, e.g. Markov chain Monte Carlo, Bayesian optimisation. The same concepts were discussed for ML model training. (Figure: objective O against parameter x between x₁ and xₙ, marking the global minimum.)
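A minimal sketch of simulated annealing on a toy objective with two minima; the cooling schedule and step size are arbitrary choices:

    import math
    import random

    def simulated_annealing(objective, x0, steps=5000, t0=2.0, cooling=0.999):
        """Accept uphill moves with probability exp(-delta/T) so the
        search can escape local minima while the temperature is high."""
        rng = random.Random(0)
        x = best = x0
        t = t0
        for _ in range(steps):
            candidate = x + rng.gauss(0, 0.5)  # random local move
            delta = objective(candidate) - objective(x)
            if delta < 0 or rng.random() < math.exp(-delta / t):
                x = candidate
            if objective(x) < objective(best):
                best = x
            t *= cooling  # cool gradually
        return best

    def objective(x):
        # local minimum near x = +2; global minimum near x = -2
        return 0.1 * (x**2 - 4) ** 2 + 0.3 * x

    print(simulated_annealing(objective, x0=3.0))  # typically ends near -2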
  32. Bayesian Optimisation (BO): BO can use prior (measured or simulated) data to decide which experiment to perform next. Probabilistic (surrogate) model: approximation of the true objective function, O(x) ~ f(x), e.g. a Gaussian process GP(x,x') fitted to the known data. Acquisition function: selection of the next (new) sample point x' from the parameters to sample, e.g. upper confidence bound UCB(x') = μ(x') + κσ(x'), where μ(x') is the mean prediction and κσ(x') is the exploration term. J. Močkus, Optimisation Techniques 1, 400 (1974)
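A compact BO loop, sketched with scikit-learn's Gaussian process as the surrogate and the UCB acquisition defined above; the objective function is a hypothetical stand-in for a real experiment:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def objective(x):
        """Hypothetical stand-in for an experiment to be maximised."""
        return -(x - 0.6) ** 2 + 0.1 * np.sin(20 * x)

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(4, 1))  # prior (measured) data points
    y = objective(X).ravel()
    x_grid = np.linspace(0, 1, 200).reshape(-1, 1)
    kappa = 2.0  # exploration weight

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=RBF(0.1), normalize_y=True)
        gp.fit(X, y)  # surrogate model: O(x) ~ GP
        mu, sigma = gp.predict(x_grid, return_std=True)
        ucb = mu + kappa * sigma         # UCB(x') = mu(x') + kappa*sigma(x')
        x_next = x_grid[np.argmax(ucb)]  # next experiment to run
        X = np.vstack([X, [x_next]])
        y = np.append(y, objective(x_next))

    print(f"Best observed: x = {X[np.argmax(y)][0]:.3f}, O = {y.max():.3f}")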
  33. Bayesian Optimisation (BO): BO can use prior (measured or simulated) data to decide which experiment to perform next. Y. Wu, A. Walsh, A. M. Ganose, ChemRxiv (2023)
  34. Bayesian Optimisation (BO): application to maximise the electrical conductivity of a composite (P3HT-CNT) thin film. D. Bash et al, Adv. Funct. Mater. 31, 2102606 (2021)
  35. Bayesian Optimisation (BO): application to maximise the electrical conductivity of a composite (P3HT-CNT) thin film. D. Bash et al, Adv. Funct. Mater. 31, 2102606 (2021)
  36. Active Learning (AL): BO finds inputs that maximise the objective function; AL finds inputs that enhance model performance by targeting unknown regions with the largest epistemic uncertainty* (illustrated with posterior samples). Gaussian process: f(x) ~ GP(μ(x), k(x,x')), with mean function μ(x) and (Gaussian) kernel function k(x,x'). *Reducible uncertainty associated with lack of information.
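A minimal active-learning sketch, reusing the Gaussian process surrogate from the BO example but with an acquisition that targets the largest predictive uncertainty rather than the largest predicted objective; the ground-truth function is a hypothetical stand-in:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def ground_truth(x):
        """Hypothetical function the model is learning to approximate."""
        return np.sin(6 * x).ravel()

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(3, 1))
    y = ground_truth(X)
    x_pool = np.linspace(0, 1, 200).reshape(-1, 1)

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
        gp.fit(X, y)
        _, sigma = gp.predict(x_pool, return_std=True)
        x_next = x_pool[np.argmax(sigma)]  # largest epistemic uncertainty
        X = np.vstack([X, [x_next]])
        y = np.append(y, ground_truth(x_next.reshape(1, -1)))

    print(f"Trained on {len(X)} points chosen to reduce model uncertainty")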
  37. Integrated Research Workflows: feedback loop between an optimisation model and automated experiments. NIMS-OS: R. Tamura, K. Tsuda, S. Matsuda, arXiv:2304.13927 (2023)
  38. Integrated Research Workflows: feedback loop between an optimisation model and automated experiments. NIMS-OS: R. Tamura, K. Tsuda, S. Matsuda, arXiv:2304.13927 (2023)
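A skeleton of such a feedback loop; both helper functions are invented placeholders (NIMS-OS and similar frameworks provide their own interfaces for the optimiser and robot sides):

    import random

    def propose_conditions(history):
        """Placeholder optimiser: random search. A real system would use,
        e.g., Bayesian optimisation over the accumulated history."""
        return {"temperature_c": random.uniform(100, 400)}

    def run_experiment(conditions):
        """Placeholder robot: a toy response peaking at 250 C stands in
        for an automated synthesis and measurement."""
        return -(conditions["temperature_c"] - 250) ** 2

    history = []
    for cycle in range(20):
        conditions = propose_conditions(history)  # optimisation model
        result = run_experiment(conditions)       # automated experiment
        history.append((conditions, result))      # close the loop

    best = max(history, key=lambda item: item[1])
    print(f"Best conditions so far: {best[0]}")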
  39. Obstacles to Closed-Loop Discovery:
     • Materials complexity (complex structures, compositions, processing sensitivity)
     • Data quality and reliability (errors and inconsistencies that waste resources)
     • Cost of automation (major investment required in infrastructure and training)
     • Adaptability (systems and workflows may be difficult to reconfigure for new problems)
  40. Class Outcomes: 1. Explain the foundations of large language models. 2. Assess the impact of AI on materials research and discovery. 3. Discuss potential biases and ethical considerations for these applications. Activity: closed-loop optimisation