Machine Learning for Materials (Lecture 8)

Aron Walsh
February 12, 2024

Transcript

  1. Aron Walsh
    Department of Materials
    Centre for Processable Electronics
    Machine Learning for Materials
    8. Recent Advances in AI

  2. Course Contents
    1. Course Introduction
    2. Materials Modelling
    3. Machine Learning Basics
    4. Materials Data and Representations
    5. Classical Learning
    6. Artificial Neural Networks
    7. Building a Model from Scratch
    8. Recent Advances in AI
    9. and 10. Research Challenge

  3. “A problem in artificial
    intelligence is one which is so
    complex that it cannot be
    solved using any normal
    algorithm”
    Hugh M. Cartwright, Applications of AI in Chemistry (1993)

  4. Class Outline
    Recent Advances in AI
    A. Large Language Models
    B. Closed-Loop Materials Discovery

  5. Natural Language Processing (NLP)
    Branch of AI that focuses on the interaction
    between computers and human language
    Tasks range from easy to hard:
    spell checking → text classification → information extraction →
    question answering → conversational agents

  6. Natural Language Processing (NLP)
    Branch of AI that focuses on the interaction
    between computers and human language
    Image from https://github.com/practical-nlp

  7. Natural Language Processing (NLP)
    Four major building blocks of language
    Image from https://github.com/practical-nlp

  8. Natural Language Processing (NLP)
    Many statements are ambiguous and
    require context to be understood
    Let’s eat grandma?
    Essen wir Oma? (German: the ambiguity survives)
    我们吃奶奶的饭? (Chinese: “shall we eat grandma’s food?”)
    おばあちゃんを食べようか? (Japanese: “shall we eat grandma?”)
    Mangeons grand-mère? (French: the ambiguity survives)
    할머니랑 같이 먹어요? (Korean: “shall we eat together with grandma?”)
    Does the ambiguity of the English phrase translate? (image: DALL-E 3 model)

  9. What Happens Inside a Chatbot?
    From text prompt to text response via
    a large language model (LLM)
    Prompt: “Write a limerick about Imperial College London”
    Response (from the LLM):
    “In Imperial College, bright minds convene,
    Where knowledge and innovation gleam.
    From White City to South Ken,
    A hub of brilliance, amen,
    Where dreams are born and discoveries esteem!”
    1. The LLM interprets the user prompt (encoding: words to vectors)
    2. The LLM generates a response (decoding: vectors to words)
    Using GPT-3 via https://github.com/hwchase17/langchain
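    A hedged sketch of the call behind this slide, assuming an early
    (pre-0.1) LangChain release; the model name and temperature are
    illustrative choices, not values taken from the lecture.

    ```python
    # Minimal sketch: prompting a GPT-3-family model through early LangChain.
    # Assumes an OpenAI API key is set in the environment; model_name and
    # temperature are illustrative assumptions.
    from langchain.llms import OpenAI

    llm = OpenAI(model_name="text-davinci-003", temperature=0.7)
    print(llm("Write a limerick about Imperial College London"))
    ```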

  10. Language Models
    Predictive text: the model ranks candidate next words by probability
    “I love materials because …” → of / they / their / shape / are / like
    (top words ranked by probability)
    The “temperature” of the text choices sets how the probability
    distribution is sampled (the model’s “creativity”)
    Low temperature: “I love materials because they are essential.”
    (continuing with top-ranked words such as strong / essential / beautiful)
    High temperature: “I love materials because they ignite a symphony
    of vibrant colors, tantalizing textures, and wondrous possibilities
    that dance in the realms of imagination, transcending boundaries and
    embracing the sheer beauty of creation itself.”
    Using GPT-3 via https://github.com/hwchase17/langchain
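    A minimal sketch of temperature sampling, with invented candidate
    words and scores standing in for the model's full vocabulary:

    ```python
    # Minimal sketch of sampling the next word with a "temperature":
    # low T concentrates probability on the top-ranked word, while
    # high T spreads it across candidates ("creativity").
    # The words and logits below are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    words = ["of", "they", "their", "shape", "are", "like"]
    logits = np.array([2.1, 1.9, 1.7, 0.4, 0.3, 0.2])  # hypothetical scores

    def sample_next_word(temperature):
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()              # softmax over the candidates
        return rng.choice(words, p=probs)

    print(sample_next_word(0.1))  # near-greedy: almost always "of"
    print(sample_next_word(2.0))  # much more varied choices
    ```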

  11. Language Models
    Large refers to the size and capacity of the model.
    It must sample a literary combinatorial explosion:
    ~10^4 common words in English
    ~10^8 two-word combinations
    ~10^12 three-word combinations
    ~10^16 four-word combinations
    Language must be represented numerically
    for machine learning models
    Token: discrete scalar representation of word (or subword)
    Embedding: continuous vector representation of tokens

  12. Text to Tokens
    Example: “ZnO is a wide bandgap semiconductor”
    GPT-3: https://platform.openai.com/tokenizer
    Token IDs: [57, 77, 46, 318, 257, 3094,
    4097, 43554, 39290, 40990]
    The model looks up 768-dimensional embedding vectors
    from the (contextual) embedding matrix
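    The same tokenisation can be reproduced locally. A minimal sketch
    with the tiktoken library, assuming the "r50k_base" encoding used by
    the original GPT-3 models:

    ```python
    # Minimal sketch: GPT-style byte-pair tokenisation with tiktoken.
    import tiktoken

    enc = tiktoken.get_encoding("r50k_base")  # GPT-3 family encoding
    ids = enc.encode("ZnO is a wide bandgap semiconductor")
    print(ids)                             # the integer token IDs
    print([enc.decode([i]) for i in ids])  # the text span of each token
    ```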

  13. Large Language Models
    Image from https://towardsdatascience.com
    Deep learning models trained to generate text,
    e.g. BERT (340M parameters, 2018) and GPT-3 (175B parameters, 2020)
    Recent models include: Llama 2 (Meta, 2023), Bard (Google, 2023),
    GPT-4 (OpenAI, 2023), and PanGu-Σ (Huawei, 2023)

  14. Large Language Models
    T. B. Brown et al, arXiv:2005.14165 (2020)
    GPT = “Generative Pre-trained Transformer”
    Generative: generates new content
    Pre-trained: trained on a large dataset
    Transformer: deep learning architecture
    Workflow: the user prompt is encoded to a vector; transformer
    layers analyse the relationships between vector components and
    generate a transformed vector; this vector is decoded back to
    words to form the response
    Key components of a transformer layer
    Self-attention heads: smart focus on different parts of the input
    Feed-forward neural network: capture non-linear relationships
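    To make the self-attention step concrete, here is a minimal
    single-head sketch in NumPy; the random weights stand in for learned
    projection matrices, and the dimensions are illustrative (768 matches
    the embedding size quoted earlier):

    ```python
    # Minimal sketch of single-head scaled dot-product self-attention.
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """X: (n_tokens, d_model) token embeddings; returns attended values."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv         # queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-token relevance
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)       # softmax over input tokens
        return w @ V                             # weighted mixture of values

    rng = np.random.default_rng(0)
    n_tokens, d_model, d_head = 10, 768, 64
    X = rng.normal(size=(n_tokens, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)   # (10, 64)
    ```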

  15. Large Language Models
    B. Geshkovski et al, arXiv:2312.10794 (2023); Image: https://pub.aimind.so
    Ongoing analysis into transformer architectures, e.g.
    “the structure of these interacting particle systems allows
    one to draw concrete connections to established topics in
    mathematics, including nonlinear transport equations”

  16. Large Language Models
    T. B. Brown et al, arXiv:2005.14165 (2020)
    Essential ingredients of GPT:
    diverse data, a deep learning model, and validation on tasks

  17. Large Language Models
    What are the potential drawbacks and
    limitations of LLMs such as GPT?
    • Training data, e.g. not up to date, strong bias
    • Context tracking, e.g. limited short-term memory
    • Hallucination, e.g. generate false information
    • Ownership, e.g. fair use of training data
    • Ethics, e.g. output can appear human-generated

  18. LLMs for Materials
    Many possibilities, e.g. read a textbook and ask
    technical questions about the content
    “The Future of Chemistry is Language” A. D. White, Nat. Rev. Chem. 7, 457 (2023)

  19. LLMs for Materials
    Language models tailored to be fact-based with
    clear context, applied here to one of my review papers
    https://github.com/whitead/paper-qa
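    A minimal sketch of such a query with paper-qa, assuming an early
    synchronous release of the package (its API has changed across
    versions); the file name and question are placeholders:

    ```python
    # Minimal sketch of document-grounded Q&A with paper-qa.
    # Assumes an OpenAI API key in the environment; "review.pdf" is a
    # hypothetical local copy of the paper being queried.
    from paperqa import Docs

    docs = Docs()
    docs.add("review.pdf")
    answer = docs.query("What materials does this review discuss?")
    print(answer.formatted_answer)
    ```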

  20. LLMs for Materials
    L. M. Antunes et al, arXiv:2307.04340 (2023); https://crystallm.com
    CrystaLLM: learn to write valid crystallographic
    information files (CIFs) and generate new structures

  21. LLMs for Materials
    CrystaLLM: learn to write valid crystallographic
    information files (CIFs) and generate new structures
    Training set: 2.2 million CIFs
    Validation set: 35,000 CIFs
    Test set: 10,000 CIFs
    Tokenisation: space group symbols, element symbols, and
    numeric digits; 768 million training tokens for a deep
    learning model with 25 million parameters
    L. M. Antunes et al, arXiv:2307.04340 (2023); https://crystallm.com

  22. LLMs for Materials
    Integrate a large language model into
    scientific research workflows
    Daniil A. Boiko et al, Nature 624, 570 (2023)

  23. Class Outline
    Recent Advances in AI
    A. Large Language Models
    B. Closed-Loop Materials Discovery

  24. Accelerate Scientific Discovery
    Research can be broken down into a set of
    core tasks that can each benefit from acceleration
    H. S. Stein and J. M. Gregoire, Chem. Sci. 10, 9640 (2019)
    (Figure: the traditional research workflow)

  25. Accelerate Scientific Discovery
    Research can be broken down into a set of
    core tasks that can each benefit from acceleration
    (figure highlights the potential for speedup at each step)
    H. S. Stein and J. M. Gregoire, Chem. Sci. 10, 9640 (2019)

  26. Accelerate Scientific Discovery
    Workflow classification of published studies
    H. S. Stein and J. M. Gregoire, Chem. Sci. 10, 9640 (2019)

  27. Automation and Robotics
    Execution of physical tasks to achieve a target
    using autonomous or collaborative robots
    Industrial revolutions from https://transportgeography.org

  28. Automation and Robotics
    Robots can be tailored for a wide range of
    materials synthesis and characterisation tasks
    B. P. MacLeod et al, Science Advances 6, eaaz8867 (2020)

  29. Automation and Robotics
    Self-driving labs (SDLs) are now operating, e.g. the A-Lab
    N. J. Szymanski et al, Nature 624, 86 (2023)

  30. Automation and Robotics
    Robots can be equipped with sensors and artificial
    intelligence to interact with their environment
    S. Eppel et al, ACS Central Science 6, 1743 (2020)
    Adapting computer
    vision models for
    laboratory settings
    GT = ground truth
    Pred = predicted

  31. Automation and Robotics
    Robots can be equipped with sensors and artificial
    intelligence to interact with their environment
    https://www.youtube.com/watch?v=K7I2QJcIyBQ

  32. Automation and Robotics
    Automation platforms designed to deliver complex
    research workflows (fixed platform or mobile)
    Catalysis workflow from https://www.chemspeed.com
    DigiFAB is a dedicated institute within Imperial College London:
    https://www.imperial.ac.uk/digital-molecular-design-and-fabrication/
    Such platforms usually run a mix of proprietary code,
    with a GUI and a Python API for user control

  33. Optimisation
    Algorithms to efficiently achieve a desired
    research objective. Considerations:
    Objective function (O): Materials properties
    or device performance criteria, e.g. battery lifetime
    Parameter selection: Variables that can be
    controlled, e.g. temperature, pressure, composition
    Data acquisition: How the data is collected,
    e.g. instruments, measurements, automation

  34. Optimisation Algorithms
    Local optimisation: find the best solution in
    a limited region of the parameter space (x)
    Gradient based: iterate in the direction of steepest
    descent, following the gradient dO/dx, e.g. gradient descent
    Hessian based: use information from the second
    derivatives (d²O/dx²), e.g. quasi-Newton methods
    (Figure: objective O against x, with iterates x1 … xn
    converging to a local minimum)
    The same concepts were discussed for ML model training
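    A minimal sketch of gradient descent on a one-dimensional objective;
    the objective O(x) and learning rate are invented for illustration:

    ```python
    # Minimal sketch of gradient descent: step against dO/dx repeatedly.
    def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
        x = x0
        for _ in range(n_steps):
            x -= learning_rate * grad(x)
        return x

    # Illustrative objective O(x) = (x - 2)^2, so dO/dx = 2(x - 2);
    # the minimum lies at x = 2.
    x_min = gradient_descent(lambda x: 2 * (x - 2), x0=10.0)
    print(round(x_min, 4))  # ≈ 2.0
    ```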

  35. Optimisation Algorithms
    Global optimisation: find the best solution
    from across the entire parameter space
    Numerical: iterative techniques to explore parameter
    space, e.g. downhill simplex, simulated annealing
    Probabilistic: incorporate probability distributions,
    e.g. Markov chain Monte Carlo, Bayesian optimisation
    (Figure: objective O against x, with iterates x1 … xn
    escaping local minima to reach the global minimum)
    The same concepts were discussed for ML model training
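    A minimal sketch of simulated annealing, with an invented rugged
    objective; the step size and cooling schedule are illustrative
    assumptions:

    ```python
    # Minimal sketch of simulated annealing: accept all downhill moves
    # and some uphill moves, with the uphill tolerance shrinking as the
    # "temperature" T is cooled, to escape local minima.
    import math
    import random

    def simulated_annealing(objective, x0, T0=5.0, cooling=0.995, n_steps=2000):
        x = best = x0
        T = T0
        for _ in range(n_steps):
            x_new = x + random.uniform(-0.5, 0.5)  # propose a local move
            dO = objective(x_new) - objective(x)
            if dO < 0 or random.random() < math.exp(-dO / T):
                x = x_new
                if objective(x) < objective(best):
                    best = x
            T *= cooling
        return best

    # Invented rugged objective with several local minima;
    # its global minimum lies near x ≈ -0.5.
    O = lambda x: 0.1 * x**2 + math.sin(3 * x)
    random.seed(0)
    print(round(simulated_annealing(O, x0=4.0), 2))
    ```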

  36. Bayesian Optimisation (BO)
    BO can use prior (measured or simulated) data to
    decide which experiment to perform next
    Probabilistic (surrogate) model: approximation of the true
    objective function, O(x) ≈ f(x), e.g. a Gaussian process, GP(x, x′)
    Acquisition function: selection of the next sample point x′,
    e.g. the upper confidence bound, UCB(x′) = μ(x′) + κσ(x′),
    where μ(x′) is the mean prediction from the known data and
    κσ(x′) is an exploration term favouring new parameters to sample
    J. Močkus, Optimisation Techniques 1, 400 (1974)
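    A minimal sketch of one BO step with scikit-learn, assuming invented
    prior data and an RBF-kernel Gaussian process as the surrogate model:

    ```python
    # Minimal sketch of a Bayesian-optimisation step: fit a Gaussian-process
    # surrogate to prior data, then pick the candidate maximising UCB.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Prior (measured or simulated) data: invented for illustration
    X_known = np.array([[0.1], [0.4], [0.9]])
    y_known = np.array([0.2, 0.8, 0.3])

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_known, y_known)

    # UCB(x') = mu(x') + kappa * sigma(x') over a grid of candidate points
    X_candidates = np.linspace(0, 1, 101).reshape(-1, 1)
    mu, sigma = gp.predict(X_candidates, return_std=True)
    kappa = 2.0  # exploration-exploitation trade-off
    x_next = X_candidates[np.argmax(mu + kappa * sigma)]
    print(f"Next experiment at x = {x_next[0]:.2f}")
    ```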

  37. Bayesian Optimisation (BO)
    Y. Wu, A. Walsh, A. M. Ganose, ChemRxiv (2023)
    BO can use prior (measured or simulated) data to
    decide which experiment to perform next

  38. Bayesian Optimisation (BO)
    Application to maximise electrical conductivity
    of a composite (P3HT-CNT) thin-film
    D. Bash et al, Adv. Funct. Mater. 31, 2102606 (2021)

  39. Bayesian Optimisation (BO)
    D. Bash et al, Adv. Funct. Mater. 31, 2102606 (2021)
    Application to maximise electrical conductivity
    of a composite (P3HT-CNT) thin-film

  40. Active Learning (AL)
    BO: find inputs that maximise the objective function
    AL: find inputs that enhance model performance
    Target unknown regions with the largest epistemic uncertainty*,
    which posterior samples from the surrogate model reveal
    Gaussian process: f(x) ~ GP(μ(x), k(x,x′)), with mean function
    μ(x) and Gaussian kernel function k(x,x′)
    * Reducible uncertainty associated with a lack of information
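    A minimal sketch of the uncertainty-driven query step, again with
    scikit-learn and invented data: the next label is requested where the
    predictive standard deviation (epistemic uncertainty) is largest.

    ```python
    # Minimal sketch of active learning with a Gaussian process:
    # query the pool point with the largest predictive uncertainty.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X_train = np.array([[0.0], [0.3], [1.0]])      # invented labelled data
    y_train = np.sin(2 * np.pi * X_train).ravel()

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1)).fit(X_train, y_train)

    X_pool = np.linspace(0, 1, 201).reshape(-1, 1)  # unlabelled candidates
    _, sigma = gp.predict(X_pool, return_std=True)
    x_query = X_pool[np.argmax(sigma)]              # most uncertain input
    print(f"Query the label at x = {x_query[0]:.2f}")
    ```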

  41. Integrated Research Workflows
    Feedback loop between optimisation model
    and automated experiments
    NIMS-OS: R. Tamura, K. Tsuda, S. Matsuda, arXiv:2304.13927 (2023)

  43. Obstacles to Closed-Loop Discovery
    • Materials complexity (complex structures,
    compositions, processing sensitivity)
    • Data quality and reliability (errors and
    inconsistencies that waste resources)
    • Cost of automation (major investment
    required in infrastructure and training)
    • Adaptability (systems and workflows may
    be difficult to reconfigure for new problems)

  44. Class Outcomes
    1. Explain the foundations of
    large language models
    2. Assess the impact of AI on materials
    research and discovery
    3. Discuss potential biases and ethical
    considerations for these applications
    Activity:
    Closed-loop optimisation
