
Generative Models Molecular Design, Elix, CBI 2020

October 28, 2020



  1. Introduction (1)

    Discovering new molecules is fundamentally difficult, but computers are good at repetitive tasks. Exploring the chemical landscape by hand is tedious and meticulous work. It is prohibitively expensive, and not all discovered molecules can be synthesized. (Figure labels: polymers, drugs.)
  2. Introduction (2): Enter Machine Learning

    Machine learning methods are good at local optimization. (Figure labels: generating realistic faces with StyleGAN; balancing and predicting combustion reactions.)
  3. Problem Definition

    Problem statement: De novo molecule design presents a very large search space. A space this large prohibits exhaustive search, so navigation in the de novo design process relies on the principle of local optimization.

    Initial challenges
    • How to assemble the candidate compounds?
    • How to evaluate their potential quality?
    • How to sample the search space effectively?

    Solution: Deep Generative Models
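To give a rough sense of why exhaustive search is ruled out, the sketch below counts how many character strings a SMILES-like alphabet could form. The alphabet size (30 symbols) and maximum length (40 characters) are illustrative assumptions, not figures from the talk, and most of these strings would not be valid molecules, so the result is only a loose upper bound on string candidates.

```python
def count_candidate_strings(alphabet_size: int, max_length: int) -> int:
    """Count all non-empty strings of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


# Assumed, illustrative values (not taken from the slides).
ALPHABET_SIZE = 30
MAX_LENGTH = 40

total = count_candidate_strings(ALPHABET_SIZE, MAX_LENGTH)

# Report only the order of magnitude; the exact integer is not meaningful.
print(f"roughly 10^{len(str(total)) - 1} candidate strings")
```

Even under these modest assumptions the count is far beyond anything that could be enumerated, which is why sampling and local optimization are used instead.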
  4. Generative Models for De Novo Molecule Design (1)

    Generative models: what are they? An encoder maps molecules into a latent space of interest, and a decoder maps points in that space back to molecules (Gómez-Bombarelli et al., 2017). The goal is to optimize the latent search space for generating new molecules. (Figure: encoder-decoder architecture.)
  5. Generative Models for De Novo Molecule Design (2)

    Representation learning for molecule design: everything is a representation. A 2D molecule structure is written as a SMILES string (for example, CC(=O)Nc1ccc(O)cc1), which is then encoded as a fingerprint or a one-hot encoding and passed through the encoder and decoder. (Figure labels: encoder, decoder, automatic differentiation (autograd).)
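As a minimal sketch of the SMILES-to-one-hot step, the code below encodes the example string from the slide one character at a time. Character-level tokenization is a simplifying assumption (real pipelines usually treat multi-character atoms such as Cl or Br as single tokens), and the helper names are hypothetical, not from the talk.

```python
SMILES = "CC(=O)Nc1ccc(O)cc1"  # example string shown on the slide


def build_vocab(smiles_list):
    """Map every character seen in the input strings to an integer index."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot_encode(smiles, vocab):
    """Return one row per character, with a single 1 at that character's index."""
    rows = []
    for ch in smiles:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1
        rows.append(row)
    return rows


def one_hot_decode(rows, vocab):
    """Invert one_hot_encode by looking up the position of the 1 in each row."""
    index_to_char = {i: ch for ch, i in vocab.items()}
    return "".join(index_to_char[row.index(1)] for row in rows)


vocab = build_vocab([SMILES])
encoded = one_hot_encode(SMILES, vocab)
print(f"{len(encoded)} characters x {len(vocab)} vocabulary symbols")
print(one_hot_decode(encoded, vocab) == SMILES)
```

The resulting matrix (one row per character, one column per vocabulary symbol) is the kind of input a sequence encoder would consume.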
  6. Different Encoder-Decoder Generative Models

    Graph-based encoder-decoder: the graph structure, given as an adjacency matrix and node features, is passed to an autoencoder. Graph structures are more faithful to the representation of a 2D molecule.
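The adjacency matrix and node features can be illustrated on the same molecule used for the SMILES example, CC(=O)Nc1ccc(O)cc1. This is a hand-written sketch: the atom and bond lists were typed out manually rather than parsed by a cheminformatics library, hydrogens are omitted, bond orders are ignored, and the node features are an assumed one-hot encoding of the element only.

```python
# Heavy atoms of CC(=O)Nc1ccc(O)cc1, in the order they appear in the string.
ATOMS = ["C", "C", "O", "N", "C", "C", "C", "C", "O", "C", "C"]

# Undirected bonds as index pairs; (10, 4) closes the aromatic ring.
BONDS = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (5, 6),
         (6, 7), (7, 8), (7, 9), (9, 10), (10, 4)]

ELEMENTS = ["C", "N", "O"]  # assumed feature vocabulary


def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric 0/1 matrix with a 1 wherever two atoms are bonded."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def node_features(atoms, elements):
    """One-hot encode each atom's element type."""
    return [[1 if atom == el else 0 for el in elements] for atom in atoms]


A = adjacency_matrix(len(ATOMS), BONDS)
X = node_features(ATOMS, ELEMENTS)

for row in A:
    print(row)
print(X)
```

A graph autoencoder would take the pair (A, X) as input; richer node and edge features (charge, aromaticity, bond order) are common in practice but are left out here.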
  7. Generative Models for De Novo Molecule Design (3)

    Client requirement(s): Can we find new molecules with specific properties? Yes. We can constrain our generative models to preserve or restrict properties while searching for novel molecules (Gómez-Bombarelli et al., 2017). Including constraints restricts the search space.
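One way to picture a constrained search is projected gradient ascent in the latent space. The sketch below is a toy under stated assumptions, not the method from the talk or from the cited paper: the property predictor is a made-up quadratic with its peak at (2, 1), the latent space is two-dimensional, and the constraint is an assumed rule that the search must stay within a fixed radius of the starting point.

```python
import math


def predicted_property(z):
    """Hypothetical property predictor (assumption): peaks at z = (2, 1)."""
    return -((z[0] - 2.0) ** 2 + (z[1] - 1.0) ** 2)


def property_gradient(z):
    """Analytic gradient of predicted_property with respect to z."""
    return [-2.0 * (z[0] - 2.0), -2.0 * (z[1] - 1.0)]


def project_to_ball(z, center, radius):
    """Pull z back onto the ball around center if it has left it."""
    offset = [a - b for a, b in zip(z, center)]
    dist = math.sqrt(sum(d * d for d in offset))
    if dist <= radius:
        return z
    scale = radius / dist
    return [c + scale * d for c, d in zip(center, offset)]


def constrained_search(z_start, radius, step=0.1, n_steps=100):
    """Gradient ascent on the property, projected back into the allowed region."""
    z = list(z_start)
    for _ in range(n_steps):
        grad = property_gradient(z)
        z = [zi + step * gi for zi, gi in zip(z, grad)]
        z = project_to_ball(z, z_start, radius)
    return z


z0 = [0.0, 0.0]
z_best = constrained_search(z0, radius=1.0)
print(z_best, predicted_property(z_best))
```

Because the unconstrained optimum lies outside the allowed radius, the search ends on the boundary of the region: the property improves, but only as far as the constraint permits. In a real model the resulting latent point would then be passed to the decoder.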
  8. Research Challenges

    • How can we build generative models that scale?
      ◦ Computationally expensive
      ◦ Limited to small molecules
    • How can we build generative models whose outputs are synthesizable? (Gao et al., 2020)
      ◦ Current models are limited to one-step generation (black box)
    • How can we ensure validity and stability?
      ◦ Current models sometimes do not consider real-world constraints (e.g., price, yield)
      ◦ Bias and variance problem
  9. Current State of Generative Models (1)

    Schwalbe-Koda and Gómez-Bombarelli (2019)
    • Current work applies unsupervised machine learning techniques.
      ◦ Can we explore other means of shrinking the chemical space (semi-supervised learning, active learning, meta-learning, etc.)?
      ◦ Can we explore ways to interpret generative models (normalizing flows, disentangled VAEs, etc.)?
    • Current work is limited to simple representations (i.e., SMILES and graphs).
      ◦ Can we explore more complicated dynamics (e.g., positional information)?
  10. Current State of Generative Models (2)

    Bradshaw et al. (2020)
    • Simulating the real-world experience of drug design is not straightforward.
      ◦ Can we close the gap between virtual and real-world experiments?
      ◦ The feedback loop is not ideal.
    • Theoretical understanding of interpolation in the latent space
      ◦ How complex should the latent space be?
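For readers unfamiliar with latent-space interpolation, the simplest form is a straight line between two latent vectors. The sketch below assumes linear interpolation and uses made-up three-dimensional vectors; whether a straight line is the right path through a learned latent space is exactly the kind of open question raised on this slide.

```python
def interpolate(z_a, z_b, n_points):
    """Return n_points latent vectors evenly spaced from z_a to z_b, inclusive."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    path = []
    for k in range(n_points):
        t = k / (n_points - 1)  # runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical latent codes for two molecules (illustrative values only).
z_a = [0.0, 1.0, -1.0]
z_b = [1.0, 0.0, 1.0]

path = interpolate(z_a, z_b, 5)
for z in path:
    print(z)
```

Each intermediate vector would be decoded into a molecule; how smoothly the decoded molecules change along the path is one practical probe of how well structured the latent space is.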
  11. Conclusion

    • At ELIX, we explore various ways to improve generative models depending on our client's needs.
    • We also conduct independent research to improve our in-house generative model platform.
    • Generative models for molecular design are an active area of research.
      ◦ Many fundamental problems are unexplored and require further analysis and understanding.
    • Most methods employed are data-driven, unsupervised methods. Occasionally, prior knowledge is encoded into the model.
      ◦ How much prior information is necessary (e.g., can we avoid templates altogether)?
  12. References

    [1] Schwalbe-Koda, Daniel, and Rafael Gómez-Bombarelli. “Generative Models for Automatic Chemical Design.” arXiv:1907.01632 [physics, stat], July 2, 2019. http://arxiv.org/abs/1907.01632.
    [2] Gao, Wenhao, and Connor W. Coley. “The Synthesizability of Molecules Proposed by Generative Models.” Journal of Chemical Information and Modeling, April 6, 2020. https://doi.org/10.1021/acs.jcim.0c00174.
    [3] Bradshaw, John, Brooks Paige, Matt J. Kusner, Marwin Segler, and José Miguel Hernández-Lobato. “A Model to Search for Synthesizable Molecules.” In Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, 7937–7949. Curran Associates, Inc., 2019.
    [4] Hamilton, William L., Rex Ying, and Jure Leskovec. “Inductive Representation Learning on Large Graphs.” In Advances in Neural Information Processing Systems, 1024–1034. 2017. https://arxiv.org/abs/1706.02216.