Chemical Space Exploration

Jan Jensen
March 06, 2021

Talk at U. New Brunswick 2021.03.05

Transcript

  1. Chemical Space Exploration

    Jan H. Jensen, University of Copenhagen. U. New Brunswick, 2021.03.05
    The game of Go has 10^170 possible positions, yet computers can now beat grandmasters. Can we use similar approaches for chemistry?
    Chemical space: 10^60 possible molecules (10^23 stars in the observable universe); 10^8 molecules made so far.
    Almost all of chemical space is unexplored, but how do we search such a large space?
  2. The Fundamental Challenge

    10^60 → 10^6 → 100 → 1. AI?
    Recurrent NNs: autocomplete for molecules
    Autoencoders: molecules as vectors
    Genetic algorithms: evolving new molecules
  3. A Simple Example from Shakespeare

    “to be or not to be that is the question”
    27 characters and 39 positions: 27^39 = 6.7 x 10^55 possible sentences.
    Yet a genetic algorithm can consistently find the correct sentence by considering only 50,000 sentences. How?
    10^55 → 10^4 → 1
    DOI: 10.7717/peerj-pchem.11
  4. Genetic Algorithm

    Target: to be or not to be that is the question
    Steps: generate 100 random sequences; score the sequences; pick a pair of sequences based on score; mate (crossover); mutate; then repeat score, mate, mutate.
    Parent sequences:
      ll hczcoanysflshfkeoomatsinswqm ld jpzn
      pssogzosqrnapy ywuwqakdvrs snibjoqmziwx
    Crossover points:
      ll hczcoanysflshf + keoomatsinswqm ld jpzn
      pssogzosqrnapy yw + uwqakdvrs snibjoqmziwx
    After mating: pssogzosqrnapy ywkeoomatsinswqm ld jpzn
    After mutation: pssogzosqrnapy ywketomatsinswqm ld jpzn
    Scores shown on the slide: 1, 1, 2, 3
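
The loop on this slide (score, pick parents by score, mate, mutate) can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: the score is assumed to be the number of correctly placed characters, and the mutation rate, the `+1` on the selection weights, and the rule of keeping the best 100 of parents plus children are all assumptions made here. No particular number of evaluated sentences is claimed for it.

```python
import random
import string

TARGET = "to be or not to be that is the question"
ALPHABET = string.ascii_lowercase + " "  # 26 letters plus space = 27 characters


def score(candidate, target=TARGET):
    """Assumed score: number of positions that match the target."""
    return sum(a == b for a, b in zip(candidate, target))


def random_sequence(rng, length=len(TARGET)):
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def crossover(a, b, rng):
    """Single-point crossover: head of a joined to the tail of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq, rng):
    """Replace one randomly chosen position with a random character."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]


def evolve(pop_size=100, max_generations=500, mutation_rate=0.5, seed=0):
    """Return (best sequence found, number of sequences scored)."""
    rng = random.Random(seed)
    population = [random_sequence(rng) for _ in range(pop_size)]
    evaluated = pop_size
    for _ in range(max_generations):
        if TARGET in population:
            break
        # Parents are drawn with probability proportional to score
        # (+1 is an assumption so that zero-score sequences can still mate).
        weights = [score(s) + 1 for s in population]
        children = []
        for _ in range(pop_size):
            mum, dad = rng.choices(population, weights=weights, k=2)
            child = crossover(mum, dad, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        evaluated += pop_size
        # Assumed survival rule: keep the best pop_size of parents + children.
        merged = sorted(population + children, key=score, reverse=True)
        population = merged[:pop_size]
    return max(population, key=score), evaluated
```

Calling `evolve()` returns the best sentence found and how many sentences were scored along the way; changing `seed` gives a different run.
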
  5. 77% of sequences have score ≥ 1

    1 - (26/27)^39, or 77%, of the 6.7 x 10^55 possible sequences have at least one character placed correctly.
    Maria H. Rasmussen
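
The two numbers on this slide follow directly from the alphabet size and the sentence length. A short check, assuming each position is drawn uniformly and independently from the 27 characters:

```python
n_chars = 27       # 26 letters plus space
n_positions = 39   # length of the target sentence

# Total number of possible sequences: 27^39
total_sequences = n_chars ** n_positions

# A random sequence has no correct character with probability (26/27)^39,
# so at least one character is correct with probability 1 - (26/27)^39.
p_no_match = ((n_chars - 1) / n_chars) ** n_positions
fraction_with_score_at_least_one = 1 - p_no_match

print(f"{total_sequences:.1e} possible sequences")
print(f"{fraction_with_score_at_least_one:.0%} have score >= 1")
```
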
  6. 77% of sequences have score ≥ 1

    There are so many paths to the target that it is easy to find one by chance and follow it to the target.
    Maria H. Rasmussen
  7. Is it possible to find one specific molecule among 10^60?

    Rediscovery: score = Tanimoto similarity
    (structure drawings of two molecules)
    Tanimoto = 0.33
  8. Is it possible to find one specific molecule among 10^60?

    Rediscovery: score = Tanimoto similarity
    (structure drawings of two molecules)
    Tanimoto = (3 in common) / (9 total)
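
The "3 in common / 9 total" on this slide is the Tanimoto similarity of two fragment sets: the size of the intersection divided by the size of the union. A minimal sketch on plain Python sets follows; the fragment labels `f1` to `f9` are hypothetical placeholders (the slide's actual fragments are not recoverable from the transcript), and returning 0.0 for two empty sets is a convention chosen here.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto similarity of two fragment sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical fragment labels: 3 shared fragments, 9 distinct fragments overall.
molecule_a = {"f1", "f2", "f3", "f4", "f5", "f6"}
molecule_b = {"f1", "f2", "f3", "f7", "f8", "f9"}

similarity = tanimoto(molecule_a, molecule_b)
print(f"Tanimoto = {similarity:.2f}")
```
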
  9. Can we find Troglitazone?

    (structure drawing of troglitazone)
    55 unique fragments
  10. So what’s the problem?

    CC1=C(O)C(C)=C2CCC(C)(COC3=CC=C(CC4SC(=O)NC4=O)C=C3)OC2=C1C
    The string can easily be matched with a GA, but …
    Scoring requires the sequence to correspond to a real molecule.
    Most matings/mutations fail, i.e. many fewer paths. Example: CC(C + OC = CC(COC
    Rediscovery using SMILES fails, despite a lot of help*
    *Starting population has Tanimoto scores between 0.23 and 0.32; only one fragment is not represented.
    Emilie Henault
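
The failed mating on this slide (CC(C + OC = CC(COC) leaves a branch parenthesis open, so the child string no longer describes a molecule. The sketch below is a crude syntactic check written for illustration only: it tests that parentheses balance and that single-digit ring-closure labels come in pairs. It is not a SMILES parser. It ignores bracket atoms, `%nn` ring closures, valence, and aromaticity, so a string that passes can still be chemically invalid. The function name is hypothetical.

```python
def looks_syntactically_closed(smiles):
    """Crude check: balanced parentheses and paired ring-closure digits."""
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing a branch that was never opened
                return False
        elif ch.isdigit():
            open_rings ^= {ch}     # first occurrence opens a ring, second closes it
    return depth == 0 and not open_rings


troglitazone = "CC1=C(O)C(C)=C2CCC(C)(COC3=CC=C(CC4SC(=O)NC4=O)C=C3)OC2=C1C"
child = "CC(C" + "OC"  # the mating shown on the slide

print(looks_syntactically_closed(troglitazone))
print(looks_syntactically_closed(child))
```

A graph-based crossover avoids this failure mode because it cuts and joins bonds rather than characters.
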
  11. Success Using Graph-Based Methods

    Molecules are more like crossword puzzles (crossover).
    Chem. Sci. 2019, J. Chem. Inf. Comput. Sci. 2004, JACS 2013
    github.com/jensengroup/GB-GA
    Emilie Henault
  12. Some molecules are harder to find

    (structure drawings of the target molecules)
    Missing fragments
  13. Finding Chromophores using Genetic Algorithms

    (Molecules absorbing at 300-500 nm are removed from the starting population.)
    (Computed using xTB-STDA//MMFF, population = 20.)
    score = λ-score + f-score
    Emilie Henault
  14. Docking using Genetic Algorithms

    More molecules with low (good) scores compared to HTVS.
    Score modified to ensure synthetic accessibility.
    (Minimizing Glide htvs_ds score.)
    (Population = 400, 50 generations, 20 GA searches.)
    Casper Steinmann (Aalborg U)

    Score/SA:
    Target | GA < -9.0 | GA < -10.0 | GA % SA | ZINC < -9.0 | ZINC < -10.0 | ZINC % SA
    β2 AR  | 164       | 10         | 76%     | 86          | 1            | 84%
    DDR1   | 378       | 38         | 88%     | 199         | 8            | 82%

    DOI: 10.26434/chemrxiv.13525589.v2
    SA: synthetic accessibility by
  15. Our top hit for β2 AR can be made in one step (according to Manifold)

    postera.ai/manifold
  16. Is it possible to find 1 specific molecule among 10^60?

    Yes, if
      the property of interest is a cumulative function of structure, and
      most building blocks can be identified beforehand.
    Because there are many paths to the molecule.
    Most properties of interest have many solutions, each with many paths.
  17. Current Directions

    10^60 → 10^x → 100 → 1
    1. How small can we make x? Smaller x = better scoring function and more applications, e.g. catalysts
    2. Combine GA and machine learning
    3. Experimental validation
  18. Chemical Space Exploration: delivering the first “AlphaGo moment” in chemistry

    In Game Two of AlphaGo versus Lee Sedol in March 2016, the machine made a move no human would ever think of doing. “Move 37” was unimaginable in the more than three thousand year history of the game. By taking position on the “fifth line” AlphaGo pushed the boundaries of human intuition.
    https://www.huffpost.com/entry/move-37-or-how-ai-can-change-the-world_b_58399703e4b0a79f7433b675