Chemical Space Exploration

Jan Jensen
March 06, 2021

Talk at U. New Brunswick 2021.03.05

Transcript

  1. Chemical Space Exploration. Jan H. Jensen, University of Copenhagen

    The game of Go has 10^170 possible positions, yet computers can now beat grandmasters. Can we use similar approaches for chemistry?
    Chemical space: 10^60 possible molecules (10^23 stars in the observable universe); 10^8 molecules made so far.
    Almost all of chemical space is unexplored, but how do we search such a large space?
    U. New Brunswick 2021.03.05
  2. GAN 2014, VAE 2013, RL 2013

    VAE applied to molecules: Oct. 2016
  3. The Fundamental Challenge

    10^60 → 10^6 → 100 → 1: AI?
    Recurrent NNs: autocomplete for molecules
    Autoencoders: molecules as vectors
    Genetic algorithms: evolving new molecules
  4. A Simple Example from Shakespeare: “to be or not to be that is the question”

    27 characters and 39 positions: 27^39 = 6.7 x 10^55 possible sentences.
    Yet a genetic algorithm can consistently find the correct sentence by considering only 50,000 sentences. How?
    10^55 → 10^4 → 1
    DOI: 10.7717/peerj-pchem.11
  5. Genetic Algorithm: to be or not to be that is the question

    Steps: generate 100 random sequences; score sequences; pick a pair of sequences based on score; mate/crossover; mutate.
    Parent 1: ll hczcoanysflshfkeoomatsinswqm ld jpzn (score = 1)
    Parent 2: pssogzosqrnapy ywuwqakdvrs snibjoqmziwx (score = 1)
    Cut points: ll hczcoanysflshf + keoomatsinswqm ld jpzn and pssogzosqrnapy yw + uwqakdvrs snibjoqmziwx
    Mate: pssogzosqrnapy ywkeoomatsinswqm ld jpzn (score = 2)
    Mutate: pssogzosqrnapy ywketomatsinswqm ld jpzn (score = 3)
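
The loop on slides 4 and 5 can be sketched in Python roughly as follows. This is a minimal illustration, not the code behind the talk: the function names, the mutation rate, the `+ 1` added to the selection weights, the elitism step, and the stopping rule are all assumptions made for the example. Only the alphabet size (27), the sentence length (39), and the population size (100) come from the slides.

```python
import random
import string

TARGET = "to be or not to be that is the question"
ALPHABET = string.ascii_lowercase + " "  # 27 characters, as on the slide


def score(candidate, target=TARGET):
    """Count the positions where candidate and target hold the same character."""
    return sum(a == b for a, b in zip(candidate, target))


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of parent_a joined to tail of parent_b."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(sequence, rng, rate=0.5):
    """With probability `rate` (an assumed value), replace one random character."""
    if rng.random() >= rate:
        return sequence
    pos = rng.randrange(len(sequence))
    return sequence[:pos] + rng.choice(ALPHABET) + sequence[pos + 1:]


def run_ga(target=TARGET, pop_size=100, max_generations=500, seed=0):
    """Return (best sequence, its score, number of sequences scored)."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in target)
                  for _ in range(pop_size)]
    evaluations = 0
    for _ in range(max_generations):
        scores = [score(s, target) for s in population]
        evaluations += pop_size
        best_score, best = max(zip(scores, population))
        if best_score == len(target):
            break
        # Score-weighted parent selection; +1 is an assumption that keeps
        # the weights valid even if every score is zero.
        weights = [s + 1 for s in scores]
        children = [best]  # elitism (assumed): carry the best sequence over
        while len(children) < pop_size:
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    best = max(population, key=lambda s: score(s, target))
    return best, score(best, target), evaluations
```

With the defaults, `run_ga()` scores at most 100 x 500 = 50,000 sequences, the budget quoted on slide 4. Whether a given seed reaches the target within that budget depends on the assumed settings.
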
  6. 77% of sequences have score ≥ 1

    1 - (26/27)^39, or 77%, of the 6.7 x 10^55 possible sequences have at least one character placed correctly.
    Maria H. Rasmussen
  7. There are so many paths to the target that it is easy to find one by chance and follow it to the target

    77% of sequences have score ≥ 1
    Maria H. Rasmussen
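
The 77% figure above follows from the chance that a random sequence matches the target in none of its 39 positions. A short Python check of that arithmetic (variable names are illustrative only):

```python
ALPHABET_SIZE = 27  # 26 lowercase letters plus the space
LENGTH = 39         # characters in the target sentence

# Total number of possible sequences: 27^39
total_sequences = ALPHABET_SIZE ** LENGTH

# Probability that every one of the 39 positions holds a wrong character
p_no_match = ((ALPHABET_SIZE - 1) / ALPHABET_SIZE) ** LENGTH

# Probability that at least one position is correct, i.e. score >= 1
p_at_least_one = 1 - p_no_match

print(f"possible sequences: {total_sequences:.1e}")
print(f"P(score >= 1): {p_at_least_one:.2f}")
```
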
  8. Need Additive and Semi-Continuous Scores

  9. Is it possible to find one specific molecule among 10^60?

    Rediscovery: score = Tanimoto similarity
    [Figure: molecular structures; atom labels OH, H2N, OH]
    Tanimoto = 0.33
  10. Is it possible to find one specific molecule among 10^60?

    Rediscovery: score = Tanimoto similarity
    [Figure: molecular structures; atom labels OH, H2N, OH]
    Tanimoto = (3 in common) / (9 total)
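
The ratio on slide 10 (fragments in common divided by total distinct fragments) can be written as a set operation. The sketch below assumes each molecule is already represented as a set of fragment labels. The labels `f1` to `f9` are placeholders, not the fragments from the slide, and returning 0.0 for two empty sets is an arbitrary choice made for the example.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto similarity: shared fragments / all distinct fragments."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fragment sets
    return len(a & b) / len(union)


# Placeholder fragment sets chosen so that 3 fragments are shared
# and 9 distinct fragments occur in total, as on the slide.
molecule_a = {"f1", "f2", "f3", "f4", "f5", "f6"}
molecule_b = {"f4", "f5", "f6", "f7", "f8", "f9"}

similarity = tanimoto(molecule_a, molecule_b)
print(f"Tanimoto = {similarity:.2f}")
```
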
  11. Can we find Troglitazone?

    (55 unique fragments)
    [Figure: structure of troglitazone]
  12. So what’s the problem?

    CC1=C(O)C(C)=C2CCC(C)(COC3=CC=C(CC4SC(=O)NC4=O)C=C3)OC2=C1C
    String can easily be matched with GA, but …
    Scoring requires sequence to correspond to real molecule.
    Most matings/mutations fail, i.e. many fewer paths: CC(C + OC = CC(COC
    Rediscovery using SMILES fails, despite a lot of help*
    *Starting population Tanimoto score between 0.23 and 0.32; only one fragment not represented.
    [Figure: structure of troglitazone]
    Emilie Henault
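
The failed mating on slide 12 (CC(C + OC = CC(COC) can be reproduced with plain string slicing. In the sketch below the two parent strings and the cut positions are hypothetical, chosen only so that the pieces match the slide. The parenthesis check is a crude necessary condition for a SMILES string to be parseable, not a real validator. A cheminformatics toolkit would be needed for that.

```python
def branches_balanced(smiles):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def string_crossover(parent_a, cut_a, parent_b, cut_b):
    """Join the head of parent_a to the tail of parent_b."""
    return parent_a[:cut_a] + parent_b[cut_b:]


parent_a = "CC(C)O"  # hypothetical parent
parent_b = "CCOC"    # hypothetical parent

child = string_crossover(parent_a, 4, parent_b, 2)  # "CC(C" + "OC"
print(child, branches_balanced(child))
```

Both parents pass the check, but the child has an unclosed branch, so it cannot correspond to a real molecule and cannot be scored.
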
  13. Success Using Graph-Based Methods

    Molecules are more like crossword puzzles.
    [Figure: crossover]
    Chem. Sci. 2019, J. Chem. Inf. Comput. Sci. 2004, JACS 2013
    github.com/jensengroup/GB-GA
    Emilie Henault
  14. Some molecules are harder to find

    Missing fragments
    [Figure: molecular structures]
  15. Finding Chromophores using Genetic Algorithms

    (Molecules absorbing at 300-500 nm are removed from starting population)
    (Computed using xTB-STDA//MMFF, population = 20)
    score = λ-score + f-score
    Emilie Henault
  16. Finding Chromophores using Genetic Algorithms

    These molecules absorb strongly around 400 nm.
    Emilie Henault
  17. Docking using Genetic Algorithms

    More molecules with low (good) scores compared to HTVS.
    Score modified to ensure synthetic accessibility.
    (Minimizing Glide htvs_ds score) (Population = 400, 50 generations, 20 GA searches)
    Casper Steinmann (Aalborg U)

    Score/SA:
    Target   GA < -9.0   GA < -10.0   GA % SA   ZINC < -9.0   ZINC < -10.0   ZINC % SA
    β2 AR    164         10           76%       86            1              84%
    DDR1     378         38           88%       199           8              82%

    DOI: 10.26434/chemrxiv.13525589.v2
    SA: synthetic accessibility by
  18. Molecular Diversity

    Very different molecules, same target (β2 AR)
    Docking score (Tanimoto score) [SA score]
  19. Our top hit for β2 AR can be made in one step (according to Manifold)

    postera.ai/manifold
  20. Is it possible to find 1 specific molecule among 10^60?

    Yes, if
    the property of interest is a cumulative function of structure
    most building blocks can be identified beforehand
    Because there are many paths to the molecule.
    Most properties of interest have many solutions, each with many paths.
  21. Current Directions

    10^60 → 10^x → 100 → 1
    1. How small can we make x? Smaller x = better scoring function & more applications, e.g. catalysts
    2. Combine GA and machine learning
    3. Experimental validation
  22. Chemical Space Exploration: Delivering the first “AlphaGo moment” in chemistry

    In Game Two of AlphaGo versus Lee Sedol in March 2016, the machine made a move no human would ever think of doing. “Move 37” was unimaginable in the more than three thousand year history of the game. By taking position on the “fifth line” AlphaGo pushed the boundaries of human intuition.
    https://www.huffpost.com/entry/move-37-or-how-ai-can-change-the-world_b_58399703e4b0a79f7433b675