Slide 1

Slide 1 text

Synthon-GA: Searching make-on-demand libraries with genetic algorithms
Jan H. Jensen, Department of Chemistry, University of Copenhagen (@janhjensen)
Casper Steinmann (Aalborg University)

Slide 2

Slide 2 text

Genetic Algorithms for Molecules
Chem. Sci. 2019, github.com/jensengroup/GB-GA
[Figure: GA cycle. Labels: Molecule, Mating, Mate & Mutate Survivors, New Molecule, Unfit Molecule, Kill Unfit Molecules]
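The cycle on this slide (mate and mutate the survivors, then kill the unfit molecules) can be sketched as a generic loop. This is a minimal illustration and not the GB-GA code: parents are picked uniformly at random, the "molecules" are hypothetical bit lists, and the toy score stands in for a docking score, so lower is better.

```python
import random

def run_ga(score, random_individual, mate, mutate,
           population_size=20, generations=10, mutation_rate=0.1, seed=0):
    """Generic GA loop: mate and mutate survivors, then drop the unfit (lowest score wins)."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        children = []
        for _ in range(population_size):
            # Assumption: uniform parent selection, chosen only for simplicity.
            parent_a, parent_b = rng.sample(population, 2)
            child = mate(parent_a, parent_b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        # Keep the best-scoring individuals from parents and children combined.
        population = sorted(population + children, key=score)[:population_size]
    return population

# Toy stand-ins (hypothetical): a "molecule" is a list of 8 bits and the
# "docking score" is minus the number of 1-bits.
def toy_score(bits):
    return -sum(bits)

def toy_random(rng):
    return [rng.randint(0, 1) for _ in range(8)]

def toy_mate(a, b, rng):
    cut = rng.randrange(1, len(a))  # single-point crossover
    return a[:cut] + b[cut:]

def toy_mutate(bits, rng):
    i = rng.randrange(len(bits))    # flip one randomly chosen bit
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

final_population = run_ga(toy_score, toy_random, toy_mate, toy_mutate)
best = final_population[0]
```

In a molecular GA the four callables would be replaced by a docking-based score and chemistry-aware mating and mutation operators.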

Slide 3

Slide 3 text

Fitness = docking scores (Minimizing Glide htvs_ds score)
(Population = 400, 50 generations, 20 GA searches)
Target      GA                            ZINC
Score/SA    < -9.0    < -10.0    % SA     < -9.0    < -10.0    % SA
β2 AR       164       10         76%      86        1          84%
DDR1        378       38         88%      199       8          82%
DOI: 10.7717/peerj-pchem.18
SA: synthetic accessibility by

Slide 4

Slide 4 text

Organisers will purchase and test $10K worth of molecules from Enamine

Slide 5

Slide 5 text

How to use GA to search Enamine REAL Space?
Synthons and combination rules from Synt-On (formerly SynthI) + random-choice mutation and random-choice crossover
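Random-choice mutation and crossover on synthon genes could look like the sketch below. It is an illustration under stated assumptions, not the Synthon-GA implementation: a gene is assumed to be a tuple of a reaction name followed by one synthon per slot, and the reaction names, synthon IDs, and the `REACTION_SLOTS` table are made up (real synthons and combination rules would come from Synt-On).

```python
import random

# Hypothetical toy data: for each reaction, the synthon IDs allowed in each slot.
REACTION_SLOTS = {
    "amide_coupling": (["acid_1", "acid_2", "acid_3"], ["amine_1", "amine_2"]),
    "suzuki": (["boronate_1", "boronate_2"], ["halide_1", "halide_2", "halide_3"]),
}

def random_gene(rng):
    """Pick a reaction at random, then one synthon at random for each of its slots."""
    reaction = rng.choice(sorted(REACTION_SLOTS))
    return (reaction,) + tuple(rng.choice(pool) for pool in REACTION_SLOTS[reaction])

def is_valid(gene):
    """Check that every synthon is allowed in its slot for the gene's reaction."""
    reaction, *synthons = gene
    pools = REACTION_SLOTS[reaction]
    return len(synthons) == len(pools) and all(s in p for s, p in zip(synthons, pools))

def mutate(gene, rng):
    """Replace one randomly chosen synthon with a random choice from the same slot."""
    reaction, *synthons = gene
    slot = rng.randrange(len(synthons))
    synthons[slot] = rng.choice(REACTION_SLOTS[reaction][slot])
    return (reaction, *synthons)

def crossover(parent_a, parent_b, rng):
    """For each slot, take the synthon from a randomly chosen parent.

    Assumption: parents with different reactions are not crossed (returns None),
    so the caller would pick another pair.
    """
    if parent_a[0] != parent_b[0]:
        return None
    child = [rng.choice(pair) for pair in zip(parent_a[1:], parent_b[1:])]
    return (parent_a[0], *child)
```

Because both operators only ever draw from the pool of the matching slot, every child is a valid gene under the combination rules by construction.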

Slide 6

Slide 6 text

REAL Space is only a small fraction of possible genes
“The REAL Space comprises 21 billion make-on-demand molecules and is currently the largest offer of commercially available compounds. The REAL compounds in the Space are assembled via more than 170 well-validated parallel synthesis protocols applied to over 112 000 qualified reagents and building blocks.”
129K reagents => 91K 1-synthons and 41K 2-synthons
+ 24 possible reactions = 28 trillion genes
[Figure labels: 2-synthon, 1-synthons]
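To put "a small fraction" in numbers, the two figures on this slide can be divided directly. Both inputs are taken from the slide as given; the result is roughly 0.075% of the possible genes.

```python
real_space_molecules = 21e9   # "21 billion make-on-demand molecules" (quoted on the slide)
possible_genes = 28e12        # "28 trillion genes" (estimate on the slide)

# Fraction of the possible genes that correspond to REAL Space molecules.
fraction = real_space_molecules / possible_genes
print(f"REAL Space covers about {fraction:.3%} of the possible genes")
```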

Slide 7

Slide 7 text

Workflow: Minimizing Glide XP score
Population = 400, 100 generations, 20 GA searches (8 million docking calculations)
Random genes → Synthon-GA → Final populations → Similarity search → Redock

Slide 8

Slide 8 text

GA-1 Molecules are too big

Slide 9

Slide 9 text

GA-2
MW < 350, logP < 3.5
Final populations: 8,000 molecules → 90 unique molecules
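Pooling the final populations and collapsing duplicates could be sketched as follows. The slides do not say how the MW and logP limits were enforced in GA-2, so applying them as a filter here is an assumption, as is identifying duplicates by identical SMILES strings. The record fields and the toy values are made up for illustration and are not results from the talk.

```python
def unique_molecules(final_populations, max_mw=350.0, max_logp=3.5):
    """Pool populations; keep the best-scoring record per SMILES with MW < max_mw and logP < max_logp."""
    best_by_smiles = {}
    for population in final_populations:
        for mol in population:
            if mol["mw"] >= max_mw or mol["logp"] >= max_logp:
                continue  # violates the size or lipophilicity limit
            seen = best_by_smiles.get(mol["smiles"])
            if seen is None or mol["score"] < seen["score"]:
                best_by_smiles[mol["smiles"]] = mol  # lower docking score is better
    return sorted(best_by_smiles.values(), key=lambda m: m["score"])

# Made-up records for illustration only.
toy_populations = [
    [{"smiles": "CCO", "mw": 46.1, "logp": -0.1, "score": -3.0},
     {"smiles": "c1ccccc1", "mw": 78.1, "logp": 2.0, "score": -4.0}],
    [{"smiles": "CCO", "mw": 46.1, "logp": -0.1, "score": -3.5},
     {"smiles": "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC", "mw": 422.8, "logp": 15.0, "score": -6.0}],
]
unique = unique_molecules(toy_populations)
```

Comparing raw SMILES strings only works if they are canonicalised the same way beforehand.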

Slide 10

Slide 10 text

GA-2
Are these 90 molecules available from Enamine? (No)
If not, what are the closest analogs and what are their docking scores?
GA-2: 90 mols
Postera similarity search (1B?) (API, instantaneous): 8,856 mols with sim > 0.2
FTrees similarity search (~20 B) (command line, 4 min/mol): 90,000 mols (1000 per mol)
SmallWorld similarity search (2.5 B) (instantaneous, web GUI): 27,640 mols with sim > 0.2
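The "sim > 0.2" cutoff can be illustrated with a small sketch. Each search tool defines its own similarity measure, and the slides do not specify them, so Tanimoto similarity on sets of fingerprint bits is used here purely as a stand-in. The library IDs and fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as collections of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def analogs_above(query_fp, library, cutoff=0.2):
    """Return (similarity, library_id) pairs above the cutoff, most similar first."""
    hits = [(tanimoto(query_fp, fp), mol_id) for mol_id, fp in library.items()]
    return sorted((hit for hit in hits if hit[0] > cutoff), reverse=True)

# Hypothetical toy library: ID -> set of on-bits.
toy_library = {
    "lib_A": {1, 2, 3, 4},
    "lib_B": {9},
    "lib_C": {1, 7, 8, 9, 10},
}
hits = analogs_above({1, 2, 3}, toy_library)
```

Only `lib_A` survives the cutoff in this toy case, since the other two entries share at most one bit with the query.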

Slide 11

Slide 11 text

SmallWorld server

Slide 12

Slide 12 text

Random Example
[Figure: analogs found by Postera, FTrees, and SmallWorld]

Slide 13

Slide 13 text

Best Case
[Figure: analogs found by Postera, FTrees, and SmallWorld]

Slide 14

Slide 14 text

GA-2 Redocking
Are these 90 molecules available from Enamine? (No)
If not, what are the closest analogs and what are their docking scores?
GA-2: 90 mols
Filters after redocking: Docking Score ≤ -8.5, MW ≤ 400, Nrot ≤ 7, No PAINS
Postera: 8,856 mols with sim > 0.2 → 17 mols
FTrees: 90,000 mols (1000 per mol) → 222 mols
SmallWorld: 27,640 mols → 65 mols
Total: 304 mols → 150 mols
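The four redocking criteria on this slide translate into a simple predicate. This sketch assumes each candidate already carries its redocked score, molecular weight, rotatable-bond count, and a precomputed PAINS flag; the field names and the toy candidates are hypothetical.

```python
def passes_redock_filter(mol, max_score=-8.5, max_mw=400.0, max_nrot=7):
    """True if the molecule meets all four criteria: score, MW, Nrot, and no PAINS alert."""
    return (
        mol["score"] <= max_score
        and mol["mw"] <= max_mw
        and mol["nrot"] <= max_nrot
        and not mol["pains"]
    )

# Made-up candidates for illustration only.
candidates = [
    {"id": "toy-1", "score": -9.2, "mw": 380.0, "nrot": 5, "pains": False},
    {"id": "toy-2", "score": -8.0, "mw": 350.0, "nrot": 4, "pains": False},   # score too weak
    {"id": "toy-3", "score": -10.1, "mw": 410.0, "nrot": 6, "pains": False},  # too heavy
    {"id": "toy-4", "score": -8.5, "mw": 400.0, "nrot": 7, "pains": False},   # exactly on the limits
    {"id": "toy-5", "score": -9.0, "mw": 300.0, "nrot": 3, "pains": True},    # PAINS alert
]
kept = [mol["id"] for mol in candidates if passes_redock_filter(mol)]
```

The limits are inclusive, matching the "≤" signs on the slide, so a molecule sitting exactly on every threshold is kept.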

Slide 15

Slide 15 text

GA-2 Redocking Top Scores
[Figure: top-scoring molecules from SmallWorld and FTrees. Scores: -10.9, -10.9, -10.3, -11.5]

Slide 16

Slide 16 text

GA-2 Redocking Top Scores
[Figure: top-scoring molecules from SmallWorld and FTrees. Scores: -10.9, -10.9, -10.3, -11.5]
Protonation state? Chirality?
FTrees does not report on chirality
Synt-On removes chirality
Enamine sells some compounds as racemates and some as pure enantiomers

Slide 17

Slide 17 text

GA-2 Redocking Top Scores
304 mols → 150 short list
All possible chiralities plus reasonable protonation states: 720 chiral/prot mols
The CACHE organisers suggested using Carteblanche (CB) to check which chiral isomers are purchasable.
CB can't find all the molecules that FTrees found.
For the ones CB finds, we pick the enantiomer with the best score and update the catalog ID.
For the rest, we use the catalog ID that FTrees provided, with unknown chirality (51).
New 150 short list → check for duplicates → 149 short list (submitted)
Total price < $10K → 77 short list (submitted)
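The catalog-ID rule described on this slide (best-scoring purchasable isomer when one is found, otherwise the FTrees ID with unknown chirality) can be sketched as below. The data layout, field names, and IDs are hypothetical; no real Carteblanche or FTrees API is called here.

```python
def choose_catalog_ids(short_list, purchasable_isomers):
    """Pick one catalog ID per short-listed molecule.

    short_list: list of dicts with 'name' and 'ftrees_id'.
    purchasable_isomers: dict mapping name -> list of (docking_score, catalog_id)
        for the isomers that the purchasability check found.
    """
    chosen = {}
    for mol in short_list:
        found = purchasable_isomers.get(mol["name"], [])
        if found:
            # Lowest docking score is best; update to that isomer's catalog ID.
            _, catalog_id = min(found)
            chosen[mol["name"]] = {"catalog_id": catalog_id, "chirality_known": True}
        else:
            # Fall back to the ID reported by the similarity search.
            chosen[mol["name"]] = {"catalog_id": mol["ftrees_id"], "chirality_known": False}
    return chosen

# Made-up example for illustration only.
toy_short_list = [
    {"name": "mol_1", "ftrees_id": "ID-0001"},
    {"name": "mol_2", "ftrees_id": "ID-0002"},
]
toy_isomers = {"mol_1": [(-9.1, "ID-0001-R"), (-10.4, "ID-0001-S")]}
selection = choose_catalog_ids(toy_short_list, toy_isomers)
```

Keeping the `chirality_known` flag alongside each ID makes it easy to count how many submitted entries still have undefined stereochemistry.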

Slide 18

Slide 18 text

Adjusted Workflow:
Random genes → Synthon-GA → Final populations → Similarity search → Redock
Adjustments: Chiral synthons; Remove duplicates from population; Estimate protonation state; FTrees: What chiral isomers are in library?

Slide 19

Slide 19 text

Summary
Use regular GA instead?
Can identify molecules in the library with good docking scores
The molecules found by Synthon-GA are generally not similar to those available in the library
While the synthons are known, the combination rules are proprietary
Combination rules could probably be “reverse engineered” with ML
Experimental verification is underway
Regular GA penalises chiral molecules
Are Synthon-GA molecules easier to synthesize?
Synthon-GA is not on GitHub yet. Contact me for access