October 26, 2022

Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study, Elix, CBI 2022




  1. Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study

    Nazim Medzhidov, Ph.D. & Joshua Owoyemi, Ph.D., Elix, Inc.
    Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 26th, 2022
  2. Background

    • Challenges associated with the traditional drug discovery process have encouraged the application of machine learning approaches in this domain.
    • Generative AI approaches for molecular design are actively investigated.
    • Evaluating generative models in silico is challenging; confirmation requires experimental validation, which is expensive.
    • Most studies evaluate model performance by optimizing computable properties (logP, QED, SA score, etc.).
    • How can generated candidates be selected efficiently?

    Objectives

    ★ Design a scenario and a pipeline to evaluate generative models in silico:
      ◦ Hit-to-lead campaign
      ◦ Novel chemotype discovery
    ★ Test our Elix Discovery™ Platform
    ★ Candidate prioritization pipeline
  3. Study Workflow

    Dataset Preparation
    • Pre-training dataset:
      ◦ ChEMBL
      ◦ AChE inhibitors removed
      ◦ Target chemotype scaffold removed
    • Training set:
      ◦ AChE inhibitors from ChEMBL

    AI Model Development
    • Elix Discovery™ Platform
    • Elix Predict:
      ◦ AChE inhibitory activity prediction model
      ◦ Blood Brain Barrier (BBB) Permeability prediction model
    • Elix Create:
      ◦ SmilesFormer Generative Model
      ◦ 10 sampling runs

    Post-processing & Prioritization
    • 30K molecules generated in each of 10 sampling runs
    • Post-processing:
      ◦ Phys-Chem Filters (RO5)
      ◦ MCF filters
      ◦ Novelty
      ◦ BBB Permeability
    • Prioritization:
      ◦ QED score
      ◦ Predicted activity
      ◦ Binding affinity (docking)

    Result Analysis
    • Quality assessment:
      ◦ Target scaffold discovery
      ◦ Documented potent compound discovery
    • Short list of best 200 molecules from each run
    • Final short list of the 20 most frequently selected best compounds
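The Phys-Chem (RO5) filter in the post-processing step can be illustrated with a minimal sketch. It assumes descriptors have already been computed by a cheminformatics toolkit; the `Descriptors` container, the function names, and the allowed number of violations are hypothetical and are not taken from the slides, which do not state how the filter was configured. The thresholds are the standard Lipinski rule-of-five cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    name: str
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumption: at most `max_violations` violations are tolerated."""
    return ro5_violations(d) <= max_violations


# Illustrative values only; these are not real measurements.
candidates = [
    Descriptors("candidate-1", 380.0, 4.2, 0, 4),
    Descriptors("candidate-2", 640.0, 6.1, 2, 9),
]
kept = [d.name for d in candidates if passes_ro5(d)]
print(kept)
```

Setting `max_violations=0` gives a stricter filter; which variant the study used is not stated.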
  4. Datasets: Acetylcholinesterase inhibitors

    • ChEMBL (~2.2M)
    • AChE inhibitors with IC50 values (4,238)
      ◦ A: AChE inhibitors before 1992 (120)
      ◦ B: More recent molecules with the same chemotypes present in A (847)
      ◦ C: Hit & hit expansion compounds (109) (Hit compound)
      ◦ D: Molecules containing piperazine, piperidine or indan (357) (Hidden) (Target chemotype)
    • Established chemotypes before 1992: Physostigmine, Tacrine, Rivastigmine
    • 1992: First appearance of donepezil chemotype in ChEMBL database
    • Training dataset (1076): A + B + C
    • Pre-training dataset (~2.2M):
      ◦ AChE inhibitors removed (15.5K mols)
      ◦ 48 mols with the scaffold removed
  5. Elix Discovery™ Platform

    Generative Model
    • SmilesFormer
      ◦ Pre-trained on ChEMBL dataset without AChE inhibitors and target scaffold
      ◦ Trained on: datasets A + B + C (1076 samples)
    • Multiobjective Optimization Problem:
      ◦ SA score
      ◦ QED score
      ◦ Favorable physical-chemical properties
      ◦ Novelty (distance from the training set)
      ◦ Activity

    Predictive Models
    • AChE inhibitory activity prediction model:
      ◦ GCN
      ◦ Trained on: datasets A + B + C (1076 samples)
    • Blood Brain Barrier (BBB) Permeability prediction model:
      ◦ GCN
      ◦ Trained on an in-house dataset (9059 samples)
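The slide describes novelty as distance from the training set without naming a metric. One common choice, shown here purely as an assumption, is one minus the highest Tanimoto similarity between a candidate's fingerprint and any training-set fingerprint. Fingerprints are represented as sets of on-bit indices; the function names and example bit sets are hypothetical.

```python
from typing import FrozenSet, Iterable

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two on-bit sets."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def novelty(candidate: Fingerprint, training: Iterable[Fingerprint]) -> float:
    """Assumed definition: 1 - similarity to the nearest training molecule."""
    nearest = max((tanimoto(candidate, t) for t in training), default=0.0)
    return 1.0 - nearest


# Illustrative bit sets only; real fingerprints would come from a toolkit.
training_set = [frozenset({1, 2}), frozenset({5, 6})]
candidate = frozenset({1, 2, 3, 4})
print(novelty(candidate, training_set))  # nearest similarity is 2/4
```

A value of 0 means the candidate's fingerprint matches a training molecule exactly; a value of 1 means it shares no bits with any of them.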
  6. Generation strategy and post-processing pipeline

    • Random sampling: 10 runs (Run 1 to Run 10), 30K mols / run
      ◦ Seeding settings: One seed, Group seed, No seed
    • Filtering: ~4000 mols / run
      ◦ RO5
      ◦ MCF
      ◦ Novelty
      ◦ BBB Permeability
    • Prioritizing: 200 mols / run
      ◦ QED
      ◦ Predicted activity
      ◦ Binding affinity (docking)
    • Aggregation: 20 most frequently selected candidates
      ◦ Recommendation score:
        ◦ Consistency of selection
        ◦ Min = 1, Max = 10
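The prioritizing and aggregation steps can be sketched as follows. This is a minimal illustration, not the platform's implementation: it assumes each run already has one combined priority score per molecule (the slide does not say how QED, predicted activity and docking are combined), and it reads the recommendation score as the number of runs whose short list contains the molecule, which matches the stated range of 1 to 10 for 10 runs. All function names and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def top_k(scored: Dict[str, float], k: int) -> List[str]:
    """Return the k molecule ids with the highest score (ties broken by id)."""
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [mol_id for mol_id, _ in ranked[:k]]


def recommendation_scores(shortlists: Sequence[Sequence[str]]) -> Counter:
    """Count in how many run short lists each molecule appears."""
    counts: Counter = Counter()
    for shortlist in shortlists:
        counts.update(set(shortlist))  # a molecule counts once per run
    return counts


def final_candidates(shortlists: Sequence[Sequence[str]], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequently selected molecules with their scores."""
    counts = recommendation_scores(shortlists)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Three toy runs with made-up scores; the study used 10 runs, 200 per run, n = 20.
runs = [
    {"m1": 0.9, "m2": 0.8, "m3": 0.7, "m5": 0.1},
    {"m1": 0.6, "m3": 0.9, "m4": 0.2},
    {"m1": 0.8, "m4": 0.7, "m2": 0.1},
]
shortlists = [top_k(run, k=2) for run in runs]
print(final_candidates(shortlists, n=2))
```

Counting each molecule once per run keeps the score bounded by the number of runs, so a score equal to the run count means the molecule was short-listed every time.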
  7. Discovering reported potent scaffold and molecules

    • Discovery success by seeding setting (number of runs):
      ◦ No Seed: reported scaffold 0 / 10, reported potent compound 0 / 10
      ◦ One Seed: reported scaffold 4 / 10, reported potent compound 5 / 10
      ◦ Group Seed: reported scaffold 9 / 10, reported potent compound 9 / 10
    • D: Molecules from hidden dataset D (target chemotype) containing represented substructures
    • Reported potent scaffold
    • Rediscovered molecules (panels: One Seed setting, Group Seed setting; structures omitted), in slide order:
      ◦ IC50 = 81 nM, Rank = 8
      ◦ IC50 = 6.7 nM, Rank = 31
      ◦ IC50 = 58 nM, Rank = 107
      ◦ IC50 = 94 nM, Rank = 166
      ◦ IC50 = 81 nM, Rank = 56
      ◦ IC50 = 30 nM, Rank = 71
      ◦ IC50 = 6.7 nM, Rank = 124
      ◦ IC50 = 94 nM, Rank = 393
  8. Final 20 Candidates by Recommendation Score

    [Figure: candidate structures for the No seed and One seed settings. Legend: top 1%, recommendation score (max = 10)]
  9. Conclusion

    • Designed a retrospective case study of novel chemotype discovery for generative models (quality assessment)
    • Tested our Elix Discovery™ platform in a hit-to-lead discovery campaign
    • Given an early hit compound, optimized the scaffold into more complex, diverse scaffolds, including a reported potent indanone-piperidine scaffold
    • Multiple sampling runs and recommendation score analysis helped to focus on consistently top-ranked candidates
    • Among the prioritized candidates, reported potent molecules containing indanone-piperidine were discovered
    • These molecules were included in the top 1% of the generated molecules
    • Final 20 top-ranked candidates included at least one known potent AChE inhibitor
    • Potential presence of as-yet-unknown potent compounds among the final recommendations