Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study, Elix, CBI 2022

Elix
October 26, 2022

Transcript

  1. Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study

    Nazim Medzhidov, Ph.D & Joshua Owoyemi, Ph.D
    Elix, Inc.
    Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 26th, 2022
  2. Background

    • Challenges associated with the traditional drug discovery process have facilitated the application of machine learning approaches in this domain.
    • Generative AI approaches for molecular design are actively investigated.
    • Evaluating generative models in silico is challenging; confirmation requires experimental validation (expensive).
    • The majority of studies evaluate model performance based on optimizing computable properties (logP, QED, SA score, etc.).
    • How to select generated candidates efficiently?

    Objectives
    ★ Design a scenario and a pipeline to evaluate generative models in silico:
      ◦ Hit-to-lead campaign
      ◦ Novel chemotype discovery
    ★ Test our Elix Discovery™ Platform
    ★ Candidate prioritization pipeline
  3. Study Workflow

    Stages: Dataset Preparation, AI Model Development, Post-processing & Prioritization, Result Analysis

    • Pre-training dataset:
      ◦ ChEMBL
      ◦ AChE inhibitors removed
      ◦ Target chemotype scaffold removed
    • Training Set:
      ◦ AChE inhibitors from ChEMBL
    • Elix Discovery™ Platform
    • Elix Predict:
      ◦ AChE inhibitory activity prediction model
      ◦ Blood Brain Barrier (BBB) Permeability prediction model
    • Elix Create:
      ◦ SmilesFormer Generative Model
      ◦ 10 sampling runs
    • 30K molecules generated in each of 10 sampling runs
    • Post-processing:
      ◦ Phys-Chem Filters (RO5)
      ◦ MCF filters
      ◦ Novelty
      ◦ BBB Permeability
    • Prioritization:
      ◦ QED score
      ◦ Predicted activity
      ◦ Binding affinity (docking)
    • Quality assessment:
      ◦ Target scaffold discovery
      ◦ Documented potent compound discovery
    • Short list of best 200 molecules from each run
    • Final short-list of 20 most frequently selected best compounds
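The physicochemical step of the post-processing stage can be illustrated with a Lipinski Rule of Five (RO5) check. This is a minimal sketch, not the Elix Discovery™ implementation: it assumes the descriptors have already been computed by a cheminformatics toolkit, and the `Descriptors` container, the function names, the `max_violations` parameter, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one generated molecule (hypothetical container)."""
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four thresholds the molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 0) -> bool:
    # Assumption: zero tolerated violations. The slides do not state
    # how strictly the RO5 filter was applied.
    return ro5_violations(d) <= max_violations


# Illustrative values only; these are not molecules from the study.
candidates = [
    Descriptors("mol_a", 379.5, 4.4, 0, 4),
    Descriptors("mol_b", 612.8, 6.1, 2, 7),
]
kept = [d for d in candidates if passes_ro5(d)]
```

The other filters named on the slide (MCF, novelty, BBB permeability) could be chained in the same way as additional predicates over each candidate.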
  4. Datasets: Acetylcholinesterase inhibitors

    • ChEMBL (~2.2M)
    • Pre-training dataset (~2.2M):
      ◦ AChE inhibitors removed (15.5K mols)
      ◦ 48 mols with the scaffold removed
    • AChE inhibitors with IC50 values (4,238)
    • Training dataset (1076): A + B + C
      ◦ AChE inhibitors before 1992 (120); established chemotypes before 1992: Physostigmine, Tacrine, Rivastigmine
      ◦ More recent molecules with same chemotypes present in A (847)
      ◦ Hit & hit expansion compounds (109)
    • D (Hidden): Molecules containing piperazine, piperidine or indan (357)
    • 1992: First appearance of donepezil chemotype in ChEMBL database
    • [Figure: dataset diagram with panels A, B, C, D and the labels "(Hit compound)" and "(Target chemotype)"]
  5. Elix Discovery™ Platform

    Generative Model
    • SmilesFormer
      ◦ Pre-trained on ChEMBL dataset without AChE inhibitors and target scaffold
      ◦ Trained on: datasets A + B + C (1076 samples)
    • Multiobjective Optimization Problem:
      ◦ SA score
      ◦ QED score
      ◦ Favorable physical-chemical properties
      ◦ Novelty (distance from the training set)

    Activity Predictive Models
    • AChE inhibitory activity prediction model:
      ◦ GCN
      ◦ Trained on: datasets A + B + C (1076 samples)
    • Blood Brain Barrier (BBB) Permeability prediction model:
      ◦ GCN
      ◦ Trained on an in-house dataset (9059 samples)
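The novelty objective, "distance from the training set", can be made concrete in several ways. The sketch below uses one common choice, 1 minus the highest Tanimoto similarity to any training molecule, computed on fingerprints stored as sets of "on" bit indices. The slide does not say which fingerprint or distance the platform uses, so the metric, the `Fingerprint` alias, the function names, and the toy bit sets are assumptions for illustration.

```python
from typing import FrozenSet, Iterable

# Assumption: a fingerprint is represented as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def novelty(candidate: Fingerprint, training: Iterable[Fingerprint]) -> float:
    """Distance to the nearest training molecule: 1 - max Tanimoto similarity."""
    nearest = max((tanimoto(candidate, t) for t in training), default=0.0)
    return 1.0 - nearest


# Toy fingerprints, not derived from real molecules.
training_set = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
generated = frozenset({1, 2, 3, 6})
score = novelty(generated, training_set)
```

A score near 0 means the candidate is almost identical to a training molecule, while a score near 1 means it shares almost no fingerprint bits with any of them.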
  6. Generation strategy and post-processing pipeline

    • Random Sampling (Run 1 to Run 10): 30K mols / run
      ◦ Seed settings: One seed, Group seed, No seed
    • Filtering (Run 1 to Run 10): ~4000 mols / run
      ◦ RO5
      ◦ MCF
      ◦ Novelty
      ◦ BBB Permeability
    • Prioritizing (Run 1 to Run 10): 200 mols / run
      ◦ QED
      ◦ Predicted activity
      ◦ Binding affinity (docking)
    • Aggregation: 20 most frequently selected candidates
      ◦ Recommendation score:
        ▪ Consistency of selection
        ▪ Min = 1, Max = 10
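The aggregation step can be sketched as a simple count: a molecule's recommendation score is the number of run shortlists that contain it, so with 10 runs the score ranges from 1 to 10. This is a hedged reading of "consistency of selection", not a confirmed description of the platform. The function names, the tie-breaking rule, and the molecule identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def recommendation_scores(shortlists: Iterable[Set[str]]) -> Counter:
    """For each molecule, count the number of run shortlists that contain it."""
    scores: Counter = Counter()
    for shortlist in shortlists:
        # set() ensures a molecule is counted at most once per run.
        scores.update(set(shortlist))
    return scores


def top_candidates(shortlists: Iterable[Set[str]], k: int = 20) -> List[Tuple[str, int]]:
    """Return the k most frequently selected molecules with their scores."""
    scores = recommendation_scores(shortlists)
    # Assumption: ties are broken alphabetically to keep the output deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy example with 3 runs instead of 10; identifiers are placeholders.
runs = [{"m1", "m2", "m3"}, {"m1", "m3"}, {"m1", "m4"}]
best = top_candidates(runs, k=2)
```

In the study's setting, each element of `runs` would be the 200-molecule shortlist from one sampling run, and `k` would be 20.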
  7. Discovering reported potent scaffold and molecules

    Random Sampling   Reported scaffold discovery success   Reported potent compound discovery success
                      (number of runs)                      (number of runs)
    No Seed           0 / 10                                0 / 10
    One Seed          4 / 10                                5 / 10
    Group Seed        9 / 10                                9 / 10

    • Reported potent scaffold
    • D: Molecules from hidden dataset D (target chemotype) containing represented substructures
    • A: Molecules rediscovered with One Seed setting
      ◦ IC50 = 81 nM, Rank = 8
      ◦ IC50 = 6.7 nM, Rank = 31
      ◦ IC50 = 58 nM, Rank = 107
      ◦ IC50 = 94 nM, Rank = 166
    • B: Molecules rediscovered with Group Seed setting
      ◦ IC50 = 81 nM, Rank = 56
      ◦ IC50 = 30 nM, Rank = 71
      ◦ IC50 = 6.7 nM, Rank = 124
      ◦ IC50 = 94 nM, Rank = 393
  8. Final 20 Candidates by Recommendation Score

    [Figure: final candidates for the No seed and One seed settings. Legend: top 1% recommendation score (max = 10)]
  9. Conclusion

    • Designed a retrospective case study of novel chemotype discovery for generative models (quality assessment)
    • Tested our Elix Discovery™ platform in a hit-to-lead discovery campaign
    • Given an early hit compound, optimized the scaffold to more complex, diverse scaffolds, including a reported potent indanone-piperidine scaffold
    • Multiple sampling runs and recommendation score analysis helped to focus on consistently top-ranked candidates
    • Among the prioritized candidates, reported potent molecules containing the indanone-piperidine scaffold were discovered
    • These molecules were included in the top 1% of the generated molecules
    • Final 20 top-ranked candidates included at least one known potent AChE inhibitor
    • Potential presence of yet unknown potent compounds among final recommendations