Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study, Elix, CBI 2022

Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study
Nazim Medzhidov, Ph.D. & Joshua Owoyemi, Ph.D.
Elix, Inc.
Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 26th, 2022

2 Background
• Challenges associated with the traditional drug discovery process have facilitated the application of machine learning approaches in this domain.
• Generative AI approaches for molecular design are actively investigated.
• Evaluating generative models in silico is challenging; confirmation requires experimental validation (expensive).
• Most studies evaluate model performance by optimizing computable properties (logP, QED, SA score, etc.).
• How to select generated candidates efficiently?

Objectives
★ Design a scenario and a pipeline to evaluate generative models in silico:
  ◦ Hit-to-lead campaign
  ◦ Novel chemotype discovery
★ Test our Elix Discovery™ Platform
★ Candidate prioritization pipeline

3 Study Workflow

Dataset Preparation
• Pre-training dataset:
  ◦ ChEMBL
  ◦ AChE inhibitors removed
  ◦ Target chemotype scaffold removed
• Training Set:
  ◦ AChE inhibitors from ChEMBL

AI Model Development
• Elix Discovery™ Platform
• Elix Predict:
  ◦ AChE inhibitory activity prediction model
  ◦ Blood Brain Barrier (BBB) Permeability prediction model
• Elix Create:
  ◦ SmilesFormer Generative Model
  ◦ 10 sampling runs

Post-processing & Prioritization
• 30K molecules generated in each of 10 sampling runs
• Post-processing:
  ◦ Phys-Chem Filters (RO5)
  ◦ MCF filters
  ◦ Novelty
  ◦ BBB Permeability
• Prioritization:
  ◦ QED score
  ◦ Predicted activity
  ◦ Binding affinity (docking)

Result Analysis
• Quality assessment:
  ◦ Target scaffold discovery
  ◦ Documented potent compound discovery
• Short list of best 200 molecules from each run
• Final short-list of 20 most frequently selected best compounds
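The Phys-Chem filter (RO5) named in the post-processing step can be illustrated with a minimal sketch. It assumes the standard Lipinski thresholds (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 acceptors) and takes precomputed descriptor values as input. The `Descriptors` class, the function names, the `max_violations` parameter, and the example values are all hypothetical; the deck does not say how the platform computes descriptors or whether any violation is tolerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # calculated logP
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski rules the molecule breaks."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    )
    return sum(checks)


def passes_ro5(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a molecule if it breaks at most `max_violations` rules.

    The default of 0 is an assumption; some pipelines allow one violation.
    """
    return ro5_violations(d) <= max_violations


# Illustrative values only, not measured data for any real compound.
small = Descriptors(mol_weight=379.5, logp=4.0, h_donors=0, h_acceptors=4)
large = Descriptors(mol_weight=650.0, logp=6.2, h_donors=1, h_acceptors=4)
kept = [d for d in (small, large) if passes_ro5(d)]
```

In practice the descriptor values would come from a cheminformatics toolkit; keeping the rule check separate from descriptor calculation makes the thresholds easy to audit.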

4 Dataset Preparation

5 Datasets: Acetylcholinesterase inhibitors
• ChEMBL (~2.2M)
• AChE inhibitors with IC50 values (4,238)
• Pre-training dataset (~2.2M):
  ◦ AChE inhibitors removed (15.5K mols)
  ◦ 48 mols with the scaffold removed
• Training dataset (1076): A + B + C
  ◦ A: AChE inhibitors before 1992 (120); established chemotypes before 1992: Physostigmine, Tacrine, Rivastigmine
  ◦ B: More recent molecules with same chemotypes present in A (847)
  ◦ C: Hit & hit expansion compounds (109) (Hit compound)
• D (Hidden): Molecules containing piperazine, piperidine or indan (357) (Target chemotype)
• 1992: First appearance of donepezil chemotype in ChEMBL database

6 Datasets: Chemical Space Visualization
[Figure: chemical space plot with labels Physostigmine, Tacrine, Rivastigmine, Hit Compound, Target Chemotype]

7 AI Model Development

8 Elix Discovery™ Platform

Generative Model
• SmilesFormer
  ◦ Pre-trained on ChEMBL dataset without AChE inhibitors and target scaffold
  ◦ Trained on: datasets A + B + C (1076 samples)
• Multiobjective Optimization Problem:
  ◦ SA score
  ◦ QED score
  ◦ Favorable physical-chemical properties
  ◦ Novelty (distance from the training set)
  ◦ Activity

Predictive Models
• AChE inhibitory activity prediction model:
  ◦ GCN
  ◦ Trained on: datasets A + B + C (1076 samples)
• Blood Brain Barrier (BBB) Permeability prediction model:
  ◦ GCN
  ◦ Trained on an in-house dataset (9059 samples)

9 Molecule Generation, Post Processing & Prioritization

10 Generation strategy and post-processing pipeline
• Random Sampling: 30K mols / run (Runs 1 to 10)
  ◦ Seeding settings: No seed, One seed, Group seed
• Filtering: ~4000 mols / run
  ◦ RO5
  ◦ MCF
  ◦ Novelty
  ◦ BBB Permeability
• Prioritizing: 200 mols / run
  ◦ QED
  ◦ Predicted activity
  ◦ Binding affinity (docking)
• Aggregation: 20 most frequently selected candidates
  ◦ Recommendation score: consistency of selection across runs (Min = 1, Max = 10)
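The aggregation step can be sketched as a simple count: a molecule's recommendation score is the number of runs whose short list contains it, so with 10 runs the score ranges from 1 to 10. The function names, the use of string identifiers such as canonical SMILES, and the alphabetical tie-break are assumptions made for this illustration; the deck does not describe how ties are resolved.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recommendation_scores(shortlists: Iterable[Iterable[str]]) -> Counter:
    """Count, for each molecule, the number of runs that short-listed it."""
    counts: Counter = Counter()
    for shortlist in shortlists:
        # set() ensures a molecule is counted at most once per run
        counts.update(set(shortlist))
    return counts


def top_candidates(shortlists: Iterable[Iterable[str]], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequently selected molecules with their scores.

    Ties are broken alphabetically by identifier (an assumption made here
    only to keep the output deterministic).
    """
    counts = recommendation_scores(shortlists)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Toy example with three runs and placeholder identifiers.
runs = [["CCO", "CCN"], ["CCO"], ["CCO", "CCC"]]
best = top_candidates(runs, n=2)
```

Identifiers need to be canonicalized before counting, otherwise the same molecule written in two ways would be scored as two different candidates.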

11 Results

12 Discovering reported potent scaffold and molecules

Discovery success (number of runs):

Setting       Reported scaffold    Reported potent compound
No Seed       0 / 10               0 / 10
One Seed      4 / 10               5 / 10
Group Seed    9 / 10               9 / 10

[Figure: reported potent scaffold and molecules from hidden dataset D (target chemotype) containing represented substructures; labels A, B, D, Random Sampling]

Molecules rediscovered with One Seed setting:
• IC50 = 81 nM, Rank = 8
• IC50 = 6.7 nM, Rank = 31
• IC50 = 58 nM, Rank = 107
• IC50 = 94 nM, Rank = 166

Molecules rediscovered with Group Seed setting:
• IC50 = 81 nM, Rank = 56
• IC50 = 30 nM, Rank = 71
• IC50 = 6.7 nM, Rank = 124
• IC50 = 94 nM, Rank = 393
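The "number of runs" figures for reported potent compound discovery can be read as a count of runs whose output contains at least one known potent molecule. The sketch below shows that count under stated assumptions: molecules are compared by exact identifier match, and `discovery_success` and the toy identifiers are hypothetical. The deck does not state which pool of molecules per run was checked, and scaffold discovery would additionally need substructure matching, which is not shown here.

```python
from typing import Iterable, Tuple


def discovery_success(run_outputs: Iterable[Iterable[str]],
                      known_potent: Iterable[str]) -> Tuple[int, int]:
    """Return (runs with at least one known potent molecule, total runs)."""
    known = set(known_potent)
    total = 0
    hits = 0
    for output in run_outputs:
        total += 1
        # A run counts as a success if it shares any molecule with the known set
        if known & set(output):
            hits += 1
    return hits, total


# Toy example: placeholder identifiers, not real compounds.
toy_runs = [["mol_a", "mol_b"], ["mol_c"], ["mol_b", "mol_d"]]
hits, total = discovery_success(toy_runs, known_potent={"mol_b"})
summary = f"{hits} / {total}"
```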

13 Final 20 Candidates by Recommendation Score
[Figure: final 20 candidates for the No seed and One seed settings. Legend: top 1%; recommendation score (max = 10)]

14 Conclusion
• Designed a retrospective case study of novel chemotype discovery for generative models (quality assessment)
• Tested our Elix Discovery™ platform in a hit-to-lead discovery campaign
• Given an early hit compound, optimized its scaffold into more complex and diverse scaffolds, including a reported potent indanone-piperidine scaffold
• Multiple sampling runs and recommendation score analysis helped to focus on consistently top-ranked candidates
• Among the prioritized candidates, reported potent molecules containing the indanone-piperidine scaffold were discovered
• These molecules were included in the top 1% of the generated molecules
• Final 20 top-ranked candidates included at least one known potent AChE inhibitor
• Potential presence of yet unknown potent compounds among final recommendations

www.elix-inc.com

16 APPENDIX

17 Final 20 candidates: One seed vs Group Seed
[Figure: final 20 candidates for the One seed and Group seed settings]

18 Chemical Space Visualization
[Figure: chemical space plot with labels Physostigmine, Tacrine, Rivastigmine, Hit Compound, Target Chemotype]
