Slide 1

Slide 1 text

Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study
Nazim Medzhidov, Ph.D. & Joshua Owoyemi, Ph.D.
Elix, Inc.
Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 26th, 2022

Slide 2

Slide 2 text

Background
● Challenges associated with the traditional drug discovery process have facilitated the application of machine learning approaches in this domain.
● Generative AI approaches for molecular design are actively investigated.
● Evaluating generative models in silico is challenging; confirmation requires experimental validation, which is expensive.
● Most studies evaluate model performance by optimizing computable properties (logP, QED, SA score, etc.).
● How can generated candidates be selected efficiently?

Objectives
★ Design a scenario and a pipeline to evaluate generative models in silico:
  ○ Hit-to-lead campaign
  ○ Novel chemotype discovery
★ Test our Elix Discovery™ Platform
★ Candidate prioritization pipeline

Slide 3

Slide 3 text

Study Workflow

Dataset Preparation
● Pre-training dataset:
  ○ ChEMBL
  ○ AChE inhibitors removed
  ○ Target chemotype scaffold removed
● Training set:
  ○ AChE inhibitors from ChEMBL

AI Model Development
● Elix Discovery™ Platform
● Elix Predict:
  ○ AChE inhibitory activity prediction model
  ○ Blood Brain Barrier (BBB) Permeability prediction model
● Elix Create:
  ○ SmilesFormer Generative Model
  ○ 10 sampling runs

Post-processing & Prioritization
● 30K molecules generated in each of 10 sampling runs
● Post-processing:
  ○ Phys-Chem Filters (RO5)
  ○ MCF filters
  ○ Novelty
  ○ BBB Permeability
● Prioritization:
  ○ QED score
  ○ Predicted activity
  ○ Binding affinity (docking)

Result Analysis
● Quality assessment:
  ○ Target scaffold discovery
  ○ Documented potent compound discovery
● Short list of best 200 molecules from each run
● Final short-list of 20 most frequently selected best compounds
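
The Phys-Chem (RO5) filter in the post-processing stage can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Descriptors` record, the molecule IDs, the descriptor values, and the `max_violations` parameter are all hypothetical, and the thresholds are the standard Lipinski rule-of-five cut-offs. How strictly the filter was applied in the study is not stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical values)."""
    mol_id: str
    mol_weight: float  # molecular weight in g/mol
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def passes_ro5(d: Descriptors, max_violations: int = 0) -> bool:
    """Return True if the molecule breaks at most `max_violations` RO5 rules."""
    violations = sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])
    return violations <= max_violations


# Hypothetical candidates, for illustration only.
candidates = [
    Descriptors("mol_a", 379.5, 4.1, 0, 4),
    Descriptors("mol_b", 612.0, 6.3, 2, 8),
    Descriptors("mol_c", 512.0, 3.0, 1, 5),
]
kept = [d.mol_id for d in candidates if passes_ro5(d)]
```

With the strict default, only `mol_a` is kept; passing `max_violations=1` would also keep `mol_c`, which breaks only the molecular-weight rule.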

Slide 4

Slide 4 text

Dataset Preparation

Slide 5

Slide 5 text

Datasets: Acetylcholinesterase inhibitors

● ChEMBL (~2.2M)
● AChE inhibitors with IC50 values (4,238):
  ○ A: AChE inhibitors before 1992 (120). Established chemotypes before 1992: Physostigmine, Tacrine, Rivastigmine
  ○ B: More recent molecules with same chemotypes present in A (847)
  ○ C: Hit & hit expansion compounds (109) (Hit compound)
  ○ D: Molecules containing piperazine, piperidine or indan (357) (Hidden) (Target chemotype)
● 1992: First appearance of donepezil chemotype in ChEMBL database
● Pre-training dataset (~2.2M):
  ○ AChE inhibitors removed (15.5K mols)
  ○ 48 mols with the scaffold removed
● Training dataset (1076): A + B + C
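
The split logic on this slide can be sketched with plain set operations. This is only an illustration under stated assumptions: `build_splits`, its arguments, and the toy molecule IDs are hypothetical, and the leakage check is an added safeguard rather than something shown on the slide. The group sizes are the ones given above.

```python
def build_splits(chembl, ache_inhibitors, scaffold_matches, groups):
    """Build pre-training, training, and hidden sets from molecule IDs.

    `groups` maps the slide's labels "A".."D" to sets of molecule IDs.
    """
    # Pre-training: ChEMBL minus AChE inhibitors minus scaffold-containing molecules.
    pretrain = set(chembl) - set(ache_inhibitors) - set(scaffold_matches)
    # Training: union of groups A, B, and C.
    train = set(groups["A"]) | set(groups["B"]) | set(groups["C"])
    # Hidden: group D (target chemotype), never shown to the model.
    hidden = set(groups["D"])
    if (train & hidden) or (pretrain & hidden):
        raise ValueError("hidden set D leaks into a training split")
    return pretrain, train, hidden


# Group sizes reported on the slide add up to the training set size.
SLIDE_COUNTS = {"A": 120, "B": 847, "C": 109}
assert sum(SLIDE_COUNTS.values()) == 1076

# Toy IDs, for illustration only.
pretrain, train, hidden = build_splits(
    chembl={"m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"},
    ache_inhibitors={"m1", "m2", "m3", "m4"},
    scaffold_matches={"m5"},
    groups={"A": {"m1"}, "B": {"m2"}, "C": {"m3"}, "D": {"m4"}},
)
```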

Slide 6

Slide 6 text

Datasets: Chemical Space Visualization
[Figure: chemical space plot. Labels: Physostigmine, Tacrine, Rivastigmine, Hit Compound, Target Chemotype]

Slide 7

Slide 7 text

AI Model Development

Slide 8

Slide 8 text

Elix Discovery™ Platform

Generative Model
● SmilesFormer
  ○ Pre-trained on ChEMBL dataset without AChE inhibitors and target scaffold
  ○ Trained on: datasets A + B + C (1076 samples)
● Multiobjective Optimization Problem:
  ○ SA score
  ○ QED score
  ○ Favorable physical-chemical properties
  ○ Novelty (distance from the training set)
  ○ Activity

Predictive Models
● AChE inhibitory activity prediction model:
  ○ GCN
  ○ Trained on: datasets A + B + C (1076 samples)
● Blood Brain Barrier (BBB) Permeability prediction model:
  ○ GCN
  ○ Trained on an in-house dataset (9059 samples)
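
One common way to express "Novelty (distance from the training set)" is the Tanimoto distance to the nearest training molecule. The slides do not state which distance the platform uses, so the sketch below is an assumption: fingerprints are represented as plain sets of "on" bit indices, and the function names and toy values are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def novelty(candidate: frozenset, training: list) -> float:
    """Distance to the nearest training fingerprint: 1 - max similarity."""
    if not training:
        return 1.0  # nothing to compare against, so maximally novel
    return 1.0 - max(tanimoto(candidate, t) for t in training)


# Toy fingerprints, for illustration only.
training_set = [frozenset({1, 2, 3, 4}), frozenset({7, 8})]
score = novelty(frozenset({1, 2, 3}), training_set)  # nearest similarity is 3/4
```

A score of 0 means the candidate matches a training fingerprint exactly; a score of 1 means it shares no bits with any training molecule.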

Slide 9

Slide 9 text

Molecule Generation, Post Processing & Prioritization

Slide 10

Slide 10 text

Generation strategy and post-processing pipeline

● Random Sampling: 30K mols / run, Run 1 to Run 10
  ○ No seed
  ○ One seed
  ○ Group seed
● Filtering: ~4000 mols / run
  ○ RO5
  ○ MCF
  ○ Novelty
  ○ BBB Permeability
● Prioritizing: 200 mols / run
  ○ QED
  ○ Predicted activity
  ○ Binding affinity (docking)
● Aggregation: 20 most frequently selected candidates
  ○ Recommendation score:
    ○ Consistency of selection
    ○ Min = 1, Max = 10
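
The aggregation step can be read as counting, for each molecule, how many of the 10 per-run shortlists it appears in, which gives a score between 1 and 10. The sketch below follows that reading; the function names, the alphabetical tie-breaking, and the toy shortlists are assumptions made for illustration.

```python
from collections import Counter


def recommendation_scores(shortlists):
    """Count in how many per-run shortlists each molecule appears."""
    counts = Counter()
    for shortlist in shortlists:
        # A molecule counts at most once per run.
        counts.update(set(shortlist))
    return counts


def top_candidates(shortlists, k=20):
    """Return the k most frequently selected molecules with their scores."""
    scores = recommendation_scores(shortlists)
    # Sort by score (descending), then by ID for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy shortlists from three runs, for illustration only.
runs = [["m1", "m2"], ["m1", "m3"], ["m1", "m2", "m2"]]
best = top_candidates(runs, k=2)
```

With 10 runs and 200 molecules per shortlist, `top_candidates(shortlists, k=20)` would return the final list of 20, and a score of 10 would mean the molecule was shortlisted in every run.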

Slide 11

Slide 11 text

Results

Slide 12

Slide 12 text

Discovering reported potent scaffold and molecules

Random Sampling: discovery success (number of runs)
● No Seed: reported scaffold 0 / 10; reported potent compound 0 / 10
● One Seed: reported scaffold 4 / 10; reported potent compound 5 / 10
● Group Seed: reported scaffold 9 / 10; reported potent compound 9 / 10

[Figure, panel A: reported potent scaffold. Panel B: molecules from hidden dataset D (target chemotype) containing represented substructures]

[Figure: molecules rediscovered with One Seed setting and molecules rediscovered with Group Seed setting, each annotated with IC50 and rank]
● IC50 = 81 nM, Rank = 8
● IC50 = 6.7 nM, Rank = 31
● IC50 = 58 nM, Rank = 107
● IC50 = 94 nM, Rank = 166
● IC50 = 81 nM, Rank = 56
● IC50 = 30 nM, Rank = 71
● IC50 = 6.7 nM, Rank = 124
● IC50 = 94 nM, Rank = 393

Slide 13

Slide 13 text

Final 20 Candidates by Recommendation Score
[Figure: final 20 candidate structures. Panels: No seed, One seed. Legend: top 1% recommendation score (max = 10)]

Slide 14

Slide 14 text

Conclusion
● Designed a retrospective case study of novel chemotype discovery for the quality assessment of generative models.
● Tested our Elix Discovery™ platform in a hit-to-lead discovery campaign.
● Given an early hit compound, optimized the scaffold into more complex, diverse scaffolds, including a reported potent indanone-piperidine scaffold.
● Multiple sampling runs and recommendation score analysis helped to focus on consistently top-ranked candidates.
● Among the prioritized candidates, reported potent molecules containing the indanone-piperidine scaffold were discovered.
● These molecules were included in the top 1% of the generated molecules.
● The final 20 top-ranked candidates included at least one known potent AChE inhibitor.
● Yet-unknown potent compounds may be present among the final recommendations.

Slide 15

Slide 15 text

www.elix-inc.com

Slide 16

Slide 16 text

APPENDIX

Slide 17

Slide 17 text

Final 20 candidates: One seed vs Group Seed
[Figure: final 20 candidate structures. Panels: Group seed, One seed]

Slide 18

Slide 18 text

Chemical Space Visualization
[Figure: chemical space plot. Labels: Physostigmine, Tacrine, Rivastigmine, Hit Compound, Target Chemotype]