Slide 1

Slide 1 text

Elix Discovery™: a rediscovery case study of Donepezil

Slide 2

Slide 2 text

Goals
● Provide a use case of the Elix Discovery™ Platform
● Focus on the generative models
● Focus on novel scaffolds:
  ○ Rediscover the scaffold of a known drug that is novel with respect to the data used.

Slide 3

Slide 3 text

Target Molecule Selection Criteria
Problem design:
● Identify a known drug (“target molecule”) with a detailed description of its discovery process
● Collect publicly available data on the protein target of the target molecule
● Filter the training set to exclude molecules similar to the target molecule
● Train predictive and generative models
● Observe whether scaffold rediscovery is successful
Target Molecule Selection Criteria:
● NOT a kinase inhibitor
● Well-documented drug design process
● Diverse dataset (not focused on derivatives of a single moiety)

Slide 4

Slide 4 text

Study Workflow
Dataset Filtering
● Exclusion of compounds containing the donepezil scaffold from the pre-training set
● Exclusion of the donepezil scaffold and relevant molecules from the training set
Dataset Curation
● Pre-training set:
  ○ ChEMBL data
  ○ Objective: Learn SMILES vocabulary
● Training set:
  ○ AChE inhibitors from the ChEMBL database
Predictive Model Training
● Single model for activity prediction in the generation step
● 10-model ensemble for activity prediction in post-processing
● BBB permeability model, an ensemble of 5 models:
  ○ Trained on a curated dataset of 9059 samples
Generative Model Training
● In-house developed SmilesFormer model
● Pre-trained on the cleaned ChEMBL data
● Fine-tuned on the cleaned activity data
Generation and Data Analysis
● Generate 30K molecules/run
● Phys-chem filters
● MCF filters
● Novelty filters
● BBB permeability prediction confidence filter
● Activity prediction confidence filter
● Scaffold grouping and rankings
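The deck does not say how prediction confidence is derived from the model ensembles. A minimal sketch, assuming each ensemble member is a callable returning a predicted pIC50 and that the spread of the members serves as an (inverse) confidence proxy; `ensemble_predict` and the toy models are hypothetical names for illustration only:

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def ensemble_predict(models: Sequence[Model], features: Sequence[float]) -> Tuple[float, float]:
    """Return (mean prediction, population std. dev.) over all ensemble members.

    Assumption: a smaller spread is read as higher confidence.
    The slides do not specify the confidence measure actually used.
    """
    predictions = [model(features) for model in models]
    return mean(predictions), pstdev(predictions)


# Toy stand-ins for trained models (hypothetical, constant outputs).
toy_models = [lambda x: 6.0, lambda x: 7.0, lambda x: 8.0]
predicted_pic50, spread = ensemble_predict(toy_models, [0.1, 0.2])
```

With a 10-model activity ensemble or a 5-model BBB ensemble, the same function would be called with the corresponding list of models.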

Slide 5

Slide 5 text

Donepezil (Aricept)
● Used for Alzheimer’s disease treatment
● Centrally acting reversible acetylcholinesterase (AChE) inhibitor
[Figure: chemical structures labeled Physostigmine, Galantamine, Tacrine, Donepezil, Rivastigmine]
[Figure: chemical structures labeled Compound 1 (Seed), Compound 8 (Backbone), Donepezil; fragment labels: N-Benzylpiperazine, 1-indanone, N-Benzylpiperidine]

Slide 6

Slide 6 text

Training Set Filtering
[Figure: Filtered Pattern and Filtered Substructures (Group 1 AND Group 2)]

Slide 7

Slide 7 text

Training set distribution
[Figure: plot labels: Tanimoto Similarity to Donepezil, pIC50; n = 3950]

Slide 8

Slide 8 text

Training set: Most abundant scaffolds
● Extract unique scaffolds from training set
● Search training set for substructure matches to each scaffold
[Figure: scaffold structures. Legend: number of molecules containing the structure: 3572, 807, 787, 615, 524, 120, 120, 120, 117, 112, 265, 220, 207, 156, 150]

Slide 9

Slide 9 text

Training set: 10 most similar molecules to donepezil
[Figure: ten structures. Legend: Tanimoto similarity to Donepezil: 0.443, 0.443, 0.441, 0.435, 0.433, 0.432, 0.431, 0.429, 0.425, 0.424]
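For reference, Tanimoto similarity between two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. A minimal sketch, assuming fingerprints are represented as Python sets of on-bit indices; the fingerprint type used in the study is not stated, and `tanimoto` and `most_similar` are illustrative names:

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def most_similar(query: Set[int], library: Dict[str, Set[int]], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy fingerprints (hypothetical bit indices, not real molecules).
reference = {1, 2, 3, 4}
training = {"mol_a": {1, 2, 3, 9}, "mol_b": {7, 8}, "mol_c": {1, 2, 3, 4}}
ranking = most_similar(reference, training, k=2)
```

Applied to the filtered training set with donepezil as the query, a ranking of this kind would produce a top-10 list like the one on this slide.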

Slide 10

Slide 10 text

Generation Procedure
Multiobjective Optimization Problem:
● SA score
● QED score
● Favorable physical-chemical properties
● Novelty (distance from the training set)
● Activity
Generative Score:
● The average of the normalized single scores (SA, QED, phys-chem, novelty, activity) was computed for each generated molecule
● Molecules with the highest “generative score” were prioritized during the generation process
● Up to 30K molecules with the highest scores were generated in each sampling run
● 10 sampling runs were performed in total
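The generative score described above can be sketched as a plain average. This assumes each single score has already been normalized to [0, 1] with higher meaning better; the normalization itself is not described in the deck, and the dictionary keys and function names are hypothetical:

```python
from statistics import mean
from typing import Dict, List

SCORE_NAMES = ("sa", "qed", "phys_chem", "novelty", "activity")


def generative_score(scores: Dict[str, float]) -> float:
    """Average of the five normalized single scores for one molecule."""
    missing = [name for name in SCORE_NAMES if name not in scores]
    if missing:
        raise KeyError(f"missing scores: {missing}")
    return mean(scores[name] for name in SCORE_NAMES)


def top_candidates(candidates: List[dict], limit: int = 30_000) -> List[dict]:
    """Keep at most `limit` molecules with the highest generative score."""
    ranked = sorted(candidates, key=lambda c: generative_score(c["scores"]), reverse=True)
    return ranked[:limit]


# Toy example with made-up normalized scores.
example = {"sa": 1.0, "qed": 0.5, "phys_chem": 0.5, "novelty": 0.0, "activity": 0.5}
score = generative_score(example)
```

An unweighted mean treats all five objectives equally, which matches the slide's wording; any weighting would be a separate design choice not described here.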

Slide 11

Slide 11 text

Post-Processing Analysis Summary
[Figure: workflow diagram. Runs 1 to 10 (30K molecules each) → numbered filtering steps → top 20 scaffolds each → combined scaffolds from all runs → 20 most frequent scaffolds]

Slide 12

Slide 12 text

Post-Processing Analysis
1) Phys-Chem & MCFs
● Lipinski’s RO5
● Allowed common atoms
● Ring size (up to 8)
● Medicinal chemistry filters (189 filters)
2) Novelty
● Avoid building upon known scaffolds (tacrine and physostigmine).
● Remove molecules with an exact scaffold match to the training set
● Remove molecules with a Tanimoto similarity score > 0.5 to the training set
3) BBB Permeability
● Choose molecules based on a BBB permeability prediction probability threshold
● Value used:
  ○ 0.99
4) Activity Prediction Confidence
● Choose the top n percent of the molecules based on pIC50 prediction confidence
● Values used:
  ○ 50%
  ○ 40%
  ○ 30%
  ○ 20%
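Steps 2 to 4 above can be sketched as successive list filters. This is a minimal illustration, assuming each molecule is a dict with precomputed values; the field names (`max_train_similarity`, `bbb_probability`, `confidence`) are hypothetical, and treating the 0.99 threshold as inclusive is an assumption:

```python
import math
from typing import List


def top_percent_by_confidence(molecules: List[dict], percent: float) -> List[dict]:
    """Keep the top `percent` of molecules, ranked by prediction confidence."""
    ranked = sorted(molecules, key=lambda m: m["confidence"], reverse=True)
    keep = math.ceil(len(ranked) * percent / 100)
    return ranked[:keep]


def post_process(molecules: List[dict], percent: float,
                 bbb_threshold: float = 0.99, max_similarity: float = 0.5) -> List[dict]:
    """Apply novelty, BBB permeability, and confidence filters in order."""
    # Novelty: drop molecules with similarity > 0.5 to the training set.
    novel = [m for m in molecules if m["max_train_similarity"] <= max_similarity]
    # BBB permeability: keep molecules at or above the probability threshold.
    permeable = [m for m in novel if m["bbb_probability"] >= bbb_threshold]
    # Activity prediction confidence: keep the top n percent.
    return top_percent_by_confidence(permeable, percent)


# Toy molecules with made-up values.
toy = [
    {"id": "a", "max_train_similarity": 0.3, "bbb_probability": 0.995, "confidence": 0.9},
    {"id": "b", "max_train_similarity": 0.6, "bbb_probability": 0.999, "confidence": 0.95},
    {"id": "c", "max_train_similarity": 0.4, "bbb_probability": 0.50, "confidence": 0.8},
]
kept = post_process(toy, percent=50)
```

Running the same function with `percent` set to 50, 40, 30, and 20 corresponds to the four confidence cut-offs listed on the slide.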

Slide 13

Slide 13 text

Grouping and Ranking Analysis
5) Scaffold Grouping & Ranking
● Group molecules sharing the same scaffold
● Rank scaffolds by a “desirability score”:
  ○ (QED + pIC50)/2
6) Combine Multiple Runs
● Combine the top 20 scaffolds from each of the 10 sampling runs
7) Most consistent suggestions
● Rank the final list by number of occurrences
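Steps 5 to 7 can be sketched with a grouping dictionary and a counter. The slides do not say how molecule-level scores are aggregated per scaffold or whether pIC50 is rescaled first; this sketch assumes the scaffold score is the mean desirability of its molecules and uses the values as given. All field and function names are hypothetical:

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import List, Tuple


def desirability(qed: float, pic50: float) -> float:
    """Desirability score as written on the slide: (QED + pIC50) / 2."""
    return (qed + pic50) / 2


def top_scaffolds(molecules: List[dict], k: int = 20) -> List[str]:
    """Group molecules by scaffold and return the k best-scoring scaffolds."""
    groups = defaultdict(list)
    for mol in molecules:
        groups[mol["scaffold"]].append(desirability(mol["qed"], mol["pic50"]))
    # Assumption: a scaffold's score is the mean over its molecules.
    ranked = sorted(groups, key=lambda s: mean(groups[s]), reverse=True)
    return ranked[:k]


def most_consistent(runs: List[List[dict]], k: int = 20) -> List[Tuple[str, int]]:
    """Count in how many runs each scaffold reaches the top k, most frequent first."""
    counts = Counter()
    for run in runs:
        counts.update(set(top_scaffolds(run, k)))
    return counts.most_common()


# Toy runs with made-up scaffold labels and values.
run_1 = [{"scaffold": "A", "qed": 0.8, "pic50": 7.0}, {"scaffold": "B", "qed": 0.5, "pic50": 6.0}]
run_2 = [{"scaffold": "A", "qed": 0.8, "pic50": 7.0}, {"scaffold": "C", "qed": 0.9, "pic50": 8.0}]
summary = most_consistent([run_1, run_2], k=2)
```

Counting each scaffold at most once per run means the final frequency ranges from 1 to the number of runs, which matches the "frequency of generation among 10 runs" legend used in the results slides.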

Slide 14

Slide 14 text

Results

Slide 15

Slide 15 text

Top 50% by activity prediction confidence
[Figure: scaffold structures. Legend: frequency of generation among 10 runs]

Slide 16

Slide 16 text

Top 40% by activity prediction confidence
[Figure: scaffold structures. Legend: frequency of generation among 10 runs]

Slide 17

Slide 17 text

Top 30% by activity prediction confidence
[Figure: scaffold structures. Legend: frequency of generation among 10 runs]

Slide 18

Slide 18 text

Top 20% by activity prediction confidence
[Figure: scaffold structures. Legend: frequency of generation among 10 runs]

Slide 19

Slide 19 text

Generated Results with Donepezil Scaffold
[Figure: structures labeled Donepezil scaffold, Generated molecule, and Donepezil]
Compound 14 from the original Donepezil paper[1]
[1] Sugimoto H. et al. Jpn. J. Pharmacol. 89, 7–20 (2002)

Slide 20

Slide 20 text

Summary & Discussion (1 of 2)
● The Elix Discovery™ Platform was used to discover novel scaffolds (distant from the training set)
● Across 10 runs, ~30K molecules were generated per run
● Molecules in each run were filtered to a short list of 20 scaffolds.
● The donepezil scaffold consistently ranked among the top 20 scaffolds
● The donepezil scaffold was represented by a molecule originally described as one of the intermediate molecules (Compound 14) that led to the discovery of donepezil[1]
[1] Sugimoto H. et al. Jpn. J. Pharmacol. 89, 7–20 (2002)

Slide 21

Slide 21 text

Summary & Discussion (2 of 2)
● Observations:
  ○ Diversity in scaffolds: many scaffolds were represented by very few molecules.
  ○ Generated molecules were mostly predicted to be BBB permeable, without explicit optimization for this parameter.
  ○ Activity prediction models struggled when predicting in chemical space too distant from the training set.
  ○ Filtering by prediction confidence helped to focus on molecules with more reliable predicted IC50 values.

Slide 22

Slide 22 text

Elix, Inc. http://ja.elix-inc.com/