An Elix Discovery™ Case Study: Rediscovering Donepezil with an In-house Generative Model

1 Elix Discovery™: a rediscovery case study of Donepezil

2 Goals
• Provide a use case of the Elix Discovery™ Platform
• Focus on the generative models
• Focus on novel scaffolds:
◦ Rediscover the scaffold of a known drug, one that is novel with respect to the data used for training

3 Target Molecule Selection Criteria
Problem design:
• Identify a known drug (the "target molecule") whose discovery process is described in detail
• Collect publicly available data on the protein target of the target molecule
• Filter the training set to exclude molecules similar to the target molecule
• Train predictive and generative models
• Observe whether the scaffold is successfully rediscovered
Target molecule selection criteria:
• NOT a kinase inhibitor
• Well-documented drug design process
• Diverse dataset (not focused on derivatives of a single moiety)

4 Study Workflow
Dataset filtering
• Exclusion of compounds containing the donepezil scaffold from the pre-training set
• Exclusion of the donepezil scaffold and related molecules from the training set
Dataset curation
• Pre-training set:
◦ ChEMBL data
◦ Objective: learn the SMILES vocabulary
• Training set:
◦ AChE inhibitors from the ChEMBL database
Predictive model training
• Single model for activity prediction in the generation step
• 10-model ensemble for activity prediction in post-processing
• BBB permeability model: ensemble of 5 models
◦ Trained on a curated dataset of 9,059 samples
Generative model training
• In-house SmilesFormer model
• Pre-trained on the cleaned ChEMBL data
• Fine-tuned on the cleaned activity data
Generation and data analysis
• Generate 30K molecules per run
• Phys-chem filters
• MCF filters
• Novelty filters
• BBB permeability prediction confidence filter
• Activity prediction confidence filter
• Scaffold grouping and rankings
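The workflow relies on model ensembles (10 models for activity, 5 for BBB permeability) and later filters on "prediction confidence", but the slides do not define how confidence is computed. A common approach, assumed in this sketch, is to average the members' predictions and treat their spread (standard deviation) as uncertainty, so a smaller spread means higher confidence. `ensemble_predict` and the toy members are hypothetical stand-ins, not the trained predictors.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple

# A "model" is any callable mapping a SMILES string to a predicted value (e.g. pIC50).
Model = Callable[[str], float]


def ensemble_predict(models: Sequence[Model], smiles: str) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of the members' predictions.

    Assumption: the spread is the confidence signal; smaller spread = more confident.
    """
    preds = [model(smiles) for model in models]
    return mean(preds), pstdev(preds)


# Toy usage with three hypothetical members that disagree slightly.
toy_members = [lambda s: 7.0, lambda s: 7.2, lambda s: 6.8]
mu, sigma = ensemble_predict(toy_members, "CCO")  # mu is 7.0, sigma is about 0.163
```

Ensemble disagreement is only one possible confidence measure; the platform may use a different one.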

5 Donepezil (Aricept)
• Used for Alzheimer's disease treatment
• Centrally acting reversible acetylcholinesterase (AChE) inhibitor
Figure: structures of physostigmine, galantamine, tacrine, donepezil and rivastigmine; compound 1 (seed), compound 8 (backbone) and donepezil, annotated with the N-benzylpiperazine, 1-indanone and N-benzylpiperidine moieties.

6 Training Set Filtering
Figure: filtered patterns and the corresponding filtered substructures, arranged in two groups combined as Group 1 AND Group 2.
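The filtering step can be sketched as a two-group rule: a molecule is removed only if it matches at least one pattern from Group 1 and at least one pattern from Group 2. This reading of the "Group 1 AND Group 2" figure is an assumption, and the pattern names in the example (the N-benzylpiperidine and 1-indanone moieties of donepezil) are illustrative, not the actual filtered patterns. `has_match` is a hypothetical predicate; with RDKit it would typically wrap `Mol.HasSubstructMatch` on query molecules built from SMARTS.

```python
from typing import Callable, Iterable, List

# Hypothetical predicate: does `molecule` contain substructure `pattern`?
Matcher = Callable[[str, str], bool]


def matches_any(molecule: str, patterns: Iterable[str], has_match: Matcher) -> bool:
    """True if the molecule contains at least one of the patterns."""
    return any(has_match(molecule, p) for p in patterns)


def filter_training_set(
    molecules: Iterable[str],
    group1: Iterable[str],
    group2: Iterable[str],
    has_match: Matcher,
) -> List[str]:
    """Drop molecules that match Group 1 AND Group 2 (assumed rule); keep the rest."""
    group1, group2 = list(group1), list(group2)  # allow reuse across molecules
    return [
        m
        for m in molecules
        if not (matches_any(m, group1, has_match) and matches_any(m, group2, has_match))
    ]


# Toy usage: strings stand in for molecules, substring search for substructure search.
toy = ["benzylpiperidine+indanone", "benzylpiperidine+phenyl", "indanone+amide"]
kept = filter_training_set(
    toy,
    group1=["benzylpiperidine", "benzylpiperazine"],
    group2=["indanone"],
    has_match=lambda mol, pat: pat in mol,
)
# kept == ["benzylpiperidine+phenyl", "indanone+amide"]
```

Under this reading, molecules carrying only one of the two moieties stay in the training set, so only close donepezil analogues are removed.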

7 Training set distribution
Figure: Tanimoto similarity to donepezil vs. pIC50 (n = 3950).
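The similarity on this slide, and in the list of the 10 most similar molecules on slide 9, is Tanimoto similarity, T(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|). The fingerprint type is not stated; this sketch assumes fingerprints are already computed (for example, Morgan/ECFP bit vectors from a cheminformatics toolkit) and stored as sets of on-bit indices. `most_similar` is a hypothetical helper for retrieving the k nearest training molecules.

```python
from __future__ import annotations

import heapq
from typing import Mapping


def tanimoto(fp_a: frozenset[int], fp_b: frozenset[int]) -> float:
    """Tanimoto similarity |A & B| / (|A| + |B| - |A & B|) of two on-bit sets."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0  # two empty fingerprints: 0 by convention here


def most_similar(
    reference: frozenset[int], library: Mapping[str, frozenset[int]], k: int = 10
) -> list[tuple[str, float]]:
    """Return the k (name, similarity) pairs most similar to `reference`."""
    scored = ((name, tanimoto(reference, fp)) for name, fp in library.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy fingerprints; real ones would come from a fingerprinting toolkit.
donepezil_fp = frozenset({1, 2, 3, 4})
training = {"m1": frozenset({1, 2, 3}), "m2": frozenset({3, 4, 5, 6}), "m3": frozenset({9})}
similarities = {name: tanimoto(donepezil_fp, fp) for name, fp in training.items()}
top = most_similar(donepezil_fp, training, k=2)  # [("m1", 0.75), ("m2", 0.333...)]
```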

8 Training set: Most abundant scaffolds
• Extract unique scaffolds from the training set
• Search the training set for substructure matches to each scaffold
Figure: most abundant scaffolds; the legend gives the number of molecules containing each structure (3572, 807, 787, 615, 524, 120, 120, 120, 117, 112, 265, 220, 207, 156, 150).
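The two-step counting procedure can be sketched as follows. `get_scaffold` and `has_match` are hypothetical hooks; in practice they might be RDKit's `MurckoScaffold.GetScaffoldForMol` and a substructure match. Because the count comes from a substructure search rather than an exact scaffold comparison, a small scaffold is also counted in every molecule whose own scaffold contains it, which may explain how one scaffold appears in 3572 of the 3950 training molecules.

```python
from collections import Counter
from typing import Callable, Sequence


def scaffold_abundance(
    molecules: Sequence[str],
    get_scaffold: Callable[[str], str],
    has_match: Callable[[str, str], bool],
) -> Counter:
    """For each unique scaffold, count the molecules that contain it as a substructure."""
    unique_scaffolds = {get_scaffold(m) for m in molecules}  # step 1: extract
    counts: Counter = Counter()
    for scaffold in unique_scaffolds:  # step 2: substructure search over the whole set
        counts[scaffold] = sum(1 for m in molecules if has_match(m, scaffold))
    return counts


# Toy usage: each string is its own "scaffold", substring search stands in for matching.
toy = ["AB", "ABC", "AD"]
abundance = scaffold_abundance(toy, get_scaffold=lambda m: m, has_match=lambda m, s: s in m)
# abundance.most_common(1) == [("AB", 2)]: "AB" is also found inside "ABC".
```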

9 Training set: 10 most similar molecules to donepezil
Figure legend: Tanimoto similarity to donepezil (0.443, 0.443, 0.441, 0.435, 0.433, 0.432, 0.431, 0.429, 0.425, 0.424).

10 Generation Procedure
Multiobjective optimization problem:
• SA score
• QED score
• Favorable physical-chemical properties
• Novelty (distance from the training set)
• Activity
Generative score:
• The average of the normalized individual scores (SA, QED, phys-chem, novelty, activity) was computed for each generated molecule
• Molecules with the highest generative score were prioritized during the generation process
• Up to 30K molecules with the highest scores were generated in each sampling run
• 10 sampling runs were performed in total
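The slides state only that the five scores are normalized and averaged. This sketch fills in the normalization with labeled assumptions: the SA score (1 = easy, 10 = hard) is inverted onto [0, 1], QED is used as-is, the phys-chem and novelty terms are assumed to be precomputed in [0, 1], and predicted pIC50 is min-max scaled over a hypothetical window of 4 to 10. `Candidate`, `generative_score` and `top_candidates` are illustrative names. The sketch ranks a finished pool of molecules; how the score steers sampling inside SmilesFormer is not described and is not modeled here.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Candidate:
    smiles: str
    sa: float        # synthetic accessibility score: 1 (easy) to 10 (hard)
    qed: float       # QED, already in [0, 1]
    physchem: float  # assumed: fraction of phys-chem checks passed, in [0, 1]
    novelty: float   # assumed: 1 - max Tanimoto similarity to the training set
    pic50: float     # predicted activity from the generation-step model


def clip01(x: float) -> float:
    return max(0.0, min(1.0, x))


def generative_score(c: Candidate, pic50_window: Tuple[float, float] = (4.0, 10.0)) -> float:
    """Unweighted mean of the five normalized scores (normalizations are assumptions)."""
    lo, hi = pic50_window
    normalized = [
        clip01((10.0 - c.sa) / 9.0),         # invert: lower SA means easier to make
        clip01(c.qed),
        clip01(c.physchem),
        clip01(c.novelty),
        clip01((c.pic50 - lo) / (hi - lo)),  # min-max over a hypothetical pIC50 window
    ]
    return sum(normalized) / len(normalized)


def top_candidates(cands: Iterable[Candidate], k: int = 30_000) -> List[Candidate]:
    """Keep up to k highest-scoring molecules, mirroring the 30K-per-run cap."""
    return heapq.nlargest(k, cands, key=generative_score)
```

The unweighted mean treats all five objectives equally; any weighting used by the platform is not described on the slide.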

11 Post-Processing Analysis Summary
Figure: each of the 10 sampling runs (30K molecules each) passes through filtering steps 1 to 6 and yields its top 20 scaffolds; in step 7 the scaffolds from all runs are combined and the 20 most frequent scaffolds are kept.

12 Post-Processing Analysis
1) Phys-chem & MCFs
• Lipinski's Rule of Five (Ro5)
• Allowed common atoms
• Ring size (up to 8)
• Medicinal chemistry filters (189 filters)
2) Novelty
• Avoid building upon known scaffolds (tacrine and physostigmine)
• Remove molecules with an exact scaffold match to the training set
• Remove molecules with a Tanimoto similarity score > 0.5
3) BBB permeability
• Keep molecules whose predicted BBB permeability probability meets a threshold
• Value used:
◦ 0.99
4) Activity prediction confidence
• Keep the top n percent of molecules ranked by pIC50 prediction confidence
• Values used:
◦ 50%
◦ 40%
◦ 30%
◦ 20%
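Part of step 1 and steps 2 to 4 can be sketched as a chain of predicates followed by a confidence cut. Assumptions, none of which are stated on the slide: descriptors and model outputs are precomputed fields; Ro5 allows at most one violation (Lipinski's original formulation); the similarity is the maximum Tanimoto to any training molecule; molecules exactly at the 0.99 BBB threshold are kept; and confidence is ranked by the ensemble standard deviation of predicted pIC50. The allowed-atom check, the 189 MCFs, the exact-scaffold check and the tacrine/physostigmine exclusion are omitted for brevity. `Generated` and `post_process` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Generated:
    smiles: str
    mol_wt: float
    logp: float
    h_donors: int
    h_acceptors: int
    max_ring_size: int
    max_train_sim: float  # assumed: highest Tanimoto similarity to any training molecule
    bbb_prob: float       # predicted BBB permeability probability (5-model ensemble)
    pic50_std: float      # assumed confidence proxy: ensemble spread, lower is better


def passes_ro5(m: Generated, max_violations: int = 1) -> bool:
    """Lipinski's Rule of Five, allowing at most `max_violations` violations."""
    violations = sum([m.mol_wt > 500, m.logp > 5, m.h_donors > 5, m.h_acceptors > 10])
    return violations <= max_violations


def post_process(
    mols: Iterable[Generated],
    sim_cutoff: float = 0.5,
    bbb_threshold: float = 0.99,
    top_fraction: float = 0.5,
) -> List[Generated]:
    kept = [
        m
        for m in mols
        if passes_ro5(m)
        and m.max_ring_size <= 8             # ring size up to 8
        and m.max_train_sim <= sim_cutoff    # novelty: drop similarity > 0.5
        and m.bbb_prob >= bbb_threshold      # BBB permeability threshold
    ]
    # Activity confidence: keep the top fraction with the smallest ensemble spread.
    kept.sort(key=lambda m: m.pic50_std)
    return kept[: math.ceil(len(kept) * top_fraction)]
```

Calling `post_process` with `top_fraction` set to 0.5, 0.4, 0.3 and 0.2 corresponds to the four confidence cut-offs listed above.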

13 Grouping and Ranking Analysis
5) Scaffold grouping & ranking
• Group molecules sharing the same scaffold
• Rank scaffolds by a "desirability score":
◦ (QED + pIC50)/2
6) Combine multiple runs
• Combine the top 20 scaffolds from each of the 10 sampling runs
7) Most consistent suggestions
• Rank the final list by number of occurrences
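Steps 5 to 7 can be sketched as follows. The slide defines per-molecule desirability as (QED + pIC50)/2 but not how molecules sharing a scaffold are aggregated; this sketch assumes the mean. Scaffolds are treated as opaque strings (for example, canonical SMILES of a Bemis-Murcko scaffold), and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Molecule = Tuple[str, float, float]  # (scaffold, QED, predicted pIC50)


def top_scaffolds(run: Iterable[Molecule], k: int = 20) -> List[str]:
    """Step 5: group one run's molecules by scaffold and keep the k best scaffolds."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for scaffold, qed, pic50 in run:
        groups[scaffold].append((qed + pic50) / 2)  # desirability per molecule
    # Assumption: a scaffold's score is the mean desirability of its molecules.
    ranked = sorted(groups, key=lambda s: mean(groups[s]), reverse=True)
    return ranked[:k]


def consistent_scaffolds(
    runs: Iterable[Iterable[Molecule]], k: int = 20
) -> List[Tuple[str, int]]:
    """Steps 6 and 7: pool each run's top-k scaffolds and rank by number of runs."""
    occurrences: Counter = Counter()
    for run in runs:
        occurrences.update(top_scaffolds(run, k))
    return occurrences.most_common()
```

As written, the desirability score is dominated by pIC50 (typically around 4 to 10) rather than QED (0 to 1), so the ranking mostly reflects predicted activity.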

14 Results

15 Top 50% by activity prediction confidence
Figure legend: frequency of generation among 10 runs.

16 Top 40% by activity prediction confidence
Figure legend: frequency of generation among 10 runs.

17 Top 30% by activity prediction confidence
Figure legend: frequency of generation among 10 runs.

18 Top 20% by activity prediction confidence
Figure legend: frequency of generation among 10 runs.

19 Generated Results with Donepezil Scaffold
Figure: the donepezil scaffold and the generated molecule, shown next to donepezil and compound 14 from the original donepezil paper [1].
[1] Sugimoto H. et al. Jpn. J. Pharmacol. 89, 7–20 (2002)

20 Summary & Discussion [1]
• The Elix Discovery™ Platform was used to discover novel scaffolds (distant from the training set)
• Across 10 runs, ~30K molecules were generated in each run
• Molecules in each run were filtered down to a short list of 20 scaffolds
• The donepezil scaffold consistently ranked among the top 20 scaffolds
• The donepezil scaffold was represented by a molecule originally described as one of the intermediate molecules (compound 14) that led to the discovery of donepezil [1]
[1] Sugimoto H. et al. Jpn. J. Pharmacol. 89, 7–20 (2002)

21 Summary & Discussion [2]
• Observations:
◦ Diversity in scaffolds: many scaffolds were represented by very few molecules.
◦ Generated molecules were mostly predicted to be BBB permeable, without explicit optimization for this parameter.
◦ Activity prediction models struggled when predicting on chemical space too distant from the training set.
◦ Filtering by prediction confidence helped focus on molecules whose predicted IC50 values were more reliable.

Elix, Inc. http://ja.elix-inc.com/