Models: a Retrospective Case Study
Nazim Medzhidov, Ph.D. & Joshua Owoyemi, Ph.D.
Elix, Inc.
Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 26th, 2022
process have facilitated the application of machine learning approaches in this domain.
• Generative AI approaches for molecular design are actively investigated.
• Evaluating generative models in silico is challenging; confirmation requires experimental validation, which is expensive.
• Most work evaluates model performance by optimizing computable properties (logP, QED, SA score, etc.).
• How can generated candidates be selected efficiently?

Objectives
★ Design a scenario and a pipeline to evaluate generative models in silico:
  ◦ Hit-to-lead campaign
  ◦ Novel chemotype discovery
★ Test our Elix Discovery™ Platform
★ Candidate prioritization pipeline
+ B + C

AChE inhibitors with IC50 values (4,238):
• A: AChE inhibitors before 1992 (120)
  ◦ Established chemotypes before 1992: Physostigmine, Tacrine, Rivastigmine
• B: More recent molecules with the same chemotypes present in A (847)
• C: Hit & hit expansion compounds (109)
• D: Molecules containing piperazine, piperidine or indan (357) (Hidden)

Pre-training dataset (~2.2M):
• AChE inhibitors removed (15.5K mols)
• 48 mols with the scaffold removed

1992: First appearance of donepezil chemotype in ChEMBL database

[Figure: structures labeled "Hit compound" and "Target chemotype"; datasets A, B, C, D]
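The time-based part of this curation (A: inhibitors reported before 1992; B: later molecules sharing a chemotype with A) can be illustrated with a minimal sketch. The record fields `year` and `chemotype`, the function name, and the reading of "before 1992" as `year < 1992` are assumptions made for illustration. The slides do not describe how chemotypes were actually assigned, and datasets C and D are not modelled here.

```python
def split_retrospective(records, cutoff_year=1992):
    """Partition records into (A, B, other) around a cutoff year.

    records: iterable of dicts with hypothetical keys 'id', 'year', 'chemotype'.
    A     = molecules reported before the cutoff year
    B     = later molecules whose chemotype already occurs in A
    other = later molecules with a chemotype not seen in A
    """
    records = list(records)
    a = [r for r in records if r["year"] < cutoff_year]
    known_chemotypes = {r["chemotype"] for r in a}
    later = [r for r in records if r["year"] >= cutoff_year]
    b = [r for r in later if r["chemotype"] in known_chemotypes]
    other = [r for r in later if r["chemotype"] not in known_chemotypes]
    return a, b, other


# Toy records (made up, not taken from ChEMBL).
toy_records = [
    {"id": "m1", "year": 1985, "chemotype": "tacrine-like"},
    {"id": "m2", "year": 1995, "chemotype": "tacrine-like"},
    {"id": "m3", "year": 1996, "chemotype": "new-scaffold"},
]
set_a, set_b, set_other = split_retrospective(toy_records)
```

In this toy case `m1` lands in A, `m2` in B, and `m3` in the remainder, which is where a held-out chemotype would end up under this simplified rule.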
on ChEMBL dataset without AChE inhibitors and target scaffold
  ◦ Trained on: datasets A + B + C (1076 samples)
• Multiobjective Optimization Problem:
  ◦ SA score
  ◦ QED score
  ◦ Favorable physicochemical properties
  ◦ Novelty (distance from the training set)
  ◦ Activity

Predictive Models
• AChE inhibitory activity prediction model:
  ◦ GCN
  ◦ Trained on: datasets A + B + C (1076 samples)
• Blood-Brain Barrier (BBB) permeability prediction model:
  ◦ GCN
  ◦ Trained on an in-house dataset (9059 samples)
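The novelty objective is described only as "distance from the training set". One common way to express such a distance, assumed here and not confirmed by the slides, is one minus the highest Tanimoto similarity between a candidate's fingerprint and any training fingerprint. The sketch below represents fingerprints as plain Python sets of "on" bit indices; the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def novelty(candidate_fp, training_fps):
    """Distance to the nearest training molecule: 1 - max Tanimoto similarity."""
    if not training_fps:
        return 1.0  # nothing to compare against, so maximally novel
    return 1.0 - max(tanimoto(candidate_fp, fp) for fp in training_fps)


# Toy fingerprints (made-up bit indices).
training = [{1, 2, 3, 4}, {5, 6}]
candidate = {1, 2, 3}
score = novelty(candidate, training)  # nearest neighbour shares 3 of 4 bits
```

A score near 0 means the candidate is almost a copy of a training molecule, and a score near 1 means it shares little with anything the model was trained on.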
Random Sampling
• Runs 1 to 10
• Seeding: one seed, group seed, no seed

Filtering
• ~4000 mols / run
• RO5
• MCF
• Novelty
• BBB Permeability

Prioritizing
• QED
• Predicted activity
• Binding affinity (docking)
• 200 mols / run

Aggregation
• 20 most frequently selected candidates
• Recommendation score:
  ◦ Consistency of selection
  ◦ Min = 1, Max = 10
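The aggregation step can be sketched as a simple count: the recommendation score of a molecule is the number of runs whose prioritized list contains it, so with 10 runs it ranges from 1 to 10. This is a minimal sketch. The single per-molecule priority score, the function names, and the tie-breaking rule are assumptions, since the slides do not state how QED, predicted activity, and docking are combined.

```python
from collections import Counter


def top_k(scored, k):
    """Return the k highest-scoring molecule IDs from a {mol_id: score} dict."""
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [mol_id for mol_id, _ in ranked[:k]]


def recommendation_scores(selections):
    """Count, for each molecule, how many runs selected it."""
    counts = Counter()
    for run_selection in selections:
        counts.update(set(run_selection))  # a molecule counts once per run
    return counts


def aggregate(selections, n_final=20):
    """Return the n_final most frequently selected (mol_id, score) pairs."""
    counts = recommendation_scores(selections)
    # Sort by score (descending), then by ID so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n_final]


# Toy example: 3 runs, keep the top 2 per run, report the top 2 overall.
toy_runs = [
    {"mol_a": 0.9, "mol_b": 0.8, "mol_c": 0.1},
    {"mol_a": 0.7, "mol_c": 0.6, "mol_d": 0.2},
    {"mol_b": 0.9, "mol_a": 0.5, "mol_d": 0.4},
]
toy_selections = [top_k(run, 2) for run in toy_runs]
toy_final = aggregate(toy_selections, n_final=2)
```

In the setting described above, `k` would be 200 and `n_final` 20. Counting each molecule at most once per run is what bounds the score by the number of runs.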
chemotype discovery for generative models (quality assessment)
• Tested our Elix Discovery™ platform in a hit-to-lead discovery campaign
• Given an early hit compound, the platform optimized the scaffold into more complex, diverse scaffolds, including a reported potent indanone-piperidine scaffold
• Multiple sampling runs and recommendation score analysis helped to focus on consistently top-ranked candidates
• Among the prioritized candidates, reported potent molecules containing the indanone-piperidine scaffold were discovered
• These molecules were in the top 1% of the generated molecules
• The final 20 top-ranked candidates included at least one known potent AChE inhibitor
• Potential presence of as-yet-unknown potent compounds among the final recommendations