Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study, Elix, CBI 2022

Elix
October 27, 2022

Transcript

  1. Hit to Lead Discovery of Benzylpiperidine Acetylcholinesterase Inhibitors Using Generative Models: a Retrospective Case Study
    Nazim Medzhidov, Ph.D & Joshua Owoyemi, Ph.D
    Elix, Inc.
    Chem-Bio Informatics Society (CBI) Annual Meeting 2022, Tokyo, Japan | October 26th, 2022

  2. Background
    ● Challenges associated with the traditional drug discovery process have motivated the application of machine learning approaches in this domain.
    ● Generative AI approaches for molecular design are being actively investigated.
    ● Evaluating generative models in silico is challenging; confirmation requires experimental validation, which is expensive.
    ● Most studies evaluate model performance by optimizing computable properties (logP, QED, SA score, etc.).
    ● How can generated candidates be selected efficiently?
    Objectives
    ★ Design a scenario and a pipeline to evaluate generative models in silico:
    ○ Hit-to-lead campaign
    ○ Novel chemotype discovery
    ★ Test our Elix Discovery™ Platform
    ★ Candidate prioritization pipeline

  3. Study Workflow
    Dataset Preparation
    ● Pre-training dataset:
    ○ ChEMBL
    ○ AChE inhibitors removed
    ○ Target chemotype scaffold removed
    ● Training set:
    ○ AChE inhibitors from ChEMBL
    AI Model Development
    ● Elix Discovery™ Platform
    ● Elix Predict:
    ○ AChE inhibitory activity prediction model
    ○ Blood Brain Barrier (BBB) permeability prediction model
    ● Elix Create:
    ○ SmilesFormer generative model
    ○ 10 sampling runs
    Post-processing & Prioritization
    ● 30K molecules generated in each of 10 sampling runs
    ● Post-processing:
    ○ Phys-Chem filters (RO5)
    ○ MCF filters
    ○ Novelty
    ○ BBB permeability
    ● Prioritization:
    ○ QED score
    ○ Predicted activity
    ○ Binding affinity (docking)
    Result Analysis
    ● Quality assessment:
    ○ Target scaffold discovery
    ○ Documented potent compound discovery
    ● Shortlist of the best 200 molecules from each run
    ● Final shortlist of the 20 most frequently selected best compounds
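
The post-processing stage above is a cascade of filters applied one after another. A minimal sketch of such a cascade follows, assuming descriptors and a BBB probability have already been computed for each molecule. The `Candidate` fields, the zero-violation RO5 default, the 0.5 BBB threshold, and the example values are all illustrative assumptions, not details taken from the slides; MCF and novelty filters would plug in as further `(label, predicate)` pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    bbb_prob: float     # hypothetical predicted BBB permeability, 0..1


def passes_ro5(c: Candidate, max_violations: int = 0) -> bool:
    """Lipinski's rule of five; the allowed violation count is an assumption."""
    violations = sum((
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ))
    return violations <= max_violations


def passes_bbb(c: Candidate, threshold: float = 0.5) -> bool:
    """Keep molecules predicted to be permeable; the threshold is an assumption."""
    return c.bbb_prob >= threshold


def run_cascade(candidates, filters):
    """Apply (label, predicate) filters in order and record survivors per stage."""
    survivors = list(candidates)
    counts = [("input", len(survivors))]
    for label, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        counts.append((label, len(survivors)))
    return survivors, counts


# Made-up example values, for illustration only.
pool = [
    Candidate("mol_a", 379.5, 4.1, 0, 4, 0.92),
    Candidate("mol_b", 612.7, 6.3, 2, 9, 0.80),
    Candidate("mol_c", 298.4, 2.2, 3, 5, 0.21),
]
kept, counts = run_cascade(pool, [("RO5", passes_ro5), ("BBB", passes_bbb)])
print([c.name for c in kept])
print(counts)
```

Recording the survivor count after each stage makes it easy to see which filter is responsible for most of the attrition in a run.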

  4. Dataset Preparation

  5. Datasets: Acetylcholinesterase inhibitors
    ● ChEMBL (~2.2M)
    ● AChE inhibitors with IC50 values (4,238):
    ○ A: AChE inhibitors before 1992 (120); established chemotypes before 1992 (Physostigmine, Tacrine, Rivastigmine)
    ○ B: More recent molecules with the same chemotypes present in A (847)
    ○ C: Hit & hit expansion compounds (109) (Hit compound)
    ○ D: Molecules containing piperazine, piperidine or indan (357) (Hidden) (Target chemotype)
    ● 1992: first appearance of the donepezil chemotype in the ChEMBL database
    ● Pre-training dataset (~2.2M):
    ○ AChE inhibitors removed (15.5K mols)
    ○ 48 mols with the scaffold removed
    ● Training dataset (1076): A + B + C
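
A retrospective setup like this one only works if the hidden subset never appears in the data the models see. The sketch below shows one way to sanity-check that, assuming each molecule is represented by a canonical SMILES string so that string equality means the same molecule. The function name `find_leaks`, the dictionary keys, and the toy SMILES are illustrative assumptions; only the three subset sizes come from the slide.

```python
def find_leaks(hidden, training, pretraining=()):
    """Return hidden-set molecules that also occur in the training pools.

    All inputs are iterables of canonical SMILES strings (an assumption:
    non-canonical strings would need canonicalizing first).
    """
    hidden_set = set(hidden)
    return {
        "training": hidden_set & set(training),
        "pretraining": hidden_set & set(pretraining),
    }


# Subset sizes reported on the slide; together they form the training set.
subset_sizes = {
    "before_1992": 120,
    "recent_same_chemotypes": 847,
    "hit_and_hit_expansion": 109,
}
print(sum(subset_sizes.values()))

# Toy SMILES used purely as placeholders.
report = find_leaks(
    hidden=["c1ccccc1", "CCN"],
    training=["CCO", "CCN"],
    pretraining=["CCC"],
)
print(report)
```

An empty set for both keys is the desired outcome; any non-empty set lists molecules that would let a model "rediscover" the target chemotype by memorization.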

  6. Datasets: Chemical Space Visualization
    [Figure: chemical space plot with labeled points: Physostigmine, Tacrine, Rivastigmine, Hit Compound, Target Chemotype]

  7. AI Model Development

  8. Elix Discovery™ Platform
    Generative Model
    ● SmilesFormer
    ○ Pre-trained on the ChEMBL dataset without AChE inhibitors and the target scaffold
    ○ Trained on: datasets A + B + C (1076 samples)
    ● Multiobjective optimization problem:
    ○ SA score
    ○ QED score
    ○ Favorable physical-chemical properties
    ○ Novelty (distance from the training set)
    ○ Activity
    Predictive Models
    ● AChE inhibitory activity prediction model:
    ○ GCN
    ○ Trained on: datasets A + B + C (1076 samples)
    ● Blood Brain Barrier (BBB) permeability prediction model:
    ○ GCN
    ○ Trained on an in-house dataset (9059 samples)
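
The slide describes the novelty objective only as "distance from the training set". One common way to make that concrete, shown here purely as an assumption and not as the platform's actual definition, is one minus the highest Tanimoto similarity between a candidate's fingerprint and any training fingerprint. Fingerprints are modeled as sets of on-bit indices, and the example bit sets are made up.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def novelty(candidate: frozenset, training: list) -> float:
    """1 - max Tanimoto similarity to the training set (1.0 if it is empty)."""
    if not training:
        return 1.0
    return 1.0 - max(tanimoto(candidate, fp) for fp in training)


# Made-up fingerprints for illustration.
training_fps = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]

print(novelty(frozenset({1, 2, 3, 4}), training_fps))  # identical to a training molecule
print(novelty(frozenset({7, 8, 9}), training_fps))     # shares no bits with the training set
print(novelty(frozenset({1, 2, 6}), training_fps))     # partial overlap
```

With this definition a score of 0 means the candidate duplicates a training molecule and a score of 1 means it shares no fingerprint bits with any of them, so a novelty filter reduces to a threshold on this value.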

  9. Molecule Generation, Post Processing & Prioritization

  10. Generation strategy and post-processing pipeline
    ● Generation: Run 1 to Run 10, 30K mols / run
    ○ Random Sampling
    ○ Seed settings: No seed, One seed, Group seed
    ● Filtering: ~4000 mols / run
    ○ RO5
    ○ MCF
    ○ Novelty
    ○ BBB Permeability
    ● Prioritizing: 200 mols / run
    ○ QED
    ○ Predicted activity
    ○ Binding affinity (docking)
    ● Aggregation: 20 most frequently selected candidates
    ○ Recommendation score: consistency of selection (Min = 1, Max = 10)
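
The aggregation step above can be sketched as a simple frequency count. The slide defines the recommendation score only as "consistency of selection" with a range of 1 to 10; reading it as the number of runs whose shortlist contains a molecule is an assumption that happens to match that range for 10 runs. Function names and the toy identifiers are illustrative.

```python
from collections import Counter


def recommendation_scores(shortlists):
    """Count, for each molecule, how many run shortlists contain it."""
    counts = Counter()
    for shortlist in shortlists:
        counts.update(set(shortlist))  # a molecule counts at most once per run
    return counts


def top_candidates(shortlists, n=20):
    """Return the n most frequently selected molecules with their scores."""
    scores = recommendation_scores(shortlists)
    # Sort by score (descending), then by identifier for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]


# Three toy runs with tiny shortlists; the deck uses 10 runs of 200 molecules.
runs = [
    ["m1", "m2", "m3"],
    ["m1", "m3", "m4"],
    ["m1", "m5", "m5"],
]
print(top_candidates(runs, n=3))
```

Deduplicating within each run keeps the maximum score equal to the number of runs, which is what bounds the score at 10 in the deck's setting.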

  11. Results

  12. Discovering reported potent scaffold and molecules
    Discovery success (number of runs):
    Setting      Reported scaffold   Reported potent compound
    No Seed      0 / 10              0 / 10
    One Seed     4 / 10              5 / 10
    Group Seed   9 / 10              9 / 10
    [Figure: reported potent scaffold, with molecules from hidden dataset D (target chemotype) containing the represented substructures; panels A and B; label: Random Sampling]
    Molecules rediscovered with One Seed setting:
    ● IC50 = 81 nM, Rank = 8
    ● IC50 = 6.7 nM, Rank = 31
    ● IC50 = 58 nM, Rank = 107
    ● IC50 = 94 nM, Rank = 166
    Molecules rediscovered with Group Seed setting:
    ● IC50 = 81 nM, Rank = 56
    ● IC50 = 30 nM, Rank = 71
    ● IC50 = 6.7 nM, Rank = 124
    ● IC50 = 94 nM, Rank = 393

  13. Final 20 Candidates by Recommendation Score
    [Figure: final candidates for the No seed and One seed settings]
    Legend: top 1% recommendation score (max = 10)

  14. Conclusion
    ● Designed a retrospective case study of novel chemotype discovery for generative models (quality assessment)
    ● Tested our Elix Discovery™ platform in a hit-to-lead discovery campaign
    ● Given an early hit compound, optimized the scaffold into more complex, diverse scaffolds, including a reported potent indanone-piperidine scaffold
    ● Multiple sampling runs and recommendation score analysis helped to focus on consistently top-ranked candidates
    ● Among the prioritized candidates, reported potent molecules containing indanone-piperidine were discovered
    ● These molecules were included in the top 1% of the generated molecules
    ● The final 20 top-ranked candidates included at least one known potent AChE inhibitor
    ● Potential presence of as yet unknown potent compounds among the final recommendations

  15. www.elix-inc.com

  16. APPENDIX

  17. Final 20 candidates: One seed vs Group Seed
    [Figure: final 20 candidates for the Group seed and One seed settings]

  18. Chemical Space Visualization
    [Figure: chemical space plot with labeled points: Physostigmine, Tacrine, Rivastigmine, Hit Compound, Target Chemotype]
