Target Discovery in the age of Big Data and AI, Elix, CBI 2021

Target Discovery in the age of Big Data and AI
Nazim Medzhidov, PhD Research Engineer Elix Inc. 2021/10/26

2 Overview
• Background
• Problem components
• Data sources
• Approaches
• Challenges and opportunities
• Concluding remarks

3 Background
• Selection and prioritization of novel drug targets is a central problem in the drug discovery process
• Choosing the right target can significantly affect the probability of advancing a drug to clinical development and of clinical trial success
• Multi-parameter optimization problem:
  ◦ Target role in disease progression
  ◦ Novelty
  ◦ Target druggability
  ◦ Safety
  ◦ …..
  ◦ …..
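The multi-parameter framing above can be made concrete with a simple weighted score. This is only a minimal sketch: the criterion names mirror the bullets, but the weights, the target names, and every score are hypothetical and do not come from the talk.

```python
# Hypothetical weights per criterion; not taken from the talk.
WEIGHTS = {"disease_role": 0.4, "novelty": 0.2, "druggability": 0.2, "safety": 0.2}


def priority_score(scores, weights=WEIGHTS):
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank_targets(candidates, weights=WEIGHTS):
    """Return target names sorted from highest to lowest priority score."""
    return sorted(
        candidates,
        key=lambda name: priority_score(candidates[name], weights),
        reverse=True,
    )


# Made-up candidate targets with made-up scores.
candidates = {
    "TARGET_A": {"disease_role": 0.9, "novelty": 0.3, "druggability": 0.8, "safety": 0.6},
    "TARGET_B": {"disease_role": 0.6, "novelty": 0.9, "druggability": 0.4, "safety": 0.7},
}
ranking = rank_targets(candidates)
```

A weighted sum is the simplest possible aggregation; in practice the criteria may trade off in ways a single linear score does not capture.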

4 Grand challenge(s)
• Of the ~20,000 proteins encoded by the human genome (protein-coding sequence makes up ~1.5% of the genome), about 2,000-3,000 (10-15%) are considered disease-related
• Fewer than 700 of these proteins have been effectively targeted
• The majority of disease-related protein targets are still considered undruggable because they lack distinct binding motifs/pockets for small molecules
• Should we focus on protein targets only? How can novel targets be identified? Can AI be used to facilitate this process?

5 Problem Components
• Required components: Data, Methodology
• Data: Structure? Which representation?
• Methodology: Architecture, Optimization

6 Data
A large number of publicly available resources:
• UniProt
• Open Targets
• Therapeutic Target Database (TTD)
• The Drug Gene Interaction database (DGIdb)
• Target Central Resource Database (TCRD)
• Large-scale human genomics, proteomics, and metabolomics data sets

7 Data (2)
• Combining all this information in a single place for further analysis, or for prioritizing a list of targets, can become a daunting task
• Each data source specializes in a different area, such as protein expression, disease association, or pharmacology
• Navigating a myriad of cross-references to paint an accurate portrait of a potential target is necessary but challenging for researchers
How do we structure these massive data?
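As an illustration of the integration problem above, the sketch below merges per-source records into one profile per target. It assumes every source is already keyed by the same identifier, which is often the hard part in practice; the gene names, field names, and values are all made up and do not reflect the schema of any real database.

```python
from collections import defaultdict

# Toy records standing in for exports from three sources; all values are made up.
expression = {"GENE1": {"tissue": "liver"}, "GENE2": {"tissue": "brain"}}
disease_links = {"GENE1": {"disease": "disease X"}}
pharmacology = {"GENE2": {"known_ligands": 3}}


def merge_sources(**sources):
    """Combine per-source records into one profile per shared identifier."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for identifier, fields in records.items():
            # Prefix each field with its source name so provenance is kept.
            for field, value in fields.items():
                profiles[identifier][f"{source_name}.{field}"] = value
    return dict(profiles)


profiles = merge_sources(
    expression=expression, disease=disease_links, pharmacology=pharmacology
)
```

Keeping the source name in each field makes it possible to trace a value back to where it came from when sources disagree.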

8 Data (3)
• Biological networks:
  ◦ Protein-protein interaction networks
  ◦ Gene interaction networks
  ◦ Metabolic networks
• These types of biological network data can be represented as graphs that capture interactions between nodes, for subsequent analysis with computational approaches
• ML techniques can help process these data and identify features related to successful target selection
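A minimal sketch of the graph representation described above: an undirected interaction list turned into an adjacency structure. The protein names and interactions are hypothetical.

```python
from collections import defaultdict

# Hypothetical undirected protein-protein interactions.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]


def build_graph(edges):
    """Map each node to the set of nodes it interacts with."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)  # undirected: store the edge in both directions
    return dict(graph)


graph = build_graph(interactions)

# Node degree (number of interaction partners) is one simple graph feature.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
```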

9 Deep Learning
• One of the strengths of deep learning is its ability to detect complex patterns in data. This makes it well suited to bioinformatics, where the data represent complex, interdependent relationships between biological entities and processes that are often intrinsically noisy and occur at multiple scales
• DL methods have been extended to graph-structured data, making them a promising technology for tackling these biological network analysis problems
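The talk does not name a specific architecture. As one commonly used example of deep learning on graphs, the sketch below implements a single graph convolutional layer, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python. The adjacency matrix, node features, and weights are toy values chosen for illustration; a real model would learn the weights and use a numerical library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph convolution: normalized neighbor averaging, linear map, ReLU."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j).
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    transformed = matmul(matmul(norm, features), weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Toy graph: two connected nodes, one feature each, identity weight.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
output = gcn_layer(adjacency, features, [[1.0]])
```

With these toy inputs each node ends up with the average of its own and its neighbor's feature, which shows how a layer mixes information along edges.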

10 Potential steps
• Predicting the unknown function of a node (gene/protein/metabolite) in the network
• Link prediction: identification of interactions between nodes
• Combining predictions on separate tasks
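The first two steps above can be illustrated with simple non-learned baselines: a neighbor majority vote for node function and a Jaccard neighbor-overlap score for link prediction. These heuristics are stand-ins chosen for brevity, not the deep learning methods the talk refers to, and the graph and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical interaction graph (node -> set of neighbors) and known labels.
graph = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}
known_function = {"P1": "kinase", "P2": "kinase", "P4": "transporter"}


def predict_function(node, graph, labels):
    """Node function prediction: most common label among labeled neighbors."""
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    return votes.most_common(1)[0][0] if votes else None


def jaccard(a, b, graph):
    """Link score: shared neighbors divided by all neighbors of either node."""
    union = graph[a] | graph[b]
    return len(graph[a] & graph[b]) / len(union) if union else 0.0


def rank_missing_links(graph):
    """Rank unconnected node pairs by Jaccard score, highest first."""
    candidates = [
        (a, b) for a, b in combinations(sorted(graph), 2) if b not in graph[a]
    ]
    return sorted(candidates, key=lambda pair: jaccard(pair[0], pair[1], graph), reverse=True)


predicted = predict_function("P3", graph, known_function)
ranked_links = rank_missing_links(graph)
```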

11 Concluding Remarks
• Novel target identification remains a challenging, yet central, problem in drug discovery research
• The many parameters involved contribute to the complexity of the problem
• Robust ways are needed to process large biological network data and suggest novel target candidates
• Complex biological network data can be structured as a graph for subsequent analysis using DL
• Community effort has mostly focused on the drug-target interaction prediction problem, with only a few studies focused on novel target identification

12 Q&A

13 Elix, Inc. http://ja.elix-inc.com/
