Slide 1

Slide 1 text

Improving Molecular Property Prediction using Self-supervised Learning 27th October 2021 Laurent Dillard

Slide 2

Slide 2 text

2 Table of Contents • Motivation • Self-supervised learning • Methods • Results • Conclusion • Appendix • References

Slide 5

Slide 5 text

5 • Data scarcity is a common bane of deep learning applications, and drug discovery is no exception. • Deep learning models typically rely on large amounts of annotated data for training, which is especially challenging to obtain in the context of drug discovery. How can we improve a model's generalization and accuracy given a limited amount of annotated data? Motivation

Slide 8

Slide 8 text

8 Annotated data is hard to come by, but… Large databases[1,2,3] of chemical structures are growing at unprecedented rates. These databases can be leveraged using unsupervised learning techniques. Motivation

Slide 12

Slide 12 text

12 Self-supervised learning What is self-supervised learning? • Self-supervised learning is an unsupervised learning technique. • It does not require annotations (labels). • It is aimed at leveraging large amounts of unlabeled data. The goal is to learn useful feature representations that can then be transferred to other downstream tasks.

Slide 15

Slide 15 text

15 Self-supervised learning Unlabeled Dataset Model Self-supervised pretext task Labeled Dataset Model Supervised downstream task Transfer learning After being trained with self-supervision, the model can then be fine-tuned on downstream tasks using transfer learning. Self-supervision example: train the model to predict the logP value.

Slide 16

Slide 16 text

16 Self-supervised learning The assumption of self-supervised learning is that it is possible to find pretext tasks that allow the model to learn useful, general feature representations that can then be used on a variety of downstream tasks. Once the model is trained on the pretext tasks, it can be fine-tuned on the downstream tasks of interest. This transfer learning step should be especially beneficial when the downstream tasks contain only a small number of samples.
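The two-stage pipeline described above can be caricatured numerically. The sketch below is only a schematic, not the GNN setup from this work: the linear "encoder", random data, and least-squares fits are all illustrative. It shows the essential pattern: first fit a model on a target computed from unlabeled inputs, then reuse its weights as a feature extractor for a small labeled task.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Pretext stage: no annotations needed; the target is derived from the input itself.
X_unlabeled = rng.normal(size=(500, 8))
pretext_y = X_unlabeled.sum(axis=1, keepdims=True)   # a self-derived "property"
W_enc, *_ = np.linalg.lstsq(X_unlabeled, pretext_y, rcond=None)

# --- Downstream stage: a small labeled set reuses the pretrained weights.
X_labeled = rng.normal(size=(30, 8))
y_labeled = rng.integers(0, 2, size=(30, 1)).astype(float)
features = X_labeled @ W_enc                          # transferred representation
W_head, *_ = np.linalg.lstsq(features, y_labeled, rcond=None)
preds = (features @ W_head) > 0.5
```

In the actual work, the encoder is a graph neural network and the downstream head is fine-tuned jointly with it, but the pretrain-then-transfer flow is the same.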

Slide 17

Slide 17 text

17 Methods Our work focused on Graph Neural Network (GNN) architectures, which have proved very effective for molecular applications. We considered three different types of GNN: GCN[4], GIN[5] and DMPNN[6].
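As a reference point for the first of these architectures, the core propagation rule of a GCN layer (Kipf & Welling[4]) fits in a few lines of NumPy. The toy graph and feature values below are illustrative, not taken from this work:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1)) # D^-1/2 on the diagonal
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)

# Toy 3-atom "molecule": a path graph A with 2-dimensional atom features H.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.array([[1., 0.], [0., 1.], [1., 1.]])
W = np.eye(2)
H1 = gcn_layer(A, H, W)   # shape (3, 2)
```

GIN and DMPNN differ in how neighbor messages are aggregated (sum with an MLP, and edge-centered message passing, respectively), but follow the same layer-wise propagation idea.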

Slide 21

Slide 21 text

21 Methods Self-supervision pretext tasks Focus on 3 distinct scales of molecules
• Atom: Classification problem: predict which type of fragment the atom belongs to (from the 2,000 possible fragments used in the SAScore[7] computation).
• Fragment: Binary classification problem: break molecules into fragments and predict, for all pairwise comparisons, whether two fragments originate from the same molecule.
• Molecule: Multilabel classification problem: predict which fragments occur in the molecule (from the same 2,000 fragments).
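The fragment-level task above can be sketched as pair construction over fragmented molecules. The function name and SMILES strings below are illustrative, and since the slides do not specify the exact sampling scheme, exhaustive positive pairs plus one random negative pair per molecule pair are assumptions:

```python
import itertools
import random

def make_fragment_pairs(mol_fragments, rng=None):
    """Build (frag_a, frag_b, label) pairs for the fragment-level pretext task.

    mol_fragments: one list of fragment identifiers per molecule.
    Label 1 = fragments come from the same molecule, 0 = from different molecules.
    """
    rng = rng or random.Random(0)
    pairs = []
    # Positive pairs: all pairwise comparisons within a molecule.
    for frags in mol_fragments:
        for a, b in itertools.combinations(frags, 2):
            pairs.append((a, b, 1))
    # Negative pairs: one random fragment from each of two different molecules.
    for frags_i, frags_j in itertools.combinations(mol_fragments, 2):
        pairs.append((rng.choice(frags_i), rng.choice(frags_j), 0))
    return pairs

pairs = make_fragment_pairs([["c1ccccc1", "C(=O)O"], ["CCN", "CCO"]])
# 2 positive pairs (one per molecule) and 1 negative pair
```

A binary classifier trained on such pairs learns fragment representations without any human annotation, which is exactly the self-supervision premise.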

Slide 22

Slide 22 text

22 Methods

Slide 23

Slide 23 text

Self-supervision dataset: a subset of 250,000 molecules from ZINC15
Downstream datasets: ADME-T datasets (from MoleculeNet[8])
- BACE: 1,522 samples with binary labels on binding results with human beta-secretase 1 (BACE-1).
- BBBP: 2,053 samples with binary labels on permeability of the compounds across the blood-brain barrier.
- ClinTox: 1,491 samples with two distinct binary labels: the first refers to clinical-trial toxicity and the second to FDA approval status.
- SIDER: 1,427 samples with binary labels for 27 different drug side-effect categories.
- ToxCast: 8,615 samples with 617 binary labels based on the results of in vitro toxicology experiments.
- Tox21: 8,014 samples with 12 binary labels based on toxicity measurements.
23 Methods

Slide 24

Slide 24 text

Evaluation procedure For each downstream task, the models were evaluated using 10 different train / validation / test splits generated with scaffold splitting. Each model was trained on the training data and a checkpoint was saved after each epoch. The best checkpoint was selected via the validation set, and the ROC-AUC was measured on the test set. 24 Methods
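The selection and aggregation logic described above reduces to a few lines. The scores below are illustrative, and using the population (rather than sample) standard deviation is an assumption, since the slides do not specify which is reported:

```python
def select_best_checkpoint(val_scores):
    """Index of the epoch whose checkpoint scored highest on the validation set."""
    return max(range(len(val_scores)), key=val_scores.__getitem__)

def summarize(test_aucs):
    """Mean and population standard deviation of test ROC-AUC across splits."""
    mean = sum(test_aucs) / len(test_aucs)
    std = (sum((s - mean) ** 2 for s in test_aucs) / len(test_aucs)) ** 0.5
    return mean, std

best_epoch = select_best_checkpoint([0.61, 0.68, 0.66])   # -> 1
mean_auc, std_auc = summarize([0.898, 0.904, 0.891])
```

Scaffold splitting itself (grouping molecules by their Bemis-Murcko scaffold so that test scaffolds are unseen during training) is the part that makes this evaluation harder, and more realistic, than a random split.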

Slide 25

Slide 25 text

25 Results
Model / Dataset | BBBP (2039) | SIDER (1427) | Clintox (1478) | BACE (1513) | Tox21 (7831) | Toxcast (8575) | Average
GCN (no pretrain) | 0.898 ± 0.033 | 0.605 ± 0.027 | 0.909 ± 0.038 | 0.831 ± 0.027 | 0.769 ± 0.013 | 0.651 ± 0.014 | 0.777 ± 0.025
GCN SSL | 0.904 ± 0.035 | 0.615 ± 0.017 | 0.927 ± 0.026 | 0.858 ± 0.023 | 0.761 ± 0.024 | 0.653 ± 0.012 | 0.786 ± 0.023
GIN (no pretrain) | 0.885 ± 0.038 | 0.602 ± 0.023 | 0.902 ± 0.045 | 0.844 ± 0.028 | 0.773 ± 0.012 | 0.625 ± 0.013 | 0.772 ± 0.027
GIN SSL | 0.891 ± 0.032 | 0.614 ± 0.017 | 0.904 ± 0.033 | 0.854 ± 0.025 | 0.773 ± 0.018 | 0.630 ± 0.013 | 0.778 ± 0.023
DMPNN (no pretrain) | 0.894 ± 0.038 | 0.620 ± 0.021 | 0.908 ± 0.029 | 0.800 ± 0.034 | 0.762 ± 0.018 | 0.637 ± 0.012 | 0.770 ± 0.025
DMPNN SSL | 0.898 ± 0.034 | 0.605 ± 0.025 | 0.910 ± 0.040 | 0.855 ± 0.027 | 0.757 ± 0.020 | 0.618 ± 0.013 | 0.774 ± 0.026

Slide 26

Slide 26 text

26 Results
Ablation study
Model / Dataset | BBBP (2039) | SIDER (1427) | Clintox (1478) | BACE (1513) | Tox21 (7831) | Toxcast (8575) | Average
GCN (no pretrain) | 0.898 ± 0.033 | 0.605 ± 0.027 | 0.909 ± 0.038 | 0.831 ± 0.027 | 0.769 ± 0.013 | 0.651 ± 0.014 | 0.777 ± 0.025
GCN SSL Atom only | 0.905 ± 0.033 | 0.614 ± 0.024 | 0.894 ± 0.042 | 0.848 ± 0.019 | 0.766 ± 0.016 | 0.651 ± 0.008 | 0.780 ± 0.024
GCN SSL Molecule only | 0.898 ± 0.037 | 0.610 ± 0.024 | 0.905 ± 0.041 | 0.861 ± 0.028 | 0.752 ± 0.023 | 0.635 ± 0.008 | 0.777 ± 0.027
GCN SSL Fragment only | 0.908 ± 0.039 | 0.611 ± 0.028 | 0.900 ± 0.040 | 0.837 ± 0.028 | 0.769 ± 0.014 | 0.651 ± 0.010 | 0.779 ± 0.027
GCN SSL Atom + Fragment | 0.904 ± 0.037 | 0.621 ± 0.022 | 0.899 ± 0.040 | 0.850 ± 0.024 | 0.769 ± 0.020 | 0.652 ± 0.010 | 0.783 ± 0.025
GCN SSL | 0.904 ± 0.035 | 0.615 ± 0.017 | 0.927 ± 0.026 | 0.858 ± 0.023 | 0.761 ± 0.024 | 0.653 ± 0.012 | 0.786 ± 0.023

Slide 27

Slide 27 text

27 Conclusion Self-supervised learning can effectively improve the performance of graph-based models for predicting molecular properties. Unfortunately, for the framework tested, the gains depend heavily on the model and dataset. The challenge lies in designing pretext tasks that improve performance consistently across model architectures and downstream tasks.

Slide 28

Slide 28 text

28 References • [1]: PubChem. URL https://pubchem.ncbi.nlm.nih.gov/ • [2]: John J. Irwin and Brian K. Shoichet. ZINC – a free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 2005. • [3]: Anna Gaulton, Anne Hersey, Michał Nowotka, A. Patrícia Bento, Jon Chambers, David Mendez, Prudence Mutowo, Francis Atkinson, Louisa J. Bellis, Elena Cibrián-Uhalte, Mark Davies, Nathan Dedman, Anneli Karlsson, María Paula Magariños, John P. Overington, George Papadatos, Ines Smit, Andrew R. Leach. The ChEMBL database in 2017. Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D945–D954. https://doi.org/10.1093/nar/gkw1074 • [4]: Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016. URL http://arxiv.org/abs/1609.02907 • [5]: Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? CoRR, abs/1810.00826, 2018. URL http://arxiv.org/abs/1810.00826 • [6]: Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, and Regina Barzilay. Analyzing learned molecular representations for property prediction. Journal of Chemical Information and Modeling, 59(8):3370–3388, Aug 2019. ISSN 1549-9596. doi: 10.1021/acs.jcim.9b00237. URL https://doi.org/10.1021/acs.jcim.9b00237 • [7]: Ertl, P., Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminform 1, 8 (2009). https://doi.org/10.1186/1758-2946-1-8 • [8]: Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay S. Pande. MoleculeNet: A benchmark for molecular machine learning. CoRR, abs/1703.00564, 2017.
• [9]: Chang-Ying Ma, Sheng-Yong Yang, Hui Zhang, Ming-Li Xiang, Qi Huang, Yu-Quan Wei. Prediction models of human plasma protein binding rate and oral bioavailability derived by using GA–CG–SVM method. Journal of Pharmaceutical and Biomedical Analysis, Volume 47, Issues 4–5, 2008, Pages 677–682. • [10]: Carbon-Mangels, M. and Hutter, M.C. (2011). Selecting relevant descriptors for classification by Bayesian estimates: A comparison with decision trees and support vector machines approaches for disparate data sets. Mol. Inf., 30: 885–895. https://doi.org/10.1002/minf.201100069 • [11]: Shuangquan Wang, Huiyong Sun, Hui Liu, Dan Li, Youyong Li, and Tingjun Hou. ADMET evaluation in drug discovery. Predicting hERG blockers by combining multiple pharmacophores and machine learning approaches. Molecular Pharmaceutics, 13(8):2855–2866, 2016. PMID: 27379394. • [12]: Youjun Xu, Ziwei Dai, Fangjin Chen, Shuaishi Gao, Jianfeng Pei, and Luhua Lai. Deep learning for drug-induced liver injury. Journal of Chemical Information and Modeling, 55(10):2085–2093, 2015. PMID: 26437739. • [13]: Mark Wenlock and Nicholas Tomkinson. Experimental in vitro DMPK and physicochemical data on a set of publicly disclosed compounds. DOI: 10.6019/CHEMBL3301361 • [14]: Dominique Douguet. Data sets representative of the structures and experimental properties of FDA-approved drugs. ACS Medicinal Chemistry Letters, 9(3):204–209, 2018.

Slide 29

Slide 29 text

29 Results Additional experiments Changing the pretraining set size from 250k to 2M or 10M molecules did not yield a significant change in performance. The performance on the pretext tasks saturates in all cases; increasing the difficulty of the pretext tasks could change that.

Slide 30

Slide 30 text

30 Appendix
Additional experiments on other ADME-T datasets
Model / Dataset | Bioavailability MA (640) ROC-AUC | CYP 2C9 Substrate (669) ROC-AUC | CYP 2D6 Substrate (667) ROC-AUC | hERG Blockers (665) ROC-AUC | DILI (475) ROC-AUC | Clearance Microsome AZ (1102) R² | PPBR EDrug3d (828) R²
GCN (no pretrain) | 0.698 | 0.618 | 0.649 | 0.870 | 0.856 | 0.043 | 0.345
GCN SSL | 0.675 | 0.606 | 0.593 | 0.878 | 0.861 | 0.112 | 0.381
Bioavailability MA[9] is a dataset of 640 samples measuring absorption properties. The CYP 2C9 and CYP 2D6[10] datasets contain 669 and 667 samples, respectively, measuring metabolic properties. hERG Blockers[11] and DILI[12] contain 665 and 475 samples, respectively, measuring toxicity properties. Clearance Microsome AZ[13] contains 1,102 samples measuring excretion properties. PPBR EDrug3d[14] contains 828 samples and measures distribution properties.