that are active against the Androgen Receptor (AR) as antagonists. AR is an important drug target in prostate cancer.
• Collected 4,810 compounds from ChEMBL that had IC50 assay data for AR.
• Converted each compound's IC50 to pIC50 (-log10 of IC50 in molar units) and plotted the distribution on the left. The pIC50 values spanned 7 orders of magnitude, indicating a broad range of inhibitory activity.
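pIC50 is the negative base-10 logarithm of IC50 in mol/L, so one pIC50 unit is a tenfold change in potency and the width of the pIC50 range counts orders of magnitude directly. A minimal sketch of the conversion; the function names and IC50 values are illustrative, not taken from the ChEMBL set.

```python
import math

def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)  # nM -> M, then negative log10

def span_in_orders_of_magnitude(pic50_values):
    """One pIC50 unit is a tenfold change, so the max-min range counts decades."""
    return max(pic50_values) - min(pic50_values)

# Hypothetical IC50 values (nM), not taken from the ChEMBL set
ic50s_nm = [0.1, 10.0, 1_000.0, 1_000_000.0]
pic50s = [pic50_from_nm(x) for x in ic50s_nm]
print([round(p, 2) for p in pic50s])                  # [10.0, 8.0, 6.0, 3.0]
print(round(span_in_orders_of_magnitude(pic50s), 2))  # 7.0
```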
to create a predictive model.
• The model represents each molecule as a graph of atoms and assigns features to each atom (atomic weight, charge, etc.). The bonds between atoms are recorded in an adjacency matrix.
• A series of graph convolutional layers followed by fully connected layers learns to predict IC50 values from the atom features. The model also outputs a confidence (uncertainty) estimate for each prediction.
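The slides do not say which graph convolution variant was used. The sketch below assumes the widely used normalized propagation rule from Kipf and Welling's GCN, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by mean pooling over atoms and a single linear layer standing in for the fully connected head. All weights are hypothetical random values (nothing is trained), and `predict_pic50` is an illustrative name; it only shows how atom features and the adjacency matrix flow through the layers.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2; self-loops let each atom keep its own features."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_norm, h, w):
    """One graph convolution: mix neighbour features, project with W, apply ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]

def predict_pic50(adj, features, layer_weights, readout_weights, bias):
    """Stack GCN layers, mean-pool atoms into one vector, then a linear output."""
    a_norm = normalize_adjacency(adj)
    h = features
    for w in layer_weights:
        h = gcn_layer(a_norm, h, w)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # average over atoms
    return sum(p * r for p, r in zip(pooled, readout_weights)) + bias

# Toy 3-atom chain (C-C-O); features = [atomic weight / 100, formal charge]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[0.12, 0.0], [0.12, 0.0], [0.16, 0.0]]
rng = random.Random(0)
layers = [
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)],  # 2 -> 4 features
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)],  # 4 -> 4 features
]
readout = [rng.uniform(-1, 1) for _ in range(4)]
print(predict_pic50(adj, features, layers, readout, bias=6.0))
```

Libraries such as PyTorch Geometric or DeepChem provide trained, batched versions of these layers; the plain-Python form is only meant to make the matrix operations visible.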
Shortlist: 53 compounds
• Using the predictive model, IC50 values were predicted for a virtual library of 350,000 molecules that is being screened for lead candidates.
• A molecule passed if its predicted IC50 was below 10 nM (pIC50 > 8) and its prediction uncertainty σ was below 0.1. This filter reduced the 350,000 molecules to 53 compounds.
• Of these 53 compounds, approximately ⅔ were already patented, suggesting they have already been claimed for some application, such as an existing drug. The remaining ⅓ were not patented, making them possible novel lead candidates.
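The shortlisting step is a two-condition filter on each prediction. A minimal sketch, assuming each prediction carries a pIC50 and an uncertainty σ in the same units; the `Prediction` class, `shortlist` function, and compound IDs are hypothetical. Since pIC50 = 9 - log10(IC50 in nM), the 10 nM cutoff corresponds to pIC50 = 8.

```python
import math
from dataclasses import dataclass

@dataclass
class Prediction:
    compound_id: str
    pic50: float  # predicted pIC50
    sigma: float  # predicted uncertainty, assumed to be in pIC50 units

def shortlist(preds, max_ic50_nm=10.0, max_sigma=0.1):
    """Keep compounds predicted more potent than max_ic50_nm with sigma below max_sigma."""
    min_pic50 = 9.0 - math.log10(max_ic50_nm)  # pIC50 = 9 - log10(IC50 in nM); 10 nM -> 8
    return [p for p in preds if p.pic50 > min_pic50 and p.sigma < max_sigma]

# Hypothetical predictions for three library members
library = [
    Prediction("cpd-1", pic50=8.7, sigma=0.05),  # potent and confident: kept
    Prediction("cpd-2", pic50=8.9, sigma=0.30),  # potent but uncertain: dropped
    Prediction("cpd-3", pic50=7.2, sigma=0.02),  # confident but too weak: dropped
]
print([p.compound_id for p in shortlist(library)])  # ['cpd-1']
```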
• The quality of the predictive model was assessed by comparing predicted pIC50 values with ground-truth values.
• These metrics were computed by splitting the original AR pIC50 dataset into training and test sets and evaluating the model on the held-out test set.
• R² score (coefficient of determination) and Spearman R (rank correlation) measure agreement between predictions and ground truth (1.0 = perfect). MAE (mean absolute error) and RMSE (root mean square error) measure the distance between predicted and ground-truth values (lower is better).
• Test-set metrics: R² score 0.616, Spearman R 0.776, MAE 0.551, RMSE 0.755.
• Given the heterogeneity of the input data, the model shows good correlation and error values.
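For reference, the four metrics can be written out directly. This is a plain-Python sketch of the standard definitions (Spearman R as the Pearson correlation of average ranks); in practice `sklearn.metrics.r2_score`, `mean_absolute_error`, `mean_squared_error`, and `scipy.stats.spearmanr` are the usual tools. The example values are made up.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_r(y_true, y_pred):
    """Pearson correlation computed on the ranks."""
    rt, rp = _ranks(y_true), _ranks(y_pred)
    mt, mp = sum(rt) / len(rt), sum(rp) / len(rp)
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / math.sqrt(var_t * var_p)

# Made-up ground-truth and predicted pIC50 values
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 5.8, 7.4, 7.9]
print(round(r2_score(y_true, y_pred), 3), round(spearman_r(y_true, y_pred), 3),
      round(mae(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3))  # 0.908 1.0 0.3 0.339
```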
• The model also reports a measure of prediction uncertainty. The plot on this slide marks predictions whose uncertainty exceeds the 0.1 threshold in red. Qualitatively, many of the red points show large differences between the predicted and ground-truth values.
• The same correlation and error metrics were recalculated with the red points excluded.
• The updated metrics (second row of the table below) show that excluding high-uncertainty predictions improves every metric: R² rises from 0.616 to 0.661, Spearman R from 0.776 to 0.800, MAE falls from 0.551 to 0.518, and RMSE from 0.755 to 0.702.
Test set | R² score | Spearman R | MAE | RMSE
All predictions | 0.616 | 0.776 | 0.551 | 0.755
Excluding uncertainty > 0.1 | 0.661 | 0.800 | 0.518 | 0.702
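The slides do not say how the model computes its uncertainty. One common approach, assumed here purely for illustration, is to train several models and take the standard deviation of their predictions as σ. The sketch flags compounds with σ above 0.1 (the "red" points) and compares MAE with and without them; function names and numbers are hypothetical.

```python
import statistics

def ensemble_prediction(member_preds):
    """Mean and spread of predictions from several independently trained models."""
    return statistics.mean(member_preds), statistics.stdev(member_preds)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def compare_with_without_uncertain(y_true, member_preds_per_cpd, threshold=0.1):
    """Return (MAE on all compounds, MAE on compounds with sigma <= threshold, kept count)."""
    means, sigmas = zip(*(ensemble_prediction(m) for m in member_preds_per_cpd))
    keep = [i for i, s in enumerate(sigmas) if s <= threshold]  # the non-red points
    mae_all = mae(y_true, means)
    mae_confident = mae([y_true[i] for i in keep], [means[i] for i in keep])
    return mae_all, mae_confident, len(keep)

# Hypothetical test compounds, each predicted by a 3-model ensemble
y_true = [6.0, 7.0, 8.0]
members = [[6.1, 6.05, 6.15], [7.9, 6.5, 7.2], [8.05, 7.95, 8.0]]
mae_all, mae_confident, n_kept = compare_with_without_uncertain(y_true, members)
print(round(mae_all, 3), round(mae_confident, 3), n_kept)  # 0.1 0.05 2
```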
uncertainty
• This slide compares the molecules selected when prediction uncertainty is ignored with those selected when it is considered.
• When uncertainty is ignored, the selected structures tend to fall outside the distribution of drug-like molecules.
• Considering uncertainty yields more drug-like molecules.
shortlist of lead candidates was created from the 350,000 screened molecules.
• After applying pharmacophore and drug-likeness filters, the molecules below were identified.
• The molecule in the green box below was identified as a novel compound for targeting AR.
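The slides do not name the drug-likeness filter that was used. As one familiar example, the sketch below applies Lipinski's Rule of Five to precomputed descriptors, allowing at most one violation. In a real pipeline the descriptor values could come from RDKit (`Descriptors.MolWt`, `Descriptors.MolLogP`, `Descriptors.NumHDonors`, `Descriptors.NumHAcceptors`); here the `MolDescriptors` class, candidate names, and values are hypothetical, and no pharmacophore filter is modeled.

```python
from dataclasses import dataclass

@dataclass
class MolDescriptors:
    name: str
    mol_weight: float  # molecular weight, Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors

def lipinski_violations(d: MolDescriptors) -> int:
    """Count how many Rule-of-Five limits the molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def drug_like(candidates, max_violations=1):
    """Keep molecules with at most max_violations Rule-of-Five violations."""
    return [c for c in candidates if lipinski_violations(c) <= max_violations]

# Hypothetical shortlisted candidates with precomputed descriptors
candidates = [
    MolDescriptors("cand-A", mol_weight=358.3, logp=3.2, h_donors=1, h_acceptors=4),
    MolDescriptors("cand-B", mol_weight=720.9, logp=6.8, h_donors=4, h_acceptors=12),
]
print([c.name for c in drug_like(candidates)])  # ['cand-A']
```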