Uncertainty in Virtual Screening, Elix, CBI 2021

Elix
October 27, 2021

Transcript

  1. Uncertainty in Virtual Screening
    Romeo Cozac, Elix Inc.

  2. Dataset
    ● This project attempted to “rediscover” pharmaceutical leads that are active against the Androgen Receptor (AR) as antagonists. AR is an important receptor in treating prostate cancer.
    ● We collected 4810 compounds from ChEMBL that had IC50 assay data for AR.
    ● Compound pIC50 values are plotted on the left. The compounds covered 7 orders of magnitude, indicating a broad range of inhibitory activity.
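
The relationship between IC50 and pIC50 used on this slide can be sketched as follows. This is a minimal illustration, assuming IC50 is reported in nanomolar; the function names are hypothetical and not taken from the deck.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Inverse conversion: pIC50 back to IC50 in nanomolar."""
    return 10.0 ** (9.0 - pic50)


# One pIC50 unit corresponds to one order of magnitude in IC50.
potent = pic50_from_nm(10.0)       # 10 nM
weak = pic50_from_nm(100_000.0)    # 100 µM
```

Because the scale is logarithmic, a dataset spanning 7 orders of magnitude in IC50 spans 7 pIC50 units.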

  3. Model Architecture
    ● We used a graph neural network to create a predictive model.
    ● The model interprets the input data as a graph of atoms and assigns features to these atoms (atomic weight, charge, etc.). The connections between atoms are recorded in an adjacency matrix.
    ● A series of graph convolutional layers and fully connected layers learn to predict IC50 values from the atom features. Prediction confidence is also calculated.
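
The graph representation described above can be illustrated with a toy example. This is only a sketch: the molecule (the heavy atoms of ethanol), the two-feature encoding, and the helper names are assumptions for illustration, and the mean-aggregation step stands in for a real graph convolution, which would also apply learned weights and a nonlinearity. It is not the architecture used in the deck.

```python
# Heavy atoms of ethanol (C-C-O), used purely as a toy example.
atoms = ["C", "C", "O"]
# Per-atom features: [atomic weight, formal charge]
features = [[12.011, 0.0], [12.011, 0.0], [15.999, 0.0]]
# Bonds as pairs of atom indices.
bonds = [(0, 1), (1, 2)]


def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 matrix with a 1 wherever two atoms are bonded."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def mean_aggregate(adj, features):
    """Average each atom's features with those of its bonded neighbours."""
    n_feat = len(features[0])
    out = []
    for i, row in enumerate(adj):
        idx = [i] + [j for j, bonded in enumerate(row) if bonded]
        out.append([sum(features[k][f] for k in idx) / len(idx)
                    for f in range(n_feat)])
    return out


adj = adjacency_matrix(len(atoms), bonds)
updated = mean_aggregate(adj, features)
```

After one aggregation step, the central carbon's features already reflect the neighbouring oxygen; stacking several such steps lets information travel further across the molecule.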

  4. Virtual Screening
    Dataset (350K compounds) → filter: IC50 ≤ 10 nM, σ ≤ 0.1 → Shortlist (53 compounds)
    ● Using the predictive model, the IC50 values of 350,000 molecules were predicted. These 350,000 molecules form a virtual library that is screened for lead candidates.
    ● Molecules passed the filter if their predicted IC50 was below 10 nM (pIC50 > 8) and their sigma was below 0.1. This filter left 53 compounds out of the original 350,000.
    ● Of these 53 compounds, approximately ⅔ were patented, indicating that these compounds are already used as drugs in some application. The remaining ⅓ were not patented, suggesting possible lead candidates.
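
The screening filter can be sketched as below. This is a minimal illustration under stated assumptions: the `Prediction` record and `shortlist` function are hypothetical names, the model is assumed to output pIC50 together with an uncertainty σ on the same scale, and the bounds are treated as inclusive as in the slide's flow diagram. Under pIC50 = 9 − log10(IC50 in nM), an IC50 of 10 nM corresponds to pIC50 = 8.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    compound_id: str
    pic50: float   # predicted pIC50
    sigma: float   # predicted uncertainty (assumed to be in pIC50 units)


def shortlist(preds, ic50_max_nm=10.0, sigma_max=0.1):
    """Keep compounds predicted to be potent AND predicted with confidence."""
    # Convert the IC50 cutoff (nM) to the equivalent pIC50 cutoff.
    pic50_min = 9.0 - math.log10(ic50_max_nm)
    return [p for p in preds
            if p.pic50 >= pic50_min and p.sigma <= sigma_max]


# Made-up predictions for illustration only.
library = [
    Prediction("a", 8.5, 0.05),  # potent and confident: kept
    Prediction("b", 8.5, 0.30),  # potent but uncertain: dropped
    Prediction("c", 7.0, 0.05),  # confident but weak: dropped
    Prediction("d", 8.0, 0.10),  # exactly on both thresholds: kept
]
hits = [p.compound_id for p in shortlist(library)]
```

Requiring both conditions is what distinguishes this from a plain potency cutoff: compound "b" would pass a filter on predicted IC50 alone.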

  5. Model Performance
    R² Score   Spearman R   MAE     RMSE
    0.616      0.776        0.551   0.755
    ● The quality of the predictive model was assessed by comparing predicted pIC50 values with ground truth values.
    ● The metrics were computed by splitting the original AR pIC50 dataset into training and test sets and evaluating the model on the test set.
    ● R² Score (coefficient of determination) and Spearman R (rank correlation) measure how well the predictions track the ground truth (1.0 = perfect). MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) measure the distance between the predicted and ground truth values (lower is better).
    ● The model has good correlation and error values given the heterogeneity of the input data.
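
For reference, the four metrics can be computed as in the sketch below. It uses only the standard library and made-up toy values, not the deck's data; in practice one would more likely use an existing statistics library. Spearman R is computed here as the Pearson correlation of average ranks.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def _ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_r(y_true, y_pred):
    return _pearson(_ranks(y_true), _ranks(y_pred))


# Toy pIC50 values for illustration only.
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 6.5, 6.5, 8.5]
scores = {
    "r2": r2_score(y_true, y_pred),
    "spearman": spearman_r(y_true, y_pred),
    "mae": mae(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
}
```

Note that R² compares squared errors against a predict-the-mean baseline, so it can be negative for a poor model, whereas Spearman R only looks at the ordering of the predictions.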

  6. Model Performance considering uncertainty
                              R² Score   Spearman R   MAE     RMSE
    All test points           0.616      0.776        0.551   0.755
    Uncertainty ≤ 0.1 only    0.661      0.800        0.518   0.702
    ● The model also gives a measure of prediction uncertainty. The plot on this slide shows predictions with uncertainty above the 0.1 threshold in red. Qualitative examination of the plot shows that many of the red points have large differences between the predicted and ground truth values.
    ● The same correlation and error metrics were recalculated with the red points excluded.
    ● The updated metrics (second row of the table) show that excluding points with high uncertainty improves every metric: for example, R² rises from 0.616 to 0.661 and RMSE falls from 0.755 to 0.702.
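
The comparison on this slide amounts to recomputing a metric on the subset of test points whose uncertainty is at or below the threshold. The sketch below uses invented numbers, not the deck's data, and shows only MAE for brevity; the tuple layout and helper name are assumptions for illustration.

```python
SIGMA_MAX = 0.1  # uncertainty threshold from the slide

# (ground truth pIC50, predicted pIC50, predicted sigma); made-up values.
records = [
    (6.0, 6.2, 0.05),
    (7.0, 6.9, 0.08),
    (8.0, 6.0, 0.30),  # high uncertainty and a large error
    (5.0, 5.3, 0.04),
]


def mean_abs_error(rows):
    return sum(abs(true - pred) for true, pred, _ in rows) / len(rows)


# Drop the points the model itself flags as uncertain.
confident = [r for r in records if r[2] <= SIGMA_MAX]

mae_all = mean_abs_error(records)
mae_confident = mean_abs_error(confident)
retained = len(confident) / len(records)
```

Reporting the retained fraction alongside the improved metric is useful, since a stricter threshold trades coverage for accuracy.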

  7. Uncertainty and recognition of out-of-distribution molecules
    [Figure panels: "Ignoring uncertainty" and "Considering uncertainty"]
    ● This slide shows how the selected molecules differ when uncertainty is ignored and when it is considered.
    ● When uncertainty is ignored, the selected structures tend to include molecules that are out of distribution for drug-like molecules.
    ● Considering uncertainty leads to more drug-like molecules.

  8. Shortlist
    ● Based on predicted AR IC50 values, a shortlist of lead candidates was created from the 350,000 screened molecules.
    ● After applying pharmacophore and drug-likeness filters, the following molecules were identified.
    ● The molecule in the green box below was identified as a novel compound for AR targeting.

  9. Elix Inc.
    http://ja.elix-inc.com/