Using urine to diagnose prostate cancer in R
A Framework for Robust & Reproducible Biomarker Discovery from Integrated Datasets

Shea Connell
Presented at R/Medicine 2019, September 14, 2019

Transcript

  1. Using urine to diagnose prostate cancer in R: A Framework for Robust & Reproducible
     Biomarker Discovery from Integrated Datasets. Shea Connell, University of East Anglia.
  2. Prostate cancer has an overdiagnosis problem. 10-year survival is reaching 90%, yet
     prostate cancer remains responsible for 13% of all male cancer deaths. Rising incidence is
     driven primarily by detection rather than by improvements in survival.
     [Figure: European age-standardised prostate cancer incidence and mortality rates per
     100,000, 1970 to 2010. Source: Office for National Statistics]
  3. The Trouble with Prostate Cancer:
     [Figure: PSA (ng/mL, 0 to 100) distributions across biopsy outcome groups: no evidence of
     cancer (NEC); raised PSA, no cancer; low-risk; intermediate-risk; and high-risk disease.]
  4. The Trouble with Prostate Cancer: [Same figure as slide 3.]
  5. The Trouble with Prostate Cancer: Biopsies are scored by Gleason Grade Group (1 to 5).
     [Figure: the five Grade Groups arranged from "Safe", through "Unknowns" and "Sometimes Need
     Treatment", to "Definitely Need Treatment", with neighbouring groups sharing similar
     underlying pathology.]
  6. Current Standards of Care are Imprecise. (But they're the best we have.)
     [Figure: panels A to D.]
  7. The Research & Data
     • Movember GAP1 Urine Project: biodiscovery, pre-clinical trials
     • 1,000+ urine samples collected across 5 countries and 12 institutes
     • Multiple experiments, many variables, some overlap:
       • Proteomics
       • Metabolomics
       • Targeted transcriptomics
       • ELISAs
       • Methylation studies
  8. Goals:
     • Find potentially useful features/variables in small(er) datasets
     • Develop potential diagnostic models robustly, reproducibly and better than standards of
       care
     • Adhere to the TRIPOD guidelines for reporting the development of such a model
     • Make the process easy to adapt to new data
  9. The Training Label Issue
     Biopsies are a limited snapshot: no two men with the same "label" will have identical
     proportions of disease. Solution: design a new training label!
     1. Bin each patient's biopsy into one of three categories:
        • No cancer found on biopsy
        • Mostly GG1/2
        • GG3 or above
     2. Treat this label continuously. This allows more "wiggle room" than classification error
        metrics that strongly penalize "wrong" answers (a minimal encoding sketch follows below).
     [Figure: example biopsy outcomes, from "Benign" and high PSA with a negative biopsy (-bx)
     through GG1, GG2, GG3 and GG4.]
  10. The Training Label Issue [Same content as slide 9.]
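A minimal sketch of the continuous label encoding in R. The data frame urine_df and the column names below are hypothetical placeholders, not the project's actual data; the numeric values simply order the three bins:

    library(dplyr)

    # Map the three biopsy bins onto an ordered numeric training label
    urine_df <- urine_df %>%
      mutate(risk_label = case_when(
        biopsy_category == "no_cancer"   ~ 0,    # no cancer found on biopsy
        biopsy_category == "mostly_gg12" ~ 0.5,  # mostly GG1/2
        biopsy_category == "gg3_plus"    ~ 1     # GG3 or above
      ))

    # risk_label can now be used as a continuous outcome for a
    # regression-mode random forest, so "near misses" are penalized
    # less than outright wrong classes.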
  11. Boruta
      • A feature selection wrapper algorithm based on random forests and permutation, with a
        handy R package!
      • Permutes each predictor and records this "shadow" feature's importance via the mean
        decrease in accuracy
      • Checks this importance against the real features: is the real Z-score greater than the
        shadow Z-score?
      • Rinses and repeats with each shadow predictor
      • Only keeps features that beat the best shadow features across the majority of iterations
        (a sketch of a single run follows below)
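A minimal sketch of a single Boruta run, reusing the hypothetical urine_df and risk_label from the previous sketch:

    library(Boruta)

    set.seed(2019)
    # Each predictor is shuffled into a "shadow" feature; a real feature is
    # confirmed only if its importance beats the best shadow's across runs
    boruta_fit <- Boruta(risk_label ~ ., data = urine_df, maxRuns = 500)

    # Keep the confirmed features, dropping the tentative ones
    confirmed <- getSelectedAttributes(boruta_fit, withTentative = FALSE)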
  12. The Framework: a resampling-based feature selection process coupled with random forests,
      producing a single model that can be interrogated for clinical utility and that aims to
      recognise the uncertainty inherent to prostate sampling (a sketch of the resampling loop
      follows below).
      Packages: dbplyr, ssh; dplyr, tidyr, stringr, recipes; tidymodels, furrr, Boruta, ranger,
      randomForest; pROC, pdp, rmda, dabestR; ggplot2, redoc, cowplot, ggpubr; rmarkdown
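A minimal sketch of the resampling loop under the same hypothetical data as above. This illustrates the pattern (bootstrap resamples, Boruta on each, in parallel via furrr), not the authors' exact pipeline:

    library(future)
    library(furrr)
    library(rsample)
    library(Boruta)

    plan(multisession)

    # 1,000 bootstrap resamples of the training data
    resamples <- bootstraps(urine_df, times = 1000)

    # Run Boruta on each resample in parallel, recording the features
    # confirmed in each one
    selected_per_resample <- future_map(
      resamples$splits,
      function(split) {
        getSelectedAttributes(Boruta(risk_label ~ ., data = analysis(split)))
      },
      .options = furrr_options(seed = TRUE)
    )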
  13. The Successes
      • ExoMeth: a 16-variable model integrating urinary cell methylation, cell-free RNA and
        clinical data.
      • ExoGrail: a 12-variable model integrating urinary cell protein, cell-free RNA and
        clinical data.
      (ggplot2, rmda, pROC, cowplot)
  14. Variable Importance
      1,000 resamples generate a lot of information through Boruta. We aggregate and reduce the
      importances using furrr, then plot them, filling each bar according to the proportion of
      resamples in which that variable was selected (a sketch follows below).
      (furrr, stringr, Boruta, ggplot2)
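A minimal sketch of the aggregation and plot, continuing from selected_per_resample above; the plotting details are illustrative rather than the deck's exact figure code:

    library(dplyr)
    library(ggplot2)

    # Proportion of the 1,000 resamples in which each feature was confirmed
    selection_freq <- tibble(feature = unlist(selected_per_resample)) %>%
      count(feature, name = "n_selected") %>%
      mutate(prop_selected = n_selected / 1000)

    # One bar per feature, filled by how consistently it was selected
    ggplot(selection_freq,
           aes(x = reorder(feature, prop_selected), y = prop_selected,
               fill = prop_selected)) +
      geom_col() +
      coord_flip() +
      labs(x = NULL, y = "Proportion of resamples selected")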
  15. Waterfall Plots
      Visualisation of calibration and trend in model outputs; a good sanity check for spotting
      false positives and erroneous samples.
      [Figure: ExoGrail Risk Signature (0 to 1) for each patient biopsy, coloured by biopsy
      outcome: No Evidence of Cancer, GG = 1, GG = 2, GG ≥3.]
      (ggplot2, dplyr, MASS)
  16. Waterfall Plots [Same figure as slide 15, now annotated: "Assay technical failures? What
      happened here?"]
  17. Waterfall Plots [Same figure, now annotated with the effect size: OR = 2.04 per 0.1
      increase in the ExoGrail signature (95% CI: 1.78 to 2.35). A sketch of the plot and the
      ordinal odds ratio follows below.]
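A minimal sketch of a waterfall plot plus the ordinal odds ratio. The data frame preds, with a model score risk_signature and an ordered-factor biopsy_outcome, is hypothetical; MASS::polr is one plausible route to the per-0.1 odds ratio (MASS is listed on the slide), not necessarily the authors' exact method:

    library(dplyr)
    library(ggplot2)
    library(MASS)

    # Waterfall: patients ordered by model output, coloured by biopsy outcome
    preds %>%
      arrange(risk_signature) %>%
      mutate(rank = row_number()) %>%
      ggplot(aes(x = rank, y = risk_signature, fill = biopsy_outcome)) +
      geom_col() +
      labs(x = "Patient Biopsy", y = "ExoGrail Risk Signature")

    # Ordinal logistic regression (biopsy_outcome must be an ordered factor);
    # scaling the score by 10 makes the coefficient an OR per 0.1 increase
    fit <- polr(biopsy_outcome ~ I(risk_signature * 10), data = preds, Hess = TRUE)
    exp(coef(fit))      # OR per 0.1-unit increase in the signature
    exp(confint(fit))   # profile-likelihood 95% CI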
  18. Comparator Models
      Predictive ability in isolation means little. Comparator models show the potential uplift
      over:
      • Clinical standards of care (SoC): age and PSA
      • Each variable source/experiment in isolation
      (An AUC-comparison sketch with pROC follows below.)
      [Figure: risk-score densities by initial biopsy outcome (GG ≤1 vs GG ≥2), with AUCs per
      model. A, SoC model: GG ≥3 0.75, GG ≥2 0.73, any cancer 0.70. B, Methylation model:
      GG ≥3 0.77, GG ≥2 0.78, any cancer 0.73. C, ExoRNA model: GG ≥3 0.74, GG ≥2 0.81, any
      cancer 0.86. D, ExoMeth model: GG ≥3 0.81, GG ≥2 0.89, any cancer 0.91.]
      (ggplot2, cowplot, pROC)
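A minimal sketch of the AUC comparison with pROC, assuming the hypothetical preds also carries a binary indicator gg3_plus and one score column per model (illustrative names):

    library(pROC)

    # AUC of each model for detecting GG >= 3 disease
    roc_soc     <- roc(preds$gg3_plus, preds$soc_score)
    roc_exometh <- roc(preds$gg3_plus, preds$exometh_score)
    auc(roc_soc)
    auc(roc_exometh)

    # Paired test of whether the two ROC curves differ
    roc.test(roc_soc, roc_exometh, paired = TRUE)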
  19. Density vs ROC
      ROC plots obscure the thresholds and distributions of a complex outcome. Density plots:
      • Can explore multiclass outcomes
      • Make it easier to see where models differ in predictive ability
      • Avoid reliance on sensitivity and specificity
      (A density-plot sketch follows below.)
      [Figure: risk-score densities by initial biopsy outcome (NEC, GG = 1, GG = 2, GG ≥3),
      with AUCs per model. A, SoC model: GG ≥3 0.77, GG ≥2 0.72, any cancer 0.75. B, EN2 model:
      GG ≥3 0.81, GG ≥2 0.83, any cancer 0.81. C, ExoRNA model: GG ≥3 0.67, GG ≥2 0.77, any
      cancer 0.81. D, ExoGrail model: GG ≥3 0.84, GG ≥2 0.90, any cancer 0.89.]
      (ggplot2, purrr, pROC, cowplot)
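A minimal sketch of the multiclass density view, again assuming the hypothetical preds with risk_signature and a four-level biopsy_outcome factor:

    library(ggplot2)

    # One density per biopsy outcome on a shared score axis, so class
    # overlap is visible at every possible decision threshold
    ggplot(preds, aes(x = risk_signature, fill = biopsy_outcome)) +
      geom_density(alpha = 0.5) +
      labs(x = "Model risk score", y = "Density",
           fill = "Initial biopsy outcome")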
  20. Estimation Plots
      Estimation plots show differences between biopsy outcomes, with bootstrapped confidence
      intervals of the effect sizes. Built on a modification of the functions from the dabestR
      package (a sketch of the unmodified flow follows below).
      [Figure: ExoMeth Risk Signature (0 to 1) by biopsy outcome (NEC, N = 77; GG = 1, N = 24;
      GG = 2, N = 42; GG ≥3, N = 54); legend categories: High Risk, Intermediate Risk, Low
      Risk, Raised PSA NEC. Lower panel: mean ExoMeth Risk Signature difference from NEC
      samples (roughly 0.1 to 0.5) for GG = 1, GG = 2 and GG ≥3.]
      (ggplot2, dabestR)
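A minimal sketch of the standard dabestr flow (the CRAN package is dabestr; the deck modified these functions). Signatures follow the dabestr 0.2.x API; column names and group labels are placeholders:

    library(dabestr)

    # Shared-control design: each biopsy outcome is compared against NEC,
    # with bootstrapped confidence intervals for each mean difference
    est <- dabest(preds,
                  x = biopsy_outcome, y = risk_signature,
                  idx = c("NEC", "GG = 1", "GG = 2", "GG >= 3"),
                  paired = FALSE)

    # Compute the mean differences and draw the estimation plot
    plot(mean_diff(est))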
  21. Decision Curve Analysis
      Allows the interpreter to pick risk thresholds relevant to them. Our issue: our datasets
      are often not representative of the population, so we simulate incidence rates by
      proportionally resampling. Benefit is measured against the Standard of Care model: the
      best the clinical variables could possibly do given the data. (A decision-curve sketch
      follows below.)
      [Figure: net percentage reduction in biopsies (0 to 40) across decision thresholds (0.0
      to 0.6) for detecting A, GG ≥3; B, GG ≥2; C, any prostate cancer. Comparators: EN2,
      cf-RNA, ExoGrail.]
      (ggplot2, cowplot, rmda)
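A minimal sketch using rmda's standard net-benefit interface; the deck's "net percentage reduction in biopsies" view is a custom transformation of this output, and the outcome/score columns are hypothetical:

    library(rmda)

    # Decision curves for the SoC and ExoGrail models, detecting GG >= 3
    dc_soc <- decision_curve(gg3_plus ~ soc_score, data = preds,
                             thresholds = seq(0, 0.6, by = 0.01),
                             bootstraps = 500)
    dc_exograil <- decision_curve(gg3_plus ~ exograil_score, data = preds,
                                  thresholds = seq(0, 0.6, by = 0.01),
                                  bootstraps = 500)

    # Overlay the curves so net benefit can be compared at any threshold
    plot_decision_curve(list(dc_soc, dc_exograil),
                        curve.names = c("SoC", "ExoGrail"))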
  22. The Failures
      To date we have trained and evaluated 6 models across 5 different datasets. This is not a
      magic bullet: it doesn't "fix" data or pull something out of nothing.
      • Two datasets are beyond salvage: unbalanced/weird dataset effects
      • Two require more detailed evaluation, maybe with different modelling strategies
      [Figure: a waterfall plot that did not work out ("Not good…"), coloured by biopsy outcome:
      No Evidence of Cancer, GG = 1, GG = 2, GG ≥3.]
  23. How is this reproducible?
      An in-house RMarkdown template works as a push-button analysis, moving from raw data to
      publication-ready figures, including:
      1. Feature reduction (if required), with options to push computation to a cluster
      2. Model construction
      3. Comparison to larger datasets (where available)
      4. Easy sharing of analyses and objects
      Analyses now work with binder and holepunch through GitHub for single-step reproducible
      analyses in the cloud (a holepunch sketch follows below).
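A minimal sketch of setting a project up for Binder with holepunch; these are real holepunch functions, but the compendium name, description and maintainer strings are placeholders:

    # holepunch is on GitHub rather than CRAN:
    # remotes::install_github("karthik/holepunch")
    library(holepunch)

    # Describe the project's dependencies as a research compendium
    write_compendium_description(package = "prostateFramework",
                                 description = "Urine biomarker models")

    # Write a Dockerfile capturing the R environment for Binder
    write_dockerfile(maintainer = "Shea Connell")

    # Add a "launch binder" badge to the README
    generate_badge()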
  24. What's Next?
      Direct impacts of The Framework:
      • We now have more reasonable lists of targets for new prospective trials and basic
        research, designed around the TRIPOD guidelines to refine these models before true
        clinical validation.
      Still to do:
      • Make plots interactive
      • Process and investigate the remaining handful of datasets (a few months, not years)
      • Make all analyses and code openly available to all collaborators and the wider
        scientific community (an uphill battle)