Using urine to diagnose prostate cancer in R

A Framework for Robust & Reproducible Biomarker Discovery from Integrated Datasets
Shea Connell, University of East Anglia

Disclosure: I am a named inventor on patent applications relevant to this presentation.

Prostate cancer is more commonly a disease that patients die
with rather than of.

Prostate cancer has an overdiagnosis problem
• 10-year survival reaching 90%
• Responsible for 13% of all male cancer deaths
• Primarily driven by detection rather than improvements in survival.
[Figure: European age-standardised prostate cancer incidence and mortality rates per 100,000, by year, 1970 to 2010. Source: Office for National Statistics]

The Trouble with Prostate Cancer:
[Figure: PSA (ng/mL, 0 to 100) by group: NEC; Raised PSA, No Cancer; Low-Risk; Int-Risk; High-Risk]

The Trouble with Prostate Cancer:
Biopsies are scored by Gleason Grade Group, from 1 to 5.
[Figure: Grade Groups 1 to 5 annotated “Safe”, “Unknowns”, “Sometimes Need Treatment” and “Definitely Need Treatment”; similar underlying pathology]

Current Standards of Care are Imprecise.
(But they’re the best we have)
[Figure: four panels, A to D]

Why Urine?

The Research & Data
• Movember GAP1 Urine Project: biodiscovery, pre-clinical trials
• Collected 1000+ urine samples, 5 countries, 12 institutes
• Multiple experiments, many variables, some overlap:
◦ Proteomics
◦ Metabolomics
◦ Targeted transcriptomics
◦ ELISAs
◦ Methylation studies

Goals:
• Find potentially useful features/variables in small(er) datasets
• Develop potential diagnostic models robustly, reproducibly & better than standards of care.
• Adhere to the TRIPOD guidelines for reporting the development of such a model.
• Make the process easy to adapt to new data.

The Training Label issue
Biopsies are a limited snapshot: no two men with the same “label” will have identical proportions of disease.
Solution: design a new training label!
1. Bin patient biopsy into one of three categories.
• No cancer found on biopsy
• Mostly GG1 / 2
• GG3 or above
2. Treat this label continuously.
• Allows for more “wiggle room” when compared to classification error metrics that strongly penalize “wrong” answers.
[Figure: example biopsies labelled GG1, GG2, GG3, GG4, high PSA with negative biopsy, and “Benign”]
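A minimal sketch of how such a three-bin label could be built and treated as a number. It assumes biopsy results are coded as Gleason Grade Group with 0 meaning no cancer found; the function name `make_training_label` and the spacing 0, 0.5, 1 are illustrative assumptions, not values taken from the talk.

```r
# Hypothetical helper: bin a biopsy result into three ordered categories
# and return the bin as a number so a regression model can be fit to it.
# Assumption: `grade_group` is 0 for "no cancer found" and 1-5 otherwise.
make_training_label <- function(grade_group) {
  bins <- cut(
    grade_group,
    breaks = c(-Inf, 0, 2, Inf),              # 0 | 1-2 | 3 and above
    labels = c("No cancer", "GG1-2", "GG3+")
  )
  # Illustrative spacing only: the three bins become 0, 0.5 and 1.
  (as.integer(bins) - 1) / 2
}

biopsies <- c(0, 1, 2, 3, 5)
labels <- make_training_label(biopsies)
labels
```

Fitting a regression-style model to a numeric label like this means a prediction that lands between two bins is penalised less than an outright misclassification would be.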

• Boruta: a feature selection wrapper algorithm based on random forests and permutation, with a handy R package!
• Permutes each predictor and records this “shadow” feature’s importance via the Mean Decrease Accuracy
• Checks this importance against the real features: Real Z-score > Shadow Z-score?
• Rinse and repeat with each shadow predictor
• Only keep features that are better than the best shadow features across the majority of iterations
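The shadow-feature idea can be sketched in base R as below. This is not the Boruta implementation: absolute correlation stands in for random forest importance to keep the example dependency-free, the data are simulated, and the simple majority rule follows the description above rather than Boruta's own statistical test.

```r
set.seed(42)

# Simulated data: x1 and x2 carry signal, x3 is pure noise (all hypothetical).
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * x$x1 - 2 * x$x2 + rnorm(n)

# Stand-in importance measure. Boruta itself uses random forest importance;
# absolute correlation is used here only to avoid extra packages.
importance <- function(features, outcome) {
  abs(vapply(features, function(col) cor(col, outcome), numeric(1)))
}

n_iter <- 100
hits <- setNames(numeric(ncol(x)), names(x))
for (i in seq_len(n_iter)) {
  # Shadow features: each real column shuffled, breaking any link to y.
  shadows <- as.data.frame(lapply(x, sample))
  names(shadows) <- paste0("shadow_", names(x))
  imp <- importance(cbind(x, shadows), y)
  best_shadow <- max(imp[names(shadows)])
  # A real feature scores a "hit" when it beats the best shadow.
  hits <- hits + (imp[names(x)] > best_shadow)
}

# Keep features that beat the best shadow in a majority of iterations.
selected <- names(hits)[hits / n_iter > 0.5]
selected
```

The two signal-carrying features should be retained here; whether the noise feature is rejected depends on the simulated draw, which is one reason to repeat the procedure across resamples.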

The Framework:
The Framework employs a resampling-based feature selection process coupled with Random Forests to produce a single model that can be interrogated for clinical utility, and aims to recognise the uncertainty inherent to prostate sampling.
Packages: dbplyr, ssh, dplyr, tidyr, stringr, recipes, tidymodels, furrr, Boruta, ranger, randomForest, pROC, pdp, rmda, dabestr, ggplot2, redoc, cowplot, ggpubr, rmarkdown

The Successes
• ExoMeth is a 16-variable model integrating urinary cell methylation, cell-free RNA and clinical data.
• ExoGrail is a 12-variable model integrating urinary cell protein, cell-free RNA and clinical data.
Packages: ggplot2, rmda, pROC, cowplot

Variable Importance
• 1,000 resamples generate a lot of information through Boruta.
• We aggregate and reduce the importances using furrr.
• Plot and fill according to the proportion of resamples where that variable was selected.
Packages: furrr, stringr, Boruta, ggplot2
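A rough sketch of the aggregation step, using simulated per-resample results. The variable names, the importance values and the column layout are all hypothetical, and base `lapply` stands in for the parallel map that furrr would provide.

```r
set.seed(1)

# Hypothetical per-resample output: one row per variable per resample,
# with an importance score and whether the variable was selected.
variables <- c("PSA", "Age", "marker_a", "marker_b")
resamples <- lapply(seq_len(1000), function(i) {
  data.frame(
    resample   = i,
    variable   = variables,
    importance = rnorm(4, mean = c(8, 3, 6, 0)),
    selected   = runif(4) < c(0.95, 0.40, 0.80, 0.05)
  )
})

# Reduce the list of resamples to one long table, then summarise per variable.
# Averaging the logical `selected` column gives the proportion of resamples
# in which each variable was kept.
all_runs <- do.call(rbind, resamples)
summary_tbl <- aggregate(
  cbind(importance, selected) ~ variable,
  data = all_runs,
  FUN  = mean
)
names(summary_tbl) <- c("variable", "mean_importance", "prop_selected")
summary_tbl[order(-summary_tbl$mean_importance), ]
```

The `prop_selected` column is the quantity that would be mapped to the fill colour in the importance plot.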

Waterfall Plots
• Visualisation of calibration and trend in model outputs.
• Good sanity check for spotting false-positives and erroneous samples.
• Annotated outliers: Assay technical failures? What happened here?
• OR = 2.04 per 0.1 ExoGrail increase (95% CI: 1.78 - 2.35)
[Figure: ExoGrail risk signature (0 to 1) for each patient biopsy, coloured by biopsy outcome: No Evidence of Cancer, GG = 1, GG = 2, GG ≥3]
Packages: ggplot2, dplyr, MASS
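An odds ratio quoted "per 0.1 increase" is a rescaling of a regression coefficient. The sketch below shows that rescaling with a binary logistic regression on simulated data; the model type, the data and the Wald interval are assumptions for illustration, since the talk does not state how its figure was derived.

```r
set.seed(7)

# Simulated risk scores and a binary outcome (hypothetical data).
n <- 500
risk_score <- runif(n)
outcome <- rbinom(n, size = 1, prob = plogis(-3 + 6 * risk_score))

fit <- glm(outcome ~ risk_score, family = binomial)

# The coefficient is the log-odds change per 1.0 increase in the score;
# multiplying by 0.1 before exponentiating gives the odds ratio per 0.1.
beta <- coef(fit)[["risk_score"]]
ci   <- confint.default(fit)["risk_score", ]  # Wald interval, log-odds scale

or_per_tenth <- exp(0.1 * beta)
ci_per_tenth <- exp(0.1 * ci)
round(c(OR = or_per_tenth, ci_per_tenth), 2)
```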

Comparator Models
Predictive ability in isolation means little. Comparator models show potential uplift over:
◦ Clinical standards of care (SoC): Age and PSA
◦ Each variable source/experiment in isolation
[Figure: density of model risk score (0 to 1) by initial biopsy outcome, GG ≤1 versus GG ≥2, one panel per model]
Model AUCs for determining GG ≥3 / GG ≥2 / any cancer:
• A, SoC model: 0.75 / 0.73 / 0.7
• B, Methylation model: 0.77 / 0.78 / 0.73
• C, ExoRNA model: 0.74 / 0.81 / 0.86
• D, ExoMeth model: 0.81 / 0.89 / 0.91
Packages: ggplot2, cowplot, pROC
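Reporting one AUC per outcome cut-off, as in the panels above, amounts to dichotomising the biopsy result at each threshold and scoring the same risk signature against it. A dependency-free sketch follows, with a hand-rolled rank-based AUC standing in for pROC and entirely hypothetical data.

```r
# AUC via its rank (Mann-Whitney) formulation: the probability that a
# randomly chosen case scores higher than a randomly chosen control.
auc <- function(score, is_case) {
  r  <- rank(score)                 # ties receive average ranks
  n1 <- sum(is_case)
  n0 <- sum(!is_case)
  (sum(r[is_case]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical data: grade group (0 = no evidence of cancer) and a risk score.
grade_group <- c(0, 0, 0, 1, 1, 2, 2, 3, 4, 5)
risk_score  <- c(0.10, 0.35, 0.20, 0.30, 0.15, 0.60, 0.45, 0.70, 0.55, 0.90)

# One AUC per dichotomisation of the outcome.
thresholds <- c("Any cancer" = 1, "GG >= 2" = 2, "GG >= 3" = 3)
aucs <- sapply(thresholds, function(t) auc(risk_score, grade_group >= t))
aucs
```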

Density vs ROC
ROC plots obscure thresholds and distributions of a complex outcome. Density plots:
◦ Can explore multiclass outcomes
◦ Make it easier to see where models differ in predictive ability.
◦ Avoid reliance on sensitivity and specificity
[Figure: density of model risk score by initial biopsy outcome (NEC, GG = 1, GG = 2, GG ≥3), one panel per model]
Model AUCs for determining GG ≥3 / GG ≥2 / any cancer:
• A, SoC model: 0.77 / 0.72 / 0.75
• B, EN2 model: 0.81 / 0.83 / 0.81
• C, ExoRNA model: 0.67 / 0.77 / 0.81
• D, ExoGrail model: 0.84 / 0.9 / 0.89
Packages: ggplot2, purrr, pROC, cowplot

Estimation Plots
• Estimation plots show differences between biopsy outcomes.
• Also get bootstrapped confidence intervals of effect sizes.
• Modification of the functions from the dabestr package.
[Figure: ExoMeth risk signature (0 to 1) by biopsy outcome: NEC (N = 77), GG = 1 (N = 24), GG = 2 (N = 42), GG ≥3 (N = 54), coloured by category (High Risk, Intermediate Risk, Low Risk, Raised PSA NEC); lower panel shows the mean ExoMeth risk signature difference from NEC samples for GG = 1, GG = 2 and GG ≥3]
Packages: ggplot2, dabestr
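The effect size behind an estimation plot is a difference in group means with a bootstrapped interval. A minimal base R sketch is given below, assuming a plain percentile bootstrap, which is simpler than and not necessarily the same as what dabestr computes. The group sizes match the plot, but the scores themselves are simulated.

```r
set.seed(123)

# Hypothetical risk scores for two biopsy outcome groups.
nec <- rbeta(77, 2, 5)   # no evidence of cancer
gg3 <- rbeta(54, 5, 3)   # GG >= 3

# Effect size: difference in group means relative to NEC.
observed <- mean(gg3) - mean(nec)

# Percentile bootstrap: resample each group with replacement and
# recompute the mean difference each time.
boot_diffs <- replicate(5000, {
  mean(sample(gg3, replace = TRUE)) - mean(sample(nec, replace = TRUE))
})
ci <- quantile(boot_diffs, c(0.025, 0.975))

c(difference = observed, ci)
```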

Decision Curve Analysis
• Allows the interpreter to pick risk thresholds relevant to them.
• Our issue is that our datasets are often not representative of the population.
• We simulate incidence rates by proportionally resampling.
• Benefit is measured against the Standard of Care model: the best that clinical variables could possibly do given the data.
[Figure: net percentage reduction in biopsies against decision threshold (0 to 0.6) for detection of A: GG ≥3, B: GG ≥2, C: any prostate cancer. Comparators: EN2, cf-RNA, ExoGrail]
Packages: ggplot2, cowplot, rmda
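The quantities plotted in a decision curve can be sketched with the standard net benefit formula. The example below compares a model against biopsying everyone, not against a Standard of Care model as in the talk, and omits the resampling step; the ten-patient cohort and function names are hypothetical.

```r
# Net benefit of biopsying everyone whose risk score meets the threshold:
# true positives minus false positives weighted by the threshold odds.
net_benefit <- function(score, has_disease, threshold) {
  n  <- length(score)
  tp <- sum(score >= threshold & has_disease)
  fp <- sum(score >= threshold & !has_disease)
  tp / n - (fp / n) * threshold / (1 - threshold)
}

# Net reduction in biopsies per 100 patients versus biopsying everyone.
net_reduction <- function(score, has_disease, threshold) {
  nb_model <- net_benefit(score, has_disease, threshold)
  # A constant score of 1 sends every patient to biopsy.
  nb_all   <- net_benefit(rep(1, length(score)), has_disease, threshold)
  100 * (nb_model - nb_all) / (threshold / (1 - threshold))
}

# Hypothetical cohort of 10 patients.
risk_score  <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90)
has_disease <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

reductions <- sapply(
  c(0.1, 0.25, 0.5),
  function(pt) net_reduction(risk_score, has_disease, pt)
)
reductions
```

Evaluating the function over a grid of thresholds gives one curve per model, so each reader can look up the threshold that matches their own tolerance for unnecessary biopsies.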

The Failures
To date we have trained and evaluated 6 models across 5 different datasets.
Not a magic bullet: this doesn’t “fix” data or pull something out of nothing.
• Two datasets beyond salvage: unbalanced/weird dataset effects
• Two require more detailed evaluation, maybe different modelling strategies.
[Figure: model output for each patient biopsy, coloured by biopsy outcome (No Evidence of Cancer, GG = 1, GG = 2, GG ≥3), annotated “Not good…”]

How is this reproducible?
In-house RMarkdown template works as a push-button analysis to move from raw data to publication-ready figures including:
1. Feature reduction (if required):
• Options to push computation to cluster
2. Model Construction
3. Comparison to larger datasets (where available)
4. Easy sharing of analyses and objects.
Analyses now work with binder and holepunch through GitHub for single-step reproducible analyses in the cloud.

What’s Next?
Direct Impacts of The Framework:
• We now have more reasonable lists of targets for new prospective trials and basic research.
• These trials have been designed around TRIPOD guidelines, to refine these models before true clinical validation.
Make plots interactive.
Still a handful of datasets to process and investigate: a few months, not years.
Make all analyses and code openly available to all collaborators and the wider scientific community (uphill battle)

Andy Ripley Memorial Fund @Shedimus @UEA-Cancer-Genetics-Lab
