The {medicaldata} package at NHS-R 2021

The {medicaldata} Teaching Package
Peter D.R. Higgins
University of Michigan, Ann Arbor, Michigan, USA
@ibddoctor
$1.62B/yr research grants

Why a medical data package for teaching?
• Learners find relevant examples motivating
• These datasets illustrate data challenges they may face
• There are a few medical datasets in R packages, but they are widely scattered
• Many datasets are poorly documented and hard to understand
  • Quite bare-bones
• It is really convenient to have datasets wrapped in one package
  • Easier for students and for instructors
  • Can re-use datasets across teaching concepts
• {medicaldata} is focused on patient-level research data, which complements the system-level data in {NHSRdatasets}

{medicaldata}
• A data package with 15 (for now) medical datasets

GitHub
• The code can be found on GitHub at: https://github.com/higgi13425/medicaldata

Contents
• Two historical reconstructions of datasets
  • 1747 scurvy trial on the HMS Salisbury by James Lind
  • 1948 MRC streptomycin for tuberculosis trial
• Five other RCTs
  • Sulindac for polyps, indomethacin for post-ERCP pancreatitis, others
• Six cohort and case-control studies
  • COVID testing, esophageal cancer case-control, CMV after BMT, others
• Two pharmacokinetic studies
  • Indomethacin and theophylline

Documentation
• Definitions and details on each variable: units, range, levels
• Background on each study
• Description of study design, intervention, measurements
• Specification of study outcomes
• Some suggestions for uses of each dataset
• Full help(dataset) files
• Linked codebooks and description documents on the {pkgdown} website and GitHub README

Website
• {pkgdown} website at https://higgi13425.github.io/medicaldata/

Two Asks, One Plan, and One Give
1. Add Examples
2. Please Donate Datasets
3. Plan – add Untidy datasets

1st Ask: Try it out – use {medicaldata} for teaching, add examples as issues
• Used strep_tb dataset to teach table construction with {gtsummary}
  • Attach code (reprex)
• Used scurvy dataset for categorical scatterplots of outcomes
  • Attach code (reprex)
• Used the indo_rct dataset to make a covariate forest plot
  • Attach code (reprex)
• Used theoph dataset for GAM modeling
  • Attach code (reprex)
I would like to turn your examples into vignettes.
Avishai Tsur: https://avishaitsur.netlify.app/posts/2021-09-04-reproducing-the-results-of-an-rct/
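As an illustration of the first kind of example above (table construction with {gtsummary} on strep_tb), here is a minimal sketch. It assumes both packages are installed and that strep_tb contains columns named arm, gender, baseline_condition, and improved; those names are an assumption to verify with names(strep_tb), and the choice of variables is only illustrative.

```r
# Sketch only: assumes {medicaldata} and {gtsummary} are installed, and that
# strep_tb has columns arm, gender, baseline_condition and improved
# (an assumption; confirm with names(strep_tb) before running).
library(medicaldata)
library(gtsummary)

# Summarise a few baseline and outcome variables, split by treatment arm.
tbl <- tbl_summary(
  strep_tb,
  by = arm,
  include = c(gender, baseline_condition, improved)
)

tbl  # prints the summary table
```

Wrapping a snippet like this with reprex::reprex() is one way to produce the "Attach code (reprex)" piece of an issue.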

2nd Ask: Donate datasets
• Do you have access to medical datasets?
  • Randomized controlled trials
  • Cohort studies
  • Case-control studies
• Must be of reasonable size (5MB limit on CRAN)
• Must be anonymized
  • Fake names, fake study IDs are helpful
• Need a reasonable level of documentation: a codebook or a publication
I am adding several from Frank Harrell for the January 2022 release.

Future Plan: Add some untidy medical datasets
• Wide medical data that need pivot_longer()
• Untidy medical data that need help from {tidyr}
  • separate(), unite()
  • separate_rows()
  • nest(), unnest()
  • fill(), complete(), replace_na()
• Color-coded medical data that need {tidyxl}
• Multiheaded medical data that need {unheadr}
• Messy medical data that need {unpivotr}
• Feel free to donate some untidy messes/examples!
Likely for the July 2022 release.
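To make the first bullet concrete, here is a small sketch of the kind of wide-to-long reshaping such a dataset would call for. The data frame below is invented purely for illustration (it is not from {medicaldata}), and the column names patient_id and sbp_week* are hypothetical.

```r
# Hypothetical wide data: one row per patient, one column per visit.
# These values and names are made up for illustration only.
library(tidyr)

wide <- data.frame(
  patient_id = c("P01", "P02"),
  sbp_week0  = c(142, 130),
  sbp_week4  = c(135, 128),
  sbp_week8  = c(128, 125)
)

# Reshape to one row per patient per visit.
long <- pivot_longer(
  wide,
  cols = starts_with("sbp_week"),
  names_to = "week",
  names_prefix = "sbp_week",               # strip the prefix, leaving "0", "4", "8"
  names_transform = list(week = as.integer),
  values_to = "sbp"
)

long
```

The result has one row for each patient-visit combination, which is the shape most {ggplot2} and modeling functions expect.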

One Give
• {medicaldata} hex stickers to the first 10 people who send a DM
  • On Twitter: @ibddoctor
• Include your snail mail address, for example:
  Sender (your name and address)
  123 Data Street
  Medical Center
  City, State, Country, Postal code
• Important: must be in one of 180 allowed countries: https://bit.ly/3vtPWnf
  • Roughly corresponds to FIFA membership

CRAN Update
• {medicaldata} is available on CRAN as of 16 August 2021
• You can now install.packages("medicaldata")
• Plan for updates roughly every 6 months (gradual changes to the dev version)
• Thanks for your feedback and GitHub issues!
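A minimal sketch of getting started after installing from CRAN. It uses strep_tb, one of the dataset names mentioned in this talk; the install line is commented out so the snippet can be re-run without reinstalling.

```r
# One-time install from CRAN (uncomment on first use):
# install.packages("medicaldata")

library(medicaldata)

# List the datasets shipped with the package.
dataset_names <- data(package = "medicaldata")$results[, "Item"]
print(dataset_names)

# Inspect one dataset; its help page is available via ?strep_tb.
str(strep_tb)
```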

Thank You!
Please ask questions, provide feedback, and discuss in the chat!
