Presentation of the {medicaldata} package at the R/Medicine 2021 conference: an R package of datasets for learners in the health sciences.


Peter Higgins

August 31, 2021


  1. The {medicaldata} Teaching Package • Peter D.R. Higgins, University of Michigan

  2. Why a medical data package for teaching?
    • Learners find relevant examples motivating
    • These datasets illustrate data challenges learners may face
    • There are a few medical datasets in R packages, but they are widely scattered
      • Many are poorly documented and hard to understand
      • Many are quite bare-bones
    • It is really convenient to have the datasets wrapped in one package
      • Easier for students and for instructors
      • Datasets can be re-used across teaching concepts
  3. {medicaldata}
    • A data package with 15 medical datasets

  4. GitHub
    • The code can be found on GitHub at:

  5. Contents
    • Two historical reconstructions of datasets
      • 1747 scurvy trial aboard HMS Salisbury by James Lind
      • 1948 streptomycin for tuberculosis trial
    • Five other RCTs
      • e.g., sulindac for polyps, indomethacin for post-ERCP pancreatitis
    • Six cohort & case-control studies
      • e.g., COVID testing, esophageal cancer case-control, CMV after BMT
    • Two pharmacokinetic studies
      • Indomethacin and theophylline
  6. Documentation
    • Definitions and details for each variable: units, range, levels
    • Background on each study
    • Description of study design, intervention, and measurements
    • Specification of study outcomes
    • Some suggestions for uses of each dataset
    • Full help(dataset) files
    • Linked codebooks and description documents on the pkgdown website and the GitHub README
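A minimal sketch of how a learner might reach that documentation from the R console, using the `scurvy` dataset listed under Contents. It assumes the package is installed and that the dataset has a `treatment` column, as described in `help(scurvy)`.

```r
library(medicaldata)

# Open the full help file: study background, design, and variable definitions
help(scurvy)

# Quick look at the variables and their types
str(scurvy)

# Size of each treatment group (Lind assigned 12 sailors to 6 treatments)
table(scurvy$treatment)
```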
  7. Website
    • pkgdown website at https://higgi13425.github.io/medicaldata/

  8. Three Asks and One Give
    1. Add examples
    2. Please donate datasets
    3. Chat Q: Add untidy datasets (yes/no)?
  9. 1st Ask: Try it out – use {medicaldata} for teaching, and add your examples as issues
    • Used the strep_tb dataset to teach table construction with {gtsummary}
      • Attach code (reprex)
    • Used the scurvy dataset for categorical scatterplots of outcomes
      • Attach code (reprex)
    • Used the indo_rct dataset to make a covariate forest plot
      • Attach code (reprex)
    • Used the theoph dataset for GAM modeling
      • Attach code (reprex)
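As an illustration of the kind of example the first ask has in mind, here is a minimal sketch of a {gtsummary} table built from `strep_tb`. The column names `arm`, `gender`, `baseline_condition`, and `improved` are assumed from `help(strep_tb)`; check them against your installed version.

```r
library(medicaldata)
library(gtsummary)

# Keep the randomized arm plus a few baseline and outcome variables
# (column names assumed from help(strep_tb))
tb <- strep_tb[, c("arm", "gender", "baseline_condition", "improved")]

# One column per arm with counts (%) for categorical variables,
# then add p-values comparing the arms
tbl <- add_p(tbl_summary(tb, by = arm))
tbl
```

Wrapping a snippet like this in `reprex::reprex()` renders the code together with its output, ready to paste into a GitHub issue.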
  10. 2nd Ask: Donate datasets
    • Do you have access to medical datasets?
      • Randomized controlled trials
      • Cohort studies
      • Case-control studies
    • Must be of reasonable size (5 MB limit on CRAN)
    • Must be anonymized
      • Fake names and fake study IDs are helpful
    • Need a reasonable level of documentation (codebook or publication)
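One common first step before donating data is to swap direct identifiers for arbitrary study IDs. The sketch below is hypothetical throughout: the data frame, the `mrn` column, and the `S001`-style ID format are invented for illustration. Replacing identifiers alone does not make a dataset anonymous; dates, extreme ages, and rare diagnoses can still identify people.

```r
# Hypothetical raw data containing a direct identifier (medical record number)
raw <- data.frame(
  mrn     = c("00123456", "00987654", "00123456", "00555111"),
  visit   = c(1, 1, 2, 1),
  outcome = c("improved", "worse", "improved", "stable")
)

# Map each distinct MRN to a random study ID such as "S001"
set.seed(2021)
mrns   <- unique(raw$mrn)
id_key <- setNames(sprintf("S%03d", sample(length(mrns))), mrns)

# Replace the identifier column; id_key stays private and is never shipped
anon <- data.frame(
  study_id = unname(id_key[raw$mrn]),
  raw[, c("visit", "outcome")]
)

# Rough size check (in-memory size, usually larger than a compressed .rda)
print(object.size(anon), units = "Kb")
```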
  11. 3rd Ask (opinions in chat): Should I add untidy medical datasets?
    • Wide medical data that need pivot_longer()?
    • Untidy medical data that need help from {tidyr}?
      • separate(), unite()
      • separate_rows()
      • nest(), unnest()
      • fill(), complete(), replace_na()
    • Color-coded medical data that need {tidyxl}?
    • Multi-headed medical data that need {unheadr}?
    • Messy medical data that need {unpivotr}?
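To make the first two bullets concrete, here is a minimal sketch of the tidying such datasets would exercise, using {tidyr}'s `pivot_longer()` and `separate_rows()`. Both small data frames are hypothetical, made up for illustration rather than taken from {medicaldata}.

```r
library(tidyr)

# Hypothetical wide data: one row per patient, one column per visit
bp_wide <- data.frame(
  patient_id = c("P01", "P02"),
  sbp_week0  = c(142, 155),
  sbp_week4  = c(135, 149),
  sbp_week8  = c(128, 140)
)

# Wide -> long: one row per patient per visit, week stored as an integer
bp_long <- pivot_longer(
  bp_wide,
  cols            = starts_with("sbp_week"),
  names_to        = "week",
  names_prefix    = "sbp_week",
  names_transform = list(week = as.integer),
  values_to       = "sbp"
)

# Hypothetical untidy column: several medications packed into one cell
meds <- data.frame(
  patient_id  = c("P01", "P02"),
  medications = c("mesalamine; prednisone", "infliximab")
)

# One row per patient-medication pair
meds_long <- separate_rows(meds, medications, sep = "; ")
```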
  13. Breaking News
    • {medicaldata} is available on CRAN as of 16 August 2021
    • You can now install.packages("medicaldata")
    • Thanks for your feedback and GitHub issues!
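A minimal sketch of getting started from CRAN: install once, load the package, and list the datasets it ships with using base R's `data(package = ...)`. The exact number of datasets may differ from the 15 on the slide, depending on the installed version.

```r
# One-time installation from CRAN
install.packages("medicaldata")

# Load the package and list the names of the datasets it contains
library(medicaldata)
datasets <- data(package = "medicaldata")$results[, "Item"]
datasets
```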
  14. Thank You! Please ask questions, provide feedback, and discuss in the chat!