


Presentation of the {medicaldata} package at the R/Medicine 2021 conference. {medicaldata} is an R package of datasets for learners in the health sciences.


Peter Higgins

August 31, 2021


  1. The {medicaldata} Teaching Package
     Peter D.R. Higgins, University of Michigan

  2. Why a medical data package for teaching?
     • Learners find relevant examples motivating
     • These datasets illustrate data challenges learners may face
     • There are a few medical datasets in R packages, but they are widely scattered
       • Many are poorly documented and hard to understand
       • Many are quite bare-bones
     • It is really convenient to have the datasets wrapped in one package
       • Easier for students and for instructors
       • Datasets can be re-used across teaching concepts
  3. {medicaldata}
     • A data package with 15 medical datasets
  4. GitHub
     • The code can be found on GitHub at:
  5. Contents
     • Two historical reconstructions of datasets
       • 1747 scurvy trial on the HMS Salisbury by James Lind
       • 1948 streptomycin for tuberculosis trial
     • Five other RCTs
       • e.g., sulindac for polyps, indomethacin for post-ERCP pancreatitis
     • Six cohort & case-control studies
       • e.g., COVID testing, esophageal cancer case-control, CMV after BMT
     • Two pharmacokinetic studies
       • Indomethacin and theophylline
  6. Documentation
     • Definitions & details on each variable: units, range, levels
     • Background on each study
       • Description of study design, intervention, measurements
       • Specification of study outcomes
     • Some suggestions for uses of each dataset
     • Full help(dataset) files
     • Linked codebooks and description documents on the pkgdown website & GitHub README
  7. Website
     • pkgdown website at https://higgi13425.github.io/medicaldata/
  8. Three Asks and One Give
     1. Add examples
     2. Please donate datasets
     3. Chat Q: Add untidy datasets (yes/no)?
  9. 1st Ask: Try it out! Use {medicaldata} for teaching, and add your examples as GitHub issues
     • Used the strep_tb dataset to teach table construction with {gtsummary}
       • Attach code (reprex)
     • Used the scurvy dataset for categorical scatterplots of outcomes
       • Attach code (reprex)
     • Used the indo_rct dataset to make a covariate forest plot
       • Attach code (reprex)
     • Used the theoph dataset for GAM modeling
       • Attach code (reprex)
  10. 2nd Ask: Donate datasets
      • Do you have access to medical datasets?
        • Randomized controlled trials
        • Cohort studies
        • Case-control studies
      • Must be of reasonable size (5 MB limit on CRAN)
      • Must be anonymized
        • Fake names and fake study IDs are helpful
      • Need a reasonable level of documentation (codebook, publication)
  11. 3rd Ask (opinions in chat): Should I add untidy medical datasets?
      • Wide medical data that need pivot_longer()?
      • Untidy medical data that need help from {tidyr}?
        • separate(), unite()
        • separate_rows()
        • nest(), unnest()
        • fill(), complete(), replace_na()
      • Color-coded medical data that need {tidyxl}?
      • Medical data with multiple header rows that need {unheadr}?
      • Messy medical data that need {unpivotr}?
  12. One Give
      • {medicaldata} hex stickers to the first 100 people who send an SASE
        • SASE = self-addressed, stamped envelope
      • Important: include sufficient postage on both the SASE and the outer (sending) envelope!
      [Figure: example SASE with a "STAMP here" box, addressed to the sender (your name and address), e.g., 123 Data Street, Medical Center City, State, Country, Postal code]
      • Fold the SASE into thirds and put it into a second envelope.
      • Mail this outer envelope (with postage), with the SASE enclosed, to:
          Peter Higgins
          SPC 5682
          1150 West Medical Center Drive
          Michigan Medicine
          Ann Arbor, Michigan 48109
          USA
  13. Breaking News
      • {medicaldata} has been available on CRAN since 16 August 2021
      • You can now install it with install.packages("medicaldata")
      • Thanks for your feedback and GitHub issues!
  14. Thank You!
      Please ask questions, provide feedback, and discuss in the chat!