Slide 1

Slide 1 text

The {medicaldata} Teaching Package
Peter D.R. Higgins
University of Michigan

Slide 2

Slide 2 text

Why a medical data package for teaching?
• Learners find relevant examples motivating
• These datasets illustrate data challenges they may face
• There are a few medical datasets in R packages, but they are widely scattered
• Many datasets are poorly documented and hard to understand
  • Quite bare-bones
• It is really convenient to have datasets wrapped in one package
  • Easier for students and for instructors
  • Can re-use datasets across teaching concepts

Slide 3

Slide 3 text

{medicaldata} • A data package with 15 medical datasets

Slide 4

Slide 4 text

GitHub • The code can be found on GitHub at: https://github.com/higgi13425/medicaldata

Slide 5

Slide 5 text

Contents
• Two historical reconstructions of datasets
  • 1747 scurvy trial on the HMS Salisbury by James Lind
  • 1948 streptomycin for tuberculosis trial
• Five other randomized controlled trials (RCTs)
  • Sulindac for polyps, indomethacin for post-ERCP pancreatitis
• Six cohort and case-control studies
  • COVID testing, esophageal cancer case-control, CMV after BMT
• Two pharmacokinetic studies
  • Indomethacin and theophylline

Slide 6

Slide 6 text

Documentation
• Definitions and details on each variable: units, range, levels
• Background on each study
• Description of study design, intervention, measurements
• Specification of study outcomes
• Some suggestions for uses of each dataset
• Full help(dataset) files
• Linked codebooks and description documents on the pkgdown website and GitHub README
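
A minimal sketch of reaching the help files mentioned above, assuming {medicaldata} is already installed. The strep_tb dataset is named elsewhere in this deck; any other dataset name can be substituted.

```r
library(medicaldata)

# Open the help page for one dataset (same as typing ?strep_tb).
help("strep_tb", package = "medicaldata")

# Quick structural look at the same dataset: variable names and types.
str(strep_tb)
```

The help page is where the variable definitions, units, and study background listed on this slide are documented.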

Slide 7

Slide 7 text

Website • pkgdown website at https://higgi13425.github.io/medicaldata/

Slide 8

Slide 8 text

Three Asks and One Give
1. Add examples
2. Please donate datasets
3. Chat question: add untidy datasets (yes/no)?

Slide 9

Slide 9 text

1st Ask: Try it out – use {medicaldata} for teaching, add examples as issues
• Used strep_tb dataset to teach table construction with {gtsummary}
  • Attach code (reprex)
• Used scurvy dataset for categorical scatterplots of outcomes
  • Attach code (reprex)
• Used the indo_rct dataset to make a covariate forest plot
  • Attach code (reprex)
• Used theoph dataset for GAM modeling
  • Attach code (reprex)
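
As an illustration of the kind of example that could be attached to an issue, here is a small sketch of the first bullet: a summary table of strep_tb built with {gtsummary}. The column names (arm, gender, baseline_condition, improved) are assumptions based on the package codebook; check them with names(strep_tb) before relying on this.

```r
library(medicaldata)
library(gtsummary)

# Column names are assumed from the codebook; verify with names(strep_tb).
# Summarize a few baseline variables and the outcome, split by treatment arm.
tbl <- tbl_summary(
  strep_tb,
  by = arm,
  include = c(gender, baseline_condition, improved)
)

tbl
```

Wrapping a snippet like this in reprex::reprex() produces output that can be pasted directly into a GitHub issue.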

Slide 10

Slide 10 text

2nd Ask: Donate datasets
• Do you have access to medical datasets?
  • Randomized controlled trials
  • Cohort studies
  • Case-control studies
• Must be of reasonable size (5 MB limit on CRAN)
• Must be anonymized
  • Fake names, fake study IDs are helpful
• Need a reasonable level of documentation/codebook/publication

Slide 11

Slide 11 text

3rd Ask (opinions in chat): Should I add untidy medical datasets?
• Wide medical data that need pivot_longer()?
• Untidy medical data that need help from {tidyr}?
  • separate(), unite()
  • separate_rows()
  • nest(), unnest()
  • fill(), complete(), replace_na()
• Color-coded medical data that need {tidyxl}?
• Multiheaded medical data that need {unheadr}?
• Messy medical data that need {unpivotr}?
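
To make the first bullet concrete, here is a sketch of what a "wide" teaching dataset might look like and how pivot_longer() would tidy it. The data frame wide_crp and its values are entirely hypothetical, invented for illustration; it is not a dataset in {medicaldata}.

```r
library(tidyr)

# Hypothetical wide data: one row per patient, one column per study visit.
wide_crp <- data.frame(
  patient_id = c("P01", "P02", "P03"),
  crp_week0  = c(12.1, 30.4, 8.7),
  crp_week4  = c(6.3, 18.9, NA),
  crp_week8  = c(2.0, 9.5, 4.1)
)

# Reshape to one row per patient per visit.
long_crp <- pivot_longer(
  wide_crp,
  cols = starts_with("crp_week"),
  names_to = "week",
  names_prefix = "crp_week",                 # strip the prefix, leaving "0", "4", "8"
  names_transform = list(week = as.integer), # store the week as an integer
  values_to = "crp_mg_l"
)

long_crp
```

The missing week 4 value for the third patient is kept as NA, which leaves room for a follow-up exercise with replace_na() or fill().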

Slide 12

Slide 12 text

One Give
• {medicaldata} hex stickers to the first 100 people who send an SASE
  • SASE = self-addressed, stamped envelope
• Important – include sufficient postage on the SASE and the sending envelope!

[Envelope diagram: "STAMP here" in the corner; "Sender (your name and address)", for example 123 Data Street, Medical Center City, State, Country, Postal code]

Fold the SASE into thirds, and put the SASE into a 2nd envelope. Mail this outer envelope (with postage) with SASE enclosed to:

Peter Higgins
SPC 5682
1150 West Medical Center Drive
Michigan Medicine
Ann Arbor, Michigan, 48109 USA

Slide 13

Slide 13 text

Breaking News
• {medicaldata} is available on CRAN as of 16 August, 2021
• You can now install.packages("medicaldata")
• Thanks for your feedback and GitHub issues!
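
A short sketch of installing from CRAN and checking what came with the package. The explicit repos argument is an assumption added so the call also works in a non-interactive session; in an interactive session the plain install.packages("medicaldata") is enough.

```r
# Install from CRAN only if the package is not already present.
if (!requireNamespace("medicaldata", quietly = TRUE)) {
  install.packages("medicaldata", repos = "https://cloud.r-project.org")
}

library(medicaldata)

# List the names of the datasets shipped with the installed version.
dataset_names <- data(package = "medicaldata")$results[, "Item"]
print(dataset_names)
```

The number of names printed depends on the installed version of the package.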

Slide 14

Slide 14 text

Thank You! Please ask questions, provide feedback, and discuss in the chat!