ANALYZING GENOMICS DATA IN R WITH BIOCONDUCTOR
Stephanie Hicks
Assistant Professor, Biostatistics
Johns Hopkins Bloomberg School of Public Health
#rstatsdc Conference, November 8, 2018
ABOUT ME
Teaching: Data Science
Research: Genomics
• R/Bioconductor developer
Other fun things about me:
• Co-founded R-Ladies Baltimore
• Creating a children’s book featuring women statisticians and data scientists
BIOCONDUCTOR
• Open-source, open-development software project
• Began in 2001
• Big priorities: reproducible research and high-quality documentation
  • Vignettes
  • Diverse community support
  • Workflows (super helpful for n00bs)
  • Teaching resources and open development
GENOMIC VERBS/ACTIONS + TIDY DATA = PLYRANGES
• Goal: Write human-readable analysis workflows
• Idea: Define an API (i.e. extend dplyr) that maps relational genomic algebra to “verbs” that act on “tidy” genomic data
• Another great idea: Borrow dplyr’s syntax and design principles
• And another great idea: Compose verbs together with the pipe operator (%>%) from magrittr
Stuart Lee, Di Cook, Michael Lawrence
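To make the "verbs + pipe" idea concrete, here is a minimal sketch of what a plyranges workflow can look like. It assumes plyranges has been installed from Bioconductor (e.g. with BiocManager::install("plyranges")); the peak and gene intervals and the column names score and gene_id are made up purely for illustration and are not from the talk. Each step is one verb: filter() keeps strong peaks, join_overlap_inner() performs the genomic "join" by overlap, and group_by()/summarise() aggregate per gene.

```r
# Assumes: plyranges (Bioconductor) is installed; magrittr supplies %>%
library(plyranges)
library(magrittr)

# Hypothetical toy data: peak intervals with a score column
peaks <- as_granges(data.frame(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  start    = c(100L, 180L, 500L, 300L),
  end      = c(200L, 220L, 600L, 400L),
  score    = c(5, 9, 12, 8)
))

# Hypothetical toy data: gene intervals with an identifier column
genes <- as_granges(data.frame(
  seqnames = c("chr1", "chr2"),
  start    = c(150L, 350L),
  end      = c(550L, 450L),
  gene_id  = c("geneA", "geneB")
))

# Compose verbs with the pipe:
#   1. keep peaks with score > 6
#   2. inner overlap join: keep peaks overlapping a gene, adding gene_id
#   3. per gene, count overlapping peaks and take the highest score
res <- peaks %>%
  filter(score > 6) %>%
  join_overlap_inner(genes) %>%
  group_by(gene_id) %>%
  summarise(n_peaks = length(score), max_score = max(score))

res
```

Written without the pipe, the same analysis becomes a set of nested calls read inside-out; the pipe lets each verb read left to right in the order the analysis happens, which is the human-readability goal stated above.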