Slide 1

Slide 1 text

mine çetinkaya-rundel toolkit for the modern statistician 🔗 bit.ly/modern-toolkit

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

data transformation and tidying with tidyverse

Slide 4

Slide 4 text

tidyverse
opinionated collection of R packages designed for data science
library(tidyverse)
  ggplot2: data visualization
  dplyr: data wrangling
  tidyr: data tidying
  readr: data reading/writing
  forcats: working with factors
  stringr: working with strings
  tibble: modern data frames
  purrr: functional programming
install.packages("tidyverse")
  above + a few more

Slide 5

Slide 5 text

tidyverse
all packages share an underlying design philosophy, grammar, and data structures
  tidy data
  data pipelines with %>%

Slide 6

Slide 6 text

tidy data
  each variable must have its own column
  each observation must have its own row
  each value must have its own cell
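To make the three rules concrete, here is a small sketch with made-up values. The `scores_wide` table and its column names are hypothetical and do not come from the talk. The wide table hides one variable (the year) in its column names; the tidy version gives every variable its own column and every observation its own row.

```r
library(tibble)
library(tidyr)

# Hypothetical data: the column names "2019" and "2020" are really values
# of a variable (year), so this table is not tidy.
scores_wide <- tribble(
  ~student, ~`2019`, ~`2020`,
  "a",      70,      75,
  "b",      82,      88
)

# Tidy version: one column per variable (student, year, score) and
# one row per observation (one student in one year).
scores_tidy <- pivot_longer(
  scores_wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "score"
)

scores_tidy
```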

Slide 9

Slide 9 text

task
I want to find my keys, then start my car, then drive to work, then park my car.

Slide 10

Slide 10 text

nested
park(drive(start_car(find("keys")), to = "work"))

Slide 14

Slide 14 text

piped
find("keys") %>%
  start_car() %>%
  drive(to = "work") %>%
  park()
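The car functions on these slides are illustrative and are not defined anywhere, so a runnable comparison needs real functions. The sketch below assumes only base R plus magrittr (which provides `%>%`) and a made-up vector `x`; it writes the same computation in nested and piped form.

```r
library(magrittr)

# Made-up input for illustration.
x <- c(1, 4, 9, 16)

# Nested: has to be read from the inside out.
nested <- round(mean(sqrt(x)), digits = 1)

# Piped: reads left to right; each result is passed as the first
# argument of the next call.
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(digits = 1)

# Both forms compute the same value.
identical(nested, piped)
```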

Slide 18

Slide 18 text

ex: ggplot2
library(palmerpenguins)
library(tidyverse)
ggplot(data = penguins,
       aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species)) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  )
Visually pleasing defaults!

Slide 19

Slide 19 text

library(palmerpenguins)
library(tidyverse)
ggplot(data = penguins,
       aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species)) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  )
legends for free!

Slide 20

Slide 20 text

customize to your heart’s desire!

Slide 21

Slide 21 text

ggplot(data = penguins,
       aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species),
             size = 3, alpha = 0.8) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  )

Slide 22

Slide 22 text

ggplot(data = penguins,
       aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species),
             size = 3, alpha = 0.8) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  ) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4"))

Slide 23

Slide 23 text

ggplot(data = penguins,
       aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species),
             size = 3, alpha = 0.8) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  ) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  theme_minimal()

Slide 24

Slide 24 text

ggplot(data = penguins,
       aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species),
             size = 3, alpha = 0.8) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  ) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  theme_minimal() +
  theme(
    legend.position = c(0.2, 0.7),
    legend.background = element_rect(fill = "white", color = NA)
  )

Slide 25

Slide 25 text

ex: tidyr
experiment_data
#> # A tibble: 6 x 5
#>   patient group     bp_1    bp_2    bp_3
#> 1       1 treatment 120/80  135/93  125/90
#> 2       2 control   172/105 171/82  161/117
#> 3       3 treatment 140/89  133/92  121/86
#> 4       4 control   151/92  112/109 150/83
#> 5       5 treatment 175/93  173/90  120/118
#> 6       6 control   180/85  173/94  174/106
#> # A tibble: 18 x 5
#>   patient group     measurement systolic diastolic
#> 1       1 treatment 1                120        80
#> 2       1 treatment 2                135        93
#> 3       1 treatment 3                125        90
#> 4       2 control   1                172       105
#> 5       2 control   2                171        82
#> 6       2 control   3                161       117
#> # … with 12 more rows

Slide 26

Slide 26 text

experiment_data %>%
  pivot_longer(
    cols = contains("bp"),
    names_to = "measurement",
    names_prefix = "bp_",
    values_to = "value"
  )
#> # A tibble: 18 x 4
#>   patient group     measurement value
#> 1       1 treatment 1           120/80
#> 2       1 treatment 2           135/93
#> 3       1 treatment 3           125/90
#> 4       2 control   1           172/105
#> 5       2 control   2           171/82
#> 6       2 control   3           161/117
#> # … with 12 more rows
experiment_data
#> # A tibble: 6 x 5
#>   patient group     bp_1    bp_2    bp_3
#> 1       1 treatment 120/80  135/93  125/90
#> 2       2 control   172/105 171/82  161/117
#> 3       3 treatment 140/89  133/92  121/86
#> 4       4 control   151/92  112/109 150/83
#> 5       5 treatment 175/93  173/90  120/118
#> 6       6 control   180/85  173/94  174/106

Slide 27

Slide 27 text

experiment_data %>%
  pivot_longer(
    cols = contains("bp"),
    names_to = "measurement",
    names_prefix = "bp_",
    values_to = "value"
  ) %>%
  separate(value, into = c("systolic", "diastolic"), convert = TRUE)
#> # A tibble: 18 x 5
#>   patient group     measurement systolic diastolic
#> 1       1 treatment 1                120        80
#> 2       1 treatment 2                135        93
#> 3       1 treatment 3                125        90
#> 4       2 control   1                172       105
#> 5       2 control   2                171        82
#> 6       2 control   3                161       117
#> # … with 12 more rows
#> # A tibble: 18 x 4
#>   patient group     measurement value
#> 1       1 treatment 1           120/80
#> 2       1 treatment 2           135/93
#> 3       1 treatment 3           125/90
#> 4       2 control   1           172/105
#> 5       2 control   2           171/82
#> 6       2 control   3           161/117
#> # … with 12 more rows
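The slides print `experiment_data` but never show how it was created. The sketch below rebuilds it with `tribble()` from the printed values and then runs the same two-step pipeline. The column types are an assumption, because the type row of the printed tibble did not survive, and `tidy_bp` is a name introduced here for illustration.

```r
library(tibble)
library(tidyr)

# Rebuilt from the values printed on the slides; column types are assumed.
experiment_data <- tribble(
  ~patient, ~group,      ~bp_1,     ~bp_2,     ~bp_3,
  1,        "treatment", "120/80",  "135/93",  "125/90",
  2,        "control",   "172/105", "171/82",  "161/117",
  3,        "treatment", "140/89",  "133/92",  "121/86",
  4,        "control",   "151/92",  "112/109", "150/83",
  5,        "treatment", "175/93",  "173/90",  "120/118",
  6,        "control",   "180/85",  "173/94",  "174/106"
)

tidy_bp <- experiment_data %>%
  # Step 1: one row per patient and measurement occasion.
  pivot_longer(
    cols = contains("bp"),
    names_to = "measurement",
    names_prefix = "bp_",
    values_to = "value"
  ) %>%
  # Step 2: split "120/80" into two columns; convert = TRUE turns the
  # resulting strings into integers.
  separate(value, into = c("systolic", "diastolic"), convert = TRUE)

tidy_bp
```

`separate()` splits on any run of non-alphanumeric characters by default, which is why no `sep` argument is needed for the `/`. Recent tidyr releases mark `separate()` as superseded in favour of `separate_wider_delim()`, though it continues to work.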

Slide 28

Slide 28 text

modeling and machine learning with tidymodels

Slide 29

Slide 29 text

tidymodels
collection of packages for modeling and machine learning using tidyverse principles
  parsnip: unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages
  recipes: tidy interface to data preprocessing tools for feature engineering
  rsample: efficient resampling for estimation and model evaluation
  “many models” in a single data frame to avoid environment clutter and easy access with helper functions
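As a rough sketch of what the unified parsnip interface looks like in practice, the following declares a linear regression specification, picks the `lm` engine, and fits it. The model, the formula, and the built-in `mtcars` data are assumptions chosen for illustration; none of them come from the talk.

```r
library(parsnip)
library(magrittr)

# Specification: the kind of model, declared separately from the
# package that will actually fit it.
spec <- linear_reg() %>%
  set_engine("lm")

# Fit the specification to data with an ordinary formula.
lm_fit <- spec %>%
  fit(mpg ~ wt, data = mtcars)

# Predictions come back as a tibble with a .pred column.
preds <- predict(lm_fit, new_data = data.frame(wt = 3))
preds
```

Trying a different fitting package for the same kind of model would mean changing only the `set_engine()` call, provided that engine's package is installed; the `fit()` and `predict()` calls stay the same.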

Slide 30

Slide 30 text

a vast tidy ecosystem

Slide 31

Slide 31 text

these are just some of my favourite packages!
work with data pipelines
work with ggplot2 layers
  laying out multiple plots
  gghighlight: highlighting data in ggplots
  pretty (complex) tables for PDF output
  data cleaning

Slide 32

Slide 32 text

share and communicate with rmarkdown

Slide 33

Slide 33 text

rmarkdown
create computational documents that knit together text, code, results, and figures into polished outputs that are easy to read and share
reproducible by default
  bookdown: and make them into books…
  xaringan: and make them into slides…
  blogdown / distill: and make them into websites…
  rticles: and make them into manuscripts…
  …

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

interact with shiny

Slide 36

Slide 36 text

minecr.shinyapps.io/penguins

Slide 37

Slide 37 text

calcat.covid19.ca.gov/cacovidmodels

Slide 38

Slide 38 text

version control and collaborate with git and github

Slide 39

Slide 39 text

Git xkcd.com/1597

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

GitHub
web hosting for projects version controlled with Git
collaboration and project management
discoverability and publishing (with ghpages)
where the technical side of the R community lives:
  look for code samples
  make feature requests
  contribute to packages

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

stay current and connected with #rstats community

Slide 46

Slide 46 text

ask (good) questions
make reproducible examples
make them as minimal as you can
if asking publicly (RStudio Community, Stack Overflow, etc.), try to use data available in a package
let reprex take care of checking for reproducibility and formatting for you!

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

community
#rstats on Twitter
R Weekly newsletter: rweekly.org
TidyTuesday: github.com/rfordatascience/tidytuesday
RLadies: rladies.org + community Slack
useR groups: r-consortium.org/blog/2019/09/09/r-community-explorer-r-user-groups
talk to each other (including your students!) about computing

Slide 51

Slide 51 text

resources
learn:
  tidyverse: tidyverse.org/learn
  tidymodels: tidymodels.org/start
  rmarkdown: rmarkdown.rstudio.com/lesson-1.html
  RStudio visual editor: rstudio.github.io/visual-markdown-editing/#
  shiny: shiny.rstudio.com/tutorial
  Git and GitHub: happygitwithr.com
teach: datasciencebox.org

Slide 52

Slide 52 text

toolkit for the modern statistician
🔗 bit.ly/modern-toolkit
mine-cetinkaya-rundel
[email protected]
@minebocek