July 12, 2018

# Should all statistics students be programmers?

A presentation at ICOTS 10 (Kyoto, Japan)

## Transcript

Should all statistics students be programmers? (July 2018) No!

Should all statistics students program? Yes!

4. ### What should a statistics student be able to do?

Import; Tidy (store data consistently); Transform (create new variables & new summaries); Visualise (surprises, but doesn't scale); Model (scales, but doesn't (fundamentally) surprise); Communicate; Automate.

Transform, Visualise and Model together make up the "Understand" part of the workflow.

7. ### Why is programming preferable for statistics?

1. Code is text
2. Code is readable
3. Code is shareable
4. Code is open

8. ### Code is text

And this provides for two extremely powerful techniques.

14. ### A small example

```r
library(tidycensus)

# Thanks to Kyle Walker (@kyle_e_walker)
# for package and example
geo <- get_acs(
  geography = "metropolitan statistical area...",
  variables = "DP03_0021PE",
  summary_var = "B01003_001",
  survey = "acs1",
  # `endyear` as on the slide; newer tidycensus releases appear to call this `year`
  endyear = 2016
)
```
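15. ### Followed by data munging

```r
library(tidyverse) # provides %>%, filter(), separate(), str_extract(), na_if()

big_metro <- geo %>%
  # keep metro areas with more than 2 million people
  filter(summary_est > 2e6) %>%
  select(-variable) %>%
  mutate(
    NAME = gsub(" Metro Area", "", NAME)
  ) %>%
  separate(NAME, c("city", "state"), ", ") %>%
  mutate(
    # keep only the leading run of letters and spaces
    city = str_extract(city, "^[A-Za-z ]+"),
    state = str_extract(state, "^[A-Za-z ]+"),
    name = paste0(city, ", ", state),
    # -555555555 marks a missing margin of error
    summary_moe = na_if(summary_moe, -555555555)
  )
```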
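To see what the name-cleaning steps do, here is a minimal sketch that runs the same calls on a tiny made-up table. The table `toy` and its values are assumptions for illustration only (they are not the ACS data from the talk); the two `NAME` strings merely imitate the "… Metro Area" pattern the slide's code expects.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for `geo`; the numbers are invented
toy <- tibble(
  NAME = c("Austin-Round Rock, TX Metro Area", "St. Louis, MO-IL Metro Area"),
  summary_moe = c(1200, -555555555)
)

cleaned <- toy %>%
  mutate(NAME = gsub(" Metro Area", "", NAME)) %>%
  separate(NAME, c("city", "state"), ", ") %>%
  mutate(
    # the regex stops at the first character that is not a letter or space
    city = str_extract(city, "^[A-Za-z ]+"),
    state = str_extract(state, "^[A-Za-z ]+"),
    name = paste0(city, ", ", state),
    summary_moe = na_if(summary_moe, -555555555)
  )

cleaned$name
# "Austin, TX" "St, MO"
```

Because the pattern stops at the period in "St.", the second name comes out as "St, MO", which would explain the label of that form in the plot two slides later.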
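16. ### `big_metro %>% ggplot(...)`

```r
big_metro %>%
  ggplot(aes(
    x = estimate,
    y = reorder(name, estimate)
  )) +
  geom_errorbarh(
    aes(
      xmin = estimate - moe,
      xmax = estimate + moe
    ),
    # the slide passes `width`; geom_errorbarh() documents `height` for the cap size
    height = 0.1
  ) +
  geom_point(color = "navy")
```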
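The `reorder(name, estimate)` call is what sorts the y-axis by value instead of alphabetically. A minimal base R sketch of that behaviour, using three made-up rows (the names and estimates below are invented for illustration):

```r
# Hypothetical values, not taken from the ACS data
name <- c("Boston, MA", "Austin, TX", "Denver, CO")
estimate <- c(13.4, 2.1, 4.5)

# reorder() returns a factor whose levels are sorted by `estimate`
ordered_name <- reorder(name, estimate)

levels(ordered_name)
# "Austin, TX" "Denver, CO" "Boston, MA"
```

ggplot2 draws the first factor level at the bottom of a discrete y-axis, so the smallest estimate ends up at the bottom and the largest at the top.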
17. ### [Figure: dot plot with horizontal error bars, one point per metro area. x-axis: `estimate` (0 to 30); y-axis: `reorder(name, estimate)`.]

y-axis labels, in extracted order: Indianapolis, IN; Kansas City, MO; Riverside, CA; Charlotte, NC; Dallas, TX; Tampa, FL; Detroit, MI; Columbus, OH; Phoenix, AZ; Cincinnati, OH; Houston, TX; Orlando, FL; Sacramento, CA; Austin, TX; San Antonio, TX; San Juan, PR; St, MO; San Diego, CA; Atlanta, GA; Cleveland, OH; Las Vegas, NV; Miami, FL; Denver, CO; Minneapolis, MN; Los Angeles, CA; Pittsburgh, PA; Baltimore, MD; Portland, OR; Philadelphia, PA; Seattle, WA; Chicago, IL; Boston, MA; Washington, DC; San Francisco, CA; New York, NY.
18. ### What does this code do?

```r
library(tidyverse)
library(magick)

dir(pattern = ".png") %>%
  map(image_read) %>%
  image_join() %>%
  image_animate(fps = 1, loop = 25) %>%
  image_write("my_animation.gif")
```

And hence you can read unfamiliar code.

https://twitter.com/ricardokriebel/status/849626401611411458
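Part of what makes such a pipeline readable is that `%>%` hands each result to the next call, so the steps appear in the order they run. A minimal sketch of that equivalence, with ordinary arithmetic functions standing in for the image functions (the numbers are arbitrary and chosen only for illustration):

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested form: has to be read from the inside out
nested <- round(sqrt(sum(x)), 1)

# Piped form: reads top to bottom, one step per line
piped <- x %>%
  sum() %>%
  sqrt() %>%
  round(1)

identical(nested, piped)
# TRUE
```

Both forms compute the same value; only the reading order differs.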

22. ### Why is sharing so important?

- Learn from others
- Open science
- Easily critique

24. ### All modern programming languages are open source

Free: students can use the same tools as practitioners; anyone can use the best tools regardless of wealth; anyone can re-run your analysis.

Fluid: you can fix problems; you can build your own tools.

26. ### Why is programming preferable for statistics?

1. Code is text
2. Code is readable
3. Code is shareable
4. Code is open