Slide 1

Slide 1 text

DDTX 2018
PILGRIM’S PROGRESS: a journey from confusion to contribution
http://bit.ly/mara-ddtx

Slide 2

Slide 2 text

Mara Averick, TIDYVERSE DEV ADVOCATE, RSTUDIO

Slide 3

Slide 3 text

Mara Averick, TIDYVERSE DEV ADVOCATE, RSTUDIO
Not a Real Data Scientist™

Slide 5

Slide 5 text

Mara Averick, TIDYVERSE DEV ADVOCATE, RSTUDIO
Less true these days!

Slide 6

Slide 6 text

An aside on the title
“Like many social groups that do not reproduce themselves biologically, the experimental particle physics community renews itself by training novices.”
— Sharon Traweek, “Pilgrim's Progress: Male Tales Told During a Life in Physics.” In Beamtimes and Lifetimes: The World of High Energy Physics. (1988). Cambridge, MA: Harvard University Press.

Slide 7

Slide 7 text

SCIENCE & SOCIETY

Slide 8

Slide 8 text

An aside on the title

Slide 9

Slide 9 text

An aside on the title: confusion

Slide 10

Slide 10 text

my jouRney...

Slide 12

Slide 12 text

my jouRney...
“OMG I just learned a thing!!”
100% selfish

Slide 13

Slide 13 text

but...

Slide 15

Slide 15 text

but...
[Diagram labels: things that are selfish; things that are useful to other people; FOSS happy place]

Slide 16

Slide 16 text

ex•o•ter•ic
adj. understandable by outsiders or the general public

Slide 19

Slide 19 text

you never know...

Slide 21

Slide 21 text

This Talk Will Not Cover

Slide 22

Slide 22 text

R: a computer language for scientists
[Diagram labels: Human thought; Machine language; C++]
CC by RStudio, via Garrett Grolemund

Slide 24

Slide 24 text

What about this so-called tidyverse?
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
Source: https://www.tidyverse.org/

Slide 25

Slide 25 text

TIDY TOOLS
Functions should be...
SIMPLE: Do one thing and do it well.
COMPOSABLE: Combine with other functions for multi-step operations.
DESIGNED FOR HUMANS: Use evocative verb names, making them easy to remember.
Source: Wickham, Hadley. 2017-11-13. “The tidy tools manifesto.” https://cran.r-project.org/web/packages/tidyverse/vignettes/manifesto.html

Slide 26

Slide 26 text

FUNCTION EXAMPLES
filter(.data, …): Extract rows that meet logical criteria. Also filter_(). Example: filter(iris, Sepal.Length > 7)
top_n(x, n, wt): Select and order top n entries (by group if grouped data). Example: top_n(iris, 5, Sepal.Width)
Source: Data Transformation with dplyr cheat sheet. CC BY SA RStudio
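The two cheat-sheet calls above can be run as written against the built-in iris data. This is a minimal sketch, assuming dplyr is installed; the variable names long_sepals and widest are illustrative and not from the talk. Current dplyr releases mark top_n() as superseded in favour of slice_max(), though it still works.

```r
library(dplyr)

# Keep only the rows where the logical condition holds.
long_sepals <- filter(iris, Sepal.Length > 7)

# Keep the rows with the 5 largest Sepal.Width values.
# Ties are kept, so more than 5 rows may come back.
widest <- top_n(iris, 5, Sepal.Width)

nrow(long_sepals)
nrow(widest)
```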

Slide 27

Slide 27 text

COMPOSE WITH THE PIPE
iris %>% filter(Sepal.Length > 7) %>% top_n(5, Sepal.Width)
Source: Data Transformation with dplyr cheat sheet. CC BY SA RStudio
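To make the composition concrete, here is a small sketch comparing the piped form with the equivalent nested call. It assumes dplyr is installed (dplyr re-exports the %>% operator from magrittr); the names nested and piped are illustrative.

```r
library(dplyr)

# Nested form: has to be read from the inside out.
nested <- top_n(filter(iris, Sepal.Length > 7), 5, Sepal.Width)

# Piped form: reads left to right; each result becomes the
# first argument of the next call.
piped <- iris %>%
  filter(Sepal.Length > 7) %>%
  top_n(5, Sepal.Width)

# Both forms compute the same thing.
identical(nested, piped)
```

The pipe changes only how the code reads, which is the "composable" and "designed for humans" part of the tidy tools idea.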

Slide 30

Slide 30 text

TIDY DATA
VARIABLES IN COLUMNS, OBSERVATIONS IN ROWS & VALUES IN CELLS
Source: Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23. doi:http://dx.doi.org/10.18637/jss.v059.i10.
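As an illustration of the three tidy data rules, the sketch below reshapes a small made-up table whose column headers are really values of a variable (years). It assumes tidyr 1.0.0 or later, which provides pivot_longer(); the data and the column names country, year and cases are hypothetical.

```r
library(tidyr)

# Untidy: the year is stored in the column headers, not in a column.
untidy <- data.frame(
  country = c("A", "B"),
  `1999`  = c(10, 20),
  `2000`  = c(15, 25),
  check.names = FALSE
)

# Tidy: one column per variable, one row per observation.
tidy <- pivot_longer(
  untidy,
  cols      = c("1999", "2000"),
  names_to  = "year",
  values_to = "cases"
)

tidy
```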

Slide 31

Slide 31 text

[Diagram: tidyverse packages by stage of the data science workflow]
Import: readr, readxl, haven, xml2
Tidy: tibble, tidyr
Transform: dplyr, forcats, hms, lubridate, stringr
Visualise: ggplot2
Model: broom, modelr
Communicate: shiny, rmarkdown
Program: purrr, magrittr
Source: Hadley Wickham

Slide 32

Slide 32 text

install.packages("tidyverse")

Slide 33

Slide 33 text

library(tidyverse)

Slide 34

Slide 34 text

TIDYVERSE PACKAGES: THE CORE
TIDYVERSE: a wrapper package that makes it easy to install and load core packages from the tidyverse in a single command
READR: a fast and friendly way to read in and parse rectangular data (like csv, tsv, and fwf)
TIDYR: a set of verbs that help you get to tidy data, allowing you to work with other tidyverse packages and store results
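A short sketch of what "read in and parse rectangular data" looks like with readr. It assumes readr is installed and feeds read_csv() a literal string instead of a file so the example stays self-contained; the data and the names csv_text and scores are made up.

```r
library(readr)

# Literal CSV text (a string containing newlines is treated as data,
# not as a file path).
csv_text <- "name,score\nada,90\ngrace,95\n"

# "cd" declares the column types: character, then double.
scores <- read_csv(csv_text, col_types = "cd")

scores
```

Declaring col_types up front is optional; without it readr guesses the types and reports what it guessed.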

Slide 35

Slide 35 text

TIDYVERSE PACKAGES: THE CORE
TIBBLE: a modern reimagining of data.frames that do less and complain more, forcing you to confront problems earlier
DPLYR: a grammar of data manipulation with a set of verbs to solve common data wrangling problems
GGPLOT2: a system for declaratively creating graphics, based on The Grammar of Graphics
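The sketch below illustrates two of these descriptions: a tibble refusing to guess at a partial column name, and dplyr verbs summarising by group. It assumes tibble and dplyr are installed; the names flowers and species_means are illustrative.

```r
library(tibble)
library(dplyr)

flowers <- as_tibble(iris)

# "Do less and complain more": a plain data.frame silently
# partial-matches Sepal.L to Sepal.Length, while a tibble
# warns and returns NULL.
from_data_frame <- iris$Sepal.L
from_tibble <- suppressWarnings(flowers$Sepal.L)

# "A grammar of data manipulation": group, then summarise.
species_means <- flowers %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))

species_means
```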

Slide 36

Slide 36 text

TIDYVERSE PACKAGES: THE CORE
STRINGR: a cohesive set of functions designed to make working with strings as easy as possible
FORCATS: a suite of useful tools that solve common problems with factors, which R uses to handle categorical variables
PURRR: a consistent toolkit for enhancing R’s functional programming, and working with functions and vectors
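One small call per package gives a feel for each description. This is a sketch, assuming stringr, forcats and purrr are installed; the inputs and variable names are made up for illustration.

```r
library(stringr)
library(forcats)
library(purrr)

# stringr: consistent str_* helpers for strings.
shouted <- str_to_upper("tidy")
has_plyr <- str_detect(c("dplyr", "ggplot2"), "plyr")

# forcats: reorder factor levels by how often each one occurs.
by_freq <- fct_infreq(factor(c("b", "a", "b", "c", "b", "a")))
levels(by_freq)

# purrr: apply a function to each element and get a typed vector back.
squares <- map_dbl(1:3, ~ .x^2)
squares
```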

Slide 37

Slide 37 text

TIDYVERSE PACKAGES: SOME NON-CORE
MAGRITTR: offers a set of operators (e.g. %>%) which make code more readable by structuring sequences of operations
READXL: makes it easy to get data out of Excel and into R, and work with tabular data in R
LUBRIDATE: provides robust methods for working with date-times in R, and functionality not offered in base R
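As a brief illustration of the lubridate description, the sketch below parses a date from text and does simple date arithmetic. It assumes lubridate is installed; the date itself is an arbitrary example, not one taken from the talk.

```r
library(lubridate)

# Parse a year-month-day string into a Date.
some_day <- ymd("2018-01-15")  # arbitrary example date

# Add a period of seven days.
a_week_later <- some_day + days(7)

# Pull out individual components.
year(some_day)
month(some_day)
a_week_later
```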

Slide 38

Slide 38 text

TIDYVERSE PACKAGES: SOME MORE NON-CORE
HMS: provides a simple class for storing durations or time-of-day values
BROOM: takes untidy model outputs of predictions and estimations to the tidy data we want to work with
HAVEN: enables R to read and write various data formats used by other statistical packages
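To show what "untidy model outputs to tidy data" means for broom, here is a minimal sketch that fits a linear model on the built-in iris data and turns the coefficient table into a data frame. It assumes broom is installed; the model and the names fit and coefs are illustrative only.

```r
library(broom)

# A base R model object: useful, but awkward to work with as data.
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

# tidy() returns one row per model term, with the estimates in columns.
coefs <- tidy(fit)

coefs
```

Because the result is an ordinary data frame, it can be passed straight to dplyr or ggplot2.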

Slide 39

Slide 39 text

TIDYVERSE PACKAGES: SOME MORE NON-CORE
GOOGLEDRIVE: allows you to interact with files on Google Drive from R
RMARKDOWN: an authoring framework for data science that allows you to combine prose, code, and output
SHINY: makes it easy to build interactive web apps straight from R

Slide 40

Slide 40 text

CONTRIBUTING TO THE tidyverse

Slide 41

Slide 41 text

Contributing to FOSS
Source: Pintscher, Lydia, Ed. 2012. Open Advice: FOSS: What We Wish We Had Known When We Started.
photo cred: Sail Fish Scuba https://sailfishscuba.com/manowar/

Slide 46

Slide 46 text

Contributing to FOSS
WHAT HOLDS PEOPLE BACK?
• “I can't write code.”
• “I'm not really good at this.”
• “I'd just be a burden.”
• “They already have enough people smarter than me.”
Source: Pintscher, Lydia, Ed. 2012. Open Advice: FOSS: What We Wish We Had Known When We Started.
photo cred: Sail Fish Scuba https://sailfishscuba.com/manowar/

Slide 48

Slide 48 text

Luckily...
[Diagram labels: CONFUSION; CONTRIBUTION; FOSS happy place]
Otherwise I'd be out of a job

Slide 49

Slide 49 text

Ask questions
“The most useless problem statement that one can face is ‘it doesn’t work’, yet we seem to get it far too often.” – Thiago Macieira
Source: Macieira, Thiago. 2012. “The Art of Problem Solving.” In Open Advice: FOSS: What We Wish We Had Known When We Started, edited by Lydia Pintscher, 55–61.

Slide 50

Slide 50 text

The newcomer's paradox...
“When you ask for help, some friendly soul will no doubt tell you that ‘it’s easy, just do foo, bar and baz.’ Except for you, it is not easy, there may be no documentation for foo, bar is not doing what it is supposed to be doing and what is this baz thing anyway with its eight disambiguation entries on Wikipedia?” — Leslie Hawthorn
Source: Hawthorn, Leslie. 2012. “You’ll Eventually Know Everything They’ve Forgotten.” In Open Advice: FOSS: What We Wish We Had Known When We Started, edited by Lydia Pintscher, 29–32.

Slide 51

Slide 51 text

https://animagraffs.com/how-a-car-engine-works/

Slide 54

Slide 54 text

where to ask: StackOverflow, Twitter, RStudio Community

Slide 55

Slide 55 text

the magic of reprex (reproducible example)

Slide 56

Slide 56 text

reprex raison d’être

Slide 57

Slide 57 text

Keys to reprex-cellence
✓ Code that actually runs
✓ Code that doesn't have to be run
✓ Code that can be easily run
Source: Jenny Bryan, 2017. "reprex: the package, the point." https://speakerdeck.com/jennybc/reprex-help-me-help-you
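The three keys above can be sketched with a snippet that loads its own packages and builds its own tiny data set, so it runs in a fresh R session. This assumes dplyr is installed; the data and names are hypothetical. The reprex() call is left as a comment because, when called with no arguments, it reads code from the clipboard and renders it together with its output.

```r
# Self-contained: loads what it needs and creates its own data,
# so anyone can paste it into a clean session and run it.
library(dplyr)

scores <- data.frame(
  name  = c("ada", "grace", "ada"),
  score = c(90, 95, 85)
)

result <- scores %>%
  group_by(name) %>%
  summarise(mean_score = mean(score))

result

# To render this snippet with its output for posting,
# copy it to the clipboard and run:
# reprex::reprex()
```

The rendered output is what lets a reader see the behaviour without running anything ("code that doesn't have to be run"), while the snippet itself stays easy to copy and run.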

Slide 58

Slide 58 text

Live demo?
Source: Nick Tierney. "Magic reprex." 2017-01-11

Slide 59

Slide 59 text

Nailing those reprexes?
• Help others ask questions.
• Answer questions.
• Write about it.

Slide 60

Slide 60 text

where to answer: RStudio Community, StackOverflow, GitHub

Slide 61

Slide 61 text

File issues
Remember, behind every octocat there is an actual human…

Slide 62

Slide 62 text

the anatomy of an issue: PROBLEM DESCRIPTION, REPREX, EXPECTED BEHAVIOUR

Slide 63

Slide 63 text

Contribute documentation
“Innocence lost is not easily regained. The designer simply cannot predict the problems people will have, the misinterpretations that will arise, and the errors that will get made.” — Donald Norman, The Design of Everyday Things
Hawthorn, Leslie. 2012. “You’ll Eventually Know Everything They’ve Forgotten.” In Open Advice: FOSS: What We Wish We Had Known When We Started, edited by Lydia Pintscher, 29–32.

Slide 64

Slide 64 text

My first “contribution”

Slide 67

Slide 67 text

every contribution counts...
PULL REQUESTS, ISSUES, COMMENTS

Slide 68

Slide 68 text

tpyos
[Flowchart: “You have a typo in your documentation” → “Can you fix it?” → Yes: “Send me a pull request”; No: “Learn Git”]
Adapted from: You Do Not Need to Tell Me I Have A Typo in My Documentation by Yihui Xie

Slide 69

Slide 69 text

typos
[Flowchart: “You have a typo in your documentation” → “Can you fix it?” → Yes: “Send me a pull request”; No: “Learn Git”]
Adapted from: You Do Not Need to Tell Me I Have A Typo in My Documentation by Yihui Xie

Slide 70

Slide 70 text

[Flowchart labels: See typo on pkgdown site; Ignore; Write strongly-worded letter; Go to GitHub; File an issue; Look in folders; Search repo]

Slide 71

Slide 71 text

Source code?!

Slide 74

Slide 74 text

Go to the source...

Slide 75

Slide 75 text

require(n00bs)

Slide 77

Slide 77 text

BABY STEPS WITH MORE PACKAGES
ROXYGEN2: generates documentation from specially-formatted comments, used by all tidyverse packages
DEVTOOLS: makes package development easier by providing R functions that simplify common tasks
TESTTHAT: provides functions that make it easy to create unit tests for R packages, used throughout tidyverse
Source: roxygen2: In-Line Documentation for R. R package v 6.0.1. by Hadley Wickham, Peter Danenberg and Manuel Eugster (2017). https://CRAN.R-project.org/package=roxygen2
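To show what "specially-formatted comments" and a unit test look like side by side, here is a minimal sketch. It assumes testthat is installed; add_one() is a hypothetical function invented for this example. In a real package the roxygen2 comments would be turned into help files by a documentation step such as devtools::document(), which is not run here.

```r
library(testthat)

#' Add one to a number
#'
#' @param x A numeric vector.
#' @return A numeric vector the same length as `x`, with 1 added.
#' @export
add_one <- function(x) {
  x + 1
}

# A unit test: a description plus one or more expectations.
test_that("add_one() increments its input", {
  expect_equal(add_one(1), 2)
  expect_equal(add_one(c(1, 2)), c(2, 3))
})
```

The lines starting with #' are ordinary R comments, which is why fixing a typo in documentation usually means editing them in the source file.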

Slide 78

Slide 78 text

Hints for happy contributing in the tidyverse
GET THE PULSE OF A PROJECT
WATCH THE REPO
READ THE CODE
DISCUSS YOUR IDEAS

Slide 79

Slide 79 text

leaRn out loud
OMG, I just learned a thing!

Slide 80

Slide 80 text

give back

Slide 81

Slide 81 text

embrace it...
[Diagram labels: CONFUSION; CONTRIBUTION; FOSS happy place]

Slide 82

Slide 82 text

Thank You
http://bit.ly/mara-ddtx
