Pilgrim's Progress: a journey from confusion to contribution

Talk given at Data Day Texas 2018
PDF with links and resources available at: http://bit.ly/mara-ddtx

Mara Averick

January 27, 2018

Transcript

  1. DDTX 2018: PILGRIM’S PROGRESS

    a journey from confusion to contribution. http://bit.ly/mara-ddtx
  3. Mara Averick

    TIDYVERSE DEV ADVOCATE, RSTUDIO. Not a Real Data Scientist™
  5. Mara Averick

    TIDYVERSE DEV ADVOCATE, RSTUDIO. Less true these days!
  6. An aside on the title

    “Like many social groups that do not reproduce themselves biologically, the experimental particle physics community renews itself by training novices.” (Sharon Traweek, “Pilgrim's Progress: Male Tales Told During a Life in Physics,” in Beamtimes and Lifetimes: The World of High Energy Physics. 1988. Cambridge, MA: Harvard University Press.)
  7. my jouRney...

    OMG I just learned a thing!!! 100% selfish
  8. but...
  10. things that are selfish, things that are useful to other people

    but... where things that are selfish overlap with things that are useful to other people: the FOSS happy place
  11. ex•o•ter•ic

    adj. understandable by outsiders or the general public
  14. This Talk Will Not Cover
  15. R: a computer language for scientists

    Diagram: a scale running from human thought to machine language, with R and C++ placed along it. CC BY RStudio, via Garrett Grolemund.
  16. What about this so-called tidyverse?

    “The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.” Source: https://www.tidyverse.org/
  17. TIDY TOOLS

    Functions should be...
    • SIMPLE: Do one thing and do it well.
    • COMPOSABLE: Combine with other functions for multi-step operations.
    • DESIGNED FOR HUMANS: Use evocative verb names, making them easy to remember.
    Source: Wickham, Hadley. 2017-11-13. “The tidy tools manifesto.” https://cran.r-project.org/web/packages/tidyverse/vignettes/manifesto.html
  18. FUNCTION EXAMPLES

    • filter(.data, …): Extract rows that meet logical criteria. Also filter_(). Example: filter(iris, Sepal.Length > 7)
    • top_n(x, n, wt): Select and order top n entries (by group if grouped data). Example: top_n(iris, 5, Sepal.Width)
    Source: Data Transformation with dplyr cheat sheet. CC BY SA RStudio <https://www.rstudio.com/resources/cheatsheets/>
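A minimal sketch of the two cheat-sheet calls, assuming a current dplyr installation. One caveat worth knowing: top_n() keeps every row tied at the cutoff, so it can return more than n rows, and since dplyr 1.0.0 it has been superseded by slice_max(), which is included here for comparison.

```r
library(dplyr)

# filter(): keep only the rows whose sepal length exceeds 7 cm
long_sepals <- filter(iris, Sepal.Length > 7)

# top_n(): keep the rows with the 5 largest Sepal.Width values;
# rows tied at the cutoff are all kept, so more than 5 rows can come back
widest <- top_n(iris, 5, Sepal.Width)

# slice_max() is the newer equivalent (dplyr >= 1.0.0);
# with_ties = TRUE (the default) also keeps tied rows
widest_new <- slice_max(iris, Sepal.Width, n = 5)
```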
  19. COMPOSE WITH THE PIPE

    iris %>% filter(Sepal.Length > 7) %>% top_n(5, Sepal.Width)
    Source: Data Transformation with dplyr cheat sheet. CC BY SA RStudio <https://www.rstudio.com/resources/cheatsheets/>
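To see what the pipe buys you, this sketch writes the same two steps twice: once as nested calls and once with %>% (which dplyr re-exports from magrittr). Both give the same result; the piped version simply reads in the order the operations happen.

```r
library(dplyr)  # re-exports %>% from magrittr

# Nested: read from the inside out (filter first, then top_n)
nested <- top_n(filter(iris, Sepal.Length > 7), 5, Sepal.Width)

# Piped: each step's result becomes the first argument of the next call
piped <- iris %>%
  filter(Sepal.Length > 7) %>%
  top_n(5, Sepal.Width)

identical(nested, piped)
```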
  22. TIDY DATA

    VARIABLES IN COLUMNS, OBSERVATIONS IN ROWS & VALUES IN CELLS
    Source: Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23. doi:10.18637/jss.v059.i10.
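A small, hypothetical illustration of the three rules: the table below stores one variable (the year) in its column names. tidyr's pivot_longer() (available since tidyr 1.0.0; the older gather() did the same job) moves that variable into its own column, so each row becomes one country-year observation. The country labels and counts are made up.

```r
library(tidyr)

# Hypothetical untidy table: the year variable is spread across column names
untidy <- data.frame(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(15, 25),
  check.names = FALSE  # keep the numeric column names as-is
)

# Tidy version: one column per variable (country, year, cases),
# one row per observation, one value per cell
tidy_df <- pivot_longer(
  untidy,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases"
)
tidy_df
```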
  23. Import, Tidy, Transform, Visualise, Model, Communicate, Program

    Diagram of these data science stages, labelled with the packages tibble, tidyr, purrr, magrittr, dplyr, forcats, hms, ggplot2, broom, modelr, readr, readxl, haven, xml2, shiny, rmarkdown, lubridate, and stringr. Source: Hadley Wickham
  24. TIDYVERSE PACKAGES: THE CORE

    • TIDYVERSE: a wrapper package that makes it easy to install and load core packages from the tidyverse in a single command
    • TIDYR: a set of verbs that help you get to tidy data, allowing you to work with other tidyverse packages and store results
    • READR: a fast and friendly way to read in and parse rectangular data (like csv, tsv, and fwf)
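As a quick sketch of the read-in step, the code below writes a tiny hypothetical CSV to a temporary file and reads it back with readr's read_csv(). It assumes readr 2.0 or later, where show_col_types = FALSE silences the column-specification message. Loading the wrapper with library(tidyverse) would attach readr along with the other core packages.

```r
library(readr)

# Hypothetical data written to a temporary CSV file
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1", "b,2"), path)

# read_csv() guesses column types: name -> character, score -> double
scores <- read_csv(path, show_col_types = FALSE)
scores
```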
  25. TIDYVERSE PACKAGES: THE CORE

    • TIBBLE: a modern reimagining of data.frames that do less and complain more, forcing you to confront problems earlier
    • DPLYR: a grammar of data manipulation with a set of verbs to solve common data wrangling problems
    • GGPLOT2: a system for declaratively creating graphics, based on The Grammar of Graphics
  26. TIDYVERSE PACKAGES: THE CORE

    • STRINGR: a cohesive set of functions designed to make working with strings as easy as possible
    • FORCATS: a suite of useful tools that solve common problems with factors, which R uses to handle categorical variables
    • PURRR: a consistent toolkit that enhances R’s functional programming support for working with functions and vectors
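A short sketch of one everyday call from each of these three packages. The fruit and size vectors are made up for illustration.

```r
library(stringr)
library(forcats)
library(purrr)

# stringr: consistent str_* functions for working with strings
fruit_names <- c("apple", "banana", "cherry")
has_an <- str_detect(fruit_names, "an")  # which strings contain "an"?
shouting <- str_to_upper(fruit_names)

# forcats: reorder the levels of a factor (a categorical variable)
sizes <- factor(c("small", "large", "medium", "small"))
by_order <- fct_relevel(sizes, "small", "medium", "large")  # explicit order
by_count <- fct_infreq(sizes)                               # most frequent first

# purrr: apply a function to each element and get a typed vector back
means <- map_dbl(list(1:3, 4:6), mean)
```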
  27. TIDYVERSE PACKAGES: SOME NON-CORE

    • MAGRITTR: offers a set of operators (e.g. %>%) that make code more readable by structuring sequences of operations
    • READXL: makes it easy to get data out of Excel and into R, and work with tabular data in R
    • LUBRIDATE: provides robust methods for working with date-times in R, including functionality not offered in base R
  28. TIDYVERSE PACKAGES: SOME MORE NON-CORE

    • HMS: provides a simple class for storing durations or time-of-day values
    • BROOM: turns untidy model outputs (predictions and estimates) into the tidy data we want to work with
    • HAVEN: enables R to read and write various data formats used by other statistical packages
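For broom, a minimal sketch: a fitted linear model is not a data frame, but broom's tidy() returns one row per coefficient, so the results can flow straight into dplyr or ggplot2. The model (mpg ~ wt on the built-in mtcars data) is only an example.

```r
library(broom)

# Fit a simple linear model on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# tidy(): one row per term, with estimate, std.error, statistic, p.value
coefs <- tidy(fit)
coefs
```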
  29. TIDYVERSE PACKAGES: SOME MORE NON-CORE

    • GOOGLEDRIVE: allows you to interact with files on Google Drive from R
    • RMARKDOWN: an authoring framework for data science that allows you to combine prose, code, and output
    • SHINY: makes it easy to build interactive web apps straight from R
  30. CONTRIBUTING TO THE tidyverse
  31. Contributing to FOSS

    Source: Pintscher, Lydia, ed. 2012. Open Advice: FOSS: What We Wish We Had Known When We Started. Photo credit: Sail Fish Scuba, https://sailfishscuba.com/manowar/
  36. Contributing to FOSS: WHAT HOLDS PEOPLE BACK?

    • “I can't write code.”
    • “I'm not really good at this.”
    • “I'd just be a burden.”
    • “They already have enough people smarter than me.”
    Source: Pintscher, Lydia, ed. 2012. Open Advice: FOSS: What We Wish We Had Known When We Started.
  37. Ask questions

    “The most useless problem statement that one can face is ‘it doesn’t work’, yet we seem to get it far too often.” (Thiago Macieira)
    Source: Macieira, Thiago. 2012. “The Art of Problem Solving.” In Open Advice: FOSS: What We Wish We Had Known When We Started, edited by Lydia Pintscher, 55–61.
  38. The newcomer's paradox...

    “When you ask for help, some friendly soul will no doubt tell you that ‘it’s easy, just do foo, bar and baz.’ Except for you, it is not easy, there may be no documentation for foo, bar is not doing what it is supposed to be doing and what is this baz thing anyway with its eight disambiguation entries on Wikipedia?” (Leslie Hawthorn)
    Source: Hawthorn, Leslie. 2012. “You’ll Eventually Know Everything They’ve Forgotten.” In Open Advice: FOSS: What We Wish We Had Known When We Started, edited by Lydia Pintscher, 29–32.
  39. the magic of reprex

    reprex: reproducible example
  40. Keys to reprex-cellence

    ✓ Code that actually runs
    ✓ Code that doesn't have to be run
    ✓ Code that can be easily run
    Source: Jenny Bryan, 2017. “reprex: the package, the point.” https://speakerdeck.com/jennybc/reprex-help-me-help-you
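Here is a sketch of what such a reprex might look like: it loads the one package it needs, builds its own tiny data inline instead of reading a local file, and ends with the call whose result is surprising. The data and the question are hypothetical. Passing code like this to reprex::reprex() renders it together with its output, ready to paste into an issue or forum post.

```r
library(dplyr)

# Tiny inline data: no local files, so anyone can run this
df <- data.frame(
  g = c("a", "a", "b"),
  x = c(1, NA, 3)
)

# Question: why is the mean for group "a" NA?
res <- df %>%
  group_by(g) %>%
  summarise(mean_x = mean(x))
res

# (Answer: mean() returns NA when its input contains NA;
#  mean(x, na.rm = TRUE) skips missing values.)
```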
  41. Nailing those reprexes?

    • Help others ask questions.
    • Answer questions.
    • Write about it.
  42. File issues

    Remember, behind every octocat there is an actual human…
  43. the anatomy of an issue

    • PROBLEM DESCRIPTION
    • REPREX
    • EXPECTED BEHAVIOUR
  44. Contribute documentation

    “Innocence lost is not easily regained. The designer simply cannot predict the problems people will have, the misinterpretations that will arise, and the errors that will get made.” (Donald Norman, The Design of Everyday Things)
    Source: Hawthorn, Leslie. 2012. “You’ll Eventually Know Everything They’ve Forgotten.” In Open Advice: FOSS: What We Wish We Had Known When We Started, edited by Lydia Pintscher, 29–32.
  45. every contribution counts...

    PULL REQUESTS, ISSUES, COMMENTS
  46. tpyos

    Flowchart: “You have a typo in your documentation” → “Can you fix it?” Yes: “Send me a pull request.” No: “Learn Git.”
    Adapted from: “You Do Not Need to Tell Me I Have A Typo in My Documentation” by Yihui Xie
  48. See Typo On Pkgdown Site

    Flowchart options: Ignore; Write strongly-worded letter; Go to GitHub; File an issue; Look in folders; Search repo
  49. Go to the source...
  50. BABY STEPS WITH MORE PACKAGES

    • ROXYGEN2: generates documentation from specially-formatted comments, used by all tidyverse packages
    • DEVTOOLS: makes package development easier by providing R functions that simplify common tasks
    • TESTTHAT: provides functions that make it easy to create unit tests for R packages, used throughout the tidyverse
    Source: roxygen2: In-Line Documentation for R. R package v 6.0.1. by Hadley Wickham, Peter Danenberg and Manuel Eugster (2017). https://CRAN.R-project.org/package=roxygen2
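To make the roxygen2 and testthat descriptions concrete, here is a sketch built around a hypothetical function, add_one(). The #' comments are what roxygen2 turns into a help page when devtools::document() is run inside a package; outside a package they are ordinary comments, so the code still runs. The test uses testthat's test_that() and expect_equal().

```r
library(testthat)

#' Add one to a number
#'
#' @param x A numeric vector.
#' @return `x` with 1 added to every element.
#' @export
add_one <- function(x) {
  x + 1
}

# A unit test: inside a package this would live under tests/testthat/
test_that("add_one() adds one element-wise", {
  expect_equal(add_one(1), 2)
  expect_equal(add_one(c(1, 2)), c(2, 3))
})
```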
  51. Hints for happy contributing in the tidyverse

    • GET THE PULSE OF A PROJECT
    • WATCH THE REPO
    • READ THE CODE
    • DISCUSS YOUR IDEAS
  52. leaRn out loud

    OMG, I just learned a thing!
  53. Thank You

    http://bit.ly/mara-ddtx