Building a package that lasts — eRum 2018 workshop

Colin Fay

May 15, 2018

Transcript

  1. Building a package that lasts Colin FAY - eRum 2018

    Colin Fay, ThinkR - http://thinkr.fr 1 / 35
  2. What are we going to talk about today?

    13h30 - 14h00: Introduction & Package init
    14h00 - 14h30: Functions and documentation
    14h30 - 15h00: Dependencies
    15h00 - 15h30: Coffee Break
    15h30 - 16h00: Optimisation
    16h00 - 16h30: Testing
    16h30 - 16h45: Continuous integration
    16h45 - 17h00: Conclusion
  3. $whoami

    Colin Fay - ThinkR
    French agency of R experts, focused on everything related to R: Training, Dev & Infrastructure, Consulting.
  4. What is a package?

    In R, the fundamental unit of portable code is called a package. A package is more or less a combination of:
    - code
    - data
    - documentation
    - tests
    You can put other content in it, but we won't cover that today.
  5. Package structure

    - DESCRIPTION: the metadata of your package
    - NAMESPACE: how your package interacts with R and with other packages
    - R/: the code
    - man/: the documentation
    - inst/: content that will be put in your package folder after installation
    - data/: data
    - data-raw/: a folder with content that will be ignored on build
    - tests/: the tests
    - vignettes/: the vignettes
    - .Rbuildignore: a description of what will be ignored when the package is built
    - ...
  6. About .Rbuildignore

    The .Rbuildignore file is used to tell R what to ignore when building the package. Each line is a regular expression matched against file and folder names, so an entry can spell out a name in full or describe a pattern. For example:

        ^.*\.Rproj$
        ^\.Rproj\.user$
        ^README\.Rmd$
        ^README-.*\.png$
        .travis.yml
        ^CONDUCT\.md$
        ^data-raw$
        ^cran-comments\.md$
        paper\..*
        ^revdep$
        ^docs$
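To see how such patterns behave, here is a minimal base R sketch. The file names and the `is_ignored()` helper are hypothetical, and `grepl()` with `perl = TRUE` and `ignore.case = TRUE` is only an approximation of the matching that `R CMD build` performs.

```r
# A few patterns as they might appear in .Rbuildignore (one per line).
patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^README\\.Rmd$", "^data-raw$")

# Hypothetical entries at the root of a package.
files <- c("mypkg.Rproj", ".Rproj.user", "README.Rmd", "data-raw", "DESCRIPTION", "R")

# An entry is ignored as soon as one pattern matches it.
is_ignored <- function(file, patterns) {
  any(vapply(patterns, grepl, logical(1), x = file, perl = TRUE, ignore.case = TRUE))
}

ignored <- vapply(files, is_ignored, logical(1), patterns = patterns)
print(ignored)
```

Anchoring patterns with `^` and `$` avoids accidentally excluding files whose names merely contain the pattern.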
  7. What is a "package that lasts"?

    Reproducible and automated package:
    - Make a package you will be able to develop
    - Make a package you will be able to maintain
    The UX of a package (taking a user-first approach):
    - Make a package people will use
    - Make a package people will effectively use
  8. Make a package you will be able to develop

    Automate everything you can automate: in the long run, it will prevent mistakes. Don't waste your energy on what can be automated.

    Make a package you will be able to maintain

    Create a package you can come back to in two years without having to start everything over. Prevent your package from failing when it is released.
  9. Make a package people will use

    Take a UX-first approach: it should be as easy as possible to start using your package. The simpler and clearer your package is for the user, the better.

    Make a package people will effectively use

    Use meaningful package and function names. Create useful, easy to understand, and complete documentation. That will prevent "issues overload".
  10. Some of the tools we'll use

    - RStudio
    - {devtools} - https://github.com/r-lib/devtools
    - {desc} - https://github.com/r-lib/desc
    - {usethis} - https://github.com/r-lib/usethis
    - {roxygen2} - https://github.com/klutometis/roxygen
    - {testthat} - https://github.com/r-lib/testthat
    - Rtools.exe (if you're on Windows): https://cran.r-project.org/bin/windows/Rtools/
    - r-base-dev on Unix
  11. Building a package that lasts - Part 1: init
  12. A package is like a house: it won't last without solid foundations.
  13. Before anything... find a good name!

    Some tips & conventions:
    - If Open Source, prefer a name that is easy to find on Google.
    - Find a name that is unique and that describes well what the package does: for example, {testthat} lets you "test that X", and with {usethis} we "use this X".
    - Placing an r on a word can make a good package name: for example, {stringr} manipulates strings in R.
    - The name can only contain letters, numbers, and dots.
    - The name must begin with a letter, and must not end with a dot.
    Good practices:
    - Avoid capitalisation, to make the name easier to remember and to type.
    - Avoid dots, to prevent confusion with S3 methods.
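The formal naming rules above can be checked with a single regular expression. This is a minimal sketch: `is_valid_pkg_name()` is a hypothetical helper, not part of any package, and it additionally assumes a name has at least two characters.

```r
# Letters, numbers and dots only; starts with a letter; does not end with a dot.
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

candidates <- c("stringr", "testthat", "my.pkg.", "1pkg", "my_pkg")
valid <- is_valid_pkg_name(candidates)
print(setNames(valid, candidates))
```

A name that passes this check can still be a poor choice: the "good practices" (no capitals, no dots) are conventions that the regular expression does not enforce.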
  14. Automation, automation, automation

    "Anything that can be automated, should be automated. Do as little as possible by hand". H. Wickham, "R Packages"

    Package developers automate not out of laziness, but for safety.
    => Once an automated process works, it will always work (well, in theory).
    => If you lose your computer on a train, you should be able to redo everything that you did.
  15. Automate your package creation

    Create your package with:

        devtools::create("plop")

    This will create a basic package skeleton. You can set a list of options in options("devtools.desc") to prefill your DESCRIPTION. You can also use the {desc} package, which we will see in a few slides.
  16. Where do I put my dev script?

    Using {usethis} and data-raw:

        library(usethis)
        use_data_raw()

    -> Creates a "data-raw" folder; everything in there will be ignored at build.

        file.create("data-raw/devstuffs.R")

    -> Creates a "devstuffs.R" file (you can use any other name) to keep track of everything you did during the engineering process.

    Do this for reproducibility.
  17. {desc}

    {desc} is a package that helps you create and configure your DESCRIPTION files.

        # Remove default DESC
        unlink("DESCRIPTION")
        # Create a new description object
        my_desc <- desc::description$new("!new")
        # Set your package name
        my_desc$set("Package", "dockerfiler")
        # Set your name
        my_desc$set("Authors@R", "person('Colin', 'Fay', email = '[email protected]', role = c('cre', 'aut'))")
        # Remove some author fields
        my_desc$del("Maintainer")
  18. {desc}

        # Set the version
        my_desc$set_version("0.0.0.9000")
        # The title of your package
        my_desc$set(Title = "Easy Dockerfile Creation from R")
        # The description of your package
        my_desc$set(Description = "Create a Dockerfile.")
        # The urls
        my_desc$set("URL", "https://github.com/ColinFay/dockerfiler")
        my_desc$set("BugReports", "https://github.com/ColinFay/dockerfiler/issues")
        # Save everything
        my_desc$write(file = "DESCRIPTION")
  19. {desc}

    Other {desc} methods:

        my_desc$add_author()
        my_desc$add_remotes()
        my_desc$add_to_collate()
        my_desc$bump_version()
        my_desc$del_* (author, collate, dep, remote...)
        my_desc$get_* (author, deps, urls...)
        my_desc$set_* (authors, deps, urls...)
        my_desc$normalize()
        my_desc$to_latex()
  20. About version numbers

    Choosing your version number

    The version number reads major.minor.patch, where major is a major release, minor a minor release, and patch a bug fix.

    Good practice

    Until the first stable version of the package is released, the version number should be 0.0.0.9000. This lets you increment the last component by one (9001, 9002...) at each new stage of the project without getting stuck, and clearly signals that the package is still in the development phase.
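A quick way to convince yourself that development versions sort before any release is base R's `package_version()`, which compares versions component by component. The version strings below are illustrative.

```r
dev      <- package_version("0.0.0.9000")  # first development version
next_dev <- package_version("0.0.0.9001")  # after one more development step
release  <- package_version("0.1.0")       # a first hypothetical release

# Each comparison is done field by field, from left to right.
dev_before_next    <- dev < next_dev
next_before_release <- next_dev < release
print(c(dev_before_next, next_before_release))
```

Whatever the value of the fourth component, `0.0.0.x` stays below `0.1.0`, so bumping it during development never collides with a release number.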
  21. More about {usethis}

    {usethis} is a package designed to automate the implementation of package elements. For example:
    - The license
    - Dependencies
    - The README
    - The connection to git
    - The NEWS file
    - The data-raw folder
    - The use of external data
    In the console, you will find messages with the following nomenclature:
    - ✅: {usethis} did all the work
    - a second marker: some tasks remain to be done
  22. Development with {usethis}

    All the functions that start with use_ let you use a template and/or place the right thing in the right place.
    - use_build_ignore(file): creates a regular expression from a file name and adds it to .Rbuildignore.
    - use_data and use_data_raw: the first saves a data set as an .rda file in the data/ folder. The second creates the data-raw folder.
    - use_description: creates the DESCRIPTION file (needed only if you don't use the RStudio package creation interface, the {desc} package or the devtools::create function).
    - use_package: adds the package as Imports in the DESCRIPTION; use_dev_package adds a dependency in the Remotes field.
    - use_git, use_github, use_github_labels, use_github_links, use_git_hook, use_git_ignore: interaction with Git and GitHub.
  23. Development with {usethis}

    - use_pipe: imports %>% from {magrittr}.
    - use_rcpp: if your package uses Rcpp.
    - use_testthat: creates the testthat folder.
    - use_vignette: creates a vignette template.
    - use_revdep: creates documents for reverse dependencies.
    - use_appveyor, use_travis: for continuous integration.
    - use_coverage: for test coverage.
  24. Development with {usethis}

    - use_tidy_description: puts the DESCRIPTION fields in a standard order and sorts the dependencies in alphabetical order.
    - use_tidy_eval: functions for tidyeval.
    - use_tidy_versions: adds to all dependencies the restriction to at least the version installed on the machine.
    - use_apl2_license, use_cc0_license, use_gpl3_license, use_mit_license: licenses.
  25. Development with {usethis}

    - use_code_of_conduct: adds a Code of Conduct file.
    - use_cran_badge, use_depsy_badge: create a CRAN badge with http://www.r-pkg.org, a Depsy badge with http://depsy.org.
    - use_cran_comments: creates a comments file before submitting to CRAN.
    - use_lifecycle_badge: indicates in the README the "state of development" of the package: Experimental, Maturing, Dormant, Stable, Questioning, Retired, Archived.
    - use_news_md: creates a NEWS file.
    - use_pkgdown: sets up {pkgdown}.
    - use_readme_rmd and use_readme_md: create the README file in the corresponding format.
  26. Development with {usethis}

        use_news_md()
        use_readme_rmd()
        use_mit_license(name = "Colin FAY")
        use_code_of_conduct()
        use_lifecycle_badge("Experimental")
        use_testthat()
        use_test("R6")
        use_package("attempt")
        use_vignette("dockerfiler")
        use_travis()
        use_appveyor()
        use_coverage()
        use_tidy_description()
  27. Automate startup

        library(usethis)
        library(glue)
        library(attempt)

        # Create data-raw/ and open a dev script named after `name`
        init_data_raw <- function(name = "devstuffs"){
          stop_if_not(name, is.character, "Please use a character vector")
          use_data_raw()
          file.create(glue("data-raw/{name}.R"))
          file.edit(glue("data-raw/{name}.R"))
        }

        # Add the license, README, NEWS and test infrastructure
        init_docs <- function(name = "Colin FAY"){
          stop_if_not(name, is.character, "Please use a character vector")
          use_mit_license(name)
          use_readme_rmd()
          use_news_md()
          use_testthat()
        }
  28. Automate startup

        library(glue)

        # Rebuild the DESCRIPTION from scratch with {desc}
        fill_desc <- function(name, Title, Description, repo){
          unlink("DESCRIPTION")
          my_desc <- desc::description$new("!new")
          my_desc$set("Package", name)
          my_desc$set("Authors@R", "person('Colin', 'Fay', email = '[email protected]', role = c('cre', 'aut'))")
          my_desc$del("Maintainer")
          my_desc$set_version("0.0.0.9000")
          my_desc$set(Title = Title)
          my_desc$set(Description = Description)
          my_desc$set("URL", glue("https://github.com/ColinFay/{repo}"))
          my_desc$set("BugReports", glue("https://github.com/ColinFay/{repo}/issues"))
          my_desc$write(file = "DESCRIPTION")
        }
  29. A "package-ready" R function

    In the R folder are what we can call "nice" functions, i.e. functions that:
    - Do not use library() or require(): dependencies are managed in the NAMESPACE file.
    - Do not change user options() or par().
    - Do not use source() to call code.
    - Do not play with setwd().
    - Do not silently write in any other place than a temp file.
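When a function really needs a different option for the duration of a call, one common base R idiom is to restore the previous value with `on.exit()`. The sketch below illustrates that idiom; `print_rounded()` is a hypothetical function, not something from the workshop material.

```r
print_rounded <- function(x, digits = 3) {
  # options() returns the previous values, which we restore when the function exits,
  # even if an error occurs in between.
  old <- options(digits = digits)
  on.exit(options(old), add = TRUE)
  print(x)
  invisible(x)
}

before <- getOption("digits")
print_rounded(pi)
after <- getOption("digits")
```

After the call, the user's `digits` option is back to what it was, so the function leaves no trace in the session.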
  30. Arranging your R folder

    You can organize your functions in as many .R files as you want. There is no perfect number, but consider that each .R file contains one "big" function (and its methods if necessary) or one family of functions.
    - You can group your .R files by family, with a common prefix at the beginning (family-*.R).
    - Group the "utility" functions in a utils.R file.
    - Beware of capital letters and non-alphanumeric characters in file names, which can give surprising results if you change operating system.
    - Note that the R/ folder must not contain subfolders.
  31. Package UX is about as important as the code behind it.
  32. Comment your functions - for the dev

    Looking at your own code written more than 6 months ago is like looking at someone else's code.
  33. Documenting your functions - the UX

    Doc is not about what it is. Doc is about what it does.
  34. Documenting your functions - the UX

    Documenting your package is essential: both for your "future you", and for other people who will use your package. The documentation is what you will read in the Help tab once the package is installed, or via ?function (or by pressing F1).
  35. Enable roxygen2

    In Build > Configure build tools, make sure to check "Generate documentation with roxygen" and all the boxes in the options window.

    The documentation lives in the .Rd files contained in the man/ folder. They are written in a LaTeX-like syntax and can be rendered as HTML, text or PDF. Thanks to {roxygen2}, you don't have to type this code by hand.

    Using {roxygen2} has several advantages. Notably:
    - You don't have to write LaTeX-like markup.
    - The documentation is in the same place as the object.
    - The documentation adapts to the type of object (function, data...).
  36. Documentation

    Example:

        #' Title
        #'
        #' description
        #' @param x param
        #' @param y param
        #' @return what the function returns
        #' @examples
        #' myfun(4,5)
        #' myfun(5,9)
        #' @export
        myfun <- function(x, y){
          return(x+y)
        }
  37. Documentation

    Each roxygen comment is preceded by #'. Then, we open with @, followed by the name of the field to fill. RStudio offers autocompletion after the @. If there is no @, the first two lines are understood as the title and description. Then, the most common fields are:
    - @details: details on the function
    - @param x: a function param, with a description
    - @return: what the function returns
    - @examples: examples of how to use the function
    - @export: needed if you want the function to be accessible outside the package
    - @import and @importFrom: functions or packages to import
    - @source and @references: references on the function
    - @section my_section: adds a custom section to the documentation
    To see the full list of fields: browseVignettes("roxygen2").
  38. Documentation

    Some roxygen fields make it easier to navigate in packages:
    - @seealso: points to other resources, on the web or in the package
    - @family: makes the function belong to a family of functions
    - @aliases: gives "nicknames" to the function, to allow it to be found with ?my_alias.
  39. Documentation

    Do not forget to create a help page for the package. The user can call it with ?pkgname.

        #' What the package does
        #'
        #' the description of the package, with details
        #'
        #' @name pkgname-package
        #' @aliases pkgname-package pkgname
        #' @docType package
        #' @author colin <colin@@thinkr.fr>
        NULL
  40. Document several functions

    There are several methods to document multiple functions "at once".
    - Document a function, and tag it with @rdname. Then, each new function sharing the same documentation must contain the same @rdname. This method generates a single .Rd document for all functions with the same @rdname.
    - Use @describeIn: in a new function, we indicate where the documentation is located.
    - Inherit parameters with @inheritParams: in this case, you "only" inherit the specified parameters, not all the function documentation.
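As an illustration of the @rdname approach, here is a minimal sketch with two hypothetical functions (`add_two()` and `subtract_two()`) that would end up sharing a single help page named `arith_helpers` once {roxygen2} has run.

```r
#' Arithmetic helpers
#'
#' Add or subtract two numbers.
#'
#' @param x,y numeric vectors
#' @return a numeric vector
#' @rdname arith_helpers
#' @export
add_two <- function(x, y) {
  x + y
}

# The second function only points to the shared help page.
#' @rdname arith_helpers
#' @export
subtract_two <- function(x, y) {
  x - y
}
```

With this layout, `?add_two` and `?subtract_two` would both open the same page, and the parameters are documented only once.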
  41. Misc

    Prevent an example from being executed

    If we put an example in \dontrun{}, it won't be run when the examples are executed (for instance during R CMD check).

    Document two parameters

        #' @param x,y a numerical vector

    Generate the doc

    Once the documentation has been written, run roxygenise():

        roxygen2::roxygenise()
  42. A "UX first" approach

    Building a package that lasts means that once your package is stable, you'll need people to (be able to) use it. "Be able" means: documentation should easily guide them. Consider:

    "An R Package using the R6 paradigm to create an object oriented API designed to interactively and programmatically write Docker setup files inside an R session or script."

    VS

    "Easy Dockerfile Creation with R6"

    The truth is 90% of your end users don't care about technical stuff.
  43. Writing a README

        usethis::use_readme_rmd()

    Creates a template of a README file, perfect for GitHub and for any other sharing platform. This file should include:
    - A quick explanation of what the package does
    - How to install it
    - Basic use of the package
    - Where to file issues / ask questions
    Knit it to create a GitHub markdown file.

    Tip: a well-written README can easily be used as an introduction vignette.
  44. Vignettes

    A vignette is an html/pdf file that accompanies a package and is more developed than help pages, as the form is free (you can include images, tables, links, html...).

        usethis::use_vignette("mypackage")

    ...creates a "mypackage.Rmd" file in a vignettes/ folder at the root of the package, and adds the appropriate dependency in your DESCRIPTION. The vignette is a simple RMarkdown page.
  45. Vignettes

        ---
        title: "Vignette Title"
        author: "Vignette Author"
        date: "2018-05-14"
        output: rmarkdown::html_vignette
        vignette: >
          %\VignetteIndexEntry{Vignette Title}
          %\VignetteEngine{knitr::rmarkdown}
          \usepackage[utf8]{inputenc}
        ---

    This header describes your vignette. We replace the title, the author, and the "VignetteIndexEntry". Then, you can write the vignette as a classic Markdown file. To preview the vignette rendering, press the knit button, or Ctrl/Cmd + Shift + K.
  46. How many vignettes?

    There is no right answer to this question: it depends on the size of your package and on what you have implemented.

    Build Vignettes

    Use devtools::build_vignettes() to render the vignette (in /inst/doc). The end user can see the vignettes with the following instruction:

        browseVignettes("myPackage")
  47. {pkgdown}

    If you want to publish a "mini-site" about your package, you can use the {pkgdown} package.

        devtools::install_github("hadley/pkgdown")
        # or
        install.packages("pkgdown")

    Then:

        pkgdown::build_site()

    This command will add a docs/ folder, in which you will find all the elements needed for a mini-site built from the README, your vignettes, and the documentation of your functions. You can customize this site like a classic site (CSS...). The home page of the site is index.html.
  48. {pkgdown} online

    As these are plain html files, you can upload them to GitHub or to any other server.
  49. {pkgdown} in your package

    You can "install" the site in any folder:

        pkgdown::build_site(path = "inst/site")

    Reminder: all the elements contained in inst/ are moved to the root of the package folder once installed. The site built in inst/site can therefore be accessed with:

        path <- system.file("site/index.html", package = "mypkg")
        browseURL(path)

    We can therefore create a function:

        launch_help <- function() {
          browseURL(system.file("site/index.html", package = "mypkg"))
        }
  50. citation

    There is a standardized way of citing a package. We find it with citation("package"):

        citation("purrr")
        #>
        #> To cite package 'purrr' in publications use:
        #>
        #>   Lionel Henry and Hadley Wickham (2017). purrr: Functional
        #>   Programming Tools. R package version 0.2.4.
        #>   https://CRAN.R-project.org/package=purrr
        #>
        #> A BibTeX entry for LaTeX users is
        #>
        #>   @Manual{,
        #>     title = {purrr: Functional Programming Tools},
        #>     author = {Lionel Henry and Hadley Wickham},
        #>     year = {2017},
        #>     note = {R package version 0.2.4},
  51. citation

    It is possible to provide custom content for this function. In this case, we place a CITATION file in the inst/ folder, which will have this form:

        citHeader("To cite lubridate in publications use:")

        citEntry(entry = "Article",
          title = "Dates and Times Made Easy with {lubridate}",
          author = personList(as.person("Garrett Grolemund"),
                              as.person("Hadley Wickham")),
          journal = "Journal of Statistical Software",
          year = "2011",
          volume = "40",
          number = "3",
          pages = "1--25",
          url = "http://www.jstatsoft.org/v40/i03/",
          textVersion = paste("Garrett Grolemund, Hadley Wickham (2011).",
            "Dates and Times Made Easy with lubridate.",
            "Journal of Statistical Software, 40(3), 1-25.")
        )
  52. About the NAMESPACE

    The NAMESPACE file is one of the most important files of your package. It's also the one you should not edit by hand.

    This file describes how your package interacts with R, and with other packages. This is where, among other things, the dependencies are managed. This file also lists the functions that are exported. The namespace allows the package to work.

    This file is managed by {roxygen2}, via the tags @export, @import and @importFrom.
  53. Understanding the search path

    When R looks for an object, it will go up the search path until it finds this object (keep in mind that a function is an object). Here is an example of a search path:

        search()
        #>  [1] ".GlobalEnv"        "package:attempt"   "package:forcats"
        #>  [4] "package:stringr"   "package:dplyr"     "package:purrr"
        #>  [7] "package:readr"     "package:tidyr"     "package:tibble"
        #> [10] "package:ggplot2"   "package:tidyverse" "package:rhub"
        #> [13] "package:testthat"  "package:bindrcpp"  "tools:rstudio"
        #> [16] "package:stats"     "package:graphics"  "package:grDevices"
        #> [19] "package:utils"     "package:datasets"  "package:methods"
        #> [22] "Autoloads"         "package:base"
  54. Understanding the search path

    Each time I attach a new package with library(), R places it at the "top of the list", right after .GlobalEnv.

        library(attempt)
        search()
        #>  [1] ".GlobalEnv"        "package:attempt"   "package:forcats"
        #>  [4] "package:stringr"   "package:dplyr"     "package:purrr"
        #>  [7] "package:readr"     "package:tidyr"     "package:tibble"
        #> [10] "package:ggplot2"   "package:tidyverse" "package:rhub"
        #> [13] "package:testthat"  "package:bindrcpp"  "tools:rstudio"
        #> [16] "package:stats"     "package:graphics"  "package:grDevices"
        #> [19] "package:utils"     "package:datasets"  "package:methods"
        #> [22] "Autoloads"         "package:base"
  55. Understanding the search path

    You can give any name to a function:

        rnorm <- function(x) x + 1
        rnorm(1)
        #> [1] 2
        rnorm
        #> function(x) x + 1

    You can specify the namespace of the function with :: 

        stats::rnorm(1)
        #> [1] -2.449438
        stats::rnorm
        #> function (n, mean = 0, sd = 1)
        #> .Call(C_rnorm, n, mean, sd)
        #> <bytecode: 0x102e9c4a0>
        #> <environment: namespace:stats>

    The notation namespace:stats indicates in which namespace the function is located.
  56. Understanding namespaces

    When programming, we are likely to give our objects the same names as objects in other packages. And most of the time, you will need functions from other packages.

    This is what the NAMESPACE is used for: to manage the interconnection between the different packages in the environment.
  57. NAMESPACE conflicts

    When two functions have the same name in two packages, and both packages are attached, there is what is called a namespace conflict:

        library(tidyverse)
        tidyverse_conflicts()
        #> ── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
        #> ✖ dplyr::filter()    masks stats::filter()
        #> ✖ attempt::if_else() masks dplyr::if_else()
        #> ✖ purrr::is_null()   masks testthat::is_null()
        #> ✖ dplyr::lag()       masks stats::lag()
        #> ✖ dplyr::matches()   masks testthat::matches()
        #> ✖ dplyr::vars()      masks ggplot2::vars()
  58. What's a dependency?

    To work, your package may need external functions, i.e. functions contained in other packages. R has three types of dependencies, declared in the DESCRIPTION:
    - Depends & Imports: the packages that will be attached or loaded, respectively. In practice, always list dependencies in Imports: Depends attaches the package to the search path (and remember that a good package should not touch the user's environment).
    - Suggests: packages that can be used in addition to your package. They will not be attached or loaded.
    These fields are filled automatically thanks to {usethis}.
  59. Dependencies

    The DESCRIPTION file contains the package dependencies. A quick and easy way to declare one is the instruction:

        usethis::use_package("attempt")

    ...which adds {attempt} to the DESCRIPTION.

    That's not enough: we will need to use a roxygen comment on each function of our package to specify the dependencies used. To do that, we will use @import (a whole package) and @importFrom (a specific function).
  60. Add dependencies to EACH function

        #' @import magrittr
        #' @importFrom stats na.omit
        moyenne <- function(x){
          x <- x %>% na.omit()
          sum(x)/length(x)
        }

    You can use @import or @importFrom. It is better to use @importFrom, to prevent namespace conflicts.

    Add them to EACH function. It will take a lot of time, but it's better in the long run.
  61. Strategies for naming functions...

    ... to prevent NAMESPACE conflicts
    - Use some letters at the beginning of each function (as {stringr} does, for example).
    - You can't search the whole of CRAN for function names, but you can start with your own computer:

        ??my_fun
        ??bitShiftL

    ... to help UX
    - Name the function after what it does: if the function is used to get X, call it get_x.
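Besides `??`, a quick programmatic check is possible with `utils::find()`. This is a rough sketch: `name_is_taken()` is a hypothetical helper, and it only looks at the packages currently attached to the search path, not at everything installed on your computer or available on CRAN.

```r
# TRUE if a function with this exact name is visible on the current search path.
name_is_taken <- function(fun_name) {
  length(utils::find(fun_name, mode = "function")) > 0
}

taken_filter <- name_is_taken("filter")                           # stats::filter is attached by default
taken_custom <- name_is_taken("get_x_from_my_hypothetical_pkg")   # hypothetical name
print(c(filter = taken_filter, custom = taken_custom))
```

A prefixed name such as `mypkg_filter` sidesteps the clash with `stats::filter()` reported by the first check.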
  62. How many dependencies?

    Tough question... Just keep in mind that depending on another package means that:
    - you will potentially have to recode your package if a breaking change happens in one of your dependencies;
    - if one of your dependencies is removed from CRAN, your package will be removed too.
    In other words, beware the {clipr} effect.
  63. Developer oriented optimisation

    The biggest rule to remember: if you need to copy and paste something, write a function.

    Before:

        a <- function(num) {
          res <- num * 10 / pi
          round(res, 2)
        }
        b <- function(num) {
          res <- num * 20 / pi
          round(res, 2)
        }
        c <- function(num) {
          res <- num * 30 / pi
          round(res, 2)
        }

    After:

        rt <- function(n, m, d = 2) {
          round( (n * m / pi), d )
        }
        a <- function(num) {
          rt(num, 10)
        }
        b <- function(num) {
          rt(num, 20)
        }
        c <- function(num) {
          rt(num, 30)
        }
  64. Developer oriented optimisation

    The more code you have, the more errors you will get.
  65. UX oriented functions

    The "unfair burden" of the package developer: you should make the complex tasks seem easy / invisible for the user.
  66. Anticipate errors

    Two things you should keep in mind:
    - R is not renowned for its clear error messages.
    - The end user will, at some point, try to run your function with weird arguments.
    So basically, you should expect a user to pass a weird input, to get back a cryptic message, and to:
    - open a Stack Overflow question (in the best case scenario)
    - open an issue on your GitHub
    - simply stop using your package because "it doesn't work"
  67. Defensive programming

    "Debugging is the art and science of fixing unexpected problems in your code." Wickham H., Advanced R

    Not all errors are unexpected. To prevent errors, adopt a "defensive programming" strategy: anticipate errors and/or unexpected behaviors, in order to manage them upstream and to inform the user.

    Three types of alerts exist:
    - stop: an error, stops the execution of the program.
    - warning: an alert, informs of a potential error, but does not prevent the program from working.
    - message: a message printed on the console, for information purposes.
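The three kinds of alerts can be combined in one function using base R only. The sketch below uses a hypothetical `my_log()` function; the conditions and messages are illustrative.

```r
my_log <- function(x) {
  # Error: the function cannot do anything sensible, so execution stops.
  if (!is.numeric(x)) {
    stop("`x` should be numeric", call. = FALSE)
  }
  # Warning: the computation goes on, but the result may not be what the user expects.
  if (any(x <= 0, na.rm = TRUE)) {
    warning("`x` contains values <= 0: the result will contain NaN or -Inf", call. = FALSE)
  }
  # Message: purely informative.
  if (anyNA(x)) {
    message("`x` contains missing values")
  }
  suppressWarnings(log(x))
}

ok <- my_log(1)
err <- tryCatch(my_log("a"), error = function(e) conditionMessage(e))
print(err)
```

Because the checks run before the computation, the user reads a message written for them instead of a cryptic one coming from deeper in the code.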
  68. Using {attempt} for defensive programming

        # remotes::install_github("ColinFay/attempt")
        # install.packages("attempt")
        library(attempt)

        my_sqrt <- function(num) {
          stop_if_not(.x = num, .p = is.numeric, msg = "You should enter a number")
          sqrt(num)
        }
        my_sqrt(1)
        #> [1] 1
        my_sqrt("1")
        #> Error: You should enter a number
  69. Stop, alert, inform

        grep("stop", ls("package:attempt"), value = TRUE)
        #> [1] "stop_if"      "stop_if_all"  "stop_if_any"  "stop_if_none"
        #> [5] "stop_if_not"
        grep("warn", ls("package:attempt"), value = TRUE)
        #> [1] "warn_if"         "warn_if_all"     "warn_if_any"     "warn_if_none"
        #> [5] "warn_if_not"     "with_warning"    "without_warning"
        grep("message", ls("package:attempt"), value = TRUE)
        #> [1] "message_if"      "message_if_all"  "message_if_any"  "message_if_none"
        #> [5] "message_if_not"  "with_message"    "without_message"
  70. Speed optimisation

    Not all functions need to be optimised. Don't spend a week optimising for 10 microseconds. You can always do more, so you need to know when to stop. Unless really needed, do not spend too much time optimising for speed.
  71. Benchmarking your code

    How can I know if my code is slow? => BENCHMARK!

    You can do this in base R with system.time:

        system.time({
          Sys.sleep(1)
        })
        #>    user  system elapsed
        #>   0.007   0.005   1.002

    But there are more automated ways to do this.
  72. Benchmark with {microbenchmark}

    {microbenchmark} is a more precise alternative to system.time that automates benchmarks. It repeats the comparison many times, and displays the maximum, minimum, mean and median of the elapsed time as a result.

    We always start by making sure the two results are all.equal.

        all.equal(sum(1,2), 1+2)
        #> [1] TRUE
        microbenchmark::microbenchmark(sum(1,2), 1 + 2, times = 10000)
        #> Unit: nanoseconds
        #>       expr min  lq     mean median  uq   max neval cld
        #>  sum(1, 2) 168 181 226.7462    201 243 13262 10000   b
        #>      1 + 2  61  74 101.0710     92  99 40196 10000   a
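The "check equality first, then time" routine can also be sketched in base R, without any benchmarking package. `manual_mean()` is a hypothetical function used only for the comparison, and a simple loop inside `system.time()` is a much cruder measure than what {microbenchmark} reports.

```r
manual_mean <- function(x) sum(x) / length(x)

x <- c(1, 5, 9, 12)

# Step 1: make sure both implementations return the same result.
same <- isTRUE(all.equal(manual_mean(x), mean(x)))

# Step 2: only then compare timings, repeating the call many times.
timing_manual <- system.time(for (i in 1:10000) manual_mean(x))
timing_base   <- system.time(for (i in 1:10000) mean(x))
print(rbind(manual = timing_manual, base = timing_base)[, "elapsed"])
```

Comparing the speed of two functions that do not return the same thing would be meaningless, which is why the equality check comes first.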
  73. Benchmark with {bench}

    - More recent package (mid-April 2018).
    - With {bench}, you don't have to test for equality of results before running the benchmark.
    - More human readable results.
    - Only on GitHub for now.
    Read more: http://bench.r-lib.org/
  74. Benchmark with {bench}

        bench::mark(sum(1,3), 1 + 2, iterations = 10000)
        #> Error: All results must equal the first result:
        #>   `sum(1, 3)` does not equal `1 + 2`
        bench::mark(sum(1,2), 1 + 2, iterations = 10000)
        #> # A tibble: 2 x 14
        #>   expression     min    mean median    max `itr/sec` mem_alloc  n_gc n_itr
        #>   <chr>      <bch:t> <bch:t> <bch:> <bch:>     <dbl> <bch:byt> <dbl> <int>
        #> 1 sum(1, 2)    169ns   259ns  234ns 31.8µs  3863516.        0B    0. 10000
        #> 2 1 + 2         55ns   114ns   67ns 10.6µs  8741809.        0B    0. 10000
        #> # ... with 5 more variables: total_time <bch:tm>, result <list>,
        #> #   memory <list>, time <list>, gc <list>
  75. Identify bottlenecks with profiling voyage <- function(days) { print("take your

    breath") profvis::pause(1) print("Let's go!") travel(transport = 2, stay = days) } travel <- function(transport, stay) { plane(transport) + enjoy(stay) } plane <- function(times){ purrr::rerun(times, profvis::pause(sample(1:5, 1))) } enjoy <- function(stay){ beer <- stay ^ 2 profvis::pause(beer) }
  76. Optimisation, some ground rules return and stop as soon as

    possible If a test stops the execution, do it first: compute_first <- function(a){ d <- log(a) * 10 + log(a) * 100 if( a == 1 ) return(0) return(d) } if_first <- function(a){ if( a == 1 ) return(0) d <- log(a) * 10 + log(a) * 100 return(d) }
  77. Optimisation, some ground rules return and stop as soon as

    possible microbenchmark(compute_first(1), if_first(1), compute_first(0), if_first(0), times = 100) %>% autoplot()
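
Before comparing the speed of the two variants, it is worth confirming that they return the same values. A minimal base R sketch, reusing the two functions from the slides with an arbitrary set of inputs chosen here for illustration:

```r
compute_first <- function(a) {
  d <- log(a) * 10 + log(a) * 100
  if (a == 1) return(0)
  d
}

if_first <- function(a) {
  # Exit before doing any computation when the result is already known
  if (a == 1) return(0)
  log(a) * 10 + log(a) * 100
}

# Illustrative inputs (an assumption, not from the slides)
inputs <- c(0, 1, 2, 10)
same <- vapply(
  inputs,
  function(x) identical(compute_first(x), if_first(x)),
  logical(1)
)
all(same)
```

Only if_first(1) skips the two log() calls; for every other input both functions do the same work.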
  78. Minimise the number of function calls l <- c(1:100, NA)

    microbenchmark(anyNA(l), any(is.na(l)), times = 100) %>% autoplot()
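
The two expressions give the same answer; the difference is in the work done. A short base R check, using the same vector as the slide:

```r
l <- c(1:100, NA)

# Same result either way
identical(anyNA(l), any(is.na(l)))
#> [1] TRUE

# is.na() first builds a full logical vector, which any() then scans
length(is.na(l))
#> [1] 101
```

anyNA() is a single call and does not need to build that intermediate vector.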
  79. Use R formats library(readr) write_rds(iris, "iris.RDS") saveRDS(iris, "iris2.RDS") write.csv(iris, "iris.csv")

    microbenchmark(readRDS("iris2.RDS"), read_rds("iris.RDS"), read.csv("iris.csv")) %>% autoplot()
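
Speed is not the only argument for R formats: an RDS file also restores the object exactly as it was saved. A minimal base R sketch, writing to temporary files (the paths are created here for illustration, not taken from the slides):

```r
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(iris, rds_path)
write.csv(iris, csv_path, row.names = FALSE)

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path)

# The RDS round trip preserves classes and attributes
identical(from_rds, iris)
class(from_rds$Species)

# The CSV round trip has to guess column types again;
# what Species becomes depends on the R version and read.csv() arguments
class(from_csv$Species)

unlink(c(rds_path, csv_path))
```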
  80. Building a package that lasts Part 5: Test and code

    coverage
  81. Everything that is not tested will eventually break.
  82. Why automate code testing? To save time! Work serenely with

    your coworkers. Transfer the project. Guarantee stability in the long run. Thanks to {testthat}, we can automate the tests. -> Allows bugs to be detected early, and helps guarantee the validity of the code. install.packages("testthat") usethis::use_test("myfunction") Creates a tests/testthat folder, adds {testthat} to the Suggests of the DESCRIPTION, and creates tests/testthat.R (do not touch it).
  83. Use tests in a "defensive programming" approach to prevent bugs.

    Don't trust yourself in 6 months. Be sure to send to production a package with as few bugs as possible. Good news: you're already writing tests, you just didn't know it. Write tests before it's too late
  84. Does it ring a bell? my_awesome_function <- function(a, b){ res

    <- a + b return(res) } # Works my_awesome_function(1, 2) # Doesn't work my_awesome_function("a","b")
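
One possible way to turn this informal check into defensive code is to validate the inputs at the top of the function. This is a sketch under the assumption that only numeric inputs are valid; the slides do not prescribe a particular validation strategy:

```r
my_awesome_function <- function(a, b) {
  # Fail early, with a message that names the broken assumption
  stopifnot(is.numeric(a), is.numeric(b))
  a + b
}

# Works
my_awesome_function(1, 2)

# Fails at the validation step; capture the message instead of stopping
outcome <- tryCatch(
  my_awesome_function("a", "b"),
  error = function(e) conditionMessage(e)
)
outcome
```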
  85. Test that In the tests/ folder is a testthat.R file

    and a testthat folder. In this folder, you'll find .R files of the following form: test-my_function.R One test file per (big) function, with general contexts. Break down the tests in this file by type. These tests will be performed during devtools::check() (which also performs other tests), or with devtools::test() (Ctrl/Cmd + Shift + T).
  86. Test that Each file is composed of a series of

    tests in this format: context("global info") test_that("details series 1", { test1a test1b }) test_that("details series 2", { test2a test2b })
  87. Test functions Your test functions start with expect_*. They usually take

    two arguments: the first is the actual result, the second is the expected result. If the expectation is not met, the function throws an error. If it is met, the function prints nothing. library(testthat) expect_equal(10, 10) a <- sample(1:10, 1) b <- sample(1:10, 1) expect_equal(a+b, 200) #> Error: a + b not equal to 200. 1/1 mismatches [1] 11 - 200 == -189
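
Put together, a test file for the my_awesome_function() example from the earlier slide could look like the sketch below. The function is redefined here only so that the block runs on its own; in a package it would live in R/ and the tests in tests/testthat/:

```r
library(testthat)

# Redefined for a self-contained example
my_awesome_function <- function(a, b) {
  a + b
}

test_that("my_awesome_function adds numbers", {
  expect_equal(my_awesome_function(1, 2), 3)
  expect_equal(my_awesome_function(-1, 1), 0)
})

test_that("my_awesome_function rejects characters", {
  # Adding two strings is an error in R, and the test documents that
  expect_error(my_awesome_function("a", "b"))
})
```

Each test_that() block groups the expectations that describe one behaviour, so a failure report points to what broke.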
  88. Expectations library(testthat) grep("^expect", ls("package:testthat"), value = TRUE) #> [1] "expect"

    "expect_condition" #> [3] "expect_cpp_tests_pass" "expect_equal" #> [5] "expect_equal_to_reference" "expect_equivalent" #> [7] "expect_error" "expect_failure" #> [9] "expect_false" "expect_gt" #> [11] "expect_gte" "expect_identical" #> [13] "expect_is" "expect_known_failure" #> [15] "expect_known_hash" "expect_known_output" #> [17] "expect_known_value" "expect_length" #> [19] "expect_less_than" "expect_lt" #> [21] "expect_lte" "expect_match" #> [23] "expect_message" "expect_more_than" #> [25] "expect_named" "expect_null" #> [27] "expect_output" "expect_output_file" #> [29] "expect_reference" "expect_s3_class" #> [31] "expect_s4_class" "expect_setequal"
  89. "Skip" a test If you want to skip a test

    (if the code depends on a web connection, an API, etc.), use the skip_if_not() function. library(httr) url <- "http://numbersapi.com/42" test_that("API test", { skip_if_not(curl::has_internet(), "No internet connection") res <- content(GET(url)) expect_is(res, "character") }) testthat::skip_if_not_installed() skips a test if a package is not installed. There are other functions to skip a test under particular conditions, such as skip_on_os(), which prevents a test from running on specific operating systems.
  90. Create your own test You can also create your own

    tests in test_that(): plop <- function(class) { structure(1:10, class = class) } expect_plop <- function(object, class){ expect_is(object, class) } test_that("Class well assigned", { a <- plop("ma_classe") expect_plop(a, "ma_classe") })
  91. Launch tests grep("^test", ls("package:testthat"), value = TRUE) #> [1] "test_check"

    "test_dir" "test_env" "test_example" #> [5] "test_examples" "test_file" "test_package" "test_path" #> [9] "test_rd" "test_that"
  92. Launch tests devtools::check() ... ✔ | 12 | adverbs ✔

    | 7 | test-utils.R ✔ | 22 | test-warn.R ✔ | 18 | test-any-all-none.R ══ Results ═════════════════════════════════════════════════════════ Duration: 0.8 s OK: 192 Failed: 3 Warnings: 0 Skipped: 0
  93. R CMD check To test the code more globally, in

    the command line (i.e. in the terminal): R CMD check. Or simply the devtools::check() function in your R session. More tests are performed with check than with devtools::test(), which "only" performs the tests in the tests folder. This command runs around 50 different checks. It is run when you click the "Check" button on the Build tab of RStudio.
  94. Test with rhub {rhub} is a package that allows you

    to test on several operating systems: library(rhub) ls("package:rhub") #> [1] "check" "check_for_cran" #> [3] "check_on_centos" "check_on_debian" #> [5] "check_on_fedora" "check_on_linux" #> [7] "check_on_macos" "check_on_ubuntu" #> [9] "check_on_windows" "check_with_rdevel" #> [11] "check_with_roldrel" "check_with_rpatched" #> [13] "check_with_rrelease" "check_with_sanitizers" #> [15] "check_with_valgrind" "last_check" #> [17] "list_my_checks" "list_package_checks" #> [19] "list_validated_emails" "platforms" #> [21] "rhub_check" "rhub_check_for_cran" #> [23] "rhub_check_list" "validate_email"
  95. Test with rhub devtools::install_github("r-hub/rhub") # or install.packages("rhub") # verify your

    email library(rhub) validate_email()
  96. Test with rhub rhub::check() Which platforms are supported? rhub::platforms() #>

    debian-gcc-devel: #> Debian Linux, R-devel, GCC #> debian-gcc-patched: #> Debian Linux, R-patched, GCC #> debian-gcc-release: #> Debian Linux, R-release, GCC #> fedora-clang-devel: #> Fedora Linux, R-devel, clang, gfortran #> fedora-gcc-devel: #> Fedora Linux, R-devel, GCC #> linux-x86_64-centos6-epel: #> CentOS 6, stock R from EPEL #> linux-x86_64-centos6-epel-rdt:
  97. What is code coverage? Code coverage is the proportion of

    code that is executed when you run your package tests.
  98. Local code coverage Which parts of my code are not

    covered by tests? my_coverage <- covr::package_coverage() covr::zero_coverage(my_coverage)
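
To see what these functions report without building a whole package, {covr} can also measure a single source file against a single test file. The sketch below assumes {covr} is installed and uses a hypothetical abs_diff() function written to temporary files; the test deliberately exercises only one branch:

```r
library(covr)

# Hypothetical source file with two branches
src <- tempfile(fileext = ".R")
writeLines(c(
  "abs_diff <- function(a, b) {",
  "  if (a > b) {",
  "    a - b",
  "  } else {",
  "    b - a",
  "  }",
  "}"
), src)

# Test file that only covers the a > b branch
tst <- tempfile(fileext = ".R")
writeLines("stopifnot(abs_diff(3, 1) == 2)", tst)

cov <- file_coverage(src, tst)

# Share of lines executed by the test file
percent_coverage(cov)

# Lines that were never executed (here, the else branch)
zero_coverage(cov)
```

Adding a second call such as abs_diff(1, 3) to the test file would exercise the remaining branch.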
  99. codecov.io usethis::use_coverage() Codecov is a service that is used

    with Travis (we'll see Travis in the next chapter), and lets you see online how much of the code is covered by the tests. The {usethis} function creates the appropriate yaml, and copies to your clipboard some code to paste in your travis yaml.
  100. Why use Travis? To save time (again)! Make sure

    the package is tested regularly. Have a shareable test report. Guarantee the stability of the project. Travis in my project To use Travis: usethis::use_travis() Notes: The package must first be connected to GitHub. The use of Travis is free for open source projects (i.e. public repositories on GitHub). With a "pro" plan you can use Travis on private repos.
  101. What's Travis? Travis (https://travis-ci.org/) is a continuous integration service that

    will perform a check whenever you push to GitHub. Configuration The {usethis} command automatically generates the Travis configuration file:
  102. The cache: packages specification speeds up the build. We can

    test on several versions of R by specifying: r: - oldrel - release - devel You can request the installation of specific elements, like Pandoc versions, or apt packages: addons: apt: packages: - libxml2-dev Yaml configuration If the DESCRIPTION file contains all the necessary information, the yaml can be as simple as: language: r
  103. AppVeyor usethis::use_appveyor() AppVeyor is a continuous integration service that

    integrates with a larger number of services: GitHub, GitLab, Git, Visual Studio, Subversion... Note: the free plan "only" covers open source projects.
  104. AppVeyor Log in on https://ci.appveyor.com/, select a project, and launch

    "New Build":
  105. Further readings (in no precise order) Books and blogs "R

    Packages": http://r-pkgs.had.co.nz/ "Créer un package R en quelques minutes" ("Create an R package in a few minutes"): https://thinkr.fr/creer-package-r-quelques-minutes "R package primer": http://kbroman.org/pkg_primer/ "Preparing your package for a CRAN submission": https://github.com/ThinkR-open/prepare-for-cran
  106. Further readings (in no precise order) Webinars "Programming Part 3

    (Package writing in RStudio)": https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-programming-part-3/ "You can make a package in 20 minutes – Jim Hester": http://www.rstudio.com/resources/videos/you-can-make-a-package-in-20-minutes/
  107. Further tools (in no precise order) sinew - "Generate roxygen2

    skeletons populated with information scraped from the function script": https://github.com/metrumresearchgroup/sinew prefixer - "Prefix function with their namespace": https://github.com/dreamRs/prefixer remedy - "RStudio Addins to Simplify Markdown Writing": https://github.com/ThinkR-open/remedy revdepcheck - "R package reverse dependency checking": https://github.com/r-lib/revdepcheck
  108. Building a package that lasts... Think "UX first" The easier

    it looks for the user, the better. Think user first, from function names to documentation. Help yourself Automate as much as you can. Document your development process as much as you can. Prevent bugs at an early stage. Keep testing