it depends

Jim Hester
January 18, 2019

It depends: A dialog about dependencies

Software dependencies can often be a double-edged sword. On one hand, they
let you take advantage of others' work, giving your software marvelous new
features and reducing bugs. On the other hand, they can change, causing your
software to break unexpectedly and increasing your maintenance burden. These
problems occur everywhere: in R scripts, R packages, Shiny applications, and
deployed ML pipelines.

So when should you take a dependency and when should you avoid one?

Well, it depends!

This talk will show ways to weigh the pros and cons of a given dependency and
provide tools for calculating the weights for your project. It will also
provide strategies for dealing with dependency changes and, if needed,
removing dependencies. We will demonstrate these techniques with some real-life
cases from packages in the tidyverse and r-lib.

Transcript

  1. it depends: a dialog about dependencies
     Jim Hester  @jimhester  @jimhester_
  2. all R code has dependencies: R script R packages external libraries R system libraries
  3. dependencies break: left-pad, event-stream, bitrot

  4. https://xkcd.com/1987/

  5. dependency hell https://xkcd.com/1987/

  6. None
  7. None
  8. not all dependencies are equal

  9. None
  10. None
  11. None
  12. None
  13. None
  14. library(magrittr)
      library(httr)
      HEAD("https://bioconductor.org/packages/3.7/data/annotation/src/contrib/MafDb.gnomAD.r2.0.1.hs37d5_3.7.0.tar.gz") %>%
        headers() %>%
        {.[["content-length"]]} %>%
        as.numeric() %>%
        prettyunits::pretty_bytes()
      #> [1] "4.16 GB"
  15. None
  16. None
  17. not all dependencies are equal

  18. features bugfixes testing time installation diskspace breakage generality more less

  19. consider your users
      package developers? install time costly; smaller, limited packages easier to depend on; stability more important than features
      data scientists / statisticians? install time cheap; top packages already installed; features most important
  20. illusory superiority
      teaching ability (Cross 1977): 68% of the surveyed faculty at the University of Nebraska–Lincoln ranked themselves in the top 25%, and more than 90% rated themselves in the top 50%.
      driving skill (Svenson 1981): 93% of the U.S. respondents and 69% of the Swedish respondents put themselves in the top 50%.
  21. pitfalls of dependency removal: overestimation of abilities; underestimation of new bugs; widely used == free tests; less is not always more
  22. quantification of dependency weight critical

  23. itdepends github.com/jimhester/itdepends

  24. itdepends: Assess usage, Measure weights, Visualize proportions, Assist removal

  25. determine usage: itdepends::dep_usage_proj(), itdepends::dep_usage_pkg()

  26. itdepends::dep_usage_proj("~/p/tidyversedashboard") %>%
        count(pkg, sort = TRUE)
      #> # A tibble: 13 x 2
      #>    pkg             n
      #>    <chr>       <int>
      #>  1 base          558
      #>  2 <NA>           82
      #>  3 purrr          44
      #>  4 glue           10
      #>  5 utils           5
      #>  6 tibble          4
      #>  7 htmlwidgets     3
      #>  8 magick          3
      #>  9 gh              2
      #> 10 cranlogs        1
      #> 11 desc            1
      #> 12 stats           1
      #> 13 tools           1
  27. itdepends::dep_usage_proj("~/p/tidyversedashboard") %>%
        group_by(pkg) %>%
        count(fun) %>%
        top_n(1) %>%
        arrange(desc(n)) %>%
        head()
      #> Selecting by n
      #> # A tibble: 6 x 3
      #> # Groups:   pkg [5]
      #>   pkg    fun                     n
      #>   <chr>  <chr>               <int>
      #> 1 base   $                     118
      #> 2 <NA>   parse_datetime_8601    14
      #> 3 purrr  %||%                   13
      #> 4 purrr  map_dfr                13
      #> 5 glue   glue                    7
      #> 6 tibble tibble                  3
  28. itdepends::dep_usage_pkg("devtools") %>%
        count(pkg, sort = TRUE)
      #> # A tibble: 36 x 2
      #>    pkg          n
      #>    <chr>    <int>
      #>  1 base      3699
      #>  2 devtools   362
      #>  3 git2r       34
      #>  4 usethis     33
      #>  5 pkgload     31
      #>  6 httr        25
      #>  7 withr       25
      #>  8 utils       24
      #>  9 cli         16
      #> 10 tools       15
      #> # ... with 26 more rows
  29. measure weights: itdepends::dep_weight()

  30. weights <- itdepends::dep_weight(c("dplyr", "data.table"))
      weights
      #> # A tibble: 2 x 25
      #>   package num_user bin_self bin_user install_self install_user  funs downloads last_release
      #>   <chr>      <int>    <int>    <dbl>        <dbl>        <dbl> <int>     <dbl> <dttm>
      #> 1 dplyr         19  1692925 21738385         375.         538.   240     89826 2018-11-10 02:30:06
      #> 2 data.t…        0  5720340  5720340         27.0         27.0   107     51658 2018-09-30 09:30:08
      #> # ... with 16 more variables: open_issues <int>, last_updated <dttm>, stars <int>, forks <int>,
      #> #   first_release <dttm>, total_releases <dbl>, releases_last_52 <int>, num_dev <int>,
      #> #   install_dev <dbl>, bin_dev <dbl>, src_size <int>, user_deps <list>, dev_deps <list>,
      #> #   self_timings <list>, user_timings <list>, dev_timings <list>
  31. weights[c("package", "num_user", "num_dev", "bin_self", "bin_user", "bin_dev",
                "install_self", "install_user", "install_dev")]
      #> # A tibble: 2 x 9
      #>   package    num_user num_dev bin_self bin_user  bin_dev install_self install_user install_dev
      #>   <chr>         <int>   <int>    <int>    <dbl>    <dbl>        <dbl>        <dbl>       <dbl>
      #> 1 dplyr            19      78  1692925 21738385 94327415         375.         538.       1989.
      #> 2 data.table        0      23  5720340  5720340 33679072         27.0         27.0        628.
  32. weights[c("package", "funs", "downloads", "first_release", "last_release", "releases_last_52")]
      #> # A tibble: 2 x 6
      #>   package     funs downloads first_release       last_release        releases_last_52
      #>   <chr>      <int>     <dbl> <dttm>              <dttm>                         <int>
      #> 1 dplyr        240     89826 2014-01-16 16:53:37 2018-11-10 02:30:06                4
      #> 2 data.table   107     51658 2006-04-14 18:03:15 2018-09-30 09:30:08                5
  33. weights[c("package", "open_issues", "stars", "forks", "last_updated")]
      #> # A tibble: 2 x 5
      #>   package    open_issues stars forks last_updated
      #>   <chr>            <int> <int> <int> <dttm>
      #> 1 dplyr              113  2757  1011 2019-01-08 14:25:09
      #> 2 data.table         765  1696   725 2019-01-08 11:48:45
  34. visualize proportions: itdepends::dep_plot_time(), itdepends::dep_plot_size()

  35. itdepends::dep_plot_time("dplyr")

  36. itdepends::dep_plot_size("dplyr")

  37. assist removal: first write tests, then replace; itdepends::dep_locate()

  38. itdepends::dep_locate("purrr", path = "~/p/tidyversedashboard")
      #> R/dashboard.R:11:3: warning: purrr::map_int
      #>   map_int(res, ~ if (is.null(.x)) NA_integer_ else length(.x))
      #>   ^~~~~~~
      #> R/dashboard.R:21:5: warning: purrr::map_int
      #>     map_int(gh::gh("/repos/:org/:package/stats/commit_activity", org = org, package = package), "total"),
      #>     ^~~~~~~
      #> R/dashboard.R:49:3: warning: purrr::map_int
      #>   map_int(description,
      #>   ^~~~~~~
      #> R/dashboard.R:69:10: warning: purrr::map_chr
      #>   res <- map_chr(description,
      #>          ^~~~~~~
      #> R/dashboard.R:70:5: warning: purrr::possibly
      #>     possibly(function(.x) { .x$get_maintainer() %|||% NA_character_}, otherwise = NA_character_))
      #>     ^~~~~~~~
      #> R/issue_progress.R:18:39: warning: purrr::walk
      #>         if (is.list(x[[i]]) && isTRUE(walk(x[[i]], depth + 1))) {
      #>                                       ^~~~
      #> R/issue_progress.R:25:3: warning: purrr::walk
      #>   walk(x, 1)
      #>   ^~~~
  39. itdepends::dep_locate("purrr", path = "~/p/tidyversedashboard")

  40. dependencies are not equal; must measure and balance; beware overconfidence;
      less is not always more
      itdepends: dep_usage, dep_weight, dep_plot, dep_locate
      @jimhester  @jimhester_  speakerdeck.com/jimhester/it-depends
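
As a rough illustration of the "determine usage" step in the slides, the base-R sketch below counts explicitly namespaced calls (pkg::fun) in a snippet of code. It is not how itdepends is implemented, and it only sees calls written with `::`, so it would miss the unqualified calls that the dep_usage_proj() output in the transcript attributes to packages. The helper name count_namespaced_calls and the sample snippet are made up for this example.

```r
# Hypothetical helper (not part of itdepends): count `pkg::fun` references
# in a string of R code using only base R's parser.
count_namespaced_calls <- function(code) {
  # Parse without evaluating; keep.source = TRUE is needed for getParseData().
  exprs <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)
  # The package name in `pkg::fun` is tokenized as SYMBOL_PACKAGE.
  pkgs <- tokens$text[tokens$token == "SYMBOL_PACKAGE"]
  counts <- table(pkgs)
  # Return a plain named integer vector, most-used package first.
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Made-up sample code; it is only parsed, so purrr and glue need not be installed.
code <- '
x <- purrr::map_int(list(1:3, 1:5), length)
y <- purrr::map_chr(letters[1:2], toupper)
msg <- glue::glue("lengths: {toString(x)}")
'

usage <- count_namespaced_calls(code)
print(usage)  # purrr is counted twice, glue once
```

A count like this is only a first approximation of how heavily a project leans on a package; the slides' point about measuring before removing still applies.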