it depends

Jim Hester
January 18, 2019

It depends: A dialog about dependencies

Software dependencies can often be a double-edged sword. On one hand, they
let you take advantage of others' work, giving your software marvelous new
features and reducing bugs. On the other hand, they can change, causing your
software to break unexpectedly and increasing your maintenance burden. These
problems occur everywhere: in R scripts, R packages, Shiny applications, and
deployed ML pipelines.

So when should you take on a dependency, and when should you avoid one?

Well, it depends!

This talk will show ways to weigh the pros and cons of a given dependency and
provide tools for calculating the weights for your project. It will also
provide strategies for dealing with dependency changes and, if needed,
removing the dependency. We will demonstrate these techniques with some
real-life cases from packages in the tidyverse and r-lib.

Transcript

  1. consider your users
     package developers?
       - install time costly
       - smaller, limited packages easier to depend on
       - stability more important than features
     data scientists / statisticians?
       - install time cheap
       - top packages already installed
       - features most important
  2. illusory superiority
       - teaching ability (Cross 1977): 68% of the surveyed faculty at the University of Nebraska–Lincoln ranked themselves in the top 25%, and more than 90% rated themselves in the top 50%.
       - driving skill (Svenson 1981): 93% of the U.S. respondents and 69% of the Swedish respondents put themselves in the top 50%.
  3. pitfalls of dependency removal
       - overestimation of abilities
       - underestimation of new bugs
       - widely used == free tests
       - less is not always more
  4. itdepends::dep_usage_proj("~/p/tidyversedashboard") %>%
       count(pkg, sort = TRUE)
     #> # A tibble: 13 x 2
     #>    pkg             n
     #>    <chr>       <int>
     #>  1 base          558
     #>  2 <NA>           82
     #>  3 purrr          44
     #>  4 glue           10
     #>  5 utils           5
     #>  6 tibble          4
     #>  7 htmlwidgets     3
     #>  8 magick          3
     #>  9 gh              2
     #> 10 cranlogs        1
     #> 11 desc            1
     #> 12 stats           1
     #> 13 tools           1
  5. itdepends::dep_usage_proj("~/p/tidyversedashboard") %>%
       group_by(pkg) %>%
       count(fun) %>%
       top_n(1) %>%
       arrange(desc(n)) %>%
       head()
     #> Selecting by n
     #> # A tibble: 6 x 3
     #> # Groups:   pkg [5]
     #>   pkg    fun                     n
     #>   <chr>  <chr>               <int>
     #> 1 base   $                     118
     #> 2 <NA>   parse_datetime_8601    14
     #> 3 purrr  %||%                   13
     #> 4 purrr  map_dfr                13
     #> 5 glue   glue                    7
     #> 6 tibble tibble                  3
  6. itdepends::dep_usage_pkg("devtools") %>%
       count(pkg, sort = TRUE)
     #> # A tibble: 36 x 2
     #>    pkg          n
     #>    <chr>    <int>
     #>  1 base      3699
     #>  2 devtools   362
     #>  3 git2r       34
     #>  4 usethis     33
     #>  5 pkgload     31
     #>  6 httr        25
     #>  7 withr       25
     #>  8 utils       24
     #>  9 cli         16
     #> 10 tools       15
     #> # ... with 26 more rows
  7. weights <- itdepends::dep_weight(c("dplyr", "data.table"))
     weights
     #> # A tibble: 2 x 25
     #>   package num_user bin_self bin_user install_self install_user  funs downloads last_release
     #>   <chr>      <int>    <int>    <dbl>        <dbl>        <dbl> <int>     <dbl> <dttm>
     #> 1 dplyr         19  1692925 21738385         375.         538.   240     89826 2018-11-10 02:30:06
     #> 2 data.t…        0  5720340  5720340         27.0         27.0   107     51658 2018-09-30 09:30:08
     #> # ... with 16 more variables: open_issues <int>, last_updated <dttm>, stars <int>, forks <int>,
     #> #   first_release <dttm>, total_releases <dbl>, releases_last_52 <int>, num_dev <int>,
     #> #   install_dev <dbl>, bin_dev <dbl>, src_size <int>, user_deps <list>, dev_deps <list>,
     #> #   self_timings <list>, user_timings <list>, dev_timings <list>
  8. weights[c("package", "num_user", "num_dev", "bin_self", "bin_user", "bin_dev",
               "install_self", "install_user", "install_dev")]
     #> # A tibble: 2 x 9
     #>   package    num_user num_dev bin_self bin_user  bin_dev install_self install_user install_dev
     #>   <chr>         <int>   <int>    <int>    <dbl>    <dbl>        <dbl>        <dbl>       <dbl>
     #> 1 dplyr            19      78  1692925 21738385 94327415         375.         538.       1989.
     #> 2 data.table        0      23  5720340  5720340 33679072         27.0         27.0        628.
  9. weights[c("package", "funs", "downloads", "first_release", "last_release", "releases_last_52")]
     #> # A tibble: 2 x 6
     #>   package     funs downloads first_release       last_release        releases_last_52
     #>   <chr>      <int>     <dbl> <dttm>              <dttm>                         <int>
     #> 1 dplyr        240     89826 2014-01-16 16:53:37 2018-11-10 02:30:06                4
     #> 2 data.table   107     51658 2006-04-14 18:03:15 2018-09-30 09:30:08                5
  10. weights[c("package", "open_issues", "stars", "forks", "last_updated")]
      #> # A tibble: 2 x 5
      #>   package    open_issues stars forks last_updated
      #>   <chr>            <int> <int> <int> <dttm>
      #> 1 dplyr              113  2757  1011 2019-01-08 14:25:09
      #> 2 data.table         765  1696   725 2019-01-08 11:48:45
  11. itdepends::dep_locate("purrr", path = "~/p/tidyversedashboard")
      #> R/dashboard.R:11:3: warning: purrr::map_int
      #>   map_int(res, ~ if (is.null(.x)) NA_integer_ else length(.x))
      #>   ^~~~~~~
      #> R/dashboard.R:21:5: warning: purrr::map_int
      #>     map_int(gh::gh("/repos/:org/:package/stats/commit_activity", org = org, package = package), "total"),
      #>     ^~~~~~~
      #> R/dashboard.R:49:3: warning: purrr::map_int
      #>   map_int(description,
      #>   ^~~~~~~
      #> R/dashboard.R:69:10: warning: purrr::map_chr
      #>   res <- map_chr(description,
      #>          ^~~~~~~
      #> R/dashboard.R:70:5: warning: purrr::possibly
      #>     possibly(function(.x) { .x$get_maintainer() %|||% NA_character_}, otherwise = NA_character_))
      #>     ^~~~~~~~
      #> R/issue_progress.R:18:39: warning: purrr::walk
      #>        if (is.list(x[[i]]) && isTRUE(walk(x[[i]], depth + 1))) {
      #>                                       ^~~~
      #> R/issue_progress.R:25:3: warning: purrr::walk
      #>   walk(x, 1)
      #>   ^~~~
  12. dependencies are not equal
        - must measure and balance
        - beware overconfidence
        - less is not always more
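
To make "must measure and balance" concrete, here is a minimal base R sketch that reuses the dplyr and data.table figures printed by `dep_weight()` in slide 8. It rests on two assumptions the slides do not state: that the `*_user` totals include the package itself (which the data.table row, with zero user dependencies and identical `bin_self` and `bin_user`, suggests), and that the printed install times are rounded. The derived column names are made up for illustration and are not part of itdepends.

```r
# Figures copied from the dep_weight() output in slide 8 of the transcript.
weights <- data.frame(
  package      = c("dplyr", "data.table"),
  num_user     = c(19L, 0L),
  bin_self     = c(1692925, 5720340),
  bin_user     = c(21738385, 5720340),
  install_self = c(375, 27),
  install_user = c(538, 27),
  stringsAsFactors = FALSE
)

# Assumption: the "*_user" totals include the package itself, so the
# difference is the share contributed by its user-facing dependencies.
weights$bin_from_deps     <- weights$bin_user - weights$bin_self
weights$install_from_deps <- weights$install_user - weights$install_self

# How many times larger the full user install is than the package alone.
weights$bin_ratio <- weights$bin_user / weights$bin_self

print(weights[c("package", "num_user", "bin_from_deps",
                "install_from_deps", "bin_ratio")])
```

Read this way, dplyr itself is the smaller binary of the two, but most of what a user installs alongside it comes from its 19 dependencies, while data.table brings nothing beyond itself. Which of these matters more depends on the users described in slide 1.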
 itdepends dep_usage dep_weight dep_plot dep_locate  @jimhester  @jimhester_ speakerdeck.com/jimhester/it-depends
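
As an illustration of what removing a dependency can look like once `dep_locate()` has found the call sites, the sketch below rewrites the first `purrr::map_int` call reported in slide 11 using base R's `vapply()`. The `res` list is a hypothetical stand-in, since the real data in R/dashboard.R is not shown in the slides, and this is only one possible rewrite, not necessarily what the author did.

```r
# Hypothetical stand-in for `res`; the real object is not shown in the slides.
res <- list(a = 1:3, b = NULL, c = character(0))

# Base R rewrite of the call flagged at R/dashboard.R:11:3:
#   map_int(res, ~ if (is.null(.x)) NA_integer_ else length(.x))
# vapply() checks that every result is a single integer, much as map_int() does.
lengths_or_na <- vapply(
  res,
  function(x) if (is.null(x)) NA_integer_ else length(x),
  integer(1)
)

print(lengths_or_na)
```

Even a small rewrite like this deserves a test before the dependency is dropped: as slide 3 warns, it is easy to underestimate the new bugs a replacement can introduce.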