it depends

Jim Hester
January 18, 2019

It depends: A dialog about dependencies

Software dependencies are often a double-edged sword. On one hand, they
let you take advantage of others' work, giving your software marvelous new
features and reducing bugs. On the other hand, they can change, causing your
software to break unexpectedly and increasing your maintenance burden. These
problems occur everywhere: in R scripts, R packages, Shiny applications and
deployed ML pipelines.

So when should you take on a dependency, and when should you avoid one?

Well, it depends!

This talk will show ways to weigh the pros and cons of a given dependency and
provide tools for calculating those weights for your project. It will also
provide strategies for dealing with dependency changes and, if needed,
removing dependencies altogether. We will demonstrate these techniques with
real-life cases from packages in the tidyverse and r-lib.

Transcript

  1. it depends:
    a dialog about dependencies
    Jim Hester
     @jimhester  @jimhester_

  2. all R code has
    dependencies
    R script
    R packages
    external libraries
    R
    system libraries
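
The layers on this slide compound: every package you depend on brings its own dependencies. A minimal sketch of counting that transitive set, assuming the dependency graph is already available as a named list. The package names (`app`, `pkgA` to `pkgD`) and the helper `recursive_deps()` are hypothetical and not part of itdepends.

```r
# Hypothetical dependency graph: each element lists a package's direct dependencies
graph <- list(
  app  = c("pkgA", "pkgB"),
  pkgA = "pkgC",
  pkgB = c("pkgC", "pkgD"),
  pkgC = character(),
  pkgD = character()
)

# Breadth-first walk collecting every package reachable from `pkg`
recursive_deps <- function(pkg, graph) {
  seen <- character()
  queue <- graph[[pkg]]
  while (length(queue) > 0) {
    dep <- queue[[1]]
    queue <- queue[-1]
    if (!dep %in% seen) {
      seen <- c(seen, dep)
      # Packages missing from the graph contribute no further dependencies
      queue <- c(queue, graph[[dep]])
    }
  }
  # Drop `pkg` itself in case the graph contains a cycle back to it
  sort(setdiff(seen, pkg))
}

recursive_deps("app", graph)
#> [1] "pkgA" "pkgB" "pkgC" "pkgD"
```

In practice the graph could be built from the Depends/Imports fields of `installed.packages()` or `available.packages()`; base R's `tools::package_dependencies(recursive = TRUE)` performs a similar walk.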

  3. dependencies
    break
    left-pad
    event-stream
    bitrot

  4. https://xkcd.com/1987/

  5. dependency hell
    https://xkcd.com/1987/

  8. not all
    dependencies are
    equal

  14. library(magrittr)
    library(httr)
    # Request only the headers (no download) and report the file's size
    HEAD("https://bioconductor.org/packages/3.7/data/annotation/src/contrib/MafDb.gnomAD.r2.0.1.hs37d5_3.7.0.tar.gz") %>%
      headers() %>%
      {.[["content-length"]]} %>%
      as.numeric() %>%
      prettyunits::pretty_bytes()
    #> [1] "4.16 GB"

  17. not all
    dependencies are
    equal

  18. features
    bugfixes
    testing
    time
    installation
    diskspace
    breakage
    generality
    more
    less

  19. consider your users
    package developers?
    install time costly
    smaller, limited packages easier to depend on
    stability more important than features
    data scientists / statisticians?
    install time cheap
    top packages already installed
    features most important

  20. illusory
    superiority
    teaching ability (Cross 1977)
    68% of the surveyed faculty at the
    University of Nebraska–Lincoln ranked
    themselves in the top 25%, and more than
    90% rated themselves in the top 50%.
    driving skill (Svenson 1981)
    93% of the U.S. respondents and 69% of the
    Swedish respondents put themselves in the
    top 50%.

  21. pitfalls of
    dependency removal
    overestimation of abilities
    underestimation of new bugs
    widely used == free tests
    less is not always more

  22. quantification of dependency
    weight critical

  23. itdepends
    github.com/jimhester/itdepends

  24. itdepends
    Assess usage
    Measure weights
    Visualize proportions
    Assist removal

  25. determine usage
    itdepends::dep_usage_proj()
    itdepends::dep_usage_pkg()

  26. itdepends::dep_usage_proj("~/p/tidyversedashboard") %>%
    count(pkg, sort = TRUE)
    #> # A tibble: 13 x 2
    #> pkg n
    #> <chr> <int>
    #> 1 base 558
    #> 2 <NA> 82
    #> 3 purrr 44
    #> 4 glue 10
    #> 5 utils 5
    #> 6 tibble 4
    #> 7 htmlwidgets 3
    #> 8 magick 3
    #> 9 gh 2
    #> 10 cranlogs 1
    #> 11 desc 1
    #> 12 stats 1
    #> 13 tools 1

  27. itdepends::dep_usage_proj("~/p/tidyversedashboard") %>%
    group_by(pkg) %>%
    count(fun) %>%
    top_n(1) %>%
    arrange(desc(n)) %>%
    head()
    #> Selecting by n
    #> # A tibble: 6 x 3
    #> # Groups: pkg [5]
    #> pkg fun n
    #> <chr> <chr> <int>
    #> 1 base $ 118
    #> 2 <NA> parse_datetime_8601 14
    #> 3 purrr %||% 13
    #> 4 purrr map_dfr 13
    #> 5 glue glue 7
    #> 6 tibble tibble 3
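
A rough base-R approximation of this kind of usage count, assuming it is enough to parse the code and look up each called function on the current search path. `count_calls()` is a hypothetical helper, not how itdepends is implemented: it counts only named calls (so operators such as `$` are skipped), and any name it cannot find on the search path, such as a function defined inside the project, is tallied as NA.

```r
count_calls <- function(code) {
  # Parse with source references so getParseData() has a token table to return
  exprs <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)
  # Keep only the names of functions that are called, e.g. `sum` in sum(x)
  funs <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
  # Map each name to the first search-path entry that defines it, or NA
  pkgs <- vapply(funs, function(f) {
    where <- utils::find(f, mode = "function")
    if (length(where) == 0) NA_character_ else sub("^package:", "", where[[1]])
  }, character(1), USE.NAMES = FALSE)
  sort(table(pkg = pkgs, useNA = "ifany"), decreasing = TRUE)
}

count_calls("x <- sum(1:3)\ny <- mean(c(1, 2))\nsum(x)")
```

Because it relies on `find()`, the result depends on which packages are attached when it runs; a static tool can resolve names from the project's own imports instead.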

  28. itdepends::dep_usage_pkg("devtools") %>%
    count(pkg, sort = TRUE)
    #> # A tibble: 36 x 2
    #> pkg n
    #> <chr> <int>
    #> 1 base 3699
    #> 2 devtools 362
    #> 3 git2r 34
    #> 4 usethis 33
    #> 5 pkgload 31
    #> 6 httr 25
    #> 7 withr 25
    #> 8 utils 24
    #> 9 cli 16
    #> 10 tools 15
    #> # ... with 26 more rows

  29. measure weights
    itdepends::dep_weight()

  30. weights <- itdepends::dep_weight(c("dplyr", "data.table"))
    weights
    #> # A tibble: 2 x 25
    #> package num_user bin_self bin_user install_self install_user funs downloads last_release
    #>
    #> 1 dplyr 19 1692925 21738385 375. 538. 240 89826 2018-11-10 02:30:06
    #> 2 data.t… 0 5720340 5720340 27.0 27.0 107 51658 2018-09-30 09:30:08
    #> # ... with 16 more variables: open_issues , last_updated , stars , forks ,
    #> # first_release , total_releases , releases_last_52 , num_dev ,
    #> # install_dev , bin_dev , src_size , user_deps , dev_deps ,
    #> # self_timings , user_timings , dev_timings

  31. weights[c("package", "num_user", "num_dev", "bin_self", "bin_user", "bin_dev",
    "install_self", "install_user", "install_dev")]
    #> # A tibble: 2 x 9
    #> package num_user num_dev bin_self bin_user bin_dev install_self install_user install_dev
    #>
    #> 1 dplyr 19 78 1692925 21738385 94327415 375. 538. 1989.
    #> 2 data.table 0 23 5720340 5720340 33679072 27.0 27.0 628.
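
The `bin_*` columns appear to be sizes in bytes. Assuming that, a rough base-R way to measure how much disk space a single installed package occupies is to sum the sizes of the files in its installation directory. `installed_size()` is a hypothetical helper; it measures only the package itself, not its dependencies, so it is closest in spirit to `bin_self`.

```r
installed_size <- function(pkg) {
  # system.file() returns "" when the package is not installed
  path <- system.file(package = pkg)
  if (!nzchar(path)) stop("package not installed: ", pkg)
  # Every file under the installation directory, including hidden ones
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  # Size in bytes; na.rm guards against unreadable entries such as broken symlinks
  sum(file.info(files)$size, na.rm = TRUE)
}

installed_size("stats")
```

The result can be passed to `prettyunits::pretty_bytes()`, as in the Bioconductor example earlier in the talk, for a human-readable size.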

  32. weights[c("package", "funs", "downloads", "first_release", "last_release", "releases_last_52")]
    #> # A tibble: 2 x 6
    #> package funs downloads first_release last_release releases_last_52
    #>
    #> 1 dplyr 240 89826 2014-01-16 16:53:37 2018-11-10 02:30:06 4
    #> 2 data.table 107 51658 2006-04-14 18:03:15 2018-09-30 09:30:08 5
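
If `releases_last_52` counts the releases in the 52 weeks before the data were collected (an assumption; the slide does not define it), the computation is a simple date-window count. The helper `releases_in_window()` and the release dates below are hypothetical.

```r
releases_in_window <- function(release_dates, as_of = Sys.Date(), weeks = 52) {
  release_dates <- as.Date(release_dates)
  as_of <- as.Date(as_of)
  # Count releases after the window start and no later than `as_of`
  sum(release_dates > as_of - weeks * 7 & release_dates <= as_of)
}

# Hypothetical release history for an imaginary package
dates <- c("2017-06-01", "2018-02-01", "2018-07-01", "2018-12-01")
releases_in_window(dates, as_of = "2019-01-08")
#> [1] 3
```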

  33. weights[c("package", "open_issues", "stars", "forks", "last_updated")]
    #> # A tibble: 2 x 5
    #> package open_issues stars forks last_updated
    #>
    #> 1 dplyr 113 2757 1011 2019-01-08 14:25:09
    #> 2 data.table 765 1696 725 2019-01-08 11:48:45

  34. visualize
    proportions
    itdepends::dep_plot_time()
    itdepends::dep_plot_size()

  35. itdepends::dep_plot_time("dplyr")

  36. itdepends::dep_plot_size("dplyr")

  37. assist removal
    first write tests
    then replace
    itdepends::dep_locate()

  38. itdepends::dep_locate("purrr", path = "~/p/tidyversedashboard")
    #> R/dashboard.R:11:3: warning: purrr::map_int
    #> map_int(res, ~ if (is.null(.x)) NA_integer_ else length(.x))
    #> ^~~~~~~
    #> R/dashboard.R:21:5: warning: purrr::map_int
    #> map_int(gh::gh("/repos/:org/:package/stats/commit_activity", org = org, package = package), "total"),
    #> ^~~~~~~
    #> R/dashboard.R:49:3: warning: purrr::map_int
    #> map_int(description,
    #> ^~~~~~~
    #> R/dashboard.R:69:10: warning: purrr::map_chr
    #> res <- map_chr(description,
    #> ^~~~~~~
    #> R/dashboard.R:70:5: warning: purrr::possibly
    #> possibly(function(.x) { .x$get_maintainer() %|||% NA_character_}, otherwise = NA_character_))
    #> ^~~~~~~~
    #> R/issue_progress.R:18:39: warning: purrr::walk
    #> if (is.list(x[[i]]) && isTRUE(walk(x[[i]], depth + 1))) {
    #> ^~~~
    #> R/issue_progress.R:25:3: warning: purrr::walk
    #> walk(x, 1)
    #> ^~~~
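
Slide 37's advice, first write tests and then replace, applied to the first hit above: a minimal sketch of swapping the `purrr::map_int()` call for base R's `vapply()`. The names `check_lengths()` and `lengths_or_na()` are hypothetical; the check is written first so it can be run against both the old and the new implementation, including the `NULL` and named-list cases.

```r
# Written first: the behavior the replacement must preserve
check_lengths <- function(f) {
  stopifnot(identical(f(list(1:3, NULL, "a")), c(3L, NA, 1L)))
  stopifnot(identical(f(list(x = 1:2)), c(x = 2L)))
  invisible(TRUE)
}

# Original (purrr): map_int(res, ~ if (is.null(.x)) NA_integer_ else length(.x))
lengths_or_na <- function(res) {
  # vapply() enforces one integer per element, much like map_int(), and keeps names
  vapply(res, function(x) if (is.null(x)) NA_integer_ else length(x), integer(1))
}

check_lengths(lengths_or_na)
```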

  39. itdepends::dep_locate("purrr", path = "~/p/tidyversedashboard")

  40. dependencies are
    not equal
    must measure and
    balance
    beware
    overconfidence
    less is not always
    more

    itdepends
    dep_usage
    dep_weight
    dep_plot
    dep_locate
     @jimhester
     @jimhester_
    speakerdeck.com/jimhester/it-depends
