Box plots: A case study in debugging and perseverance

Kara Woo
January 17, 2019

Come on a journey through pull request #2196. What started as a seemingly simple fix for a bug in ggplot2's box plots developed into an entirely new placement algorithm for ggplot2 geoms. This talk will cover tips and techniques for debugging, testing, and not smashing your computer when dealing with tricky bugs.

Transcript

  1. BOX PLOTS: A CASE STUDY IN DEBUGGING AND PERSEVERANCE
     Kara Woo | @kara_woo | Sage Bionetworks | rstudio::conf 2019
  2. “When I try to produce boxplots with colours depending on a categorical variable, these appear overlapping if varwidth is set to TRUE” (GitHub user mcol)
  3. How do I know what the bug is? How do I fix it? How do I know when I’m done?
  4. require(ggplot2)
     #> Loading required package: ggplot2
     ggplot(data = iris, aes(Species, Sepal.Length)) +
       geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
     #> Warning: position_dodge requires non-overlapping x intervals
  5. require(ggplot2)
     #> Loading required package: ggplot2
     ggplot(data = iris, aes(Species, Sepal.Length)) +
       geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = FALSE)
  6. require(ggplot2)
     #> Loading required package: ggplot2
     ggplot(data = iris, aes(Species, Sepal.Length)) +
       geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
     #> Warning: position_dodge requires non-overlapping x intervals
  7. COLLIDE()
     • Gets information about box location
     • Looks for box overlap
     • Passes boxes that share position to pos_dodge()
     POS_DODGE()
     • Scales boxes down
     • Places boxes side by side
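The two steps on this slide can be sketched in base R. This is a toy illustration only, not ggplot2's internal code: `toy_dodge()` and the `boxes` data frame are hypothetical names, and the sketch assumes that "sharing a position" means having an identical `xmin`.

```r
# Toy illustration only: NOT ggplot2's collide()/pos_dodge() implementation.
toy_dodge <- function(boxes) {
  # "collide" step: group boxes that share the same left edge
  groups <- split(boxes, boxes$xmin)
  dodged <- lapply(groups, function(g) {
    n <- nrow(g)
    width <- (g$xmax - g$xmin) / n                  # scale each box down
    g$xmin <- g$xmin[1] + (seq_len(n) - 1) * width  # place boxes side by side
    g$xmax <- g$xmin + width
    g
  })
  out <- do.call(rbind, dodged)
  rownames(out) <- NULL
  out
}

# Two boxes at x = 1 and two at x = 2, all with the same width
boxes <- data.frame(
  x    = c(1, 2, 1, 2),
  xmin = c(0.625, 1.625, 0.625, 1.625),
  xmax = c(1.375, 2.375, 1.375, 2.375)
)
dodged <- toy_dodge(boxes)
print(dodged)
```

In this sketch, each pair of boxes that started at the same `xmin` ends up occupying half of the original interval, so the boxes touch but do not overlap.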
  8. > debug(ggplot2:::collide)
     > ggplot(data = iris, aes(Species, Sepal.Length)) +
     >   geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = FALSE)
     #> debugging in: collide(data, params$width,
     #>   name = "position_dodge", strategy = pos_dodge,
     #>   n = params$n, check.width = FALSE)
     Browse[2]>
  9. > debug(ggplot2:::collide)
     > ggplot(data = iris, aes(Species, Sepal.Length)) +
     >   geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = FALSE)
     #> debugging in: collide(data, params$width,
     #>   name = "position_dodge", strategy = pos_dodge,
     #>   n = params$n, check.width = FALSE)
     Browse[2]> data
     #>   ... x  xmin  xmax
     #> 1 ... 1 0.625 1.375
     #> 2 ... 2 1.625 2.375
     #> 3 ... 3 2.625 3.375
     #> 4 ... 1 0.625 1.375
     #> 5 ... 2 1.625 2.375
     #> 6 ... 3 2.625 3.375
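The `debug()` workflow on this slide can be tried on any function. A minimal sketch in base R follows; `find_overlap()` is a hypothetical stand-in for an internal function such as `collide()`, not part of ggplot2. The sketch only sets and clears the debug flag, so it runs non-interactively; calling the flagged function in an interactive session would open the `Browse[n]>` prompt shown above.

```r
# Hypothetical stand-in for an internal function you want to step through
find_overlap <- function(xmin, xmax) {
  # TRUE if any box starts before the box listed before it ends
  any(xmin[-1] < xmax[-length(xmax)])
}

debug(find_overlap)                   # the next call would enter the browser
flagged <- isdebugged(find_overlap)   # TRUE while the flag is set

undebug(find_overlap)                 # clear the flag when finished
cleared <- !isdebugged(find_overlap)  # TRUE once the flag is gone

# debugonce(find_overlap) would set the flag for a single call only.
# With the flag cleared, the function runs normally:
result <- find_overlap(c(0.625, 1.625), c(1.375, 2.375))
```

At the browser prompt, typing the name of an argument (such as `data` on the slide) prints its current value, `n` steps to the next expression, and `Q` quits.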
  10. VARWIDTH = FALSE
      data
      #>   ... x  xmin  xmax
      #> 1 ... 1 0.625 1.375
      #> 2 ... 2 1.625 2.375
      #> 3 ... 3 2.625 3.375
      #> 4 ... 1 0.625 1.375
      #> 5 ... 2 1.625 2.375
      #> 6 ... 3 2.625 3.375
      VARWIDTH = TRUE
      data
      #>   ... x      xmin     xmax
      #> 1 ... 1 0.6553988 1.344601
      #> 2 ... 2 1.8750000 2.125000
      #> 3 ... 3 2.7984436 3.201556
      #> 4 ... 1 0.8063508 1.193649
      #> 5 ... 2 1.6250000 2.375000
      #> 6 ... 3 2.6599632 3.340037
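The overlap in the variable-width case can be checked directly from the numbers on this slide. The sketch below copies the table whose box widths vary (the varwidth = TRUE case, since slide 9 shows the uniform table for varwidth = FALSE). The check itself is an illustrative assumption, not ggplot2's own test: it compares each box only with the box immediately to its left.

```r
# Values copied from the variable-width table on the slide
boxes <- data.frame(
  x    = c(1, 2, 3, 1, 2, 3),
  xmin = c(0.6553988, 1.8750000, 2.7984436, 0.8063508, 1.6250000, 2.6599632),
  xmax = c(1.344601, 2.125000, 3.201556, 1.193649, 2.375000, 3.340037)
)

# Sort by left edge, then flag every box that starts before the
# previous box (in sorted order) has ended
sorted <- boxes[order(boxes$xmin), ]
overlaps <- sorted$xmin[-1] < sorted$xmax[-nrow(sorted)]

print(overlaps)
n_overlapping_pairs <- sum(overlaps)
```

Each of the three x positions contributes one overlapping pair, even though no two boxes have the same `xmin`.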
  11. Boxes with different xmin aren’t treated as the same position.
      collide <- function(data, ...) {
        # ...
        plyr::ddply(data, "xmin", strategy, ..., width = width)
        # ...
      }
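To see why grouping on `xmin` breaks down, the sketch below groups the six boxes from the earlier slides two ways. It uses base R's `split()` as a stand-in for the grouping that `plyr::ddply()` performs, which is an assumption made to keep the example dependency-free. Grouping by `x` is shown only for contrast; it is not presented as the fix that the pull request adopted.

```r
x <- c(1, 2, 3, 1, 2, 3)

# xmin values from the slides: uniform widths vs. variable widths
xmin_fixed <- c(0.625, 1.625, 2.625, 0.625, 1.625, 2.625)
xmin_var   <- c(0.6553988, 1.8750000, 2.7984436,
                0.8063508, 1.6250000, 2.6599632)

# Number of boxes that land in each group
sizes_fixed <- lengths(split(seq_along(x), xmin_fixed))  # pairs of boxes
sizes_var   <- lengths(split(seq_along(x), xmin_var))    # every box alone
sizes_by_x  <- lengths(split(seq_along(x), x))           # pairs again

print(sizes_fixed)
print(sizes_var)
print(sizes_by_x)
```

With variable widths every box forms its own group of one, so the dodging strategy never sees two boxes together and has nothing to place side by side.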
  12. ccc6bbb4 This doesn't fix position_dodge, but it might be in the right direction?
      f5946680 Commit before I break something else
  13. INITIAL “FIX”: What’s wrong with this picture?
      • Boxes in the wrong order
      • Doesn’t work for continuous x axes
      • Incorrect scaling
  14. Can we extend to bars? Can we solve other open issues? Arbitrary rectangles?
  15. HOW DO YOU KNOW WHEN YOU’RE DONE?
      Don’t let perfect be the enemy of good
      Photo: Jia Ye
  16. • Isolate the problem
      • Follow the trails
      • Experiment
      • Test many scenarios
      • Make it general
      • Don’t let perfect be the enemy of good
      https://github.com/tidyverse/ggplot2/pull/2196
  17. Thank you
      Thanks to @mcol for reporting this bug, Hadley Wickham for repeated code reviews, and Sean Kross and Karthik Ram for feedback on this presentation.