Box plots: A case study in debugging and perseverance

Kara Woo
January 17, 2019

Come on a journey through pull request #2196. What started as a seemingly simple fix for a bug in ggplot2's box plots developed into an entirely new placement algorithm for ggplot2 geoms. This talk will cover tips and techniques for debugging, testing, and not smashing your computer when dealing with tricky bugs.

Transcript

  1. Kara Woo | @kara_woo, Sage Bionetworks, rstudio::conf 2019
    BOX PLOTS: A CASE STUDY IN DEBUGGING AND PERSEVERANCE
  2. THIS IS ONLY KIND OF ABOUT BOX PLOTS

  3. “When I try to produce boxplots with colours depending on a categorical variable, these appear overlapping if varwidth is set to TRUE” (GitHub user mcol)
  4. “This should be straightforward.”

  5. None
  6. How do I know what the bug is? How do I know when I’m done? How do I fix it?
  7. HOW DO YOU KNOW WHAT THE BUG IS? Isolate the problem. Photo: Jia Ye
  8. REPREX: minimal reproducible example. Photo: Sarah Dorweiler

  9. reprex::reprex()

  10. require(ggplot2)
      #> Loading required package: ggplot2
      ggplot(data = iris, aes(Species, Sepal.Length)) +
        geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
      #> Warning: position_dodge requires non-overlapping x intervals
  11. require(ggplot2)
      #> Loading required package: ggplot2
      ggplot(data = iris, aes(Species, Sepal.Length)) +
        geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = FALSE)
  12. Follow the trails. Photo: Jia Ye

  13. require(ggplot2)
      #> Loading required package: ggplot2
      ggplot(data = iris, aes(Species, Sepal.Length)) +
        geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
      #> Warning: position_dodge requires non-overlapping x intervals
  14. None
  15. position_dodge() → PositionDodge → collide() → pos_dodge()

  16. COLLIDE()
      • Gets information about box location
      • Looks for box overlap
      • Passes boxes that share position to pos_dodge()
      POS_DODGE()
      • Scales boxes down
      • Places boxes side by side
  17. > debug(ggplot2:::collide)

  18. > debug(ggplot2:::collide)
      > ggplot(data = iris, aes(Species, Sepal.Length)) +
      >   geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = FALSE)
      #> debugging in: collide(data, params$width,
      #>     name = "position_dodge", strategy = pos_dodge,
      #>     n = params$n, check.width = FALSE)
      Browse[2]>
  19. > debug(ggplot2:::collide)
      > ggplot(data = iris, aes(Species, Sepal.Length)) +
      >   geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = FALSE)
      #> debugging in: collide(data, params$width,
      #>     name = "position_dodge", strategy = pos_dodge,
      #>     n = params$n, check.width = FALSE)
      Browse[2]> data
      #>   ... x  xmin  xmax
      #> 1 ... 1 0.625 1.375
      #> 2 ... 2 1.625 2.375
      #> 3 ... 3 2.625 3.375
      #> 4 ... 1 0.625 1.375
      #> 5 ... 2 1.625 2.375
      #> 6 ... 3 2.625 3.375
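The debugging workflow on these slides can be tried on any function. Below is a minimal sketch in base R; `toy_collide()` is a hypothetical stand-in written for this illustration, not ggplot2's internal `collide()`. It only shows how the debug flag is set, checked, and cleared.

```r
# toy_collide() is a hypothetical stand-in, not ggplot2's collide().
# It computes box edges from a centre position and a fixed width.
toy_collide <- function(data, width) {
  data$xmin <- data$x - width / 2
  data$xmax <- data$x + width / 2
  data
}

debug(toy_collide)                  # every call now opens the browser
flagged <- isdebugged(toy_collide)  # TRUE while the flag is set
undebug(toy_collide)                # remove the flag when finished

# debugonce() sets the flag for a single call only:
# debugonce(toy_collide)
# toy_collide(data.frame(x = 1:3), width = 0.75)
# At the Browse[n]> prompt: n (next line), c (continue), Q (quit).

# With the flag cleared, the function runs normally.
result <- toy_collide(data.frame(x = 1:3), width = 0.75)
```

In an interactive session, typing a variable name such as `data` at the `Browse[n]>` prompt prints its current value, which is how the table on the slide was obtained.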
  20. VARWIDTH = FALSE
      data
      #>   ... x  xmin  xmax
      #> 1 ... 1 0.625 1.375
      #> 2 ... 2 1.625 2.375
      #> 3 ... 3 2.625 3.375
      #> 4 ... 1 0.625 1.375
      #> 5 ... 2 1.625 2.375
      #> 6 ... 3 2.625 3.375
      VARWIDTH = TRUE
      data
      #>   ... x      xmin     xmax
      #> 1 ... 1 0.6553988 1.344601
      #> 2 ... 2 1.8750000 2.125000
      #> 3 ... 3 2.7984436 3.201556
      #> 4 ... 1 0.8063508 1.193649
      #> 5 ... 2 1.6250000 2.375000
      #> 6 ... 3 2.6599632 3.340037
  21. Boxes with different xmin aren’t treated as the same position.
      collide <- function(data, ...) {
        # ...
        plyr::ddply(data, "xmin", strategy, ..., width = width)
        # ...
      }
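To see why grouping on `xmin` breaks down, the two tables from the previous slide can be split by `xmin` directly. This is a minimal sketch in base R: `split()` is used here as an assumed stand-in for the `plyr::ddply(data, "xmin", ...)` grouping step, and the data frame names are made up for the illustration.

```r
# Box extents copied from the varwidth = FALSE and varwidth = TRUE tables.
fixed <- data.frame(
  x    = c(1, 2, 3, 1, 2, 3),
  xmin = c(0.625, 1.625, 2.625, 0.625, 1.625, 2.625),
  xmax = c(1.375, 2.375, 3.375, 1.375, 2.375, 3.375)
)
varied <- data.frame(
  x    = c(1, 2, 3, 1, 2, 3),
  xmin = c(0.6553988, 1.8750000, 2.7984436, 0.8063508, 1.6250000, 2.6599632),
  xmax = c(1.344601, 2.125000, 3.201556, 1.193649, 2.375000, 3.340037)
)

# Grouping by xmin, as a stand-in for the ddply() call in collide().
groups_fixed  <- split(fixed, fixed$xmin)
groups_varied <- split(varied, varied$xmin)

length(groups_fixed)   # one group per x position, two boxes in each
length(groups_varied)  # one group per box, so nothing is dodged
```

With equal widths, the two boxes at each position share an `xmin` and are handed to the dodging strategy together. With variable widths every box has its own `xmin`, so each one forms a group of its own and is never moved aside.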
  22. HOW DO YOU FIX IT? Experiment. Photo: Jia Ye

  23. ccc6bbb4 This doesn't fix position_dodge, but it might be in the right direction?
      f5946680 Commit before I break something else
  24. INITIAL “FIX” What’s wrong with this picture?

  25. INITIAL “FIX” What’s wrong with this picture?
      • Boxes in the wrong order
      • Doesn’t work for continuous x axes
      • Incorrect scaling
  26. HOW DO YOU FIX IT? Test many scenarios. Photo: Jia Ye
  27. None
  28. HOW DO YOU FIX IT? Make it general. Photo: Jia Ye
  29. Can we extend to bars? Can we solve other open issues? Arbitrary rectangles?
  30. POSITION_DODGE2()
      • Used for boxes, bars, rectangles
      • Finds overlap by comparing xmin to previous xmax
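The overlap rule on this slide can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code from the pull request: `find_overlap_groups()` is a hypothetical helper, it compares each `xmin` with the largest `xmax` seen so far in the current group (a slight generalisation of "previous xmax"), and it treats boxes that merely touch as non-overlapping.

```r
# Hypothetical helper: give each rectangle a group id so that rectangles
# whose x ranges overlap share an id. Not ggplot2's implementation.
find_overlap_groups <- function(xmin, xmax) {
  if (length(xmin) == 0) return(integer(0))

  # Walk through the rectangles from left to right.
  ord  <- order(xmin, xmax)
  xmin <- xmin[ord]
  xmax <- xmax[ord]

  group   <- integer(length(xmin))
  current <- 1L
  reach   <- xmax[1]   # right-most edge of the current group
  group[1] <- current

  for (i in seq_along(xmin)[-1]) {
    if (xmin[i] >= reach) {
      # Starts at or beyond the group's right edge: open a new group.
      current <- current + 1L
      reach   <- xmax[i]
    } else {
      # Overlaps the current group: extend its right edge if needed.
      reach <- max(reach, xmax[i])
    }
    group[i] <- current
  }

  # Return ids in the original input order.
  group[order(ord)]
}

# Variable-width boxes from the varwidth = TRUE table.
xmin <- c(0.6553988, 1.8750000, 2.7984436, 0.8063508, 1.6250000, 2.6599632)
xmax <- c(1.344601, 2.125000, 3.201556, 1.193649, 2.375000, 3.340037)
groups <- find_overlap_groups(xmin, xmax)
```

Unlike grouping on `xmin`, this puts rows 1 and 4, rows 2 and 5, and rows 3 and 6 in the same group even though their widths differ, so a dodging strategy would receive each pair together.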
  31. HOW DO YOU KNOW WHEN YOU’RE DONE? Again, test. Photo: Jia Ye
  32. HOW DO YOU KNOW WHEN YOU’RE DONE? Don’t let perfect be the enemy of good. Photo: Jia Ye
  33. CAN POSITION_DODGE2() REPLACE POSITION_DODGE()?

  34. None
  35. None
  36. • Isolate the problem
      • Follow the trails
      • Experiment
      • Test many scenarios
      • Make it general
      • Don’t let perfect be the enemy of good
      https://github.com/tidyverse/ggplot2/pull/2196
  37. Thank you. Thanks to @mcol for reporting this bug, Hadley Wickham for repeated code reviews, and Sean Kross and Karthik Ram for feedback on this presentation.