Eight visualisation challenges with ggplot2

Hadley Wickham
November 17, 2016

Presented to the NYC datavis meetup.

Transcript

  1. Hadley Wickham (@hadleywickham)
    Chief Scientist, RStudio
    Solving 8 visualisation challenges with ggplot2
    November 2016

  2. http://fivethirtyeight.com/features/our-47-weirdest-charts-from-2015/

  3. https://flowingdata.com/tag/upshot/

  4. Challenge 1: Labelling plots
    Solved by Bob Rudis
    A problem ignored for too long

  5. [Figure: scatterplot of hwy (20 to 40) against displ (2 to 7), points coloured by class: 2seater, compact, midsize, minivan, pickup, subcompact, suv]
    Title: Fuel efficiency generally decreases with engine size
    Subtitle: Two seaters (sports cars) are an exception because of their light weight
    Caption: Data from fueleconomy.gov

  6. Accessed with the labs() function:
    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth(se = FALSE, method = "loess") +
      labs(
        title = "Fuel efficiency generally ...",
        subtitle = "Two seaters (sports cars) ...",
        caption = "Data from fueleconomy.gov"
      )

  7. Challenge 2: Axes

  8. [Image-only slide]

  9. Stages of visualisation system popularity
    1. Someone used it and complained about a bug
    2. Someone used it in an academic paper
    3. Someone used it in a newspaper
    4. Someone used it to commit academic fraud
    5. So many people use it that Google has autocompletes for bad graphics ideas

  10. [Image-only slide]

  11. [Image-only slide]

  12. [Image-only slide]

  13. Isenberg, Petra, et al. "A study on dual-scale data charts." IEEE Transactions on Visualization and Computer Graphics 17.12 (2011): 2469-2478.
    https://www.lri.fr/~isenberg/publications/papers/Isenberg_2011_ASO.pdf

  14. But...

  15. [Figure: scatterplot of hwy against displ (2 to 7); primary y axis "mpg" (20 to 40), secondary y axis "l / 100 km" (6 to 20)]

  16. ggplot(mpg, aes(displ, hwy)) +
      geom_point() +
      scale_y_continuous(
        "mpg",
        sec.axis = sec_axis(
          ~ 235 / .,
          name = "l / 100 km",
          breaks = seq(2, 20, by = 2)
        )
      )
    Only 1-to-1 transformations are allowed
    The formula ~ 235 / . is shorthand for:
    function(x) {
      235 / x
    }

  17. [Figure: scatterplot of hwy against displ (2 to 7); primary y axis "mpg" (20 to 40), secondary y axis "l / 100 km" (6 to 20)]
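
The secondary axis on the slides above relies on the mpg to l / 100 km conversion being a 1-to-1 transformation. A minimal base R sketch of why that holds for 235 / x: applying the function twice returns the original values. The helper name mpg_to_l100km and the sample values are illustrative and do not appear in the talk.

```r
# Hypothetical helper: the same transformation as the formula ~ 235 / .
mpg_to_l100km <- function(x) {
  235 / x
}

# Illustrative highway mileages, chosen to match the primary axis ticks
hwy <- c(20, 30, 40)
l100km <- mpg_to_l100km(hwy)

# The transformation is its own inverse, so every mpg value maps to exactly
# one l / 100 km value and back again
round_trip <- mpg_to_l100km(l100km)
stopifnot(isTRUE(all.equal(round_trip, hwy)))
```

Because larger mpg values give smaller l / 100 km values, the secondary axis runs in the opposite direction to the primary one.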

  18. Challenge 3: Labelling data

  19. geom_text()
    [Figure: hwy (20 to 40) against displ (2 to 7), points coloured by class, with text labels: corvette, caravan 2wd, altima, forester awd, toyota tacoma 4wd, jetta, new beetle]

  20. geom_label()
    [Figure: hwy (20 to 40) against displ (2 to 7), points coloured by class, with boxed labels: corvette, caravan 2wd, altima, forester awd, toyota tacoma 4wd, jetta, new beetle]

  21. geom_label_repel()
    https://github.com/slowkow/ggrepel
    [Figure: hwy (20 to 40) against displ (2 to 7), points coloured by class, with labels: corvette, caravan 2wd, altima, forester awd, toyota tacoma 4wd, jetta, new beetle]

  22. dev version
    [Figure: hwy (20 to 40) against displ (2 to 7), points coloured by class, with labels: corvette, caravan 2wd, altima, forester awd, toyota tacoma 4wd, jetta, new beetle]
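
The transcript names geom_text(), geom_label() and geom_label_repel() but does not include the code behind the labelled plots. The following is a rough sketch of one way such a plot could be built. It assumes the labels are the most fuel-efficient model in each class, which is a guess based on the seven labels shown; the object name best_in_class is illustrative.

```r
library(dplyr)
library(ggplot2)

# Assumption: one labelled car per class, the one with the highest hwy
best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1) %>%
  ungroup()

# geom_label() draws a box behind each label; geom_text() would draw text only
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_label(aes(label = model), data = best_in_class, alpha = 0.5)

# With the ggrepel package installed, the label layer can be swapped for
# ggrepel::geom_label_repel(aes(label = model), data = best_in_class)
# so that labels are moved away from each other and from the points.
```

Passing a separate data frame to the label layer keeps the labels to a handful of points while the point layer still shows the full data.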

  23. [Image-only slide]

  24. Two differences between a factor and a string:
    1. Fixed set of possible values
    2. Arbitrary order
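
Both differences can be seen in a few lines of base R. This is a small illustration that is not taken from the talk; the month abbreviations are example data only.

```r
x <- c("Dec", "Apr", "Jan", "Mar")

# Strings sort alphabetically, which is rarely the order you want to plot
sorted_strings <- sort(x)

# A factor carries its own level order (here the calendar order in month.abb)
months <- factor(x, levels = month.abb)
sorted_factor <- sort(months)

# The set of levels is fixed: a value outside it becomes NA
typo <- factor("Jam", levels = month.abb)
is.na(typo)
```

Sorting the factor follows the level order rather than the alphabet, which is what lets a plot axis or legend be arranged deliberately.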

  25. Some data from the General Social Survey:
    relig <- gss_cat %>%
      group_by(relig) %>%
      summarise(
        tvhours = mean(tvhours, na.rm = TRUE),
        n = n()
      )

  26. [Figure: tvhours (2 to 4) by relig, axis labels in order: No answer, Don't know, Inter-nondenominational, Native american, Christian, Orthodox-christian, Moslem/islam, Other eastern, Hinduism, Buddhism, Other, None, Jewish, Catholic, Protestant]

  27. fct_reorder(relig, tvhours)
    [Figure: tvhours (2 to 4) by relig, axis labels in order: Other eastern, Hinduism, Buddhism, Orthodox-christian, Moslem/islam, Jewish, None, No answer, Other, Christian, Inter-nondenominational, Catholic, Protestant, Native american, Don't know]
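
The plotting code for the two religion charts is not in the transcript. A sketch of how they might be drawn, assuming a simple geom_point() layer (the geom is a guess) and using the name relig_summary for the summary table to keep it distinct from the relig column:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Mean TV hours per religion, as in the summarise() call above
relig_summary <- gss_cat %>%
  group_by(relig) %>%
  summarise(
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )

# Default: the axis follows the factor's existing level order
p_default <- ggplot(relig_summary, aes(tvhours, relig)) +
  geom_point()

# Reordered: levels sorted by tvhours, so the points form a monotone trend
p_sorted <- ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
  geom_point()
```

fct_reorder() changes only the order of the levels, not the values, so the same data is shown in both plots.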

  28. You have the same problem with more dimensions:
    by_age <- gss_cat %>%
      filter(!is.na(age)) %>%
      count(age, marital) %>%
      group_by(age) %>%
      mutate(prop = n / sum(n))

  29. [Figure: prop (0.00 to 1.00) against age (20 to 80), grouped by marital; legend order: No answer, Never married, Separated, Divorced, Widowed, Married]

  30. [Figure: prop (0.00 to 1.00) against age (20 to 80), grouped by marital; legend order: Widowed, Married, Divorced, Never married, No answer, Separated]
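
The transcript does not say how the legend in the second plot was reordered. One plausible approach, offered here as an assumption rather than the talk's actual code, is forcats::fct_reorder2(), which orders levels by the y value at the largest x value so that the legend matches the right-hand ends of the lines. The use of geom_line() is also a guess.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Proportion of each marital status within each age
by_age <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

# Default legend order: the factor's existing levels
p_default <- ggplot(by_age, aes(age, prop, colour = marital)) +
  geom_line(na.rm = TRUE)

# Assumed fix: order the legend by prop at the highest age
p_ordered <- ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
  geom_line(na.rm = TRUE) +
  labs(colour = "marital")
```

Counting first and then grouping by age alone makes the proportions sum to 1 within each age, independent of how count() treats existing groups.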

  31. Challenge 5: Missing values

  32. An explicit missing value (NA) is the presence of an absence; an implicit missing value is the absence of a presence.
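
The demo that follows in the talk is not captured in the transcript. As a stand-in, here is a small sketch of the two kinds of missingness using tidyr::complete(); the stocks data is made up for illustration and is not from the talk.

```r
library(tibble)
library(tidyr)

# Illustrative data: the return for 2015 Q4 is explicitly missing (NA),
# while the row for 2016 Q1 is implicitly missing (it is simply not there)
stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(1, 2, 3, 4, 2, 3, 4),
  return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
)

# complete() adds a row for every year/qtr combination, turning the
# implicit missing value into an explicit NA
completed <- complete(stocks, year, qtr)
```

After completing, both gaps show up as NA, so a plot or summary can treat them the same way.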

  33. Demo

  34. Challenge 6: Histograms

  35. hist(1:4)

  36. Equivalent ggplot2 code is a little longer:
    df <- tibble(x = 1:4)
    df %>%
      ggplot(aes(x)) +
      geom_histogram(binwidth = 1)

  37. [Figure: histogram of x (1 to 4), count axis 0.00 to 1.00; bins: (0.5, 1.5] (1.5, 2.5] (2.5, 3.5] (3.5, 4.5]]
    Thanks to Randall Pruim

  38. df %>%
      ggplot(aes(x)) +
      geom_histogram(
        binwidth = 1,
        boundary = 0
      )
    df %>%
      ggplot(aes(x)) +
      geom_histogram(
        binwidth = 1,
        boundary = 0,
        closed = "left"
      )

  39. [Figure: histogram of x (1 to 4), count axis 0.0 to 2.0; bins: [1, 2] (2, 3] (3, 4]]

  40. [Figure: histogram of x (1 to 4), count axis 0.0 to 2.0; label: (0.5, 1.5]; bins: [1, 2) [2, 3) [3, 4]]

  41. [Figure: histogram of x (1 to 4), count axis 0.0 to 2.0; label: (0.5, 1.5]; bins: [0.99999, 1.99999) [1.99999, 2.99999) [2.99999, 4.00001]]
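
The effect of bin boundaries and closed sides on the counts for 1:4 can be checked without drawing anything. The sketch below uses base R's cut() as a stand-in for the binning, so it illustrates the interval logic only and is not ggplot2's internal algorithm.

```r
x <- 1:4

# Bins centred on the integers: each value falls in its own bin
centred <- table(cut(x, breaks = seq(0.5, 4.5, by = 1)))

# Bins with boundaries on the integers, closed on the right:
# 1 and 2 share the first bin [1, 2]
right_closed <- table(cut(x, breaks = 1:4, include.lowest = TRUE))

# Same boundaries, closed on the left: 3 and 4 share the last bin [3, 4]
left_closed <- table(cut(x, breaks = 1:4, include.lowest = TRUE, right = FALSE))
```

With only four values the choice of closed side visibly changes the shape, which is why the slides spend time on boundary and closed.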

  42. Challenge 7: Bar charts

  43. ggplot(mpg, aes(class)) +
      geom_bar(colour = "white")
    [Figure: bar chart of count (0 to 60) by class: 2seater, compact, midsize, minivan, pickup, subcompact, suv]

  44. ggplot(mpg, aes(class, group = id)) +
      geom_bar(col = "white")
    [Figure: bar chart of count (0 to 60) by class]

  45. ggplot(mpg, aes(class, group = id, fill = drv)) +
      geom_bar(col = "white")
    [Figure: bar chart of count (0 to 60) by class, filled by drv: 4, f, r]

  46. ggplot(mpg, aes(class, fill = drv)) +
      geom_bar(col = "white")
    [Figure: bar chart of count (0 to 60) by class, filled by drv: 4, f, r]

  47. Another type of bar chart displays summaries:
    class_mpg <- mpg %>%
      group_by(class) %>%
      summarise(
        mean = mean(hwy),
        se = 1.96 * sd(hwy) / sqrt(n())
      )

  48. ggplot(class_mpg, aes(class, mean)) +
      geom_bar(stat = "identity")
    [Figure: bar chart of mean (0 to 20) by class: 2seater, compact, midsize, minivan, pickup, subcompact, suv]

  49. ggplot(class_mpg, aes(class, mean)) +
      geom_col() # Thanks to Bob Rudis
    [Figure: bar chart of mean (0 to 20) by class]

  50. [Figure: mean (20 to 28) by class]

  51. [Figure: mean (15 to 30) by class]
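
The last two plots above come without code in the transcript. Their axis ranges suggest displays that do not start at zero, so one guess is a point-based alternative to bars that also uses the se column. The sketch below shows that idea with geom_pointrange(); the choice of geom is an assumption, not something stated in the talk.

```r
library(dplyr)
library(ggplot2)

# Per-class mean and interval half-width, as computed earlier in the deck
class_mpg <- mpg %>%
  group_by(class) %>%
  summarise(
    mean = mean(hwy),
    se = 1.96 * sd(hwy) / sqrt(n())
  )

# Assumed alternative to bars: a point for the mean with a range of +/- se
p_range <- ggplot(class_mpg, aes(class, mean)) +
  geom_pointrange(aes(ymin = mean - se, ymax = mean + se))
```

Unlike bars, points and ranges do not need the axis to include zero, so more of the plot area is spent on the differences between classes.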

  52. Challenges 8, 9, 10, 11: ggplot2 extension

  53. 2.1.0 introduced a formal extension mechanism
    https://www.ggplot2-exts.org, by Daniel Emaasit

  54. ggraph, by Thomas Lin Pedersen
    https://github.com/thomasp85/ggraph

  55. ggseas by Peter Ellis
    https://github.com/ellisp/ggseas
    Uses X13-SEATS-ARIMA in the seasonal package

  56. gganimate by David Robinson
    https://github.com/dgrtwo/gganimate

  57. Conclusion

  58. Challenge 1: Labelling plots
    Solved by Bob Rudis
    A problem ignored for too long

  59. Challenge 2: Axes

  60. Challenge 3: Labelling data

  61. [Image-only slide]

  62. Challenge 5: Missing values

  63. Challenge 6: Histograms

  64. Challenge 7: Bar charts

  65. Challenges 8, 9, 10, 11: ggplot2 extension

  66. Many of the features I discussed here have been added in recent versions of ggplot2. See the release notes for more detail.

  67. http://ggplot2.tidyverse.org
