
Teaching data science with puzzles

isteves
January 17, 2019

Of the many coding puzzles on the web, few focus on the programming skills needed for handling untidy data. During my summer internship at RStudio, I worked with Jenny Bryan to develop a series of data science puzzles known as the "Tidies of March." These puzzles break data wrangling into bite-sized tasks that build core data science skills such as importing, reshaping, and summarizing data. An accompanying Tidies of March package also provides the puzzles and their data directly in R. I will show how this package models best practices for both data wrangling and project management.


Transcript

  1. Teaching data science with puzzles
    rstudio::conf 2019
    Irene Steves
    i_steves isteves
    irene.rbind.io bit.ly/ds-puzzles

  2. (image-only slide)

  3. Unique puzzle input
    Answer submission
    Puzzle text

  4. I solved these with R, but boy was it clunky!

  5. Let’s make puzzles that highlight what R/the tidyverse are good at!

  6. Bite-sized puzzles that focus on core data science skills as championed by the tidyverse set of packages
    Tidies of March

  7. SOOTHSAYER. Beware the ides of March.
    CAESAR. What man is that?
    BRUTUS. A soothsayer bids you beware the ides of March.
    CAESAR. Set him before me; let me see his face.
    CASSIUS. Fellow, come from the throng; look upon Caesar.
    CAESAR. What say'st thou to me now? Speak once again.
    SOOTHSAYER. Beware the ides of March.
    CAESAR. He is a dreamer; let us leave him.
    The Death of Julius Caesar, Vincenzo Camuccini 1771-1844

  8. Photo: flickr clement127

  9. Wrangling Workflow

  10. Tidies of March Workflow

  11. (image-only slide)

  12. New R Project

  13. @JennyBryan

  14. Pre-populated file path
    here::here() for defining the path
    Knittable .R file
    Omit tidyverse messages from HTML output

  15. Auto-generated table of contents

  16. The neighborhood sandwich store makes the best sandwiches! They’ve got everything from classics like BLTs to more unusual options like Fluffernutters. Since many of their specialty ingredients keep going bad, they’ve decided to cut their selection and only focus on their best-selling sandwich.
    Photo: flickr skywhisperer

  17. To help with the decision, the store owners have collected data on their customers’ favorite sandwiches. Most people listed several varieties (in no particular order). Here’s a sample of the data:
    (sample table shown on the slide)
    In this sample, the Dagwood sandwich is the most popular.
    In the full dataset, what is the most popular sandwich among the customers?

  18. In this sample, the Dagwood sandwich is the most popular.

  19. (image-only slide)

  20. (image-only slide)

  21. Photo: Wikipedia

  22. (image-only slide)

  23. (image-only slide)

  24. Beyond the
    Test cases
    Parseable and predictable file & folder names
    Projects & git
    Reproducible code*

  25. Thank you!
    Irene Steves
    i_steves isteves
    irene.rbind.io
    bit.ly/ds-puzzles
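The sandwich puzzle on slides 16 to 18 comes down to one reshaping step and one summary: split each customer's list into one entry per sandwich, then count how many customers named each one. In R that is roughly `tidyr::separate_rows()` followed by `dplyr::count()`. The sketch below shows the same idea in Python under a stated assumption: each customer's favorites arrive as a single comma-separated string. The data and the `most_popular()` helper are hypothetical, made up for illustration; they are not the puzzle's actual input format or solution.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical data: one comma-separated string of favorites per customer.
# This format is an assumption for illustration, not the real puzzle input.
favorites = [
    "BLT, Dagwood, Reuben",
    "Fluffernutter, Dagwood",
    "Reuben, BLT, Dagwood",
    "Cuban",
]


def most_popular(rows: List[str]) -> Tuple[str, int]:
    """Return the sandwich named by the most customers, with its count."""
    counts = Counter()
    for row in rows:
        # One entry per sandwich, trimmed of surrounding spaces; empty pieces dropped.
        names = [name.strip() for name in row.split(",") if name.strip()]
        # dict.fromkeys drops repeats within one customer's list, keeping order.
        counts.update(list(dict.fromkeys(names)))
    # most_common(1) returns [(name, count)]; equal counts keep first-seen order.
    return counts.most_common(1)[0]


print(most_popular(favorites))  # ('Dagwood', 3)
```

Deduplicating within each row means a customer who names the same sandwich twice is still counted once, so the count reflects customers rather than mentions; ties go to whichever sandwich the counter saw first.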
