Teaching data science with puzzles

isteves
January 17, 2019

Of the many coding puzzles on the web, few focus on the programming skills needed for handling untidy data. During my summer internship at RStudio, I worked with Jenny Bryan to develop a series of data science puzzles known as the "Tidies of March." These puzzles isolate data wrangling tasks into bite-sized pieces to nurture core data science skills such as importing, reshaping, and summarizing data. We also provide access to puzzles and puzzle data directly in R through an accompanying Tidies of March package. I will show how this package models best practices for both data wrangling and project management.

Transcript

  1. Teaching data science with puzzles
    rstudio::conf 2019
    Irene Steves
    i_steves isteves
    irene.rbind.io bit.ly/ds-puzzles

  2. Unique puzzle input
    Answer submission
    Puzzle text

  3. I solved these with R, but boy was it clunky!

  4. Let’s make puzzles that highlight what R/the tidyverse are good at!

  5. Bite-sized puzzles that focus on core data science skills as championed by the tidyverse set of packages
    Tidies of March

  6. SOOTHSAYER. Beware the ides of March.
    CAESAR. What man is that?
    BRUTUS. A soothsayer bids you beware the ides of March.
    CAESAR. Set him before me; let me see his face.
    CASSIUS. Fellow, come from the throng; look upon Caesar.
    CAESAR. What say'st thou to me now? Speak once again.
    SOOTHSAYER. Beware the ides of March.
    CAESAR. He is a dreamer; let us leave him.
    The Death of Julius Caesar, Vincenzo Camuccini 1771-1844

  7. Photo: flickr clement127

  8. Wrangling Workflow

  9. Tidies of March Workflow

  10. New R Project

  11. Pre-populated file path
    here::here() for defining the path
    Knittable .R file
    Omit tidyverse messages from HTML output
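
A minimal sketch of what the top of such a knittable .R file might look like. It is not the package's actual template: the title and the file name `puzzle-input.csv` are hypothetical, and the `#'` and `#+` markers are the standard knitr conventions for prose and chunk options in a plain R script.

```r
#' ---
#' title: "Puzzle solution (hypothetical example)"
#' output: html_document
#' ---

#+ setup, message = FALSE
# The chunk option above keeps package startup messages out of the HTML output.
library(tidyverse)
library(here)

#' Build the path from the project root instead of relying on the
#' current working directory.
# Hypothetical file name, used for illustration only.
puzzle_path <- here("data", "puzzle-input.csv")
```

Because `here()` resolves paths from the project root, the same script should knit regardless of where it is launched from inside the project.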

  12. Auto-generated
    table of contents

  13. The neighborhood sandwich store makes the best sandwiches! They’ve got everything from classics like BLTs to more unusual options like Fluffernutters. Since many of their specialty ingredients keep going bad, they’ve decided to cut their selection and only focus on their best-selling sandwich.
    Photo: flickr skywhisperer

  14. To help with the decision, the storeowners have collected data on their customers’ favorite
    sandwiches. Most people listed several varieties (in no particular order). Here’s a sample
    of the data:
    In this sample, the Dagwood sandwich is the most popular.
    In the full dataset, what is the most popular sandwich among the customers?

  15. In this sample, the Dagwood sandwich is the most popular.
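
One way the sandwich puzzle could be approached with the tidyverse is sketched below. The data frame is invented for illustration (the sample table from the slide is not reproduced in this transcript), and the column names `customer` and `sandwiches` are assumptions. The idea is to reshape the comma-separated favorites so that each sandwich sits on its own row, then count.

```r
library(dplyr)
library(tidyr)

# Invented sample data; the real puzzle input ships with the package.
favorites <- tibble::tibble(
  customer = 1:4,
  sandwiches = c(
    "BLT, Dagwood",
    "Dagwood, Fluffernutter, Reuben",
    "Reuben",
    "Dagwood, BLT"
  )
)

popularity <- favorites %>%
  # Split the comma-separated list: one row per customer/sandwich pair.
  separate_rows(sandwiches, sep = ",\\s*") %>%
  # Tally each sandwich and sort from most to least popular.
  count(sandwiches, sort = TRUE, name = "n_customers")

top_sandwich <- popularity$sandwiches[1]
print(popularity)
```

The reshaping step does most of the work here: once the data is in one-row-per-observation form, the summary is a single `count()` call.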

  16. Photo: Wikipedia

  17. Beyond the
    Test cases
    Parseable and predictable file & folder names
    Projects & git
    Reproducible code*

  18. Thank you!
    Irene Steves
    i_steves isteves
    irene.rbind.io
    bit.ly/ds-puzzles
