Teaching data science with puzzles

January 17, 2019

Of the many coding puzzles on the web, few focus on the programming skills needed for handling untidy data. During my summer internship at RStudio, I worked with Jenny Bryan to develop a series of data science puzzles known as the "Tidies of March." These puzzles isolate data wrangling tasks into bite-sized pieces to nurture core data science skills such as importing, reshaping, and summarizing data. We also provide access to puzzles and puzzle data directly in R through an accompanying Tidies of March package. I will show how this package models best practices for both data wrangling and project management.


  Teaching data science with puzzles rstudio::conf 2019 Irene Steves

    isteves irene.rbind.io bit.ly/ds-puzzles
  3. Unique puzzle input; Answer submission; Puzzle text

  4. I solved these with R, but boy was it clunky!

  5. Let’s make puzzles that highlight what R/the tidyverse are good at

  6. Bite-sized puzzles that focus on core data science skills as championed by the tidyverse set of packages: Tidies of March
  7. SOOTHSAYER. Beware the ides of March.
    CAESAR. What man is that?
    BRUTUS. A soothsayer bids you beware the ides of March.
    CAESAR. Set him before me; let me see his face.
    CASSIUS. Fellow, come from the throng; look upon Caesar.
    CAESAR. What say'st thou to me now? Speak once again.
    SOOTHSAYER. Beware the ides of March.
    CAESAR. He is a dreamer; let us leave him.
    Image: The Death of Julius Caesar, Vincenzo Camuccini (1771-1844)
  8. Photo: flickr clement127

  9. Wrangling Workflow

  10. Tidies of March: Workflow

  12. New R Project

  13. @JennyBryan

  14. Pre-populated file path; here::here() for defining the path; Knittable .R file; Omit tidyverse messages from HTML output
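The template features listed on this slide could look roughly like the sketch below. It is an illustration under stated assumptions, not the file the Tidies of March package actually generates: the title, the `data` folder, and the `puzzle_input.csv` file name are hypothetical. Lines starting with `#'` are treated as text and `#+` lines as chunk options when the script is rendered with `knitr::spin()` or `rmarkdown::render()`; run as a plain script, they are ordinary comments.

```r
#' ---
#' title: "Puzzle notes (hypothetical title)"
#' output:
#'   html_document:
#'     toc: true
#' ---

#' ## Setup

# The chunk option below keeps package startup messages out of the rendered HTML.
#+ setup, message = FALSE
library(tidyverse)

#' ## Import

# here::here() builds the path from the project root rather than the current
# working directory. "data" and "puzzle_input.csv" are hypothetical names.
input_path <- here::here("data", "puzzle_input.csv")
```

Because the path is built from the project root, the same script should resolve the file whether it is run interactively or knitted, as long as it lives inside the project.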
  15. Auto-generated table of contents

  16. The neighborhood sandwich store makes the best sandwiches! They’ve got everything from classics like BLTs to more unusual options like Fluffernutters. Since many of their specialty ingredients keep going bad, they've decided to cut their selection and only focus on their best-selling sandwich.
    Photo: flickr skywhisperer
  17. To help with the decision, the store owners have collected data on their customers’ favorite sandwiches. Most people listed several varieties (in no particular order). Here’s a sample of the data: In this sample, the Dagwood sandwich is the most popular. In the full dataset, what is the most popular sandwich among the customers?
  18. In this sample, the Dagwood sandwich is the most popular.
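One way to approach a puzzle like this with the tidyverse is sketched below. The sample table is made up for illustration (the slide's own sample is not reproduced in this transcript), and the sketch assumes each customer's favorites arrive as one comma-separated string; the column names `customer` and `sandwiches` are hypothetical. The idea is to reshape to one row per customer-sandwich pair, then count.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data, NOT the data shown on the slide.
favorites <- tibble(
  customer = 1:4,
  sandwiches = c(
    "BLT, Dagwood",
    "Dagwood, Fluffernutter, Reuben",
    "Reuben",
    "Dagwood, BLT"
  )
)

popularity <- favorites %>%
  # One row per customer-sandwich pair; the regex also drops spaces after commas.
  separate_rows(sandwiches, sep = ",\\s*") %>%
  # Tally how many customers listed each sandwich, most popular first.
  count(sandwiches, sort = TRUE, name = "n_customers")

most_popular <- popularity$sandwiches[1]

# The known answer for the sample doubles as a test case before
# running the same pipeline on the full dataset.
stopifnot(most_popular == "Dagwood")
```

Checking the pipeline against the sample's stated answer first makes it easier to trust the result on the full puzzle input.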

  21. Photo: Wikipedia

  24. Beyond the Test cases; Parseable and predictable file & folder names; Projects & git; Reproducible code*