Data Science in a Box

Data Science in a Box (datasciencebox.org) is an open-source project that aims to equip educators with concrete information on content and infrastructure for designing and painlessly running a semester-long modern introductory data science course with R. In this talk we outline five guiding pedagogical principles that underlie the choice of topics and concepts introduced in the course as well as their ordering, highlight a sample of examples and assignments that demonstrate how the pedagogy is put into action, introduce `dsbox`, the companion R package for datasets used in the course, and share sample student work and feedback. We will also walk through a quick start guide for faculty interested in using all or some of these resources in their teaching.

Mine Cetinkaya-Rundel

July 10, 2019

Transcript

  1. MINE ÇETINKAYA-RUNDEL, UNIVERSITY OF EDINBURGH + RSTUDIO

    rstd.io/dsbox-slides, mine-cetinkaya-rundel, cetinkaya.mine@gmail.com, @minebocek
  2. Three questions that keep me up at night…

    1 What should my students learn? 2 How will my students learn best? 3 What tools will enhance my students’ learning?
  3. Three questions that keep me up at night…

    1 What should my students learn? (Content) 2 How will my students learn best? (Pedagogy) 3 What tools will enhance my students’ learning? (Infrastructure)
  4. Infrastructure Pedagogy Content

  5. rstd.io/dsbox-slides

  6. datasciencebox.org, rstudio-education/datascience-box

  7. AUDIENCE

    ‣ I have been teaching with R for a while, but I want to update my teaching materials
    ‣ I’m new to teaching with R and need to build up my course materials
    ‣ This teaching slide deck I came across on Twitter is pretty cool, but I have no idea what type of course it belongs in
  8. TOPICS

    ‣ Fundamentals of data & data viz, confounding variables, Simpson’s paradox + R / RStudio, R Markdown, simple Git
    ‣ Tidy data, data frames vs. summary tables, recoding & transforming, web scraping & iteration + collaboration on GitHub
    ‣ Building & selecting models, visualizing interactions, prediction & validation, inference via simulation
    ‣ Data science ethics, interactive viz & reporting, text analysis, Bayesian inference + communication & dissemination
  9. CONTENTS

    ‣ 27 slide decks
    ‣ 10 application exercises
    ‣ 10 computing labs
    ‣ 6 homework assignments
    ‣ 2 take-home exams
    ‣ 1 open-ended project
    ‣ (10) interactive tutorials

    website datasciencebox.org repository package dsbox
  10. DESIGN PRINCIPLES

    ‣ cherish day one
    ‣ skip baby steps
    ‣ start with cake
    ‣ leverage the ecosystem
    ‣ hide the veggies
  11. DESIGN PRINCIPLES Which kitchen would you rather bake a cake in?

  13. DESIGN PRINCIPLES Cherish day one

  14. DESIGN PRINCIPLES How do you prefer your cake recipes? Words only, or words & pictures?
  16. DESIGN PRINCIPLES Start with cake

    ‣ Open today’s demo project
    ‣ Knit the document and discuss the results with your neighbor
    ‣ Then, change Turkey to a different country, and plot again
  17. DESIGN PRINCIPLES Start with cake

    With great examples comes a great amount of code… but let’s focus on the task at hand…

    ‣ Open today’s demo project
    ‣ Knit the document and discuss the results with your neighbor
    ‣ Then, change Turkey to a different country, and plot again
  18. DESIGN PRINCIPLES Start with cake

        library(tidyverse)   # dplyr verbs and ggplot2
        library(lubridate)   # year()
        library(unvotes)     # un_votes, un_roll_calls, un_roll_call_issues

        un_votes %>%
          filter(country %in% c("UK & NI", "US", "Turkey")) %>%
          inner_join(un_roll_calls, by = "rcid") %>%
          inner_join(un_roll_call_issues, by = "rcid") %>%
          group_by(country, year = year(date), issue) %>%
          summarize(
            votes = n(),
            percent_yes = mean(vote == "yes")
          ) %>%
          filter(votes > 5) %>%  # only use records where there are more than 5 votes
          ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
            geom_smooth(method = "loess", se = FALSE) +
            facet_wrap(~ issue) +
            labs(
              title = "Percentage of Yes votes in the UN General Assembly",
              subtitle = "1946 to 2015",
              y = "% Yes",
              x = "Year",
              color = "Country"
            )
  21. DESIGN PRINCIPLES Start with cake

        library(tidyverse)   # dplyr verbs and ggplot2
        library(lubridate)   # year()
        library(unvotes)     # un_votes, un_roll_calls, un_roll_call_issues

        un_votes %>%
          filter(country %in% c("UK & NI", "US", "France")) %>%
          inner_join(un_roll_calls, by = "rcid") %>%
          inner_join(un_roll_call_issues, by = "rcid") %>%
          group_by(country, year = year(date), issue) %>%
          summarize(
            votes = n(),
            percent_yes = mean(vote == "yes")
          ) %>%
          filter(votes > 5) %>%  # only use records where there are more than 5 votes
          ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
            geom_smooth(method = "loess", se = FALSE) +
            facet_wrap(~ issue) +
            labs(
              title = "Percentage of Yes votes in the UN General Assembly",
              subtitle = "1946 to 2015",
              y = "% Yes",
              x = "Year",
              color = "Country"
            )
  22. DESIGN PRINCIPLES Start with cake

  23. DESIGN PRINCIPLES Which motivates you more to learn how to cook: perfectly chopped onions or ratatouille?
  25. DESIGN PRINCIPLES Skip baby steps Re-insert

  26. DESIGN PRINCIPLES Which is more likely to appeal to someone who has never tried broccoli?
  28. DESIGN PRINCIPLES Hide the veggies

    ‣ Today we go from this to that →
    ‣ And do so in a way that is easy to replicate for another state
  29. DESIGN PRINCIPLES Hide the veggies

    Lesson: Web scraping essentials for turning a structured table into a data frame in R.
  30. DESIGN PRINCIPLES Hide the veggies

    Lesson: Web scraping essentials for turning a structured table into a data frame in R.
    Ex 1: Scrape the table off the web and save as a data frame.
  31. DESIGN PRINCIPLES Hide the veggies

    Lesson: Web scraping essentials for turning a structured table into a data frame in R.
    Ex 1: Scrape the table off the web and save as a data frame.
    Ex 2: What other information do we need represented as variables to make this figure?
  32. DESIGN PRINCIPLES Hide the veggies

    Lesson: Web scraping essentials for turning a structured table into a data frame in R.
    Ex 1: Scrape the table off the web and save as a data frame.
    Ex 2: What other information do we need represented as variables to make this figure?
    Lesson: “Just enough” regex
  33. DESIGN PRINCIPLES If you are already taking a baking class, which will be easier to venture on to?
  35. DESIGN PRINCIPLES Leverage the ecosystem student + instructor instructor

  36. USAGE

    ‣ in full to jumpstart / overhaul your teaching
    ‣ in bits & pieces to supplement your teaching
  37. LICENSE

  38. FUTURE If you use resources from Data Science in a Box, hope you’ll let me know / provide feedback! rstd.io/dsbox-feedback

    scalability ‣ more formative assessments ‣ automated feedback ‣ peer review assessment ‣ curriculum ‣ reach & impact
  39. MINE ÇETINKAYA-RUNDEL, UNIVERSITY OF EDINBURGH + RSTUDIO

    mine-cetinkaya-rundel, cetinkaya.mine@gmail.com, @minebocek
    datasciencebox.org, github.com/rstudio-education/dsbox, rstd.io/dsbox-slides, rstd.io/dsbox-feedback
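
The web scraping lesson outlined on slides 29 to 32 (turn a structured table into a data frame, then apply "just enough" regex) might look roughly like the sketch below. It is not taken from the course materials: the inline HTML table, its column names, and its values are invented so that the example runs without a network connection, and it assumes the rvest package (version 1.0 or later) is installed.

```r
library(rvest)

# Hypothetical stand-in for a scraped page: an inline HTML table with invented values.
page <- minimal_html('
  <table>
    <tr><th>State</th><th>Visitors</th></tr>
    <tr><td>North Carolina</td><td>1,234</td></tr>
    <tr><td>Virginia</td><td>567</td></tr>
  </table>
')

# Ex 1: pull the table out of the page and save it as a data frame (a tibble).
visitors <- page %>%
  html_element("table") %>%
  html_table(convert = FALSE)   # keep every column as character for now

# "Just enough" regex: strip everything that is not a digit, then convert to numbers.
visitors$Visitors <- as.numeric(gsub("[^0-9]", "", visitors$Visitors))

visitors
```

For a live page, `minimal_html()` would presumably be swapped for `read_html()` with the page's URL; the rest of the pipeline stays the same, which is what makes the approach easy to replicate for another state.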