Data Science in a(n Ever-Evolving) Box

What should a first course in data science look like for students who have limited to no experience with statistics and programming? How do we teach it in a way that lends itself to iteration as the landscape of data science evolves, and that scales to more students and more instructors? In this talk I aim to answer these questions in two parts: (1) Introduce a semester-long, modern introductory data science curriculum, along with its design philosophy, implementation details (particularly as class sizes increase), technical infrastructure, and real examples from course content as well as from student projects. (2) Discuss how I've open-sourced this curriculum at datasciencebox.org for sharing with, and reuse and adaptation by, other instructors, and what it takes to maintain this open-source project as the landscape of data science, data science education curriculum guidelines, and data science tooling evolve.

Mine Cetinkaya-Rundel

June 01, 2023

Transcript

  1. DOING DATA SCIENCE
    [Figure labels: Program, Import, Tidy, Transform, Visualize, Model, Communicate, Understand]
    Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science, 2nd Edition.
  2. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
  3. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced
  4. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced
    ‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice
    ‣ Render again, and review how the voting patterns of the country you picked compare to the United States and the United Kingdom
  5. ✴ data joins

    fisheries |> select(country)
    #> # A tibble: 82 × 1
    #>    country
    #>    <chr>
    #>  1 Angola
    #>  2 Argentina
    #>  3 Australia
    #>  4 Bangladesh
    #>  5 Brazil
    #>  6 Cambodia
    #>  7 Cameroon
    #>  8 Canada
    #>  9 Chad
    #> 10 Chile
    #> # ℹ 72 more rows

    continents
    #> # A tibble: 245 × 2
    #>    country           continent
    #>    <chr>             <chr>
    #>  1 Afghanistan       Asia
    #>  2 Åland Islands     Europe
    #>  3 Albania           Europe
    #>  4 Algeria           Africa
    #>  5 American Samoa    Oceania
    #>  6 Andorra           Europe
    #>  7 Angola            Africa
    #>  8 Anguilla          Americas
    #>  9 Antigua & Barbuda Americas
    #> 10 Argentina         Americas
    #> # ℹ 235 more rows

    fisheries <- left_join(fisheries, continents)
    #> Joining with `by = join_by(country)`
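One way to preview which rows a join like the one on this slide will leave without a match is an anti-join on the same key. The sketch below assumes dplyr 1.1.0 or later; the tables `fisheries_toy` and `continents_toy` and their values are hypothetical stand-ins, not the course data.

```r
library(dplyr)

# Hypothetical toy tables standing in for `fisheries` and `continents`;
# the totals are made-up numbers.
fisheries_toy <- tibble(
  country = c("Angola", "Argentina", "Hong Kong"),
  total   = c(1, 2, 3)
)
continents_toy <- tibble(
  country   = c("Angola", "Argentina", "Albania"),
  continent = c("Africa", "Americas", "Europe")
)

# Rows of the left table that have no matching key in the right table.
unmatched <- anti_join(fisheries_toy, continents_toy, by = join_by(country))

# Naming the key explicitly avoids the "Joining with" message and keeps
# every row of the left table; unmatched rows get NA for `continent`.
joined <- left_join(fisheries_toy, continents_toy, by = join_by(country))
```

Here `unmatched` holds the rows that will end up with a missing `continent` in `joined`, which makes them easy to inspect before deciding how to handle them.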
  6. ✴ data joins ✴ data science ethics

    fisheries |> filter(is.na(continent))
    #> # A tibble: 3 × 5
    #>   country                          capture aquaculture   total continent
    #>   <chr>                              <dbl>       <dbl>   <dbl> <chr>
    #> 1 Democratic Republic of the Congo  237372        3161  240533 NA
    #> 2 Hong Kong                         142775        4258  147033 NA
    #> 3 Myanmar                          2072390     1017644 3090034 NA

    fisheries <- fisheries |>
      mutate(
        continent = case_when(
          country == "Democratic Republic of the Congo" ~ "Africa",
          country == "Hong Kong" ~ "Asia",
          country == "Myanmar" ~ "Asia",
          .default = continent
        )
      )
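As a possible alternative to the `case_when()` recoding on this slide, the manual fixes can be kept in a small lookup table and merged in with `coalesce()`. This is only a sketch under assumptions: `fisheries_toy`, `patch`, and the column `continent_fix` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical toy table with missing continents after a join.
fisheries_toy <- tibble(
  country   = c("Angola", "Hong Kong", "Myanmar"),
  continent = c("Africa", NA, NA)
)

# Hypothetical lookup table recording each manual decision in one place.
patch <- tibble(
  country       = c("Hong Kong", "Myanmar"),
  continent_fix = c("Asia", "Asia")
)

patched <- fisheries_toy |>
  left_join(patch, by = join_by(country)) |>
  # Keep the existing continent where present, else use the manual fix.
  mutate(continent = coalesce(continent, continent_fix)) |>
  select(-continent_fix)

# Confirm that no country is left without a continent.
stopifnot(!anyNA(patched$continent))
```

Keeping the decisions in a table makes them visible and easy to revisit, which fits the ethics discussion attached to this step.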
  7. ✴ data joins ✴ data science ethics ✴ critique ✴ improving data visualisations ✴ mapping
  8. Project: Regional differences in average GPA and SAT
    Question: Exploring the regional differences in average GPA and SAT score across the US and the factors that could potentially explain them.
    Team: Mine’s Minions
  9. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration
  10. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation
  11. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation ✴ text analysis
  12. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation ✴ text analysis ✴ data science ethics

    robotstxt::paths_allowed("https://www.gov.scot")
    #> www.gov.scot
    #> [1] TRUE
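The text parsing, regular expression, function, and iteration topics listed on these slides could be combined roughly as follows. This is a minimal base R sketch on made-up strings; `raw_titles` and `extract_year()` are hypothetical names, and no real page is scraped here.

```r
# Hypothetical strings imitating text pulled from scraped pages.
raw_titles <- c(
  "Speech - 12 March 2020",
  "Statement - 3 April 2020",
  "No date here"
)

# Function: pull the first four-digit run out of one string and convert
# it from character to integer; return NA when nothing matches.
extract_year <- function(x) {
  m <- regmatches(x, regexpr("[0-9]{4}", x))
  if (length(m) == 0) NA_integer_ else as.integer(m)
}

# Iteration: apply the function to every string, one integer per input.
years <- vapply(raw_titles, extract_year, integer(1), USE.NAMES = FALSE)
```

Returning `NA` for strings without a match, rather than dropping them, keeps `years` the same length as the input so it can be added back as a column.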
  13. Project: Factors Most Important to University Ranking
    Question: Explore how various metrics (e.g., SAT/ACT scores, admission rate, region, Carnegie classification) predict rankings on the Niche College Ranking List.
    Team: 2cool4school
  14. ✴ logistic regression ✴ prediction ✴ decision errors ✴ sensitivity / specificity ✴ intuition around loss functions
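To make the trade-off between the two kinds of decision error concrete, here is a small base R sketch on simulated data. The data, the coefficients used to simulate it, the cutoffs, and the helper `decision_rates()` are all assumptions for illustration, not course materials.

```r
set.seed(1)

# Simulated data (assumption): one predictor and a binary outcome.
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))

# Logistic regression and predicted probabilities on the same data.
fit <- glm(y ~ x, family = binomial)
p_hat <- predict(fit, type = "response")

# Hypothetical helper: sensitivity and specificity at a given cutoff.
decision_rates <- function(truth, prob, cutoff) {
  pred <- as.integer(prob >= cutoff)
  c(
    sensitivity = sum(pred == 1 & truth == 1) / sum(truth == 1),
    specificity = sum(pred == 0 & truth == 0) / sum(truth == 0)
  )
}

rates_50 <- decision_rates(y, p_hat, cutoff = 0.50)
rates_25 <- decision_rates(y, p_hat, cutoff = 0.25)
```

Lowering the cutoff can only label more cases as positive, so sensitivity cannot fall and specificity cannot rise. Which cutoff is preferable depends on how costly each type of error is, which is where intuition about loss functions comes in.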
  15. Project: Predicting League of Legends success
    Question: Ten minutes into the game, is a gold lead or an experience lead a better predictor of which team wins?
    Team: Blue Squirrels
  16. Project: A Critique of Hollywood Relationship Stereotypes
    Question: How has the average age difference between two actors in an on-screen relationship changed over the years? Furthermore, do on-screen same-sex relationships have a different average age gap than on-screen heterosexual relationships?
    Team: team300
  17. live coding: in every “lecture”, along with time for students to attempt exercises on their own
    “minute paper”: weekly online quizzes ending with a brief reflection on the week’s material
    creativity: assignments that make room for creativity
    peer feedback: at various stages of the project
    teams: weekly labs in teams + periodic team evaluations + term project in teams
  18. Çetinkaya-Rundel, Mine, Mine Dogucu, and Wendy Rummerfield. "The 5Ws and 1H of term projects in the introductory data science classroom." Statistics Education Research Journal 21.2 (2022): 4-4.
  19. student-facing + 📦 ghclass + instructor-facing 📦 checklist + + 📦 learnr + 📦 gradethis 📦 learnrhash or another browser/server-based solution …
  20. Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., & Tackett, M. "Implementing version control with Git and GitHub as a learning objective in statistics and data science courses." Journal of Statistics and Data Science Education 29.sup1 (2021): S132-S144.
  21. AUDIENCE
    I have been teaching with R for a while, but I want to update my teaching materials
    I’m new to teaching with R and need to build up my course materials
    This teaching slide deck I came across on Twitter is pretty cool, but I have no idea what type of course it belongs in
  22. SCHOLARSHIP
    Çetinkaya-Rundel, Mine, and Victoria Ellison. "A fresh look at introductory data science." Journal of Statistics and Data Science Education 29.sup1 (2021): S16-S26.