Introductory data science, a fresh look (CSUF)

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we will describe a fresh approach to teaching data science at the introductory level, introduce the design philosophy behind the curriculum, and give examples from course materials as well as from student projects. We will also discuss new directions in assessment and tooling as we scale up the course and move it online.

Mine Cetinkaya-Rundel

April 09, 2021

Transcript

  1. introductory data science a fresh look 🔗 bit.ly/fresh-ds-csuf mine-cetinkaya-rundel cetinkaya.mine@gmail.com

    minebocek mine çetinkaya-rundel
  2. How can we effectively and efficiently teach data

    science to students with little to no background in computing and statistical thinking? How can we equip them with the skills and tools for reasoning with various types of data and leave them wanting to learn more?
  3. demonstrate concrete course examples share a few tips provide open-source

    teaching resources goals
  4. data visualisation data wrangling, tidying, acquisition exploratory data analysis predictive

    modeling + uncertainty quantification effective communication of results interactive visualizations text analysis machine learning Bayesian inference … consistent syntax | tidyverse reproducibility | R Markdown version control and collaboration | Git + GitHub focus on emphasise foray into
  5. topics

  6. None
  7. ex. 1 united nations

  8. ‣ Go to RStudio Cloud ‣ Start the project titled

    UN Votes 🔗 rstd.io/dsbox-cloud
  9. ‣ Go to RStudio Cloud ‣ Start the project titled

    UN Votes ‣ Open the R Markdown document called unvotes.Rmd 🔗 rstd.io/dsbox-cloud
  10. ‣ Go to RStudio Cloud ‣ Start the project titled

    UN Votes ‣ Open the R Markdown document called unvotes.Rmd ‣ Knit the document and review the data visualisation you just produced 🔗 rstd.io/dsbox-cloud
  11. ‣ Go to RStudio Cloud ‣ Start the project titled

    UN Votes ‣ Open the R Markdown document called unvotes.Rmd ‣ Knit the document and review the data visualisation you just produced ‣ Then, look for the character string "Turkey" in the code and replace it with another country of your choice ‣ Knit again, and review how the voting patterns of the country you picked compare to those of the United States and United Kingdom & Northern Ireland 🔗 rstd.io/dsbox-cloud
  12. un_votes %>%
        filter(country %in% c("UK & NI", "US", "Turkey")) %>%
        inner_join(un_roll_calls, by = "rcid") %>%
        inner_join(un_roll_call_issues, by = "rcid") %>%
        group_by(country, year = year(date), issue) %>%
        summarize(
          votes = n(),
          percent_yes = mean(vote == "yes")
        ) %>%
        filter(votes > 5) %>% # only use records where there are more than 5 votes
        ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
        geom_smooth(method = "loess", se = FALSE) +
        facet_wrap(~ issue) +
        labs(
          title = "Percentage of Yes votes in the UN General Assembly",
          subtitle = "1946 to 2015",
          y = "% Yes",
          x = "Year",
          color = "Country"
        )
  13. un_votes %>%
        filter(country %in% c("UK & NI", "US", "Turkey")) %>%
        inner_join(un_roll_calls, by = "rcid") %>%
        inner_join(un_roll_call_issues, by = "rcid") %>%
        group_by(country, year = year(date), issue) %>%
        summarize(
          votes = n(),
          percent_yes = mean(vote == "yes")
        ) %>%
        filter(votes > 5) %>% # only use records where there are more than 5 votes
        ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
        geom_smooth(method = "loess", se = FALSE) +
        facet_wrap(~ issue) +
        labs(
          title = "Percentage of Yes votes in the UN General Assembly",
          subtitle = "1946 to 2015",
          y = "% Yes",
          x = "Year",
          color = "Country"
        )

      "Turkey"
  14. un_votes %>%
        filter(country %in% c("UK & NI", "US", "France")) %>%
        inner_join(un_roll_calls, by = "rcid") %>%
        inner_join(un_roll_call_issues, by = "rcid") %>%
        group_by(country, year = year(date), issue) %>%
        summarize(
          votes = n(),
          percent_yes = mean(vote == "yes")
        ) %>%
        filter(votes > 5) %>% # only use records where there are more than 5 votes
        ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
        geom_smooth(method = "loess", se = FALSE) +
        facet_wrap(~ issue) +
        labs(
          title = "Percentage of Yes votes in the UN General Assembly",
          subtitle = "1946 to 2015",
          y = "% Yes",
          x = "Year",
          color = "Country"
        )

      "France"
  15. None
  16. None
  17. ex. 2 fisheries of the world

  18. None
  19. fisheries %>% select(country)
      #> # A tibble: 75 x 1
      #>    country
      #>    <chr>
      #>  1 Algeria
      #>  2 Angola
      #>  3 Argentina
      #>  4 Australia
      #>  5 Bangladesh
      #>  6 Brazil
      #>  7 Cambodia
      #>  8 Canada
      #>  9 Chile
      #> 10 Colombia
      #> # … with 65 more rows

      continents
      #> # A tibble: 245 x 2
      #>    country           continent
      #>    <chr>             <chr>
      #>  1 Afghanistan       Asia
      #>  2 Åland Islands     Europe
      #>  3 Albania           Europe
      #>  4 Algeria           Africa
      #>  5 American Samoa    Oceania
      #>  6 Andorra           Europe
      #>  7 Angola            Africa
      #>  8 Anguilla          Americas
      #>  9 Antigua & Barbuda Americas
      #> 10 Argentina         Americas
      #> # … with 235 more rows

      fisheries <- left_join(fisheries, continents)
      Joining, by = "country"

      ✓ data joins
  20. fisheries %>% filter(is.na(continent))
      #> # A tibble: 5 x 4
      #>   country                           capture aquaculture continent
      #>   <chr>                               <dbl>       <dbl> <chr>
      #> 1 Congo, Democratic Republic of the  220000        2965 NA
      #> 2 Hong Kong                          161964        4130 NA
      #> 3 Myanmar                           1742956      474510 NA
      #> 4 Other                             9685851      786993 NA
      #> 5 Taiwan (Republic of China)        1017243      304756 NA

      ✓ data joins ✓ ethics
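The unmatched rows on this slide are the kind of result a join produces when country names are spelled differently in the two tables. Below is a minimal sketch of one way to find and patch such mismatches, assuming the dplyr package is installed. The toy data frames, their values, and the spellings used in `continents_toy` are hypothetical stand-ins, not the course data.

```r
library(dplyr)

# Hypothetical toy tables; values and spellings are made up for illustration
fisheries_toy <- tibble(
  country = c("Algeria", "Hong Kong", "Myanmar"),
  capture = c(100, 200, 300)
)
continents_toy <- tibble(
  country   = c("Algeria", "Hong Kong SAR China", "Myanmar (Burma)"),
  continent = c("Africa", "Asia", "Asia")
)

# anti_join() keeps the rows of the first table that have no match in the second
unmatched <- anti_join(fisheries_toy, continents_toy, by = "country")

# Join, then fill in the continents that the join could not supply
fisheries_fixed <- fisheries_toy %>%
  left_join(continents_toy, by = "country") %>%
  mutate(continent = case_when(
    country == "Hong Kong" ~ "Asia",
    country == "Myanmar"   ~ "Asia",
    TRUE                   ~ continent
  ))
```

Checking `unmatched` before patching makes each manual assignment explicit. A row such as "Other" has no single continent, so it may be better left as `NA` than forced into a category.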
  21. ✓ data joins ✓ ethics ✓ critique ✓ improving visualisations

  22. ✓ data joins ✓ ethics ✓ critique ✓ improving visualisations

    ✓ mapping
  23. None
  24. ex. 3 First Minister’s COVID briefings

  25. None
  26. robotstxt::paths_allowed("https://www.gov.scot/")
      www.gov.scot
      [1] TRUE

      ✓ ethics

  27. ✓ web scraping ✓ text parsing ✓ data types ✓

    regular expressions ✓ ethics
  28. ✓ web scraping ✓ text parsing ✓ data types ✓

    regular expressions ✓ functions ✓ iteration ✓ ethics
  29. ✓ web scraping ✓ text parsing ✓ data types ✓

    regular expressions ✓ functions ✓ iteration ✓ visualisation ✓ interpretation ✓ ethics
  30. ✓ web scraping ✓ text parsing ✓ data types ✓

    regular expressions ✓ functions ✓ iteration ✓ visualisation ✓ interpretation ✓ text analysis ✓ ethics
  31. None
  32. ex. 3 spam filters

  33. ✓ logistic regression ✓ prediction

  34. ✓ logistic regression ✓ prediction ✓ decision errors ✓ sensitivity

    / specificity ✓ intuition around loss functions
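The link between decision errors and sensitivity / specificity can be made concrete with a small worked example. This is a sketch in base R under stated assumptions: the labels and predicted spam probabilities are invented for illustration, and `classify_metrics()` is a hypothetical helper, not a function from the course materials.

```r
# Hypothetical data: 1 = spam, 0 = not spam, with made-up predicted probabilities
actual <- c(1, 1, 1, 0, 0, 0, 0, 1)
prob   <- c(0.9, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.8)

# Hypothetical helper: label an email as spam when its predicted
# probability is at or above the threshold, then tally the outcomes
classify_metrics <- function(actual, prob, threshold) {
  predicted <- as.integer(prob >= threshold)
  tp <- sum(predicted == 1 & actual == 1)  # spam caught
  fn <- sum(predicted == 0 & actual == 1)  # spam missed
  tn <- sum(predicted == 0 & actual == 0)  # legitimate email kept
  fp <- sum(predicted == 1 & actual == 0)  # legitimate email flagged
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

m50 <- classify_metrics(actual, prob, threshold = 0.5)
m75 <- classify_metrics(actual, prob, threshold = 0.75)
m50  # sensitivity 0.75, specificity 0.75
m75  # sensitivity 0.50, specificity 1.00
```

Raising the threshold from 0.5 to 0.75 removes the false positive but misses one more spam message. Which threshold is preferable depends on how costly each type of error is, which is the intuition behind choosing a loss function.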
  35. None
  36. ✓ machine learning for text data

  37. ✓ repetition tips

  38. ✓ repetition ✓ reflection

      # A tibble: 19 x 2
         bigram                  n
         <chr>               <int>
       1 question 7             19
       2 question 8             16
       3 questions 7            12
       4 join function           9
       5 question 2              9
       6 choice questions        7
       7 first question          7
       8 multiple choice         7
       9 correct answer          6
      10 necessarily improve     6
      11 join functions          5
      12 question 1              5
      13 7 8                     4
      14 airline names           4
      15 data frames             4
      16 feel like               4
      17 many options            4
      18 right answer            4
      19 x axis                  4

      tips
  39. tips ✓ repetition ✓ reflection ✓ creativity

  40. tips ✓ reflection ✓ creativity ✓ peer review

    ✓ repetition
  41. tips ✓ repetition ✓ reflection ✓ creativity ✓

    peer review ✓ real workflows
  42. toolbox student

  43. toolbox instructor

  44. 🔗 datasciencebox.org

  45. 🔗 introds.org

  46. Mine Çetinkaya-Rundel & Victoria Ellison (2020) A Fresh Look at

    Introductory Data Science Journal of Statistics Education DOI: 10.1080/10691898.2020.1804497
  47. Journal of Statistics Education Special Issue on Computing in the

    Curriculum 🔗 tandfonline.com/doi/full/10.1080/10691898.2020.1870416 🔗 causeweb.org/cause/webinars
  48. 🔗 bit.ly/fresh-ds-csuf mine-cetinkaya-rundel cetinkaya.mine@gmail.com minebocek 🔗 datasciencebox.org