If you teach it, they will come

The proliferation of large, complex datasets has challenged universities to keep up with the demand for graduates trained in both the statistical and the computational skills required to effectively plan, acquire, manage, analyze, and communicate the findings of such data. At most institutions, this training now starts in an introductory data science course, and student demand for seats in such courses has risen steadily over the past decade. In this talk, we present a case study of an introductory undergraduate course in data science and statistical thinking that is designed to address these needs and that has scaled from serving 18 students to over 300 students each semester over the last nine years.

Mine Cetinkaya-Rundel

January 12, 2024

Transcript

  1. STA 199 Introduction to Data Science and Statistical Thinking

    Intro to data science and statistical thinking. Learn to explore, visualize, and analyze data to understand natural phenomena, investigate patterns, model outcomes, and make predictions, and do so in a reproducible and shareable manner. Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, and data visualization, and effective communication of results. Work on problems and case studies inspired by and based on real-world questions and data. The course will focus on the R statistical computing language. No statistical or computing background is necessary, i.e., no pre-requisites.
  2. “doing” data science

    Diagram labels: Program, Import, Tidy, Transform, Visualize, Model, Communicate, Understand.
    Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science, 2nd Edition.
  3. design principles

    ‣ cherish day one
    ‣ start with cake
    ‣ skip baby steps
    ‣ hide the veggies
    ‣ leverage the ecosystem
    Data Science in a Box. datasciencebox.org/01-design-principles.
  4. ‣ data joins

    fisheries |>
      select(country)
    #> # A tibble: 82 × 1
    #>    country
    #>    <chr>
    #>  1 Angola
    #>  2 Argentina
    #>  3 Australia
    #>  4 Bangladesh
    #>  5 Brazil
    #>  6 Cambodia
    #>  7 Cameroon
    #>  8 Canada
    #>  9 Chad
    #> 10 Chile
    #> # ℹ 72 more rows

    continents
    #> # A tibble: 245 × 2
    #>    country           continent
    #>    <chr>             <chr>
    #>  1 Afghanistan       Asia
    #>  2 Åland Islands     Europe
    #>  3 Albania           Europe
    #>  4 Algeria           Africa
    #>  5 American Samoa    Oceania
    #>  6 Andorra           Europe
    #>  7 Angola            Africa
    #>  8 Anguilla          Americas
    #>  9 Antigua & Barbuda Americas
    #> 10 Argentina         Americas
    #> # ℹ 235 more rows

    fisheries <- left_join(fisheries, continents)
    #> Joining with `by = join_by(country)`
  5. ‣ data joins ‣ data science ethics

    fisheries |>
      filter(is.na(continent))
    #> # A tibble: 3 × 5
    #>   country                          capture aquaculture   total continent
    #>   <chr>                              <dbl>       <dbl>   <dbl> <chr>
    #> 1 Democratic Republic of the Congo  237372        3161  240533 NA
    #> 2 Hong Kong                         142775        4258  147033 NA
    #> 3 Myanmar                          2072390     1017644 3090034 NA

    fisheries <- fisheries |>
      mutate(
        continent = case_when(
          country == "Democratic Republic of the Congo" ~ "Africa",
          country == "Hong Kong" ~ "Asia",
          country == "Myanmar" ~ "Asia",
          .default = continent
        )
      )
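The join-then-patch workflow on slides 4 and 5 can be sketched on toy data. This is a minimal sketch, not the course's own code: `fisheries_toy`, `continents_toy`, and all values in them are made up for illustration, and the `anti_join()` check is an added step the slides do not show. It assumes dplyr 1.1.0 or later for `join_by()` and `.default`.

```r
library(dplyr)

# Hypothetical stand-ins for the slide's data frames (values are made up)
fisheries_toy <- tibble(
  country = c("Angola", "Hong Kong", "Myanmar"),
  total = c(10, 20, 30)
)
continents_toy <- tibble(
  country = c("Angola", "Argentina"),
  continent = c("Africa", "Americas")
)

# Rows of fisheries_toy with no matching country in continents_toy;
# these would end up with continent = NA after a left join
unmatched <- anti_join(fisheries_toy, continents_toy, by = join_by(country))

# Left join keeps every fisheries row, then patch the unmatched ones by hand
joined <- fisheries_toy |>
  left_join(continents_toy, by = join_by(country)) |>
  mutate(
    continent = case_when(
      country == "Hong Kong" ~ "Asia",
      country == "Myanmar" ~ "Asia",
      .default = continent
    )
  )

# Count of rows still missing a continent after the patch
n_missing <- sum(is.na(joined$continent))
```

Spelling out `by = join_by(country)` silences the "Joining with" message, and inspecting `unmatched` before joining surfaces the same rows that `filter(is.na(continent))` finds afterwards.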
  6. ‣ data joins ‣ data science ethics ‣ critique ‣ improving data visualisations ‣ mapping
  7. Project: Regional differences in average GPA and SAT

    Question: Exploring the regional differences in average GPA and SAT score across the US and the factors that could potentially explain them.
  8. Project: Predicting League of Legends success

    Question: Ten minutes into the game, is a gold lead or an experience lead a better predictor of which team wins?
  9. the students “they”

    WHO: first and second year undergraduates
    WHY: want to learn data science as a potential pathway to their Stat/CS/etc. major

    WHO: second year or junior PoliSci/PubPol/etc. students
    WHY: pre-requisite for major

    WHO: junior or senior Social Science students
    WHY: want to learn / get better at R for research / senior thesis

    WHO: senior Humanities/etc. students
    WHY: wish to learn data science to explore digital humanities
  10. the students “they”

    MORE WHY, from the Spring 2024 “getting to know you” survey:
    ‣ New skills that I can adapt to everyday life or even my career hopefully as a lawyer.
    ‣ I hope to gain a basic foundation of intro level statistics that can apply to rudimentary analyses in my research lab. I also want to be able to work with, understand, and run "regressions."
    ‣ I want to see if stats would be something I am interested in doing as a major or just learning more about.
    ‣ Open to new fronts
  11. the students “they”

    MORE WHY: common words/phrases mentioned in the Spring 2024 “Getting to know you” survey (with a bit of data clean-up). Number of respondents = 258.
  12. the departments “they”

    WHO: Political Science
    HOW: as 1 of 2 foundational courses, along with their intro

    WHO: Public Policy
    HOW: as 1 of 5 core courses, along with four of their own

    WHO: CS
    HOW: as any 1 stat course

    WHO: Biology
    HOW: as any 1 stat course

    WHO: …
    HOW: …

    WHO: Statistical Science
    HOW: as 1 of 3 electives
  13. challenge

    how can we continue to cultivate the welcoming and supportive nature of our introductory data science course without
    ‣ cutting down on the communication thread in the course?
    ‣ building systems that limit flexibility in content and examples?
    ‣ growing the number of TAs without additional support and training?
  14. THEN NOW

    ‣ 6-10 homework assignments (individual)
    ‣ 6-10 labs with dedicated time in lab sessions (individual)
    ‣ 6-10 labs with dedicated time in lab sessions (team-based)
    ‣ 2 take-home exams (individual)
    ‣ 2 exams with in-class + take-home components (individual)
    ‣ 1 project with write-up + presentation (team-based)
    ‣ 1 project with write-up + presentation (team-based)
    ‣ attendance / participation / other / none
    ‣ attendance / participation / other / none
  15. example

    How do the distributions of median income compare across major categories? For this exercise, focus on undergraduates (major_income_undergrad).

    (a) Calculate the minimum, median, and maximum median income per major category as well as the number of majors in each category. Your summary statistics should be in decreasing order of median median income.

    (b) Create box plots of the distribution of median income by major category.
    • The variable major_category should be on the y-axis and undergrad_median on the x-axis.
    • The order of the boxes in your plot should match the order in your summary table from part (a).
    • Use color to enhance your plot, and turn off any legends providing redundant information.
    • Style the x-axis labels such that the values are shown in thousands, e.g., 20000 should show up as $20K.

    (c) In 1-2 sentences, describe how median incomes across various major categories compare. Your description should also touch on where your own intended/declared major (yes, your major at Duke) falls.
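One way parts (a) and (b) of the exercise on slide 15 might be approached is sketched below. This is a hedged illustration rather than the course's solution: the `majors` data frame and its values are hypothetical stand-ins, with only the column names `major_category` and `undergrad_median` taken from the slide, and the output column names (`min_income` and so on) are made up.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; only the column names come from the slide
majors <- tibble(
  major_category = c("Engineering", "Engineering", "Engineering",
                     "Arts", "Arts", "Health", "Health"),
  undergrad_median = c(60000, 75000, 90000, 30000, 40000, 45000, 55000)
)

# (a) Summary statistics per category, in decreasing order of median
summary_tbl <- majors |>
  group_by(major_category) |>
  summarize(
    min_income = min(undergrad_median),
    median_income = median(undergrad_median),
    max_income = max(undergrad_median),
    n_majors = n()
  ) |>
  arrange(desc(median_income))

# (b) Box plots with categories ordered by their median, so the highest
# median sits at the top of the y-axis, matching the table order
p <- ggplot(
  majors,
  aes(
    x = undergrad_median,
    y = reorder(major_category, undergrad_median, FUN = median),
    fill = major_category
  )
) +
  geom_boxplot(show.legend = FALSE) +  # fill repeats the y-axis, so no legend
  scale_x_continuous(
    labels = scales::label_dollar(scale = 1 / 1000, suffix = "K")
  ) +
  labs(x = "Median income", y = "Major category")
```

`reorder()` sorts the factor levels in increasing order of the per-category median, and ggplot2 draws the first level at the bottom of a discrete y-axis, so the top-to-bottom box order follows the decreasing order of the summary table.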
  16. THEN NOW

    ‣ Each item graded on the same scale: 0 (no response) to 4 (mastery)
    ‣ Same number of items on each assignment
    ‣ Elements of rubric shared with students as a bullet point list
    ‣ TAs spend more of their time writing comments than thinking about points
    ‣ Faculty spend more of their time designing consistent items and assignments