If you teach it, they will come

if you teach it, they will come
🔗 bit.ly/if-you-teach-it-imsi
mine çetinkaya-rundel, duke university

if you teach it, they will come
what are they coming to?

STA 199: Introduction to Data Science and Statistical Thinking

Intro to data science and statistical thinking. Learn to explore, visualize, and analyze data to understand natural phenomena, investigate patterns, model outcomes, and make predictions, and do so in a reproducible and shareable manner. Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, data visualization, and effective communication of results. Work on problems and case studies inspired by and based on real-world questions and data. The course will focus on the R statistical computing language. No statistical or computing background is necessary, i.e., there are no prerequisites.

“doing” data science
[Figure: the data science workflow: Import → Tidy → Understand (Transform, Visualize, Model) → Communicate, all wrapped in Program]
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science, 2nd Edition.

“learning” data science

hello world

exploring data + curiosity

data science ethics + responsibility

making rigorous conclusions + complexity

looking further 🦪

✨ communicate ✨

design principles
‣ cherish day one
‣ start with cake
‣ skip baby steps
‣ hide the veggies
‣ leverage the ecosystem
Data Science in a Box. datasciencebox.org/01-design-principles.

example.

‣ data joins

fisheries |> select(country)
#> # A tibble: 82 × 1
#>    country
#>    <chr>
#>  1 Angola
#>  2 Argentina
#>  3 Australia
#>  4 Bangladesh
#>  5 Brazil
#>  6 Cambodia
#>  7 Cameroon
#>  8 Canada
#>  9 Chad
#> 10 Chile
#> # ℹ 72 more rows

continents
#> # A tibble: 245 × 2
#>    country           continent
#>    <chr>             <chr>
#>  1 Afghanistan       Asia
#>  2 Åland Islands     Europe
#>  3 Albania           Europe
#>  4 Algeria           Africa
#>  5 American Samoa    Oceania
#>  6 Andorra           Europe
#>  7 Angola            Africa
#>  8 Anguilla          Americas
#>  9 Antigua & Barbuda Americas
#> 10 Argentina         Americas
#> # ℹ 235 more rows

fisheries <- left_join(fisheries, continents)
#> Joining with `by = join_by(country)`

‣ data joins ‣ data science ethics

fisheries |> filter(is.na(continent))
#> # A tibble: 3 × 5
#>   country                          capture aquaculture   total continent
#>   <chr>                              <dbl>       <dbl>   <dbl> <chr>
#> 1 Democratic Republic of the Congo  237372        3161  240533 NA
#> 2 Hong Kong                         142775        4258  147033 NA
#> 3 Myanmar                          2072390     1017644 3090034 NA

fisheries <- fisheries |>
  mutate(
    continent = case_when(
      country == "Democratic Republic of the Congo" ~ "Africa",
      country == "Hong Kong" ~ "Asia",
      country == "Myanmar" ~ "Asia",
      .default = continent
    )
  )
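The unmatched countries can also be surfaced before the join rather than after it. A minimal sketch, assuming dplyr is installed and using `anti_join()` (a swapped-in check, not the one shown on the slide) on small hypothetical stand-ins for `fisheries` and `continents`; only the country names and the two continents shown come from the slides.

```r
library(dplyr)

# Hypothetical miniature stand-ins for the tables on the slides
fisheries <- tibble(
  country = c("Angola", "Argentina", "Democratic Republic of the Congo",
              "Hong Kong", "Myanmar")
)
continents <- tibble(
  country   = c("Angola", "Argentina"),
  continent = c("Africa", "Americas")
)

# anti_join() keeps the rows of fisheries that have no matching country
# in continents, i.e. the rows that would get continent = NA after left_join()
unmatched <- anti_join(fisheries, continents, by = "country")
unmatched
```

On the full data this should list the same countries as the `filter(is.na(continent))` check, assuming `continents` has no missing `continent` values of its own.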

‣ data joins ‣ data science ethics ‣ critique ‣ improving data visualisations

‣ data joins ‣ data science ethics ‣ critique ‣ improving data visualisations ‣ mapping

Project: Regional differences in average GPA and SAT
Question: Exploring the regional differences in average GPA and SAT scores across the US and the factors that could potentially explain them.

Project: Predicting League of Legends success
Question: 10 minutes into the game, is a gold lead or an experience lead a better predictor of which team wins?

if you teach it, they will come
who are they?

the students (“they”): who and why
‣ first- and second-year undergraduates: want to learn data science as a potential pathway to a Stat/CS/etc. major
‣ second-year or junior PoliSci/PubPol/etc. students: prerequisite for their major
‣ junior or senior Social Science students: want to learn / get better at R for research / senior thesis
‣ senior Humanities/etc. students: wish to learn data science to explore digital humanities

the students (“they”): more why
from the Spring 2024 “getting to know you” survey
‣ “New skills that I can adapt to everyday life or even my career hopefully as a lawyer.”
‣ “I hope to gain a basic foundation of intro level statistics that can apply to rudimentary analyses in my research lab. I also want to be able to work with, understand, and run "regressions."”
‣ “I want to see if stats would be something I am interested in doing as a major or just learning more about.”
‣ “Open to new fronts”

the students (“they”): more why
[Figure: common words/phrases mentioned in the Spring 2024 “getting to know you” survey (with a bit of data clean-up); number of respondents = 258]

the departments (“they”): who and how
‣ Political Science: as 1 of 2 foundational courses, along with their intro
‣ Public Policy: as 1 of 5 core courses, along with four of their own
‣ CS: as any 1 stat course
‣ Biology: as any 1 stat course
‣ …: …
‣ Statistical Science: as 1 of 3 electives

if you teach it, they will come
how many are they?

developed as part of Duke’s first-year seminar cluster program (FOCUS)

launched as an option for the 100-level “service course”

established curriculum offered by multiple faculty across different sections: 90 students / section

students / section

everyone together in a single section: 282* students

if you teach it, they will come
welcome them with open arms! 🤗

if you teach it, they will come
we’re running out of arms! 😱

challenge
how can we continue to cultivate the welcoming and supportive nature of our introductory data science course without
‣ cutting down on the communication thread in the course?
‣ building systems that limit flexibility in content and examples?
‣ growing the number of TAs without additional support and training?

assessments
is less more?

THEN
‣ 6-10 homework assignments (individual)
‣ 6-10 labs with dedicated time in lab sessions (team-based)
‣ 2 take-home exams (individual)
‣ 1 project with write-up + presentation (team-based)
‣ attendance / participation / other / none

NOW
‣ 6-10 labs with dedicated time in lab sessions (individual)
‣ 2 exams with in-class + take-home components (individual)
‣ 1 project with write-up + presentation (team-based)
‣ attendance / participation / other / none

grading
is different better?

example.

How do the distributions of median income compare across major categories? For this exercise, focus on undergraduates (major_income_undergrad).

(a) Calculate the minimum, median, and maximum median income per major category as well as the number of majors in each category. Your summary statistics should be in decreasing order of median median income.

(b) Create box plots of the distribution of median income by major category.
• The variable major_category should be on the y-axis and undergrad_median on the x-axis.
• The order of the boxes in your plot should match the order in your summary table from part (a).
• Use color to enhance your plot, and turn off any legends providing redundant information.
• Style the x-axis labels such that the values are shown in thousands, e.g., 20000 should show up as $20K.

(c) In 1-2 sentences, describe how median incomes across various major categories compare. Your description should also touch on where your own intended/declared major (yes, your major at Duke) falls.
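Part (a) maps onto a short group-and-summarize pipeline. A minimal sketch, assuming dplyr is installed, that the data frame is `major_income_undergrad` with one row per major and the columns `major_category` and `undergrad_median` named in the prompt; the tiny tibble below is a hypothetical stand-in with made-up values purely to exercise the code.

```r
library(dplyr)

# Hypothetical stand-in for major_income_undergrad (one row per major);
# the income values are invented for illustration only
major_income_undergrad <- tibble(
  major_category   = c("Engineering", "Engineering", "Arts", "Arts", "Arts"),
  undergrad_median = c(70000, 80000, 40000, 45000, 50000)
)

# Part (a): min, median, and max of the median income plus the number of
# majors per category, sorted by decreasing median of the median income
income_summary <- major_income_undergrad |>
  group_by(major_category) |>
  summarize(
    min_income    = min(undergrad_median),
    median_income = median(undergrad_median),
    max_income    = max(undergrad_median),
    n_majors      = n()
  ) |>
  arrange(desc(median_income))

income_summary
```

For the $20K-style axis labels in part (b), one option in ggplot2 is `scale_x_continuous(labels = scales::label_dollar(scale = 1/1000, suffix = "K"))`.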

THEN → NOW
‣ Each item graded on the same scale: 0 (no response) to 4 (mastery)
‣ Same number of items on each assignment
‣ Elements of the rubric shared with students as a bulleted list
‣ TAs spend more of their time writing comments than thinking about points
‣ Faculty spend more of their time designing consistent items and assignments

outcome
let’s check back in the summer!

thank you!
mine çetinkaya-rundel, duke university
🔗 bit.ly/if-you-teach-it-imsi
