Slide 1

Slide 1 text

if you teach it, they will come
🔗 bit.ly/if-you-teach-it-imsi
mine çetinkaya-rundel
duke university

Slide 2

Slide 2 text

what are they coming to? if you teach it, they will come

Slide 3

Slide 3 text

STA 199 Introduction to Data Science and Statistical Thinking

Intro to data science and statistical thinking. Learn to explore, visualize, and analyze data to understand natural phenomena, investigate patterns, model outcomes, and make predictions, and do so in a reproducible and shareable manner. Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, data visualization, and effective communication of results. Work on problems and case studies inspired by and based on real-world questions and data. The course will focus on the R statistical computing language. No statistical or computing background is necessary, i.e., there are no prerequisites.

Slide 4

Slide 4 text

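“doing” data science

[Diagram: the data science workflow, with stages Import, Tidy, Transform, Visualize, Model, and Communicate; Transform, Visualize, and Model are grouped under Understand, and the whole workflow sits inside Program]

Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science, 2nd Edition.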

Slide 5

Slide 5 text

“learning” data science Communicate

Slide 6

Slide 6 text

hello world Communicate

Slide 7

Slide 7 text

exploring data Communicate + curiosity +

Slide 8

Slide 8 text

data science ethics Communicate + responsibility

Slide 9

Slide 9 text

making rigorous conclusions Communicate + complexity +

Slide 10

Slide 10 text

looking further Communicate 🦪

Slide 11

Slide 11 text

✨ communicate ✨ Communicate

Slide 12

Slide 12 text

design principles
‣ cherish day one
‣ start with cake
‣ skip baby steps
‣ hide the veggies
‣ leverage the ecosystem

Data Science in a Box. datasciencebox.org/01-design-principles.

Slide 13

Slide 13 text

example.

Slide 14

Slide 14 text

‣ data joins

fisheries |>
  select(country)
#> # A tibble: 82 × 1
#>    country
#>
#>  1 Angola
#>  2 Argentina
#>  3 Australia
#>  4 Bangladesh
#>  5 Brazil
#>  6 Cambodia
#>  7 Cameroon
#>  8 Canada
#>  9 Chad
#> 10 Chile
#> # ℹ 72 more rows

continents
#> # A tibble: 245 × 2
#>    country           continent
#>
#>  1 Afghanistan       Asia
#>  2 Åland Islands     Europe
#>  3 Albania           Europe
#>  4 Algeria           Africa
#>  5 American Samoa    Oceania
#>  6 Andorra           Europe
#>  7 Angola            Africa
#>  8 Anguilla          Americas
#>  9 Antigua & Barbuda Americas
#> 10 Argentina         Americas
#> # ℹ 235 more rows

fisheries <- left_join(fisheries, continents)
Joining with `by = join_by(country)`
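The join on this slide can be sketched on a tiny scale. This is only an illustration: `fisheries_toy` and `continents_toy` are hypothetical stand-ins built from a few country names that appear on the slides, not the course data. Naming the key with `by = join_by(country)` makes the join column explicit instead of relying on the "Joining with" message.

```r
library(dplyr)

# Hypothetical toy versions of the two data frames (not the real course data)
fisheries_toy <- tibble(country = c("Angola", "Argentina", "Hong Kong"))
continents_toy <- tibble(
  country   = c("Angola", "Argentina", "Albania"),
  continent = c("Africa", "Americas", "Europe")
)

# left_join() keeps every row of the first data frame;
# countries without a match in the lookup table get NA for continent
joined <- left_join(fisheries_toy, continents_toy, by = join_by(country))
joined
```

Because a left join never drops rows from the first data frame, any country missing from the lookup table stays in the result with a missing continent, which is worth checking for after the join.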

Slide 15

Slide 15 text

‣ data joins
‣ data science ethics

fisheries |>
  filter(is.na(continent))
#> # A tibble: 3 × 5
#>   country                          capture aquaculture   total continent
#>
#> 1 Democratic Republic of the Congo  237372        3161  240533 NA
#> 2 Hong Kong                         142775        4258  147033 NA
#> 3 Myanmar                          2072390     1017644 3090034 NA

fisheries <- fisheries |>
  mutate(
    continent = case_when(
      country == "Democratic Republic of the Congo" ~ "Africa",
      country == "Hong Kong" ~ "Asia",
      country == "Myanmar" ~ "Asia",
      .default = continent
    )
  )
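One possible variation on the fix above, sketched under assumptions: instead of patching `continent` with `case_when()` after the join, unmatched names can be found with `anti_join()` and the lookup table extended before joining. The data frames `fisheries_toy`, `continents_toy`, and `patches` are hypothetical and only reuse country names shown on the slides; this is not presented as the approach used in the course.

```r
library(dplyr)

# Hypothetical toy data (not the real course data)
fisheries_toy <- tibble(country = c("Angola", "Hong Kong", "Myanmar"))
continents_toy <- tibble(country = "Angola", continent = "Africa")

# Countries in fisheries_toy that have no match in the lookup table
unmatched <- anti_join(fisheries_toy, continents_toy, by = join_by(country))

# Extra lookup rows for the unmatched names, written by hand
patches <- tibble(
  country   = c("Hong Kong", "Myanmar"),
  continent = c("Asia", "Asia")
)

# Join against the extended lookup table so no continent is left missing
patched <- fisheries_toy |>
  left_join(bind_rows(continents_toy, patches), by = join_by(country))
```

Either way, a person has to decide which continent each unmatched name belongs to, so the judgment call stays visible in the code.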

Slide 16

Slide 16 text

‣ data joins
‣ data science ethics
‣ critique
‣ improving data visualisations

Slide 17

Slide 17 text

‣ data joins
‣ data science ethics
‣ critique
‣ improving data visualisations
‣ mapping

Slide 18

Slide 18 text

Project: Regional differences in average GPA and SAT
Question: Exploring the regional differences in average GPA and SAT score across the US and the factors that could potentially explain them.

Slide 19

Slide 19 text

Project: Predicting League of Legends success
Question: Ten minutes into the game, is a gold lead or an experience lead a better predictor of which team wins?

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

if you teach it, they will come who are they?

Slide 22

Slide 22 text

the students “they”

WHO | WHY
first and second year undergraduates | want to learn data science as a potential pathway to their Stat/CS/etc. major
second year or junior PoliSci/PubPol/etc. students | pre-requisite for major
junior or senior Social Science students | want to learn / get better at R for research / senior thesis
senior Humanities/etc. students | wish to learn data science to explore digital humanities

Slide 23

Slide 23 text

the students “they”
MORE WHY
from the Spring 2024 “getting to know you” survey

New skills that I can adapt to everyday life or even my career hopefully as a lawyer.

I hope to gain a basic foundation of intro level statistics that can apply to rudimentary analyses in my research lab. I also want to be able to work with, understand, and run "regressions."

I want to see if stats would be something I am interested in doing as a major or just learning more about.

Open to new fronts

Slide 24

Slide 24 text

the students “they”
MORE WHY
common words/phrases mentioned in the Spring 2024 “Getting to know you” survey (with a bit of data clean-up)
number of respondents = 258

Slide 25

Slide 25 text

the departments “they”

WHO | HOW
Political Science | as 1 of 2 foundational courses, along with their intro
Public Policy | as 1 of 5 core courses, along with four of their own
CS | as any 1 stat course
Biology | as any 1 stat course
… | …
Statistical Science | as 1 of 3 electives

Slide 26

Slide 26 text

if you teach it, they will come how many are they?

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

developed as part of Duke’s first-year seminar cluster program (FOCUS)

Slide 29

Slide 29 text

launched as an option for 100-level “service course”

Slide 30

Slide 30 text

established curriculum offered by multiple faculty across different sections 90 students / section

Slide 31

Slide 31 text

established curriculum offered by multiple faculty across different sections 120 students / section

Slide 32

Slide 32 text

established curriculum offered by multiple faculty across different sections 150 students / section

Slide 33

Slide 33 text

everyone together in a single section 282* students

Slide 34

Slide 34 text

if you teach it, they will come welcome them with open arms! 🤗

Slide 35

Slide 35 text

if you teach it, they will come we’re running out of arms! 😱

Slide 36

Slide 36 text

challenge

how can we continue to cultivate the welcoming and supportive nature of our introductory data science course without
‣ cutting down on the communication thread in the course?
‣ building systems that limit flexibility in content and examples?
‣ growing the number of TAs without additional support and training?

Slide 37

Slide 37 text

is less more? assessments

Slide 38

Slide 38 text

THEN NOW
6-10 homework assignments (individual)
6-10 labs with dedicated time in lab sessions (individual)
6-10 labs with dedicated time in lab sessions (team-based)
2 take-home exams (individual)
2 exams with in-class + take-home components (individual)
1 project with write-up + presentation (team-based)
1 project with write-up + presentation (team-based)
attendance / participation / other / none
attendance / participation / other / none

Slide 39

Slide 39 text

is different better? grading

Slide 40

Slide 40 text

example.

How do the distributions of median income compare across major categories? For this exercise, focus on undergraduates (major_income_undergrad).

(a) Calculate the minimum, median, and maximum median income per major category as well as the number of majors in each category. Your summary statistics should be in decreasing order of median median income.

(b) Create box plots of the distribution of median income by major category.
• The variable major_category should be on the y-axis and undergrad_median on the x-axis.
• The order of the boxes in your plot should match the order in your summary table from part (a).
• Use color to enhance your plot, and turn off any legends providing redundant information.
• Style the x-axis labels such that the values are shown in thousands, e.g., 20000 should show up as $20K.

(c) In 1-2 sentences, describe how median incomes across various major categories compare. Your description should also touch on where your own intended/declared major (yes, your major at Duke) falls.
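A minimal sketch of how parts (a) and (b) might be approached, not an official solution. The data frame below is a hypothetical stand-in with invented values; only the names `major_income_undergrad`, `major_category`, and `undergrad_median` come from the exercise. The column names in the summary and the helper `dollars_k` are assumptions made for illustration.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(scales)

# Hypothetical stand-in for the course data; all values are invented
major_income_undergrad <- tibble(
  major_category = c("Engineering", "Engineering", "Engineering",
                     "Arts", "Arts", "Business", "Business"),
  undergrad_median = c(60000, 75000, 65000, 30000, 34000, 45000, 40000)
)

# (a) Summary statistics per category, highest median first
income_summary <- major_income_undergrad |>
  group_by(major_category) |>
  summarize(
    min_income    = min(undergrad_median),
    median_income = median(undergrad_median),
    max_income    = max(undergrad_median),
    n_majors      = n()
  ) |>
  arrange(desc(median_income))

# Labeller that shows values in thousands of dollars
dollars_k <- label_dollar(scale = 1 / 1000, suffix = "K")

# (b) Box plots with categories ordered by median income.
# Factor levels run bottom to top on the y-axis, so ordering levels by
# increasing median puts the highest-median category at the top,
# matching the top-to-bottom order of the summary table.
income_plot <- major_income_undergrad |>
  mutate(
    major_category = fct_reorder(major_category, undergrad_median, .fun = median)
  ) |>
  ggplot(aes(x = undergrad_median, y = major_category, fill = major_category)) +
  geom_boxplot(show.legend = FALSE) +
  scale_x_continuous(labels = dollars_k) +
  labs(x = "Median income", y = "Major category")
```

Printing `income_summary` and `income_plot` shows the table and the plot. The legend is turned off because the fill color repeats what the y-axis already says.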

Slide 41

Slide 41 text

THEN NOW
‣ Each item graded on the same scale: 0 (no response) to 4 (mastery)
‣ Same number of items on each assignment
‣ Elements of rubric shared with students as a bullet point list
‣ TAs spend more of their time writing comments than thinking about points
‣ Faculty spend more of their time designing consistent items and assignments

Slide 42

Slide 42 text

let’s check back in the summer! outcome

Slide 43

Slide 43 text

thank you!
mine çetinkaya-rundel
duke university
🔗 bit.ly/if-you-teach-it-imsi