The art and science of teaching data science

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we introduce the design philosophy behind an introductory data science course and discuss in-progress and future research on student learning, as well as new directions in assessment and tooling as we scale up the course.

Talk given at Women in Data Science (WiDS) Conference, FAU Erlangen-Nürnberg, 20 – 21 April 2023. https://www.datascience.nat.fau.eu/women-in-datascience.

April 21, 2023

Transcript

1. the art and science of teaching data science
   mine çetinkaya-rundel
   bit.ly/ds-art-sci-wids, mine-cetinkaya-rundel, @minebocek, fosstodon.org/@minecr
   Image credit: Thomas Pedersen, data-imaginist.com/art
2. 2016 GAISE
   1. Teach statistical thinking.
      ‣ Teach statistics as an investigative process of problem-solving and decision making. Students should not leave their introductory statistics course with the mistaken impression that statistics consists of an unrelated collection of formulas and methods. Rather, students should understand that statistics is a problem-solving and decision making process that is fundamental to scientific inquiry and essential for making sound decisions.
      ‣ Give students experience with multivariable thinking. We live in a complex world in which the answer to a question often depends on many factors. Students will encounter such situations within their own fields of study and everyday lives. We must prepare our students to answer challenging questions that require them to investigate and explore relationships among many variables. Doing so will help them to appreciate the value of statistical thinking and methods.
   2. Focus on conceptual understanding.
   3. Integrate real data with a context and purpose.
   4. Foster active learning.
   5. Use technology to explore concepts and analyse data.
   6. Use assessments to improve and evaluate student learning.
   amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
3. 2016 GAISE, annotation 1: NOT a commonly used subset of tests and intervals and produce them with hand calculations
4. 2016 GAISE, annotation 2: Multivariate analysis requires the use of computing
5. 2016 GAISE, annotation 3: NOT use technology that is only applicable in the intro course or that doesn’t follow good science principles
6. 2016 GAISE, annotation 4: Data analysis isn’t just inference and modelling, it’s also data importing, cleaning, preparation, exploration, and visualisation
7. a course that satisfies these four points is looking more like today’s intro data science courses than (most) intro stats courses, but this is not because intro stats is inherently “bad for you”; instead, it is because it’s time to visit intro stats in light of the emergence of data science

10. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced
11. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced
    ‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice
    ‣ Render again, and review how the voting patterns of the country you picked compare to the United States and the United Kingdom
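The one-string change in this exercise can be sketched in base R. This is a minimal illustration, not the contents of unvotes.qmd: the `votes` data frame, its `pct_yes` placeholder values, and the `pick_countries()` helper are all hypothetical names and numbers introduced here.

```r
# Hypothetical stand-in for the UN votes data. The pct_yes values are
# made-up placeholders, not real voting records.
votes <- data.frame(
  country = c("United States", "United Kingdom", "Turkey", "France"),
  pct_yes = c(0.1, 0.2, 0.3, 0.4)
)

# Keep the two reference countries plus one country of interest.
pick_countries <- function(data, country_of_interest) {
  keep <- c("United States", "United Kingdom", country_of_interest)
  data[data$country %in% keep, ]
}

before <- pick_countries(votes, "Turkey")
after  <- pick_countries(votes, "France")  # the only change is this string

print(before$country)
print(after$country)
```

Everything downstream of the filter (summaries, plots) would then update on the next render without any other edits, which is presumably what makes the exercise approachable on day one.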
12. three questions that keep me up at night…
    1 what should students learn?
    2 how will students learn best?
    3 what tools will enhance student learning?
13. three questions that keep me up at night…
    1 what should students learn? (content)
    2 how will students learn best? (pedagogy)
    3 what tools will enhance student learning? (infrastructure)

18. ✴ data joins ✴ data science ethics ✴ critique ✴ improving data visualisations
19. ✴ data joins ✴ data science ethics ✴ critique ✴ improving data visualisations ✴ mapping
20. Project: Regional differences in average GPA and SAT
    Question: Exploring the regional differences in average GPA and SAT score across the US and the factors that could potentially explain them.
    Team: Mine’s Minions

22. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions
23. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration
24. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation
25. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation ✴ text analysis
26. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation ✴ text analysis ✴ data science ethics
    robotstxt::paths_allowed("https://www.gov.scot")
    #> www.gov.scot
    #> [1] TRUE
27. Project: Factors Most Important to University Ranking
    Question: Explore how various metrics (e.g., SAT/ACT scores, admission rate, region, Carnegie classification) predict rankings on the Niche College Ranking List.
    Team: 2cool4school

30. ✴ logistic regression ✴ prediction ✴ decision errors ✴ sensitivity / specificity ✴ intuition around loss functions
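The decision-error vocabulary on this slide can be made concrete with a small base R sketch. The outcome and probability vectors below are hypothetical values chosen for illustration (not data from the course), and `classify()` and `decision_summary()` are helper names introduced here.

```r
# Hypothetical outcomes (1 = event, 0 = no event) and predicted
# probabilities, e.g. as they might come out of a logistic regression.
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1)

# Turn probabilities into yes/no predictions at a chosen cutoff.
classify <- function(prob, threshold = 0.5) {
  as.integer(prob >= threshold)
}

# Count the four outcomes of a binary decision and derive the two rates.
decision_summary <- function(truth, pred) {
  tp <- sum(pred == 1 & truth == 1)  # true positives
  fn <- sum(pred == 0 & truth == 1)  # false negatives
  tn <- sum(pred == 0 & truth == 0)  # true negatives
  fp <- sum(pred == 1 & truth == 0)  # false positives
  c(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    false_positives = fp,
    false_negatives = fn
  )
}

# At a 0.5 cutoff: sensitivity = 3/4, specificity = 5/6.
result <- decision_summary(truth, classify(prob, threshold = 0.5))
print(result)

# A lower cutoff catches every event but flags more non-events.
result_low <- decision_summary(truth, classify(prob, threshold = 0.25))
print(result_low)
```

Lowering the cutoff to 0.25 on these same values raises sensitivity to 1 and drops specificity to 4/6, which is one way to build intuition for trading one type of decision error against the other.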
31. Project: Predicting League of Legends success
    Question: After 10 minutes into the game, was a gold lead or an experience lead a better predictor of which team wins?
    Team: Blue Squirrels
32. Project: A Critique of Hollywood Relationship Stereotypes
    Question: How has the average age difference between two actors in an on-screen relationship changed over the years? Furthermore, do on-screen same-sex relationships have a different average age gap than on-screen heterosexual relationships?
    Team: team300

34. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: used minimally so far, but positive experience
    “minute paper”: weekly online quizzes ending with a brief reflection of the week’s material
35. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: used minimally so far, but positive experience
    “minute paper”: weekly online quizzes ending with a brief reflection of the week’s material
    creativity: assignments that make room for creativity
36. Çetinkaya-Rundel, Mine, Mine Dogucu, and Wendy Rummerfield. "The 5Ws and 1H of term projects in the introductory data science classroom." Statistics Education Research Journal 21.2 (2022): 4-4.

38. student-facing + instructor-facing tools: 📦 ghclass, 📦 checklist, 📦 learnr, 📦 gradethis, 📦 learnrhash

41. Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., & Tackett, M. "Implementing version control with Git and GitHub as a learning objective in statistics and data science courses." Journal of Statistics and Data Science Education 29.sup1 (2021): S132-S144.

44. Çetinkaya-Rundel, Mine, and Victoria Ellison. "A fresh look at introductory data science." Journal of Statistics and Data Science Education 29.sup1 (2021): S16-S26.
45. the art and science of teaching data science
    mine çetinkaya-rundel
    bit.ly/ds-art-sci-wids, mine-cetinkaya-rundel, @minebocek, fosstodon.org/@minecr
    Image credit: Thomas Pedersen, data-imaginist.com/art