The art and science of teaching data science

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we introduce the design philosophy behind an introductory data science course and discuss in-progress and future research on student learning, as well as new directions in assessment and tooling as we scale up the course.

Talk given at Women in Data Science (WiDS) Conference, FAU Erlangen-Nürnberg, 20 – 21 April 2023. https://www.datascience.nat.fau.eu/women-in-datascience.

Mine Cetinkaya-Rundel

April 21, 2023

Transcript

  1. the art and science of teaching data science
    mine çetinkaya-rundel
    bit.ly/ds-art-sci-wids · mine-cetinkaya-rundel · @minebocek · fosstodon.org/@minecr
    Image credit: Thomas Pedersen, data-imaginist.com/art
  2. 2016 GAISE
    1. Teach statistical thinking.
      ‣ Teach statistics as an investigative process of problem-solving and decision making. Students should not leave their introductory statistics course with the mistaken impression that statistics consists of an unrelated collection of formulas and methods. Rather, students should understand that statistics is a problem-solving and decision making process that is fundamental to scientific inquiry and essential for making sound decisions.
      ‣ Give students experience with multivariable thinking. We live in a complex world in which the answer to a question often depends on many factors. Students will encounter such situations within their own fields of study and everyday lives. We must prepare our students to answer challenging questions that require them to investigate and explore relationships among many variables. Doing so will help them to appreciate the value of statistical thinking and methods.
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyse data.
    6. Use assessments to improve and evaluate student learning.
    amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
  3. 2016 GAISE (recommendations repeated from slide 2), annotated with point 1:
    NOT a commonly used subset of tests and intervals and produce them with hand calculations
  4. 2016 GAISE (recommendations repeated from slide 2), annotated with point 2:
    Multivariate analysis requires the use of computing
  5. 2016 GAISE (recommendations repeated from slide 2), annotated with point 3:
    NOT use technology that is only applicable in the intro course or that doesn’t follow good science principles
  6. 2016 GAISE (recommendations repeated from slide 2), annotated with point 4:
    Data analysis isn’t just inference and modelling, it’s also data importing, cleaning, preparation, exploration, and visualisation
  7. A course that satisfies these four points looks more like today’s intro data science courses than (most) intro stats courses. This is not because intro stats is inherently “bad for you”; rather, it is time to revisit intro stats in light of the emergence of data science.
  8. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
  9. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced
  10. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced
    ‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice
    ‣ Render again, and review how the voting patterns of the country you picked compare to the United States and the United Kingdom
  11. three questions that keep me up at night…
    1. what should students learn?
    2. how will students learn best?
    3. what tools will enhance student learning?
  12. three questions that keep me up at night…
    1. what should students learn? (content)
    2. how will students learn best? (pedagogy)
    3. what tools will enhance student learning? (infrastructure)
  13. ✴ data joins ✴ data science ethics ✴ critique ✴ improving data visualisations ✴ mapping
  14. Project: Regional differences in average GPA and SAT
    Question: Exploring the regional differences in average GPA and SAT score across the US and the factors that could potentially explain them.
    Team: Mine’s Minions
  15. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration
  16. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation
  17. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation ✴ text analysis
  18. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation ✴ text analysis ✴ data science ethics
    robotstxt::paths_allowed("https://www.gov.scot")
    #> www.gov.scot
    #> [1] TRUE
  19. Project: Factors Most Important to University Ranking
    Question: Explore how various metrics (e.g., SAT/ACT scores, admission rate, region, Carnegie classification) predict rankings on the Niche College Ranking List.
    Team: 2cool4school
  20. ✴ logistic regression ✴ prediction ✴ decision errors ✴ sensitivity / specificity ✴ intuition around loss functions
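
The link between prediction, decision errors, and sensitivity / specificity named on slide 20 can be illustrated with a small R sketch. This is not code from the talk: the outcome labels, the predicted probabilities, and the helper name `classify_at` are all made up for illustration, and the probabilities merely stand in for the output of some classifier such as a logistic regression.

```r
# Hypothetical data (invented for this sketch): true outcomes (1 = positive)
# and predicted probabilities from some classifier.
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05)

# Hypothetical helper: turn probabilities into decisions at a cutoff,
# count correct decisions and the two kinds of decision errors,
# then summarise them as sensitivity and specificity.
classify_at <- function(truth, prob, cutoff = 0.5) {
  pred <- as.integer(prob >= cutoff)
  tp <- sum(pred == 1 & truth == 1)  # true positives
  fn <- sum(pred == 0 & truth == 1)  # false negatives (missed positives)
  tn <- sum(pred == 0 & truth == 0)  # true negatives
  fp <- sum(pred == 1 & truth == 0)  # false positives (false alarms)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

classify_at(truth, prob, cutoff = 0.5)
classify_at(truth, prob, cutoff = 0.25)
```

With these invented numbers, lowering the cutoff from 0.5 to 0.25 catches every positive case but produces more false alarms. That trade-off is one way to build intuition for why the choice of cutoff depends on how costly each kind of error is.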
  21. Project: Predicting League of Legends success
    Question: Ten minutes into the game, is a gold lead or an experience lead the better predictor of which team wins?
    Team: Blue Squirrels
  22. Project: A Critique of Hollywood Relationship Stereotypes
    Question: How has the average age difference between two actors in an on-screen relationship changed over the years? Furthermore, do on-screen same-sex relationships have a different average age gap than on-screen heterosexual relationships?
    Team: team300
  23. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: used minimally so far, but positive experience
    “minute paper”: weekly online quizzes ending with a brief reflection on the week’s material
  24. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: used minimally so far, but positive experience
    “minute paper”: weekly online quizzes ending with a brief reflection on the week’s material
    creativity: assignments that make room for creativity
  25. Çetinkaya-Rundel, Mine, Mine Dogucu, and Wendy Rummerfield. "The 5Ws and 1H of term projects in the introductory data science classroom." Statistics Education Research Journal 21.2 (2022): 4-4.
  26. Tooling (student-facing and instructor-facing): 📦 ghclass, 📦 checklist, 📦 learnr, 📦 gradethis, 📦 learnrhash
  27. Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., & Tackett, M. "Implementing version control with Git and GitHub as a learning objective in statistics and data science courses." Journal of Statistics and Data Science Education 29.sup1 (2021): S132-S144.
  29. Çetinkaya-Rundel, Mine, and Victoria Ellison. "A fresh look at introductory data science." Journal of Statistics and Data Science Education 29.sup1 (2021): S16-S26.