The art and science of teaching data science (UoE Biology)

Mine Cetinkaya-Rundel
November 11, 2021

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we introduce the design philosophy behind an introductory data science course, discuss in-progress and future research on student learning, and outline new directions in assessment and tooling as we scale up the course.

Transcript

  1. the art and science of teaching data science
    mine çetinkaya-rundel
    bit.ly/ds-art-sci-uoe-bio · mine-cetinkaya-rundel · cetinkaya.mine@gmail.com · @minebocek
    Image credit: Thomas Pedersen, data-imaginist.com/art
  2. 2016 GAISE
    1. Teach statistical thinking.
      ‣ Teach statistics as an investigative process of problem-solving and decision making. Students should not leave their introductory statistics course with the mistaken impression that statistics consists of an unrelated collection of formulas and methods. Rather, students should understand that statistics is a problem-solving and decision making process that is fundamental to scientific inquiry and essential for making sound decisions.
      ‣ Give students experience with multivariable thinking. We live in a complex world in which the answer to a question often depends on many factors. Students will encounter such situations within their own fields of study and everyday lives. We must prepare our students to answer challenging questions that require them to investigate and explore relationships among many variables. Doing so will help them to appreciate the value of statistical thinking and methods.
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyse data.
    6. Use assessments to improve and evaluate student learning.
    amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
  3. 2016 GAISE, callout 1: NOT teaching a commonly used subset of tests and intervals and producing them with hand calculations
  4. 2016 GAISE, callout 2: multivariate analysis requires the use of computing
  5. 2016 GAISE, callout 3: NOT using technology that is only applicable in the intro course or that doesn’t follow good science principles
  6. 2016 GAISE, callout 4: data analysis isn’t just inference and modelling; it’s also data importing, cleaning, preparation, exploration, and visualisation
  7. a course that satisfies these four points looks more like today’s intro data science courses than (most) intro stats courses, but this is not because intro stats is inherently “bad for you”; instead, it is because it’s time to revisit intro stats in light of the emergence of data science
  12. ‣ Go to RStudio Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd
    ‣ Knit the document and review the data visualisation you just produced
    ‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice
    ‣ Knit again, and review how the voting patterns of the country you picked compare to those of the United States and United Kingdom & Northern Ireland
  14. three questions that keep me up at night…
    1. what should students learn? (content)
    2. how will students learn best? (pedagogy)
    3. what tools will enhance student learning? (infrastructure)
  15. content

  16. ex. 1: fisheries of the world

  21. ✴ data joins ✴ data science ethics ✴ critique ✴ improving data visualisations ✴ mapping
  22. Project: 2016 US Election Redux
    Question: Would the outcome of the 2016 US presidential election have been different had Bernie Sanders been the Democratic candidate?
    Team: 4 Squared
  23. ex. 2: First Minister’s COVID briefings

  29. ✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions ✴ functions ✴ iteration ✴ data visualisation ✴ interpretation ✴ text analysis ✴ data science ethics

    robotstxt::paths_allowed("https://www.gov.scot")
    #> www.gov.scot
    #> [1] TRUE
  30. Project: The North South Divide: University Edition
    Question: Does the geographical location of a UK university affect its university score?
    Team: Fried Egg Jelly Fish
  31. ex. 3: spam filters

  33. ✴ logistic regression ✴ prediction ✴ decision errors ✴ sensitivity / specificity ✴ intuition around loss functions
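    The quantities listed on this slide can be written out compactly as follows. This uses the standard textbook definitions rather than anything shown in the course materials. The symbols $p_i$ (probability that email $i$ is spam), $x_{ij}$ (predictors), $\beta_j$ (coefficients) and the decision threshold $t$ are assumptions introduced here for illustration. TP, FN, TN and FP are the counts of true positives, false negatives, true negatives and false positives, treating "spam" as the positive class.

```latex
\[
\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik},
\qquad
\hat{y}_i =
\begin{cases}
\text{spam} & \text{if } \hat{p}_i > t \\
\text{not spam} & \text{otherwise}
\end{cases}
\]

\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
\]
```

    Raising $t$ labels fewer emails as spam, so sensitivity cannot increase and specificity cannot decrease. Choosing $t$ therefore means weighing the cost of a missed spam email against the cost of a legitimate email sent to the spam folder. This trade-off is one way into the intuition around loss functions listed on the slide.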
  34. Project: Spotify Top 100 Tracks of 2017/18
    Question: Is it possible to predict the year a song made the Top Tracks playlist based on its metadata?
    Team: weR20

    year ~ danceability + energy + key + loudness + mode + speechiness + acousticness + instrumentalness + liveness + valence + tempo + duration_s

    2017
    name                                artists
    I'm the One                         DJ Khaled
    Redbone                             Childish Gambino
    Sign of the Times                   Harry Styles

    2018
    name                                artists
    Everybody Dies In Their Nightmares  XXXTENTACION
    Jocelyn Flores                      XXXTENTACION
    Plug Walk                           Rich The Kid
    Moonlight                           XXXTENTACION
    Nevermind                           Dennis Lloyd
    In My Mind                          Dynoro
    changes                             XXXTENTACION
  35. pedagogy

  36. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: used minimally so far, but positive experience
    “minute paper”: weekly online quizzes ending with a brief reflection on the week’s material
  38.
    # A tibble: 19 x 2
       bigram                  n
       <chr>               <int>
     1 question 7             19
     2 question 8             16
     3 questions 7            12
     4 join function           9
     5 question 2              9
     6 choice questions        7
     7 first question          7
     8 multiple choice         7
     9 correct answer          6
    10 necessarily improve     6
    11 join functions          5
    12 question 1              5
    13 7 8                     4
    14 airline names           4
    15 data frames             4
    16 feel like               4
    17 many options            4
    18 right answer            4
    19 x axis                  4
  39. creativity: assignments that make room for creativity
  42. infrastructure & tooling

  43. student-facing and instructor-facing tools: 📦 ghclass, 📦 checklist, 📦 learnr, 📦 parsermd, 📦 gradethis, 📦 learnrhash
  44. 📦 ghclass

  46. openness

  49. on
