
The art and science of teaching data science

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we introduce the design philosophy behind an introductory data science course and discuss in-progress and future research on student learning, as well as new directions in assessment and tooling as we scale up the course.

Talk given at Women in Data Science (WiDS) Conference, FAU Erlangen-Nürnberg, 20 – 21 April 2023. https://www.datascience.nat.fau.eu/women-in-datascience.

April 21, 2023

Transcript

1. Image credit: Thomas Pedersen, data-imaginist.com/art
the art and science
of teaching data science
mine çetinkaya-rundel
bit.ly/ds-art-sci-wids
mine-cetinkaya-rundel
@minebocek
fosstodon.org/@minecr

2. 2016 GAISE
1. Teach statistical thinking.
‣ Teach statistics as an investigative process of problem-solving and decision making.
Students should not leave their introductory statistics course with the mistaken impression
that statistics consists of an unrelated collection of formulas and methods. Rather, students
should understand that statistics is a problem-solving and decision making process that is
fundamental to scientific inquiry and essential for making sound decisions.
‣ Give students experience with multivariable thinking. We live in a complex world in
which the answer to a question often depends on many factors. Students will encounter such
situations within their own fields of study and everyday lives. We must prepare our students
to answer challenging questions that require them to investigate and explore relationships
among many variables. Doing so will help them to appreciate the value of statistical thinking
and methods.
2. Focus on conceptual understanding.
3. Integrate real data with a context and purpose.
4. Foster active learning.
5. Use technology to explore concepts and analyse data.
6. Use assessments to improve and evaluate student learning.
amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf

3. 2016 GAISE
1 NOT a commonly used subset of tests and intervals and produce them with hand calculations

4. 2016 GAISE
2 Multivariate analysis requires the use of computing

5. 2016 GAISE
3 NOT use technology that is only applicable in the intro course or that doesn’t science principles

6. 2016 GAISE
4 Data analysis isn’t just inference and modelling, it’s also data importing, cleaning, preparation, exploration, and visualisation

7. a course that satisfies these four
points is looking more like today’s
intro data science courses than
(most) intro stats courses
but this is not because
intro stats is inherently
instead it is because it’s time to visit
intro stats in light of emergence of
data science

8. ‣ Go to Posit Cloud
‣ Start the project titled UN Votes

9. ‣ Go to Posit Cloud
‣ Start the project titled UN Votes
‣ Open the Quarto document called unvotes.qmd

10. ‣ Go to Posit Cloud
‣ Start the project titled UN Votes
‣ Open the Quarto document called unvotes.qmd
‣ Render the document and review the data visualization you just produced

11. ‣ Go to Posit Cloud
‣ Start the project titled UN Votes
‣ Open the Quarto document called unvotes.qmd
‣ Render the document and review the data visualization you just produced
‣ Then, look for the character string “Turkey” in the code and replace it with
‣ Render again, and review how the voting patterns of the country you
picked compare to the United States and the United Kingdom

12. three questions that keep me up at night…
1 what should students learn?
2 how will students learn best?
3 what tools will enhance student learning?

13. three questions that keep me up at night…
1 what should students learn?
2 how will students learn best?
3 what tools will enhance student learning?
content
pedagogy
infrastructure

14. content

15. ex. 1
fisheries of the world

16. ✴ data joins

17. ✴ data joins
✴ data science ethics

18. ✴ data joins
✴ data science ethics
✴ critique
✴ improving data visualisations

19. ✴ data joins
✴ data science ethics
✴ critique
✴ improving data visualisations
✴ mapping
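
As a minimal sketch of the data joins topic listed above, the base R snippet below attaches a continent to each country with a left join. Both data frames and every value in them are made up for illustration; they are not the course's fisheries data, and the course may well teach a different join function.

```r
# Toy data: all values are invented for illustration only.
fisheries <- data.frame(
  country = c("Norway", "Chile", "Atlantis"),
  total   = c(100, 200, 50),
  stringsAsFactors = FALSE
)
continents <- data.frame(
  country   = c("Norway", "Chile"),
  continent = c("Europe", "South America"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of `fisheries`, add `continent` where a match exists.
joined <- merge(fisheries, continents, by = "country", all.x = TRUE)

# Countries without a match get NA for `continent`.
unmatched <- joined$country[is.na(joined$continent)]

print(joined)
print(unmatched)
```

Listing the unmatched keys after a join is one simple way to notice spelling differences or missing entries between the two tables.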

20. Project: Regional differences in average GPA and SAT
Question: Exploring the regional differences in average GPA and SAT
score across the US and the factors that could potentially explain them.
Team: Mine’s Minions

21. ex. 2
First Minister’s COVID briefings

22. ✴ web scraping
✴ text parsing
✴ data types
✴ regular expressions

23. ✴ web scraping
✴ text parsing
✴ data types
✴ regular expressions
✴ functions
✴ iteration

24. ✴ web scraping
✴ text parsing
✴ data types
✴ regular expressions
✴ functions
✴ iteration
✴ data visualisation
✴ interpretation

25. ✴ web scraping
✴ text parsing
✴ data types
✴ regular expressions
✴ functions
✴ iteration
✴ data visualisation
✴ interpretation
✴ text analysis

26. ✴ web scraping
✴ text parsing
✴ data types
✴ regular expressions
✴ functions
✴ iteration
✴ data visualisation
✴ interpretation
✴ text analysis
✴ data science ethics
robotstxt::paths_allowed("https://www.gov.scot")
#> www.gov.scot
#> [1] TRUE
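
The call above asks whether a site's robots.txt permits automated access to a path. As a rough illustration of the idea behind such a check, and not of how the robotstxt package is implemented, the base R sketch below tests paths against a hypothetical robots.txt. The file content, the `path_allowed()` helper, and the simplification that every rule applies to all user agents are assumptions made for this example.

```r
# Hypothetical robots.txt content; not fetched from any real site.
robots_txt <- c(
  "User-agent: *",
  "Disallow: /private/",
  "Disallow: /search"
)

# Collect the Disallow rules.
# Simplification: assumes every rule applies to every user agent.
rule_lines <- grep("^Disallow:", robots_txt, value = TRUE)
disallowed <- sub("^Disallow:[[:space:]]*", "", rule_lines)
disallowed <- disallowed[nzchar(disallowed)]  # an empty rule disallows nothing

# A path counts as allowed when it starts with none of the disallowed prefixes.
path_allowed <- function(path, rules = disallowed) {
  !any(startsWith(path, rules))
}

path_allowed("/publications/")      # TRUE
path_allowed("/private/data.csv")   # FALSE
```

Real robots.txt files can also contain Allow rules, wildcards, and agent-specific sections, which this sketch deliberately ignores.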

27. Project: Factors Most Important to University Ranking
Question: Explore how various metrics (e.g., SAT/ACT scores, admission
rate, region, Carnegie classification) predict rankings on the Niche College
Ranking List.
Team: 2cool4school

28. ex. 3
spam filters

29. ✴ logistic regression
✴ prediction

30. ✴ logistic regression
✴ prediction
✴ decision errors
✴ sensitivity / specificity
✴ intuition around loss functions
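
To make the topics above concrete, here is a small base R sketch that computes sensitivity and specificity for a spam classifier at two thresholds, along with a simple weighted count of errors as a stand-in for a loss function. The labels, the predicted probabilities, the `classify_metrics()` helper, and the cost weights are all invented for illustration; they do not come from the course materials.

```r
# Toy data, invented for illustration: 1 = spam, 0 = not spam.
truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
prob  <- c(0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8)  # predicted P(spam)

classify_metrics <- function(truth, prob, threshold, cost_fp = 5, cost_fn = 1) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & truth == 1)  # spam correctly flagged
  tn <- sum(pred == 0 & truth == 0)  # non-spam correctly let through
  fp <- sum(pred == 1 & truth == 0)  # real email sent to the spam folder
  fn <- sum(pred == 0 & truth == 1)  # spam that reached the inbox
  c(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    loss        = cost_fp * fp + cost_fn * fn  # hypothetical cost weights
  )
}

classify_metrics(truth, prob, threshold = 0.5)
classify_metrics(truth, prob, threshold = 0.25)
```

Lowering the threshold catches more spam (higher sensitivity) but flags more real email (lower specificity). Which threshold is preferable depends on how costly each kind of error is taken to be.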

31. Project: Predicting League of Legends success
Question: Ten minutes into the game, is a gold lead or an
experience lead a better predictor of which team wins?
Team: Blue Squirrels

32. Project: A Critique of Hollywood Relationship Stereotypes
Question: How has the average age difference between two actors in an
on-screen relationship changed over the years? Furthermore, do on-screen
same-sex relationships have a different average age gap than on-screen
heterosexual relationships?
Team: team300

33. pedagogy

34. teams: weekly labs in teams + periodic team evaluations + term project in teams
peer feedback: used minimally so far, but positive experience
“minute paper”: weekly online quizzes ending with a brief reflection on the week’s material

35. teams: weekly labs in teams + periodic team evaluations + term project in teams
peer feedback: used minimally so far, but positive experience
“minute paper”: weekly online quizzes ending with a brief reflection on the week’s material
creativity: assignments that make room for creativity

36. Çetinkaya-Rundel, Mine, Mine Dogucu, and Wendy Rummerfield.
"The 5Ws and 1H of term projects in the introductory data science classroom."
Statistics Education Research Journal 21.2 (2022): 4-4.

37. infrastructure & tooling

38. student-facing + instructor-facing
packages: ghclass, checklist, learnr, learnrhash

39. course → organization
students → members
assignments → repos

40. course → organization
teams → teams
projects → repos

41. Beckman, M. D., Çetinkaya-Rundel, M., Horton, N. J., Rundel, C. W., Sullivan, A. J., & Tackett, M.
"Implementing version control with Git and GitHub as a learning objective in statistics and data science courses."
Journal of Statistics and Data Science Education 29.sup1 (2021): S132-S144.

42. openness

43. on

44. Çetinkaya-Rundel, Mine, and Victoria Ellison.
"A fresh look at introductory data science."
Journal of Statistics and Data Science Education 29.sup1 (2021): S16-S26.
