The art and science of teaching data science

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we introduce the design philosophy behind an introductory data science course and discuss in-progress and future research on student learning, as well as new directions in assessment and tooling as we scale up the course.

Talk given at Women in Data Science (WiDS) Conference, FAU Erlangen-Nürnberg, 20 – 21 April 2023. https://www.datascience.nat.fau.eu/women-in-datascience.

Mine Cetinkaya-Rundel

April 21, 2023

Transcript

  1. Image credit: Thomas Pedersen, data-imaginist.com/art
    the art and science
    of teaching data science
    mine çetinkaya-rundel
    bit.ly/ds-art-sci-wids
    mine-cetinkaya-rundel
    [email protected]
    @minebocek
    fosstodon.org/@minecr

  2. 2016 GAISE
    1. Teach statistical thinking.
    ‣ Teach statistics as an investigative process of problem-solving and decision making.
    Students should not leave their introductory statistics course with the mistaken impression
    that statistics consists of an unrelated collection of formulas and methods. Rather, students
    should understand that statistics is a problem-solving and decision making process that is
    fundamental to scientific inquiry and essential for making sound decisions.
    ‣ Give students experience with multivariable thinking. We live in a complex world in
    which the answer to a question often depends on many factors. Students will encounter such
    situations within their own fields of study and everyday lives. We must prepare our students
    to answer challenging questions that require them to investigate and explore relationships
    among many variables. Doing so will help them to appreciate the value of statistical thinking
    and methods.
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyse data.
    6. Use assessments to improve and evaluate student learning.
    amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf

  3. 2016 GAISE
    1. Teach statistical thinking.
    ‣ Teach statistics as an investigative process of problem-solving and decision making.
    Students should not leave their introductory statistics course with the mistaken impression
    that statistics consists of an unrelated collection of formulas and methods. Rather, students
    should understand that statistics is a problem-solving and decision making process that is
    fundamental to scientific inquiry and essential for making sound decisions.
    ‣ Give students experience with multivariable thinking. We live in a complex world in
    which the answer to a question often depends on many factors. Students will encounter such
    situations within their own fields of study and everyday lives. We must prepare our students
    to answer challenging questions that require them to investigate and explore relationships
    among many variables. Doing so will help them to appreciate the value of statistical thinking
    and methods.
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyse data.
    6. Use assessments to improve and evaluate student learning.
    amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
    1 NOT a commonly used subset of tests and intervals and produce them with hand calculations

  4. 2016 GAISE
    1. Teach statistical thinking.
    ‣ Teach statistics as an investigative process of problem-solving and decision making.
    Students should not leave their introductory statistics course with the mistaken impression
    that statistics consists of an unrelated collection of formulas and methods. Rather, students
    should understand that statistics is a problem-solving and decision making process that is
    fundamental to scientific inquiry and essential for making sound decisions.
    ‣ Give students experience with multivariable thinking. We live in a complex world in
    which the answer to a question often depends on many factors. Students will encounter such
    situations within their own fields of study and everyday lives. We must prepare our students
    to answer challenging questions that require them to investigate and explore relationships
    among many variables. Doing so will help them to appreciate the value of statistical thinking
    and methods.
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyse data.
    6. Use assessments to improve and evaluate student learning.
    amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
    2 Multivariate analysis requires the use of computing

  5. 2016 GAISE
    1. Teach statistical thinking.
    ‣ Teach statistics as an investigative process of problem-solving and decision making.
    Students should not leave their introductory statistics course with the mistaken impression
    that statistics consists of an unrelated collection of formulas and methods. Rather, students
    should understand that statistics is a problem-solving and decision making process that is
    fundamental to scientific inquiry and essential for making sound decisions.
    ‣ Give students experience with multivariable thinking. We live in a complex world in
    which the answer to a question often depends on many factors. Students will encounter such
    situations within their own fields of study and everyday lives. We must prepare our students
    to answer challenging questions that require them to investigate and explore relationships
    among many variables. Doing so will help them to appreciate the value of statistical thinking
    and methods.
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyse data.
    6. Use assessments to improve and evaluate student learning.
    amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
    3 NOT use technology that is only applicable in the intro course or that doesn’t follow good science principles

  6. 2016 GAISE
    1. Teach statistical thinking.
    ‣ Teach statistics as an investigative process of problem-solving and decision making.
    Students should not leave their introductory statistics course with the mistaken impression
    that statistics consists of an unrelated collection of formulas and methods. Rather, students
    should understand that statistics is a problem-solving and decision making process that is
    fundamental to scientific inquiry and essential for making sound decisions.
    ‣ Give students experience with multivariable thinking. We live in a complex world in
    which the answer to a question often depends on many factors. Students will encounter such
    situations within their own fields of study and everyday lives. We must prepare our students
    to answer challenging questions that require them to investigate and explore relationships
    among many variables. Doing so will help them to appreciate the value of statistical thinking
    and methods.
    2. Focus on conceptual understanding.
    3. Integrate real data with a context and purpose.
    4. Foster active learning.
    5. Use technology to explore concepts and analyse data.
    6. Use assessments to improve and evaluate student learning.
    amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
    4 Data analysis isn’t just inference and modelling, it’s also data importing, cleaning, preparation, exploration, and visualisation

  7. a course that satisfies these four
    points is looking more like today’s
    intro data science courses than
    (most) intro stats courses
    but this is not because
    intro stats is inherently
    “bad for you”
    instead it is because it’s time to visit
    intro stats in light of emergence of
    data science

  9. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes

  10. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd

  11. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced

  12. ‣ Go to Posit Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the Quarto document called unvotes.qmd
    ‣ Render the document and review the data visualization you just produced
    ‣ Then, look for the character string “Turkey” in the code and replace it with
    another country of your choice
    ‣ Render again, and review how the voting patterns of the country you
    picked compare to the United States and the United Kingdom

  13. three questions that keep me up at night…
    1 what should students learn?
    2 how will students learn best?
    3 what tools will enhance student learning?

  14. three questions that keep me up at night…
    1 what should students learn?
    2 how will students learn best?
    3 what tools will enhance student learning?
    content
    pedagogy
    infrastructure

  15. content

  16. ex. 1
    fisheries of the world

  18. ✴ data joins

  19. ✴ data joins
    ✴ data science ethics

  20. ✴ data joins
    ✴ data science ethics
    ✴ critique
    ✴ improving data
    visualisations

  21. ✴ data joins
    ✴ data science ethics
    ✴ critique
    ✴ improving data
    visualisations
    ✴ mapping

  22. Project: Regional differences in average GPA and SAT
    Question: Exploring the regional differences in average GPA and SAT
    score across the US and the factors that could potentially explain them.
    Team: Mine’s Minions

  23. ex. 2
    First Minister’s COVID briefings

  25. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions

  26. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration

  27. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration
    ✴ data visualisation
    ✴ interpretation

  28. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration
    ✴ data visualisation
    ✴ interpretation
    ✴ text analysis

  29. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration
    ✴ data visualisation
    ✴ interpretation
    ✴ text analysis
    ✴ data science ethics
    robotstxt::paths_allowed("https://www.gov.scot")
    #> www.gov.scot
    #> [1] TRUE

  30. Project: Factors Most Important to University Ranking
    Question: Explore how various metrics (e.g., SAT/ACT scores, admission
    rate, region, Carnegie classification) predict rankings on the Niche College
    Ranking List.
    Team: 2cool4school

  31. ex. 3
    spam filters

  32. ✴ logistic regression
    ✴ prediction

  33. ✴ logistic regression
    ✴ prediction
    ✴ decision errors
    ✴ sensitivity / specificity
    ✴ intuition around
    loss functions
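
A minimal sketch of how the decision-error ideas on this slide could be illustrated for the spam filter example, assuming a small made-up set of labels and predictions. The vectors, the cost values, and all variable names are hypothetical and are not taken from the talk or the course materials.

```r
# Hypothetical spam-filter results (TRUE = spam, FALSE = not spam).
# These ten observations are invented for illustration only.
actual    <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Count the four possible outcomes of a binary decision.
true_pos  <- sum(predicted & actual)    # spam correctly flagged
false_neg <- sum(!predicted & actual)   # spam that reached the inbox
true_neg  <- sum(!predicted & !actual)  # real email correctly kept
false_pos <- sum(predicted & !actual)   # real email wrongly flagged

# Sensitivity: share of actual spam that was flagged.
sensitivity <- true_pos / (true_pos + false_neg)
# Specificity: share of real email that was kept.
specificity <- true_neg / (true_neg + false_pos)

# A simple weighted loss: assume (hypothetically) that losing a real
# email is five times as costly as letting one spam message through.
cost_false_pos <- 5
cost_false_neg <- 1
total_loss <- cost_false_pos * false_pos + cost_false_neg * false_neg

cat("sensitivity:", sensitivity, "\n")
cat("specificity:", specificity, "\n")
cat("total loss: ", total_loss, "\n")
```

Changing the two cost values shows how the preferred trade-off between the two error types depends on which mistake is considered worse.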

  34. Project: Predicting League of Legends success
    Question: Ten minutes into the game, is a gold lead or an
    experience lead a better predictor of which team wins?
    Team: Blue Squirrels

  35. Project: A Critique of Hollywood Relationship Stereotypes
    Question: How has the average age difference between two actors in an
    on-screen relationship changed over the years? Furthermore, do on-screen
    same-sex relationships have a different average age gap than on-screen
    heterosexual relationships?
    Team: team300

  36. pedagogy

  37. teams: weekly labs in teams +
    periodic team evaluations +
    term project in teams
    peer feedback: used
    minimally so far, but
    positive experience
    “minute paper”: weekly online
    quizzes ending with a brief reflection
    of the week’s material

  38. teams: weekly labs in teams +
    periodic team evaluations +
    term project in teams
    peer feedback: used
    minimally so far, but
    positive experience
    “minute paper”: weekly online
    quizzes ending with a brief reflection
    of the week’s material
    creativity: assignments that
    make room for creativity

  39. Çetinkaya-Rundel, Mine, Mine
    Dogucu, and Wendy
    Rummerfield.


    "The 5Ws and 1H of term
    projects in the introductory
    data science classroom."


    Statistics Education Research
    Journal 21.2 (2022): 4-4.

  41. infrastructure & tooling

  42. [Diagram of course tooling, grouped as student-facing and instructor-facing; R packages shown: ghclass, checklist, learnr, gradethis, learnrhash]

  43. course → organization
    students → members
    assignments → repos

  44. course → organization
    teams → teams
    projects → repos

  47. Beckman, M. D., Çetinkaya-
    Rundel, M., Horton, N. J., Rundel,
    C. W., Sullivan, A. J., & Tackett, M.


    "Implementing version control
    with Git and GitHub as a
    learning objective in statistics
    and data science courses."


    Journal of Statistics and Data
    Science Education 29.sup1
    (2021): S132-S144.

  48. openness

  51. on

  54. Çetinkaya-Rundel, Mine, and
    Victoria Ellison.


    "A fresh look at introductory
    data science."


    Journal of Statistics and Data
    Science Education 29.sup1
    (2021): S16-S26.

  55. Image credit:
    Thomas Pedersen, data-imaginist.com/art
    the art and science
    of teaching data science
    mine çetinkaya-rundel
    mine-cetinkaya-rundel
    [email protected]
    @minebocek
    bit.ly/ds-art-sci-wids
    fosstodon.org/@minecr
