The art and science of teaching data science (Nordstat)

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we introduce the design philosophy behind an introductory data science course and discuss in-progress and future research on student learning, as well as new directions in assessment and tooling as we scale up the course.

Mine Cetinkaya-Rundel

June 22, 2021

Transcript

  1. Image credit: Thomas Pedersen, data-imaginist.com/art
    the art and science of teaching data science
    mine çetinkaya-rundel
    bit.ly/ds-art-sci-nordstat
    mine-cetinkaya-rundel
    [email protected]
    @minebocek
    duke university & rstudio

  2. How can we effectively and efficiently teach data science to students with little to no background in computing and statistical thinking?
    How can we equip them with the skills and tools for reasoning with various types of data and leave them wanting to learn more?

  3. goals
    demonstrate concrete course examples
    share a few tips
    provide open-source teaching resources

  5. your first data visualization
    +
    R / RStudio, R Markdown, simple Git

  6. your first data visualization
    +
    R / RStudio, R Markdown, simple Git
    fundamentals of data & data viz, confounding variables, Simpson’s paradox, tidy data, recoding & transforming, web scraping & iteration
    +
    collaboration on GitHub

  7. your first data visualization
    +
    R / RStudio, R Markdown, simple Git
    fundamentals of data & data viz, confounding variables, Simpson’s paradox, tidy data, recoding & transforming, web scraping & iteration
    +
    collaboration on GitHub
    ethical considerations around misrepresentation of data, relying on ML algorithms and the biases they might carry, privacy of one’s own data and reusing others’ data

  8. your first data visualization
    +
    R / RStudio, R Markdown, simple Git
    fundamentals of data & data viz, confounding variables, Simpson’s paradox, tidy data, recoding & transforming, web scraping & iteration
    +
    collaboration on GitHub
    ethical considerations around misrepresentation of data, relying on ML algorithms and the biases they might carry, privacy of one’s own data and reusing others’ data
    building & selecting models, visualising interactions, prediction & validation, inference via simulation

  9. your first data visualization
    +
    R / RStudio, R Markdown, simple Git
    fundamentals of data & data viz, confounding variables, Simpson’s paradox, tidy data, recoding & transforming, web scraping & iteration
    +
    collaboration on GitHub
    ethical considerations around misrepresentation of data, relying on ML algorithms and the biases they might carry, privacy of one’s own data and reusing others’ data
    building & selecting models, visualising interactions, prediction & validation, inference via simulation
    choose your own adventure: text analysis, Bayesian inference, interactive visualization and reporting
    +
    communication & dissemination

  11. ‣ Go to RStudio Cloud - bit.ly/dsbox-cloud
    ‣ Start the project titled UN Votes

  12. ‣ Go to RStudio Cloud - bit.ly/dsbox-cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd

  13. ‣ Go to RStudio Cloud - bit.ly/dsbox-cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd
    ‣ Knit the document and review the data visualisation you just produced

  14. ‣ Go to RStudio Cloud - bit.ly/dsbox-cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd
    ‣ Knit the document and review the data visualisation you just produced
    ‣ Then, look for “France” in the code and replace it with another country.
    Knit again, and review how the voting patterns of the country you picked
    compare to those of the United States and United Kingdom

  15. three questions that keep me up at night…
    1 what should students learn?
    2 how will students learn best?
    3 what tools will enhance student learning?

  16. three questions that keep me up at night…
    1 what should students learn? (content)
    2 how will students learn best? (pedagogy)
    3 what tools will enhance student learning? (infrastructure)

  17. content

  18. ex. 1
    fisheries of the world

  20. ✴ data joins

  21. ✴ data joins
    ✴ data science ethics

  22. ✴ data joins
    ✴ data science ethics
    ✴ critique
    ✴ improving data visualisations

  23. ✴ data joins
    ✴ data science ethics
    ✴ critique
    ✴ improving data visualisations
    ✴ mapping

  24. Project: 2016 US Election Redux
    Question: Would the outcome of the 2016 US Presidential Elections have been different had Bernie Sanders been the Democrat candidate?
    Team: 4 Squared

  25. ex. 2
    First Minister’s COVID briefings

  27. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions

  28. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration

  29. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration
    ✴ data visualisation
    ✴ interpretation

  30. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration
    ✴ data visualisation
    ✴ interpretation
    ✴ text analysis

  31. ✴ web scraping
    ✴ text parsing
    ✴ data types
    ✴ regular expressions
    ✴ functions
    ✴ iteration
    ✴ data visualisation
    ✴ interpretation
    ✴ text analysis
    ✴ data science ethics
    robotstxt::paths_allowed("https://www.gov.scot")
    #> www.gov.scot
    #> [1] TRUE

  32. Project: The North South Divide: University Edition
    Question: Does the geographical location of a UK university affect its university score?
    Team: Fried Egg Jelly Fish

  33. pedagogy

  34. teams: weekly labs in teams + periodic team evaluations + term project in teams

  35. teams: weekly labs in teams + periodic team evaluations + term project in teams
    “minute paper”: weekly online quizzes ending with a brief reflection of the week’s material

  37. # A tibble: 19 x 2
       bigram               n
     1 question 7          19
     2 question 8          16
     3 questions 7         12
     4 join function        9
     5 question 2           9
     6 choice questions     7
     7 first question       7
     8 multiple choice      7
     9 correct answer       6
    10 necessarily improve  6
    11 join functions       5
    12 question 1           5
    13 7 8                  4
    14 airline names        4
    15 data frames          4
    16 feel like            4
    17 many options         4
    18 right answer         4
    19 x axis               4
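
A table of bigram counts like the one on slide 37 can be produced in a few lines. The sketch below is only an illustration under stated assumptions: the `reflections` vector is made up (the students' actual minute-paper responses are not in the deck), the helper `bigrams_in()` is hypothetical, and it uses base R only. The deck does not say how its table was computed, and unlike that table this sketch does not filter out stop words.

```r
# Hypothetical minute-paper reflections; the real responses are not in the deck.
reflections <- c(
  "Question 7 was confusing",
  "I was not sure which join function to use in question 7",
  "The join function examples helped"
)

# Hypothetical helper: split one reflection into lower-case words and
# paste each word together with the word that follows it.
bigrams_in <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[words != ""]
  if (length(words) < 2) {
    return(character(0))
  }
  paste(head(words, -1), tail(words, -1))
}

# Count every bigram across all reflections, most frequent first.
all_bigrams <- unlist(lapply(reflections, bigrams_in))
counts <- sort(table(all_bigrams), decreasing = TRUE)
bigram_counts <- data.frame(
  bigram = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)

print(head(bigram_counts))
```

Counting adjacent word pairs rather than single words keeps phrases such as "question 7" or "join function" intact, which is what makes the slide's table useful for spotting the quiz items and topics students mention most often.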

  38. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: on projects
    “minute paper”: weekly online quizzes ending with a brief reflection of the week’s material

  40. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: on projects
    “minute paper”: weekly online quizzes ending with a brief reflection of the week’s material
    web native (aka COVID friendly)

  42. teams: weekly labs in teams + periodic team evaluations + term project in teams
    peer feedback: on projects
    “minute paper”: weekly online quizzes ending with a brief reflection of the week’s material
    web native (aka COVID friendly)
    creativity: assignments that make room for creativity

  45. infrastructure & tooling

  46. student-facing / instructor-facing
    📦 ghclass
    📦 checklist
    📦 learnr
    📦 parsermd
    📦 gradethis
    📦 learnrhash

  47. 📦 ghclass

  48. openness

  49. datasciencebox.org

  50. rstudio-education.github.io/dsbox

  51. on

  52. introds.org

  53. rstd.io/design-ds-class

  54. Image credit: Thomas Pedersen, data-imaginist.com/art
    the art and science of teaching data science
    mine çetinkaya-rundel
    mine-cetinkaya-rundel
    [email protected]
    @minebocek
    bit.ly/ds-art-sci-nordstat
    duke university & rstudio