Introductory data science, a fresh look

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we will describe a fresh approach to teaching data science at the introductory level, introduce the design philosophy behind the curriculum, and give examples from course materials as well as from student projects. We will also discuss new directions in assessment and tooling as we scale up the course and move it online.

Mine Cetinkaya-Rundel

January 06, 2021

Transcript

  1. introductory data science
    a fresh look
    bit.ly/fresh-ds-jmm
    mine-cetinkaya-rundel
    [email protected]
    minebocek
    mine çetinkaya-rundel

  2. How can we effectively and efficiently teach data science
    to students with little to no background in computing
    and statistical thinking? How can we equip them with
    the skills and tools for reasoning with various types of data
    and leave them wanting to learn more?

  3. goals
    demonstrate concrete course examples
    share a few tips
    provide open-source teaching resources

  4. focus on:
      data visualisation
      data wrangling, tidying, acquisition
      exploratory data analysis
      predictive modeling + uncertainty quantification
      effective communication of results
    foray into:
      interactive visualizations
      text analysis
      machine learning
      Bayesian inference
    emphasise:
      consistent syntax | tidyverse
      reproducibility | R Markdown
      version control and collaboration | Git + GitHub

  5. ex. 1
    united nations

  6. ‣ Go to RStudio Cloud
    ‣ Start the project titled UN Votes
    rstd.io/dsbox-cloud

  7. ‣ Go to RStudio Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd
    rstd.io/dsbox-cloud

  8. ‣ Go to RStudio Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd
    ‣ Knit the document and review the data visualisation you just produced
    rstd.io/dsbox-cloud

  9. ‣ Go to RStudio Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd
    ‣ Knit the document and review the data visualisation you just produced
    ‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice
    ‣ Knit again, and review how the voting patterns of the country you picked compare to those of the United States and the United Kingdom & Northern Ireland
    rstd.io/dsbox-cloud
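
A minimal sketch of why swapping a single string is enough in the exercise above. The contents of unvotes.Rmd are not shown on the slides, so the toy `votes` table, its column names, and the helper `percent_yes()` are assumptions made for illustration, not the course's actual code.

```r
library(dplyr)

# Toy roll-call data (invented); the real project works with UN voting records.
votes <- tibble(
  country = c("United States", "United States", "Turkey", "Turkey", "France", "France"),
  issue   = rep("Human rights", 6),
  vote    = c("yes", "no", "yes", "yes", "no", "no")
)

# Hypothetical helper: share of "yes" votes per country and issue.
# Changing the country strings passed in changes which rows are summarised.
percent_yes <- function(data, countries) {
  data %>%
    filter(country %in% countries) %>%
    group_by(country, issue) %>%
    summarise(percent_yes = mean(vote == "yes"), .groups = "drop")
}

turkey_view <- percent_yes(votes, c("United States", "Turkey"))
france_view <- percent_yes(votes, c("United States", "France"))
```

Because the country name only appears as a value inside `filter()`, a student can edit it without understanding the rest of the pipeline, which seems to be the point of the exercise.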

  10. ex. 2
    fisheries of the world

  11. fisheries %>% select(country)
    #> # A tibble: 75 x 1
    #>    country
    #>  1 Algeria
    #>  2 Angola
    #>  3 Argentina
    #>  4 Australia
    #>  5 Bangladesh
    #>  6 Brazil
    #>  7 Cambodia
    #>  8 Canada
    #>  9 Chile
    #> 10 Colombia
    #> # … with 65 more rows

    continents
    #> # A tibble: 245 x 2
    #>    country           continent
    #>  1 Afghanistan       Asia
    #>  2 Åland Islands     Europe
    #>  3 Albania           Europe
    #>  4 Algeria           Africa
    #>  5 American Samoa    Oceania
    #>  6 Andorra           Europe
    #>  7 Angola            Africa
    #>  8 Anguilla          Americas
    #>  9 Antigua & Barbuda Americas
    #> 10 Argentina         Americas
    #> # … with 235 more rows

    fisheries <- left_join(fisheries, continents)
    Joining, by = "country"

    ✓ data joins

  12. fisheries %>%
      filter(is.na(continent))
    #> # A tibble: 5 x 4
    #>   country                           capture aquaculture continent
    #> 1 Congo, Democratic Republic of the  220000        2965 NA
    #> 2 Hong Kong                          161964        4130 NA
    #> 3 Myanmar                           1742956      474510 NA
    #> 4 Other                             9685851      786993 NA
    #> 5 Taiwan (Republic of China)        1017243      304756 NA
    ✓ data joins
    ✓ ethics
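
One way to handle rows like these, sketched under assumptions: the slides do not show how the unmatched countries were resolved, and the alternative spellings in the toy `continents` table below are invented for illustration. The idea is to list the rows that fail to match with `anti_join()`, harmonise the names, and join again.

```r
library(dplyr)

# Toy versions of the two tables; the spellings in `continents` are assumed.
fisheries <- tibble(country = c("Algeria", "Hong Kong", "Myanmar"))
continents <- tibble(
  country   = c("Algeria", "Hong Kong SAR China", "Myanmar (Burma)"),
  continent = c("Africa", "Asia", "Asia")
)

# Rows of `fisheries` with no partner in `continents`.
unmatched <- anti_join(fisheries, continents, by = "country")

# Harmonise the names on one side, leaving every other name untouched.
continents_fixed <- continents %>%
  mutate(country = case_when(
    country == "Hong Kong SAR China" ~ "Hong Kong",
    country == "Myanmar (Burma)"     ~ "Myanmar",
    TRUE                             ~ country
  ))

fisheries_joined <- left_join(fisheries, continents_fixed, by = "country")
```

Stating `by = "country"` explicitly also silences the "Joining, by" message. Which name or continent to assign to a disputed territory is a judgement call rather than a purely technical one, which may be why the slide pairs this step with ethics.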

  13. ✓ data joins
    ✓ ethics
    ✓ critique
    ✓ improving visualisations

  14. ✓ data joins
    ✓ ethics
    ✓ critique
    ✓ improving visualisations
    ✓ mapping

  15. ex. 3
    First Minister’s COVID briefings

  16. robotstxt::paths_allowed("https://www.gov.scot/")
    www.gov.scot
    [1] TRUE
    ✓ ethics

  17. ✓ web scraping
    ✓ text parsing
    ✓ data types
    ✓ regular expressions
    ✓ ethics

  18. ✓ web scraping
    ✓ text parsing
    ✓ data types
    ✓ regular expressions
    ✓ functions
    ✓ iteration
    ✓ ethics

  19. ✓ web scraping
    ✓ text parsing
    ✓ data types
    ✓ regular expressions
    ✓ functions
    ✓ iteration
    ✓ visualisation
    ✓ interpretation
    ✓ ethics

  20. ✓ web scraping
    ✓ text parsing
    ✓ data types
    ✓ regular expressions
    ✓ functions
    ✓ iteration
    ✓ visualisation
    ✓ interpretation
    ✓ text analysis
    ✓ ethics
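
A small sketch of how regular expressions, functions, and iteration from the list above can fit together once text has been scraped. The scraping step itself is left out, and the titles, the pattern, and the helper `extract_day_month()` are all invented for illustration; the course's real code is not shown on the slides.

```r
library(stringr)
library(purrr)

# Invented page titles standing in for scraped briefing pages.
titles <- c(
  "Coronavirus (COVID-19) update: First Minister's speech 26 October",
  "Coronavirus (COVID-19) update: First Minister's speech 3 November"
)

# Hypothetical helper: pull a trailing "day Month" string out of one title.
extract_day_month <- function(title) {
  str_extract(title, "\\d{1,2} [A-Z][a-z]+$")
}

# Iterate over all titles; map_chr() returns a character vector.
dates <- map_chr(titles, extract_day_month)
```

Writing the extraction as a function for one title first, then mapping it over many, mirrors the order in which the slides introduce functions and iteration.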

  21. ex. 4
    spam filters

  22. ✓ logistic regression
    ✓ prediction

  23. ✓ logistic regression
    ✓ prediction
    ✓ decision errors
    ✓ sensitivity / specificity
    ✓ intuition around loss functions
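
A minimal sketch of how these ideas connect, using base R and invented data: fit a logistic regression, turn predicted probabilities into spam decisions with a cutoff, and see how sensitivity and specificity trade off as the cutoff moves. The `emails` data, the `exclaim` predictor, and the helper `error_rates()` are assumptions for illustration only.

```r
# Toy data (invented): spam flag and number of exclamation marks per email.
emails <- data.frame(
  spam    = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  exclaim = c(0, 1, 0, 2, 4, 3, 5, 1, 6, 7)
)

# Logistic regression of spam status on the single predictor.
fit <- glm(spam ~ exclaim, data = emails, family = binomial)
prob_spam <- predict(fit, type = "response")

# Hypothetical helper: sensitivity and specificity at a given cutoff.
error_rates <- function(threshold) {
  flagged <- prob_spam > threshold
  c(
    sensitivity = mean(flagged[emails$spam == 1]),   # spam correctly flagged
    specificity = mean(!flagged[emails$spam == 0])   # non-spam correctly kept
  )
}

strict  <- error_rates(0.75)  # flags fewer emails
lenient <- error_rates(0.25)  # flags more emails
```

Raising the cutoff can only keep or lower sensitivity and keep or raise specificity, so choosing it amounts to deciding which decision error is more costly. That is one route into the intuition around loss functions.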

  24. ✓ machine learning for text data

  25. tips
    ✓ repetition

  26. tips
    ✓ repetition
    ✓ reflection
    # A tibble: 19 x 2
       bigram                  n
     1 question 7             19
     2 question 8             16
     3 questions 7            12
     4 join function           9
     5 question 2              9
     6 choice questions        7
     7 first question          7
     8 multiple choice         7
     9 correct answer          6
    10 necessarily improve     6
    11 join functions          5
    12 question 1              5
    13 7 8                     4
    14 airline names           4
    15 data frames             4
    16 feel like               4
    17 many options            4
    18 right answer            4
    19 x axis                  4
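
A table like the one above can be produced by counting word pairs in free-text responses. The sketch below assumes the tidytext package and uses two invented responses; the slides do not say which tools or preprocessing steps were used for the real table.

```r
library(dplyr)
library(tidytext)

# Invented reflections standing in for real student feedback.
reflections <- tibble(
  text = c(
    "Question 7 was hard",
    "Question 7 was confusing"
  )
)

# Split each response into overlapping two-word sequences, then count them.
bigram_counts <- reflections %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)
```

`unnest_tokens()` lowercases text by default, which is consistent with the lowercase bigrams in the table.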

  27. tips
    ✓ repetition
    ✓ reflection
    ✓ creativity

  28. tips
    ✓ repetition
    ✓ reflection
    ✓ creativity
    ✓ peer review

  29. tips
    ✓ repetition
    ✓ reflection
    ✓ creativity
    ✓ peer review
    ✓ real workflows

  30. toolbox
    student

  31. toolbox
    instructor

  32. datasciencebox.org

  33. Mine Çetinkaya-Rundel & Victoria Ellison (2020)
    A Fresh Look at Introductory Data Science
    Journal of Statistics Education
    DOI: 10.1080/10691898.2020.1804497

  34. bit.ly/fresh-ds-jmm
    mine-cetinkaya-rundel
    [email protected]
    minebocek
    datasciencebox.org
