Introductory data science, a fresh look

Modern statistics is fundamentally a computational discipline, but too often this fact is not reflected in our statistics curricula. With the rise of data science, it has become increasingly clear that students want, expect, and need explicit training in this area of the discipline. Additionally, recent curricular guidelines clearly state that working with data requires extensive computing skills and that statistics students should be fluent in accessing, manipulating, analyzing, and modeling with professional statistical analysis software. In this talk, we will describe a fresh approach to teaching data science at the introductory level, introduce the design philosophy behind the curriculum, and give examples from course materials as well as from student projects. We will also discuss new directions in assessment and tooling as we scale up the course and move it online.

Mine Cetinkaya-Rundel

January 06, 2021

Transcript

  1. introductory data science
    a fresh look
    bit.ly/fresh-ds-jmm
    mine-cetinkaya-rundel
    minebocek
    mine çetinkaya-rundel

  2. How can we effectively and efficiently teach data science
    to students with little to no background in computing
    and statistical thinking?

    How can we equip them with the skills and tools for reasoning
    with various types of data and leave them wanting to learn more?

  3. goals
    demonstrate concrete course examples
    share a few tips
    provide open-source teaching resources

  4. focus on
    data visualisation
    data wrangling, tidying, acquisition
    exploratory data analysis
    predictive modeling + uncertainty quantification
    effective communication of results

    foray into
    interactive visualizations
    text analysis
    machine learning
    Bayesian inference

    emphasise
    consistent syntax | tidyverse
    reproducibility | R Markdown
    version control and collaboration | Git + GitHub

  5. topics

  7. ex. 1
    united nations

  11. ‣ Go to RStudio Cloud
    ‣ Start the project titled UN Votes
    ‣ Open the R Markdown document called unvotes.Rmd
    ‣ Knit the document and review the data visualisation you just produced
    ‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice
    ‣ Knit again, and review how the voting patterns of the country you picked compare to those of the United States and United Kingdom & Northern Ireland
    rstd.io/dsbox-cloud
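As a rough illustration of why changing a single string is enough to redo the analysis for another country, here is a minimal sketch in R. The toy `votes` data frame, its column names, and the `country_of_interest` variable are assumptions made for this example; the contents of unvotes.Rmd are not shown in the slides and may be organised differently.

```r
library(dplyr)

# Hypothetical toy data standing in for the UN voting records.
votes <- data.frame(
  country = c("Turkey", "Turkey", "United States", "United States",
              "France", "France"),
  vote    = c("yes", "no", "no", "no", "yes", "yes")
)

# The one string a student would edit before knitting again.
country_of_interest <- "Turkey"

# Keep the comparison country plus the chosen one, then summarise
# the share of "yes" votes for each.
result <- votes %>%
  filter(country %in% c("United States", country_of_interest)) %>%
  group_by(country) %>%
  summarise(percent_yes = 100 * mean(vote == "yes"))

result
```

Setting `country_of_interest` to "France" and rerunning changes which rows pass the filter; nothing else in the pipeline needs to be touched.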

  13. ex. 2
    fisheries of the world

  15. fisheries %>% select(country)
    #> # A tibble: 75 x 1
    #>    country
    #>  1 Algeria
    #>  2 Angola
    #>  3 Argentina
    #>  4 Australia
    #>  5 Bangladesh
    #>  6 Brazil
    #>  7 Cambodia
    #>  8 Canada
    #>  9 Chile
    #> 10 Colombia
    #> # … with 65 more rows

    continents
    #> # A tibble: 245 x 2
    #>    country           continent
    #>  1 Afghanistan       Asia
    #>  2 Åland Islands     Europe
    #>  3 Albania           Europe
    #>  4 Algeria           Africa
    #>  5 American Samoa    Oceania
    #>  6 Andorra           Europe
    #>  7 Angola            Africa
    #>  8 Anguilla          Americas
    #>  9 Antigua & Barbuda Americas
    #> 10 Argentina         Americas
    #> # … with 235 more rows

    fisheries <- left_join(fisheries, continents)
    Joining, by = "country"

    ✓ data joins

  16. fisheries %>%
      filter(is.na(continent))
    #> # A tibble: 5 x 4
    #>   country                           capture aquaculture continent
    #> 1 Congo, Democratic Republic of the  220000        2965 NA
    #> 2 Hong Kong                          161964        4130 NA
    #> 3 Myanmar                           1742956      474510 NA
    #> 4 Other                             9685851      786993 NA
    #> 5 Taiwan (Republic of China)        1017243      304756 NA

    ✓ data joins
    ✓ ethics
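The join-then-check pattern on slides 15 and 16 can be sketched with two small tables. Everything below is a toy: the `fisheries_toy` and `continents_toy` names, the values, and the mismatched spelling are invented for illustration and are not the course data.

```r
library(dplyr)

# Hypothetical toy tables; the values are illustrative only.
fisheries_toy <- data.frame(
  country = c("Algeria", "Myanmar", "Chile"),
  capture = c(100, 200, 300)
)
continents_toy <- data.frame(
  country   = c("Algeria", "Chile", "Burma"),  # invented spelling mismatch
  continent = c("Africa", "Americas", "Asia")
)

# Naming the key explicitly avoids the "Joining, by = ..." message.
joined <- left_join(fisheries_toy, continents_toy, by = "country")

# Rows from the left table with no match get NA in the new column.
unmatched <- joined %>% filter(is.na(continent))

# anti_join() lists the same unmatched rows without joining first.
unmatched_keys <- anti_join(fisheries_toy, continents_toy, by = "country")

unmatched
```

Deciding how to label the unmatched rows is where the ethics discussion enters: the code can only report that a name failed to match, not which name is appropriate.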

  18. ✓ data joins
    ✓ ethics
    ✓ critique
    ✓ improving visualisations
    ✓ mapping

  20. ex. 3
    First Minister’s COVID briefings

  22. robotstxt::paths_allowed("https://www.gov.scot/")
    www.gov.scot
    [1] TRUE

    ✓ ethics

  26. ✓ web scraping
    ✓ text parsing
    ✓ data types
    ✓ regular expressions
    ✓ functions
    ✓ iteration
    ✓ visualisation
    ✓ interpretation
    ✓ text analysis
    ✓ ethics
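Several items on this list (text parsing, regular expressions, functions, iteration, data types) fit together in a few lines. The following is a base R sketch under stated assumptions: the `titles` vector is a hypothetical stand-in for scraped briefing titles, and `extract_date()` is a helper invented here, not a function from the course materials.

```r
# Hypothetical briefing titles; real ones would come from scraping.
titles <- c(
  "Coronavirus (COVID-19) update: First Minister's speech 26 October 2020",
  "Coronavirus (COVID-19) update: First Minister's speech 3 November 2020"
)

# Function: pull a "day month year" string out of one title
# using a regular expression.
extract_date <- function(title) {
  pattern <- "[0-9]{1,2} [A-Za-z]+ [0-9]{4}"
  regmatches(title, regexpr(pattern, title))
}

# Iteration: apply the function to every title.
# vapply() stops with an error if a title has no match.
date_strings <- vapply(titles, extract_date, character(1), USE.NAMES = FALSE)

# Data types: convert the character strings to Date objects.
# Parsing month names with %B assumes an English locale.
dates <- as.Date(date_strings, format = "%d %B %Y")

date_strings
```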

  28. ex. 4
    spam filters

  30. ✓ logistic regression
    ✓ prediction
    ✓ decision errors
    ✓ sensitivity / specificity
    ✓ intuition around loss functions
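To make the link between prediction, decision errors, and sensitivity / specificity concrete, here is a minimal base R sketch. The data are simulated and the single predictor `exclaim` is an assumption made for this example; the course's spam data and model are not shown in the slides.

```r
set.seed(1)

# Simulated emails: one hypothetical predictor (count of exclamation
# marks) and a 0/1 spam label that depends on it.
n <- 200
exclaim <- rpois(n, lambda = 2)
spam <- rbinom(n, size = 1, prob = plogis(-2 + 0.8 * exclaim))
emails <- data.frame(spam, exclaim)

# Logistic regression and predicted spam probabilities.
fit <- glm(spam ~ exclaim, data = emails, family = binomial)
prob <- predict(fit, type = "response")

# Turn probabilities into decisions at a cutoff, then measure
# how often spam and non-spam are each classified correctly.
classify <- function(cutoff) {
  predicted <- as.integer(prob > cutoff)
  c(
    sensitivity = sum(predicted == 1 & emails$spam == 1) / sum(emails$spam == 1),
    specificity = sum(predicted == 0 & emails$spam == 0) / sum(emails$spam == 0)
  )
}

low_cutoff  <- classify(0.25)
high_cutoff <- classify(0.75)
rbind(low_cutoff, high_cutoff)
```

Raising the cutoff can only flag fewer emails as spam, so sensitivity cannot rise and specificity cannot fall. Which cutoff is preferable depends on how costly each kind of error is, which is one way into the intuition around loss functions.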

  32. ✓ machine learning for text data

  34. tips
    ✓ repetition
    ✓ reflection

    # A tibble: 19 x 2
       bigram                  n
     1 question 7             19
     2 question 8             16
     3 questions 7            12
     4 join function           9
     5 question 2              9
     6 choice questions        7
     7 first question          7
     8 multiple choice         7
     9 correct answer          6
    10 necessarily improve     6
    11 join functions          5
    12 question 1              5
    13 7 8                     4
    14 airline names           4
    15 data frames             4
    16 feel like               4
    17 many options            4
    18 right answer            4
    19 x axis                  4

  37. tips
    ✓ repetition
    ✓ reflection
    ✓ creativity
    ✓ peer review
    ✓ real workflows

  38. toolbox
    student

  39. toolbox
    instructor

  40. datasciencebox.org

  41. Mine Çetinkaya-Rundel & Victoria Ellison (2020)
    A Fresh Look at Introductory Data Science
    Journal of Statistics Education
    DOI: 10.1080/10691898.2020.1804497

  42. introds.org

  43. bit.ly/fresh-ds-jmm
    mine-cetinkaya-rundel
    minebocek
    datasciencebox.org