Teaching data science, responsibly

Mine Cetinkaya-Rundel

April 08, 2022

Transcript

  1. teaching data science, responsibly
    🔗 bit.ly/teach-ds-responsible
    mine-cetinkaya-rundel
    minebocek
    mine çetinkaya-rundel
    Photo by charlesdeluvio on Unsplash

  2. goals
    convince you that we need to both…
    - thread elements of responsible data science throughout a curriculum
    - feature instruction of ethics as a standalone unit in a curriculum
    and do so with examples

  3. scope
    introductory data science course
    undergraduate curriculum in statistics and data science

  4. introductory data science course
    focus on
    - data visualisation
    - data wrangling, tidying, acquisition
    - exploratory data analysis
    - predictive modeling + uncertainty quantification
    - effective communication of results
    foray into
    - interactive visualizations
    - text analysis
    - machine learning
    - Bayesian inference
    emphasise
    - consistent syntax | tidyverse
    - reproducibility | R Markdown / Quarto
    - version control and collaboration | Git + GitHub

  6. responsible computing

  7. reproducibility

  8. #1: convince researchers to adopt a reproducible research workflow
    #2: train new researchers who don’t have any other workflow

  9. traditional
    data analysis
    - descriptive stats
    - plots & tables
    - model output
    write-up
    - research question & context
    - interpretations
    - conclusions
    lab report: copy-paste from the data analysis, copy-paste from the write-up

  10. a better approach
    text block
    data analysis
    text block
    data analysis
    text block
    or

  11. version control

  12. each assignment as a Git repo
    distributed on GitHub
    collected under a course organization

  13. responsible data collection

  14. web scraping

  15. activity:
    scrape and analyze Nicola Sturgeon’s COVID briefings

  16. first ask, can I?
    robotstxt::paths_allowed("https://www.gov.scot/")
    www.gov.scot
    [1] TRUE

  17. actually, first ask, should I?
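
The "can I?" check from the previous slide can be built into the scraping step so that it is never skipped. What follows is a minimal sketch, not code from the talk: `fetch_if_allowed()` is a hypothetical helper, and `fetch` stands for whatever function downloads the page (for example `rvest::read_html`). The only call taken from the slides is `robotstxt::paths_allowed()`. The "should I?" question has no programmatic answer and still needs a conversation with students.

```r
# Hypothetical helper (not part of robotstxt): refuse to fetch a page
# unless the robots.txt check passes.
# `allowed` defaults to the function used on the slide; it is a parameter
# here only so the check can be swapped out, e.g. for offline testing.
fetch_if_allowed <- function(url, fetch, allowed = robotstxt::paths_allowed) {
  if (!isTRUE(allowed(url))) {
    stop("robots.txt disallows scraping: ", url, call. = FALSE)
  }
  fetch(url)
}
```

With the robotstxt and rvest packages installed and a network connection, a call might look like `fetch_if_allowed("https://www.gov.scot/", fetch = rvest::read_html)`.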

  19. finding data sources

  20. get students out of the mindset of “internet
    search as the only way to access data” and
    connect them with domain experts, data
    librarians, etc.

  21. responsible datasets

  22. encoding people

  23. don’t
    - use variables that reinforce the idea that gender is dichotomous or that exclude LGBT+ people
    - present data analyses that reinforce negative stereotypes about marginalized groups
    do
    - present analyses that are inclusive
    - give context when using data where gender is dichotomized
    - be mindful when collecting data on students for in-class exercises

  24. https://www.significancemagazine.com/culture/624-lgbt-resources-for-statisticians-and-data-scientists

  25. drawing maps

  26. activity:
    improve a visualization on fisheries around the world

  27. fisheries %>% select(country)
    #> # A tibble: 75 x 1
    #> country
    #>
    #> 1 Algeria
    #> 2 Angola
    #> 3 Argentina
    #> 4 Australia
    #> 5 Bangladesh
    #> 6 Brazil
    #> 7 Cambodia
    #> 8 Canada
    #> 9 Chile
    #> 10 Colombia
    #> # … with 65 more rows
    continents
    #> # A tibble: 245 x 2
    #> country continent
    #>
    #> 1 Afghanistan Asia
    #> 2 Åland Islands Europe
    #> 3 Albania Europe
    #> 4 Algeria Africa
    #> 5 American Samoa Oceania
    #> 6 Andorra Europe
    #> 7 Angola Africa
    #> 8 Anguilla Americas
    #> 9 Antigua & Barbuda Americas
    #> 10 Argentina Americas
    #> # … with 235 more rows
    fisheries <- left_join(fisheries, continents)
    Joining, by = "country"

  28. fisheries %>%
      filter(is.na(continent))
    #> # A tibble: 5 x 4
    #> country capture aquaculture continent
    #>
    #> 1 Congo, Democratic Republic of the 220000 2965 NA
    #> 2 Hong Kong 161964 4130 NA
    #> 3 Myanmar 1742956 474510 NA
    #> 4 Other 9685851 786993 NA
    #> 5 Taiwan (Republic of China) 1017243 304756 NA
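
The rows with a missing continent are the ones whose country name has no exact match in the lookup table. One way to surface them before joining, and then to fill the gaps deliberately, is sketched below. This is an illustration under assumptions, not the activity's solution: the two tibbles are small stand-ins for `fisheries` and `continents` with only a few rows, and the continent assigned to each unmatched entry is a choice the analyst makes and should document.

```r
library(dplyr)

# Toy stand-ins for the tables on the slides (a few rows, country only).
fisheries <- tibble(country = c("Algeria", "Hong Kong", "Myanmar", "Other"))
continents <- tibble(
  country   = c("Algeria", "Angola"),
  continent = c("Africa", "Africa")
)

# Which fisheries rows have no partner in the lookup table?
unmatched <- anti_join(fisheries, continents, by = "country")

# Join, then fill the gaps with explicit, reviewable rules.
# Rows that match no rule (here "Other") keep NA on purpose.
fisheries_joined <- fisheries %>%
  left_join(continents, by = "country") %>%
  mutate(continent = case_when(
    country == "Hong Kong" ~ "Asia",
    country == "Myanmar"   ~ "Asia",
    TRUE                   ~ continent
  ))
```

Spelling out each rule in `case_when()` keeps the encoding decisions visible in the code, which gives students something concrete to question and discuss.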

  29. responsible visualizations

  30. activity:
    assess and improve accessibility

  31. responsible exposure

  32. providing choices

  33. activity:
    make first data visualization within the first 15 minutes of course

  35. responsible models + algorithms

  36. ordering topics

  38. assigning sentiment

  40. responsible modules + threads

  42. responsible sharing

  43. 🔗 datasciencebox.org

  45. responsible activities
    ?

  46. thank you!
    🔗 bit.ly/teach-ds-responsible
    mine-cetinkaya-rundel
    minebocek
    Photo by charlesdeluvio on Unsplash