Teaching data science, responsibly

Mine Cetinkaya-Rundel

April 08, 2022

Transcript

  1. teaching data science, responsibly; mine çetinkaya-rundel; 🔗 bit.ly/teach-ds-responsible; mine-cetinkaya-rundel; cetinkaya.mine@gmail.com; minebocek; Photo by charlesdeluvio on Unsplash
  2. goals: convince you that we need to both… (a) thread elements of responsible data science throughout a curriculum, (b) feature instruction of ethics as a standalone unit in a curriculum; and do so with examples
  3. scope: introductory data science course; undergraduate curriculum in statistics and data science
  4. introductory data science course. focus on: data visualisation; data wrangling, tidying, acquisition; exploratory data analysis; predictive modeling + uncertainty quantification; effective communication of results. foray into: interactive visualizations; text analysis; machine learning; Bayesian inference; … emphasise: consistent syntax | tidyverse; reproducibility | R Markdown / Quarto; version control and collaboration | Git + GitHub
  5. None
  6. responsible computing

  7. reproducibility

  8. #1: convince researchers to adopt a reproducible research workflow; #2: train new researchers who don’t have any other workflow
  9. traditional: data analysis (descriptive stats; plots & tables; model output) + write-up (research question & context; interpretations; conclusions), copy-pasted into a lab report
  10. a better approach: text block, data analysis, text block, data analysis, text block or
  11. version control

  12. each assignment as a Git repo, distributed on GitHub, collected under a course organization
  13. responsible data collection

  14. web scraping

  15. activity: scrape and analyze Nicola Sturgeon’s COVID briefings

  16. first ask, can I? robotstxt::paths_allowed("https://www.gov.scot/") #> www.gov.scot #> [1] TRUE

  17. actually, first ask, should I?

  18. None
  19. finding data sources

  20. get students out of the mindset of “internet search as the only way to access data” and connect them with domain experts, data librarians, etc.
  21. responsible datasets

  22. encoding people

  23. don’t: use variables that reinforce the idea that gender is dichotomous or that exclude LGBT+ people; present data analyses that reinforce negative stereotypes about marginalized groups. do: present analyses that are inclusive; give context when using data where gender is dichotomized; be mindful when collecting data on students for in-class exercises
  24. https://www.significancemagazine.com/culture/624-lgbt-resources-for-statisticians-and-data-scientists

  25. drawing maps

  26. activity: improve a visualization on fisheries around the world

  27. fisheries %>% select(country)
    #> # A tibble: 75 x 1
    #>    country
    #>    <chr>
    #>  1 Algeria
    #>  2 Angola
    #>  3 Argentina
    #>  4 Australia
    #>  5 Bangladesh
    #>  6 Brazil
    #>  7 Cambodia
    #>  8 Canada
    #>  9 Chile
    #> 10 Colombia
    #> # … with 65 more rows

    continents
    #> # A tibble: 245 x 2
    #>    country           continent
    #>    <chr>             <chr>
    #>  1 Afghanistan       Asia
    #>  2 Åland Islands     Europe
    #>  3 Albania           Europe
    #>  4 Algeria           Africa
    #>  5 American Samoa    Oceania
    #>  6 Andorra           Europe
    #>  7 Angola            Africa
    #>  8 Anguilla          Americas
    #>  9 Antigua & Barbuda Americas
    #> 10 Argentina         Americas
    #> # … with 235 more rows

    fisheries <- left_join(fisheries, continents)
    #> Joining, by = "country"
  28. fisheries %>% filter(is.na(continent))
    #> # A tibble: 5 x 4
    #>   country                           capture aquaculture continent
    #>   <chr>                               <dbl>       <dbl> <chr>
    #> 1 Congo, Democratic Republic of the  220000        2965 NA
    #> 2 Hong Kong                          161964        4130 NA
    #> 3 Myanmar                           1742956      474510 NA
    #> 4 Other                             9685851      786993 NA
    #> 5 Taiwan (Republic of China)        1017243      304756 NA
  29. responsible visualizations

  30. activity: assess and improve accessibility

  31. responsible exposure

  32. providing choices

  33. activity: make first data visualization within the first 15 minutes of course
  34. None
  35. responsible models + algorithms

  36. ordering topics

  37. None
  38. assigning sentiment

  39. None
  40. responsible modules + threads

  41. None
  42. responsible sharing

  43. 🔗 datasciencebox.org

  44. None
  45. responsible activities?

  46. thank you! 🔗 bit.ly/teach-ds-responsible; mine-cetinkaya-rundel; cetinkaya.mine@gmail.com; minebocek; Photo by charlesdeluvio on Unsplash
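
The join check on slides 27 and 28 (join a lookup table, then look for rows that failed to match) can be sketched with a small self-contained example. This is only an illustration: `fisheries_toy` and `continents_toy` are hypothetical stand-ins that reuse a few country names from the slides, and the hand-written `case_when()` fix is one possible way to resolve the unmatched rows, not necessarily the one used in the talk. Which label to give a contested territory is exactly the kind of judgement call the "drawing maps" section asks students to notice.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables; only the country names come from the slides.
fisheries_toy <- tibble(
  country = c("Algeria", "Angola", "Hong Kong", "Myanmar")
)

continents_toy <- tribble(
  ~country,  ~continent,
  "Algeria", "Africa",
  "Angola",  "Africa"
)

# Name the key explicitly instead of relying on the automatic "Joining, by" guess.
joined <- left_join(fisheries_toy, continents_toy, by = "country")

# Rows with a missing continent are the ones the lookup table did not match.
unmatched <- joined %>% filter(is.na(continent))

# One possible fix (an assumption for illustration): fill in the missing
# continents by hand and leave the matched rows untouched.
fixed <- joined %>%
  mutate(continent = case_when(
    country == "Hong Kong" ~ "Asia",
    country == "Myanmar"   ~ "Asia",
    TRUE                   ~ continent
  ))

print(unmatched)
print(fixed)
```

Inspecting `unmatched` before patching anything keeps the decision visible: students see which names failed to match and have to decide, case by case, how each one should be labelled.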