Slide 1

Slide 1 text

teaching data science, responsibly
mine çetinkaya-rundel
🔗 bit.ly/teach-ds-responsible
mine-cetinkaya-rundel
[email protected]
minebocek
Photo by charlesdeluvio on Unsplash

Slide 2

Slide 2 text

goals
convince you that we need to both…
- thread elements of responsible data science throughout a curriculum
- feature instruction of ethics as a standalone unit in a curriculum
and do so with examples

Slide 3

Slide 3 text

scope
- introductory data science course
- undergraduate curriculum in statistics and data science

Slide 4

Slide 4 text

introductory data science course
focus on
- data visualisation
- data wrangling, tidying, acquisition
- exploratory data analysis
- predictive modeling + uncertainty quantification
- effective communication of results
foray into
- interactive visualizations
- text analysis
- machine learning
- Bayesian inference
- …
emphasise
- consistent syntax | tidyverse
- reproducibility | R Markdown / Quarto
- version control and collaboration | Git + GitHub

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

responsible computing

Slide 7

Slide 7 text

reproducibility

Slide 8

Slide 8 text

#1: convince researchers to adopt a reproducible research workflow
#2: train new researchers who don’t have any other workflow

Slide 9

Slide 9 text

traditional
data analysis
- descriptive stats
- plots & tables
- model output
write-up
- research question & context
- interpretations
- conclusions
each copy-pasted into the lab report

Slide 10

Slide 10 text

a better approach
- text block
- data analysis
- text block
- data analysis
- text block

Slide 11

Slide 11 text

version control

Slide 12

Slide 12 text

each assignment as a Git repo
distributed on GitHub
collected under a course organization

Slide 13

Slide 13 text

responsible data collection

Slide 14

Slide 14 text

web scraping

Slide 15

Slide 15 text

activity: scrape and analyze Nicola Sturgeon’s COVID briefings

Slide 16

Slide 16 text

first ask, can I?

robotstxt::paths_allowed("https://www.gov.scot/")
www.gov.scot
[1] TRUE
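
The same "can I?" check can be made for an individual path rather than a whole domain. The sketch below is only an illustration: the robots.txt rules and both paths are hypothetical, supplied as text so that nothing is downloaded, and they are not the rules published by www.gov.scot. A path that passes this check still leaves the "should I?" question open.

```r
library(robotstxt)

# Hypothetical robots.txt rules, passed as text so nothing is downloaded.
# These are NOT the real rules of any site mentioned in the slides.
rules <- robotstxt(text = "User-agent: *\nDisallow: /private/\n")

# Check one path at a time for a generic crawler ("*").
can_scrape_news    <- rules$check(paths = "/news/", bot = "*")
can_scrape_private <- rules$check(paths = "/private/notes", bot = "*")

print(can_scrape_news)
print(can_scrape_private)
```

For a live site, `robotstxt::paths_allowed()` as used on the slide performs the equivalent check against the downloaded robots.txt.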

Slide 17

Slide 17 text

actually, first ask, should I?

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

finding data sources

Slide 20

Slide 20 text

get students out of the mindset of “internet search as the only way to access data” and connect them with domain experts, data librarians, etc.

Slide 21

Slide 21 text

responsible datasets

Slide 22

Slide 22 text

encoding people

Slide 23

Slide 23 text

don’t
- use variables that reinforce the idea that gender is dichotomous or that exclude LGBT+ people
- present data analyses that reinforce negative stereotypes about marginalized groups
do
- present analyses that are inclusive
- give context when using data where gender is dichotomized
- be mindful when collecting data on students for in-class exercises

Slide 24

Slide 24 text

https://www.significancemagazine.com/culture/624-lgbt-resources-for-statisticians-and-data-scientists

Slide 25

Slide 25 text

drawing maps

Slide 26

Slide 26 text

activity: improve a visualization on fisheries around the world

Slide 27

Slide 27 text

fisheries %>%
  select(country)
#> # A tibble: 75 x 1
#>    country
#>
#>  1 Algeria
#>  2 Angola
#>  3 Argentina
#>  4 Australia
#>  5 Bangladesh
#>  6 Brazil
#>  7 Cambodia
#>  8 Canada
#>  9 Chile
#> 10 Colombia
#> # … with 65 more rows

continents
#> # A tibble: 245 x 2
#>    country           continent
#>
#>  1 Afghanistan       Asia
#>  2 Åland Islands     Europe
#>  3 Albania           Europe
#>  4 Algeria           Africa
#>  5 American Samoa    Oceania
#>  6 Andorra           Europe
#>  7 Angola            Africa
#>  8 Anguilla          Americas
#>  9 Antigua & Barbuda Americas
#> 10 Argentina         Americas
#> # … with 235 more rows

fisheries <- left_join(fisheries, continents)
#> Joining, by = "country"

Slide 28

Slide 28 text

fisheries %>%
  filter(is.na(continent))
#> # A tibble: 5 x 4
#>   country                           capture aquaculture continent
#>
#> 1 Congo, Democratic Republic of the  220000        2965 NA
#> 2 Hong Kong                          161964        4130 NA
#> 3 Myanmar                           1742956      474510 NA
#> 4 Other                             9685851      786993 NA
#> 5 Taiwan (Republic of China)        1017243      304756 NA
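
One way to surface and handle the unmatched rows is sketched below, assuming toy versions of the two tables that contain only country names printed on the slides. The names `fisheries_toy`, `continents_toy`, and `manual` are hypothetical, and the manual lookup is purely illustrative: which entries to recode, and how, is a judgment call rather than a mechanical fix.

```r
library(dplyr)

# Toy versions of the two tables, using only names printed on the slides.
fisheries_toy <- tibble(
  country = c(
    "Algeria", "Angola", "Congo, Democratic Republic of the",
    "Hong Kong", "Myanmar", "Other", "Taiwan (Republic of China)"
  )
)
continents_toy <- tibble(
  country   = c("Algeria", "Angola", "Argentina"),
  continent = c("Africa", "Africa", "Americas")
)

# anti_join() keeps the rows of the first table with no match in the second,
# i.e. the rows a left_join() would leave with continent = NA.
unmatched <- anti_join(fisheries_toy, continents_toy, by = "country")

# Hypothetical manual lookup for two of the unmatched names (illustrative only).
manual <- tibble(
  country   = c("Congo, Democratic Republic of the", "Myanmar"),
  continent = c("Africa", "Asia")
)

# Join against the original lookup extended with the manual entries.
joined <- fisheries_toy %>%
  left_join(bind_rows(continents_toy, manual), by = "country")

print(unmatched)
print(filter(joined, is.na(continent)))
```

Rows left as `NA` after this step, such as "Other", are ones where no entry was supplied in the manual lookup, so the remaining gaps stay visible instead of being silently filled.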

Slide 29

Slide 29 text

responsible visualizations

Slide 30

Slide 30 text

activity: assess and improve accessibility

Slide 31

Slide 31 text

responsible exposure

Slide 32

Slide 32 text

providing choices

Slide 33

Slide 33 text

activity: make first data visualization within the first 15 minutes of course

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

responsible models + algorithms

Slide 36

Slide 36 text

ordering topics

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

assigning sentiment

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

responsible modules + threads

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

responsible sharing

Slide 43

Slide 43 text

🔗 datasciencebox.org

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

responsible activities?

Slide 46

Slide 46 text

thank you!
🔗 bit.ly/teach-ds-responsible
mine-cetinkaya-rundel
[email protected]
minebocek
Photo by charlesdeluvio on Unsplash