Slide 1

Slide 1 text

introductory data science a fresh look 🔗 bit.ly/fresh-ds-isa mine-cetinkaya-rundel [email protected] minebocek mine çetinkaya-rundel

Slide 2

Slide 2 text

How can we effectively and efficiently teach data science to students with little to no background in computing and statistical thinking? How can we equip them with the skills and tools for reasoning with various types of data and leave them wanting to learn more?

Slide 3

Slide 3 text

goals: demonstrate concrete course examples; share a few tips; provide open-source teaching resources

Slide 4

Slide 4 text

focus on: data visualisation; data wrangling, tidying, acquisition; exploratory data analysis; predictive modeling + uncertainty quantification; effective communication of results
foray into: interactive visualizations; text analysis; machine learning; Bayesian inference; …
emphasise: consistent syntax | tidyverse; reproducibility | R Markdown; version control and collaboration | Git + GitHub

Slide 5

Slide 5 text

topics

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

ex. 1 united nations

Slide 8

Slide 8 text

‣ Go to RStudio Cloud
‣ Start the project titled UN Votes
🔗 rstd.io/dsbox-cloud

Slide 9

Slide 9 text

‣ Go to RStudio Cloud
‣ Start the project titled UN Votes
‣ Open the R Markdown document called unvotes.Rmd
🔗 rstd.io/dsbox-cloud

Slide 10

Slide 10 text

‣ Go to RStudio Cloud
‣ Start the project titled UN Votes
‣ Open the R Markdown document called unvotes.Rmd
‣ Knit the document and review the data visualisation you just produced
🔗 rstd.io/dsbox-cloud

Slide 11

Slide 11 text

‣ Go to RStudio Cloud
‣ Start the project titled UN Votes
‣ Open the R Markdown document called unvotes.Rmd
‣ Knit the document and review the data visualisation you just produced
‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice
‣ Knit again, and review how the voting patterns of the country you picked compare to those of the United States and United Kingdom & Northern Ireland
🔗 rstd.io/dsbox-cloud
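The find-and-replace step works when the country selection is a plain character vector passed to a filter. The following is a minimal sketch of that pattern, assuming dplyr and a small made-up `votes` table; the column names, values, and the `yes_share` object are hypothetical and are not the contents of unvotes.Rmd.

```r
library(dplyr)

# Hypothetical toy data; the real document works with UN voting records.
votes <- tibble(
  country = c("Turkey", "Turkey", "United States", "United States",
              "France", "France"),
  vote    = c("yes", "no", "no", "no", "yes", "yes")
)

# Swapping one string in this vector changes which countries are compared.
countries <- c("Turkey", "United States")

# Share of "yes" votes per selected country.
yes_share <- votes %>%
  filter(country %in% countries) %>%
  group_by(country) %>%
  summarise(percent_yes = mean(vote == "yes"), .groups = "drop")

yes_share
```

Replacing "Turkey" with "France" in `countries` and re-running is the toy equivalent of the edit students make before knitting again.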

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

ex. 2 fisheries of the world

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

fisheries %>% select(country)
#> # A tibble: 75 x 1
#>    country
#>
#>  1 Algeria
#>  2 Angola
#>  3 Argentina
#>  4 Australia
#>  5 Bangladesh
#>  6 Brazil
#>  7 Cambodia
#>  8 Canada
#>  9 Chile
#> 10 Colombia
#> # … with 65 more rows

continents
#> # A tibble: 245 x 2
#>    country           continent
#>
#>  1 Afghanistan       Asia
#>  2 Åland Islands     Europe
#>  3 Albania           Europe
#>  4 Algeria           Africa
#>  5 American Samoa    Oceania
#>  6 Andorra           Europe
#>  7 Angola            Africa
#>  8 Anguilla          Americas
#>  9 Antigua & Barbuda Americas
#> 10 Argentina         Americas
#> # … with 235 more rows

fisheries <- left_join(fisheries, continents)
Joining, by = "country"

✓ data joins
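To make the join on this slide concrete, here is a minimal sketch with dplyr on two tiny tables. The objects `fisheries_toy` and `continents_toy` and their values are made up for illustration; only `left_join()` itself comes from the slide.

```r
library(dplyr)

# Hypothetical miniature versions of the two tables on the slide.
fisheries_toy <- tibble(
  country = c("Algeria", "Angola", "Hong Kong"),
  capture = c(10, 20, 30)  # made-up values
)
continents_toy <- tibble(
  country   = c("Algeria", "Angola", "Argentina"),
  continent = c("Africa", "Africa", "Americas")
)

# left_join() keeps every row of the left table; rows without a match
# in the right table get NA in the columns that come from the right table.
joined <- left_join(fisheries_toy, continents_toy, by = "country")

joined
```

Naming the key with `by = "country"` avoids the "Joining, by" message and makes the matching column explicit.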

Slide 16

Slide 16 text

fisheries %>% filter(is.na(continent))
#> # A tibble: 5 x 4
#>   country                           capture aquaculture continent
#>
#> 1 Congo, Democratic Republic of the  220000        2965 NA
#> 2 Hong Kong                          161964        4130 NA
#> 3 Myanmar                           1742956      474510 NA
#> 4 Other                             9685851      786993 NA
#> 5 Taiwan (Republic of China)        1017243      304756 NA

✓ data joins ✓ ethics
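Rows that come back with a missing continent usually mean the key is spelled differently in the two tables. One way to diagnose and patch this is sketched below, assuming dplyr; the toy tables and the alternative spelling are hypothetical, and `case_when()` is only one possible fix.

```r
library(dplyr)

fisheries_toy <- tibble(country = c("Algeria", "Hong Kong"))
continents_toy <- tibble(
  country   = c("Algeria", "Hong Kong SAR China"),  # hypothetical spelling
  continent = c("Africa", "Asia")
)

# anti_join() returns the rows of the left table with no match on the right.
unmatched <- anti_join(fisheries_toy, continents_toy, by = "country")

# One possible fix: assign the continent explicitly after the join.
fixed <- fisheries_toy %>%
  left_join(continents_toy, by = "country") %>%
  mutate(continent = case_when(
    country == "Hong Kong" ~ "Asia",
    TRUE                   ~ continent
  ))

unmatched
fixed
```

Each manual assignment is a judgement call about how to label a place, which is where the ethics discussion on the slide can attach to the code.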

Slide 17

Slide 17 text

✓ data joins ✓ ethics ✓ critique ✓ improving visualisations

Slide 18

Slide 18 text

✓ data joins ✓ ethics ✓ critique ✓ improving visualisations ✓ mapping

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

ex. 3 First Minister’s COVID briefings

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

robotstxt::paths_allowed("https://www.gov.scot/")
www.gov.scot
[1] TRUE

✓ ethics

Slide 23

Slide 23 text

✓ web scraping ✓ text parsing ✓ data types ✓ regular expressions ✓ ethics
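The topics on this slide can be illustrated in a few lines. The sketch below assumes rvest and stringr and parses an inline HTML string instead of a live page, so it runs offline; the page content, the date, and the CSS selectors are invented for illustration and do not describe the structure of gov.scot.

```r
library(rvest)
library(stringr)

# Hypothetical inline page standing in for a scraped speech. A real
# workflow would first check robots.txt and then call read_html(url).
page <- read_html(
  "<html><body>
     <h1>Statement: 1 January 2020</h1>
     <p>First paragraph.</p>
     <p>Second paragraph.</p>
   </body></html>"
)

# Web scraping: select nodes with CSS selectors and extract their text.
title <- page %>% html_element("h1") %>% html_text2()
text  <- page %>% html_elements("p") %>% html_text2()

# Regular expression: pull the date out of the title.
date_chr <- str_extract(title, "\\d{1,2} [A-Za-z]+ \\d{4}")

# Data types: convert the character string to a Date.
# "%B" matches month names in the current locale (English assumed here).
date <- as.Date(date_chr, format = "%d %B %Y")
```

Keeping `date_chr` and `date` as separate objects makes the character-to-Date conversion visible, which is the data types point on the slide.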

Slide 24

Slide 24 text

✓ web scraping ✓ text parsing ✓ data types ✓ regular expressions ✓ functions ✓ iteration ✓ ethics
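Functions and iteration enter once the same parsing steps have to be repeated for many pages. A minimal sketch of that pattern follows, assuming purrr, dplyr, and tibble; `parse_page()` and the toy `pages` list are hypothetical stand-ins for a function that reads and parses one URL.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical helper: turns one page (here just a list with two fields)
# into a one-row tibble. In a scraping workflow the body would instead
# read and parse a single URL.
parse_page <- function(page) {
  tibble(
    title   = page$title,
    n_words = lengths(strsplit(page$text, "\\s+"))
  )
}

# Made-up input standing in for a vector of URLs.
pages <- list(
  list(title = "Page A", text = "one two three"),
  list(title = "Page B", text = "four five")
)

# Iteration: apply the function to every page, then stack the rows.
all_pages <- bind_rows(map(pages, parse_page))

all_pages
```

Writing the single-page function first and mapping over inputs second keeps each step testable on one page before it is run on all of them.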

Slide 25

Slide 25 text

✓ web scraping ✓ text parsing ✓ data types ✓ regular expressions ✓ functions ✓ iteration ✓ visualisation ✓ interpretation ✓ ethics

Slide 26

Slide 26 text

✓ web scraping ✓ text parsing ✓ data types ✓ regular expressions ✓ functions ✓ iteration ✓ visualisation ✓ interpretation ✓ text analysis ✓ ethics

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

ex. 4 spam filters

Slide 29

Slide 29 text

✓ logistic regression ✓ prediction

Slide 30

Slide 30 text

✓ logistic regression ✓ prediction ✓ decision errors ✓ sensitivity / specificity ✓ intuition around loss functions
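The link between prediction, decision errors, and sensitivity / specificity can be shown with base R alone. The sketch below uses simulated data, not the course's email data; the predictor `x`, the coefficients, the cutoffs, and the `rates()` helper are all assumptions made for illustration.

```r
set.seed(1)

# Simulated data (entirely made up): one predictor and a binary outcome.
n <- 500
x <- rnorm(n)
spam <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))
emails <- data.frame(x = x, spam = spam)

# Logistic regression and predicted probabilities of spam.
fit <- glm(spam ~ x, data = emails, family = binomial)
p_hat <- predict(fit, type = "response")

# Classify with a cutoff, then compute:
#   sensitivity = share of actual spam that is flagged as spam
#   specificity = share of actual non-spam that is left alone
rates <- function(cutoff) {
  pred <- as.integer(p_hat >= cutoff)
  c(
    sensitivity = sum(pred == 1 & emails$spam == 1) / sum(emails$spam == 1),
    specificity = sum(pred == 0 & emails$spam == 0) / sum(emails$spam == 0)
  )
}

rates(0.5)
rates(0.25)  # lower cutoff: sensitivity cannot fall, specificity cannot rise
```

Comparing the two cutoffs is one way to build intuition for loss: which cutoff is preferable depends on whether a missed spam message or a wrongly filtered legitimate message is considered more costly.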

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

✓ machine learning for text data

Slide 33

Slide 33 text

tips ✓ repetition

Slide 34

Slide 34 text

tips ✓ repetition ✓ reflection

# A tibble: 19 x 2
   bigram                  n
 1 question 7             19
 2 question 8             16
 3 questions 7            12
 4 join function           9
 5 question 2              9
 6 choice questions        7
 7 first question          7
 8 multiple choice         7
 9 correct answer          6
10 necessarily improve     6
11 join functions          5
12 question 1              5
13 7 8                     4
14 airline names           4
15 data frames             4
16 feel like               4
17 many options            4
18 right answer            4
19 x axis                  4
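A bigram table like the one on this slide can be produced from free-text responses in a few lines. This is a sketch assuming dplyr and tidytext; the `reflections` data are invented, and the stop-word filtering that the slide's table appears to reflect is not shown.

```r
library(dplyr)
library(tidytext)

# Made-up responses standing in for student reflections.
reflections <- tibble(
  id   = 1:3,
  text = c(
    "The join function was confusing",
    "I liked the join function",
    "The x axis label was confusing"
  )
)

# Split each response into overlapping two-word sequences and count them.
bigrams <- reflections %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)

bigrams
```

Note that `unnest_tokens()` lowercases the text by default, so "The join" and "the join" are counted as the same bigram.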

Slide 35

Slide 35 text

tips ✓ repetition ✓ reflection ✓ creativity

Slide 36

Slide 36 text

tips ✓ reflection ✓ creativity ✓ peer review

Slide 37

Slide 37 text

tips ✓ repetition ✓ reflection ✓ creativity ✓ peer review ✓ real workflows

Slide 38

Slide 38 text

toolbox student

Slide 39

Slide 39 text

toolbox instructor

Slide 40

Slide 40 text

🔗 datasciencebox.org

Slide 41

Slide 41 text

🔗 introds.org

Slide 42

Slide 42 text

Mine Çetinkaya-Rundel & Victoria Ellison (2020). A Fresh Look at Introductory Data Science. Journal of Statistics Education. DOI: 10.1080/10691898.2020.1804497

Slide 43

Slide 43 text

Journal of Statistics Education, Special Issue on Computing in the Curriculum 🔗 tandfonline.com/doi/full/10.1080/10691898.2020.1870416 🔗 causeweb.org/cause/webinars

Slide 44

Slide 44 text

🔗 bit.ly/fresh-ds-isa mine-cetinkaya-rundel [email protected] minebocek 🔗 datasciencebox.org