Slide 1

Slide 1 text

the art and science of teaching data science
Mine Çetinkaya-Rundel
bit.ly/ds-art-sci-ares
mine-cetinkaya-rundel, [email protected], @minebocek
Image credit: Thomas Pedersen, data-imaginist.com/art

Slide 2

Slide 2 text

2016 GAISE
1. Teach statistical thinking.
‣ Teach statistics as an investigative process of problem-solving and decision making. Students should not leave their introductory statistics course with the mistaken impression that statistics consists of an unrelated collection of formulas and methods. Rather, students should understand that statistics is a problem-solving and decision making process that is fundamental to scientific inquiry and essential for making sound decisions.
‣ Give students experience with multivariable thinking. We live in a complex world in which the answer to a question often depends on many factors. Students will encounter such situations within their own fields of study and everyday lives. We must prepare our students to answer challenging questions that require them to investigate and explore relationships among many variables. Doing so will help them to appreciate the value of statistical thinking and methods.
2. Focus on conceptual understanding.
3. Integrate real data with a context and purpose.
4. Foster active learning.
5. Use technology to explore concepts and analyse data.
6. Use assessments to improve and evaluate student learning.
amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf

Slide 3

Slide 3 text

2016 GAISE recommendations (as on Slide 2), with callout 1: NOT a commonly used subset of tests and intervals and produce them with hand calculations

Slide 4

Slide 4 text

2016 GAISE recommendations (as on Slide 2), with callout 2: Multivariate analysis requires the use of computing

Slide 5

Slide 5 text

2016 GAISE recommendations (as on Slide 2), with callout 3: NOT use technology that is only applicable in the intro course or that doesn’t follow good science principles

Slide 6

Slide 6 text

2016 GAISE recommendations (as on Slide 2), with callout 4: Data analysis isn’t just inference and modelling, it’s also data importing, cleaning, preparation, exploration, and visualisation

Slide 7

Slide 7 text

A course that satisfies these four points looks more like today’s intro data science courses than (most) intro stats courses. This is not because intro stats is inherently “bad for you”; rather, it is time to revisit intro stats in light of the emergence of data science.

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

fundamentals of data & data viz, confounding variables, Simpson’s paradox + R / RStudio, R Markdown, simple Git
tidy data, data frames vs. summary tables, recoding & transforming, web scraping & iteration + collaboration on GitHub
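Simpson’s paradox, listed among the topics above, is easiest to see with numbers. The sketch below uses base R and counts that are invented purely for illustration (the `treatment`, `group`, `successes`, and `trials` columns are hypothetical, not course data): treatment A has the higher success rate within each group, yet treatment B looks better once the groups are pooled.

```r
# Hypothetical counts, invented for illustration only
outcomes <- data.frame(
  treatment = c("A", "A", "B", "B"),
  group     = c("small", "large", "small", "large"),
  successes = c(90, 500, 800, 30),
  trials    = c(100, 1000, 1000, 100),
  stringsAsFactors = FALSE
)

# Success rate within each group: A beats B in both groups
outcomes$rate <- outcomes$successes / outcomes$trials
by_group <- outcomes[, c("group", "treatment", "rate")]

# Success rate after pooling over groups: B now beats A
pooled <- aggregate(cbind(successes, trials) ~ treatment,
                    data = outcomes, FUN = sum)
pooled$rate <- pooled$successes / pooled$trials

print(by_group)
print(pooled)
```

The reversal arises because the two treatments are spread very unevenly across the groups, so the grouping variable acts as a confounder.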

Slide 10

Slide 10 text

building & selecting models, visualising interactions, prediction & validation, inference via simulation

Slide 11

Slide 11 text

data science ethics, text analysis, Bayesian inference + communication & dissemination

Slide 13

Slide 13 text

‣ Go to RStudio Cloud ‣ Start the project titled UN Votes

Slide 14

Slide 14 text

‣ Open the R Markdown document called unvotes.Rmd

Slide 15

Slide 15 text

‣ Knit the document and review the data visualisation you just produced

Slide 16

Slide 16 text

‣ Then, look for the character string “Turkey” in the code and replace it with another country of your choice ‣ Knit again, and review how the voting patterns of the country you picked compare to those of the United States and United Kingdom & Northern Ireland

Slide 17

Slide 17 text

three questions that keep me up at night… 1 what should students learn? 2 how will students learn best? 3 what tools will enhance student learning?

Slide 18

Slide 18 text

three questions that keep me up at night…
1 what should students learn? → content
2 how will students learn best? → pedagogy
3 what tools will enhance student learning? → infrastructure

Slide 19

Slide 19 text

content

Slide 20

Slide 20 text

ex. 1 fisheries of the world

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

✴ data joins

Slide 23

Slide 23 text

✴ data science ethics

Slide 24

Slide 24 text

✴ critique ✴ improving data visualisations

Slide 25

Slide 25 text

✴ mapping
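The data joins topic in this example can be sketched in a few lines of base R. Everything here is an assumption made for illustration: the two small data frames, their column names, and the values are hypothetical and are not the course’s fisheries data. The point is that a left join keeps every row of the first table and exposes keys that fail to match.

```r
# Hypothetical tables, invented for illustration only
fisheries <- data.frame(
  country = c("Norway", "Chile", "Vietnam"),
  total   = c(100, 200, 300),
  stringsAsFactors = FALSE
)
continents <- data.frame(
  country   = c("Norway", "Chile"),
  continent = c("Europe", "Americas"),
  stringsAsFactors = FALSE
)

# Left join: keep all rows of fisheries, add continent where a match exists
joined <- merge(fisheries, continents, by = "country", all.x = TRUE)

# Rows with no match get NA, which flags keys that need cleaning
unmatched <- joined$country[is.na(joined$continent)]

print(joined)
print(unmatched)
```

With the tidyverse loaded, `dplyr::left_join(fisheries, continents, by = "country")` expresses the same join.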

Slide 26

Slide 26 text

Project: 2016 US Election Redux
Question: Would the outcome of the 2016 US Presidential Election have been different had Bernie Sanders been the Democratic candidate?
Team: 4 Squared

Slide 27

Slide 27 text

ex. 2 First Minister’s COVID briefings

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

✴ web scraping ✴ text parsing ✴ data types ✴ regular expressions

Slide 30

Slide 30 text

✴ functions ✴ iteration

Slide 31

Slide 31 text

✴ data visualisation ✴ interpretation

Slide 32

Slide 32 text

✴ text analysis

Slide 33

Slide 33 text

✴ data science ethics
robotstxt::paths_allowed("https://www.gov.scot")
#> www.gov.scot
#> [1] TRUE
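The text parsing, regular expressions, functions, and iteration topics from this example fit together roughly as follows. This is a minimal base R sketch, not the course’s code: the two titles are hypothetical strings written for illustration, and `extract_date()` is a made-up helper name.

```r
# Hypothetical briefing titles, written for illustration only
titles <- c(
  "Coronavirus (COVID-19) update: First Minister's speech 26 October 2020",
  "Coronavirus (COVID-19) update: First Minister's speech 3 November 2020"
)

# Function: pull a trailing "day month year" string out of one title
extract_date <- function(title) {
  pattern <- "[0-9]{1,2} [A-Za-z]+ [0-9]{4}$"
  regmatches(title, regexpr(pattern, title))
}

# Iteration: apply the function to every title, expecting one string each
dates <- vapply(titles, extract_date, character(1), USE.NAMES = FALSE)

print(dates)
```

`vapply()` stops with an error if a title has no match, which is a deliberate choice here: a silent gap in scraped data is harder to spot than a failure.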

Slide 34

Slide 34 text

Project: The North South Divide: University Edition
Question: Does the geographical location of a UK university affect its university score?
Team: Fried Egg Jelly Fish

Slide 35

Slide 35 text

ex. 3 spam filters

Slide 36

Slide 36 text

✴ logistic regression ✴ prediction

Slide 37

Slide 37 text

✴ decision errors ✴ sensitivity / specificity ✴ intuition around loss functions
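The link between logistic regression, decision errors, and sensitivity / specificity can be illustrated with simulated data. The sketch below is an assumption-laden toy, not the course’s spam example: the predictor `num_char`, the simulated `spam` labels, and the three thresholds are all hypothetical. It fits a logistic regression with base R’s `glm()` and shows how moving the classification threshold trades one kind of error for the other.

```r
set.seed(1)

# Simulated data, invented for illustration only
n <- 500
num_char <- rnorm(n)                                  # hypothetical predictor
spam <- rbinom(n, 1, plogis(-1 + 2 * num_char))       # 1 = spam, 0 = not spam
emails <- data.frame(spam, num_char)

# Logistic regression and predicted spam probabilities
fit <- glm(spam ~ num_char, data = emails, family = binomial)
prob <- predict(fit, type = "response")

# Sensitivity and specificity for a given probability threshold
classify <- function(threshold) {
  pred <- as.integer(prob >= threshold)
  c(
    threshold   = threshold,
    sensitivity = sum(pred == 1 & emails$spam == 1) / sum(emails$spam == 1),
    specificity = sum(pred == 0 & emails$spam == 0) / sum(emails$spam == 0)
  )
}

# One row per threshold
results <- t(sapply(c(0.25, 0.5, 0.75), classify))
print(results)
```

Raising the threshold can only lower sensitivity and raise specificity, so the choice of threshold depends on which mistake is costlier: losing a real email or letting spam through.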

Slide 38

Slide 38 text

Project: Spotify Top 100 Tracks of 2017/18
Question: Is it possible to predict the year a song made the Top Tracks playlist based on its metadata?
Team: weR20

year ~ danceability + energy + key + loudness + mode + speechiness + acousticness + instrumentalness + liveness + valence + tempo + duration_s

2017
name                                  artists
I'm the One                           DJ Khaled
Redbone                               Childish Gambino
Sign of the Times                     Harry Styles

2018
name                                  artists
Everybody Dies In Their Nightmares    XXXTENTACION
Jocelyn Flores                        XXXTENTACION
Plug Walk                             Rich The Kid
Moonlight                             XXXTENTACION
Nevermind                             Dennis Lloyd
In My Mind                            Dynoro
changes                               XXXTENTACION

Slide 39

Slide 39 text

pedagogy

Slide 40

Slide 40 text

teams: weekly labs in teams + periodic team evaluations + term project in teams
peer feedback: used minimally so far, but positive experience
“minute paper”: weekly online quizzes ending with a brief reflection on the week’s material

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

# A tibble: 19 x 2
   bigram                  n
 1 question 7             19
 2 question 8             16
 3 questions 7            12
 4 join function           9
 5 question 2              9
 6 choice questions        7
 7 first question          7
 8 multiple choice         7
 9 correct answer          6
10 necessarily improve     6
11 join functions          5
12 question 1              5
13 7 8                     4
14 airline names           4
15 data frames             4
16 feel like               4
17 many options            4
18 right answer            4
19 x axis                  4
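A bigram table like the one above can be produced by splitting each reflection into words, pairing neighbouring words, and counting the pairs. The base R sketch below shows that idea only; it is not how the slide’s table was made. The `responses` vector and the `count_bigrams()` helper are hypothetical, and unlike the table above this sketch does not remove stop words.

```r
# Hypothetical reflections, written for illustration only
responses <- c(
  "question 7 was hard",
  "question 7 and question 8",
  "the join function was confusing"
)

# Count adjacent word pairs (bigrams) across all texts
count_bigrams <- function(texts) {
  bigrams <- unlist(lapply(texts, function(txt) {
    words <- strsplit(tolower(txt), "[^a-z0-9]+")[[1]]
    words <- words[words != ""]
    if (length(words) < 2) return(character(0))
    # Pair each word with the word that follows it
    paste(head(words, -1), tail(words, -1))
  }))
  counts <- sort(table(bigrams), decreasing = TRUE)
  data.frame(
    bigram = names(counts),
    n      = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

bigram_counts <- count_bigrams(responses)
print(head(bigram_counts))
```

Bigrams are formed within each response only, so the last word of one reflection is never paired with the first word of the next.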

Slide 43

Slide 43 text

creativity: assignments that make room for creativity

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

infrastructure & tooling

Slide 47

Slide 47 text

student-facing + ghclass + instructor-facing checklist + + learnr + parsermd gradethis learnrhash

Slide 48

Slide 48 text

ghclass + +

Slide 49

Slide 49 text

ghclass +

Slide 50

Slide 50 text

openness

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

on
