
SOC 4930 & SOC 5050 - Lecture-01

Lecture slides for Lecture 01 of the Saint Louis University Course Quantitative Analysis: Applied Inferential Statistics. This lecture introduces core concepts related to analysis development, reproducibility, and quantitative research.

Christopher Prener

August 27, 2018

Transcript

  1. WELCOME! WELCOME TO SOC 4015 & 5050!

    There is a quick anonymous poll to respond to at https://PollEv.com/chrisprener541 - you can skip creating a nickname! If you have not already completed the Course Onboarding and Course Preview tasks, please finish them! Details at: https://slu-soc5050.github.io/course-onboarding/
  2. AGENDA QUANTITATIVE ANALYSIS / WEEK 01 / LECTURE 01

    1. Front Matter 2. Syllabus Overview 3. Defining Quantitative Data 4. What is a Workflow? 5. Introduction to R 6. Back Matter
  3. Course Onboarding and Course Preview materials were due today -

    please submit asap! Details on course website. 1. FRONT MATTER ANNOUNCEMENTS We’ll start every class with “Front Matter” - goal is to share what we are covering, what due dates are coming up, and any announcements. No class next week, but there is coursework! Before Lecture-03: Lab-01, LP-03, Final Project Memo
  4. ▸ Assistant Professor of Sociology • Coordinator, Sociology Honors Thesis

    ▸ Curriculum advisor & lesson maintainer for The Carpentries’ Social Science and Geospatial Lessons ▸ Former EMT and EMS Dispatcher ▸ I wanted to be an ED physician once upon a time 1. FRONT MATTER ABOUT CHRIS
  5. ▸ Other things I teach: • SOC 1120: Introduction to

    Sociology (Health/Diversity emphasis) • SOC 3220: Urban Sociology & The Wire • SOC 4650/5650: Intro to GIS • SLU Data Science Seminar 1. FRONT MATTER ABOUT CHRIS
  6. ▸ Things I research: • Paramedic work and the EMS

    system as a part of the social safety-net • Neighborhood order (and disorder) in St. Louis • Mental health outcomes, literacy, and discrimination • Approaches to processing “big” and complex data 1. FRONT MATTER ABOUT CHRIS
  7. INTRODUCTIONS 1. FRONT MATTER 1. What is your name? 2.

    What program are you enrolled in, and what year are you? 3. What was one excellent adventure you had this summer?
  8. THE FUNCTION OF SOCIOLOGY, AS OF EVERY SCIENCE, IS TO REVEAL THAT WHICH IS HIDDEN.

    Pierre Bourdieu, On Television (1996). Slide annotation: “empirically” inserted with a caret (^).
  9. COURSE OBJECTIVES 2. SYLLABUS OVERVIEW 1. Fundamentals of Inferential Statistics

    2. Fundamentals of Data Analysis 3. Fundamentals of Data Visualization 4. Quantitative Research Synthesis
  10. COURSE OBJECTIVES 2. SYLLABUS OVERVIEW 1. Fundamentals of Inferential Statistics

    - Describe the use of various statistical tests, their requirements and assumptions, and their interpretation; execute these tests both by hand and programmatically using R 2. Fundamentals of Data Analysis 3. Fundamentals of Data Visualization 4. Quantitative Research Synthesis
  11. COURSE OBJECTIVES 2. SYLLABUS OVERVIEW 1. Fundamentals of Inferential Statistics

    2. Fundamentals of Data Analysis - Perform basic data cleaning and analysis tasks programmatically using R in ways that support high quality documentation and replication. 3. Fundamentals of Data Visualization 4. Quantitative Research Synthesis
  12. COURSE OBJECTIVES 2. SYLLABUS OVERVIEW 1. Fundamentals of Inferential Statistics

    2. Fundamentals of Data Analysis 3. Fundamentals of Data Visualization - Create and present publication quality plots programmatically using R and ggplot2. 4. Quantitative Research Synthesis
  13. COURSE OBJECTIVES 2. SYLLABUS OVERVIEW 1. Fundamentals of Inferential Statistics

    2. Fundamentals of Data Analysis 3. Fundamentals of Data Visualization 4. Quantitative Research Synthesis - Plan, implement (using R), and present (using knitr as well as the word processing and presentation applications of your choice) a research project that uses linear regression to answer a research question.
  14. COURSE POLICIES 2. SYLLABUS OVERVIEW 1. Compassionate Coursework & Title

    IX 2. Attendance & Participation 3. Communication 4. Electronic Devices 5. Student Support 6. Academic Honesty 7. Submission & Late Work
  15. 2. SYLLABUS OVERVIEW THREADS IN SLACK Use threads to respond

    if someone posts a question that you also had, to ask a clarification question, or to thank someone for posting! Hover your mouse over a message to reveal a mini toolbar:
  16. 2. SYLLABUS OVERVIEW THREADS IN SLACK Emojis can be used

    to respond quickly to people’s posts. They are absolutely encouraged! Hover your mouse over a message to reveal a mini toolbar:
  17. 2. SYLLABUS OVERVIEW #WEEKLY-WINS If something works right, you learn

    something new or something that you’re excited about, if someone else was particularly helpful… share it!
  19. ASSIGNMENTS 2. SYLLABUS OVERVIEW

    1. Attendance & Participation: 2*50 = 100 points (10%)
    2. Lecture Preps: 4*15 = 60 points (6%)
    3. Labs: 10*15 = 150 points (15%)
    4. Problem Sets: 8*35 = 280 points (28%)
    5. Final Project: 410 points (41%)
    Total: 1,000 points
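The point totals on this slide can be checked with a few lines of R. This is a minimal sketch: the object names (`points`, `total`, `weights`) are illustrative, and the Labs entry is read as 10 labs at 15 points each, an assumption chosen because it matches the 150-point total shown.

```r
# Point values by assignment category, as listed on the slide
points <- c(
  "Attendance & Participation" = 2 * 50,
  "Lecture Preps"              = 4 * 15,
  "Labs"                       = 10 * 15,  # assumed: 10 labs x 15 points = 150
  "Problem Sets"               = 8 * 35,
  "Final Project"              = 410
)

total <- sum(points)             # total points available in the course
weights <- points / total * 100  # each category's share of the grade, in percent

print(total)
print(round(weights))
```

The rounded weights should line up with the percentages listed for each category.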
  20. ASSIGNMENTS 2. SYLLABUS OVERVIEW 1. Attendance & Participation 2. Lecture

    Preps 3. Labs 4. Problem Sets 5. Final Project. Marks: Excellent (+) = 100%; Satisfactory = 85%; Substantial Improvement Needed (-) = 75%. Feedback only for and _ -
  21. ASSIGNMENTS 2. SYLLABUS OVERVIEW 1. Attendance & Participation 2. Lecture

    Preps 3. Labs 4. Problem Sets 5. Final Project. Late penalties: within 24 hours, -15%; 24 to 48 hours, -30%; 48 to 72 hours, -45%; > 72 hours, -100%
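The late-work schedule on this slide can be written as a small lookup function. This is only a sketch: the function name `late_penalty()` is hypothetical, and treating each upper bound (24, 48, 72 hours) as inclusive is an assumption, since the slide does not say which side of each boundary a submission falls on.

```r
# Returns the penalty (in percent) for a submission that is `hours_late` hours late.
# Assumption: 0 hours late means on time, and upper bounds are inclusive.
late_penalty <- function(hours_late) {
  stopifnot(is.numeric(hours_late), all(hours_late >= 0))
  ifelse(hours_late == 0, 0,
  ifelse(hours_late <= 24, 15,
  ifelse(hours_late <= 48, 30,
  ifelse(hours_late <= 72, 45, 100))))
}

late_penalty(c(0, 5, 30, 60, 100))
```

The function returns only the penalty percentage; how that percentage is applied to a score is left to the syllabus.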
  22. COURSE “FLOW” Active reading Lecture prep Entry ticket* Active lecture

    Lab Problem set Before class During class After class
  23. 3. DEFINING QUANTITATIVE DATA TRENDS: MARKET CHANGES “Old” Academic Market

    Enterprise Market, Applied Data Science Market, Academic Market
  24. WHAT ARE QUANTITATIVE DATA? ▸ Data that can be represented

    numerically ▸ Data that can (typically) be analyzed using statistical techniques 3. DEFINING QUANTITATIVE DATA Quantitative Qualitative
  25. ▸ Randomized control trials (RCTs) are the “gold standard” ▸

    Well designed experiments - large ones where there is essentially a 50/50 chance of being in a control or experimental group - allow us to isolate the effect of an intervention ▸ Non-randomized experiments, like the portacaval shunt experiments, can bias results 3. DEFINING QUANTITATIVE DATA EXPERIMENTS JONAS SALK (1957)
  26. ▸ Studies where data are collected from a (typically) large,

    pre-existing group ▸ Subjects assign themselves into different groups rather than being assigned by a researcher ▸ Observational studies can be affected by confounding - phenomena associated with both the intervention and the outcome 3. DEFINING QUANTITATIVE DATA OBSERVATIONAL DATA
  27. HAPPY FAMILIES ARE ALL ALIKE; EVERY UNHAPPY FAMILY IS UNHAPPY

    IN ITS OWN WAY. Leo Tolstoy, Anna Karenina (1878)
  28. LIKE FAMILIES, TIDY DATASETS ARE ALL ALIKE BUT EVERY MESSY

    DATASET IS MESSY IN ITS OWN WAY. Hadley Wickham, “Tidy Data” (2014)
  29. ▸ Columns = “Variables” ▸ Variables should measure a single

    characteristic, concept, or idea ▸ Rows = “Observations” ▸ Observations (n) represent discrete individuals whose characteristics are measured by the given set of variables ▸ Cells contain “Values” 3. DEFINING QUANTITATIVE DATA TIDY DATA
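The column, row, and cell vocabulary on this slide maps directly onto an R data frame. The sketch below uses a small made-up data set; the object name `people` and all of its values are invented for illustration.

```r
# A small, made-up tidy data set: one row per person (observation),
# one column per variable, one value per cell.
people <- data.frame(
  id       = c(1L, 2L, 3L),
  age      = c(34L, 27L, 45L),
  employed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

nrow(people)   # number of observations (n)
ncol(people)   # number of variables
people$age[2]  # a single value: the age of the second observation
```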
  30. 3. DEFINING QUANTITATIVE DATA TIDY DATA Each data set should

    contain one, and only one, observational unit!
  31. ▸ Types of Data • Numeric - data are numbers

    that may have particular “labels” applied to them to represent “attributes” • String or Character - data are letters or words 3. DEFINING QUANTITATIVE DATA TIDY DATA
  32. ▸ Levels of Measurement • Numerical - can take on

    a wide range of values where order is important • Categorical - can take on only a limited number of values where order is important but flexible 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  33. ▸ Levels of Measurement • Binary variables represent the presence

    or absence of a characteristic • No/Yes and True/False are common examples 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  34. ▸ Levels of Measurement • Binary variables represent the presence

    or absence of a characteristic • No/Yes and True/False are common examples 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES Value = Label (Attributes): 0 = No, 1 = Yes; 0 = False, 1 = True
  36. ▸ Levels of Measurement • Binary variables represent the presence

    or absence of a characteristic • No/Yes and True/False are common examples • Sometimes called “dummy” or “logical” variables 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS LOGICAL NOMINAL ORDINAL RATIO VARIABLES
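A binary variable coded 0/1 can be labeled or treated as logical in R. This is a minimal sketch with made-up responses; the variable name `insured` is hypothetical.

```r
# Hypothetical survey responses coded 0 = No, 1 = Yes
insured <- c(1, 0, 1, 1, 0)

# Attach labels to the values with factor()
insured_labeled <- factor(insured, levels = c(0, 1), labels = c("No", "Yes"))

# Or treat the same values as logical (FALSE/TRUE)
insured_logical <- as.logical(insured)

table(insured_labeled)
mean(insured_logical)  # proportion of "Yes" responses
```

Because `TRUE` counts as 1 and `FALSE` as 0, the mean of a logical vector is the share of cases where the characteristic is present.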
  37. ▸ Levels of Measurement • Nominal variables represent categories where

    order is unimportant (values could be reordered without loss of meaning) • Race, gender, and states are all examples 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  38. ▸ Levels of Measurement • Nominal variables represent categories where

    order is unimportant (values could be reordered without loss of meaning) • Race, gender, and states are all examples 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES Value = Label (Attributes): 1 = White, 2 = African American, 3 = American Indian, 4 = Asian, 5 = Native Hawaiian
  40. ▸ Levels of Measurement • Nominal variables represent categories where

    order is unimportant (values could be reordered without loss of meaning) • Race, gender, and states are all examples • Sometimes called “factor” variables 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY FACTOR ORDINAL RATIO VARIABLES
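In R, a nominal variable is typically stored as a factor. The sketch below reuses the value labels from the earlier race example; the response codes themselves are made up.

```r
# Hypothetical responses using the numeric codes shown on the earlier slide
race_code <- c(1, 2, 4, 1, 5, 3)

race <- factor(
  race_code,
  levels = 1:5,
  labels = c("White", "African American", "American Indian",
             "Asian", "Native Hawaiian")
)

levels(race)      # the set of attributes
is.ordered(race)  # FALSE: order carries no meaning for a nominal variable
table(race)
```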
  41. ▸ Levels of Measurement • Ordinal variables represent categories where

    relative order is important but there is not a precise or fixed difference between values • Likert scales are a common type 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  42. ▸ Levels of Measurement • Ordinal variables represent categories where

    relative order is important but there is not a precise or fixed difference between values • Likert scales are a common type 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES Value = Label (Attributes): 1 = Strongly disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, 5 = Strongly agree
  44. ▸ Levels of Measurement • Ordinal variables represent categories where

    relative order is important but there is not a precise or fixed difference between values • Likert scales are a common type • Sometimes called “ordered factors” 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDERED FACTOR RATIO VARIABLES
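An ordered factor keeps the ranking of categories without implying fixed distances between them. This sketch uses the Likert labels from the earlier slide with made-up response codes.

```r
likert_levels <- c("Strongly disagree", "Disagree",
                   "Neither agree nor disagree", "Agree", "Strongly agree")

# Hypothetical responses coded 1-5
response_code <- c(4, 2, 5, 3, 4)

response <- factor(response_code, levels = 1:5,
                   labels = likert_levels, ordered = TRUE)

is.ordered(response)       # TRUE
response[1] > response[2]  # TRUE: "Agree" ranks above "Disagree"
max(response)              # the highest response given
```

Comparisons such as `>` and `max()` work on ordered factors, but arithmetic such as `mean()` does not, which mirrors the point that the distance between categories is not fixed.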
  45. ▸ Levels of Measurement • Discrete variables can take on

    only whole, non-negative integers for values • Age in years and population counts are common examples 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  46. ▸ Levels of Measurement • Discrete variables can take on

    only whole, non-negative integers for values • Age in years and population counts are common examples 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES Age: 0, 1, 2, … k (k = last possible value)
  48. ▸ Levels of Measurement • Discrete variables can take on

    only whole, non-negative integers for values • Age in years and population counts are common examples • Sometimes called “integer” variables 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL INTEGER CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  49. ▸ Levels of Measurement • Continuous variables can take on

    any value within an infinite set of real numbers • Cost can be represented this way 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  50. ▸ Levels of Measurement • Continuous variables can take on

    any value within an infinite set of real numbers • Cost can be represented this way 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES Income: -k … 1, 2.24, 3.42 … k (-k = smallest possible value, k = largest possible value)
  52. ▸ Levels of Measurement • Continuous variables can take on

    any value within an infinite set of real numbers • Cost can be represented this way • In R, these are called “numeric” variables 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE NUMERIC BINARY NOMINAL ORDINAL RATIO VARIABLES
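The "integer" and "numeric" names used on these slides correspond to what `class()` reports in R. A minimal sketch with made-up values:

```r
# Discrete counts stored as integers (the L suffix marks an integer literal)
age_years <- c(19L, 22L, 35L)
class(age_years)      # "integer"

# Continuous values stored as doubles, which R reports as "numeric"
cost <- c(2.24, 3.42, 10)
class(cost)           # "numeric"

# Without the L suffix, whole numbers are still stored as numeric
class(c(19, 22, 35))  # "numeric"

# is.numeric() is TRUE for both integer and double vectors
is.numeric(age_years)
```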
  53. ▸ Levels of Measurement • Ratio variables can take on

    any real value that is ≥ 0 where 0 represents the condition of “not” having something • Both discrete and continuous variables can also be ratio variables • Number of children and age are examples 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES NUMERICAL CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES
  54. ▸ Levels of Measurement • In practice, we often refer

    to all numerical variables simply as “continuous” variables. 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES CATEGORICAL DISCRETE CONTINUOUS BINARY NOMINAL ORDINAL RATIO VARIABLES NUMERICAL CONTINUOUS
  55. 3. DEFINING QUANTITATIVE DATA VARIABLES & VALUES CATEGORICAL BINARY NOMINAL

    ORDINAL VARIABLES NUMERICAL CONTINUOUS You need to be able to distinguish between these types easily - Quizlet available via course website!
  56. 3. DEFINING QUANTITATIVE DATA GENERALIZATION

    [Diagram: each dot = 1 observation; Universe, Population, Sample; the Sample is used to Draw Inferences About the Population]
  57. 4. WHAT IS A WORKFLOW? WORKFLOWS SOLVE PROBLEMS 33,425 There

    are two types of people in this world…
  58. 1 2 1. Draw some circles 2. Draw the rest

    of the owl HOW TO DRAW AN OWL…
  59. 4. WHAT IS A WORKFLOW? WORKFLOWS SOLVE PROBLEMS EXPLICITLY 33,425

    There are two types of people in this world…
  60. 4. WHAT IS A WORKFLOW? INBOX ZERO WORKFLOW UPDATE INBOXES

    SEVERAL TIMES PER DAY DELETE SPAM, JUNK READ REMAINING MESSAGES DOES RESPONDING TAKE > 2 MINUTES? RESPOND YES SNOOZE FOR LATER 5:45AM & 4:00PM PROCESSING ARCHIVE RESPOND NO
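Writing a workflow down explicitly means its decision rule can be expressed as code. The sketch below encodes one reading of the flowchart: spam is deleted, messages that take more than 2 minutes are snoozed, and everything else is answered and archived. The function name, argument names, and return labels are hypothetical, and the branch order is an assumption because the flowchart's arrows are not preserved in the transcript.

```r
# Hypothetical helper that encodes the decision rule from the flowchart.
# The function name and return labels are illustrative, not from the slides.
triage_message <- function(is_spam, minutes_to_respond) {
  if (is_spam) {
    return("delete")
  }
  if (minutes_to_respond > 2) {
    return("snooze for later")
  }
  "respond and archive"
}

triage_message(is_spam = FALSE, minutes_to_respond = 1)   # "respond and archive"
triage_message(is_spam = FALSE, minutes_to_respond = 10)  # "snooze for later"
triage_message(is_spam = TRUE,  minutes_to_respond = 0)   # "delete"
```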
  61. 4. WHAT IS A WORKFLOW? WE HAVE A REPRODUCIBILITY PROBLEM

    Baker, M. 2016. “1,500 scientists lift the lid on reproducibility.” Nature News 533(7604):452-54.
  63. 4. WHAT IS A WORKFLOW? OUR WORKFLOW 1. Plan 2.

    Organize 3. Document 4. Execute For Each Step:
  64. 5. INTRODUCTION TO R STATISTICAL COMPUTING 1980s - SPSS, SAS,

    and Stata released for personal computers
  65. STATISTICAL COMPUTING ▸ 1997 - beta version released online ▸

    2000 - first stable release ▸ 2004 - first useR! conference ▸ 2011 - RStudio beta released ▸ 2013 - tidyverse packages begin to coalesce ▸ 2016 - RStudio v1.0 released 5. INTRODUCTION TO R
  66. ▸ what takes the message you wish to display ▸

    by is one of the valid animal types that can be displayed. Available in cowsay; download via CRAN. 5. INTRODUCTION TO R ASCII MESSAGES Parameters: say(what = "message", by = "animal") f(x)
  67. ▸ what takes the message you wish to display ▸

    by is one of the valid animal types that can be displayed 5. INTRODUCTION TO R ASCII MESSAGES Parameters: say(what = "message", by = "animal") f(x)
  68. ASCII MESSAGES 5. INTRODUCTION TO R say(what = "message", by

    = "animal") Using a famous green Jedi: > say(what = "do or do not, there is no try", by = "yoda") Output omitted (see next slide) Animals must be drawn from the items listed in the animals object! f(x)
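The way named parameters and a fixed list of allowed values work can be tried without installing anything. The sketch below defines a hypothetical stand-in, `announce()`, that takes the same two parameter names as `say()`; it is not part of cowsay, it does not draw ASCII art, and `allowed_speakers` is an illustrative subset rather than the package's actual list.

```r
# Hypothetical stand-in for a function with the same two parameters as say().
# It only shows how named arguments and a list of allowed values behave.
allowed_speakers <- c("cow", "cat", "yoda")  # illustrative subset, not cowsay's full list

announce <- function(what, by = allowed_speakers) {
  by <- match.arg(by)  # errors unless `by` is one of allowed_speakers
  paste0(by, " says: ", what)
}

# Named arguments can be given in any order
announce(by = "yoda", what = "do or do not, there is no try")

# Positional matching also works: the first argument is `what`, the second is `by`
announce("hello", "cat")

# A value outside the allowed list is rejected with an error
result <- tryCatch(announce("hello", by = "dragon"), error = function(e) "error")
result
```

Base R's `match.arg()` is used here to reject values outside the allowed set, which is one common way to enforce a rule like "animals must be drawn from the items listed".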
  69. ASCII MESSAGES 5. INTRODUCTION TO R

    > library(cowsay) > say(what = "do or do not, there is no try", by = "yoda") [Output: speech bubble reading "do or do not, there is no try" above ASCII art of Yoda]
  70. AGENDA REVIEW 6. BACK MATTER 2. Syllabus Overview 3. Defining

    Quantitative Data 4. What is a Workflow? 5. Introduction to R
  71. No class next week, but there is coursework! REMINDERS 6.

    BACK MATTER Course Onboarding and Course Preview materials were due today - please submit asap! Details on course website. We’ll end every class with “Back Matter” - goal is to share what we are covering, what due dates are coming up, and any announcements. Before Lecture-03: Lab-01, LP-03, Final Project Memo