# PSYC 562 Introduction to Regression

October 25, 2018

## Transcript

### COVERED TO DATE: 1. Skills for Data Management

- Data wrangling/munging in R with the Tidyverse
- Importing data from the web: APIs, OSF, GitHub
- Summarizing, reshaping, filtering, and selecting data
### COVERED TO DATE: 2. Skills for Data Visualization

- ggplot: scatterplots, line plots, histograms, box plots, ECDFs
### COVERED TO DATE: 3. Skills for Communicating Results; 4. Skills for Collaborative Workflow

- Git/GitHub
- RStudio Projects
- R Markdown and knitr
### Curriculum Vitae Building: Research Experience

Advanced Statistics Course: Used R and RStudio for data management and visualization with Tidyverse packages, including `ggplot2` and `dplyr`. Used RStudio Projects and GitHub for collaborative project building, and R Markdown for communicating results. Statistical modeling emphasized linear and logistic regression. Instructor: Jamie D. Bedics, PhD, ABPP.
### DATA ANALYSIS OVERVIEW

Part I: Inferring to a larger population with `cor()` and `cor.test()`.
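A minimal sketch of this step, with simulated data standing in for the course dataset (the variable names here are hypothetical):

```r
# Simulated stand-in data (hypothetical, not the course dataset)
set.seed(562)
airplay <- rnorm(100, mean = 30, sd = 10)
sales   <- 50 + 3 * airplay + rnorm(100, sd = 25)

cor(airplay, sales)             # Pearson correlation in the sample
ct <- cor.test(airplay, sales)  # adds a t-test, p-value, and 95% CI
ct$estimate                     # the correlation coefficient
ct$conf.int                     # interval for the population correlation
```

`cor()` describes the sample; `cor.test()` supports the inference to the larger population.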
### NEXT: MODEL BUILDING

You'll learn how to interpret linear regression models with and without interaction terms (moderation), and logistic regression models with and without interaction terms.
### LINEAR REGRESSION

Linear regression allows for the prediction of one dependent variable (criterion) from one or more independent variables (predictors):

Y = α + βX1 + e

The criterion (Y) and the predictor (X) come from your dataset; they are variables in your data. Y must be continuous (double, integer); X can be continuous or a factor.
### LINEAR REGRESSION: Coefficients

Y = α + βX1 + e

The model coefficients describe the relationship between X and Y in your data. For a one-predictor model:

1. The intercept (α) is the value of Y when X is 0.
2. The slope (β) is the amount of change in Y for every 1-unit increase in X.
3. The error (e) represents what is unexplained by the model.

We hope to use these coefficients from our sample to predict future scores.
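A sketch of fitting a one-predictor model in R and reading off these pieces (simulated data; the names and true values are made up for illustration):

```r
# Simulated data: y = 10 + 2*x + noise (hypothetical example)
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)
y <- 10 + 2 * x + rnorm(200, sd = 3)

model <- lm(y ~ x)  # fits Y = alpha + beta*X + e by least squares
coef(model)         # intercept (alpha-hat) and slope (beta-hat)
head(resid(model))  # the residuals are the estimated errors, e

# Using the coefficients to predict a future score at x = 7:
predict(model, newdata = data.frame(x = 7))
```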
### LINEAR REGRESSION: What We Interpret

Y = α + βX1 + e

1. The intercept, which only makes sense if the value of Y when X is 0 is meaningful (otherwise we center; more on that later).
2. The unstandardized slope.
3. The standardized slope.
4. The significance of the slopes.
5. Confidence intervals for the intercept and slopes.
6. R².
### MULTIPLE LINEAR REGRESSION

Y = α + β1X1 + β2X2 + e

Adding a second predictor (with its own slope coefficient) brings new rules for interpreting the slope and the intercept:

- A slope is now the change in Y for a 1-unit increase in that predictor while the other predictors are held constant.
- The intercept is the value of Y when all predictors are 0.
### Interpreting R Output Using `display`

Y = α + βX1 + e

- Intercept (α): the value of Y when X is zero.
- Slope (B): the amount of increase in Y for every 1-unit increase in X.

The slope is the unstandardized (raw) coefficient, so its scale depends on how the variables are measured.
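`display()` here presumably refers to `arm::display()` from Gelman and Hill's arm package; if arm is not installed, base R's `summary()` shows the same estimates and standard errors. A sketch with simulated data:

```r
# Simulated one-predictor example (hypothetical values)
set.seed(2)
x <- rnorm(100)
y <- 1 + 4 * x + rnorm(100)
model <- lm(y ~ x)

# arm::display(model)        # compact coef.est / coef.se output (requires arm)
summary(model)$coefficients  # base-R equivalent: estimates, SEs, t, p
coef(model)[["(Intercept)"]] # alpha-hat: value of y when x is 0
coef(model)[["x"]]           # unstandardized slope: change in y per 1-unit x
```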
### Standard Error (coef.se)

The SE ("noise") represents the uncertainty in the data; the raw slope coefficient is the "signal." When the raw (unstandardized) slope coefficient falls within 2 standard errors (coef.se) of zero, it is consistent with the null. Here the raw regression coefficient (3.94) is greater than 2*SE (0.74), so it is likely statistically significant.
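The 2-standard-error heuristic can be checked directly from a fitted model. A sketch with simulated data (the 3.94 and 0.74 above come from the course example, not from this code):

```r
# Simulated example (hypothetical values)
set.seed(3)
x <- rnorm(150)
y <- 2 + 3 * x + rnorm(150)
model <- lm(y ~ x)

est <- summary(model)$coefficients["x", "Estimate"]    # the "signal"
se  <- summary(model)$coefficients["x", "Std. Error"]  # the "noise"
abs(est) > 2 * se  # TRUE suggests the slope is statistically significant
```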
### Residual Standard Deviation and R²

The overall fit of the model can be judged by the residual SD (σ̂, "sigma hat") and R². You can think of the residual standard deviation as a measure of the average distance that each observation falls from its prediction from the model. The two are related:

R² = 1 − (σ̂ / sy)², where sy is the SD of Y.
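This relationship can be verified in R with `summary(model)$sigma` and `sd(y)`; note that 1 − (σ̂/sy)² reproduces the adjusted R² exactly (it only approximates the unadjusted R²). A sketch with simulated data:

```r
# Simulated example (hypothetical values)
set.seed(4)
x <- rnorm(100)
y <- 5 + 2 * x + rnorm(100)
model <- lm(y ~ x)

sigma_hat  <- summary(model)$sigma       # residual SD: typical distance from prediction
r2_from_sd <- 1 - (sigma_hat / sd(y))^2  # matches summary(model)$adj.r.squared
summary(model)$r.squared                 # the usual (unadjusted) R-squared
```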
### STANDARDIZED REGRESSION COEFFICIENTS

Y = α + βX1 + e

Standardization allows comparisons of magnitude across variables by putting them on a common scale. Interpretation: for every 1-SD increase in the predictor (X), there is a β-SD increase in the criterion (Y). You'll need the SD of the predictor and the SD of the criterion to interpret it properly. Use the `lm.beta` function from the QuantPsyc package.

Tip: In APA style, standardized regression coefficients are denoted by the Greek "β," and you use "B" for unstandardized regression coefficients.
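`lm.beta()` comes from the QuantPsyc package; the same standardized slope can also be computed by hand as B × SD(X)/SD(Y), or by refitting the model on `scale()`d variables. A sketch with simulated data:

```r
# Simulated example (hypothetical values)
set.seed(5)
x <- rnorm(100, mean = 50, sd = 10)
y <- 20 + 0.8 * x + rnorm(100, sd = 12)
model <- lm(y ~ x)

# QuantPsyc::lm.beta(model)        # package version (requires QuantPsyc)
b    <- coef(model)[["x"]]         # unstandardized slope, B
beta <- b * sd(x) / sd(y)          # standardized slope: SDs of Y per SD of X
beta
coef(lm(scale(y) ~ scale(x)))[[2]] # same value from standardized variables
```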
### Multiple Regression

Y = α + B1X1 + B2X2 + e

- Intercept (α): the value of Y when X1 and X2 are both 0.
- Slope B1: the amount of increase in Y for every 1-unit increase in X1 when X2 is held constant.
- Slope B2: the amount of increase in Y for every 1-unit increase in X2 when X1 is held constant.
### Another Way to Think of Slopes in Multiple Regression (No Interaction)

Y = α + β1X1 + β2X2 + e

- β1 compares participants with the same score on X2 (the other variable) who differ by 1 unit on X1 (the variable you're trying to interpret).
- β2 compares participants with the same score on X1 who differ by 1 unit on X2.
### Multiple Regression with an Interaction

Y = α + β1X1 + β2X2 + β3X1X2 + e

- Intercept (α): the same as in multiple regression with no interaction.
- β1: the amount of increase in Y for every 1-unit increase in X1 when X2 is 0.
- β2: the amount of increase in Y for every 1-unit increase in X2 when X1 is 0.
- β3: the interaction coefficient, which tells you the exact amount (+/−) that β1 or β2 changes based on the value of the other variable.
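In R's formula notation, `*` expands to both main effects plus the interaction term. A sketch with simulated data (hypothetical names and values):

```r
# Simulated interaction example (hypothetical values)
set.seed(6)
x1 <- rnorm(200)
x2 <- rnorm(200)
y  <- 1 + 2 * x1 + 3 * x2 + 1.5 * x1 * x2 + rnorm(200)

model <- lm(y ~ x1 * x2)  # same as y ~ x1 + x2 + x1:x2
coef(model)
# coef(model)[["x1"]] is the slope of x1 when x2 = 0;
# coef(model)[["x1:x2"]] is how that slope changes per 1-unit increase in x2
```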
### Assumptions and Diagnostics (in decreasing order of importance)

Validity: We often say to include all relevant predictors, but think about the impact and interpretability of your findings.

Additivity and linearity: The most serious violation.

1. The expected value of the dependent variable is a straight-line function of each independent variable, holding the others fixed.
2. The slope of that line does not depend on the values of the other variables.
3. The effects of the different independent variables on the expected value of the dependent variable are additive.
### Assumptions and Diagnostics (continued)

- Independence of errors: Errors from the prediction line are independent.
- Equal variance of errors: Constant variance of the errors; consider weighted least squares if violated. This often does not affect the most important parts of the regression model.
- Normality of errors: The least important assumption is that the errors are normally distributed.
### Testing Assumptions in R

- Validity: No R required. Just think before you include variables about expected outcomes and what you might find using unstandardized regression coefficients.
- Additivity and linearity: Diagnose with the residuals-versus-predicted plot (top-left panel): `plot(model)`.
- Independence: Diagnose with the Durbin-Watson test: `durbinWatsonTest(model)` or `dwt(model)`. The statistic varies between 0 and 4, with 2 indicating no correlation; values < 1 or > 3 are a concern (> 2 indicates negative correlation, < 2 positive correlation).
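`durbinWatsonTest()` and `dwt()` live in the car package; the statistic itself can also be computed from the residuals in base R, as in this sketch with simulated data:

```r
# Simulated example (hypothetical values)
set.seed(7)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
model <- lm(y ~ x)

# car::durbinWatsonTest(model)  # package version with a p-value (requires car)
e  <- resid(model)
dw <- sum(diff(e)^2) / sum(e^2)  # Durbin-Watson: 0 to 4, 2 = no autocorrelation
dw
```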
### Testing Assumptions in R (continued)

- Equal variance of errors (homoscedasticity): Diagnose with Levene's test and the residuals-versus-predicted plot. If the test is significant, the variances are different: `leveneTest(data$variable1, data$variable2)`.
- Normality of errors: Diagnose with a QQ plot via `plot(model)`; points should fall closely along the line.
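`leveneTest()` comes from the car package, so it is commented out below; the QQ-plot check needs only base R. A sketch with simulated data, with `shapiro.test()` added as a base-R formal normality check (an addition, not mentioned on the slides):

```r
# Simulated example (hypothetical values)
set.seed(8)
x <- rnorm(100)
y <- 3 + 2 * x + rnorm(100)
model <- lm(y ~ x)

# car::leveneTest(data$variable1, data$variable2)  # requires car; significant = unequal variances
qqnorm(resid(model))  # QQ plot: points should fall closely along the line
qqline(resid(model))
shapiro.test(resid(model))  # formal test of residual normality (base R)
```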
### Extra: Testing Assumptions in R

- Multicollinearity: Diagnose with the VIF or its reciprocal, tolerance: `vif(model)` or `1/vif(model)`. VIF > 10 is a concern; tolerance < .1 (or .2) is a concern.
- Outliers: Diagnose with Cook's distance: `cooks.distance(model)`. Cook's distance > 1 is a concern.
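`vif()` is from the car package; for a two-predictor model the VIF can also be computed by hand as 1/(1 − R²), where R² comes from regressing one predictor on the other. `cooks.distance()` is in base R. A sketch with simulated data:

```r
# Simulated example with correlated predictors (hypothetical values)
set.seed(9)
x1 <- rnorm(100)
x2 <- 0.6 * x1 + rnorm(100)
y  <- 1 + x1 + x2 + rnorm(100)
model <- lm(y ~ x1 + x2)

# car::vif(model)                # package version (requires car)
r2_1  <- summary(lm(x1 ~ x2))$r.squared
vif_1 <- 1 / (1 - r2_1)          # VIF for x1; > 10 is a concern
tol_1 <- 1 / vif_1               # tolerance; < .1 or .2 is a concern

d <- cooks.distance(model)       # influence of each observation
any(d > 1)                       # Cook's distance > 1 flags an outlier
```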
### General Regression Principles

- Include all input variables that, for substantive reasons, might be expected to be important in predicting the outcome.
- Inputs do not always have to be separate: you might average or sum several inputs into a single predictor for the model.
- For inputs that have large effects, consider including interactions as well.
### General Regression Principles (continued)

Consider the following when deciding whether or not to exclude a variable:

1. If a predictor is not statistically significant but has the expected sign, it is generally fine to keep it in the model.
2. If a predictor is not statistically significant and does not have the expected sign, consider removing it from the model.
3. If a predictor is statistically significant but does not have the expected sign, think about whether it makes sense.
4. If a predictor is significant and in the expected direction, keep it in!
### Model 1

Y = α + βX1 + e

- Intercept: 84.87, i.e., 84,870 album sales with 0 airplay.
- Slope: for every one extra radio play, there is an increase of 3.94 (3,940) in album sales.

We multiply by 1,000 because sales are recorded in thousands in this example only.
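Plugging the slide's coefficients into the model equation gives a predicted sales figure at any airplay level (the coefficients are taken from the slide; the airplay value of 10 is an arbitrary illustration):

```r
alpha <- 84.87  # intercept from the slide: sales (in 1000s) at 0 airplay
beta  <- 3.94   # slope from the slide: extra sales (in 1000s) per radio play

airplay <- 10
predicted_sales <- alpha + beta * airplay  # 84.87 + 39.4 = 124.27
predicted_sales * 1000                     # convert to raw album counts: 124,270
```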
### Model 2

Y = α + β1X1 + e

- Intercept: 196,760 album sales for female bands.
- Slope: male bands have 7,270 fewer album sales compared with female bands.

The SE is 11.43, and twice that is 22.86. The magnitude of the coefficient (−7.27) is not greater than 2*SE, so it is not significant.
### Model Building Using Linear Regression

Four models (Model 1 through Model 4) are built with R's formula notation: `~` separates the criterion from the predictors, `+` adds predictors, and `*` adds predictors along with their interaction.
### Model 1

- Intercept: the value of Y when X is zero. Example: the number of album sales when airplay is zero.
- Slope: the amount of increase in Y for every 1-unit increase in X. Example: the expected increase in album sales for every 1-unit increase in airplay.
### LINEAR REGRESSION HOMEWORK PRACTICE

Can we predict child test scores from mother's education (completed high school: yes/no) and maternal IQ?
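A sketch of the homework setup with simulated stand-in data (the variable names echo Gelman and Hill's kidiq example, but every value below is made up):

```r
# Simulated stand-in for the homework data (hypothetical values)
set.seed(562)
n         <- 300
mom_hs    <- rbinom(n, 1, 0.7)  # mother finished high school: 1 = yes, 0 = no
mom_iq    <- rnorm(n, mean = 100, sd = 15)
kid_score <- 20 + 5 * mom_hs + 0.6 * mom_iq + rnorm(n, sd = 18)

m1 <- lm(kid_score ~ mom_hs)           # one predictor
m2 <- lm(kid_score ~ mom_hs + mom_iq)  # two predictors, additive
m3 <- lm(kid_score ~ mom_hs * mom_iq)  # two predictors with an interaction
coef(m2)
```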
### THREE LINEAR MODELS

- Model 1: a single predictor (`~`)
- Model 2: two predictors, additive (`+`)
- Model 3: two predictors with an interaction (`*`)