PSYC 562 Introduction to Regression



Jamie D. Bedics, PhD, ABPP

October 25, 2018

Transcript

Slide 2: COVERED TO DATE
1. Skills for Data Management using R: data wrangling/munging with the tidyverse; importing data from the web, APIs, OSF, and GitHub; summarize, reshape, filter, and select.
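A minimal dplyr sketch of the wrangling verbs named above, using the built-in mtcars data as a stand-in for a dataset imported from the web, an API, OSF, or GitHub:

```r
library(dplyr)

mtcars %>%
  filter(cyl > 4) %>%            # keep rows: cars with more than 4 cylinders
  select(mpg, cyl, wt) %>%       # keep only the columns we need
  group_by(cyl) %>%              # summarize within each cylinder group
  summarize(mean_mpg = mean(mpg),
            mean_wt  = mean(wt))
```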
Slide 3: COVERED TO DATE
2. Skills for Data Visualization with ggplot: scatterplots, line plots, histograms, box plots, and ECDFs.
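A quick ggplot2 sketch of two of those plot types, again with mtcars as a stand-in:

```r
library(ggplot2)

# Scatterplot with a fitted least-squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)

# Histogram of the outcome variable
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)
```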
Slide 4: COVERED TO DATE
3. Skills for Communicating Results
4. Skills for Collaborative Workflow: Git/GitHub, RStudio Projects, R Markdown, knitr.
Slide 5: Curriculum Vitae Building
Research Experience, Advanced Statistics Course: Used R and RStudio for data management and visualization using packages in the tidyverse, including `ggplot2` and `dplyr`. Used RStudio Projects and GitHub for collaborative project building and R Markdown for communication of results. Statistical modeling included an emphasis on linear and logistic regression. Instructor: Jamie D. Bedics, PhD, ABPP.
Slide 7: NEXT: MODEL BUILDING
You'll learn how to interpret linear regression models with and without interaction terms (moderation), and how to interpret logistic regression models with and without interaction terms.
Slide 9: LINEAR REGRESSION
Linear regression allows for the prediction of one dependent variable (the criterion) from one or more independent variables (the predictors):

Y = α + βX1 + e

The criterion (Y) and the predictor (X) come from your dataset; they are variables in your data. Y must be continuous (double, integer); X can be continuous or a factor.
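In R this model is fit with lm(). A minimal sketch, with mtcars standing in for your own data (mpg as the criterion, wt as the predictor):

```r
model1 <- lm(mpg ~ wt, data = mtcars)  # Y = alpha + beta*X1 + e
summary(model1)                        # coefficients, SEs, R-squared
```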
Slide 10: LINEAR REGRESSION
Y = α + βX1 + e

The model coefficients describe the relationship between X and Y in your data. For a one-predictor model:
1. The intercept (α) is the value of Y when X is 0.
2. The slope (β) is the amount of change in Y for every 1-unit increase in X.
3. The error (e) represents what is unexplained by the model.
We hope to use these coefficients from our sample to predict future scores.
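Reading those coefficients back out of the fitted model and turning them into predictions (continuing the mtcars stand-in):

```r
model1 <- lm(mpg ~ wt, data = mtcars)
a <- coef(model1)["(Intercept)"]
b <- coef(model1)["wt"]
a + b * 0   # predicted Y when X = 0: the intercept
a + b * 1   # a 1-unit increase in X moves the prediction by the slope
predict(model1, newdata = data.frame(wt = c(0, 1)))  # same two numbers
```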
Slide 11: LINEAR REGRESSION
Y = α + βX1 + e

What we interpret:
1. The intercept, which only makes sense if the value of Y when X is 0 is meaningful (otherwise we center; more on that later).
2. Slope (unstandardized)
3. Slope (standardized)
4. Significance of slopes
5. Confidence intervals for the intercept and slopes
6. R^2
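Most of the items on that list can be pulled straight from the fitted object; a sketch with the same stand-in model:

```r
model1 <- lm(mpg ~ wt, data = mtcars)
coef(model1)                    # unstandardized intercept and slope
confint(model1, level = 0.95)   # confidence intervals for both
summary(model1)$r.squared       # R^2
```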
Slide 12: MULTIPLE LINEAR REGRESSION
Y = α + β1X1 + β2X2 + e

Adding a variable brings new rules for interpreting the slope and intercept. Each slope is now the change in Y for a 1-unit increase in its X while the other X is held constant, and the intercept is the value of Y when all predictors are 0.
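In R, additive predictors are joined with + in the formula; a sketch with two mtcars predictors:

```r
model2 <- lm(mpg ~ wt + hp, data = mtcars)
coef(model2)   # each slope is interpreted holding the other predictor constant
```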
Slide 14: Interpreting R Output using `display`
Y = α + βX1 + e

Intercept: α = the value of Y when X is zero.
Slope: B = the amount of increase in Y for every 1-unit increase in X.
The slope is the unstandardized (raw) coefficient, so its scale will depend on how the variables are measured.
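The `display` function named here is from the arm package (Gelman & Hill); it prints the coefficient estimates (coef.est) and standard errors (coef.se) along with the residual SD and R-squared:

```r
library(arm)
model1 <- lm(mpg ~ wt, data = mtcars)
display(model1)   # coef.est, coef.se, residual sd, R-Squared
```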
Slide 15: Standard Error (coef.se)
SE represents the uncertainty in the data: SE is the "noise," and the raw slope coefficient is the "signal." When the raw (unstandardized) slope coefficient falls within 2 standard errors (coef.se) of zero, it is consistent with the null. Here the raw regression coefficient (3.94) is greater than 2*SE (0.74), so it is likely statistically significant.
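A rough version of that rule of thumb applied to every coefficient in a fitted model (stand-in data again):

```r
model1 <- lm(mpg ~ wt, data = mtcars)
est <- coef(summary(model1))[, "Estimate"]
se  <- coef(summary(model1))[, "Std. Error"]
abs(est) > 2 * se   # TRUE = the signal clearly exceeds the noise
```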
Slide 16: Residual Standard Deviation and R2
The overall fit of the model can be judged by the residual SD (sigma hat) and R-squared. You can think of the residual standard deviation as a measure of the average distance that each observation falls from its prediction under the model. The residual SD and R2 are related:

R2 = 1 - (sigma hat)^2 / (sy)^2, where sy is the SD of Y.
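Checking that relation numerically; note that because sigma() and sd() each divide by their own degrees of freedom, the identity matches the adjusted R^2 exactly and the plain R^2 only approximately:

```r
model1 <- lm(mpg ~ wt, data = mtcars)
1 - sigma(model1)^2 / sd(mtcars$mpg)^2   # ~= summary(model1)$r.squared
summary(model1)$adj.r.squared            # exact match
```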
Slide 17: STANDARDIZED REGRESSION COEFFICIENTS
Y = α + βX1 + e

β = for every 1-SD increase in the predictor (X), there is a β1-SD increase in the criterion (Y). Standardization allows comparisons of magnitude across different variables by putting each variable's unique scale of measurement on a common scale. Use the `lm.beta` function from QuantPsyc. You'll need the SD of the predictor and the SD of the criterion to interpret properly.
Tip: According to APA Style, standardized regression coefficients are denoted by the Greek "β" and unstandardized regression coefficients by "B".
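A sketch of lm.beta, with a by-hand check: z-scoring both variables and refitting should reproduce the standardized slope:

```r
library(QuantPsyc)
model1 <- lm(mpg ~ wt, data = mtcars)
lm.beta(model1)                                   # standardized slope
coef(lm(scale(mpg) ~ scale(wt), data = mtcars))   # same slope, by hand
```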
Slide 18: Multiple Regression
Y = α + B1X1 + B2X2 + e

Intercept: α = the value of Y when X1 and X2 are both 0.
Slopes: B1 = the amount of increase in Y for every 1-unit increase in X1 when X2 is held constant; B2 = the amount of increase in Y for every 1-unit increase in X2 when X1 is held constant.
Slide 19: Another way to think of slopes in multiple regression with no interaction
Y = α + β1X1 + β2X2 + e

β1 compares participants with the same score on X2 (the other variable) who differ by 1 unit on X1 (the variable you're trying to interpret).
β2 compares participants with the same score on X1 who differ by 1 unit on X2.
Slide 20: Multiple Regression w/ Interaction
Y = α + β1X1 + β2X2 + β3X1X2 + e

Intercept: α = the same as in multiple regression with no interaction.
Slopes: β1 = the amount of increase in Y for every 1-unit increase in X1 when X2 is 0; β2 = the amount of increase in Y for every 1-unit increase in X2 when X1 is 0.
β3 (reported as X1:X2 in R output) is the interaction coefficient: it tells you the exact amount by which β1 or β2 changes (+/-) based on the value of the other variable.
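In R formula syntax, x1 * x2 expands to x1 + x2 + x1:x2, so a single * fits both main effects and the interaction:

```r
model4 <- lm(mpg ~ wt * hp, data = mtcars)
coef(model4)   # "wt:hp" is the interaction coefficient described above
```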
Slide 22: Assumptions and Diagnostics (in decreasing order of importance)
Validity: We often say "include all relevant predictors," but think about the impact and interpretability of your findings.
Additivity and Linearity: The most serious violation. The assumption has three parts:
1. The expected value of the dependent variable is a straight-line function of each independent variable, holding the others fixed.
2. The slope of that line does not depend on the values of the other variables.
3. The effects of different independent variables on the expected value of the dependent variable are additive.
Slide 23: Assumptions and Diagnostics (in decreasing order of importance)
Independence of errors: Errors from the prediction line are independent.
Equal variance of errors: Constant variance of the errors. If violated, consider weighted least squares; this violation often does not affect the most important parts of the regression model.
Normality of errors: The least important assumption is that the errors are normally distributed.
Slide 24: Testing Assumptions in R
Validity: R not required. Just think, before you include variables, about expected outcomes and what you might find using unstandardized regression coefficients.
Additivity and Linearity: Diagnose with the residuals-versus-predicted plot (the first panel of plot(model)).
Violation of Independence: Diagnose with the Durbin-Watson test, durbinWatsonTest(model) or dwt(model). Values vary between 0 and 4, with 2 indicating no correlation; values <1 or >3 are a concern (>2 indicates negative autocorrelation, <2 positive autocorrelation).
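Both checks in a few lines; durbinWatsonTest() and its shorthand dwt() come from the car package:

```r
library(car)
model1 <- lm(mpg ~ wt, data = mtcars)
plot(model1, which = 1)    # residuals vs fitted: look for no curve or funnel
durbinWatsonTest(model1)   # ~2 = no autocorrelation; <1 or >3 = concern
```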
Slide 25: Testing Assumptions in R
Equal variance of errors (homoscedasticity): Diagnose with Levene's test and the residuals-versus-predicted plot. If the test is significant, the variances are different: leveneTest(data$variable1, data$variable2).
Normality of errors: The least important assumption is that the errors are normally distributed. Diagnose with a QQ plot via the plot function: plot(model). Points should fall closely along the line.
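A sketch of both; Levene's test needs a grouping factor, so mtcars' transmission variable (am) stands in for a real grouping variable:

```r
library(car)
model1 <- lm(mpg ~ wt, data = mtcars)
leveneTest(mtcars$mpg, factor(mtcars$am))  # significant = unequal variances
plot(model1, which = 2)                    # QQ plot: points should hug the line
```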
Slide 26: Extra Testing Assumptions in R
Multicollinearity: Diagnose with the VIF or 1/VIF (tolerance): vif(model) or 1/vif(model). VIF > 10 is a concern; tolerance < .1 or .2 is a concern.
Outliers: Diagnose with Cook's distance, cooks.distance(model). Cook's distance > 1 is a concern.
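Both checks on the two-predictor stand-in model; vif() is from the car package:

```r
library(car)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
vif(model2)                         # > 10 is a concern
1 / vif(model2)                     # tolerance; < .1 or .2 is a concern
which(cooks.distance(model2) > 1)   # flag influential observations
```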
Slide 27: General Regression Principles
Include all input variables that, for substantive reasons, might be expected to be important in predicting the outcome. Inputs do not always have to be separate: you might average or sum several inputs into a single predictor for the model. For inputs that have large effects, consider including interactions as well.
Slide 28: General Regression Principles
Consider the following when deciding whether or not to exclude a variable:
1. If a predictor is not statistically significant and has the expected sign, it is generally fine to keep it in the model.
2. If a predictor is not statistically significant and does not have the expected sign, consider removing it from the model.
3. If a predictor is statistically significant but does not have the expected sign, think about whether it makes sense.
4. If it is significant and in the expected direction, keep it in!
Slide 30: Model1
Y = α + βX1

Intercept: 84.87 = album sales (84,870) with 0 airplay.
Slope: for every one extra radio play, there is an increase of 3.94 (3,940) in album sales. We multiply by 1,000 because sales are measured in thousands in this example only.
Slide 31: Model2
Y = α + β1X1

Intercept: 196,760 = album sales for female bands.
Slope: male bands have 7,270 fewer album sales compared with female bands. The SE is 11.43; multiplied by 2, that is 22.86. The coefficient (-7.27) is not greater in magnitude than 2*SE, so it is not significant.
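A factor predictor behaves just like this in R: the intercept is the mean of the reference level and the slope is the difference for the other level. A sketch with factor(am) in mtcars standing in for the band-gender predictor:

```r
model_f <- lm(mpg ~ factor(am), data = mtcars)
coef(model_f)   # (Intercept) = mean mpg at the reference level (am = 0);
                # the slope = the difference in means for am = 1
```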
Slide 32: Model Building Using Linear Regression
Four models (Model1 through Model4) specified with R formula syntax, moving from single predictors (~) through additive terms (+) to interactions (*).
Slide 33: Model1
Intercept: the value of Y when X is zero. Example: the number of album sales when airplay is zero.
Slope: the amount of increase in Y for every 1-unit increase in X. Example: the expected increase in album sales for every 1-unit increase in airplay.
Slide 34: LINEAR REGRESSION HOMEWORK PRACTICE
Can we predict child test scores from mother's education (completed high school: yes/no) and maternal IQ?
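One possible starting point for the homework; the data-frame and variable names (kidiq, kid_score, mom_hs, mom_iq) are assumptions, and the data below are simulated only so the sketch runs end to end:

```r
# Simulated stand-in data: mom_hs is a 0/1 indicator for completing high
# school, mom_iq is maternal IQ; the coefficients here are arbitrary.
set.seed(562)
kidiq <- data.frame(
  mom_hs = rbinom(100, 1, 0.7),
  mom_iq = rnorm(100, 100, 15)
)
kidiq$kid_score <- 20 + 6 * kidiq$mom_hs + 0.6 * kidiq$mom_iq + rnorm(100, 0, 15)

model_hw <- lm(kid_score ~ mom_hs + mom_iq, data = kidiq)  # additive model
summary(model_hw)
```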