Slide 1

Slide 1 text

LINEAR REGRESSION
Jeff Goldsmith, PhD
Department of Biostatistics

Slide 2

Slide 2 text

Modeling
• Linear regression is one approach to modeling
R for Data Science

Slide 3

Slide 3 text

Regression is my favorite
• Like … seriously. I use regression for everything
• Regression covers simple stuff (t-tests) to complex stuff (automated variable selection via penalization)
  – Yes, I use regression for t-tests
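The t-test claim can be checked directly. The sketch below uses simulated data (the data frame `df` and its columns `group` and `y` are made up for illustration) and compares an equal-variance two-sample t-test with a regression of the outcome on a group indicator.

```r
# Simulated two-group data (hypothetical, for illustration only)
set.seed(1)
df <- data.frame(
  group = factor(rep(c("a", "b"), each = 20)),
  y     = c(rnorm(20, mean = 0), rnorm(20, mean = 1))
)

# Classical two-sample t-test, assuming equal variances in both groups
tt <- t.test(y ~ group, data = df, var.equal = TRUE)

# The same comparison as a regression on a group indicator
fit   <- lm(y ~ group, data = df)
slope <- summary(fit)$coefficients["groupb", ]

# The t statistics agree up to sign, and the p-values match
c(t_test = unname(tt$statistic), regression = unname(slope["t value"]))
c(t_test = tt$p.value, regression = unname(slope["Pr(>|t|)"]))
```

The sign differs only because `t.test` reports group a minus group b, while the regression coefficient is group b minus the reference group a.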

Slide 4

Slide 4 text

• Linear models
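The slide text gives no formula here. For reference, a linear model with p predictors is commonly written as follows; the notation (subject index i, coefficients β, error ε) is an assumption, not taken from the slides.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad \mathrm{E}[\varepsilon_i] = 0, \quad \mathrm{Var}(\varepsilon_i) = \sigma^2
```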

Slide 5

Slide 5 text

Predictors
• Outcome is continuous; predictors can be anything
• Continuous predictors are added directly
• Categorical predictors require dummy indicator variables
  – For each non-reference group, a binary (0 / 1) variable indicating group membership for each subject is created and used in the model
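To see the dummy coding described above, one option is to inspect the design matrix R builds from a formula. This is a minimal sketch on made-up data; the data frame `df` and the variables `age` and `group` are hypothetical.

```r
# Made-up data: one continuous and one three-level categorical predictor
df <- data.frame(
  age   = c(23, 35, 41, 52, 60, 29),
  group = factor(c("a", "b", "c", "a", "b", "c"))
)

# Design matrix for the formula ~ age + group.
# "a" is the reference level, so it gets no column of its own;
# groupb and groupc are 0 / 1 indicators of group membership.
X <- model.matrix(~ age + group, data = df)
X
```

The continuous predictor `age` enters as-is, while the three-level factor contributes two indicator columns, one per non-reference group.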

Slide 6

Slide 6 text

• Testing
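The slide text does not say which tests are meant. Two common choices in R are the per-coefficient t-tests reported by `summary()` and an F-test comparing nested models with `anova()`. The sketch below assumes those two and uses simulated data; the names `df`, `x1`, `x2`, `fit_small`, and `fit_large` are made up.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(2)
n  <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 1 + 2 * df$x1 + 0.5 * df$x2 + rnorm(n)

# Nested models: the larger one adds x2
fit_small <- lm(y ~ x1, data = df)
fit_large <- lm(y ~ x1 + x2, data = df)

# t-test for each coefficient in the larger model
summary(fit_large)$coefficients

# F-test comparing the nested models
cmp <- anova(fit_small, fit_large)
cmp
```

When the models differ by a single coefficient, as here, the F-test p-value equals the t-test p-value for that coefficient.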

Slide 7

Slide 7 text

Diagnostics
• Many model assumptions (constant variance, model specification, etc.) can be examined using residuals
  – Look at overall distribution (centered at 0? Skewed? Outliers?)
  – Look at residuals vs predictors (any non-linearity? Trends? Non-constant residual variance?)
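The two residual checks above might look like this in base R. The data are simulated, and the names `df`, `x`, `y`, and `fit` are made up for illustration.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(3)
df <- data.frame(x = runif(200, 0, 10))
df$y <- 1 + 0.5 * df$x + rnorm(200)

fit <- lm(y ~ x, data = df)
df$resid <- resid(fit)

# Overall distribution: centered at 0? Skewed? Outliers?
summary(df$resid)
hist(df$resid, main = "Residuals", xlab = "Residual")

# Residuals vs predictor: curvature, trends, or a fanning spread
# would suggest a problem with the model
plot(df$x, df$resid, xlab = "x", ylab = "Residual")
abline(h = 0, lty = 2)
```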

Slide 8

Slide 8 text

• Generalized linear models
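The slide text gives no formula here. As a reference point, a generalized linear model relates the mean of the outcome to the predictors through a link function g; logistic regression, with the logit link, is one common case. The notation below is an assumption, not taken from the slides.

```latex
g\left(\mathrm{E}[y_i \mid x_i]\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}

\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
\qquad \pi_i = \Pr(y_i = 1 \mid x_i)
```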

Slide 9

Slide 9 text

Linear models in R
• lm for linear models
• glm for generalized linear models
• Arguments include
  – Formula: y ~ x1 + x2
  – Data
• Output is complex, and also kind of a mess
  – Use the broom package!
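A minimal sketch of the workflow above, on simulated data. The data frame `df` and the variables `x1`, `x2`, `y`, and `event` are made up, and the binomial family for `glm` is just one possible choice. The broom calls are guarded so the block still runs when the package is not installed.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(4)
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y     <- 1 + 2 * df$x1 - df$x2 + rnorm(n)
df$event <- rbinom(n, size = 1, prob = plogis(-0.5 + df$x1))

# Linear model: formula plus data
fit_lm <- lm(y ~ x1 + x2, data = df)

# Generalized linear model: same interface, plus a family
# (binomial gives logistic regression)
fit_glm <- glm(event ~ x1 + x2, data = df, family = binomial())

# broom turns model output into data frames
if (requireNamespace("broom", quietly = TRUE)) {
  print(broom::tidy(fit_lm))    # one row per coefficient
  print(broom::glance(fit_lm))  # one row of model-level summaries
  print(broom::tidy(fit_glm))
}
```

Because `tidy()` and `glance()` return data frames, the results can be filtered, joined, or plotted like any other data.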