Slides for Lecture 12 of the Saint Louis University course Quantitative Analysis: Applied Inferential Statistics. These slides cover producing dissemination-ready plots with ggplot2 and the basics of linear regression.
a change from the syllabus! Lab 11 is due next Monday - there will be no problem set. Please focus on the final project!
1. FRONT MATTER ANNOUNCEMENTS
Aligning our syllabus with GIS and the plan for next year - the final two problem sets will be waived and given full credit.
mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(fill = as.factor(cyl)), pch = 21, size = 4, position = "jitter") +
  geom_smooth(method = "lm", size = 2) +
  theme_hc(base_size = 28) +
  labs(
    title = "Fuel Efficiency and Engine Size",
    subtitle = "Select Vehicles Sold in the United States",
    caption = "Data via ggplot2 package for R",
    x = "Engine Displacement (litres)",
    y = "Highway Fuel Efficiency (mpg)"
  ) +
  theme(legend.key.size = unit(1, units = "cm")) +
  scale_fill_discrete(labels = c("Four", "Five", "Six", "Eight"), name = "Cylinders")
Change fill to color if that is the aesthetic mapping used!
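The note about switching fill to color can be sketched as follows. This is a minimal illustration, not the slide's own code: the opening `ggplot(data = mpg, ...)` call is an assumption (the start of the original call is cut off), the object name `p` is hypothetical, `theme_hc()` is left out so that only ggplot2 is needed, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size` for line width).

```r
library(ggplot2)

# Assumed opening call: the slide's own first line is not shown in the source.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # Points are mapped to color instead of fill, so the default point shape
  # is used rather than pch = 21 (which is the shape that takes a fill).
  geom_point(mapping = aes(color = as.factor(cyl)), size = 4, position = "jitter") +
  geom_smooth(method = "lm", linewidth = 2) +
  labs(
    title = "Fuel Efficiency and Engine Size",
    x = "Engine Displacement (litres)",
    y = "Highway Fuel Efficiency (mpg)"
  ) +
  # The scale must match the aesthetic: scale_color_* instead of scale_fill_*.
  scale_color_discrete(labels = c("Four", "Five", "Six", "Eight"), name = "Cylinders")

print(p)  # draws the plot
```

The key point is that the scale function and the aesthetic have to agree; a `scale_fill_discrete()` call has no effect on a legend generated by a `color` mapping.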
$y = \alpha + \beta_i x_i + \epsilon$
The subscript on $\beta_i$ is used because the slope of y is dependent on multiple factors.
$\epsilon$ is included because we are estimating the line; there may be unexplained variation in y.
$y = a + bx$
$\alpha$ = constant
$x_i$ = independent variable $i$
$\beta_i$ = beta value of IV $i$
DV = height
IV = gender (where FALSE = male & TRUE = female)
$y = \alpha + \beta_i x_i + \epsilon$
$y_{\text{height}} = \alpha + \beta_1 x_{\text{female}} + \epsilon$
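A binary independent variable of this kind could be fit in R roughly as below. This is a sketch under stated assumptions: the data frame `heights`, its columns, and every value in it are simulated and hypothetical, invented only so the example runs; they are not course data.

```r
# Hypothetical, simulated data: these heights are made up for illustration only.
set.seed(12)
heights <- data.frame(female = rep(c(FALSE, TRUE), each = 50))
heights$height <- 175 - 12 * heights$female + rnorm(100, sd = 7)

# A logical IV is treated as a 0/1 dummy variable (FALSE = 0, TRUE = 1).
fit <- lm(height ~ female, data = heights)
coef(fit)

# "(Intercept)" is the mean height of the FALSE (male) group;
# "femaleTRUE" is the difference in mean height for the TRUE (female) group.
mean(heights$height[!heights$female])
```

With a single binary IV, the constant is the mean of the reference group and the beta value is the difference between the two group means.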
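$\alpha$ = constant
$x_i$ = independent variable $i$
$\beta_i$ = beta value of IV $i$
DV = test score
IV = grade (where 0 = pre-K, 1 = elementary, & 2 = middle)
$y = \alpha + \beta_i x_i + \epsilon$
$y_{\text{score}} = \alpha + \beta_1 x_{\text{elementary}} + \beta_2 x_{\text{middle}} + \epsilon$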
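The two dummy terms in the equation above can be seen directly in R. The sketch below assumes a simulated, hypothetical data frame `scores` (all values are made up), and it stores grade as a labelled factor rather than the 0/1/2 codes on the slide so that the coefficient names are readable.

```r
# Hypothetical, simulated data: the scores below are made up for illustration.
set.seed(12)
scores <- data.frame(
  grade = factor(
    rep(c("pre-K", "elementary", "middle"), each = 30),
    levels = c("pre-K", "elementary", "middle")  # first level is the reference
  )
)
scores$score <- 50 +
  10 * (scores$grade == "elementary") +
  20 * (scores$grade == "middle") +
  rnorm(nrow(scores), sd = 5)

# R expands the three-level factor into two 0/1 dummy columns;
# the reference level (pre-K) is absorbed into the intercept.
head(model.matrix(~ grade, data = scores))

fit <- lm(score ~ grade, data = scores)
coef(fit)
```

A categorical IV with three levels therefore contributes two beta values, each measured relative to the reference category.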
4. BIVARIATE REGRESSION THEORY
EXPLAINED VARIATION
Estimate of the proportion of the variance of y that the independent variable x “explains”.
Let:
model sum of squares
▸ SST = total sum of squares
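The proportion of explained variation can be computed by hand as a check. The sketch below assumes the statistic is the ratio of the model sum of squares to the total sum of squares (the slide's own formula and symbol for the model term are cut off), and the names `ssm`, `sst`, and `r2` are hypothetical. It uses the mpg data from ggplot2.

```r
mpg <- ggplot2::mpg
fit <- lm(hwy ~ displ, data = mpg)

# Total sum of squares: variation of y around its mean.
sst <- sum((mpg$hwy - mean(mpg$hwy))^2)

# Model sum of squares: variation of the fitted values around the mean of y.
ssm <- sum((fitted(fit) - mean(mpg$hwy))^2)

# Assumed definition: proportion of variance explained = SSM / SST.
r2 <- ssm / sst
r2

# Compare with the value R reports for the same model.
all.equal(r2, summary(fit)$r.squared)
```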
must be continuous*
2. x can be binary, ordinal*, or continuous
3. x must have a variance > 0
4. Relationship between x and y is linear
5. y should be normally distributed
6. There should be no significant outliers in x and y
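A few of these assumptions can be screened quickly in R. The checks below are one possible approach and are an assumption on my part, not something the slides prescribe: a variance check for item 3, a Shapiro-Wilk test as one way to look at item 5, and the 1.5 x IQR boxplot rule as one way to flag candidates for item 6. The object names are hypothetical.

```r
mpg <- ggplot2::mpg

# Item 3: x must have a variance greater than zero.
var(mpg$displ) > 0

# Item 5: one possible normality check for y (Shapiro-Wilk test).
shapiro.test(mpg$hwy)

# Item 6: values flagged by the 1.5 x IQR boxplot rule in y and in x.
outliers_y <- boxplot.stats(mpg$hwy)$out
outliers_x <- boxplot.stats(mpg$displ)$out
outliers_y
outliers_x
```

A scatterplot of y against x remains the simplest way to judge item 4, the linearity of the relationship.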
5. BIVARIATE REGRESSION IN R
BASIC OLS MODEL
Parameters:
lm(y ~ x, data = dataFrame)
where:
• y is the dependent variable
• x is the independent variable
• dataFrame is the data source (can be a tibble)
Available in stats
Included in base distributions of R
x, data = dataFrame)
Using the hwy and displ variables from ggplot2’s mpg data:
> lm(hwy ~ displ, data = mpg)
Save model output into an object for reference later. Output is stored as a list, and contains far more data than what is printed.
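Saving the model and looking inside the stored list might look like the sketch below. The object name `model` is hypothetical; the functions used are from base R's stats package.

```r
mpg <- ggplot2::mpg

# Store the fitted model instead of only printing it.
model <- lm(hwy ~ displ, data = mpg)

# The result is a list (with class "lm") holding more than the printed output.
is.list(model)
names(model)

# Common ways to pull pieces back out later.
coef(model)             # intercept and slope
head(residuals(model))  # first few residuals
summary(model)          # coefficient table, standard errors, p-values
```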
                                value Pr(>|t|)
(Intercept)  35.6977     0.7204   49.55   <2e-16 ***
displ        -3.5306     0.1945  -18.15   <2e-16 ***
5. BIVARIATE REGRESSION IN R
A one-liter increase in engine displacement is associated with a 3.531 mpg decrease in highway fuel efficiency (β = -3.531, p < .001). The larger the engine, the lower the estimated fuel efficiency of the vehicle.
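The interpretation of the slope can be checked by comparing predictions one liter apart. This is a minimal sketch; the names `model`, `b`, and `pred`, and the choice of 2 and 3 liters as example engine sizes, are assumptions made for illustration.

```r
mpg <- ggplot2::mpg
model <- lm(hwy ~ displ, data = mpg)

b <- coef(model)
b  # intercept and slope for displ

# Predicted highway mpg for a 2-liter and a 3-liter engine.
pred <- predict(model, newdata = data.frame(displ = c(2, 3)))
pred

# The difference between the two predictions equals the slope for displ.
unname(diff(pred))
unname(b["displ"])
```

Because the model is a straight line, the same difference holds for any two engine sizes that are one liter apart.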