Lecture slides for Lecture 16 of the Saint Louis University Course Quantitative Analysis: Applied Inferential Statistics. These slides cover chi-squared.
1. FRONT MATTER
ANNOUNCEMENTS
There will be no problem set. Please focus on the final project!
All final deliverables and a “response to reviewer” Issue are due next Monday. See cover pages of vignettes for final deliverables.
We will be meeting here next week at 4pm. Please remember that talks are “lightning” talks - no more than 6 minutes - plan to be brief!
Lab-15 is due on Monday.
Week-16 grades will be posted soon!
3. CONTINGENCY TABLES IN R
CONTINGENCY TABLES
Parameters: tabyl(.data, xvar, yvar)
▸ … used in a pipe)
▸ xvar is the variable name of the first variable you want to analyze (numeric, factor, or character); row variable
▸ yvar is the variable name of the second variable you want to analyze (numeric, factor, or character); column variable
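As a minimal sketch of the call described above, the following builds a cyl-by-drv contingency table from ggplot2's mpg data. It assumes the ggplot2, dplyr, and janitor packages are installed; loading dplyr for the pipe and the object name cyl_by_drv are choices made for this sketch, not something the slides specify.

```r
library(ggplot2)  # assumed source of the mpg data
library(dplyr)    # loaded here only for the %>% pipe
library(janitor)  # provides tabyl()

# cyl becomes the row variable and drv the column variable
cyl_by_drv <- mpg %>%
  tabyl(cyl, drv)

# Print the table of counts
cyl_by_drv
```

Because the data frame is piped in, the .data argument is not written out in the tabyl() call.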
ADDING TOTALS
Parameters: adorn_totals(where = position)
▸ … (for column totals), or both combined together with the concatenate function, c().
Using the cyl and drv variables from ggplot2’s mpg data:
> mpg %>%
+   tabyl(cyl, drv) %>%
+   adorn_totals(where = "row")
Should be used in a pipeline after the tabyl() function but before any other adornment functions!
ADDING PERCENTAGES
Parameters: adorn_percentages(denominator = pctType)
▸ … (for column percents), or “all” (for all percentages).
Using the cyl and drv variables from ggplot2’s mpg data:
> mpg %>%
+   tabyl(cyl, drv) %>%
+   adorn_totals(where = c("row", "col")) %>%
+   adorn_percentages(denominator = "row")
Should be used in a pipeline after tabyl() and adorn_totals() but before adorn_pct_formatting() and adorn_ns()!
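To make the denominator options concrete, here is a small sketch that computes the same cyl-by-drv table three ways. It assumes the ggplot2, dplyr, and janitor packages are installed; the object names (counts, row_pct, col_pct, all_pct) are hypothetical and exist only for this illustration.

```r
library(ggplot2)  # assumed source of the mpg data
library(dplyr)    # loaded here only for the %>% pipe
library(janitor)  # provides tabyl() and adorn_percentages()

counts <- mpg %>%
  tabyl(cyl, drv)

# Each row sums to 1: the share of each cyl group in each drive type
row_pct <- counts %>% adorn_percentages(denominator = "row")

# Each column sums to 1: the share of each drive type in each cyl group
col_pct <- counts %>% adorn_percentages(denominator = "col")

# The whole table sums to 1: each cell's share of all vehicles
all_pct <- counts %>% adorn_percentages(denominator = "all")

row_pct
col_pct
all_pct
```

The values are stored as proportions between 0 and 1; they only display as percentages once adorn_pct_formatting() is applied.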
Using the cyl and drv variables from ggplot2’s mpg data:
> mpg %>%
+   tabyl(cyl, drv) %>%
+   adorn_totals(where = c("row", "col")) %>%
+   adorn_percentages(denominator = "row") %>%
+   adorn_pct_formatting(digits = 3)
Should be used in a pipeline after adorn_percentages() but before adorn_ns()!
… = position)
Using the cyl and drv variables from ggplot2’s mpg data:
> mpg %>%
+   tabyl(cyl, drv) %>%
+   adorn_totals(where = c("row", "col")) %>%
+   adorn_percentages(denominator = "row") %>%
+   adorn_pct_formatting(digits = 3) %>%
+   adorn_ns(position = "front")
Should be used in a pipeline after adorn_percentages() and adorn_pct_formatting()!
1. … (nominal or ordinal) data for both x and y
2. Independence between x and y
3. Sample size greater than 30
4. Fewer than 20% of cells can have an expected count of less than 5 cases, and no cell should have an expected count less than 1
• These are known as the “Cochran conditions”
• Cochran acknowledged that 5 was an arbitrary value.
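The "expected count" in the fourth assumption comes from the standard chi-square setup, sketched below. The notation is an assumption made for this note rather than taken from the slides: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, n is the total sample size, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```

Cells in rows or columns with small totals get small expected counts, which is why sparse categories tend to be the ones that violate the Cochran conditions. For a table with 4 rows and 3 columns, df = (4 - 1)(3 - 1) = 6.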
4. CHI-SQUARE IN R
CHI-SQUARE TEST
Parameters: chisq.test(xvar, yvar)
▸ … tested; they must both be specified with the data frame and the dollar sign
Available in stats
Included in base R distributions
Using the cyl and drv variables from ggplot2’s mpg data:
> chisq.test(mpg$cyl, mpg$drv)
<<<<< OUTPUT OMITTED >>>>>
Can be used with numeric, factor, or character variables.
data: mpg$cyl and mpg$drv
X-squared = 98.136, df = 6, p-value < 2.2e-16
Warning message:
In chisq.test(mpg$cyl, mpg$drv) : Chi-squared approximation may be incorrect
The chi-square test (χ² = 98.136, df = 6, p < .001) indicates that there is substantial variation in cylinders by drive train type.
Store the output in a model object, and wrap the entire call in parentheses to print the output at the same time.
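A minimal sketch of that pattern follows, assuming ggplot2 is installed for the mpg data. The object name model is an illustrative choice; the components accessed below (statistic, parameter, p.value, expected) are elements of the object that chisq.test() returns.

```r
library(ggplot2)  # assumed source of the mpg data

# The outer parentheses assign the result and print it in one step.
# For these data, chisq.test() also warns that the approximation may be incorrect.
(model <- chisq.test(mpg$cyl, mpg$drv))

model$statistic  # the X-squared value
model$parameter  # the degrees of freedom
model$p.value    # the p-value
model$expected   # matrix of expected counts, one cell per cyl/drv combination
```

Keeping the result in an object is what makes the expected counts available for checking the Cochran conditions.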
4 35.653846 36.692308 8.6538462
5 1.760684 1.811966 0.4273504
6 34.773504 35.786325 8.4401709
8 30.811966 31.709402 7.4786325
Does this model meet the Cochran conditions?
It does not: 3 of the 12 cells (25%) have expected counts less than 5, and 1 cell has an expected count less than 1, violating the rule of thumb laid out by Cochran.
You can simplify the output if you want by setting up a logical test of each value in the expected matrix. Here, TRUE marks an expected count less than 5:
… r
4 FALSE FALSE FALSE
5 TRUE TRUE TRUE
6 FALSE FALSE FALSE
8 FALSE FALSE FALSE
Here, TRUE marks an expected count less than 1:
… r
4 FALSE FALSE FALSE
5 FALSE FALSE TRUE
6 FALSE FALSE FALSE
8 FALSE FALSE FALSE
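The logical tests can be taken one step further to check both Cochran conditions in code. This is a sketch under stated assumptions: ggplot2 is installed for the mpg data, and the names expected, share_below_5, any_below_1, and cochran_ok are hypothetical helpers introduced here, not part of the slides.

```r
library(ggplot2)  # assumed source of the mpg data

# suppressWarnings() hides the approximation warning; we only need the expected counts
model <- suppressWarnings(chisq.test(mpg$cyl, mpg$drv))
expected <- model$expected

expected < 5  # TRUE where the expected count is below 5
expected < 1  # TRUE where the expected count is below 1

# The mean of a logical matrix is the proportion of TRUE cells
share_below_5 <- mean(expected < 5)
any_below_1 <- any(expected < 1)

# Both conditions must hold for the chi-square approximation to be trusted
cochran_ok <- share_below_5 < 0.20 && !any_below_1

share_below_5
any_below_1
cochran_ok
```

Reducing the matrix to one proportion and one logical value avoids counting TRUE cells by eye, which helps with larger tables.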
FISHER’S EXACT TEST
Parameters: fisher.test(xvar, yvar, simulate.p.value = TRUE)
▸ … tested; they must both be specified with the data frame and the dollar sign
▸ simulate.p.value uses a Monte Carlo simulation process to estimate the p-value; the alternative (if FALSE) is far more computationally demanding (in terms of time and computer processing power)
Using the cyl and drv variables from ggplot2’s mpg data:
> fisher.test(mpg$cyl, mpg$drv, simulate.p.value = TRUE)
<<<<< OUTPUT OMITTED >>>>>
Use this test to find the p-value if the Cochran conditions are not met.
Fisher's Exact Test for Count Data with simulated p-value (based on 2000 replicates)
data: mpg$cyl and mpg$drv
p-value = 0.0004998
alternative hypothesis: two.sided
How would you interpret this result?
The Fisher’s Exact test (p = .0005) indicates that there is substantial variation in cylinders by drive train type.
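One detail worth knowing when reading a simulated p-value: with B replicates, the smallest value fisher.test() can report is 1 / (B + 1). With the default of 2000 replicates that floor is 1 / 2001, or about 0.0004998, which is the value shown in the output above. The sketch below raises the number of replicates through the B argument; it assumes ggplot2 is installed, and the seed value and the object name fit are arbitrary choices for this illustration.

```r
library(ggplot2)  # assumed source of the mpg data

set.seed(20)  # arbitrary seed so the simulation can be repeated

# B sets the number of Monte Carlo replicates (the default is 2000)
fit <- fisher.test(mpg$cyl, mpg$drv, simulate.p.value = TRUE, B = 10000)

fit$p.value      # the simulated p-value
1 / (10000 + 1)  # the smallest p-value that 10000 replicates can produce
```

If the reported p-value sits at the floor, it is best read as "p is at most this small" rather than as a precise estimate.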