SOC 4015 & SOC 5050 - Lecture 05

Slides for Lecture 05 of the Saint Louis University course Quantitative Analysis: Applied Inferential Statistics. They cover the binomial, Poisson, and normal distributions and introduce statistical significance.

Christopher Prener

September 24, 2018

Transcript

  1. AGENDA: QUANTITATIVE ANALYSIS / WEEK 05 / LECTURE 05

    1. Front Matter 2. Binomial Distribution 3. Poisson Distribution 4. Normal Distribution 5. Statistical Significance 6. Normality Testing 7. Back Matter
  2. ⋆ THEME

    We want to think systematically about the likelihood of observing particular outcomes relative to a known set of possible outcomes.
  3. 1. FRONT MATTER / ANNOUNCEMENTS

    Lab 04, PS-02, and Lecture Prep 06 are due before the next lecture. There was an update on the final project due today as an issue in your final project GitHub repo (DoeProject if your last name is “Doe”).
  6. 2. BINOMIAL DISTRIBUTION / DEFINITION

    ▸ A sequence of independent trials (n) with constant probability of success at each trial (p), where we are interested in the number of successes (x)
    ▸ Acronym:
      • B = binary outcome
      • I = independence
      • N = fixed sample size
      • S = same probability
  7. 2. BINOMIAL DISTRIBUTION / EMPIRICAL RULE

    ▸ A sequence of independent trials (n) with constant probability of success at each trial (p), where we are interested in the number of successes (x)
    ▸ [equation not captured in transcript]
    ▸ [equation not captured in transcript]
  8. 2. BINOMIAL DISTRIBUTION / R FUNCTIONS

    ▸ stats::dbinom(k, size=n, prob=p) returns the probability of observing exactly k successes, P(X = k)
    ▸ stats::pbinom(k, size=n, prob=p, lower.tail = TRUE) returns the probability of observing k or fewer successes, P(X ≤ k)
    ▸ stats::pbinom(k, size=n, prob=p, lower.tail = FALSE) returns the probability of observing more than k successes, P(X > k)
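A minimal sketch of how the three calls relate to one another. The values of n, p, and k are illustrative assumptions, not taken from the slides.

```r
# Illustrative values (assumed for this sketch): 20 trials, success probability 0.3
n <- 20
p <- 0.3
k <- 5

p_exact   <- dbinom(k, size = n, prob = p)                      # P(X = k)
p_at_most <- pbinom(k, size = n, prob = p, lower.tail = TRUE)   # P(X <= k)
p_more    <- pbinom(k, size = n, prob = p, lower.tail = FALSE)  # P(X > k)

# The cumulative probability is the sum of the point probabilities for 0..k
all.equal(p_at_most, sum(dbinom(0:k, size = n, prob = p)))

# The lower and upper tails together cover every possible outcome
all.equal(p_at_most + p_more, 1)

# "k or more" is not the same as "more than k": shift k down by one
p_at_least <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)  # P(X >= k)
all.equal(p_at_least, p_more + p_exact)
```

The last comparison is the one most easily missed: with lower.tail = FALSE the boundary value k itself is excluded.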
  17. 2. BINOMIAL DISTRIBUTION / BINOMIAL WORKFLOW

    ▸ What is the probability of 10 or fewer successes occurring in a sequence of 100 independent trials with a binary outcome where the probability of success is .25 for each trial?
    ▸ Is the binomial distribution appropriate?
    ▸ What is the appropriate R function?
    ▸ What is n? What is k? What is p? (n = 100, k = 10, p = .25)
    > pbinom(10, size=100, prob=.25, lower.tail = TRUE)
    [1] 0.0001371006
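The workflow answer can be cross-checked against the binomial formula directly. This is a sketch: the variable names are illustrative, and the by-hand sum uses the standard binomial probability mass function rather than anything shown on the slide.

```r
# Values from the workflow question
n <- 100   # number of trials
k <- 10    # "10 or fewer" successes
p <- 0.25  # probability of success on each trial

# Answer as computed on the slide
answer <- pbinom(k, size = n, prob = p, lower.tail = TRUE)

# Same quantity from the standard formula: sum over x = 0..k of C(n, x) p^x (1 - p)^(n - x)
x <- 0:k
by_hand <- sum(choose(n, x) * p^x * (1 - p)^(n - x))

all.equal(answer, by_hand)
```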
  18. 2. BINOMIAL DISTRIBUTION / BERNOULLI TRIAL

    ▸ A trial where there are only two outcomes - “success” and “failure”
    ▸ For a fair trial such as a coin flip, over the long run (law of large numbers) about 50% of trials are “successes” and 50% are “failures”; in general, the long-run share of successes approaches p
    ▸ Jacob Bernoulli (1654-1705) was a Swiss mathematician and professor at the University of Basel
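A small simulation can make the long-run behaviour concrete. This is a sketch under assumed settings: the seed, the number of trials, and p = 0.5 are illustrative choices.

```r
set.seed(5050)  # arbitrary seed so the sketch is reproducible

p <- 0.5
# A Bernoulli trial is a binomial draw with size = 1: each value is 0 (failure) or 1 (success)
trials <- rbinom(100000, size = 1, prob = p)

# Running share of successes after each trial
running_share <- cumsum(trials) / seq_along(trials)

# The share wanders early on and settles near p as trials accumulate
running_share[c(10, 100, 1000, 100000)]
```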
  19. 2. BINOMIAL DISTRIBUTION / PASCAL’S TRIANGLE

    ▸ Blaise Pascal (1623-1662) was a French mathematician
    ▸ Described a triangular array of binomial coefficients
    ▸ Known for centuries in China, Persia, and even in Europe
  20. 2. BINOMIAL DISTRIBUTION / PASCAL’S TRIANGLE / YANG HUI’S TRIANGLE (1303)

    ▸ Known for centuries in China, Persia, and even in Europe
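The link between the triangle and the binomial distribution can be sketched with base R's choose(). The choice of rows 0 through 5 and of row 4 for the comparison is illustrative.

```r
# Row n of the triangle holds the binomial coefficients choose(n, 0), ..., choose(n, n)
for (n in 0:5) {
  cat(choose(n, 0:n), "\n")
}

row4 <- choose(4, 0:4)

# With p = 0.5, each binomial probability is a triangle entry divided by 2^n
probs4 <- dbinom(0:4, size = 4, prob = 0.5)
all.equal(probs4, row4 / 2^4)
```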
  21. 2. BINOMIAL DISTRIBUTION / “BEAN MACHINE”

    ▸ Sir Francis Galton (1822-1911), English statistician
    ▸ Cousin of Charles Darwin
    ▸ A eugenicist who coined the phrase “nature versus nurture”
    ▸ Demonstrated a number of important statistical ideas, including correlation
    ▸ Invented what is known as the quincunx or “Galton Box”
  22. 3. POISSON DISTRIBUTION / SIMÉON POISSON (1781-1840)

    ▸ French mathematician and physicist
    ▸ Laplace’s student
    ▸ Built upon the binomial distribution to describe events that occur with exceptional rarity
  23. 3. POISSON DISTRIBUTION / DEFINITION

    ▸ Used for events where n is large and p is very small, …
    ▸ …so small that their product approaches a constant we call lambda (λ).
  24. 3. POISSON DISTRIBUTION / DEFINITION

    Let:
    ▸ n = a count of independent events
    ▸ p = probability
    ▸ The event can occur a (theoretically) infinite number of times.
    ▸ Its only parameter is λ = np, the Greek letter lambda.
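A sketch of the "large n, small p" idea. The values λ = 4 and k = 6 are illustrative assumptions, and the by-hand line uses the standard Poisson probability mass function, λ^k e^(−λ) / k!, which does not appear in the transcript.

```r
lambda <- 4  # assumed value of np for this sketch
k <- 6       # assumed number of events

# Standard Poisson probability mass function, written out by hand
pmf_by_hand <- lambda^k * exp(-lambda) / factorial(k)
all.equal(pmf_by_hand, dpois(k, lambda = lambda))

# Hold np = lambda fixed while n grows and p shrinks
n <- c(10, 100, 1000, 100000)
binom_probs <- dbinom(k, size = n, prob = lambda / n)

# Gap between the binomial and Poisson probabilities for each n
gap <- abs(binom_probs - dpois(k, lambda = lambda))
gap
```

The gap narrows as n increases, which is the sense in which the Poisson describes many trials of a very rare event.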
  26. 3. POISSON DISTRIBUTION / R FUNCTIONS

    ▸ stats::dpois(k, lambda=m) returns the probability of observing exactly k successes, P(X = k)
    ▸ stats::ppois(k, lambda=m, lower.tail = TRUE) returns the probability of observing k or fewer successes, P(X ≤ k)
    ▸ stats::ppois(k, lambda=m, lower.tail = FALSE) returns the probability of observing more than k successes, P(X > k)
  35. 3. POISSON DISTRIBUTION / POISSON WORKFLOW

    ▸ The probability of a car accident at a given intersection is 0.00004. In a typical week, 100,000 cars pass through the intersection. What is the probability of observing 6 car accidents in a single week at this intersection?
    • Is the Poisson distribution appropriate?
    • What is the appropriate R function?
    • What is λ? What is k? (λ = np = 100,000 × 0.00004 = 4; k = 6)
    > dpois(6, lambda=4)
    [1] 0.1041956
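The same workflow as a sketch that derives λ from the problem statement and compares the Poisson answer with the exact binomial. The variable names are illustrative, and the "6 or more" line is an added variation rather than part of the slide.

```r
p_accident <- 0.00004  # probability of an accident per car
n_cars <- 100000       # cars per week

lambda <- n_cars * p_accident  # expected accidents per week

# Probability of exactly 6 accidents under the Poisson model
poisson_answer <- dpois(6, lambda = lambda)

# Exact binomial probability for comparison; with n this large the two nearly agree
binomial_answer <- dbinom(6, size = n_cars, prob = p_accident)

# Variation: "6 or more" accidents means P(X > 5)
six_or_more <- ppois(5, lambda = lambda, lower.tail = FALSE)

c(poisson = poisson_answer, binomial = binomial_answer, six_or_more = six_or_more)
```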
  36. 4. NORMAL DISTRIBUTION / HISTORY

    ▸ First suggested by French mathematician Abraham de Moivre (1667-1754) as an outcome of the binomial distribution
    ▸ Carl Friedrich Gauss (1777-1855), a German mathematician who had an immensely influential career, demonstrated its importance in 1809
    ▸ Pierre Simon Laplace (1749-1827) also made significant contributions to its usefulness beginning in 1810
  39. 4. NORMAL DISTRIBUTION / DEFINITION

    ▸ As opposed to the binomial and Poisson distributions, the normal is a continuous probability distribution
    ▸ Can take on an infinite range of values (-∞ < x < ∞)
    ▸ Symmetric around the mean (μ), which has the same value as the median and mode
    ▸ Spread of distribution determined by the standard deviation (σ)
    ▸ Standard normal has μ = 0 and σ = 1
  40. 4. NORMAL DISTRIBUTION / Z-SCORES

    ▸ The value of an observation expressed in standard deviation units.
    ▸ z = (x − μ) / σ for a population, or z = (x − x̄) / s for a sample
    ▸ stats::pnorm(z, mean=0, sd=1, lower.tail=TRUE) returns the cumulative probability under the standard normal distribution, P(X ≤ z)
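A sketch of the standardization step. The helper name z_score and the example values are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: express an observation in standard deviation units
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Illustrative values (assumed for this sketch)
x <- 110
mu <- 100
sigma <- 15

z <- z_score(x, mu, sigma)

# Looking up z on the standard normal gives the same cumulative probability
# as looking up x on a normal with the original mean and standard deviation
via_z <- pnorm(z, mean = 0, sd = 1, lower.tail = TRUE)
direct <- pnorm(x, mean = mu, sd = sigma, lower.tail = TRUE)
all.equal(via_z, direct)
```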
  44. 4. NORMAL DISTRIBUTION / Z-SCORE WORKFLOW

    ▸ A series of surveys conducted over the past month found that support for a new trade policy averaged 41.5% with a standard deviation of 2.25. What is the probability that a randomly selected poll shows support for the trade policy at or below 46%?
    • Is the normal distribution appropriate?
    • What is z? z = (46 − 41.5) / 2.25 = 2
    > pnorm(2, mean=0, sd=1, lower.tail=TRUE)
    [1] 0.9772499
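The workflow can be sketched end to end, including the complementary upper tail. Reading the slide's result as "46% or lower" is an interpretation based on lower.tail=TRUE; because the normal is continuous, the probability of any single exact value is zero.

```r
support_mean <- 41.5  # mean support across polls (percent)
support_sd <- 2.25    # standard deviation across polls
poll <- 46            # poll value of interest

z <- (poll - support_mean) / support_sd

# Probability that a randomly selected poll is at or below 46%
at_or_below <- pnorm(z, mean = 0, sd = 1, lower.tail = TRUE)

# Probability that a randomly selected poll is above 46%
above <- pnorm(z, mean = 0, sd = 1, lower.tail = FALSE)

c(z = z, at_or_below = at_or_below, above = above)
```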
  47. 5. STATISTICAL SIGNIFICANCE / KARL PEARSON (1857-1936)

    ▸ English mathematician
    ▸ Student of Sir Francis Galton (and, like Galton, he was a eugenicist and social Darwinist)
    ▸ Formalized the concept of the “p-value” around 1914
    ▸ Based on Laplace’s earlier use of the idea
    ▸ Also introduced “moments”, histograms, and a number of other concepts we’ll get to this semester!
  48. 5. STATISTICAL SIGNIFICANCE / R.A. FISHER (1890-1962)

    ▸ English mathematician and biologist
    ▸ Like Galton and Pearson, he was a eugenicist and social Darwinist
    ▸ Introduced the concept of the null hypothesis…
    ▸ … and popularized the idea of statistical significance, including the selection of p = .05 as a threshold.
    ▸ Responsible for movement away from Bayesian analyses because of his preference for objectivity.
  49. “INFORMALLY, A P-VALUE IS THE PROBABILITY UNDER A SPECIFIED STATISTICAL MODEL THAT A STATISTICAL SUMMARY OF THE DATA WOULD BE EQUAL TO OR MORE EXTREME THAN ITS OBSERVED VALUE.”

    American Statistical Association, “Statement on p-values” (2016)
  50. “THE PROBABILITY OF GETTING RESULTS AT LEAST AS EXTREME AS THE ONES YOU OBSERVED, GIVEN THAT THE NULL HYPOTHESIS IS CORRECT”

    Christie Aschwanden, “Not Even Scientists Can Easily Explain P-values” (2015)
  51. “IMAGINE, HE SAID, THAT YOU HAVE A COIN THAT YOU SUSPECT IS WEIGHTED TOWARD HEADS. (YOUR NULL HYPOTHESIS IS THEN THAT THE COIN IS FAIR.) YOU FLIP IT 100 TIMES AND GET MORE HEADS THAN TAILS. THE P-VALUE WON’T TELL YOU WHETHER THE COIN IS FAIR, BUT IT WILL TELL YOU THE PROBABILITY THAT YOU’D GET AT LEAST AS MANY HEADS AS YOU DID IF THE COIN WAS FAIR. THAT’S IT - NOTHING MORE.”

    Christie Aschwanden, “Not Even Scientists Can Easily Explain P-values” (2015)
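The coin example can be turned into a concrete calculation. The count of 60 heads is a hypothetical value chosen for this sketch; the quote only says "more heads than tails".

```r
flips <- 100
heads <- 60  # hypothetical observed count, not given in the quote

# P(at least `heads` heads | fair coin) = P(X > heads - 1) with p = 0.5
p_value <- pbinom(heads - 1, size = flips, prob = 0.5, lower.tail = FALSE)

# The one-sided exact binomial test in base R reports the same quantity
test_p <- binom.test(heads, flips, p = 0.5, alternative = "greater")$p.value

all.equal(p_value, test_p)
p_value
```

The result is the probability of a count at least this extreme if the coin were fair. It is not the probability that the coin is fair.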
  52-55. [Figures: areas labelled 95% and 5%]

  57. z               1.64    2.33    3.09
      p <             0.05    0.01    0.001
      % of scores <   95%     99%     99.9%
      % of scores >   5%      1%      0.1%
  58-59. [Figures: areas labelled 95% and 5%]

  60. z               1.96    2.58    3.29
                      0.025   0.005   0.0005
      % of scores <   97.5%   99.5%   99.95%
      % of scores >   2.5%    0.5%    0.05%

  62. z                     1.96    2.58    3.29
      p <                   0.05    0.01    0.001
      % of scores inside    95%     99%     99.9%
      % of scores outside   5%      1%      0.1%
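The z values in these tables can be recovered with stats::qnorm(), the inverse of pnorm(). This is a sketch; the variable names are illustrative.

```r
alpha <- c(0.05, 0.01, 0.001)

# One-tailed: all of alpha sits in the upper tail
z_one_tailed <- qnorm(1 - alpha)

# Two-tailed: alpha is split evenly between the two tails
z_two_tailed <- qnorm(1 - alpha / 2)

round(z_one_tailed, 2)
round(z_two_tailed, 2)

# Share of scores inside +/- z for the two-tailed values
inside <- pnorm(z_two_tailed) - pnorm(-z_two_tailed)
inside
```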
  63. Q: WHY DO SO MANY COLLEGES AND GRAD SCHOOLS TEACH P = 0.05?

    A: BECAUSE THAT’S STILL WHAT THE SCIENTIFIC COMMUNITY AND JOURNAL EDITORS USE.
    Q: WHY DO SO MANY PEOPLE STILL USE P = 0.05?
    A: BECAUSE THAT’S WHAT THEY WERE TAUGHT IN COLLEGE OR GRAD SCHOOL.
    George Cobb, Ph.D., “Statement on p-values” (2015)
  64. “WE TEACH IT BECAUSE IT’S WHAT WE DO; WE DO IT BECAUSE IT’S WHAT WE TEACH.”

    George Cobb, Ph.D., “Statement on p-values” (2015)
  65. 6. NORMALITY TESTING / SKEW

    Let:
    ▸ x̄ = sample mean
    ▸ i = lower bound
    ▸ n = sample size
    ▸ xi = a given value in the vector
  66. 6. NORMALITY TESTING / SKEW

    ▸ The “third moment” (standardized) of a distribution
    ▸ Measure of the asymmetry of a continuous distribution
    ▸ Normal distributions have sk = 0
    ▸ sk > 0 indicates a longer right tail relative to the left tail’s length
    ▸ sk < 0 indicates a longer left tail relative to the right tail’s length
    ▸ Values < -2 or > 2 are indicators of a non-normal distribution
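A minimal sketch of one common skew estimator, assuming the moment-based form: the mean cubed deviation divided by the mean squared deviation raised to the power 3/2. The slide's own formula was not captured in the transcript, so this may differ from the version used in the course, and the function name skew is hypothetical.

```r
# Hypothetical helper: moment-based sample skewness (an assumed estimator)
skew <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  deviations <- x - mean(x)  # x_i minus the sample mean
  m2 <- mean(deviations^2)   # second central moment
  m3 <- mean(deviations^3)   # third central moment
  m3 / m2^(3 / 2)
}

skew(c(1, 2, 3, 4, 5))   # symmetric values: skew of zero
skew(c(1, 1, 1, 10))     # one large value: longer right tail, sk > 0
skew(c(-10, 1, 1, 1))    # one small value: longer left tail, sk < 0
```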
  67. 6. NORMALITY TESTING / KURTOSIS

    Let:
    ▸ x̄ = sample mean
    ▸ i = lower bound
    ▸ n = sample size
    ▸ xi = a given value in the vector
  68. 6. NORMALITY TESTING / KURTOSIS

    ▸ The “fourth moment” (standardized) of a distribution
    ▸ Measure of the “weight” of the tails - how many observations fall in the tails relative to the center
    ▸ Normal distributions have k = 3, where k is always positive and k ≥ 1.
    ▸ Normal distributions have excess kurtosis ek = 0 (ek = k - 3), where ek can be negative.
    ▸ k > 5 is one rule of thumb for problematic distributions
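A minimal sketch of kurtosis and excess kurtosis, assuming the moment-based form: the mean fourth-power deviation divided by the squared mean squared deviation. The slide's own formula was not captured in the transcript; the function names, seed, and sample size are hypothetical.

```r
# Hypothetical helper: moment-based sample kurtosis (an assumed estimator)
kurtosis <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  deviations <- x - mean(x)  # x_i minus the sample mean
  m2 <- mean(deviations^2)   # second central moment
  m4 <- mean(deviations^4)   # fourth central moment
  m4 / m2^2
}

# Excess kurtosis recentres the scale so that the normal sits at zero
excess_kurtosis <- function(x) {
  kurtosis(x) - 3
}

set.seed(5050)  # arbitrary seed
normal_draws <- rnorm(100000)

kurtosis(normal_draws)         # close to 3 for normal data
excess_kurtosis(normal_draws)  # close to 0 for normal data
kurtosis(c(-1, 1))             # two equally likely values sit at the lower bound k = 1
```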
  69. 6. NORMALITY TESTING / QUANTILE-QUANTILE PLOT

    ▸ Known as the “q-q” plot for short
    ▸ Plots a given variable against a theoretical distribution (such as the standard normal distribution) as a diagnostic test
    ▸ Look for how your variable (the points) sits relative to the normal distribution (the 45-degree line). Pay particular attention to the tails.
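A sketch of a q-q plot in base R using simulated data; the seed, sample size, mean, and standard deviation are illustrative assumptions. The variable is standardized first so that the reference line y = x is the 45-degree line described above.

```r
set.seed(5050)  # arbitrary seed
x <- rnorm(200, mean = 50, sd = 10)  # simulated variable for illustration

# Standardize so the sample and theoretical quantiles share a scale
z <- (x - mean(x)) / sd(x)

# qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(z, main = "Normal Q-Q plot")

# 45-degree reference line: intercept 0, slope 1
abline(a = 0, b = 1)

# For roughly normal data the points hug the line, so the two sets of quantiles correlate strongly
cor(qq$x, qq$y)
```

Base R also provides qqline(), which draws a reference line through the first and third quartiles instead of y = x.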
  70. 6. NORMALITY TESTING / SHAPIRO-FRANCIA TEST

    ▸ H0: Data are not markedly different from the normal distribution.
    ▸ HA: Data are markedly different from the normal distribution.
    ▸ If the p value associated with the test statistic is greater than .05…
    ▸ If the p value associated with the test statistic is less than .05…
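A sketch of the decision logic using base R's stats::shapiro.test(), which is the closely related Shapiro-Wilk test and is swapped in here so the example needs no extra packages. The Shapiro-Francia test itself is available as sf.test() in the nortest package. The helper name decide, the simulated samples, and the conventional reading of the p value (reject H0 when p is below .05, otherwise fail to reject) are assumptions of this sketch.

```r
set.seed(5050)  # arbitrary seed
normal_sample <- rnorm(500)  # simulated data drawn from a normal distribution
skewed_sample <- rexp(500)   # simulated data drawn from a strongly skewed distribution

# Shapiro-Wilk test (stand-in for Shapiro-Francia); H0 is that the data are normal
result_normal <- shapiro.test(normal_sample)
result_skewed <- shapiro.test(skewed_sample)

# Hypothetical helper applying the conventional .05 threshold
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(result_normal$p.value)
decide(result_skewed$p.value)
```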
  73. 7. BACK MATTER / AGENDA REVIEW

    2. Binomial Distribution 3. Poisson Distribution 4. Normal Distribution 5. Statistical Significance 6. Normality Testing
  74. 7. BACK MATTER / REMINDERS

    Lab 04, PS-02, and Lecture Prep 06 are due before the next lecture. There was an update on the final project due today as an issue in your final project GitHub repo (DoeProject if your last name is “Doe”).