Slide 1

Slide 1 text

an overview of statistical inference Dr. Mine Çetinkaya-Rundel Duke University

Slide 2

Slide 2 text

slides at bit.ly/lmu-inference

Slide 3

Slide 3 text

hypothesis testing

Slide 4

Slide 4 text

example: Paul the octopus
‣ Prediction of 2010 World Cup winners:
‣ Presented with 2 clear plastic boxes, each containing food and marked with flag of a team.
‣ Winner: Box which Paul opened first to eat its contents.
‣ Accurately predicted the outcome of 8 games!
https://www.youtube.com/watch?v=Ya85knuDzp8

Slide 5

Slide 5 text

example: Paul the octopus
Paul the Octopus predicted 8 World Cup games, and predicted them all correctly. Does this provide convincing evidence that Paul actually has psychic powers, i.e. that he does better than just randomly guessing?

Slide 6

Slide 6 text

two competing claims
null hypothesis: “There is nothing going on”
alternative hypothesis: “There is something going on”

Slide 7

Slide 7 text

setting the null
In context of Paul’s predictions, which of the following does the null hypothesis of “there is nothing going on” map to?
a. Paul does no better than random guessing.
b. Paul does better than random guessing.
c. Paul predicts all games correctly.
d. Paul predicts none of the games correctly.
e. Paul predicts 50% of the games correctly.

Slide 9

Slide 9 text

burden of proof
null hypothesis H0: Defendant is innocent
alternative hypothesis HA: Defendant is guilty
present the evidence: collect data
judge the evidence: “Could these data plausibly have happened by chance if the null hypothesis were true?”
yes → Fail to reject H0
no → Reject H0
Image source: http://en.wikipedia.org/wiki/File:Trial_by_Jury_Usher.jpg

Slide 10

Slide 10 text

hypothesis testing framework
Which of the following is not a component of the hypothesis testing framework?
a. Start with a null hypothesis that represents the status quo
b. Set an alternative hypothesis that represents the research question, i.e. what we’re testing for
c. Conduct a hypothesis test under the assumption that the alternative hypothesis is true
d. If the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, stick with the null hypothesis
e. If the test results suggest that the data do provide convincing evidence for the alternative hypothesis, then reject the null hypothesis in favor of the alternative

Slide 12

Slide 12 text

hypothesis testing framework
Which of the following is the best set of hypotheses associated with the following two claims: “Paul does no better than random guessing” and “Paul does better than random guessing”?
a. H0: p = 0 ; HA: p > 0
b. H0: p = 1/8 ; HA: p > 1/8
c. H0: p < 0.5 ; HA: p = 0.5
d. H0: p = 0.5 ; HA: p > 0.5
e. H0: p = 0.5 ; HA: p = 1

Slide 14

Slide 14 text

two competing claims
null hypothesis: “There is nothing going on” Paul does no better than random guessing. H0: p = 0.5
alternative hypothesis: “There is something going on” Paul does better than random guessing. HA: p > 0.5

Slide 15

Slide 15 text

example: Paul the octopus
Paul the Octopus predicted 8 World Cup games, and predicted them all correctly. Does this provide convincing evidence that Paul actually has psychic powers, i.e. that he does better than just randomly guessing?
H0: p = 0.5 HA: p > 0.5
‣ Use a fair coin, and label head as success (correct guess)
‣ One simulation: flip the coin 8 times and record the proportion of heads (correct guesses)
‣ Repeat the simulation many times, recording the proportion of heads at each iteration
‣ Calculate the percentage of simulations where the simulated proportion of heads is at least as extreme as the observed proportion
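The coin-flipping scheme above can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used in the talk: the function name `simulate_p_value`, the 10,000 repetitions, and the fixed seed are assumptions chosen for illustration.

```python
import random

def simulate_p_value(n_games=8, observed_correct=8, n_sims=10_000, seed=1):
    """Estimate P(at least observed_correct correct guesses) under H0: p = 0.5."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # One simulation: flip a fair coin n_games times; heads = correct guess.
        heads = sum(rng.random() < 0.5 for _ in range(n_games))
        # "At least as extreme" means at least as many correct guesses as observed.
        if heads >= observed_correct:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

p_value = simulate_p_value()
print(f"estimated p-value: {p_value:.4f}")
```

The estimate varies slightly with the seed and the number of repetitions; more repetitions give a more stable estimate.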

Slide 16

Slide 16 text

simulating Paul
simulation 1: H H H H H H H T → 7 / 8 = 0.875
simulation 2: T H H T H T T T → 3 / 8 = 0.375
simulation 3: T T H H H H T H → 5 / 8 = 0.625
…
simulation 10: T H T H H H H H → 6 / 8 = 0.75
[Figure: dot plot axis for the simulated proportions, with ticks at 0, 0.25, 0.5, 0.75, 1]
What proportion of simulations yielded a proportion of success at least as extreme as Paul’s?

Slide 17

Slide 17 text

conclusion of the test
Based on the probability that you just calculated, which of the following is the best conclusion of this hypothesis test?
a. It is likely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing no better than randomly guessing.
b. It is likely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing better than randomly guessing.
c. It is very unlikely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing no better than randomly guessing.
d. It is very unlikely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing better than randomly guessing.
e. None of the above.

Slide 19

Slide 19 text

making a decision
‣ Hypotheses:
‣ H0: p = 0.5 - Paul does no better than random guessing
‣ HA: p > 0.5 - Paul does better than random guessing
‣ Data: Paul predicted 8 out of 8 games correctly
‣ Results: Assuming H0 is true, the probability of obtaining results at least as extreme as Paul’s is almost 0.
‣ Decision: Since this probability is low (lower than 5%), we reject H0 in favor of HA.
‣ This doesn’t mean we proved the alternative hypothesis, just that the data provide convincing evidence for it.
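As a cross-check on "almost 0", the same tail probability can be computed exactly from the binomial distribution. The exact calculation is not part of the slides, which rely on simulation; `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Paul: 8 correct out of 8 games, under H0: p = 0.5
p_value = binomial_upper_tail(8, 8, 0.5)
print(p_value)         # 0.00390625, i.e. 0.5 ** 8
print(p_value < 0.05)  # True: below the 5% threshold used on the slide
```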

Slide 20

Slide 20 text

example: sex discrimination
“Are individuals who identify as female discriminated against in promotion decisions made by their managers who identify as male?”
‣ study considered sex roles, and only allowed for options of “male” and “female.” We should note that the identities being considered are not gender identities and that the study allowed only for a binary classification of sex.
‣ 48 male bank supervisors given the same personnel file, asked to judge whether the person should be promoted
‣ identical files, except that half of them indicated the candidate identified as male and the other half indicated the candidate identified as female
‣ files randomly assigned to managers
‣ 35 / 48 promoted
‣ are females unfairly discriminated against?

Slide 21

Slide 21 text

example: sex discrimination
sex      promoted   not promoted   total
male     21         3              24
female   14         10             24
total    35         13             48
(columns: promotion decision; rows: sex)
% of males promoted = 21/24 ≈ 88%
% of females promoted = 14/24 ≈ 58%

Slide 22

Slide 22 text

two competing claims
null hypothesis: “There is nothing going on” promotion and gender are independent, no gender discrimination, observed difference in proportions is simply due to chance
alternative hypothesis: “There is something going on” promotion and gender are dependent, there is gender discrimination, observed difference in proportions is not due to chance.

Slide 23

Slide 23 text

simulation scheme [use a deck of playing cards to simulate this experiment]
1. face card: not promoted, non-face card: promoted
‣ set aside the jokers, consider aces as face cards
‣ take out 3 aces → 13 face cards left in the deck (face cards: A, K, Q, J)
‣ take out a number card → 35 number (non-face) cards left in the deck (number cards: 2-10)

Slide 24

Slide 24 text

Step 1: Image source: http://www.jfitz.com/cards/

Slide 25

Slide 25 text

simulation scheme [use a deck of playing cards to simulate this experiment]
1. face card: not promoted, non-face card: promoted
‣ set aside the jokers, consider aces as face cards
‣ take out 3 aces → 13 face cards left in the deck (face cards: A, K, Q, J)
‣ take out a number card → 35 number (non-face) cards left in the deck (number cards: 2-10)
2. shuffle the cards, deal into two groups of size 24, representing males and females

Slide 26

Slide 26 text

Step 2: Image source: http://www.jfitz.com/cards/

Slide 27

Slide 27 text

simulation scheme [use a deck of playing cards to simulate this experiment]
1. face card: not promoted, non-face card: promoted
‣ set aside the jokers, consider aces as face cards
‣ take out 3 aces → 13 face cards left in the deck (face cards: A, K, Q, J)
‣ take out a number card → 35 number (non-face) cards left in the deck (number cards: 2-10)
2. shuffle the cards, deal into two groups of size 24, representing males and females
3. count how many number cards are in each group (representing promoted files)
4. calculate the proportion of promoted files in each group, take the difference (male - female), and record this value

Slide 28

Slide 28 text

Steps 3&4: Image source: http://www.jfitz.com/cards/

Slide 29

Slide 29 text

[Figure: dot plot axis from −0.4 to 0.4 with one simulated difference (x) plotted]

Slide 30

Slide 30 text

simulation scheme [use a deck of playing cards to simulate this experiment]
1. face card: not promoted, non-face card: promoted
‣ set aside the jokers, consider aces as face cards
‣ take out 3 aces → 13 face cards left in the deck (face cards: A, K, Q, J)
‣ take out a number card → 35 number (non-face) cards left in the deck (number cards: 2-10)
2. shuffle the cards, deal into two groups of size 24, representing males and females
3. count how many number cards are in each group (representing promoted files)
4. calculate the proportion of promoted files in each group, take the difference (male - female), and record this value
5. repeat steps 2 - 4 many times
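The card-shuffling scheme above can be mimicked in code. The following is a minimal sketch under stated assumptions: the "deck" is a list of 35 ones (number cards, promoted) and 13 zeros (face cards, not promoted), the observed difference 21/24 - 14/24 is taken from the promotion table, and the function name, 10,000 repetitions, and seed are illustrative choices.

```python
import random

def simulate_differences(n_promoted=35, n_not_promoted=13, group_size=24,
                         n_sims=10_000, seed=1):
    """Shuffle the deck, deal two groups, and record male - female promotion rates."""
    rng = random.Random(seed)
    # Step 1: 1 = number card (promoted), 0 = face card (not promoted)
    deck = [1] * n_promoted + [0] * n_not_promoted
    diffs = []
    for _ in range(n_sims):                   # Step 5: repeat many times
        rng.shuffle(deck)                     # Step 2: shuffle and deal
        male = sum(deck[:group_size])         # Step 3: promoted files per group
        female = sum(deck[group_size:])
        diffs.append((male - female) / group_size)  # Step 4: difference in rates
    return diffs

observed = (21 - 14) / 24  # observed difference in promotion rates, about 0.29
diffs = simulate_differences()
# Share of shuffles with a difference at least as large as the observed one
p_value = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference: {observed:.3f}, estimated p-value: {p_value:.3f}")
```

Because the shuffle ignores the sex label, every simulated difference is what chance alone would produce when promotion and sex are independent.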

Slide 31

Slide 31 text

[Figure: dot plot of simulated differences in promotion rates; x-axis: Difference in promotion rates, from −0.4 to 0.4]

Slide 32

Slide 32 text

making a decision
‣ Results from the simulations look like the data → the difference between the proportions of promoted files between males and females was due to chance (promotion and sex are independent)
‣ Results from the simulations do not look like the data → the difference between the proportions of promoted files between males and females was not due to chance, but due to an actual effect of gender (promotion and sex are dependent)

Slide 33

Slide 33 text

[Figure: dot plot of simulated differences in promotion rates; x-axis: Difference in promotion rates, from −0.4 to 0.4]

Slide 34

Slide 34 text

summary
‣ set a null and an alternative hypothesis
‣ simulate the experiment assuming that the null hypothesis is true
‣ evaluate the probability of observing an outcome at least as extreme as the one observed in the original data (p-value)
‣ and if this probability is low, reject the null hypothesis in favor of the alternative

Slide 35

Slide 35 text

confidence intervals

Slide 36

Slide 36 text

A plausible range of values for the population parameter is called a confidence interval.
‣ If we report a point estimate, we probably won’t hit the exact population parameter.
‣ If we report a range of plausible values we have a good shot at capturing the parameter.
Spearfishing: Photo by Chris Penny on Flickr: http://www.flickr.com/photos/clearlydived/7029109617, CC-BY 2.0 http://creativecommons.org/licenses/by/2.0/
Net: Photo by ozgurmulazimoglu on Flickr: http://www.flickr.com/photos/mulazimoglu/5195133899, CC-A 3.0 http://creativecommons.org/licenses/by/3.0/deed.en

Slide 37

Slide 37 text

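Central Limit Theorem (CLT): sampling distribution of $\bar{x}$
approximate 95% CI: $\bar{x} \pm 2\,SE$, where $2\,SE$ is the margin of error (ME)
[Figure: normal curve with axis ticks at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ; 68% of the area lies within 1σ of µ, 95% within 2σ, and 99.7% within 3σ]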

Slide 38

Slide 38 text

One of the earliest examples of behavioral asymmetry is a preference in humans for turning the head to the right, rather than to the left, during the final weeks of gestation and for the first 6 months after birth. This is thought to influence subsequent development of perceptual and motor preferences. A study of 124 couples found that 64.5% turned their heads to the right when kissing. The standard error associated with this estimate is roughly 4%. Which of the below is false?
(a) A higher sample size would yield a lower standard error. ✔︎
(b) The margin of error for a 95% CI for the percentage of kissers who turn their heads to the right is roughly 8%. ✔︎
(c) The 95% CI for the percentage of kissers who turn their heads to the right is roughly 64.5% ± 4%. x
(d) The 99.7% CI for the percentage of kissers who turn their heads to the right is roughly 64.5% ± 12%. ✔︎
The Kiss: http://en.wikipedia.org/wiki/File:Gustav_Klimt_016.jpg
Study reference: Gunturkun, O. (2003) Adult persistence of head-turning asymmetry. Nature. Vol 421.
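The "roughly 4%" standard error and the margins of error in the answer choices can be checked with a short calculation. This sketch assumes the usual standard error formula for a sample proportion, sqrt(p(1 - p)/n), which the slide does not state, and uses the 2 SE and 3 SE rules of thumb for 95% and 99.7%.

```python
from math import sqrt

n = 124        # couples in the study
p_hat = 0.645  # proportion turning their heads to the right

# Standard error of a sample proportion (assumed formula, not shown on the slide)
se = sqrt(p_hat * (1 - p_hat) / n)
me_95 = 2 * se   # approximate 95% margin of error
me_997 = 3 * se  # approximate 99.7% margin of error

print(f"SE = {se:.3f}, 95% ME = {me_95:.3f}, 99.7% ME = {me_997:.3f}")
```

With the standard error rounded to 4%, these become the 8% and 12% margins quoted in options (b) and (d), which is why option (c), with a margin of only 4%, is the false statement.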

Slide 39

Slide 39 text

confidence level
‣ Suppose we took many samples and built a confidence interval from each sample using the approximate 95% CI equation, $\bar{x} \pm 2\,SE$.
‣ Then about 95% of those intervals would contain the true population mean (μ).
‣ Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%.
[Figure: confidence intervals from 25 samples plotted against µ = 94.52; 24 / 25 = 0.96 of them contain µ]
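The "many samples, many intervals" idea can be illustrated by simulation. This is a minimal sketch under assumptions that are not in the slides: the population is normal with the slide's mean µ = 94.52 but a hypothetical standard deviation of 10, each sample has a hypothetical size of 50, and each interval is the sample mean ± 2 SE.

```python
import random
from math import sqrt
from statistics import mean, stdev

def coverage(mu=94.52, sigma=10.0, n=50, n_samples=2_000, seed=1):
    """Fraction of x_bar ± 2*SE intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        se = stdev(sample) / sqrt(n)  # standard error estimated from the sample
        if x_bar - 2 * se <= mu <= x_bar + 2 * se:
            hits += 1
    return hits / n_samples

print(f"share of intervals containing mu: {coverage():.3f}")
```

The share should land near 0.95, in the same spirit as the 24 / 25 = 0.96 shown on the slide.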

Slide 40

Slide 40 text

If we want to be very certain that we capture the population parameter, should we use a wider interval or a narrower interval?
[Figure: confidence intervals from repeated samples plotted against µ = 94.52]

Slide 41

Slide 41 text

[Figure: normal curve; x-axis: standard deviations from the mean, from −3 to 3]
95%, extends −1.96 to 1.96
99%, extends −2.58 to 2.58
CL ↑ width ↑
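The cutoffs ±1.96 and ±2.58 come from the standard normal distribution, and they can be reproduced with Python's standard library. This is a small sketch; `z_star` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Critical value z* such that the middle `confidence_level` of N(0, 1) lies in ±z*."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

for cl in (0.90, 0.95, 0.98, 0.99):
    print(f"{cl:.0%}: z* = {z_star(cl):.2f}")
```

The critical value grows with the confidence level, which is the "CL ↑ width ↑" relationship on the slide.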

Slide 42

Slide 42 text

What drawbacks are associated with using a wider interval?
Low: -20F / -29C High: 110F / 43C
CL ↑ width ↑ accuracy ↑ precision ↓
How can we get the best of both worlds, higher precision and higher accuracy?
increase sample size
Weather icon: Matthew Petroff, http://commons.wikimedia.org/wiki/File:Weather_Icons.png, Creative Commons CC0 1.0 Universal Public Domain Dedication, http://creativecommons.org/about/cc0

Slide 43

Slide 43 text

The General Social Survey (GSS) is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. In 2010, the survey collected responses from 1,154 US residents. Based on the survey results, a 95% confidence interval for the average number of hours Americans have to relax or pursue activities that they enjoy after an average work day was found to be 3.53 to 3.83 hours. Determine if each of the following statements is true or false.
(a) 95% of Americans spend 3.53 to 3.83 hours relaxing after a work day. F
(b) 95% of random samples of 1,154 Americans will yield confidence intervals that contain the true average number of hours Americans spend relaxing after a work day. T
(c) 95% of the time the true average number of hours Americans spend relaxing after a work day is between 3.53 and 3.83 hours. F
(d) We are 95% confident that Americans in this sample spend on average 3.53 to 3.83 hours relaxing after a work day. F

Slide 44

Slide 44 text

The General Social Survey asks: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010. Interpret this interval in context of the data.
We are 95% confident that Americans on average have 3.40 to 4.24 bad mental health days per month.

Slide 45

Slide 45 text

The General Social Survey asks: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010. Interpret this interval in context of the data.
In this context, what does a 95% confidence level mean?
95% of random samples of 1,151 Americans will yield CIs that capture the true population mean of number of bad mental health days per month.

Slide 46

Slide 46 text

The General Social Survey asks: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010. Interpret this interval in context of the data.
Suppose the researchers think a 99% confidence level would be more appropriate for this interval. Will this new interval be narrower or wider than the 95% confidence interval?
As CL increases so does the width of the confidence interval, so wider.

Slide 47

Slide 47 text

A sample of 50 college students were asked how many exclusive relationships they’ve been in so far. The students in the sample had an average of 3.2 exclusive relationships, with a standard deviation of 1.74. In addition, the sample distribution was only slightly skewed to the right. Estimate the true average number of exclusive relationships based on this sample using a 95% confidence interval.
$n = 50$, $\bar{x} = 3.2$, $s = 1.74$
1. random sample & 50 < 10% of all college students: We can assume that the number of exclusive relationships one student in the sample has been in is independent of another.
2. n > 30 & not so skewed sample: We can assume that the sampling distribution of average number of exclusive relationships from samples of size 50 will be nearly normal.
Heart: http://commons.wikimedia.org/wiki/File:Heart-padlock.svg

Slide 48

Slide 48 text

$n = 50$, $\bar{x} = 3.2$, $s = 1.74$
$SE = \frac{s}{\sqrt{n}} = \frac{1.74}{\sqrt{50}} \approx 0.246$
$\bar{x} \pm z^{*} \, SE = 3.2 \pm 1.96 \times 0.246 = 3.2 \pm 0.48 = (2.72, 3.68)$
We are 95% confident that college students on average have been in 2.72 to 3.68 exclusive relationships.
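The arithmetic on this slide can be reproduced step by step. A minimal sketch using only the numbers given above (n = 50, sample mean 3.2, s = 1.74, z* = 1.96); the variable names are illustrative.

```python
from math import sqrt

n, x_bar, s = 50, 3.2, 1.74
z_star = 1.96  # critical value for a 95% confidence level

se = s / sqrt(n)   # standard error of the sample mean
me = z_star * se   # margin of error
lower, upper = x_bar - me, x_bar + me

print(f"SE = {se:.3f}, ME = {me:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```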

Slide 49

Slide 49 text

we just completed… an overview of frequentist statistical inference

Slide 50

Slide 50 text

a mini foray into bayesian inference

Slide 51

Slide 51 text

frequentist definition of probability
$P(E) = \lim_{n \to \infty} \frac{n_E}{n}$
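The limit in this definition can be illustrated by watching the relative frequency n_E / n settle down as n grows. A minimal sketch, assuming an event E with a hypothetical true probability of 0.5 (for example, heads on a fair coin); the function name and seed are illustrative.

```python
import random

def relative_frequency(n, p_event=0.5, seed=1):
    """Run n independent trials and return n_E / n, the share in which E occurred."""
    rng = random.Random(seed)
    n_e = sum(rng.random() < p_event for _ in range(n))
    return n_e / n

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: n_E / n = {relative_frequency(n):.4f}")
```

No finite run reaches the limit, but the relative frequencies cluster ever more tightly around the underlying probability as n increases.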

Slide 52

Slide 52 text

bayesian definition of probability
‣ Indifferent between
‣ winning $1 if event E occurs, or
‣ winning $1 if you draw a blue chip from a box with 1,000 × p blue chips + 1,000 × (1-p) white chips
‣ Equating the probability of event E, P(E), to the probability of drawing a blue chip from this box, p: P(E) = p

Slide 53

Slide 53 text

confidence intervals
Example: Based on a 2022 Pew Research poll on 5,074 Adults: “We are 95% confident that 68% to 72% of Americans think inflation is the biggest problem facing the country.”
‣ 95% of random samples of 5,074 adults will produce confidence intervals that contain the true proportion of Americans who think inflation is the biggest problem facing the country.
‣ Common misconceptions:
‣ There is a 95% chance that this confidence interval includes the true population proportion.
‣ The true population proportion is in this interval 95% of the time.
Source: https://www.pewresearch.org/fact-tank/2022/05/12/by-a-wide-margin-americans-view-inflation-as-the-top-problem-facing-the-country-today/

Slide 54

Slide 54 text

credible intervals
‣ Allows us to describe the unknown true parameter not as a fixed value but with a probability distribution
‣ This will let us construct something like a confidence interval, except we can make probabilistic statements about the parameter falling within that range.
‣ Example: “The posterior distribution yields a 95% credible interval of 68% to 72% for the proportion of Americans who think inflation is the biggest problem facing the country.”
‣ These are called credible intervals.
Source: http://www.pewsocialtrends.org/2016/02/04/most-americans-say-government-doesnt-do-enough-to-help-middle-class/
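One way to obtain a credible interval for a proportion is to draw from its posterior distribution and read off percentiles. The sketch below is purely illustrative and does not use the Pew data: it assumes a hypothetical sample of 70 successes and 30 failures, a uniform Beta(1, 1) prior (so the posterior is Beta(71, 31)), and a Monte Carlo approximation; `credible_interval` is a hypothetical helper name.

```python
import random

def credible_interval(successes, failures, level=0.95, n_draws=20_000, seed=1):
    """Equal-tailed credible interval for a proportion under a uniform Beta(1, 1) prior."""
    rng = random.Random(seed)
    # With a Beta(1, 1) prior, the posterior is Beta(successes + 1, failures + 1).
    draws = sorted(rng.betavariate(successes + 1, failures + 1)
                   for _ in range(n_draws))
    tail = (1 - level) / 2
    lower = draws[int(tail * n_draws)]
    upper = draws[int((1 - tail) * n_draws) - 1]
    return lower, upper

# Hypothetical data, chosen only for illustration
lower, upper = credible_interval(successes=70, failures=30)
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

Under this model one can say the parameter lies in the interval with 95% posterior probability, which is exactly the kind of statement a confidence interval does not support.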

Slide 55

Slide 55 text

slides at bit.ly/lmu-inference thank you!