example: Paul the octopus
‣ Prediction of 2010 World Cup winners:
‣ Presented with 2 clear plastic boxes, each containing food and marked with the flag of a team.
‣ Winner: the box that Paul opened first to eat its contents.
‣ Accurately predicted the outcome of 8 games!
https://www.youtube.com/watch?v=Ya85knuDzp8

example: Paul the octopus
Paul the Octopus predicted 8 World Cup games, and predicted them all correctly. Does this provide convincing evidence that Paul actually has psychic powers, i.e. that he does better than just randomly guessing?

setting the null
In the context of Paul’s predictions, which of the following does the null hypothesis of “there is nothing going on” map to?
a. Paul does no better than random guessing.
b. Paul does better than random guessing.
c. Paul predicts all games correctly.
d. Paul predicts none of the games correctly.
e. Paul predicts 50% of the games correctly.


‣ null hypothesis H0: Defendant is innocent
‣ alternative hypothesis HA: Defendant is guilty
‣ collect data: present the evidence
‣ judge the evidence: “Could these data plausibly have happened by chance if the null hypothesis were true?”
‣ yes → Fail to reject H0
‣ no → Reject H0
‣ burden of proof
Image source: http://en.wikipedia.org/wiki/File:Trial_by_Jury_Usher.jpg

hypothesis testing framework
Which of the following is not a component of the hypothesis testing framework?
a. Start with a null hypothesis that represents the status quo
b. Set an alternative hypothesis that represents the research question, i.e. what we’re testing for
c. Conduct a hypothesis test under the assumption that the alternative hypothesis is true
d. If the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, stick with the null hypothesis
e. If the test results suggest that the data do provide convincing evidence for the alternative hypothesis, then reject the null hypothesis in favor of the alternative


hypothesis testing framework
Which of the following is the best set of hypotheses associated with the following two claims: “Paul does no better than random guessing” and “Paul does better than random guessing”?
a. H0: p = 0 ; HA: p > 0
b. H0: p = 1/8 ; HA: p > 1/8
c. H0: p < 0.5 ; HA: p = 0.5
d. H0: p = 0.5 ; HA: p > 0.5
e. H0: p = 0.5 ; HA: p = 1


two competing claims
‣ null hypothesis: “There is nothing going on.” Paul does no better than random guessing. H0: p = 0.5
‣ alternative hypothesis: “There is something going on.” Paul does better than random guessing. HA: p > 0.5

example: Paul the octopus
Paul the Octopus predicted 8 World Cup games, and predicted them all correctly. Does this provide convincing evidence that Paul actually has psychic powers, i.e. that he does better than just randomly guessing?
H0: p = 0.5
HA: p > 0.5
‣ Use a fair coin, and label heads as success (correct guess)
‣ One simulation: flip the coin 8 times and record the proportion of heads (correct guesses)
‣ Repeat the simulation many times, recording the proportion of heads at each iteration
‣ Calculate the percentage of simulations where the simulated proportion of heads is at least as extreme as the observed proportion

simulating Paul
simulation 1: H H H H H H H T → 7 / 8 = 0.875
simulation 2: T H H T H T T T → 3 / 8 = 0.375
simulation 3: T T H H H H T H → 5 / 8 = 0.625
…
simulation 10: T H T H H H H H → 6 / 8 = 0.75
…
[Dot plot of the simulated proportions of heads, axis from 0 to 1]
What proportion of simulations yielded a proportion of success at least as extreme as Paul’s?
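The coin-flip scheme above can also be run on a computer. The following is a minimal sketch, not part of the original slides: the function name `simulate_p_value`, the number of simulations, and the seed are illustrative assumptions.

```python
import random

def simulate_p_value(n_games=8, observed_correct=8, n_sims=10_000, seed=1):
    """Estimate P(at least `observed_correct` heads in `n_games` fair coin flips)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so reruns match
    observed_prop = observed_correct / n_games
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # One simulation: flip a fair coin n_games times, heads = correct guess.
        heads = sum(rng.random() < 0.5 for _ in range(n_games))
        if heads / n_games >= observed_prop:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

p_value = simulate_p_value()
print(f"Share of simulations at least as extreme as 8 / 8: {p_value:.4f}")
```

The printed share is the simulated counterpart of the question on the slide; it varies slightly with the seed and the number of simulations.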

conclusion of the test
Based on the probability that you just calculated, which of the following is the best conclusion of this hypothesis test?
a. It is likely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing no better than randomly guessing.
b. It is likely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing better than randomly guessing.
c. It is very unlikely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing no better than randomly guessing.
d. It is very unlikely to predict 8 or more games correctly if randomly guessing, hence the data suggest that Paul is doing better than randomly guessing.
e. None of the above.


making a decision
‣ Hypotheses:
‣ H0: p = 0.5 - Paul does no better than random guessing
‣ HA: p > 0.5 - Paul does better than random guessing
‣ Data: Paul predicted 8 out of 8 games correctly
‣ Results: Assuming H0 is true, the probability of obtaining results at least as extreme as Paul’s is almost 0.
‣ Decision: Since this probability is low (lower than 5%), we reject H0 in favor of HA.
‣ This doesn’t mean we proved the alternative hypothesis, just that the data provide convincing evidence for it.
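The "almost 0" probability can also be worked out exactly rather than simulated. The short derivation below is a sketch that assumes the 8 predictions are independent, which the slides imply but do not state; X denotes the number of correct predictions out of 8.

```latex
P(X \ge 8 \mid p = 0.5) = P(X = 8 \mid p = 0.5) = 0.5^{8} = \frac{1}{256} \approx 0.0039
```

This is well below the 5% threshold used in the decision step.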

example: sex discrimination
“Are individuals who identify as female discriminated against in promotion decisions made by their managers who identify as male?”
‣ study considered sex roles, and only allowed for options of “male” and “female.” We should note that the identities being considered are not gender identities and that the study allowed only for a binary classification of sex.
‣ 48 male bank supervisors given the same personnel file, asked to judge whether the person should be promoted
‣ identical files, except that half of them indicated the candidate identified as male and the other half indicated the candidate identified as female
‣ files randomly assigned to managers
‣ 35 / 48 promoted
‣ are females unfairly discriminated against?

example: sex discrimination

                    promotion
                promoted   not promoted   total
sex   male         21            3          24
      female       14           10          24
      total        35           13          48

% of males promoted = 21/24 ≈ 88%
% of females promoted = 14/24 ≈ 58%

two competing claims
‣ null hypothesis: “There is nothing going on.” Promotion and gender are independent, no gender discrimination, observed difference in proportions is simply due to chance.
‣ alternative hypothesis: “There is something going on.” Promotion and gender are dependent, there is gender discrimination, observed difference in proportions is not due to chance.


simulation scheme
[use a deck of playing cards to simulate this experiment]
1. face card: not promoted, non-face card: promoted
‣ set aside the jokers, consider aces as face cards
‣ take out 3 aces → 13 face cards left in the deck (face cards: A, K, Q, J)
‣ take out a number card → 35 number (non-face) cards left in the deck (number cards: 2-10)
2. shuffle the cards, deal into two groups of size 24, representing males and females
3. count how many number cards are in each group (representing promoted files)
4. calculate the proportion of promoted files in each group, take the difference (male - female), and record this value
5. repeat steps 2 - 4 many times
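The card scheme above translates directly into a shuffle-based simulation. The sketch below is one possible version, not from the original slides: the helper name `shuffled_difference`, the seed, and the 10,000 repetitions are illustrative assumptions, and 1 / 0 stand in for number cards / face cards.

```python
import random

# Step 1: 35 "promoted" cards (1) and 13 "not promoted" cards (0), as in the table.
outcomes = [1] * 35 + [0] * 13
observed_diff = 21 / 24 - 14 / 24  # male - female in the actual study

def shuffled_difference(rng):
    """Steps 2-4: shuffle, deal two groups of 24, return male - female proportion."""
    deck = outcomes[:]
    rng.shuffle(deck)
    male, female = deck[:24], deck[24:]
    return sum(male) / 24 - sum(female) / 24

rng = random.Random(2)  # fixed seed (an assumption) for reproducibility
n_sims = 10_000
diffs = [shuffled_difference(rng) for _ in range(n_sims)]  # step 5

# Share of shuffles whose difference is at least as large as the observed one.
# The small tolerance guards against floating-point rounding in the comparison.
p_value = sum(d >= observed_diff - 1e-9 for d in diffs) / n_sims
print(f"observed difference: {observed_diff:.3f}")
print(f"share of shuffles at least as extreme: {p_value:.4f}")
```

Because the shuffle ignores sex entirely, every simulated difference is one that could arise when promotion and sex are independent.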

making a decision
‣ Results from the simulations look like the data → the difference between the proportions of promoted files between males and females was due to chance (promotion and sex are independent)
‣ Results from the simulations do not look like the data → the difference between the proportions of promoted files between males and females was not due to chance, but due to an actual effect of gender (promotion and sex are dependent)

summary
‣ set a null and an alternative hypothesis
‣ simulate the experiment assuming that the null hypothesis is true
‣ evaluate the probability of observing an outcome at least as extreme as the one observed in the original data (the p-value)
‣ if this probability is low, reject the null hypothesis in favor of the alternative

A plausible range of values for the population parameter is called a confidence interval.
‣ If we report a point estimate, we probably won’t hit the exact population parameter.
‣ If we report a range of plausible values we have a good shot at capturing the parameter.
Net: Photo by ozgurmulazimoglu on Flickr: http://www.flickr.com/photos/mulazimoglu/5195133899, CC-A 3.0 http://creativecommons.org/licenses/by/3.0/deed.en
Spear fishing: Photo by Chris Penny on Flickr: http://www.flickr.com/photos/clearlydived/7029109617, CC-BY 2.0 http://creativecommons.org/licenses/by/2.0/

One of the earliest examples of behavioral asymmetry is a preference in humans for turning the head to the right, rather than to the left, during the final weeks of gestation and for the first 6 months after birth. This is thought to influence subsequent development of perceptual and motor preferences. A study of 124 couples found that 64.5% turned their heads to the right when kissing. The standard error associated with this estimate is roughly 4%. Which of the below is false?
(a) A higher sample size would yield a lower standard error. ✔︎
(b) The margin of error for a 95% CI for the percentage of kissers who turn their heads to the right is roughly 8%. ✔︎
(c) The 95% CI for the percentage of kissers who turn their heads to the right is roughly 64.5% ± 4%. x
(d) The 99.7% CI for the percentage of kissers who turn their heads to the right is roughly 64.5% ± 12%. ✔︎
The Kiss: http://en.wikipedia.org/wiki/File:Gustav_Klimt_016.jpg
Study reference: Gunturkun, O. (2003) Adult persistence of head-turning asymmetry. Nature. Vol 421.
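The "roughly 4%", "roughly 8%" and "roughly 12%" figures can be checked numerically. The sketch below assumes the usual standard error formula for a sample proportion, sqrt(p(1 - p) / n), which the slide does not state, and uses the 2 SE and 3 SE rules of thumb for 95% and 99.7% confidence.

```python
import math

n = 124         # couples in the study
p_hat = 0.645   # proportion turning their heads to the right

# Standard error of a sample proportion (assumed formula, not given on the slide).
se = math.sqrt(p_hat * (1 - p_hat) / n)
me_95 = 2 * se    # rough 95% margin of error: about 2 standard errors
me_997 = 3 * se   # rough 99.7% margin of error: about 3 standard errors

print(f"SE = {se:.3f}")
print(f"95% CI:   {p_hat:.3f} +/- {me_95:.3f}")
print(f"99.7% CI: {p_hat:.3f} +/- {me_997:.3f}")
```

The 95% margin is about twice the standard error, which is why option (c), with a margin of only one standard error, is the false statement.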

confidence level
‣ Suppose we took many samples and built a 95% confidence interval from each sample.
‣ Then about 95% of those intervals would contain the true population mean (μ).
‣ Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%.
[Figure: confidence intervals from repeated samples plotted around µ = 94.52; 24 / 25 = 0.96 of them contain µ]
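The "many samples" idea can be tried out in a simulation. The sketch below is illustrative only: it assumes a normal population with the slide's µ = 94.52, while the standard deviation, sample size, number of samples, and function name `coverage` are made-up values chosen for the example.

```python
import random
import statistics

def coverage(mu=94.52, sigma=10.0, n=50, n_samples=2_000, z_star=1.96, seed=3):
    """Fraction of simulated intervals x_bar +/- z* * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one sample from the (assumed normal) population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Does this sample's interval capture the true mean?
        if x_bar - z_star * se <= mu <= x_bar + z_star * se:
            hits += 1
    return hits / n_samples

rate = coverage()
print(f"Share of intervals containing mu: {rate:.3f}")
```

The share should land close to 0.95, in the same spirit as the 24 / 25 = 0.96 shown in the figure.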

If we want to be very certain that we capture the population parameter, should we use a wider interval or a narrower interval?
[Figure: confidence intervals from repeated samples plotted around µ = 94.52]

What drawbacks are associated with using a wider interval?
[Weather forecast illustration: Low: -20F / -29C, High: 110F / 43C]
CL ↑ width ↑ accuracy ↑ precision ↓
How can we get the best of both worlds, higher precision and higher accuracy? Increase sample size.
Weather icon: Matthew Petroff, http://commons.wikimedia.org/wiki/File:Weather_Icons.png, Creative Commons CC0 1.0 Universal Public Domain Dedication, http://creativecommons.org/about/cc0
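The trade-off between confidence level, width, and sample size can be tabulated. This is a minimal sketch assuming a normal-based interval with half-width z* × sd / sqrt(n); the standard deviation of 10 and the function name `margin_of_error` are hypothetical, chosen only to make the pattern visible.

```python
from statistics import NormalDist

def margin_of_error(sd, n, confidence):
    """Half-width z* * sd / sqrt(n) of a normal-based confidence interval."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z_star * sd / n ** 0.5

sd = 10.0  # hypothetical standard deviation, for illustration only
for n in (25, 100, 400):
    for confidence in (0.90, 0.95, 0.99):
        me = margin_of_error(sd, n, confidence)
        print(f"n={n:4d}  CL={confidence:.0%}  margin of error={me:.2f}")
```

Within each sample size the margin grows with the confidence level, and quadrupling the sample size halves the margin at any fixed confidence level.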

The General Social Survey (GSS) is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. In 2010, the survey collected responses from 1,154 US residents. Based on the survey results, a 95% confidence interval for the average number of hours Americans have to relax or pursue activities that they enjoy after an average work day was found to be 3.53 to 3.83 hours. Determine if each of the following statements is true or false.
(a) 95% of Americans spend 3.53 to 3.83 hours relaxing after a work day. F
(b) 95% of random samples of 1,154 Americans will yield confidence intervals that contain the true average number of hours Americans spend relaxing after a work day. T
(c) 95% of the time the true average number of hours Americans spend relaxing after a work day is between 3.53 and 3.83 hours. F
(d) We are 95% confident that Americans in this sample spend on average 3.53 to 3.83 hours relaxing after a work day. F

The General Social Survey asks: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010.
Interpret this interval in context of the data.
We are 95% confident that Americans on average have 3.40 to 4.24 bad mental health days per month.

The General Social Survey asks: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010.
In this context, what does a 95% confidence level mean?
95% of random samples of 1,151 Americans will yield CIs that capture the true population mean of number of bad mental health days per month.

The General Social Survey asks: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010.
Suppose the researchers think a 99% confidence level would be more appropriate for this interval. Will this new interval be narrower or wider than the 95% confidence interval?
Wider: as CL increases, so does the width of the confidence interval.
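To see how much wider, one can back out the standard error from the reported interval and rebuild it at 99%. This is a rough sketch that assumes the reported interval was computed as mean ± z* × SE with a normal critical value; the survey's actual method is not given in the slides.

```python
from statistics import NormalDist

lower_95, upper_95 = 3.40, 4.24   # reported 95% interval (days)
center = (lower_95 + upper_95) / 2

z_95 = NormalDist().inv_cdf(0.975)   # about 1.96
z_99 = NormalDist().inv_cdf(0.995)   # about 2.58

# Standard error implied by the 95% interval, under the normal-based assumption.
se = (upper_95 - center) / z_95
lower_99, upper_99 = center - z_99 * se, center + z_99 * se

print(f"implied SE: {se:.3f}")
print(f"99% interval: ({lower_99:.2f}, {upper_99:.2f})")
```

The 99% interval keeps the same center but extends further on both sides, consistent with the answer on the slide.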

A sample of 50 college students were asked how many exclusive relationships they’ve been in so far. The students in the sample had an average of 3.2 exclusive relationships, with a standard deviation of 1.74. In addition, the sample distribution was only slightly skewed to the right. Estimate the true average number of exclusive relationships based on this sample using a 95% confidence interval.

n = 50, x̄ = 3.2, s = 1.74

1. random sample & 50 < 10% of all college students: We can assume that the number of exclusive relationships one student in the sample has been in is independent of another.
2. n > 30 & not so skewed sample: We can assume that the sampling distribution of average number of exclusive relationships from samples of size 50 will be nearly normal.

$$SE = \frac{s}{\sqrt{n}} = \frac{1.74}{\sqrt{50}} \approx 0.246$$

$$\bar{x} \pm z^{*} \, SE = 3.2 \pm 1.96 \times 0.246 = 3.2 \pm 0.48 = (2.72, 3.68)$$

We are 95% confident that college students on average have been in between 2.72 and 3.68 exclusive relationships.

Heart: http://commons.wikimedia.org/wiki/File:Heart-padlock.svg
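The arithmetic for this interval can be reproduced in a few lines. The sketch below uses only the numbers given in the example; the variable names are illustrative.

```python
import math

n, x_bar, s = 50, 3.2, 1.74
z_star = 1.96   # critical value for 95% confidence, as used in the example

se = s / math.sqrt(n)      # standard error of the sample mean
margin = z_star * se       # margin of error
ci = (x_bar - margin, x_bar + margin)

print(f"SE = {se:.3f}")
print(f"margin of error = {margin:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```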

confidence intervals

Example: Based on a 2022 Pew Research poll of 5,074 adults: “We are 95% confident that 68% to 72% of Americans think inflation is the biggest problem facing the country.”

‣ 95% of random samples of 5,074 adults will produce confidence intervals that contain the true proportion of Americans who think inflation is the biggest problem facing the country.

‣ Common misconceptions:

‣ There is a 95% chance that this confidence interval includes the true population proportion.

‣ The true population proportion is in this interval 95% of the time.

Source: https://www.pewresearch.org/fact-tank/2022/05/12/by-a-wide-margin-americans-view-inflation-as-the-top-problem-facing-the-country-today/

credible intervals

‣ Allows us to describe the unknown true parameter not as a fixed value but with a probability distribution

‣ This will let us construct something like a confidence interval, except we can make probabilistic statements about the parameter falling within that range.

‣ Example: “The posterior distribution yields a 95% credible interval of 68% to 72% for the proportion of Americans who think inflation is the biggest problem facing the country.”

‣ These are called credible intervals.

Source: http://www.pewsocialtrends.org/2016/02/04/most-americans-say-government-doesnt-do-enough-to-help-middle-class/
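One simple way to obtain such an interval is to draw from a posterior distribution and read off its percentiles. The sketch below is illustrative only: the slides give the interval but not the raw counts, so the 3,552 "yes" responses (about 70% of 5,074) and the uniform Beta(1, 1) prior are assumptions made for the example.

```python
import random

# Hypothetical counts: the slides report only the interval, not the tallies.
n = 5074
successes = 3552          # assumed, roughly 70% of 5,074
failures = n - successes

rng = random.Random(4)    # fixed seed (an assumption) for reproducibility
# Uniform Beta(1, 1) prior + binomial data -> Beta(1 + successes, 1 + failures) posterior.
draws = sorted(rng.betavariate(1 + successes, 1 + failures) for _ in range(20_000))

# Middle 95% of the posterior draws.
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the statement "the proportion lies between `lower` and `upper` with 95% probability" is a statement about the posterior distribution, which is the kind of probabilistic claim a confidence interval does not support.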