did not replicate
Neuroscience: Errors in popular statistical methods imply false positive rate of up to 70% (Eklund et al. 2016, PNAS)
Psychology: 61/100 studies in top journals failed to replicate (p < .05) (Open Science Collaboration 2015, Science)
Most fields? (Baker 2016, Nature)
for they often endure long; but false views, if supported by some evidence, do little harm, for every one takes a salutary pleasure in proving their falseness; and when this is done, one path towards error is closed and the road to truth is often at the same time opened. –Darwin 1871 The Descent of Man
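[Figure: truth of hypothesis, True (T) or False (F), against probability of result. A true hypothesis yields a positive result (+) with probability 1 – β and a negative result (–) with probability β; a false hypothesis yields a positive result with probability α and a negative result with probability 1 – α.]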
0.05
Probability of true positive finding is 50%. Pr(+|T) = 0.50
Your test yields a positive result. What is the probability this finding indicates a true hypothesis?
Most common answer: 0.95 (Eddy 1982; Gigerenzer & Hoffrage 1995)
base rate
Assume
Probability of false positive finding is 5%. Pr(+|F) = 0.05
Probability of true positive finding is 50%. Pr(+|T) = 0.50
Your test yields a positive result. What is the probability this finding indicates a true hypothesis?
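The answer depends on the base rate of true hypotheses, whose value is not recoverable from the slide text. As a minimal sketch, Bayes' rule gives Pr(T|+) = Pr(+|T) b / (Pr(+|T) b + Pr(+|F) (1 – b)), where b is the base rate. The function name `prob_true_given_positive` and the illustrative value b = 0.1 are assumptions made here, not the slide's own numbers.

```python
def prob_true_given_positive(base_rate, power, false_positive_rate):
    """Pr(T|+) by Bayes' rule.

    base_rate           -- Pr(T), prior probability that a hypothesis is true
    power               -- Pr(+|T), probability of a true positive
    false_positive_rate -- Pr(+|F), probability of a false positive
    """
    true_pos = power * base_rate
    false_pos = false_positive_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Illustrative only: b = 0.1 is an assumed base rate.
# With Pr(+|T) = 0.50 and Pr(+|F) = 0.05 this gives roughly 0.53, not 0.95.
print(round(prob_true_given_positive(0.1, 0.50, 0.05), 2))
```

Under these assumed numbers a positive result indicates a true hypothesis only about half the time; a lower base rate pushes the value lower still.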
hypothesis is selected for replication with probability r, otherwise a novel (untested) hypothesis is selected. Novel hypotheses are true with probability b.

2. Investigation
[Figure: real truth of hypothesis, True (T) or False (F), against probability of result. A true hypothesis yields a positive result (+) with probability 1 – β and a negative result (–) with probability β; a false hypothesis yields a positive result with probability α and a negative result with probability 1 – α.]

3. Communication
Experimental results are communicated to the scientific community with a probability that depends upon both the experimental result (+, –) and whether the hypothesis was novel (N) or a replication (R). Communicated results join the set of tested hypotheses. Uncommunicated replications revert to their prior status.
[Figure: new result communicated with probability C_N–, C_R+, or C_R–; new result not communicated (file drawer) with probability 1 – C_N–, 1 – C_R+, or 1 – C_R–; shown for novel and replication studies, positive and negative results.]

KEY
Interior = true epistemic state: True (T), False (F)
Exterior = experimental evidence: Unknown, Positive (+), Negative (–), General case (+ or –)

McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.
Recursions: Solutions:
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.
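The selection, investigation, and communication stages described above can also be explored by simulation rather than by recursion. The following is a rough Monte Carlo sketch, not the authors' implementation. The function names, the dictionary layout, and the handling of uncommunicated novel results (simply discarded) are assumptions made for illustration.

```python
import random


def simulate_literature(steps, r, b, power, alpha, comm, seed=0):
    """Simulate `steps` studies and return the set of tested hypotheses.

    r     -- probability a previously tested hypothesis is replicated
    b     -- probability a novel hypothesis is true
    power -- Pr(+|T), i.e. 1 - beta
    alpha -- Pr(+|F)
    comm  -- dict mapping (kind, result) to communication probability,
             kind in {"N", "R"}, result in {"+", "-"} (assumed layout)
    """
    rng = random.Random(seed)
    tested = []  # each entry: {"true": bool, "results": list of "+"/"-"}
    for _ in range(steps):
        # 1. Hypothesis selection: replicate, or pick a novel hypothesis.
        if tested and rng.random() < r:
            kind, hyp = "R", rng.choice(tested)
        else:
            kind, hyp = "N", {"true": rng.random() < b, "results": []}
        # 2. Investigation: result depends on the real truth of the hypothesis.
        p_positive = power if hyp["true"] else alpha
        result = "+" if rng.random() < p_positive else "-"
        # 3. Communication: uncommunicated results leave the record unchanged.
        if rng.random() < comm[(kind, result)]:
            hyp["results"].append(result)
            if kind == "N":
                tested.append(hyp)
    return tested


def share_true_among_positive(tested):
    """Fraction of hypotheses with more + than - results that are really true."""
    positive = [h for h in tested
                if h["results"].count("+") > h["results"].count("-")]
    if not positive:
        return None
    return sum(h["true"] for h in positive) / len(positive)


# Hypothetical parameter values, chosen only to exercise the sketch.
comm = {("N", "+"): 1.0, ("N", "-"): 0.1, ("R", "+"): 0.5, ("R", "-"): 0.5}
literature = simulate_literature(5000, r=0.2, b=0.1, power=0.8, alpha=0.05, comm=comm)
print(share_true_among_positive(literature))
```

Varying `r` and the entries of `comm` gives a feel for how replication and the file drawer change the share of published positive findings that are true.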
Why Isn’t Science Better?
False positives and ambiguous results
• Negative results aren’t published ➡ Lower information content of literature
• Misunderstanding of statistical techniques ➡ False positives and ambiguous results
• Surprising, easily understood results easiest to publish ➡ Lowering base rate
Nature Biotech.)
“Part of the problem is that no-one is incentivised to be right. Instead, scientists are incentivised to be productive and innovative.”
–Richard Horton, The Lancet, April 2015
a good measure.
Donald T. Campbell 1916–1996
The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. –Campbell 1976
(Franzoni et al. 2011, Science)
Average amount paid to first author in China in 2016 (in USD) (Quan et al. 2017, Aslib J Inform Manag)
PLOS ONE: $984
PNAS: $3,513
Nature, Science: $43,783
Similar incentives in many other countries
• India
• Malaysia
• Korea
• Turkey
• Venezuela
• Chile
• Each lab has characteristic methodological power, Pr(+|T)
• Increasing power also increases false positives, unless effort is exerted
• Effort increases the time between results
• Novel negative results tough to publish
• Labs that publish more are more likely to have their methods “reproduced” in new labs
• Two phases: (1) Science, (2) Evolution
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.
proportionate to effort
• Novel hypotheses true at rate b = 0.1
Hypothesis Selection
[Figure: probability of new study (0.5 to 1) plotted against effort (0 to 100)]
Investigation
• Always yields a positive (+) or negative (–) result
• Power: W = Pr(+|T)
• False positive rate is a function of power and effort:
[Figure: false positive rate (0 to 1) plotted against power (0 to 1) for effort e = 1, e = 10, e = 75, with an "increased effort" label]
Communication
• Novel positive results can always be published
• Novel negative results unlikely to be published
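The formula relating false positive rate to power and effort did not survive in the slide text. One form that matches the plotted curves, and that is recalled here from the cited Smaldino & McElreath (2016) paper, is α = W / (1 + (1 – W) e). Treat this as an assumption to be checked against the paper; the function name is hypothetical.

```python
def false_positive_rate(power, effort):
    """Assumed form: alpha = W / (1 + (1 - W) * e).

    power  -- W = Pr(+|T), between 0 and 1
    effort -- e >= 1; more effort lowers the false positive rate
              at a given level of power
    """
    return power / (1.0 + (1.0 - power) * effort)


# At the effort levels shown in the figure, with power fixed at 0.8:
for e in (1, 10, 75):
    print(e, round(false_positive_rate(0.8, e), 3))
```

Under this assumed form, power 0.8 with effort 75 yields a false positive rate of about 0.05, while the same power with effort 1 yields about 0.67, which is the sense in which raising power without effort also raises false positives.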
Phase 2: Evolution
oldest “dies.”
2. From another randomly selected group of size d, the lab with the highest accumulated payoff “reproduces,” transmitting its methods to its “offspring” with mutation
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.
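The two steps of the evolution phase can be sketched as follows. Step 1 is truncated in the slide text, so the code assumes it mirrors step 2: a random group of size d is drawn and its oldest member dies. The `Lab` fields, the Gaussian mutation, its size `mutation_sd`, and the bounds on power and effort are all hypothetical choices for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Lab:
    power: float         # W = Pr(+|T)
    effort: float        # e
    age: int = 0
    payoff: float = 0.0  # accumulated payoff from publications


def evolution_step(labs, d, rng, mutation_sd=0.01):
    """Replace one lab in place: the oldest of d random labs dies,
    and the highest-payoff lab of another d random labs reproduces."""
    n = len(labs)
    # Step 1 (assumed): sample d labs; the oldest one dies.
    dying = max(rng.sample(range(n), d), key=lambda i: labs[i].age)
    # Step 2: sample another d labs; the one with the highest payoff reproduces.
    parent_idx = max(rng.sample(range(n), d), key=lambda i: labs[i].payoff)
    parent = labs[parent_idx]
    # Offspring inherits the parent's methods with a small mutation (assumed Gaussian).
    offspring = Lab(
        power=min(1.0, max(0.0, parent.power + rng.gauss(0.0, mutation_sd))),
        effort=max(1.0, parent.effort + rng.gauss(0.0, mutation_sd)),
    )
    labs[dying] = offspring
    return labs


rng = random.Random(1)
labs = [Lab(power=0.8, effort=75.0, age=i, payoff=float(i)) for i in range(10)]
evolution_step(labs, d=3, rng=rng)
print(len(labs))
```

Keeping the population size fixed and replacing the dying lab in place is a design choice of this sketch; both groups are drawn from the full population, so the dying lab may also be the parent.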
[Figure: statistical power plotted over time (axis ticks include 1995, 2005, 2015); R² = 0.00097]
Statistical power to detect small effects in the social + behavioral sciences: mean power = 0.24 (Smaldino & McElreath 2016)
Szucs & Ioannidis 2017: 100,000+ statistical tests from ~10,000 papers in psych, cog. neuro, and medical journals (2011–2014)
Labs replicate tests of previously published hypotheses at rate r
- All replications are publishable, worth 50% of the prestige of a novel finding (0.5 points)
- Successful replication boosts original authors’ prestige (+0.1 points)
- Failed replication severely damages original authors’ prestige (–100 points)
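The payoff scheme above can be written out as a small sketch. The point values come from the slide; the constant and function names are hypothetical, and treating a "successful" replication as one that returns a positive result is an assumption.

```python
REPLICATION_POINTS = 0.5   # replicating lab's prestige for any published replication
SUCCESS_BONUS = 0.1        # original authors, replication succeeds
FAILURE_PENALTY = -100.0   # original authors, replication fails


def replication_payoffs(replication_succeeded):
    """Return (replicating lab's gain, change to original authors' prestige)."""
    original_change = SUCCESS_BONUS if replication_succeeded else FAILURE_PENALTY
    return REPLICATION_POINTS, original_change


def expected_original_change(p_success):
    """Expected prestige change for the original authors if their
    finding is replicated and succeeds with probability p_success."""
    return p_success * SUCCESS_BONUS + (1.0 - p_success) * FAILURE_PENALTY


print(replication_payoffs(True))
print(replication_payoffs(False))
print(expected_original_change(0.5))
```

Because the penalty is so much larger than the bonus, the expected change for the original authors is negative unless a replication is almost certain to succeed.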
to the degradation of research methods
• Requires no fraud or ill intent, only that successful individuals transmit their methods
• Changing individual behavior not enough; improving science requires institutional change
• This is unlikely to be easy or happen quickly. But some promising changes are already happening.