The Natural Selection of Bad Science
Paul E. Smaldino
Assistant Professor
Cognitive & Information Sciences
Quantitative & Systems Biology
University of California, Merced

Counterpoint:
• Oncology: 47/53 ‘landmark’ studies did not replicate (Begley & Ellis 2012, Nature)
• Neuroscience: Errors in popular statistical methods imply false positive rate of up to 70% (Eklund et al. 2016, PNAS)
• Psychology: 61/100 studies in top journals failed to replicate (p < .05) (Open Science Collaboration 2015, Science)
• Most fields? (Baker 2016, Nature)

False facts are highly injurious to the progress of science, for they often endure long; but false views, if supported by some evidence, do little harm, for every one takes a salutary pleasure in proving their falseness; and when this is done, one path towards error is closed and the road to truth is often at the same time opened. –Darwin 1871 The Descent of Man

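How do we find facts?
2. Investigation
[Figure: a hypothesis whose real truth is True (T) or False (F) is investigated. A true hypothesis yields a positive result (+) with probability 1 – β and a negative result (–) with probability β. A false hypothesis yields a positive result with probability α and a negative result with probability 1 – α.]
Key: True (T), False (F), Unknown; Positive (+), Negative (–), General case (+ or –)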

Assume
• Probability of false positive finding is 5%: Pr(+|F) = 0.05
• Probability of true positive finding is 50%: Pr(+|T) = 0.50
Your test yields a positive result. What is the probability this finding indicates a true hypothesis?
Most common answer: 0.95 (Eddy 1982; Gigerenzer & Hoffrage 1995)

Bayes' rule for the same question, with Pr(+|F) = 0.05 and Pr(+|T) = 0.50:

\[
\Pr(T \mid +) = \frac{\Pr(+ \mid T)\,\Pr(T)}{\Pr(+ \mid T)\,\Pr(T) + \Pr(+ \mid F)\,\Pr(F)}
\]

Need base rate: Pr(T), with Pr(F) = 1 – Pr(T).
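To make the role of the base rate concrete, here is a minimal Python sketch of Bayes' rule with the numbers from this slide. The function name `posterior_true` and the base rates 0.01 and 0.5 are illustrative assumptions; the base rate 0.1 is the value of b used later in the evolutionary model.

```python
def posterior_true(power: float, false_pos: float, base_rate: float) -> float:
    """Pr(T|+): probability that a hypothesis is true given a positive result."""
    true_mass = power * base_rate               # Pr(+|T) Pr(T)
    false_mass = false_pos * (1.0 - base_rate)  # Pr(+|F) Pr(F)
    return true_mass / (true_mass + false_mass)


# Pr(+|T) = 0.50 and Pr(+|F) = 0.05 come from the slide.
# The base rates below are illustrative assumptions.
for b in (0.01, 0.1, 0.5):
    print(f"base rate {b}: Pr(T|+) = {posterior_true(0.50, 0.05, b):.3f}")
```

With a base rate of 0.1, barely more than half of positive findings indicate a true hypothesis, far from the commonly given answer of 0.95.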


1. Hypothesis Selection
A previously tested hypothesis is selected for replication with probability r, otherwise (with probability 1 – r) a novel (untested) hypothesis is selected. Novel hypotheses are true with probability b.

2. Investigation
A true hypothesis (T) yields a positive result (+) with probability 1 – β and a negative result (–) with probability β. A false hypothesis (F) yields a positive result with probability α and a negative result with probability 1 – α.

3. Communication
Experimental results are communicated to the scientific community with a probability that depends upon both the experimental result (+, –) and whether the hypothesis was novel (N) or a replication (R). Communicated results join the set of tested hypotheses. Uncommunicated replications revert to their prior status. Results that are not communicated go to the file drawer.
• Novel negative results: communicated with probability C_N–, not communicated with probability 1 – C_N–
• Positive replications: communicated with probability C_R+, not communicated with probability 1 – C_R+
• Negative replications: communicated with probability C_R–, not communicated with probability 1 – C_R–

KEY: Interior = true epistemic state; Exterior = experimental evidence. True (T), False (F), Unknown; Positive (+), Negative (–), General case (+ or –).

McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.
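The three steps can be sketched as a small Monte Carlo simulation. This is a simplified illustration, not the published model: it covers novel hypotheses only (r = 0), assumes novel positive results are always communicated, and every function name and parameter value here is a hypothetical choice made for the example.

```python
import random


def simulate_novel_findings(n, b, power, alpha, c_neg, seed=0):
    """Simulate n novel hypotheses; return published (is_true, is_positive) pairs."""
    rng = random.Random(seed)
    published = []
    for _ in range(n):
        is_true = rng.random() < b                 # 1. novel hypothesis true with probability b
        p_positive = power if is_true else alpha   # 2. power = 1 - beta; alpha = false positive rate
        is_positive = rng.random() < p_positive
        # 3. Assumption: positives are always communicated, negatives with probability c_neg.
        p_communicated = 1.0 if is_positive else c_neg
        if rng.random() < p_communicated:
            published.append((is_true, is_positive))
    return published


def share_true_among_positives(published):
    """Fraction of published positive results whose hypothesis is actually true."""
    truths = [is_true for is_true, is_positive in published if is_positive]
    return sum(truths) / len(truths) if truths else float("nan")


results = simulate_novel_findings(n=200_000, b=0.1, power=0.5, alpha=0.05, c_neg=0.1)
print(round(share_true_among_positives(results), 3))
```

Under these assumptions the simulated share of true hypotheses among published positives should sit close to the Bayes' rule value, b(1 – β) / [b(1 – β) + (1 – b)α].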



Why Isn’t Science Better?
False facts more common when…
• Studies are underpowered ➡ False positives and ambiguous results
• Negative results aren’t published ➡ Lower information content of literature
• Misunderstanding of statistical techniques ➡ False positives and ambiguous results
• Surprising, easily understood results easiest to publish ➡ Lowering base rate

Incentives not aligned with best practices (Schillebeeckx et al. 2013, Nature Biotech.)
“Part of the problem is that no-one is incentivised to be right. Instead, scientists are incentivised to be productive and innovative.”
–Richard Horton, The Lancet, April 2015

When a measure becomes a target, it ceases to be a good measure.
Donald T. Campbell, 1916–1996
The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.
–Campbell 1976

Publishing more can lead directly to some kinds of success (Franzoni et al. 2011, Science)
Average amount paid to first author in China in 2016 (in USD) (Quan et al. 2017, Aslib J Inform Manag):
• PLOS ONE: $984
• PNAS: $3,513
• Nature, Science: $43,783
Similar incentives in many other countries
• India • Malaysia • Korea • Turkey • Venezuela • Chile

An evolutionary model of science
• Population of N labs
• Each lab has characteristic methodological power, Pr(+|T)
• Increasing power also increases false positives, unless effort is exerted
• Effort increases the time between results
• Novel negative results tough to publish
• Labs that publish more are more likely to have their methods “reproduced” in new labs
• Two phases: (1) Science, (2) Evolution
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

Phase 1: Science
Hypothesis Selection
• New hypothesis tackled with a probability that decreases as effort increases
• Novel hypotheses true at rate b = 0.1
[Figure: probability of new study (0.5 to 1) versus effort (0 to 100)]
Investigation
• Always yields a positive (+) or negative (–) result
• Power: W = Pr(+|T)
• False positive rate is a function of power and effort
[Figure: false positive rate (0 to 1) versus power (0 to 1), with curves for e = 1, e = 10, and e = 75; an arrow marks the direction of increased effort]
Communication
• Novel positive results can always be published
• Novel negative results unlikely to be published
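The slide shows the two functions only as plots. The sketch below uses the functional forms as I recall them from Smaldino & McElreath (2016); treat both formulas and the constant η = 0.2 as assumptions to check against the paper. The names `false_positive_rate`, `prob_new_study`, and `ETA` are mine.

```python
import math

ETA = 0.2  # assumed value: how strongly effort slows the rate of new studies


def false_positive_rate(power: float, effort: float) -> float:
    """Assumed form: alpha = W / (1 + (1 - W) * e)."""
    return power / (1.0 + (1.0 - power) * effort)


def prob_new_study(effort: float, eta: float = ETA) -> float:
    """Assumed form: h(e) = 1 - eta * log10(e), for effort e >= 1."""
    return 1.0 - eta * math.log10(effort)


# Effort levels match the curves on the slide (e = 1, 10, 75).
for e in (1, 10, 75):
    print(f"e = {e}: alpha at W = 0.8 is {false_positive_rate(0.8, e):.3f}, "
          f"Pr(new study) = {prob_new_study(e):.3f}")
```

Under these assumed forms, raising power without effort raises the false positive rate toward 1, while high effort keeps it low at the cost of fewer new studies per time step.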

Phase 2: Evolution
1. From a randomly selected group of size d, the oldest “dies.”
2. From another randomly selected group of size d, the lab with the highest accumulated payoff “reproduces,” transmitting its methods to its “offspring” with mutation.
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.
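The two steps above can be sketched in Python as follows. The `Lab` fields, the Gaussian mutation on effort, its standard deviation, and the floor of 1 on effort are all assumptions made for illustration; the slide says only that methods are transmitted "with mutation."

```python
import random
from dataclasses import dataclass


@dataclass
class Lab:
    power: float
    effort: float
    age: int = 0
    payoff: float = 0.0


def evolve_once(labs, d, rng, effort_sd=1.0):
    """One death-and-reproduction step; returns a new population of the same size."""
    # 1. From a random group of size d, the oldest lab dies.
    group = rng.sample(labs, min(d, len(labs)))
    oldest = max(group, key=lambda lab: lab.age)
    survivors = [lab for lab in labs if lab is not oldest]
    # 2. From another random group of size d, the highest-payoff lab reproduces.
    group = rng.sample(survivors, min(d, len(survivors)))
    parent = max(group, key=lambda lab: lab.payoff)
    # Assumption: only effort mutates, via Gaussian noise, and never drops below 1.
    child_effort = max(1.0, parent.effort + rng.gauss(0.0, effort_sd))
    child = Lab(power=parent.power, effort=child_effort)
    return survivors + [child]


rng = random.Random(42)
population = [Lab(power=0.8, effort=75.0, age=i, payoff=float(i % 3)) for i in range(10)]
population = evolve_once(population, d=4, rng=rng)
print(len(population))
```

The offspring starts with age 0 and no accumulated payoff, so selection acts only on what each lab publishes during the Science phase.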

The natural selection of bad science
[Figure panels: Power evolves, constant effort; Effort evolves, constant power]
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

Statistical power to detect small effects in the social + behavioral sciences: mean power = 0.24 (Smaldino & McElreath 2016)
[Figure: statistical power (0 to 1) by year, 1955 to 2015; R² = 0.00097]
Szucs & Ioannidis 2017: 100,000+ statistical tests from ~10,000 papers in psych, cog. neuro, and medical journals (2011-2014)

Replication to the Rescue?
Adding replication to the model
- Labs replicate tests of previously published hypotheses at rate r
- All replications are publishable, worth 50% of the prestige of a novel finding (0.5 points)
- Successful replication boosts original authors’ prestige (+0.1 points)
- Failed replication severely damages original authors’ prestige (–100 points)
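The payoff rules above can be written as a small bookkeeping function. The point values come from the slide; the dictionary of payoffs, the lab names, and the function `apply_replication` are hypothetical scaffolding for this sketch.

```python
REPLICATION_VALUE = 0.5    # any published replication, per the slide
SUCCESS_BONUS = 0.1        # to the original authors when the replication succeeds
FAILURE_PENALTY = -100.0   # to the original authors when the replication fails


def apply_replication(payoffs, replicator, original, succeeded):
    """Update accumulated payoffs in place after one published replication."""
    payoffs[replicator] += REPLICATION_VALUE
    payoffs[original] += SUCCESS_BONUS if succeeded else FAILURE_PENALTY


# Hypothetical labs: lab_b replicates a finding first published by lab_a.
payoffs = {"lab_a": 3.0, "lab_b": 0.0}
apply_replication(payoffs, replicator="lab_b", original="lab_a", succeeded=False)
print(payoffs)
```

A single failed replication wipes out far more prestige than a lab can plausibly accumulate from novel findings, which is what makes this a deliberately severe test of whether replication alone can rescue methods.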

Take homes
• Incentives to boost quantitative metrics can lead to the degradation of research methods
• Requires no fraud or ill intent, only that successful individuals transmit their methods
• Changing individual behavior is not enough; improving science requires institutional change
• This is unlikely to be easy or happen quickly. But some promising changes are already happening