The Natural Selection of Bad Science

Paul E. Smaldino
Assistant Professor, Cognitive & Information Sciences, Quantitative & Systems Biology
University of California, Merced

Counterpoint:
• Oncology: 47/53 ‘landmark’ studies did not replicate (Begley & Ellis 2012, Nature)
• Neuroscience: Errors in popular statistical methods imply false positive rate of up to 70% (Eklund et al. 2016, PNAS)
• Psychology: 61/100 studies in top journals failed to replicate (p < .05) (Open Science Collaboration 2015, Science)
• Most fields? (Baker 2016, Nature)

False facts are highly injurious to the progress of science,
for they often endure long; but false views, if supported by some evidence, do little harm, for every one takes a salutary pleasure in proving their falseness; and when this is done, one path towards error is closed and the road to truth is often at the same time opened. –Darwin 1871 The Descent of Man

Science as Signal Detection for Facts

How do we find facts?
2. Investigation
[Figure: the real truth of a hypothesis is either true (T) or false (F). A true hypothesis yields a positive result (+) with probability 1 – β and a negative result (–) with probability β; a false hypothesis yields a positive result with probability α and a negative result with probability 1 – α.]

Assume:
Probability of false positive finding is 5%. Pr(+|F) = 0.05
Probability of true positive finding is 50%. Pr(+|T) = 0.50
Your test yields a positive result. What is the probability this finding indicates a true hypothesis?
Most common answer: 0.95 (Eddy 1982; Gigerenzer & Hoffrage 1995)

Your test yields a positive result. What is the probability this finding indicates a true hypothesis?
$$\Pr(T \mid +) = \frac{\Pr(+ \mid T)\,\Pr(T)}{\Pr(+ \mid T)\,\Pr(T) + \Pr(+ \mid F)\,\Pr(F)}$$
Assume Pr(+|F) = 0.05 and Pr(+|T) = 0.50. Need base rate, Pr(T).
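A small sketch of why the base rate matters, using Bayes' rule with the slide's Pr(+|T) = 0.50 and Pr(+|F) = 0.05. The function name and the three base rates tried are illustrative assumptions, not values from the talk.

```python
def posterior_true(base_rate, power=0.50, false_positive_rate=0.05):
    """Pr(T|+): probability that a positive finding reflects a true hypothesis."""
    true_positives = power * base_rate
    false_positives = false_positive_rate * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# Base rates below are assumed for illustration only.
for b in (0.01, 0.1, 0.5):
    print(f"base rate {b:.2f} -> Pr(T|+) = {posterior_true(b):.3f}")
```

With the same test, the answer moves from under 0.1 to over 0.9 as the base rate goes from 0.01 to 0.5, so the intuitive answer of 0.95 cannot be right in general.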

The Signal and the Noise
Base rate: proportion of hypotheses which are true.
[Figure: a population of true and false hypotheses, each tested and marked positive or negative. In the example shown, Pr(true|+) = 0.5.]

1. Hypothesis Selection
A previously tested hypothesis is selected for replication with probability r; otherwise (probability 1 – r) a novel (untested) hypothesis is selected. Novel hypotheses are true with probability b.
2. Investigation
A true hypothesis (T) yields a positive result (+) with probability 1 – β and a negative result (–) with probability β. A false hypothesis (F) yields a positive result with probability α and a negative result with probability 1 – α.
3. Communication
Experimental results are communicated to the scientific community with a probability that depends upon both the experimental result (+, –) and whether the hypothesis was novel (N) or a replication (R): C_N– for novel negative results, C_R+ for positive replications, and C_R– for negative replications. Communicated results join the set of tested hypotheses. Uncommunicated replications revert to their prior status. Results that are not communicated go to the file drawer.
KEY: Interior = true epistemic state; Exterior = experimental evidence.
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.
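The three steps can be sketched as a stochastic simulation. This is a minimal illustration, not the authors' code: the cited paper works with recursions rather than this sampling loop, all parameter values are assumed, and novel positive results are assumed to always be communicated.

```python
import random


def simulate(steps=20000, b=0.1, r=0.2, power=0.8, alpha=0.05,
             c_n_neg=0.1, c_r_pos=1.0, c_r_neg=1.0, seed=1):
    """Return tested hypotheses as [is_true, net_positive_tally] pairs.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    tested = []
    for _ in range(steps):
        # 1. Hypothesis selection: replicate with probability r, else go novel.
        replication = bool(tested) and rng.random() < r
        hyp = rng.choice(tested) if replication else [rng.random() < b, 0]
        # 2. Investigation: power (1 - beta) if true, alpha if false.
        positive = rng.random() < (power if hyp[0] else alpha)
        # 3. Communication: probability depends on result and novelty.
        if replication:
            p_comm = c_r_pos if positive else c_r_neg
        else:
            p_comm = 1.0 if positive else c_n_neg  # assumed: novel positives always out
        if rng.random() < p_comm:
            hyp[1] += 1 if positive else -1
            if not replication:
                tested.append(hyp)
        # Otherwise the result goes to the file drawer and nothing changes.
    return tested


def proportion_true(tested, tally):
    """Share of true hypotheses among those with the given net positive tally."""
    group = [is_true for is_true, t in tested if t == tally]
    return sum(group) / len(group) if group else float("nan")


tested = simulate()
for tally in (1, 2, 3):
    print(tally, proportion_true(tested, tally))
```

Varying `b` and `alpha` in this sketch is one way to explore how the share of true hypotheses at a given tally responds to the base rate and the false-positive rate.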

Recursions: [equations not recoverable]
Solutions: [equations not recoverable]
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.

Proportion true hypotheses at different numbers of net positive findings
[Figure: panels (a) to (g) plot proportion true against base rate, replication rate, power, false-positive rate, communicate neg. rep., communicate pos. rep., and communicate neg. new, with curves labeled 1, 3, and 5, under an optimistic and a pessimistic scenario.]
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.

Proportion true hypotheses at different numbers of net positive findings
Base rate and false-positive rate are the most important factors in avoiding false facts.
[Figure: panels (a) to (g) plot proportion true against base rate, replication rate, power, false-positive rate, communicate neg. rep., communicate pos. rep., and communicate neg. new, with curves labeled 1, 3, and 5, under an optimistic and a pessimistic scenario.]
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.

False facts more common when…
• Studies are underpowered ➡ False positives and ambiguous results
• Negative results aren’t published ➡ Lower information content of literature
• Misunderstanding of statistical techniques ➡ False positives and ambiguous results
• Surprising, easily understood results easiest to publish ➡ Lowering base rate
Why Isn’t Science Better?

Incentives not aligned with best practices (Schillebeeckx et al. 2013, Nature Biotech.)
“Part of the problem is that no-one is incentivised to be right. Instead, scientists are incentivised to be productive and innovative.”
–Richard Horton, The Lancet, April 2015

Successful scientists are publishing more
• Numbers of papers at hiring for CNRS evolutionary biologists (Brischoux & Angelier 2015, Scientometrics)
• More papers, more co-authorship: % single-authored papers in botany, zoology, ecology, genetics (Nabout et al. 2015, Scientometrics); fraction of multi-author pubs in physics (Wardil & Hauert 2015, Phys Rev E)

Successful scientists are publishing more
[Figure: publications per year and average impact factor (van Dijk et al. 2014, Curr Biol)]

When a measure becomes a target, it ceases to be a good measure.
Donald T. Campbell, 1916–1996
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” –Campbell 1976

Publishing more can lead directly to some kinds of success (Franzoni et al. 2011, Science)
Average amount paid to first author in China in 2016 (in USD) (Quan et al. 2017, Aslib J Inform Manag)
• PLOS ONE: $984
• PNAS: $3,513
• Nature, Science: $43,783
Similar incentives in many other countries
• India
• Malaysia
• Korea
• Turkey
• Venezuela
• Chile

(Vinkers et al. 2015, BMJ) Relative frequency in PubMed abstracts,
1975-2014

Such a system can (and does) incentivize cheaters… …but does
not require cheating to be damaging.

An evolutionary model of science
• Population of N labs
• Each lab has characteristic methodological power, Pr(+|T)
• Increasing power also increases false positives, unless effort is exerted
• Effort increases the time between results
• Novel negative results tough to publish
• Labs that publish more are more likely to have their methods “reproduced” in new labs
• Two phases: (1) Science, (2) Evolution
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

Phase 1: Science
Hypothesis Selection
• New hypothesis tackled with probability inversely proportionate to effort
• Novel hypotheses true at rate b = 0.1
[Figure: probability of new study (0.5 to 1) vs. effort (0 to 100)]
Investigation
• Always yields a positive (+) or negative (–) result
• Power: W = Pr(+|T)
• False positive rate is a function of power and effort
[Figure: false positive rate vs. power for e = 1, e = 10, e = 75, with an arrow marking increased effort]
Communication
• Novel positive results can always be published
• Novel negative results unlikely to be published
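One lab's science phase can be sketched as below. The two functional forms are assumptions recalled from the cited 2016 paper, not stated on the slide: false positive rate α = W / (1 + (1 – W)e) and probability of starting a new study 1 – η log10(e) with η = 0.2. Treating novel negative results as never published (`c_neg = 0.0`) is also an assumption.

```python
import math
import random


def false_positive_rate(power, effort):
    # Assumed form: alpha = W / (1 + (1 - W) * e). More effort lowers alpha.
    return power / (1.0 + (1.0 - power) * effort)


def prob_new_study(effort, eta=0.2):
    # Assumed form: 1 - eta * log10(e). More effort means fewer new studies.
    return 1.0 - eta * math.log10(effort)


def science_step(power, effort, rng, b=0.1, c_neg=0.0):
    """One time step for one lab; returns 1 if a paper is published, else 0."""
    if rng.random() >= prob_new_study(effort):
        return 0  # still busy with the previous study
    is_true = rng.random() < b  # novel hypotheses are true at rate b
    p_positive = power if is_true else false_positive_rate(power, effort)
    if rng.random() < p_positive:
        return 1  # novel positive results can always be published
    return 1 if rng.random() < c_neg else 0  # negatives rarely published


rng = random.Random(0)
for effort in (1, 10, 75):
    papers = sum(science_step(0.8, effort, rng) for _ in range(10000))
    print(f"effort {effort}: alpha = {false_positive_rate(0.8, effort):.3f}, "
          f"papers = {papers}")
```

Under these assumptions a low-effort lab publishes far more, mostly false positives, which is the raw material that selection then acts on.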

Phase 2: Evolution
1. From a randomly selected group of size d, the oldest “dies.”
2. From another randomly selected group of size d, the lab with the highest accumulated payoff “reproduces,” transmitting its methods to its “offspring” with mutation.
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.
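The two evolution steps can be sketched as follows. The lab representation (a dict with `age`, `payoff`, `power`, `effort`), the choice to mutate effort only, and the mutation settings `mu`, `sd`, and the [1, 100] bounds are all hypothetical choices for illustration.

```python
import random


def mutate_effort(effort, rng, mu=0.01, sd=1.0):
    # Assumed mutation: with probability mu add Gaussian noise, then clamp.
    if rng.random() < mu:
        effort += rng.gauss(0.0, sd)
    return min(100.0, max(1.0, effort))


def evolution_step(labs, d, rng):
    """One death and one birth; the number of labs stays constant."""
    # 1. From a random group of size d, the oldest lab dies.
    group = rng.sample(range(len(labs)), d)
    oldest = max(group, key=lambda i: labs[i]["age"])
    del labs[oldest]
    # 2. From another random group of size d, the highest-payoff lab reproduces.
    group = rng.sample(range(len(labs)), d)
    parent = labs[max(group, key=lambda i: labs[i]["payoff"])]
    labs.append({
        "age": 0,
        "payoff": 0.0,
        "power": parent["power"],  # held constant in this sketch
        "effort": mutate_effort(parent["effort"], rng),
    })


rng = random.Random(0)
labs = [{"age": i, "payoff": float(i % 7), "power": 0.8, "effort": 75.0}
        for i in range(20)]
evolution_step(labs, d=5, rng=rng)
print(len(labs), labs[-1])
```

Because reproduction depends only on accumulated payoff, any method that yields more publications spreads, whether or not its results are true.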

The natural selection of bad science
[Figure: two panels. Power evolves, constant effort. Effort evolves, constant power.]
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

Statistical power to detect small effects in the social + behavioral sciences
[Figure: statistical power (0 to 1) vs. year (1955 to 2015); R² = 0.00097, mean power = 0.24 (Smaldino & McElreath 2016)]
Szucs & Ioannidis 2017: 100,000+ statistical tests from ~10,000 papers in psych, cog. neuro, and medical journals (2011-2014)

Replication to the Rescue?
Adding replication to the model
- Labs replicate tests of previously published hypotheses at rate r
- All replications are publishable, worth 50% prestige of novel finding (0.5 points)
- Successful replication boosts original authors’ prestige (+0.1 points)
- Failed replication severely damages original authors’ prestige (–100 points)
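The prestige rules above amount to a small bookkeeping update. In this sketch the point values come from the slide (with a novel finding worth 1 point, as implied by the 50% figure); the function name and the payoff dictionary keyed by lab name are hypothetical.

```python
NOVEL_PAYOFF = 1.0          # implied by "50% prestige of novel finding (0.5 points)"
REPLICATION_PAYOFF = 0.5    # every replication is publishable
SUCCESS_BONUS = 0.1         # original authors, successful replication
FAILURE_PENALTY = -100.0    # original authors, failed replication


def apply_replication(payoffs, replicator, original_author, replicated):
    """Update prestige after `replicator` retests a finding by `original_author`."""
    payoffs[replicator] += REPLICATION_PAYOFF
    payoffs[original_author] += SUCCESS_BONUS if replicated else FAILURE_PENALTY
    return payoffs


payoffs = {"lab_a": 3 * NOVEL_PAYOFF, "lab_b": 0.0}
apply_replication(payoffs, "lab_b", "lab_a", replicated=True)
apply_replication(payoffs, "lab_b", "lab_a", replicated=False)
print(payoffs)
```

A single failed replication wipes out the prestige of many novel findings, so the penalty in this version of the model is deliberately harsh.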

Replication ≠ Salvation
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

Take homes
• Incentives to boost quantitative metrics can lead to the degradation of research methods
• Requires no fraud or ill intent, only that successful individuals transmit their methods
• Changing individual behavior not enough; improving science requires institutional change
• This is unlikely to be easy or happen quickly. But some promising changes are already happening

http://www.ascb.org/dora/ https://cos.io/ http://bulliedintobadscience.org/

Science is still awesome. “The ﬁrst principle is that you
must not fool yourself—and you are the easiest person to fool.”
