Paul Smaldino
November 16, 2017

# The Natural Selection of Bad Science

Presentation given to graduate students at UC Merced in the NSF training program (NRT) on Intelligent Adaptive Systems, Nov 16, 2017.

## Transcript

1. The Natural Selection of Bad Science
Paul E. Smaldino
Assistant Professor Cognitive & Information Sciences
Quantitative & Systems Biology
University of California, Merced

2. Counterpoint:
Oncology: 47/53 ‘landmark’ studies did not replicate (Begley & Ellis 2012, Nature)
Neuroscience: Errors in popular statistical methods imply false positive rate of up to 70% (Eklund et al. 2016, PNAS)
Psychology: 61/100 studies in top journals failed to replicate (p < .05) (Open Science Collaboration 2015, Science)
Most fields? (Baker 2016, Nature)

3. False facts are highly injurious to the progress of science, for they often endure long; but false views, if supported by some evidence, do little harm, for every one takes a salutary pleasure in proving their falseness; and when this is done, one path towards error is closed and the road to truth is often at the same time opened.
–Darwin 1871 The Descent of Man

4. Science as Signal Detection for Facts

5. How do we find facts?
2. Investigation
Probability of result, given the real truth of the hypothesis (T = true, F = false):
Positive result (+): 1 – β if T, α if F
Negative result (–): β if T, 1 – α if F
[Figure key: hypotheses are True (T), False (F), or Unknown; results are Positive (+), Negative (–), or General case (+ or –).]

6. Assume
Probability of false positive finding is 5%. Pr(+|F) = 0.05
Probability of true positive finding is 50%. Pr(+|T) = 0.50
Your test yields a positive result. What is the probability this finding indicates a true hypothesis?
(Eddy 1982; Gigerenzer & Hoffrage 1995)

7. $$\Pr(T \mid +) = \frac{\Pr(+ \mid T)\,\Pr(T)}{\Pr(+ \mid T)\,\Pr(T) + \Pr(+ \mid F)\,\Pr(F)}$$
Assume
Probability of false positive finding is 5%. Pr(+|F) = 0.05
Probability of true positive finding is 50%. Pr(+|T) = 0.50
Your test yields a positive result. What is the probability this finding indicates a true hypothesis?

8. $$\Pr(T \mid +) = \frac{\Pr(+ \mid T)\,\Pr(T)}{\Pr(+ \mid T)\,\Pr(T) + \Pr(+ \mid F)\,\Pr(F)}$$
Need base rate
Assume
Probability of false positive finding is 5%. Pr(+|F) = 0.05
Probability of true positive finding is 50%. Pr(+|T) = 0.50
Your test yields a positive result. What is the probability this finding indicates a true hypothesis?
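
To make the "need base rate" point concrete, here is a minimal Python sketch of the Bayes' rule calculation using the slide's values Pr(+|T) = 0.50 and Pr(+|F) = 0.05. The function name `posterior_true` and the three base rates tried are illustrative assumptions, not values from the talk.

```python
def posterior_true(power: float, alpha: float, base_rate: float) -> float:
    """Pr(T|+) via Bayes' rule.

    power     = Pr(+|T), probability of a true positive
    alpha     = Pr(+|F), probability of a false positive
    base_rate = Pr(T), prior probability that a hypothesis is true
    """
    true_pos = power * base_rate
    false_pos = alpha * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Slide values for power and alpha; the base rates are hypothetical.
for b in (0.01, 0.1, 0.5):
    print(f"base rate {b:.2f}: Pr(T|+) = {posterior_true(0.50, 0.05, b):.3f}")
```

The same test result supports very different conclusions depending on the base rate, which is why the question on the slide cannot be answered from Pr(+|T) and Pr(+|F) alone.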

9. The Signal and the Noise
true false
base rate: proportion of hypotheses which are true

10. The Signal and the Noise
Positive
Negative
Pr(true|+) = 0.5

11. The Signal and the Noise
Positive
Negative

12. How do we find facts?
2. Investigation
Probability of result, given the real truth of the hypothesis (T = true, F = false):
Positive result (+): 1 – β if T, α if F
Negative result (–): β if T, 1 – α if F
[Figure key: hypotheses are True (T), False (F), or Unknown; results are Positive (+), Negative (–), or General case (+ or –).]

13. 1. Hypothesis Selection
A previously tested hypothesis is selected for replication with probability r, otherwise (with probability 1 – r) a novel (untested) hypothesis is selected. Novel hypotheses are true with probability b.
2. Investigation
Probability of result, given the real truth of the hypothesis (T = true, F = false):
Positive result (+): 1 – β if T, α if F
Negative result (–): β if T, 1 – α if F
3. Communication
Experimental results are communicated to the scientific community with a probability that depends upon both the experimental result (+, –) and whether the hypothesis was novel (N) or a replication (R). Communicated results join the set of tested hypotheses. Uncommunicated replications revert to their prior status.
Novel negative results: communicated with probability C_N–, otherwise (1 – C_N–) not communicated
Positive replications: communicated with probability C_R+, otherwise (1 – C_R+) not communicated
Negative replications: communicated with probability C_R–, otherwise (1 – C_R–) not communicated
Results that are not communicated go to the file drawer.
[Figure key: interior = true epistemic state, True (T), False (F); exterior = experimental evidence, Unknown, Positive (+), Negative (–), General case (+ or –).]
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.
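
The three stages on this slide can be sketched as a small Monte Carlo simulation. This is a simplified illustration, not the analytical model of the cited paper. The function name `simulate`, every default parameter value, and the choice to always communicate novel positive results are assumptions made for the example. Power is 1 – β, and the tally is the number of communicated positive results minus communicated negative results for a hypothesis.

```python
import random
from collections import defaultdict


def simulate(steps=20000, r=0.2, b=0.1, power=0.8, alpha=0.05,
             c_n_neg=0.0, c_r_pos=1.0, c_r_neg=1.0, seed=1):
    """Return {tally: proportion of hypotheses at that tally that are true}."""
    rng = random.Random(seed)
    tested = []  # each entry is [is_true, tally]
    for _ in range(steps):
        # 1. Hypothesis selection: replicate with probability r, else go novel.
        replication = bool(tested) and rng.random() < r
        hyp = rng.choice(tested) if replication else [rng.random() < b, 0]

        # 2. Investigation: positive with prob. 1 - beta if true, alpha if false.
        positive = rng.random() < (power if hyp[0] else alpha)

        # 3. Communication: probability depends on result and novelty.
        if replication:
            p_comm = c_r_pos if positive else c_r_neg
        else:
            p_comm = 1.0 if positive else c_n_neg  # assumption: novel + always out
        if rng.random() < p_comm:
            hyp[1] += 1 if positive else -1
            if not replication:
                tested.append(hyp)  # communicated results join the tested set
        # Uncommunicated results change nothing (the file drawer).

    counts = defaultdict(lambda: [0, 0])  # tally -> [number true, number total]
    for is_true, tally in tested:
        counts[tally][0] += is_true
        counts[tally][1] += 1
    return {t: n_true / n for t, (n_true, n) in sorted(counts.items())}


for tally, prop in simulate().items():
    print(f"tally {tally:+d}: proportion true = {prop:.2f}")
```

Rerunning with different values of `b`, `alpha`, or the communication probabilities gives a rough feel for how each parameter shifts the proportion of true hypotheses at a given tally.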

17. 1. Hypothesis Selection
Novel
hypotheses
Tested
hypotheses
A previously tested
hypothesis is selected
for replication with
probability r, otherwise
a novel (untested)
hypothesis is selected.
Novel hypotheses are
true with probability b.
1 – r r
2. Investigation
T
Real truth of hypothesis
Probability of result
1 – β α
β 1 – α
+

3. Communication
Experimental results are communicated to
the scientiﬁc community with a probability that
depends upon both the experimental result
(+, –) and whether the hypothesis was novel
(N) or a replication (R). Communicated
results join the set of tested hypotheses.
Uncommunicated replications revert to their
prior status.
1 – C
N–
C
N–
positive results
negative results
1 – C
R+
C
R+
New result communicated
New result not communicated
1 – C
R–
C
R–
File drawer
novel
replic.
novel
replic.
True (T)
False (T)
KEY
Interior = true epistemic state
Exterior = experimental evidence
Unknown
Positive (+)
Negative (–)
General case
General case (+ or –)
F
McElreath R & Smaldino PE (2015) Replication, communication, and the
population dynamics of scientiﬁc discovery. PLOS ONE 10(8):e0136088.

18. [Model diagram as on slide 13: 1. Hypothesis Selection, 2. Investigation, 3. Communication]
Recursions:
Solutions:
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.

19. Proportion true hypotheses at different numbers of net positive findings
[Figure: proportion true (y-axis, 0 to 1) with curves labeled 1, 3, and 5, plotted against (a) base rate, (b) replication rate, (c) power, (d) false-positive rate, (e) communicate neg. rep., (f) communicate pos. rep., (g) communicate neg. new; shown for an optimistic scenario and a pessimistic scenario.]
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.

20. Proportion true hypotheses at different numbers of net positive findings
[Figure: proportion true (y-axis, 0 to 1) with curves labeled 1, 3, and 5, plotted against (a) base rate, (b) replication rate, (c) power, (d) false-positive rate, (e) communicate neg. rep., (f) communicate pos. rep., (g) communicate neg. new; shown for an optimistic scenario and a pessimistic scenario.]
Base rate and false-positive rate are the most important factors in avoiding false facts
McElreath R & Smaldino PE (2015) Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088.

21. False facts more common when…
• Studies are underpowered
➡ False positives and ambiguous results
• Negative results aren’t published
➡ Lower information content of literature
• Misunderstanding of statistical techniques
➡ False positives and ambiguous results
• Surprising, easily understood results easiest to publish
➡ Lowering base rate
Why Isn’t Science Better?

22. Incentives not aligned with best practices
(Schillebeeckx et al. 2013, Nature Biotech.)
“Part of the problem is that no-one is incentivised to be right. Instead, scientists are incentivised to be productive and innovative.”
–Richard Horton, The Lancet, April 2015

23. Successful scientists are publishing more
Numbers of papers at hiring for CNRS evolutionary biologists (Brischoux & Angelier 2015, Scientometrics)
More papers, more co-authorship
[Figure: % single-authored papers in botany, zoology, ecology, and genetics]
[Figure: fraction of multi-author pubs in physics (Wardil & Hauert 2015, Phys Rev E)]

24. Successful scientists are publishing more
[Figure panels: publications per year; average impact factor]
(van Dijk et al. 2014, Curr Biol)

25. When a measure becomes a target, it ceases to be a good measure.
Donald T. Campbell (1916–1996)
The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.
–Campbell 1976

26. Publishing more can lead directly to some kinds of success
(Franzoni et al. 2011, Science)
Average amount paid to first author in China in 2016 (in USD) (Quan et al. 2017, Aslib J Inform Manag)
PLOS ONE: \$984
PNAS: \$3,513
Nature, Science: \$43,783
Similar incentives in many other countries
• India
• Malaysia
• Korea
• Turkey
• Venezuela
• Chile

27. (Vinkers et al. 2015, BMJ)
Relative frequency in PubMed abstracts, 1975-2014

28. Such a system can (and does) incentivize cheaters…
…but does not require cheating to be damaging.

29. An evolutionary model of science
• Population of N labs
• Each lab has characteristic methodological power, Pr(+|T)
• Increasing power also increases false positives, unless effort is
exerted
• Effort increases the time between results
• Novel negative results tough to publish
• Labs that publish more are more likely to have their methods
“reproduced” in new labs
• Two phases: (1) Science, (2) Evolution
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

30. Phase 1: Science
Hypothesis Selection
• New hypothesis tackled with probability inversely proportionate to effort
• Novel hypotheses true at rate b = 0.1
[Figure: probability of new study (0.5 to 1) vs. effort (0 to 100)]
Investigation
• Always yields a positive (+) or negative (–) result
• Power: W = Pr(+|T)
• False positive rate is a function of power and effort
[Figure: false positive rate vs. power (both 0 to 1), with curves for e = 1, e = 10, and e = 75; arrow labeled “increased effort”]
Communication
• Novel positive results can always be published
• Novel negative results unlikely to be published
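
The slide's two plots did not survive in this transcript, so the exact functions are not shown here. The sketch below uses functional forms that reproduce the qualitative shapes described: the false positive rate rises with power and falls with effort, and the probability of starting a new study falls with effort. Both forms and the constant `eta = 0.2` are assumptions for illustration; they appear to correspond to the cited Smaldino & McElreath (2016) paper, but check them against it before relying on them.

```python
import math


def false_positive_rate(power: float, effort: float) -> float:
    """Assumed form: alpha = W / (1 + (1 - W) * e)."""
    return power / (1.0 + (1.0 - power) * effort)


def prob_new_study(effort: float, eta: float = 0.2) -> float:
    """Assumed form: h(e) = 1 - eta * log10(e), for effort e >= 1."""
    return 1.0 - eta * math.log10(effort)


# Effort levels taken from the plot legend (e = 1, 10, 75); power 0.8 is hypothetical.
for e in (1, 10, 75):
    print(f"e = {e:>2}: alpha = {false_positive_rate(0.8, e):.3f}, "
          f"Pr(new study) = {prob_new_study(e):.3f}")
```

Under these assumed forms, a lab can reach a low false positive rate at high power only by paying for it with fewer studies per unit time, which is the trade-off the model turns on.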

31. Phase 2: Evolution
1. From a randomly selected group of size d, the oldest “dies.”
2. From another randomly selected group of size d, the lab with the highest accumulated payoff “reproduces,” transmitting its methods to its “offspring” with mutation
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.
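
The two steps above can be sketched as a single update on a list of labs. This is a minimal illustration under stated assumptions. The `Lab` class, the Gaussian mutation on effort, the floor of 1.0 on effort, and replacing the dead lab with the offspring (which keeps the population size constant) are choices made for the example, not details given on the slide.

```python
import random
from dataclasses import dataclass


@dataclass
class Lab:
    effort: float        # the heritable "method" in this sketch
    age: int = 0
    payoff: float = 0.0  # accumulated payoff from publications


def evolve_step(labs, d=10, mutation_sd=1.0, rng=random):
    """One death-and-reproduction event; modifies `labs` in place."""
    # 1. Death: the oldest lab in a random group of size d.
    group = rng.sample(range(len(labs)), d)
    dying = max(group, key=lambda i: labs[i].age)

    # 2. Reproduction: the highest-payoff lab in another random group of size d.
    group = rng.sample(range(len(labs)), d)
    parent = labs[max(group, key=lambda i: labs[i].payoff)]

    # The offspring inherits the parent's effort with mutation (assumed Gaussian).
    child_effort = max(1.0, parent.effort + rng.gauss(0.0, mutation_sd))
    labs[dying] = Lab(effort=child_effort)


rng = random.Random(0)
labs = [Lab(effort=75.0, age=i, payoff=rng.random()) for i in range(100)]
evolve_step(labs, rng=rng)
print(len(labs), min(lab.effort for lab in labs))
```

Because selection acts only on accumulated payoff, any heritable method that raises publication counts spreads, whether or not it produces true results.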

32. The natural selection of bad science
[Figure panels: Power evolves, constant effort; Effort evolves, constant power]
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

33. 2016
1967

34. 1962
1989
1990

35. Statistical power to detect small effects in the social + behavioral sciences
[Figure: statistical power (0 to 1) by year, 1955 to 2015; R2 = 0.00097; mean power = 0.24]
(Smaldino & McElreath 2016)
Szucs & Ioannidis 2017: 100,000+ statistical tests from ~10,000 papers in psych, cog. neuro, and medical journals (2011-2014)

36. Replication to the Rescue?
- Labs replicate tests of previously published hypotheses at rate r
- All replications are publishable, worth 50% prestige of novel finding (0.5 points)
- Successful replication boosts original authors’ prestige (+0.1 points)
- Failed replication severely damages original authors’ prestige (–100 points)
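
The payoff scheme above implies an expected change in the original authors' prestige each time their finding is replicated. The sketch below computes it from the point values on the slide. It assumes the original published result was positive and that a replication "succeeds" when it is also positive. The function name and the two example probabilities are hypothetical.

```python
REPLICATION_PAYOFF = 0.5   # replicator's reward: 50% of a novel finding
SUCCESS_BONUS = 0.1        # original authors, successful replication
FAILURE_PENALTY = -100.0   # original authors, failed replication


def expected_original_author_change(p_replication_positive: float) -> float:
    """Expected prestige change for the original authors per replication."""
    return (p_replication_positive * SUCCESS_BONUS
            + (1.0 - p_replication_positive) * FAILURE_PENALTY)


# Hypothetical cases: a true finding replicated at power 0.8,
# and a false finding replicated at false positive rate 0.05.
print(expected_original_author_change(0.8))
print(expected_original_author_change(0.05))
```

With a penalty this large relative to the bonus, the expected change is negative unless replications almost always succeed, so this is a deliberately harsh test of whether replication alone can protect good methods.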

37. Replication ≠ Salvation
Smaldino PE & McElreath R (2016) The natural selection of bad science. Royal Society Open Science 3: 160384.

38. Take homes
• Incentives to boost quantitative metrics can lead to
• Requires no fraud or ill intent, only that successful
individuals transmit their methods
• Changing individual behavior not enough —
improving science requires institutional change
• This is unlikely to be easy or happen quickly. But
some promising changes are already happening

39. http://www.ascb.org/dora/
https://cos.io/