OPINION

Opinion: Reproducible research can still be wrong: Adopting a prevention approach

Jeffrey T. Leek and Roger D. Peng
Associate Professor of Biostatistics and Oncology (Leek) and Associate Professor of Biostatistics (Peng), Johns Hopkins University, Baltimore, MD
Reproducibility—the ability to recompute results—and replicability—the chances other experimenters will achieve a consistent result—are two foundational characteristics of successful scientific research. Consistent findings from independent investigators are the primary means by which scientific evidence accumulates for or against a hypothesis. Yet, of late, there has been a crisis of confidence among researchers worried about the rate at which studies are either reproducible or replicable. To maintain the integrity of scientific research and the public’s trust in science, the scientific community must ensure reproducibility and replicability by engaging in a more preventative approach that greatly expands data analysis education and routinely uses software tools.
We define reproducibility as the ability to recompute data analytic results given an observed dataset and knowledge of the data analysis pipeline. The replicability of a study is the chance that an independent experiment targeting the same scientific question will produce a consistent result (1). Concerns among scientists about both have gained significant traction recently due in part to a statistical argument that suggested most published scientific results may be false positives (2). At the same time, there have been some very public failings of reproducibility across a range of disciplines from cancer genomics (3) to economics (4), and the data for many publications have not been made publicly available, raising doubts about the quality of data analyses. Popular press articles have raised questions about the reproducibility of all scientific research (5), and the US Congress has convened hearings focused on the transparency of scientific research (6). The result is that much of the scientific enterprise has been called into question, putting funding and hard-won scientific truths at risk.
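The statistical argument cited as ref. 2 rests on the positive predictive value of a significant finding: when only a small fraction of tested hypotheses are true, conventional error rates still produce a large share of false-positive “discoveries.” The short Python sketch below illustrates that arithmetic; the significance level, power, and prior probabilities are assumptions chosen for illustration, not values taken from the cited work.

# A sketch of the kind of calculation behind the false-positive argument. The
# error rates and prior probabilities are illustrative assumptions, not values
# reported in the cited work.

def positive_predictive_value(prior_true, alpha=0.05, power=0.80):
    """P(the effect is real | the test was statistically significant)."""
    true_positives = prior_true * power          # real effects that reach significance
    false_positives = (1 - prior_true) * alpha   # null effects that reach significance
    return true_positives / (true_positives + false_positives)

if __name__ == "__main__":
    for prior in (0.50, 0.10, 0.01):
        print(f"P(hypothesis true) = {prior:.2f} -> "
              f"PPV = {positive_predictive_value(prior):.2f}")

Under these assumed rates, a 1% prior probability of a real effect leaves fewer than one in seven significant results reflecting a true finding.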
From a computational perspective, there are three major components to a reproducible and replicable study: (i) the raw data from the experiment are available, (ii) the statistical code and documentation to reproduce the analysis are available, and (iii) a correct data analysis must be performed. Recent cultural shifts in genomics and other areas have had a positive impact on data and code availability. Journals are starting to require data availability as a condition for publication (7), and centralized databases such as the National Center for Biotechnology Information’s Gene Expression Omnibus are being created for depositing data generated by publicly funded scientific experiments. New computational tools such as knitr, iPython notebook, LONI, and Galaxy (8) have simplified the process of distributing reproducible data analyses.
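As a concrete illustration of components (i) and (ii), and not a prescription drawn from the article, the Python sketch below shows the minimal shape of a distributable analysis: a script that starts from the raw data file, fingerprints it so readers can confirm they are working from the same inputs, and recomputes the reported summaries end to end. The file name, column name, and summary statistics are hypothetical placeholders.

# A minimal sketch of components (i) and (ii): raw data plus a script that
# recomputes the reported results from it. The file name, column name, and
# summaries are hypothetical placeholders, not from the article.
import csv
import hashlib
import statistics

RAW_DATA = "raw_measurements.csv"  # hypothetical raw data with a "value" column

def fingerprint(path):
    """Hash the raw data so readers can confirm they start from the same file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def analyze(path):
    """Recompute the published summaries end to end from the raw file."""
    with open(path, newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    return {"n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values)}

if __name__ == "__main__":
    print("raw data sha256:", fingerprint(RAW_DATA))
    print("summary:", analyze(RAW_DATA))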
Unfortunately, the mere reproducibility of computational results is insufficient to address the replication crisis because even a reproducible analysis can suffer from many problems—confounding from omitted variables, poor study design, missing data—that threaten the validity and useful interpretation of the results. Although improving the reproducibility of research may increase the rate at which flawed analyses are uncovered, as recent high-profile examples have demonstrated (4), it does not change the fact that problematic research is conducted in the first place.
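A small simulation makes the point concrete. The example below is entirely hypothetical rather than drawn from the article or its references: the analysis is fully reproducible, with a fixed seed and deterministic code, yet its estimate of the treatment effect is badly biased because an influential confounder is left out of the model.

# "Reproducible but wrong": deterministic code with a fixed seed, whose
# unadjusted estimate is biased by an omitted confounder. All numbers are
# illustrative assumptions.
import random

random.seed(42)  # fixed seed: anyone who reruns this script gets identical numbers

n = 10_000
confounder = [random.gauss(0, 1) for _ in range(n)]
# Treatment uptake depends on the confounder; the outcome depends only on the
# confounder, so the true treatment effect is exactly zero.
treatment = [c + random.gauss(0, 1) for c in confounder]
outcome = [2.0 * c + random.gauss(0, 1) for c in confounder]

def slope(x, y):
    """Ordinary least-squares slope of y on x (one predictor, no adjustment)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

# The unadjusted regression reproducibly "finds" a large effect (about 1.0)
# where none exists, because the omitted confounder drives both variables.
print("naive treatment effect estimate:", round(slope(treatment, outcome), 2))

Rerunning the script reproduces the same wrong answer every time, which is precisely the gap between reproducibility and validity that motivates a prevention approach.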
The key question we want to answer when seeing the results of any scientific study is “Can I trust this data analysis?” If we think of problematic data analysis as a disease, reproducibility speeds diagnosis and treatment in the form of screening and rejection of poor data analyses by referees, editors, and other scientists in the community (Fig. 1).
This medication-based approach to data analysis quality relies on peer reviewers and editors being able to make this diagnosis, which is a tall order. Editors and referees at medical and scientific journals often lack the training and time required for careful evaluation of a data analysis. The problem is compounded because datasets and data analyses are becoming increasingly complex, submissions to journals continue to grow, and the demands on referees are increasing. These pressures have reduced the efficacy of peer review at identifying and correcting false discoveries in the scientific literature. Crucially, the medication-based approach does not address the problem at its source. We suggest that a preventative approach, expanding data analysis education and evidence-based data analysis, also needs to be considered.
Fig. 1. Peer review and editor evaluation help treat poor data analysis. Education and evidence-based data analysis
can be thought of as preventative measures.