what went wrong? transparency
The data/code weren’t reproducible
Slide 15
Slide 15 text
what went wrong? transparency
There was a lack of cooperation
Slide 16
Slide 16 text
what went wrong? expertise
They used silly prediction rules
(Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4)
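One way to see why a rule of this form makes a poor probability model is to evaluate it at the extremes. The sketch below assumes the formula reads exactly as on the slide and that Pr(F), Pr(E), and Pr(C) are each probabilities between 0 and 1. The function name `combined_rule` is hypothetical. It suggests that the output is not confined to the interval [0, 1].

```python
def combined_rule(p_f: float, p_e: float, p_c: float) -> float:
    """Combination rule as written on the slide: 5/8 * (sum of inputs) - 1/4."""
    return 5 / 8 * (p_f + p_e + p_c) - 1 / 4


# All three inputs at their maximum: the result exceeds 1.
high = combined_rule(1.0, 1.0, 1.0)

# All three inputs at their minimum: the result is negative.
low = combined_rule(0.0, 0.0, 0.0)

print(high, low)
print(0.0 <= high <= 1.0, 0.0 <= low <= 1.0)
```

Under these assumptions the rule can return values that cannot be interpreted as probabilities at all.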
Slide 17
Slide 17 text
what went wrong? expertise
They had study design problems
(Batch effects)
Slide 18
Slide 18 text
what went wrong? expertise
Their predictions weren’t locked down
Today: Pr(FEC) = 0.8
Tomorrow: Pr(FEC) = 0.1
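One possible way to lock predictions down is to record a cryptographic fingerprint of them before validation begins, so that any later change is detectable. This is a minimal sketch, not the procedure used in the case discussed here. The patient ID and the `fingerprint` helper are hypothetical, and the two values mirror the slide's example.

```python
import hashlib
import json


def fingerprint(predictions: dict) -> str:
    """Return a SHA-256 digest of the predictions in a canonical JSON form."""
    canonical = json.dumps(predictions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical patient ID; the values mirror the slide's example.
today = {"patient_001": {"Pr(FEC)": 0.8}}
tomorrow = {"patient_001": {"Pr(FEC)": 0.1}}

# Record this digest (for example with a third party) before validation starts.
locked = fingerprint(today)

# Any later change to a prediction produces a different digest.
print(fingerprint(today) == locked)
print(fingerprint(tomorrow) == locked)
```

Sorting the keys and fixing the separators keeps the digest stable across runs, so a mismatch points to changed predictions rather than changed formatting.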
Slide 19
Slide 19 text
At the end of the day, the Potti analysis was fully reproducible.
The problem is that the analysis was wrong.
Slide 20
Slide 20 text
1st Discussion Point:
What is reproducibility?
Slide 21
Slide 21 text
The goal: a result that is reproducible (the code and data can be used to recreate the results)
and replicable (you can perform the experiment again and get the same answer)
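The distinction can be illustrated with a small simulation. This is a sketch under stated assumptions: the "experiment" is a made-up draw of noisy measurements around an assumed true value of 5.0, and the seed stands in for the recorded data. The function name `run_experiment` is hypothetical.

```python
import random
import statistics


def run_experiment(seed: int, n: int = 1000) -> float:
    """Simulate one experiment: draw n noisy measurements and return their mean."""
    rng = random.Random(seed)
    # The true value of 5.0 and the noise level of 1.0 are assumptions.
    data = [rng.gauss(5.0, 1.0) for _ in range(n)]
    return statistics.fmean(data)


# Reproducible: the same code and the same data (same seed) give the identical result.
original = run_experiment(seed=42)
rerun = run_experiment(seed=42)

# Replicable: a fresh experiment (new seed, new data) gives a consistent answer,
# though not a bit-for-bit identical one.
replication = run_experiment(seed=7)

print(original == rerun)
print(abs(original - replication))
```

Reproduction checks the analysis pipeline, while replication checks the scientific claim. A result can pass the first and still fail the second.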
Slide 23
Slide 23 text
Who Reproduces Research?
[Diagram of four groups: Original Investigator, Reproducers, Scientists, General Public. Speech bubbles: "The truth is A", "I don’t care", "The truth is B", "The truth is not A", "The truth is A", "???"]
Slide courtesy R. Peng
Slide 24
Slide 24 text
https://github.com/jtleek/datasharing
Slide 25
Slide 25 text
2nd Discussion Point:
Statistical modeling is only part of the process
Slide 26
Slide 26 text
What is Data Analysis?
Raw Data
Cleaning / Validation
Pre-processing
Exploratory data analysis
Statistical model development
Sensitivity analysis
Finalize results / report
Statistics!
Slide courtesy R. Peng
Slide 27
Slide 27 text
3rd Discussion Point:
Analysis is (often) an afterthought
Slide 28
Slide 28 text
http://bit.ly/OgW3xv
Slide 29
Slide 29 text
No content
Slide 30
Slide 30 text
4th Discussion Point:
Traditional statistics & epidemiology ideas still matter for big data
Slide 31
Slide 31 text
association between shoe size and literacy
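The shoe size and literacy association is commonly used to illustrate confounding: both measures can grow with a child's age. The simulation below is a sketch with made-up coefficients, where age is the only driver of both variables by construction. It compares the overall correlation with the correlation among children of a single age.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
ages = [rng.randint(5, 12) for _ in range(5000)]

# Both outcomes depend on age only; all coefficients are invented for illustration.
shoe = [15 + 1.0 * a + rng.gauss(0, 1) for a in ages]
reading = [10 * a + rng.gauss(0, 10) for a in ages]

# Ignoring age, shoe size and reading score look strongly associated.
overall = pearson(shoe, reading)

# Holding age fixed (8-year-olds only), the association largely disappears.
idx = [i for i, a in enumerate(ages) if a == 8]
within = pearson([shoe[i] for i in idx], [reading[i] for i in idx])

print(round(overall, 2), round(within, 2))
```

Stratifying on the confounder is a standard epidemiological idea, and it applies regardless of how large the data set is.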
Slide 32
Slide 32 text
No content
Slide 33
Slide 33 text
No content
Slide 34
Slide 34 text
No content
Slide 35
Slide 35 text
1. Reproducibility by data sharing
2. Big data is not just statistics
3. Analysis is often an afterthought
4. Traditional ideas still matter