Reproducible
Phylogenomics
Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter
software: http://hulluni-bioinformatics.github.io/ReproPhylo
[email protected]
ReproPhylo
reproducible phylogenomics environment
evohull.org
@davelunt davelunt.net
+davelunt
Slide 2
What's wrong with
phylogenomics now?
0. Rarely reproducible
1. Does not scale
2. Is not experimental
Slide 3
Many reproducibility
challenges are solved
problems (well, almost)
Solved problems in computer science and other
disciplines do not always reach biology
Slide 4
Lack of
reproducibility
is a sociological
problem
not a new problem
unlikely to be solved by outlining best practice
a problem for most areas of science and non-science
an extensive problem
human nature: costs and benefits
Slide 5
We should highlight to users that
reproducibility
makes your life
much easier
‘future you’ will reproduce your work
reproducibility gives you new experimental powers
benefits to the user: carrot, not stick
benefit
Slide 6
we should aim for
Frictionless
Reproducibility
Environments
happens in the background; the user
doesn’t have to remember or care to
behave reproducibly
“good science whether
you like it or not”
cf. computer backups
ease
ReproPhylo
Software: http://hulluni-bioinformatics.github.io/ReproPhylo
v1.0
Editable User Manual: http://goo.gl/aZeRXf
Open phylogenomics environment
Uses standards
Frictionless reproducibility
Platform independent
Slide 9
ReproPhylo
IPython notebook
Pickle
text reports
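A minimal sketch of the pickling idea, using only Python's standard `pickle` module. The `project` dictionary, its keys, and the `save_project` / `load_project` helpers are hypothetical stand-ins for illustration; they are not ReproPhylo's actual classes or API.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a project object; not ReproPhylo's real class.
project = {
    "sequences": {"taxonA": "ATGCGC", "taxonB": "ATGAAT"},
    "alignment_method": "example-aligner",  # placeholder name
    "metadata": {"taxonA": {"source": "example"}},
}


def save_project(obj, path):
    """Serialise the whole project state to a single file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_project(path):
    """Restore a project from a file written by save_project."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "project.pkl"
    save_project(project, pickle_path)
    restored = load_project(pickle_path)
```

The restored object is an equal but separate copy, so an analysis can be resumed or varied later without touching the original state. Pickle files should only be loaded from trusted sources.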
Slide 10
Sequences,
alignments &
metadata
pickle, git, explicit
code, Docker
HTML report, manuscript figures,
tables, Methods,
IPython notebook
usability
reproducibility
Slide 11
.zip files for Dryad and FigShare
Docker containers
pickled project
git
figures for manuscript
tables for supp info
Methods text
detailed html report
explicit python scripts
IPython notebooks
usability
re-usability
reproducibility
likely to be used
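One way the ".zip files for Dryad and FigShare" output could be produced, sketched with the standard library only. The `bundle_outputs` function, the file names, and the `MANIFEST.sha256` checksum file are assumptions made for this example; the slides do not say how ReproPhylo builds its archives.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def bundle_outputs(output_dir, archive_path):
    """Zip every file under output_dir and add a SHA-256 manifest.

    The manifest is an illustrative extra, not something the slides describe.
    """
    output_dir = Path(output_dir)
    manifest_lines = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        files = sorted(p for p in output_dir.rglob("*") if p.is_file())
        for path in files:
            relative = path.relative_to(output_dir).as_posix()
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            archive.write(path, arcname=relative)
            manifest_lines.append(f"{digest}  {relative}")
        archive.writestr("MANIFEST.sha256", "\n".join(manifest_lines) + "\n")
    return manifest_lines


# Demonstration with two placeholder output files.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "outputs"
    out.mkdir()
    (out / "report.html").write_text("<html></html>")
    (out / "methods.txt").write_text("Methods text")
    archive_path = Path(tmp) / "deposit.zip"
    manifest = bundle_outputs(out, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        names = sorted(archive.namelist())
```

Recording a checksum per file lets anyone who downloads the deposit confirm they have the same outputs that were archived.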
Slide 12
code
output
Exploratory Data Analysis
Slide 13
Exploratory Data Analysis
check this?
high GC
exploratory data analysis suggests experimental reuse with variation
= reproducibility
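The "high GC, check this?" step can be sketched in a few lines of plain Python. The function names, the example sequences, and the 0.6 threshold are all assumptions chosen for illustration; they are not taken from ReproPhylo.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Gaps and ambiguity codes such as '-' or 'N' are ignored.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def flag_high_gc(records, threshold=0.6):
    """Return names of records whose GC fraction exceeds the threshold.

    The default threshold is arbitrary and only for demonstration.
    """
    return [name for name, seq in records.items() if gc_content(seq) > threshold]


# Made-up example sequences keyed by a hypothetical taxon name.
records = {
    "taxonA": "ATGCATGCAT",
    "taxonB": "GGCCGGCCAT",
    "taxonC": "ATNNGC--AT",
}

suspects = flag_high_gc(records)
```

Once a sequence is flagged, the analysis can be rerun with and without it, which is the "experimental reuse with variation" the slide describes.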
Slide 14
ReproPhylo
reproducible phylogenomics environment
v1.0
Challenge is to make reproducibility the norm
Target audience is not bioinformaticians
successful human interaction is an essential component of reuse and reproducibility
usability
Slide 15
ReproPhylo
ReproPhylo is an environment & an approach
reproducibility leads to other advantages…
promote experimental, exploratory &
hypothesis-testing phylogenomics
speed
inherently experimental
new ways of working? collaborative working