Reproducible Phylogenetics. Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter. Slides available: speakerdeck.com/davelunt; software: http://hulluni-bioinformatics.github.io/ReproPhylo
Reproducible Phylogenetics: talk outline in questions. What's wrong with phylogenetics now? What are the advantages of reproducibility to me? How can I do this?
What's wrong with phylogenetics now? My view is: lack of reproducibility is a problem; we don't take advantage of advances in computing environments; genomics is going to break it. I'll explain…
Can you get their data? Trees? ((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700,seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201,weasel:18.87953):2.09460):3.87382,dog:25.46154);
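A tree shared as a Newick string, rather than only as a figure, can be read back by a program. The following is a minimal sketch using only the Python standard library; the regular expression is an illustrative shortcut that assumes simple unquoted taxon names, not a full Newick parser, and the variable names are made up for this example.

```python
import re

# The example tree from the slide, with stray whitespace removed.
newick = (
    "((raccoon:19.19959,bear:6.80041):0.84600,"
    "((sea_lion:11.99700,seal:12.00300):7.52973,"
    "((monkey:100.85930,cat:47.14069):20.59201,weasel:18.87953):2.09460):3.87382,"
    "dog:25.46154);"
)

# A tip is a name that directly follows "(" or "," and is followed by ":length".
# Internal branches follow ")" instead, so they are not matched.
tip_pattern = re.compile(r"[(,]\s*([^():,;\s]+)\s*:\s*([0-9.]+)")

# Map each tip name to the length of the branch leading to it.
tip_branch_lengths = {
    name: float(length) for name, length in tip_pattern.findall(newick)
}

for name, length in sorted(tip_branch_lengths.items()):
    print(f"{name}\t{length}")
```

For real analyses a dedicated parser is the safer choice, since Newick also allows quoted names and comments that this sketch ignores.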
Reproducibility will make your life much easier. Manual data processing is ‘old phylogenetics’: it hinders reproducibility and does not scale. Widespread programmatic approaches are required.
Reproducibility will make your life much easier. Current phylogenetics is not experimental. How often have you tested the effect of Clustal parameter choices?
Reproducibility leads to experimental phylogenetics. A synthetic example: tree replicates built from alignments constructed with 10 different alignment parameters. [Figure axes: support; gap trimming ‘relaxedness’.]
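To make the idea of sweeping a trimming parameter concrete, here is a small sketch in plain Python. It is a toy stand-in, not the method behind the slide's figure: the alignment is invented, and `trim_gappy_columns` is a hypothetical helper that drops any column whose gap fraction exceeds a threshold. Ten thresholds play the role of ten levels of "relaxedness".

```python
def trim_gappy_columns(alignment, max_gap_fraction):
    """Keep only the columns whose fraction of gaps is <= max_gap_fraction."""
    n_rows = len(alignment)
    kept = [
        column
        for column in zip(*alignment)  # iterate over alignment columns
        if column.count("-") / n_rows <= max_gap_fraction
    ]
    if not kept:
        return ["" for _ in alignment]
    # Transpose the kept columns back into rows.
    return ["".join(chars) for chars in zip(*kept)]


# An invented four-sequence alignment, used only for illustration.
toy_alignment = [
    "ATG-CC-A",
    "ATGGCC-A",
    "A-G-CCTA",
    "ATG-C--A",
]

# Ten settings, from strict (0.1) to fully relaxed (1.0).
thresholds = [i / 10 for i in range(1, 11)]

trimmed_lengths = {
    t: len(trim_gappy_columns(toy_alignment, t)[0]) for t in thresholds
}

for t, length in trimmed_lengths.items():
    print(f"max gap fraction {t:.1f}: {length} columns kept")
```

In a real experiment each trimmed alignment would then be passed to a tree-building program and the resulting support values compared across settings; once the sweep is a loop, adding another setting costs nothing.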
How do I do this? Reproducible phylogenetics. All these challenges are solved problems for computer scientists. Computational pipelines make this trivial: all these things are done automatically. “Frictionless” reproducibility.
ReproPhylo is an environment and approach, not phylogenetic tree-building software. It handles GenBank sequences and metadata; your sequences, alignments and trees; and your metadata.
ReproPhylo is an environment and approach, not phylogenetic tree-building software. Automatic archiving of ALL trees, alignments, sequences, metadata, provenance and methods, in journal-friendly zip files. Text report of all actions, analyses and results. HTML electronic lab notebook, automatically written and easy to browse. Copy-and-paste Methods section for journals.
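The archiving idea can be sketched with the Python standard library alone. This is an illustration of the general technique, not ReproPhylo's own code: `archive_with_manifest` is a hypothetical helper, and the file names and contents are invented. It zips a set of result files together with a manifest of SHA-256 checksums, so a reader can later verify that the archived files are unchanged.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path


def archive_with_manifest(files, archive_path):
    """Zip `files` together with a JSON manifest of their SHA-256 checksums."""
    manifest = {}
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, files):
            # Record a checksum so the file can be verified after unpacking.
            manifest[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
            zf.write(path, arcname=path.name)
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


# Demo on throwaway files in a temporary directory (names are made up).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "alignment.fasta").write_text(">seq1\nATGC\n>seq2\nATGA\n")
    (tmp / "tree.nwk").write_text("(seq1:0.1,seq2:0.2);\n")

    archive = tmp / "analysis.zip"
    manifest = archive_with_manifest(
        [tmp / "alignment.fasta", tmp / "tree.nwk"], archive
    )

    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())

print(archived_names)
```

Because the archive is written by the pipeline rather than assembled by hand, nothing is left out through oversight.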
ReproPhylo runs in a user-friendly IPython notebook. Analysis pipelines are provided. Edit to specify your data and modify any parameters you wish, then run, inspect, repeat.
Metadata is retained. The tree can be labelled, or a statistical test done, with any data that can be harvested from the original GenBank file (or any other associated data file). [Figure: sponge tree with morphological annotations at tips.]
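As a rough illustration of labelling tips from retained metadata, the sketch below rewrites tip names in a Newick string using a lookup table. Everything here is hypothetical: `relabel_tips`, the accession-style names and the metadata values are invented for the example, and this is not how ReproPhylo itself is implemented. It assumes simple unquoted tip names.

```python
import re


def relabel_tips(newick, metadata, field):
    """Append metadata[tip][field] to each tip name that has that field."""

    def annotate(match):
        prefix, name = match.group(1), match.group(2)
        value = metadata.get(name, {}).get(field)
        if value is None:
            return match.group(0)  # no metadata: leave the tip unchanged
        return f"{prefix}{name}|{value}:"

    # Tips follow "(" or ","; internal branches follow ")" and are skipped.
    return re.sub(r"([(,])([^():,;\s]+):", annotate, newick)


# Invented tree and metadata, standing in for records harvested from a data file.
tree = "((acc1:0.1,acc2:0.2):0.05,acc3:0.3);"
metadata = {
    "acc1": {"organism": "Species_a"},
    "acc2": {"organism": "Species_b"},
}

labelled = relabel_tips(tree, metadata, "organism")
print(labelled)
```

Keeping the metadata in a table keyed by sequence identifier means the same tree can be relabelled by any field without editing the tree by hand.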
Electronic lab book. The pipeline writes a human-readable text/HTML file documenting the experiment and outcomes, including a Methods section. Data provenance and version control are included. Easy archiving for journal submission.
ReproPhylo opens new doors. ReproPhylo is an environment and approach, not a tree-building algorithm. It is more than reproducibility: it allows experimental, hypothesis-testing phylogenomics.
ReproPhylo and molecular evolution. A similar approach gives reproducible, comparative evolutionary genomics. Amir Szitenberg: “Comparative genomics of transposon evolution”, Friday 11.20.