Reproducible Phylogenetics

Dave Lunt
January 08, 2015

Given at 48th Population Genetics Group in Sheffield Jan 2015

Transcript

  1. Reproducible Phylogenetics. Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter. Slides available: speakerdeck.com/davelunt; software: http://hulluni-bioinformatics.github.io/ReproPhylo
  2. Reproducible Phylogenetics: talk outline in questions. What's wrong with phylogenetics now? What are the advantages of reproducibility to me? How can I do this?
  3. What's wrong with phylogenetics now? My view is: lack of reproducibility is a problem; we don’t take advantage of computing environment advances; genomics is going to break it. I’ll explain……
  4. Phylogenetics is everywhere. We are in the new age of phylogenomics: a scale of data we are badly prepared to analyse.
  5. We are in the new age of phylogenomics: a scale of data we are badly prepared to analyse. Algorithm bottlenecks. Human bottlenecks.
  6. Can you exactly reproduce the figures from their paper? Or are figures just pictures of results rather than results?
  7. If you can’t reproduce the work, is it science? Science is iterative, building on previous work.
  8. “If I have seen further it is by standing on the shoulders of giants” (Isaac Newton)
  9. Reproducibility is a very hot topic in bioinformatics but has had little influence on phylogenetics.
  10. Reproducibility will make your life much easier. Manual data processing is ‘old phylogenetics’: it hinders reproducibility and does not scale. Widespread programmatic approaches are required.
  11. Reproducibility will make your life much easier. Current phylogenetics is not experimental. How often have you tested the effect of Clustal parameter choices?
  12. Reproducibility leads to experimental phylogenetics. A synthetic example: tree replicates built from alignments constructed with 10 different alignment parameters. [Figure axes: support; gap trimming ‘relaxedness’]
  13. Computational pipelines make complete reproducibility as easy as minimal reproducibility. Only human users are concerned with minimal reproducibility.
  14. How do I do this? Reproducible phylogenetics. Computational pipelines make this trivial. All these things are done automatically: “frictionless” reproducibility. All these challenges are solved problems for computer scientists.
  15. ReproPhylo: • Open phylogenetics environment • Uses standards • Frictionless reproducibility • Platform independent • Fast. Software: http://hulluni-bioinformatics.github.io/ReproPhylo (v1.0). Users welcome! Manual: http://goo.gl/aZeRXf
  16. ReproPhylo is an environment and approach, not phylogenetic tree-building software. GenBank sequences and metadata. Your sequences, alignments, trees. Your metadata.
  17. ReproPhylo is an environment and approach, not phylogenetic tree-building software. Automatic archiving of ALL: trees, alignments, sequences, metadata, provenance, methods. Text report of all actions, analyses and results. Journal-friendly zip files. HTML electronic lab notebook, automatically written, easy to browse. Copy-and-paste Methods section for journals.
  18. ReproPhylo runs in a user-friendly IPython notebook. Analysis pipelines provided. Edit to specify your data, and modify any parameters you wish, then run, inspect, repeat.
  19. ReproPhylo runs in a user-friendly IPython notebook. Mixture of user manual & analysis framework. Change a parameter and hit Run.
  20. Metadata is retained. The tree can be labelled, or a statistical test done, with any data that can be harvested from the original GenBank file (or any other associated data file). [Figure: sponge tree with morphological annotations at tips]
  21. Electronic lab book. Pipeline writes a human-readable text/html file documenting the experiment and outcomes, including a Methods section. Data provenance and version control included. Easy archiving for journal submission.
  22. ReproPhylo opens new doors: more than reproducibility. Allows experimental hypothesis-testing phylogenomics. ReproPhylo is an environment & approach, not a tree-building algorithm.
  23. ReproPhylo and molecular evolution. Similar approach gives reproducible, comparative evolutionary genomics. Amir Szitenberg: Comparative genomics of transposon evolution, Friday 11.20.
  24. Reproducible Phylogenetics. Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter. ReproPhylo. Slides available: speakerdeck.com/davelunt; software: http://hulluni-bioinformatics.github.io/ReproPhylo
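
Slides 12, 17 and 21 describe rerunning an analysis under several parameter settings while the pipeline records every action and its provenance. The following is a minimal sketch of that idea using only the Python standard library. It is not ReproPhylo's API: the function names (`trim_alignment`, `run_sweep`, `write_report`), the toy alignment, and the gap-fraction trimming rule are all hypothetical stand-ins chosen for illustration.

```python
import hashlib
import json


def sha256_of(text):
    """Return a hex digest so a report can identify the exact data used."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def trim_alignment(alignment, max_gap_fraction):
    """Toy trimming rule (an assumption, not a real tool's algorithm):
    drop every column whose gap fraction exceeds max_gap_fraction."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        i for i in range(length)
        if sum(alignment[n][i] == "-" for n in names) / len(names)
        <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def run_sweep(alignment, thresholds):
    """Run the trimming step once per threshold and log every run."""
    input_digest = sha256_of(json.dumps(alignment, sort_keys=True))
    log = []
    for threshold in thresholds:
        trimmed = trim_alignment(alignment, threshold)
        log.append({
            "step": "trim_alignment",
            "max_gap_fraction": threshold,
            "columns_kept": len(next(iter(trimmed.values()))),
            "input_sha256": input_digest,
            "output_sha256": sha256_of(json.dumps(trimmed, sort_keys=True)),
        })
    return log


def write_report(log):
    """Render the log as a tab-separated text report, one line per run."""
    header = ["step", "max_gap_fraction", "columns_kept",
              "input_sha256", "output_sha256"]
    lines = ["\t".join(header)]
    for entry in log:
        lines.append("\t".join(str(entry[key]) for key in header))
    return "\n".join(lines)


# Hypothetical four-taxon alignment; "-" marks a gap.
ALIGNMENT = {
    "taxon_A": "ATG-CA-T",
    "taxon_B": "ATGGCA-T",
    "taxon_C": "AT--CAAT",
    "taxon_D": "ATG-C--T",
}
THRESHOLDS = [i / 10 for i in range(10)]  # ten settings, 0.0 to 0.9

log = run_sweep(ALIGNMENT, THRESHOLDS)
report = write_report(log)
print(report)
```

Because the parameters, the input digest and the output digest are written for every run, changing `THRESHOLDS` and rerunning produces a fresh, comparable record without any manual note-taking. In a real pipeline the trimming step would be replaced by calls to actual alignment, trimming and tree-building programs.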