Reproducible Phylogenetics

Dave Lunt
January 08, 2015

Given at the 48th Population Genetics Group meeting in Sheffield, January 2015

Transcript

  1. Reproducible Phylogenetics. Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter. Slides available: speakerdeck.com/davelunt. Software: http://hulluni-bioinformatics.github.io/ReproPhylo
  2. Reproducible Phylogenetics: talk outline in questions. What's wrong with phylogenetics now? What are the advantages of reproducibility to me? How can I do this?

  3. What's wrong with phylogenetics now? My view is: lack of reproducibility is a problem; we don't take advantage of computing environment advances; genomics is going to break it. I'll explain…
  4. Phylogenetics is everywhere

  5. Phylogenetics is everywhere. PubMed has ~100,000 articles with phylog* in the title/abstract, in >700 different journals.

  6. Phylogenetics is everywhere. We are in the new age of phylogenomics: a scale of data we are badly prepared to analyse.

  7. We are in the new age of phylogenomics: a scale of data we are badly prepared to analyse. Algorithm bottlenecks. Human bottlenecks.
  8. What is reproducible phylogenetics and why is it important?

  9. Can you get their data?

  10. Can you get their data? Alignments as well as raw data?

  11. Can you get their data? Trees? ((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700,seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201,weasel:18.87953):2.09460):3.87382,dog:25.46154);

  12. Can you get their data? Do alignments & trees share taxon names with figures?
  13. Can you get their software?

  14. Can you get their software? Do you know which version they used?

  15. Can you get their software? Do you know the parameters they ran?

  16. Can you exactly reproduce the figures from their paper? Or are the figures just pictures of results rather than results?

  17. If you can't reproduce the work, is it science? Science is iterative, building on previous work.

  18. “If I have seen further it is by standing on the shoulders of giants” (Isaac Newton)
  19. None
  20. Reproducibility is a very hot topic in bioinformatics but has had little influence on phylogenetics.

  21. Reproducibility is the right thing to do. It is likely to be compulsory in the future.
  22. What are the advantages of reproducibility to me?

  23. Reproducibility will make your life much easier. The rest of the talk looks at advantages to you.

  24. Reproducibility will make your life much easier. Who will replicate your analysis?

  25. Reproducibility will make your life much easier. Who will replicate your analysis? Future you!
  26. Reproducibility will make your life much easier. Manual data processing is ‘old phylogenetics’: it hinders reproducibility and does not scale. Widespread programmatic approaches are required.
  27. Reproducibility will make your life much easier. Current phylogenetics is not experimental. How often have you tested the effect of Clustal parameter choices?

  28. Experimental phylogenetics: reproducible scripted ‘pipelines’ are inherently experimental.

  29. Reproducibility leads to experimental phylogenetics. A synthetic example: tree replicates built from alignments constructed with 10 different alignment parameters. [Plot labels: support; gap trimming ‘relaxedness’]
  30. What is minimally required for reproducibility?

  31. What is minimally required for reproducibility? Should we really be aiming for minimal? Archive it all.

  32. Computational pipelines make complete reproducibility as easy as minimal reproducibility. Only human users are concerned with minimal reproducibility.
  33. How do I do this? Reproducible phylogenetics: all these challenges are solved problems for computer scientists. Computational pipelines make this trivial: all these things are done automatically. “Frictionless” reproducibility.
  34. ReproPhylo reproducible phylogenetics environment v1.0

  35. ReproPhylo v1.0. • Open phylogenetics environment • Uses standards • Frictionless reproducibility • Platform independent • Fast. Software: http://hulluni-bioinformatics.github.io/ReproPhylo. Manual: http://goo.gl/aZeRXf. Users welcome!

  36. ReproPhylo is an environment and approach, not phylogenetic tree-building software. GenBank sequences and metadata. Your sequences, alignments, trees. Your metadata.
  37. ReproPhylo is an environment and approach, not phylogenetic tree-building software. Automatic archiving of ALL: trees, alignments, sequences, metadata, provenance, methods & journal-friendly zip files. Text report of all actions, analyses and results. HTML electronic lab notebook: automatically written, easy to browse. Copy-and-paste Methods section for journals.
  38. ReproPhylo runs in a user-friendly IPython notebook. Analysis pipelines provided. Edit to specify your data, and modify any parameters you wish, then run, inspect, repeat.

  39. ReproPhylo runs in a user-friendly IPython notebook. Mixture of user manual & analysis framework: change a parameter and hit Run.
  40. Exploratory Data Analysis example. [Panel labels: code; output]

  41. Exploratory Data Analysis: check this? Real data: Dunn et al. 2008, doi:10.1038/nature06614

  42. Exploratory Data Analysis. Real data: Dunn et al. 2008, doi:10.1038/nature06614

  43. Metadata is retained: the tree can be labelled, or a statistical test done, with any data that can be harvested from the original GenBank file (or any other associated data file). Example: sponge tree with morphological annotations at tips.

  44. Electronic lab book. The pipeline writes a human-readable text/HTML file documenting the experiment and outcomes, including a Methods section. Data provenance and version control included. Easy archiving for journal submission.
  45. ReproPhylo writes very extensive Results automatically: alignment statistics

  46. ReproPhylo opens new doors: more than reproducibility. Allows experimental, hypothesis-testing phylogenomics. ReproPhylo is an environment & approach, not a tree-building algorithm.

  47. ReproPhylo and molecular evolution. A similar approach gives reproducible, comparative evolutionary genomics. Amir Szitenberg: Comparative genomics of transposon evolution, Friday 11.20.
  48. Reproducible Phylogenetics. Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter. ReproPhylo. Slides available: speakerdeck.com/davelunt. Software: http://hulluni-bioinformatics.github.io/ReproPhylo
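
Slides 11 and 12 ask whether a paper's trees and alignments are available and whether they share taxon names. As a rough illustration of how such a check could be scripted, here is a minimal sketch in plain Python. It is not ReproPhylo code and makes no claim about how ReproPhylo works: the function names (`newick_tips`, `fasta_names`, `compare_names`) and the toy FASTA alignment are hypothetical, and the regular expression assumes simple unquoted Newick labels like those in the tree on slide 11.

```python
import re

# Newick tree from slide 11 (stray spaces from the transcript removed).
NEWICK = (
    "((raccoon:19.19959,bear:6.80041):0.84600,"
    "((sea_lion:11.99700,seal:12.00300):7.52973,"
    "((monkey:100.85930,cat:47.14069):20.59201,weasel:18.87953):2.09460):3.87382,"
    "dog:25.46154);"
)

# Hypothetical alignment for illustration only: one name ("sealion")
# deliberately disagrees with the tree ("sea_lion").
FASTA = """\
>raccoon
ACGT
>bear
ACGT
>sealion
ACGT
>seal
ACGT
>monkey
ACGT
>cat
ACGT
>weasel
ACGT
>dog
ACGT
"""


def newick_tips(newick):
    """Return the set of tip labels in a simple, unquoted Newick string.

    A tip label is taken to be any run of label characters that directly
    follows "(" or ",". Internal node labels (which follow ")") and
    branch lengths (which follow ":") are therefore skipped.
    """
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def fasta_names(fasta_text):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    for line in fasta_text.splitlines():
        header = line[1:].strip() if line.startswith(">") else ""
        if header:
            names.add(header.split()[0])
    return names


def compare_names(newick, fasta_text):
    """Return (names only in the tree, names only in the alignment), sorted."""
    tree = newick_tips(newick)
    aln = fasta_names(fasta_text)
    return sorted(tree - aln), sorted(aln - tree)


if __name__ == "__main__":
    only_tree, only_aln = compare_names(NEWICK, FASTA)
    print("Only in tree:", only_tree)
    print("Only in alignment:", only_aln)
```

Two empty lists would mean the tree and alignment use identical taxon names. A real pipeline would more likely rely on an established phylogenetics library than on a regular expression, which does not handle quoted labels or comments in Newick.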