ReproPhylo at Balti and Bioinformatics

Dave Lunt
January 21, 2015

Approaches to reproducibility in phylogenomics and ReproPhylo software

Transcript

  1. Reproducible
    Phylogenomics
    Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter
    software: http://hulluni-bioinformatics.github.io/ReproPhylo
    ReproPhylo
    reproducible phylogenomics environment
    evohull.org
    @davelunt davelunt.net
    +davelunt

  2. What's wrong with
    phylogenomics now?
    0. Rarely reproducible
    1. Does not scale
    2. Is not experimental

  3. Many reproducibility
    challenges are solved
    problems
    (well, almost)
    Solved problems in computer science, and other
    disciplines, do not always reach biology

  4. Lack of
    reproducibility
    is a sociological
    problem
    not a new problem
    unlikely to be solved by outlining best practice
    a problem for most areas of science and non-science
    an extensive problem
    human nature: costs and benefits

  5. Reproducibility
    makes your life
    much easier
    we should highlight to users that
    ‘future you’ will reproduce your work
    reproducibility gives you new experimental powers
    benefits to the user: carrot, not stick
    benefit

  6. Frictionless
    Reproducibility
    Environments
    happens in background, user
    doesn’t have to remember/care to
    behave reproducibly
    we should aim for
    “good science whether
    you like it or not”
    cf. computer backups
    ease

  7. ReproPhylo
    reproducible phylogenomics environment
    v1.0
    http://hulluni-bioinformatics.github.io/ReproPhylo

  8. ReproPhylo
    Software: http://hulluni-bioinformatics.github.io/ReproPhylo
    v1.0
    Editable User Manual: http://goo.gl/aZeRXf
    Open phylogenomics environment
    Uses standards
    Frictionless reproducibility
    Platform independent

  9. ReproPhylo
    Software: http://hulluni-bioinformatics.github.io/ReproPhylo
    Editable User Manual: http://goo.gl/aZeRXf
    IPython notebook
    Pickle
    text reports
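
The slide above lists Pickle as one way a project is kept. As a rough sketch of the general idea only, using Python's standard `pickle` module: the `project` dictionary, its keys, and the helper names `save_project` and `load_project` are hypothetical illustrations and are not ReproPhylo's actual API.

```python
import pickle
import tempfile
from pathlib import Path


def save_project(project, path):
    """Serialise the whole project state to a single file."""
    with open(path, "wb") as handle:
        pickle.dump(project, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_project(path):
    """Restore a project state previously written by save_project."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical project state: sequences, metadata and a log of steps taken.
project = {
    "sequences": {"taxon_A": "ATGCGC", "taxon_B": "ATGAAT"},
    "metadata": {"taxon_A": {"source": "example"}, "taxon_B": {"source": "example"}},
    "log": ["loaded 2 sequences"],
}

# Round trip through a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "project.pkl"
    save_project(project, pickle_path)
    restored = load_project(pickle_path)
```

Keeping data, metadata and the log in one object means a saved file can be reopened later with the analysis state intact. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.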

  10. Sequences,
    alignments &
    metadata
    pickle, git, explicit
    code, Docker
    html report, ms figures,
    tables, Methods,
    IPython notebook
    usability
    reproducibility

  11. .zip files for Dryad and FigShare
    Docker containers
    pickled project
    git
    figures for manuscript
    tables for supp info
    Methods text
    detailed html report
    explicit python scripts
    IPython notebooks
    usability
    re-usability
    reproducibility
    likely to be used

  12. Exploratory Data Analysis
    code
    output

  13. Exploratory Data Analysis
    check this?
    high GC
    exploratory data analysis suggests experimental reuse with variation
    = reproducibility
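
The "high GC" annotation above points at the kind of check exploratory analysis invites. A minimal sketch of such a check in plain Python, under stated assumptions: the function names, the example sequences, and the 0.6 threshold are all hypothetical and are not taken from the talk or from ReproPhylo.

```python
def gc_content(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        # No unambiguous bases (e.g. all gaps or Ns): report zero.
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def flag_high_gc(sequences, threshold=0.6):
    """Return the sorted names of sequences whose GC fraction exceeds threshold."""
    return sorted(
        name for name, seq in sequences.items() if gc_content(seq) > threshold
    )


# Hypothetical input: sequence name -> nucleotide string.
example_sequences = {
    "taxon_A": "GGCCGCGC",
    "taxon_B": "ATATGCAT",
    "taxon_C": "GCGNNGCC",
}

flagged = flag_high_gc(example_sequences)
```

Re-running the same analysis with and without the flagged sequences is the "reuse with variation" the slide describes.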

  14. ReproPhylo
    reproducible phylogenomics environment
    v1.0
    Challenge is to make reproducibility the norm
    Target audience is not bioinformaticians
    successful human interaction is an essential component of reuse and reproducibility
    usability

  15. ReproPhylo
    ReproPhylo is an environment & approach
    reproducibility leads to other advantages…
    promote experimental, exploratory &
    hypothesis-testing phylogenomics
    speed
    inherently experimental
    new ways of working? collaborative working

  16. Reproducible
    Phylogenomics
    Dave Lunt, Amir Szitenberg, Max John, Mark Blaxter
    software: http://hulluni-bioinformatics.github.io/ReproPhylo
    ReproPhylo
    reproducible phylogenomics environment
    evohull.org
    @davelunt davelunt.net
    +davelunt
