A Survey of Technologies for Reproducing and Communicating Biomedical Analyses

Short talk at High-throughput Sequencing Computational Standards for Regulatory Sciences workshop.

Jeremy Goecks

March 16, 2017

Transcript

  1. A Survey of Technologies for
    Reproducing and Communicating
    Biomedical Analyses
    Jeremy Goecks
    Assistant Professor, Computational Biology and Biomedical Engineering
    Oregon Health and Science University
    @jgoecks

  2. Challenges
    Reproducibility: can you and others rerun your
    analysis now and in the future?
    Communication
    ‣ Can others understand what you’ve done at many
    different levels?
    ‣ Can others extend your analysis to their own work?

  3. Technical Complexity Impedes
    Scientific and Regulatory Progress
    [Diagram: Operating System, Analysis Tools, Parameter Settings, Input Data, and Pipelines/Workflows shown as layers between “Low level” and “High level”, with a brace labeled BCOs]
    BCOs within this ecosystem
    • BCOs will be consumers of pipeline, tool, data, and
    parameters technologies
    • Some technologies will produce BCOs for use/reuse
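
To make the idea of one object drawing on pipeline, tool, data, and parameter technologies more concrete, here is a minimal sketch of a record that captures each layer named on this slide. It is an illustration only: `AnalysisRecord`, its field names, and the example values are hypothetical and are not taken from the BCO specification.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisRecord:
    """Hypothetical record of one analysis run; NOT the BCO schema."""

    pipeline: str           # workflow name or identifier
    tools: dict             # tool name -> version
    parameters: dict        # tool name -> parameter settings
    input_checksums: dict   # file name -> SHA-256 digest of its contents
    # Captured automatically from the machine that creates the record.
    operating_system: str = field(default_factory=platform.platform)

    def to_json(self) -> str:
        # Sorted keys give a stable serialization that is easy to diff.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def sha256_of(data: bytes) -> str:
    """Checksum input data so a later rerun can confirm it is unchanged."""
    return hashlib.sha256(data).hexdigest()


# All names and values below are made up for illustration.
record = AnalysisRecord(
    pipeline="example-variant-calling",
    tools={"aligner": "1.0"},
    parameters={"aligner": {"threads": 4}},
    input_checksums={"reads.fastq": sha256_of(b"ACGT\n")},
)
print(record.to_json())
```

How much of this detail a real BCO should hold is one of the open questions raised at the end of the talk.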

  4. Research Objects
    “More than just PDFs”
    http://www.researchobject.org/
    Analysis Tools Parameter Settings
    Input Data
    Pipelines/Workflows

  5. Galaxy: Web-based analysis system
    https://galaxyproject.org
    Use a Web browser
    for large biomedical
    analyses on high-
    performance
    computing or the
    cloud
    ‣ datasets
    ‣ tools
    ‣ workflows
    ‣ visualizations
    Operating System
    Analysis Tools Parameter Settings
    Pipelines/Workflows

  6. Communication
    and Reuse

  7. (Goecks et al. Cancer Medicine, 2015)

  8. Common Workflow Language
    “Specification for describing analysis workflows
    and tools in a way that makes them portable and
    scalable across a variety of software and
    hardware environments”
    https://github.com/common-workflow-language/common-workflow-language
    Analysis Tools Parameter Settings
    Pipelines/Workflows
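
As a rough illustration of what a portable tool description can look like, the sketch below builds a minimal CWL `CommandLineTool` that wraps `echo` and serializes it as JSON, which CWL accepts alongside YAML. The tool, the `message` input, and the helper name `to_cwl_json` are illustrative choices, and `v1.0` is assumed as the specification version.

```python
import json

# Minimal tool description: run `echo` with one string argument.
echo_tool = {
    "cwlVersion": "v1.0",          # assumed specification version
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            # Place the value as the first positional argument.
            "inputBinding": {"position": 1},
        }
    },
    "outputs": [],
}

# Parameter settings for one run are kept separate from the tool itself.
echo_job = {"message": "Hello from CWL"}


def to_cwl_json(document: dict) -> str:
    """Serialize a CWL document (or job file) as indented JSON."""
    return json.dumps(document, indent=2)


print(to_cwl_json(echo_tool))
print(to_cwl_json(echo_job))
```

If the two documents were saved as, say, `echo-tool.cwl` and `echo-job.json`, a CWL runner such as the reference implementation `cwltool` could be invoked with the tool description followed by the job file. Keeping parameters in a separate file is what lets the same tool description be reused across analyses.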

  9. GA4GH
    “Global standards
    and tools for the
    secure, privacy
    respecting and
    interoperable sharing
    of Genomic data”
    More than just data:
    workflows,
    containers, etc.
    http://ga4gh.org/
    Operating System
    Analysis Tools Parameter Settings
    Input Data Pipelines/Workflows

  10. Software containers
    http://www.zdnet.com/article/what-is-docker-and-why-is-it-so-darn-popular/
    Operating System
    https://www.docker.com/
    Analysis Tools

  11. Finding and Creating Containers
    Many repositories where general-purpose and bioinformatics-specific
    containers are available
    ‣ General: Dockerhub (https://hub.docker.com/)
    ‣ Bioinformatics: Dockstore (https://dockstore.org/), Biocontainers
    (https://github.com/BioContainers)
    Many tools for creating containers by simplifying software installation
    ‣ Conda/Bioconda (https://conda.io/docs/): “Package, dependency and
    environment management for any language: Python, R, Ruby, Lua, Scala,
    Java, Javascript, C/ C++, FORTRAN”
    ‣ Install software plus dependencies on many different systems
    Operating System
    Analysis Tools
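
The two routes on this slide, installing a tool with Conda/Bioconda or running it from a container, can be sketched as the command lines they boil down to. The helpers below only build the commands and do not execute them; the function names, the environment name, the `samtools` package, and the image name are illustrative assumptions rather than recommendations from the talk.

```python
import shlex


def conda_create_command(env_name, packages, channels=("conda-forge", "bioconda")):
    """Build a `conda create` command for an isolated environment."""
    cmd = ["conda", "create", "--yes", "--name", env_name]
    for channel in channels:
        cmd += ["--channel", channel]
    cmd += list(packages)
    return cmd


def docker_run_command(image, tool_args):
    """Build a `docker run` command; --rm removes the container afterwards."""
    return ["docker", "run", "--rm", image, *tool_args]


# Illustrative values only: check the repositories for real names and tags.
conda_cmd = conda_create_command("demo-env", ["samtools"])
docker_cmd = docker_run_command("biocontainers/samtools", ["samtools", "--version"])

# shlex.join quotes arguments so the printed line can be pasted into a shell.
print(shlex.join(conda_cmd))
print(shlex.join(docker_cmd))
```

Passing either list to `subprocess.run` would execute it on a machine where Conda or Docker is installed. Pinning a package version or an image tag is what makes the resulting environment repeatable.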

  12. Questions
    How much detail should biocompute objects (BCOs) capture?
    What is the best way to ensure that BCOs can be easily used by non-technical
    users?
    Should execution platforms enforce intended BCO usage?
    What about clutter in the BCOs repository? How to search for and provide
    feedback on BCOs that yield good performance?
    What incentives can encourage sharing of BCOs?

  13. Thank you!

    [email protected]

    @jgoecks
