
A Survey of Technologies for Reproducing and Communicating Biomedical Analyses

Short talk at the High-throughput Sequencing Computational Standards for Regulatory Sciences workshop.

Jeremy Goecks

March 16, 2017

Transcript

  1. A Survey of Technologies for Reproducing and Communicating Biomedical Analyses

    Jeremy Goecks, Assistant Professor, Computational Biology and Biomedical Engineering, Oregon Health and Science University (@jgoecks)
  2. Challenges

    Reproducibility: can you and others rerun your analysis now and in the future?
    Communication
      ‣ Can others understand what you’ve done at many different levels?
      ‣ Can others extend your analysis to their own work?
  3. Technical Complexity Impedes Scientific and Regulatory Progress

    [Diagram: a stack of layers labeled Operating System, Analysis Tools, Parameter Settings, Input Data, and Pipelines/Workflows, with an axis running between “High level” and “Low level” and a brace marking BCOs]
    BCOs within this ecosystem
      • BCOs will be consumers of pipeline, tool, data, and parameter technologies
      • Some technologies will produce BCOs for use/reuse
  4. Galaxy: Web-based analysis system

    https://galaxyproject.org
    Use a Web browser for large biomedical analyses on high-performance computing or the cloud
      ‣ datasets
      ‣ tools
      ‣ workflows
      ‣ visualizations
    Layers shown: Operating System, Analysis Tools, Parameter Settings, Pipelines/Workflows
  5. Common Workflow Language

    “Specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments”
    https://github.com/common-workflow-language/common-workflow-language
    Layers shown: Analysis Tools, Parameter Settings, Pipelines/Workflows
  6. GA4GH

    “Global standards and tools for the secure, privacy respecting and interoperable sharing of Genomic data”
    More than just data: workflows, containers, etc.
    http://ga4gh.org/
    Layers shown: Operating System, Analysis Tools, Parameter Settings, Input Data, Pipelines/Workflows
  7. Finding and Creating Containers

    Many repositories where general-purpose and bioinformatics-specific containers are available
      ‣ General: Dockerhub (https://hub.docker.com/)
      ‣ Bioinformatics: Dockstore (https://dockstore.org/), Biocontainers (https://github.com/BioContainers)
    Many tools for creating containers by simplifying software installation
      ‣ Conda/Bioconda (https://conda.io/docs/): “Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, Javascript, C/C++, FORTRAN”
      ‣ Install software plus dependencies on many different systems
    Layers shown: Operating System, Analysis Tools
  8. Questions

    How much detail should biocompute objects (BCOs) capture?
    What is the best way to ensure that BCOs can be easily used by non-technical users?
    Should execution platforms enforce intended BCO usage?
    What about clutter in the BCO repository?
    How can users search for and provide feedback on BCOs that yield good performance?
    What incentives can encourage sharing of BCOs?
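
To make the layers from slide 3 concrete, here is a minimal Python sketch of a record that gathers an operating system, input data, and an ordered list of tools with their parameter settings. It is only an illustration under stated assumptions: it is not the BCO specification or any format named in the talk, and the class names, field names, and example values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolStep:
    """One pipeline step: a tool, its version, and its parameter settings (hypothetical fields)."""
    tool: str
    version: str
    parameters: dict = field(default_factory=dict)


@dataclass
class AnalysisRecord:
    """Hypothetical record spanning the layers on slide 3; not a real BCO schema."""
    operating_system: str
    input_data: list
    pipeline: list  # ordered ToolStep entries

    def to_json(self) -> str:
        # asdict() recurses into the nested ToolStep dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; none of them come from the talk.
record = AnalysisRecord(
    operating_system="example-linux-image",
    input_data=["sample_reads.fastq"],
    pipeline=[ToolStep(tool="example-aligner", version="0.0.0", parameters={"threads": 4})],
)
print(record.to_json())
```

Serializing with sorted keys gives a stable text form that could be compared or shared. How much detail such a record should hold is exactly the first question on slide 8.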