Sharing Reproducible Computations on Binder

Presented at SDSS 2019:

Although many researchers are motivated to make their computational work reproducible, a number of barriers make it challenging. Some of these are cultural and behavioral, for example the use of proprietary codes or a lack of documentation. Other challenges are more technical in nature, including the overhead of installing software and managing a software environment. Many research groups are promoting a cultural shift to the open sharing of research code and data, for example by sharing their work on GitHub, GitLab, or the Open Science Framework. However, there is no easy way for other researchers or interested users to interact with these analyses.

The Binder project aims to reduce these technical barriers by enabling researchers to share their computational workflows in an interactive computing environment that runs in the cloud. It is built on the Jupyter architecture: for example, the BinderHub component builds upon JupyterHub for the specific use case of sharing standalone code repositories. BinderHub provides the core framework that deploys the Binder service in the cloud. It is scalable, can be deployed on your own cloud instances or hardware, and supports many of the common languages (e.g. Python, Julia and R) and interfaces (e.g. JupyterLab, RStudio) that are used in data science.

A researcher sharing their work specifies the computational environment (for example, Python dependencies can be listed in a requirements.txt file) and provides their computational workflows, commonly in the form of Jupyter notebooks in a GitHub repository. Users and other researchers can then access and interactively run these computations in the cloud simply by following a URL. One free-to-use example of a BinderHub deployment is https://mybinder.org.
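To make the "following a URL" step concrete, here is a minimal sketch of how a launch link for a GitHub repository might be assembled. It assumes the `/v2/gh/<org>/<repo>/<ref>` path layout used by mybinder.org and a `filepath` query parameter for opening a specific notebook; the helper name `binder_launch_url` and the organisation, repository, and notebook names are hypothetical, and the exact query parameter should be checked against the documentation of the deployment you use.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(org, repo, ref, filepath=None, host="https://mybinder.org"):
    """Build a launch URL for a GitHub repository on a BinderHub deployment.

    Hypothetical helper for illustration; it is not part of any Binder library.
    `ref` is a branch name, tag, or commit hash.
    """
    # Encode each path segment so that a ref such as "feature/x" stays one segment.
    url = f"{host}/v2/gh/{quote(org, safe='')}/{quote(repo, safe='')}/{quote(ref, safe='')}"
    if filepath:
        # Assumed query parameter for opening a given notebook on launch.
        url += "?" + urlencode({"filepath": filepath})
    return url


# Hypothetical repository containing a requirements.txt and a notebook.
print(binder_launch_url("example-org", "example-repo", "main"))
print(binder_launch_url("example-org", "example-repo", "main",
                        filepath="notebooks/demo.ipynb"))
```

Pinning `ref` to a tag or commit hash rather than a branch name is one way to keep a shared link pointing at the exact version of the analysis that was published.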

In this presentation, we will provide an overview of the Binder project and demonstrate how researchers can share their computations through Binder.

Lindsey Heagy

May 30, 2019

Transcript

  1. a community of people and an ecosystem of open tools and standards for interactive computing
  2. An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. -- Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  3. An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. -- Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  4. An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. -- Buckheit and Donoho, WaveLab and Reproducible Research, 1995 (and a place to run the code?)
  5. • creates reproducible containers from repositories (repo2docker) • generates user sessions that serve these containers (JupyterHub) • provides an interface to create, share, and use these sessions (BinderHub) • demonstrates the above as a free public service/tech demo (mybinder.org)
  6. (why?) • in order to collaborate • to build on the work of others • for others to build upon your work • to make revisions to your paper when you hear back from reviewers in 8 months
  7. maintenance and sharing • version control • issue tracking • licensing • integrations with ◦ testing services ◦ documentation hosting ◦ ... GitHub GitLab Bitbucket
  8. Detection problem: • ~ 1/1000 proton over 4 km. • Sensitivity ~ 1e-21 • Milky Way: 1e+21 m across!