Slide 1

Slide 1 text

Using Docker Containers to Improve Reproducibility in Software Engineering Research
Jürgen Cito, Harald C. Gall
ICSE’16 Technical Briefing, May 17th, Austin, TX
Photo Credits: Astrid Westvang, https://flic.kr/p/pWJLCW

Slide 2

Slide 2 text

Jürgen Cito, PhD @ UZH
Harald Gall, Prof @ UZH
Software Evolution
Cloud-based Software Engineering
Human Factors in Software Engineering

Slide 3

Slide 3 text

This technical briefing…
 > Conceptual, more abstract notion of Reproducibility in Research
 > Concrete instructions to aid Reproducibility in Research

Slide 4

Slide 4 text

What is Reproducibility?
Reproducibility is the ability of an entire experiment or study to be duplicated, either by the same researcher or by someone else working independently. Reproducing an experiment is called replicating it.
No research paper can ever be considered to be the final word, and the replication and corroboration of research results is key to the scientific process.

Slide 5

Slide 5 text

What is Reproducibility?
Repeatability of a certain process in order to establish a fact or the conditions under which we are able to observe the same fact*
A process to share methods and describe the environment in order to recreate results.
* Mockus et al. “Experiences from replicating a case study to investigate reproducibility of software development.”

Slide 6

Slide 6 text

Scientific Process (diagram): Hypotheses, Experiment Design, Data Collection, Data Analysis, Interpret Results, Reproducibility

Slide 7

Slide 7 text

What is Reproducibility?
Establishing facts
 > Steps (or method) to establish the fact
 > Sharing computational knowledge
Controlling environment
 > Execution Environment
 > Dependencies
Providing data
 > Ability to interpret data
 > Computational Analyses
Low barriers to replicate
 > Comprehensible results
 > Ease of achieving replication

Slide 8

Slide 8 text

What is Reproducibility in SE Research?
 > Algorithms / Computational Analyses
 > Developed Tools or Prototypes
 > Quantitative Evaluations
+ internal knowledge of the necessary process to derive/establish results


Slide 9

Slide 9 text

Artifact Evaluation & Replication Packages
In Software Engineering Research:
 - FSE’15/16, MSR’15/16
In Programming Languages Research:
 - PLDI, POPL, OOPSLA

Slide 10

Slide 10 text

Current State of Sharing Artifacts in SE
Researcher’s website

Slide 11

Slide 11 text

Case Study: ChangeDistiller
https://bitbucket.org/sealuzh/tools-changedistiller
Research Project

Slide 12

Slide 12 text

Why is reproducibility hard? Why does it fail?

Slide 13

Slide 13 text

Reasons for Failed Reproducibility (1/2)
“Here is a link to download the code for this paper. Good luck trying to download the only post-doc who knows how to run this thing”*
* Paraphrase of a tweet I cannot seem to find anymore

Slide 14

Slide 14 text

Reasons for Failed Reproducibility (2/2)
Source: Collberg et al., “Measuring Reproducibility in Computer Systems Research”, http://reproducibility.cs.arizona.edu/tr.pdf

Slide 15

Slide 15 text

Case Study: ChangeDistiller
https://bitbucket.org/sealuzh/tools-changedistiller
Research Project
 - Developed 2006-2009
   How many Java versions have we passed?
 - Dependencies defined in a Maven pom file
   Are they all still available in the repository?
 - How does analysis in ChangeDistiller work?
   What is the entry point?

Slide 16

Slide 16 text

Challenges in Reproducibility
 > No standard way of describing experiments, environments, (derived) data, and workflows
 > No transparency in creating environments and the steps/methods to establish facts or recreate analysis
 > Experimental nature of research code and ecosystems often makes it hard to build
 > Unresolved or undocumented dependencies
 > Infrastructure for storage and distribution

Slide 17

Slide 17 text

… to the rescue

Slide 18

Slide 18 text

Docker Containers to the rescue (1/2)
What is Docker?
Docker allows you to package an application with all of its dependencies into a standardized unit for software development*
Containers consist of everything that enables software to run:
 > Code
 > Runtime
 > System Tools
 > System Libraries
* https://www.docker.com/what-docker

Slide 19

Slide 19 text

Docker Containers to the rescue (2/2)
What can Docker be for SE research?
Docker allows you to package a
 - Prototype
 - Proof-of-concept Implementation
 - Computational analysis or experiment
with all of its dependencies into a standardized unit for reproducible research*
* https://www.docker.com/what-docker

Slide 20

Slide 20 text

Technical Overview / Virtual Machines vs Containers
https://www.docker.com/what-is-docker
“Lightweight” VM
 > Container is an isolated process
   (“chroot on steroids”)
 > Own process space
 > Own network interface
 > Feels like a VM
 > Shares kernel with the host
 > Isolation through cgroups/namespaces
https://blog.docker.com/2016/03/containers-are-not-vms/

Slide 21

Slide 21 text

Docker Engine
 > Centralized runtime environment for containers
 > Enables portability
 > The only dependency needed to run Docker containers
 > No emulation layer (almost no performance impact)
https://www.docker.com/products/docker-engine

Slide 22

Slide 22 text

Benefits of Docker Containers
 Fast instantiation (~1-3 seconds)
 Almost native performance
 Transparent build process
 Smaller images
 Easy to build, share, and publish
 (also compared to other container technology)
* https://www.docker.com/what-docker

Slide 23

Slide 23 text

Local Docker Workflow
Dockerfile -> build -> Docker Image -> run -> Docker Container
Dockerfile:
    # Build redis from source
    # Make sure you have the redis source code checked out in
    # the same directory as this Dockerfile
    FROM ubuntu:12.04
    MAINTAINER dockerfiles http://dockerfiles.github.io
    RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
    RUN apt-get update
    RUN apt-get upgrade -y
    RUN apt-get install -y gcc make g++ build-essential libc6-dev tcl wget
    RUN wget http://download.redis.io/redis-stable.tar.gz -O - | tar -xvz
    # RUN tar -zvzf /redis/redis-stable.tar.gz
    RUN (cd /redis-stable && make)
    RUN (cd /redis-stable && make test)
    RUN mkdir -p /redis-data
    VOLUME ["/redis-data"]
    EXPOSE 6379
    ENTRYPOINT ["/redis-stable/src/redis-server"]
    CMD ["--dir", "/redis-data"]

Slide 24

Slide 24 text

Terminology
Dockerfile
 Declarative definition of an environment for producing an image
Docker Image
 Immutable artifact built from a Dockerfile, consisting of one or more layers
Docker Container
 Execution environment: a running instance of an image (can be parameterized)
Docker Registry
 Public or private repository that stores images and allows for their distribution
 (Docker Hub - https://hub.docker.com/ or CoreOS Quay - https://quay.io/)


Slide 25

Slide 25 text

Local Docker Workflow: Dockerfile -> build -> Docker Image -> run -> Docker Container (same Dockerfile as on Slide 23)

Slide 26

Slide 26 text

Dockerfile
Definition of infrastructure and dependencies of a container through instructions
    # Build redis from source
    # Make sure you have the redis source code checked out in
    # the same directory as this Dockerfile
    FROM ubuntu:12.04
    MAINTAINER dockerfiles
    RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
    RUN apt-get update
    RUN apt-get upgrade -y
    RUN apt-get install -y gcc make g++ build-essential libc6-dev tcl wget
    RUN wget http://download.redis.io/redis-stable.tar.gz -O - | tar -xvz
    # RUN tar -zvzf /redis/redis-stable.tar.gz
    RUN (cd /redis-stable && make)
    RUN (cd /redis-stable && make test)
    COPY redis.conf /var/www/redis.conf
    RUN mkdir -p /redis-data
    VOLUME ["/redis-data"]
    EXPOSE 6379
    ENTRYPOINT ["/redis-stable/src/redis-server"]
    CMD ["--dir", "/redis-data"]
Annotations:
 > Base Image (FROM): can be an OS (Ubuntu) or a different, existing image
 > Install Dependencies (RUN): runs commands as if you were typing them on the command line
 > COPY: copies local files from the build context into the image
 > Volume (VOLUME)
 > Open Port (EXPOSE)
 > Start Server (ENTRYPOINT, CMD)

Slide 27

Slide 27 text

Data Volumes
A specially-designated directory within one or more containers that bypasses the Union File System*
Volumes allow you to manage data within containers
 > Mount a host directory (dependency on the host filesystem)
 > Mount a data volume container (dependency on another container)
 > Mount a shared-storage volume (NFS, iSCSI, etc.)
* https://docs.docker.com/engine/userguide/containers/dockervolumes/
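A minimal sketch of the first option, mounting a host directory so that results written by an analysis outlive the container. The helper name `run_with_results_dir`, the image name in the usage line, and the `/results` mount point are illustrative assumptions; only `docker run --rm -v` is Docker's own.

```shell
# Run an image with a host directory mounted as a data volume.
# run_with_results_dir is a hypothetical helper, not part of Docker.
run_with_results_dir() {
  image_name="$1"
  host_dir="$2"                   # use an absolute path for a host-directory mount
  container_dir="${3:-/results}"  # illustrative default mount point

  mkdir -p "$host_dir"
  # --rm removes the container when it exits; files written to the
  # mounted directory remain on the host.
  docker run --rm -v "${host_dir}:${container_dir}" "$image_name"
}
```

Usage, with a made-up image name: `run_with_results_dir my-analysis "$PWD/results"`.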

Slide 28

Slide 28 text

Local Docker Workflow: Dockerfile -> build -> Docker Image -> run -> Docker Container (same Dockerfile as on Slide 23)

Slide 29

Slide 29 text

Dockerfile -> Image
Definition of infrastructure and dependencies of a container through instructions
    docker build -t <image-name> .
 > "." is the build context, containing all local dependencies and the Dockerfile
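As a rough sketch of the build step, the helper below makes the image name and the build context explicit arguments. The function name `build_image` and the image name in the usage line are illustrative assumptions, not part of the slides.

```shell
# Build an image from the Dockerfile found in a build context directory.
# build_image is a hypothetical helper; `docker build -t` is Docker's own.
build_image() {
  image_name="$1"        # name (and optional tag) for the resulting image
  context_dir="${2:-.}"  # build context; defaults to the current directory

  # The context directory is what the build can COPY from, so keep it
  # limited to the Dockerfile and the files the build actually needs.
  docker build -t "$image_name" "$context_dir"
}
```

Usage, with a made-up name: `build_image my-experiment:1.0` builds from the current directory.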

Slide 30

Slide 30 text

Docker Images
    # docker images
    REPOSITORY          TAG      IMAGE ID       CREATED       SIZE
    mhart/alpine-node   latest   2a15d8568f75   1 week ago    36.76 MB
    hakyll              latest   d575da1e730c   2 weeks ago   1.487 GB
    redis               alpine   50405530a7e5   4 weeks ago   15.95 MB
Lists all images available locally (built or pulled)
    # docker rmi hakyll
    Untagged: hakyll:latest
    Deleted: sha256:3240943c9ea3f72db51…
    Deleted: sha256:a3aeefae0d4b8f61…
    Deleted: sha256:16a7ebd378002f1261…
Removes image ‘hakyll’ and its layers from disk

Slide 31

Slide 31 text

Local Docker Workflow: Dockerfile -> build -> Docker Image -> run -> Docker Container (same Dockerfile as on Slide 23)

Slide 32

Slide 32 text

Image -> Container
    docker run -d --name <container-name> -p 80:5000 <image>
(A typical example)
 > -d: run container in the background (d for detached)
 > --name: give the container a unique name
 > -p: port mapping; first the port published on the host (80), second the port within the container (5000)
Many more possibilities to run containers, see full reference here: https://docs.docker.com/engine/reference/run/
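The order of the two ports is easy to mix up. As a sketch, the hypothetical helper `run_detached` below names each part of the typical example; the container and image names in the usage line are made up.

```shell
# Start a container in the background and publish one port.
# run_detached is a hypothetical helper around `docker run`.
run_detached() {
  container_name="$1"   # unique name for the container
  host_port="$2"        # port opened on the host
  container_port="$3"   # port the process listens on inside the container
  image_name="$4"

  # -d detaches, --name labels the container, -p maps host:container.
  docker run -d --name "$container_name" -p "${host_port}:${container_port}" "$image_name"
}
```

Usage: `run_detached web 80 5000 my-experiment:1.0` corresponds to `-p 80:5000` in the example above.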

Slide 33

Slide 33 text

Container Management
    docker ps                  List all running containers
    docker ps -a               List all containers (also stopped)
    docker stop <container>    Stop a running container
    docker rm <container>      Remove a stopped container
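Stopped containers tend to accumulate while iterating on an experiment. As a sketch, the hypothetical helper below combines `docker ps` and `docker rm` to remove every exited container; the function name is an assumption, the flags are standard Docker CLI.

```shell
# Remove all containers that have exited.
# remove_exited_containers is a hypothetical helper, not a Docker command.
remove_exited_containers() {
  # -a includes stopped containers, -q prints only container IDs,
  # --filter status=exited keeps just the ones that have finished.
  ids="$(docker ps -a -q --filter status=exited)"
  if [ -z "$ids" ]; then
    return 0   # nothing to clean up
  fi
  # $ids is intentionally unquoted so each ID becomes its own argument.
  docker rm $ids
}
```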

Slide 34

Slide 34 text

Container Debugging
    # docker run -ti --entrypoint=bash <image>
    Start image with a different entrypoint
    # docker exec -ti <container> bash
    Start an interactive shell in a running container
    # docker inspect <container-or-image>
    Low-level information on a container or image

Slide 35

Slide 35 text

Docker Hub: Public Registry

Slide 36

Slide 36 text

Pulling Docker Images
Getting started with existing images
    docker pull nginx:latest

Slide 37

Slide 37 text

Pulling Docker Images
Getting started with existing images
    docker pull nginx:latest
 > nginx:latest: reference to a Docker Image in the Docker Hub

Slide 38

Slide 38 text

Pulling Docker Images
Getting started with existing images
    docker pull nginx:latest
 > :latest: images can have many “tags”

Slide 39

Slide 39 text

Pulling Docker Images
Getting started with existing images
    docker pull nginx:latest
 > docker pull: pulls an image from a Docker registry
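To make the anatomy of an image reference concrete, the sketch below splits a reference such as `nginx:latest` into repository and tag using plain shell parameter expansion. The function names are hypothetical, and digest references (`name@sha256:...`) are deliberately not handled.

```shell
# Print the repository part of an image reference (everything before the tag).
image_repository() {
  ref="$1"
  last="${ref##*/}"            # final path component, e.g. nginx:latest
  case "$last" in
    *:*) echo "${ref%:*}" ;;   # strip the trailing :tag
    *)   echo "$ref" ;;        # no tag given
  esac
}

# Print the tag of an image reference; Docker assumes "latest" when none is given.
image_tag() {
  last="${1##*/}"
  case "$last" in
    *:*) echo "${last##*:}" ;;
    *)   echo "latest" ;;
  esac
}
```

Only the final path component is inspected for a colon, so a registry address with a port (for example `localhost:5000/tool`) is not mistaken for a tag.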

Slide 40

Slide 40 text

Pushing Docker Images to a Registry
 Tag Image
    docker tag c6fdd6639541 <username>/<repository>:<tag>
 > c6fdd6639541: Image ID (retrieve through docker images)

Slide 41

Slide 41 text

Pushing Docker Images to a Registry
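 Push Image
    docker login --username=<username> --email=<email>
    docker push <username>/<repository>:<tag>
 > Newer Docker releases have dropped the --email flag; --username is sufficient there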
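Tagging and pushing always use the same target reference, so the two steps can be sketched as one helper. `publish_image` and the account and repository names in the usage line are hypothetical; the sketch assumes `docker login` has already succeeded.

```shell
# Tag a local image for a registry account and push it.
# publish_image is a hypothetical helper; it assumes a prior `docker login`.
publish_image() {
  image_id="$1"      # local image ID or name, as listed by `docker images`
  username="$2"      # registry account that owns the repository
  repository="$3"
  tag="${4:-latest}" # defaults to "latest" when no tag is given

  target="${username}/${repository}:${tag}"
  # Push only if tagging succeeded.
  docker tag "$image_id" "$target" && docker push "$target"
}
```

Usage, with made-up names: `publish_image c6fdd6639541 alice changedistiller 1.0`.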

Slide 42

Slide 42 text

Case Study: ChangeDistiller
https://bitbucket.org/sealuzh/tools-changedistiller
Research Project
Local Docker Workflow: Dockerfile -> build -> Docker Image -> run -> Docker Container

Slide 43

Slide 43 text

Recap: Challenges in Reproducibility
 > No standard way of describing experiments, environments, (derived) data, and workflows
 > No transparency in creating environments and the steps/methods to establish facts or recreate analysis
 > Experimental nature of research code and ecosystems makes it often hard to build
 > Unresolved or undocumented dependencies
 > Infrastructure for storage and distribution
Addressed by: Dockerfile, Docker Image, Docker Container, Registries (Docker Hub, Quay, …)

Slide 44

Slide 44 text

Limitations
 > Performance sensitivity
   [Jimenez et al., The Role of Container Technology in Reproducible Computer Systems Research]
 > Proprietary Software and Dependencies
 > Non-Disclosure Agreements / Intellectual Property
 > Can we build the same artifact from the specification (Dockerfile) even in 10 years? [Suggestion: Version Pinning]
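One way to act on the version-pinning suggestion is to check a Dockerfile for base images that can silently change over time. The sketch below is a heuristic, not a complete linter: the function name `unpinned_base_images` is hypothetical, and special cases such as `FROM scratch` or `FROM --platform=...` are not handled.

```shell
# Print the base images in a Dockerfile that are not pinned:
# either no tag at all, or the moving tag "latest".
unpinned_base_images() {
  dockerfile="${1:-Dockerfile}"
  awk 'toupper($1) == "FROM" {
         ref = $2                   # image reference after FROM
         n = split(ref, parts, "/")
         last = parts[n]            # final path component, e.g. ubuntu:12.04
         # Pinned means a tag (:) or digest (@) other than :latest.
         if (last !~ /[:@]/ || last ~ /:latest$/) print ref
       }' "$dockerfile"
}
```

An empty output means every `FROM` line names an explicit tag or digest. Package versions installed inside the image would need to be pinned separately.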

Slide 45

Slide 45 text

Conclusions
 > Containers enable a standard, fast, and easy way of describing experiments and environments
 > They help your future self, reviewers, and other researchers to make use of your work

Slide 46

Slide 46 text

Using Docker Containers to Improve Reproducibility in SE Research
Jürgen Cito, Harald Gall
@citostyle
Slides: speakerdeck.com/citostyle
Photo Credits: Nan Palmero, https://flic.kr/p/nPLSpe
Photo Credits: Astrid Westvang, https://flic.kr/p/pWJLCW