An Empirical Analysis of the Docker Container Ecosystem on GitHub

Paper on the Docker container ecosystem on GitHub, presented by Jürgen Cito and Gerald Schermann at the International Conference on Mining Software Repositories (MSR'17), co-located with ICSE'17, in Buenos Aires, Argentina.

Jürgen Cito

May 21, 2017

Transcript

  1. International Conference on Mining Software Repositories (MSR’17), Buenos Aires
    Photo Credits: Astrid Westvang, https://flic.kr/p/pWJLCW
    An Empirical Analysis of the Docker Container Ecosystem on GitHub
    Jürgen Cito, Gerald Schermann, E. Wittern, P. Leitner, S. Zumberi, H. C. Gall (@citostyle, @sh3llcat)
  2. What are Docker Containers?
    Docker containers allow you to package an application with all of its dependencies into a standardized unit for software development.*
    Containers consist of everything that enables software to run:
     > Code
     > Runtime
     > System Tools
     > System Libraries
    * https://www.docker.com/what-docker
  3. Motivation
    Containers power the infrastructure of modern deployments.
    Docker is the de-facto standard for containers in industry.
    Current State (December 2016):
     > 38079 unique GitHub projects containing a Dockerfile
     > 70197 unique Dockerfiles
  4. Container Workflow
    Dockerfile -> (build) -> Docker Image -> (run) -> Docker Container

        # Build redis from source
        # Make sure you have the redis source code checked out in
        # the same directory as this Dockerfile
        FROM ubuntu:12.04
        MAINTAINER dockerfiles http://dockerfiles.github.io
        RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
        RUN apt-get update
        RUN apt-get upgrade -y
        RUN apt-get install -y gcc make g++ build-essential libc6-dev tcl wget
        RUN wget http://download.redis.io/redis-stable.tar.gz -O - | tar -xvz
        # RUN tar -zvzf /redis/redis-stable.tar.gz
        RUN (cd /redis-stable && make)
        RUN (cd /redis-stable && make test)
        RUN mkdir -p /redis-data
        VOLUME ["/redis-data"]
        EXPOSE 6379
        ENTRYPOINT ["/redis-stable/src/redis-server"]
        CMD ["--dir", "/redis-data"]
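
A Dockerfile like the one on this slide is a flat sequence of instructions, each a keyword followed by its arguments. The following is a minimal Python sketch of splitting such a file into (instruction, arguments) pairs. The function name `parse_dockerfile` and the `COMMENT` label are illustrative assumptions, not the authors' docker-parser tool, and the sketch ignores parser directives and heredocs.

```python
def parse_dockerfile(text):
    """Split Dockerfile text into (instruction, arguments) pairs."""
    instructions = []
    pending = ""  # accumulates a logical line continued with a trailing backslash
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comments inside a continued instruction are skipped, others are kept.
            if not pending:
                instructions.append(("COMMENT", line[1:].strip()))
            continue
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        keyword, _, arguments = (pending + line).partition(" ")
        instructions.append((keyword.upper(), arguments.strip()))
        pending = ""
    return instructions


example = """FROM ubuntu:12.04
# install build tools
RUN apt-get update && \\
    apt-get install -y gcc make
EXPOSE 6379
"""

for keyword, arguments in parse_dockerfile(example):
    print(keyword, "|", arguments)
```

Treating comments as their own entries mirrors the way the later instruction breakdown in this deck lists COMMENT next to real instructions.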
  5. Dockerfile (Overview)
    Defines the infrastructure and dependencies of a container through instructions.

        FROM ubuntu:12.04
        MAINTAINER John Doe
        RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
        RUN apt-get update
        RUN apt-get upgrade -y
        RUN apt-get install -y gcc make g++ build-essential libc6-dev tcl wget
        RUN sudo -E pip install scipy:0.18.1
        # RUN tar -zvzf /redis/redis-stable.tar.gz
        RUN (cd /redis-stable && make)
        RUN (cd /redis-stable && make test)
        ADD redis.conf /var/www/redis.conf
        RUN mkdir -p /redis-data
        VOLUME ["/redis-data"]
        EXPOSE 6379
        ENTRYPOINT ["/redis-stable/src/redis-server"]
        CMD ["--dir", "/redis-data"]

    Callouts: Base Image, Dependencies, Install, Volume, Open Port, Start Server
     > Base Image can be an OS (Ubuntu) or a different, existing image
     > Runs commands as if you were typing them in the command line
     > Copies local files from build context into container
  6. Dockerfile (Quality Issues)
    Dockerfile Linter to check adherence to best practices.

        FROM ubuntu:12.04
        MAINTAINER John Doe
        RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
        RUN apt-get update
        RUN apt-get upgrade -y
        RUN apt-get install -y gcc make g++ build-essential libc6-dev tcl wget
        RUN sudo -E pip install scipy:0.18.1
        # RUN tar -zvzf /redis/redis-stable.tar.gz
        RUN (cd /redis-stable && make)
        RUN (cd /redis-stable && make test)
        ADD redis.conf /var/www/redis.conf
        RUN mkdir -p /redis-data
        VOLUME ["/redis-data"]
        EXPOSE 6379
        ENTRYPOINT ["/redis-stable/src/redis-server"]
        CMD ["--dir", "/redis-data"]

    Callouts: Base Image, Dependencies, Install, Volume, Open Port, Start Server
     > Image Version Pinning missing (DL3006, DL3007)
     > Version Pinning on Dependencies (DL3008, DL3013)
     > ADD instead of COPY (DL3032)
  7. Study Overview / Contributions
    Empirical Analysis of Ecosystem, Quality and Standards Compliance, Evolution
     > Relational model of Dockerfiles
     > Tools to extract and compute structural changes
     > Reproducibility Package
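
One way to picture "computing structural changes" between two revisions of a Dockerfile is to diff their instruction sequences rather than their raw text. The sketch below uses Python's standard `difflib.SequenceMatcher`. The function name `structured_changes` and the change labels `added` and `removed` are assumptions made for illustration and are not taken from the authors' tools.

```python
from difflib import SequenceMatcher


def structured_changes(before, after):
    """Return (change_type, instruction) tuples between two instruction lists."""
    changes = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A "replace" opcode is reported as a removal followed by an addition.
        if tag in ("delete", "replace"):
            changes += [("removed", ins) for ins in before[i1:i2]]
        if tag in ("insert", "replace"):
            changes += [("added", ins) for ins in after[j1:j2]]
    return changes


old_revision = [
    "FROM ubuntu",
    "RUN apt-get update",
    "ADD redis.conf /var/www/redis.conf",
]
new_revision = [
    "FROM ubuntu:12.04",
    "RUN apt-get update",
    "COPY redis.conf /var/www/redis.conf",
]

for change in structured_changes(old_revision, new_revision):
    print(change)
```

A finer-grained tool could additionally pair a removal with the matching addition and report it as an update of a single instruction.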
  8. Dockerfile + Quality Issues + Evolution
    Fine-Grained Relational Model
    [Diagram: entity-relationship model. Entities: Dockerfile, Revision, Rule Violation, Instruction, Parameter, Diff, Structured Change, Change Type. Relationship labels (1:n or 1:1): has, violates, contains, before, after. Additional label: Dockerfile Linter.]
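
The entities named in this model can be sketched as plain Python dataclasses. The field names and the direction of each relationship below are one reading of the diagram labels and should be treated as assumptions. The actual schema is in the authors' online appendix.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instruction:
    keyword: str
    parameters: List[str] = field(default_factory=list)  # Instruction has n Parameters


@dataclass
class RuleViolation:
    rule: str  # label of the violated linter rule


@dataclass
class Revision:
    commit: str  # hypothetical identifier for the revision
    instructions: List[Instruction] = field(default_factory=list)
    violations: List[RuleViolation] = field(default_factory=list)


@dataclass
class StructuredChange:
    change_type: str  # e.g. "added", "removed", "updated" (assumed labels)
    instruction: str


@dataclass
class Diff:
    before: Revision
    after: Revision
    changes: List[StructuredChange] = field(default_factory=list)


@dataclass
class Dockerfile:
    path: str
    revisions: List[Revision] = field(default_factory=list)  # Dockerfile has n Revisions


first = Revision("a1", [Instruction("FROM", ["ubuntu"])], [RuleViolation("image")])
second = Revision("b2", [Instruction("FROM", ["ubuntu:12.04"])])
dockerfile = Dockerfile("Dockerfile", [first, second])
diff = Diff(first, second, [StructuredChange("updated", "FROM")])
print(len(dockerfile.revisions), diff.changes[0].change_type)
```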
  9. Dataset
    [Diagram: the relational model from slide 8, annotated with dataset sizes.]
     > All (70197) Dockerfiles on GitHub
     > 218259 revisions
     > 1483763 changes
     > 260829 violations
  10. Docker Ecosystem on GitHub

  11. Base Images
    [Bar chart: % of projects with base image referenced in FROM statements (axis 0 to 25), grouped by All, Top-100, Top-1000. Base images shown: ubuntu, debian, node, centos, python, dockerfile/nodejs, golang, alpine, java, nginx, ruby, scratch, php, fedora, busybox.]
  12. Base Images
    [Bar chart from slide 11: % of projects with base image referenced in FROM statements, grouped by All, Top-100, Top-1000.]
    Base image types: OS ~60%, Runtime ~30%, Application ~5%
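
Shares like those in the chart can be computed by reducing each FROM reference to its image name and counting. The sketch below is a simplified illustration: `split_image` and `base_image_shares` are hypothetical helper names, and it counts FROM references rather than projects, which is an assumption that differs from the chart's per-project percentages.

```python
from collections import Counter


def split_image(reference):
    """Split an image reference into (name, tag); tag is None if absent."""
    name = reference.partition("@")[0]  # drop a digest such as @sha256:...
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:  # a colon before the last "/" would be a registry port
        name, _, tag = name.rpartition(":")
        return name, tag
    return name, None


def base_image_shares(references):
    """Return {image name: percent of references} for a list of FROM arguments."""
    names = [split_image(ref)[0] for ref in references]
    counts = Counter(names)
    return {name: 100.0 * n / len(names) for name, n in counts.most_common()}


sample = ["ubuntu:12.04", "ubuntu", "debian:jessie", "alpine:3.4"]
print(base_image_shares(sample))
```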
  13. Base Images & Sizes
    [Bar chart from slide 11: % of projects with base image referenced in FROM statements, grouped by All, Top-100, Top-1000.]
    Image sizes annotated on the chart: 125 MB, 195 MB, 4 MB
  14. Base Images & Sizes
    [Bar chart from slide 11: % of projects with base image referenced in FROM statements, grouped by All, Top-100, Top-1000.]
    Image sizes annotated on the chart: 125 MB, 195 MB, 4 MB
    Takeaways: Reduce Image Size, Base Image Recommendation
  15. Distribution of Instructions

        Instruction   All   Top-1000   Top-100
        RUN           40%   41%        48%
        COMMENT       16%   14%        15%
        ENV            6%    7%         9%
        FROM           7%    8%         7%
        ADD            6%    5%         2%
        CMD            4%    4%         3%
        COPY           3%    4%         3%
        EXPOSE         4%    4%         3%
        MAINTAINER     4%    4%         3%
        WORKDIR        3%    3%         3%
        ENTRYPOINT     2%    2%         1%
        VOLUME         2%    2%         1%
        USER           1%    1%         1%
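
A breakdown of this kind boils down to counting the leading keyword of every logical line. The following Python sketch shows the idea on a single file. `instruction_shares` is a hypothetical name, comments are counted under a `COMMENT` label to match the table, and aggregation across many Dockerfiles is left out.

```python
from collections import Counter


def instruction_shares(dockerfile_text):
    """Return {instruction: percent of all instructions}, counting comments too."""
    counts = Counter()
    continued = False  # True while inside a backslash-continued instruction
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if not continued:
                counts["COMMENT"] += 1
            continue
        if not continued:
            counts[line.split()[0].upper()] += 1
        continued = line.endswith("\\")
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


sample_text = """FROM ubuntu:12.04
# deps
RUN apt-get update
RUN apt-get install -y gcc \\
    make
EXPOSE 6379
"""
print(instruction_shares(sample_text))
```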
  16. Distribution of RUN Instructions

        Category          Examples              All     Top-1000   Top-100
        Dependencies      apt-get, yum, npm     45.2%   44.7%      45.2%
        File System       mkdir, cd, cp, rm     30.4%   29.3%      29.4%
        Permissions       chmod, chown           7.3%    5.2%       2.3%
        Build / Execute   make, install          5.3%    8.3%      13.5%
        Environment       set, export, source    0.6%    1.0%       0.2%
        Other                                   11.3%   11.5%       9.4%

    Takeaway: Abstraction for Dependencies
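
The categories in the table can be approximated by looking at the first word of each shell command inside a RUN instruction. The sketch below uses only the example commands listed in the table as its lookup, so the mapping is deliberately incomplete. The name `categorize_run` is an assumption, and the naive splitting on `&&`, `||`, `;` and `|` does not respect quoted strings.

```python
import re

# Lookup built from the example commands shown in the table above.
CATEGORIES = {
    "Dependencies": {"apt-get", "yum", "npm"},
    "File System": {"mkdir", "cd", "cp", "rm"},
    "Permissions": {"chmod", "chown"},
    "Build / Execute": {"make", "install"},
    "Environment": {"set", "export", "source"},
}


def categorize_run(arguments):
    """Return one category label per shell command in a RUN instruction."""
    labels = []
    for command in re.split(r"&&|\|\||;|\|", arguments):
        words = command.strip(" ()").split()
        if not words:
            continue
        label = next(
            (cat for cat, names in CATEGORIES.items() if words[0] in names),
            "Other",
        )
        labels.append(label)
    return labels


print(categorize_run("apt-get install -y gcc make"))
print(categorize_run("(cd /redis-stable && make)"))
```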
  17. Quality of Dockerfiles

  18. Quality of Dockerfiles - Build Analysis
    Built a random sample of 560 Dockerfiles (“docker build”) in 3 iterations; measured build outcome and duration.
     > Outcome: Success 66%, Failure 34%
     > Build duration (avg): Successful 146 sec, Failed 91 sec
     > Failed Java builds on TravisCI: 9.7 sec
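
The measurement described on this slide (run `docker build`, record outcome and wall-clock duration) can be sketched as follows. This is not the authors' build harness: `timed_build` and `summarize` are hypothetical names, and the `run` parameter exists only so the sketch can be exercised without a Docker daemon.

```python
import subprocess
import time


def timed_build(context_dir, run=subprocess.run):
    """Run `docker build` on a directory; return (succeeded, seconds)."""
    start = time.monotonic()
    result = run(["docker", "build", context_dir], capture_output=True)
    return result.returncode == 0, time.monotonic() - start


def summarize(outcomes):
    """Aggregate (succeeded, seconds) pairs into failure rate and average durations."""
    failed = [sec for ok, sec in outcomes if not ok]
    passed = [sec for ok, sec in outcomes if ok]
    return {
        "failure_rate": len(failed) / len(outcomes),
        "avg_failed_sec": sum(failed) / len(failed) if failed else None,
        "avg_success_sec": sum(passed) / len(passed) if passed else None,
    }


# Made-up durations, for illustration only.
print(summarize([(True, 100.0), (True, 200.0), (False, 90.0), (False, 30.0)]))
```

Calling `timed_build` on each sampled repository several times and passing the collected pairs to `summarize` would yield figures of the same shape as those on the slide.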
  19. Quality of Dockerfiles - Build Analysis
    Built a random sample of 560 Dockerfiles (“docker build”) in 3 iterations; measured build outcome and duration.
     > Outcome: Success 66%, Failure 34%
     > Build duration (avg): Successful 146 sec, Failed 91 sec
     > Failed Java builds on TravisCI: 9.7 sec
    Takeaway: Build Acceleration
  20. Quality of Dockerfiles - Rule Violations
    Dockerfile Linter to check adherence to best practices: https://github.com/lukasmartinelli/hadolint
    Version Pinning (violation vs. compliant form):
     > Image: FROM ubuntu vs. FROM ubuntu:12.04
     > pip: RUN pip install django vs. RUN pip install django==1.9
     > apt-get: RUN apt-get install python vs. RUN apt-get install python=2.7
     > :latest: FROM ubuntu:latest vs. FROM ubuntu:12.04
    Copy vs. Add
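
The checks listed on this slide can be approximated with a few string tests per instruction. The Python sketch below is a rough illustration and not how hadolint is implemented. The labels follow the slide's wording (`image`, `:latest`, `pip`, `apt-get`, `copy/add`), the helper names are hypothetical, and the ADD check exempts only remote URLs, which is a simplifying assumption.

```python
import re


def unpinned(arguments, tool, pin):
    """True if a `<tool> ... install` command names a package without `pin`."""
    for command in re.split(r"&&|;", arguments):
        tokens = command.split()
        if tool in tokens and "install" in tokens:
            packages = [t for t in tokens[tokens.index("install") + 1:]
                        if not t.startswith("-")]
            if any(pin not in package for package in packages):
                return True
    return False


def check_line(line):
    """Return the labels of the checks that one instruction line fails."""
    keyword, _, arguments = line.strip().partition(" ")
    keyword, parts = keyword.upper(), arguments.split()
    found = []
    if keyword == "FROM" and parts:
        image = parts[0]
        if image.endswith(":latest"):
            found.append(":latest")
        elif (image != "scratch" and "@" not in image
              and ":" not in image.rsplit("/", 1)[-1]):
            found.append("image")  # no tag at all
    elif keyword == "RUN":
        if unpinned(arguments, "pip", "=="):
            found.append("pip")
        if unpinned(arguments, "apt-get", "="):
            found.append("apt-get")
    elif keyword == "ADD" and parts:
        if not parts[0].startswith(("http://", "https://")):
            found.append("copy/add")
    return found


for line in ["FROM ubuntu", "RUN pip install django==1.9",
             "ADD redis.conf /var/www/redis.conf"]:
    print(line, "->", check_line(line))
```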
  21. Quality of Dockerfiles - Rule Violations

        Rule       All     Top-100   Top-1000   Build Failure   Build Success
        Image       9.7%    4.3%      7.7%      14.9%            7.7%
        pip         2.0%    4.8%      2.2%       2.3%            1.9%
        apt-get    13.5%   14.5%     13.8%      11.4%           13.6%
        :latest     3.4%    3.2%      2.9%       3.1%            4.1%
        copy/add   12.8%    5.0%     12.2%      17.5%           13.6%

    Takeaway: Quality Check Integration
  22. Conclusion
    Empirical Analysis of Ecosystem, Quality and Standards Compliance, Evolution
    Takeaway Messages:
     > Image Size Reduction
     > Base Image Recommendation
     > Abstraction for Dependencies
     > Quality Check Integration
     > Build Acceleration
  23. An Empirical Analysis on the Docker Container Ecosystem on GitHub
    Jürgen Cito, Gerald Schermann, Erik Wittern, Philipp Leitner, Sali Zumberi, Harald C. Gall (@citostyle, @sh3llcat)
    Empirical Analysis of Ecosystem, Quality and Standards Compliance, Evolution
    Paper: https://peerj.com/preprints/2905/
    Online Appendix (Dataset, Scripts, Analyses, Plots): https://github.com/sealuzh/docker-ecosystem-paper
    Tools:
     > docker-parser: https://github.com/sealuzh/dockerparser
     > dockolution: https://github.com/sealuzh/dockolution