Slide 1

Slide 1 text

Evaluation of Container Virtualized MEGADOCK System in Distributed Computing Environment

March 23rd, 2017
SIG BIO 49 @ Japan Advanced Institute of Science and Technology

Kento Aoyama (1,2), Yuki Yamamoto (1,2), Masahito Ohue (1,3), Yutaka Akiyama (1,2,3)

1) Department of Computer Science, School of Computing, Tokyo Institute of Technology
2) Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology
3) Advanced Computational Drug Discovery Unit, Institute of Innovative Research, Tokyo Institute of Technology

Slide 2

Slide 2 text

“Docker”

[Figure: No. of pulled containers from DockerHub. Source: https://www.docker.com/what-container]

Slide 3

Slide 3 text

Docker and Bioinformatics

Docker Integration Benchmark Report @ Centre for Genomic Regulation (Barcelona, Spain)
• Univa Grid Engine (Job Scheduler)
• Nextflow (Workflow manager)
• Docker (Linux Container)

• Reproducibility
• Portability

A. Paolo, D. Tommaso, A. B. Ramirez, E. Palumbo, C. Notredame, and D. Gruber, “Benchmark Report: Univa Grid Engine, Nextflow, and Docker for running Genomic Analysis Workflows.”

Slide 4

Slide 4 text

Research Purpose

To develop a Container-Native HPC Bioinformatics Application using Linux Containers, which has …
• Low Dependency on Environment
• High-Performance
  • Parallel execution performance
  • Overhead of virtualization
• Dynamic Scaling

Slide 5

Slide 5 text

Today’s Report

• To evaluate the performance of Docker container virtualization in a bioinformatics application

Target Application
• MEGADOCK [1]
  • FFT-grid-based protein-protein docking software
  • Multi-threading, Multi-node, Multi-GPU (OpenMP, MPI, GPU)
  • Extremely compute-intensive workloads

[1] Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers”, Bioinformatics, 30(22): 3281-3283, 2014.

Slide 6

Slide 6 text

Background
• Linux Container
• Docker
• Container & Bioinformatics

Slide 7

Slide 7 text

Linux Container

Kernel-Shared Virtualization
• Lightweight: small size, fast deployment, easy sharing
• Performance: little virtualization overhead, faster than a VM

[Figure: stack comparison.
 Virtual Machines: Hardware > Hypervisor > Virtual Machine (Guest OS, Bins/Libs, App).
 Containers: Hardware > Linux Kernel > Container (Bins/Libs, App).]

Slide 8

Slide 8 text

Container-based Virtualization

Linux Container
• virtualizes host resources as containers
  • Filesystem, hostname, IPC, PID, Network, User, etc.
• can be used like Virtual Machines

Linux Kernel Features
• Containers share the same host kernel
• namespace [1], chroot, cgroup, SELinux, etc.

[Figure: one machine with a single Linux kernel space hosting multiple containers, each running its own processes.]

[1] E. W. Biederman. “Multiple instances of the global Linux namespaces.” In Proceedings of the 2006 Ottawa Linux Symposium, 2006.
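The namespace mechanism listed above can be inspected from user space. The following is a minimal sketch, assuming a Linux host where procfs exposes `/proc/<pid>/ns`; the function name `list_namespaces` is hypothetical and not part of the presentation. Two processes that show the same identifier for a namespace type share that namespace, while a process inside a container typically shows different identifiers from the host.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process.

    Reads the symlinks under /proc/<pid>/ns (Linux only).
    Returns an empty dict when that directory is not available.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink such as "pid:[4026531836]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Skip entries that cannot be read with current privileges.
            continue
    return result


if __name__ == "__main__":
    for ns_type, ident in list_namespaces().items():
        print(f"{ns_type:20s} {ident}")
```

Running the script once on the host and once inside a container, then comparing the printed identifiers, is one way to see which resources are isolated.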

Slide 9

Slide 9 text

Linux Container – Performance [1]

[Figure: bar chart of performance ratio relative to native for PXZ [MB/s], Linpack [GFLOPS], and Random Access [GUPS]; series: Native, Docker, KVM, KVM-tuned. Data labels shown: 0.96, 1.00, 0.98, 0.78, 0.83, 0.99, 0.82, 0.98.]

[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,” IEEE International Symposium on Performance Analysis of Systems and Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)

Slide 10

Slide 10 text

Linux Container Management Tools

Docker [1]
• Most popular Linux Container management platform
• Many useful components and services

[Logos: Docker [1], Shifter [2], Singularity [3]]

[1] Solomon Hykes and others. “What is Docker?” - https://www.docker.com/what-docker
[2] W. Bhimji, S. Canon, D. Jacobsen, L. Gerhardt, M. Mustafa, and J. Porter, “Shifter: Containers for HPC,” Cray User Group, pp. 1–12, 2016.
[3] “Singularity” - http://singularity.lbl.gov/

Slide 11

Slide 11 text

Easy container sharing – Docker Hub

Portability & Reproducibility
• Easy to share the application environment via Docker Hub
• Containers can be executed on other host machines

[Figure: a Dockerfile (apt-get install …, wget …, make) generates an image on an Ubuntu host running Docker Engine; the image is pushed to Docker Hub, pulled on a CentOS host running Docker Engine, and run there as a container with the same App and Bins/Libs.]

Slide 12

Slide 12 text

Docker - Filesystem

AUFS (Advanced multi layered unification filesystem) [1]
• Docker uses AUFS as its default filesystem
• Layers can be reused in other container images
• AUFS helps software reproducibility

[Figure: a Docker container image built as a stack of layers on the ubuntu:16.04 base image. Layers shown: f49eec89601e (129.5 MB), 366a03547595 (39.85 MB), ef122501292c (133.6 MB), e50c89716342 (660.4 KB), 5aec9aa5462c (24.17 MB), 0d3cccd04bdb (6.07 MB). Tags shown: beta, version-1.0, version-1.0.2, version-1.2, latest.]

[1] Advanced multi layered unification filesystem. http://aufs.sourceforge.net, 2014.

Slide 13

Slide 13 text

For Bioinformatics Apps: 1

Why in the field of Bioinformatics?
• Types of Applications
  • Data Analysis, Machine Learning
  • MD Simulation, Docking calc., etc.
• Data-centric workload
  • Compute: Large
  • Data I/O: Case by case
  • Communication: Small
• Containers perform well on compute-intensive workloads [1]

[1] W. Felter, et al. “An updated performance comparison of virtual machines and Linux containers,” IEEE International Symposium on Performance Analysis of Systems and Software, pp.171-172, 2015.

Slide 14

Slide 14 text

For Bioinformatics Apps: 2

Reproducibility
• Different library versions can produce different results
  • e.g. genomic analysis pipeline [Paolo, 2016]

Dependency conflict
• Different applications can require different versions of the same library

[Figure, Application Reproducibility: Application A with Library version 1.3 gives Result A’, while Application A with Library version 1.2 gives Result A (different result); Container A’ and Container A package each combination.]
[Figure, Dependency Isolation: Application A and Application B both depend on Library A, with requirements version >= 1.2 and version < 1.1 (conflict); Container A and Container B isolate the two.]
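The dependency conflict on this slide can be illustrated with a small check. This is a sketch only: the assignment of "version >= 1.2" to Application A and "version < 1.1" to Application B is an assumption read off the figure order, and the candidate version list and all function names are hypothetical.

```python
def parse(version):
    """Turn a dotted version string such as "1.2" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, op, bound):
    """Check one requirement; only the two operators on the slide are handled."""
    v, b = parse(version), parse(bound)
    if op == ">=":
        return v >= b
    if op == "<":
        return v < b
    raise ValueError(f"unsupported operator: {op}")


def compatible_versions(candidates, requirements):
    """Versions of the shared library that satisfy every application at once."""
    return [
        v for v in candidates
        if all(satisfies(v, op, bound) for op, bound in requirements.values())
    ]


# Assumed mapping of requirements to applications (see lead-in).
REQUIREMENTS = {
    "Application A": (">=", "1.2"),
    "Application B": ("<", "1.1"),
}
CANDIDATES = ["1.0", "1.1", "1.2", "1.3"]  # hypothetical available versions

if __name__ == "__main__":
    shared = compatible_versions(CANDIDATES, REQUIREMENTS)
    print("one shared install:", shared or "no version satisfies both")
    for app, req in REQUIREMENTS.items():
        # One container per application: each only has to satisfy itself.
        print(app, "->", compatible_versions(CANDIDATES, {app: req}))
```

With a single shared installation no version works, whereas resolving each application on its own, as a per-application container does, always leaves a valid choice in this example.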

Slide 15

Slide 15 text

Features for Bioinformatics Apps

Performance
• Little performance overhead
Reproducibility
• Dependency isolation from other applications/libraries
Portability, Generality
• Sharing/porting to other environments

Features                   Native  VM     Container
Performance, Scalability   Great   Bad    Good
Reproducibility            Bad     Good   Great
Portability, Generality    Bad     Great  Great

Slide 16

Slide 16 text

Proposed Method

Slide 17

Slide 17 text

MEGADOCK

High-performance protein-protein interaction predictions
• FFT-grid based docking software
• Extremely compute-intensive
• OpenMP/MPI/GPU support
• Great HPC Performance

Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers”, Bioinformatics, 30(22): 3281-3283, 2014.

Slide 18

Slide 18 text

Container-based Application Distribution

• All application dependencies exist in the Container
• Easy-to-test application
• Easy-to-scale size of resources

[Figure: two layers, shown for a Test Environment and a Production Environment. Application Layer: MEGADOCK containers are added or removed. Compute Resource Layer: resources are added or removed.]

Slide 19

Slide 19 text

Experiments

Slide 20

Slide 20 text

Experiment I Evaluate container virtualization overhead on Physical Machine • Physical Machine (single-node) + Docker • Physical Machine (single-node, GPU) + NVIDIA-Docker Experiment II Evaluate container virtualization overhead on Cloud Environment • Virtual Machines (multi-node) + Docker • Virtual Machines (multi-node, GPU) + NVIDIA-Docker Experiments 20

Slide 21

Slide 21 text

Overview of Experiment I

Measurement
• megadock-gpu exec. time
• time command (6 runs, median)
Dataset
• 100 pair-pdb (KEGG pathway)
Options (OpenMP, OpenMPI)
• MPI: 12 threads / 4 MPI processes / 1 node
• GPU: 1 GPU / 1 process / 1 node

Test Case   Native  Docker
CPU (MPI)   (a)     (b)
GPU         (c)     (d)

[Figure: (a) Physical Machine running 4 MPI processes; (b) Physical Machine + Docker running 4 MPI processes; (c) Physical Machine with GPU running MEGADOCK-GPU; (d) Physical Machine + NVIDIA Docker with GPU running MEGADOCK-GPU.]
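The measurement protocol above (run the command several times and report the median) can be sketched as follows. This is an illustrative stand-in, not the authors' script: the slides use the shell `time` command, whereas this sketch times a subprocess with Python's `time.perf_counter`, and the function name `median_wall_time` and the placeholder command are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(cmd, runs=6):
    """Run `cmd` `runs` times and return (median_seconds, all_seconds).

    Wall-clock time is measured around each subprocess call;
    a non-zero exit status raises CalledProcessError.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        times.append(time.perf_counter() - start)
    return statistics.median(times), times


if __name__ == "__main__":
    # Placeholder workload; substitute the actual docking command line here.
    placeholder_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    median, samples = median_wall_time(placeholder_cmd, runs=6)
    print(f"median: {median:.3f} s over {len(samples)} runs")
```

The median is less sensitive than the mean to a single slow run, which matters when only a handful of repetitions are affordable for long jobs.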

Slide 22

Slide 22 text

Hardware/Software Specification

Software Env.   Physical Machine  Docker        NVIDIA Docker (GPU)
OS (image)      CentOS 7.2.1511   ubuntu:14.04  nvidia/cuda8.0-devel
Linux Kernel    3.10.0            3.10.0        3.10.0
GCC             4.8.5             4.8.4         4.8.4
FFTW            3.3.5             3.3.5         3.3.5
OpenMPI         1.10.0            1.6.5         N/A
Docker Engine   1.12.3            N/A           N/A
NVCC            8.0.44            N/A           8.0.44
NVIDIA Docker   1.0.0 rc.3        N/A           N/A
NVIDIA Driver   367.48            N/A           367.48

Hardware
CPU         Intel Xeon E5-1630, 3.7 [GHz] × 8 [core]
Memory      32 [GB]
Local SSD   128 [GB]
GPU         NVIDIA Tesla K40

Slide 23

Slide 23 text

Execution time

Time [sec]  Native    Docker
CPU (MPI)   7353.80   7850.57  (+6.32 % slower)
GPU         1646.09   1638.05
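The relative slowdown can be recomputed from the four execution times on this slide. The sketch below shows two common conventions; the function names are hypothetical. As an observation rather than a statement from the slides, dividing the time difference by the Docker time gives about 6.33%, close to the reported +6.32%, while dividing by the native time gives about 6.76%.

```python
# Execution times in seconds, as reported on the slide.
NATIVE = {"CPU (MPI)": 7353.80, "GPU": 1646.09}
DOCKER = {"CPU (MPI)": 7850.57, "GPU": 1638.05}


def overhead_vs_native(native, docker):
    """Extra time of the Docker run as a percentage of the native time."""
    return (docker - native) / native * 100.0


def overhead_vs_docker(native, docker):
    """Extra time of the Docker run as a percentage of the Docker time."""
    return (docker - native) / docker * 100.0


if __name__ == "__main__":
    for case in NATIVE:
        n, d = NATIVE[case], DOCKER[case]
        print(
            f"{case:10s} "
            f"vs native: {overhead_vs_native(n, d):+.2f}%  "
            f"vs docker: {overhead_vs_docker(n, d):+.2f}%"
        )
```

Stating which denominator is used avoids ambiguity when overheads from different reports are compared.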

Slide 24

Slide 24 text

Profile Result (CPU time)

Process             native [sec]  docker [sec]  diff     Ratio (all)
FFT3D               7.40E+04      7.63E+04      +3.01%   76.84%
MPIDP-Master        8010.98       8325.9        +3.78%   8.38%
Create Voxel        3743.7        3993.29       +6.25%   4.02%
FFT Convolution     3551.08       3576.43       +0.71%   3.60%
Score Sort          2462.61       2459.7        -0.12%   2.48%
Output Detail       2139.94       2225.96       +3.86%   2.24%
Ligand Preparation  1035.51       1849.11       +44.00%  1.86%
MPI_Barrier         236.95        231.05        -2.55%   0.23%
MPI_Init            0.94          4.54          +79.30%  0.00%
…                   …             …             …        …
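The "diff" column is not defined on the slide. The sketch below tests one candidate definition, (docker - native) / docker, against the reported values; this formula is an inference from the numbers, not something the slides state, and the names used are hypothetical. Under this definition every row reproduces within rounding, which also explains why Ligand Preparation reads +44.00% even though its Docker time is roughly 1.8 times the native time.

```python
# (process, native [sec], docker [sec], reported diff [%]) from the slide.
PROFILE = [
    ("FFT3D", 7.40e4, 7.63e4, 3.01),
    ("MPIDP-Master", 8010.98, 8325.9, 3.78),
    ("Create Voxel", 3743.7, 3993.29, 6.25),
    ("FFT Convolution", 3551.08, 3576.43, 0.71),
    ("Score Sort", 2462.61, 2459.7, -0.12),
    ("Output Detail", 2139.94, 2225.96, 3.86),
    ("Ligand Preparation", 1035.51, 1849.11, 44.00),
    ("MPI_Barrier", 236.95, 231.05, -2.55),
    ("MPI_Init", 0.94, 4.54, 79.30),
]


def diff_relative_to_docker(native, docker):
    """Candidate definition of the diff column, in percent."""
    return (docker - native) / docker * 100.0


def diff_relative_to_native(native, docker):
    """Alternative convention: increase relative to the native time."""
    return (docker - native) / native * 100.0


if __name__ == "__main__":
    for name, native, docker, reported in PROFILE:
        print(
            f"{name:20s} reported {reported:+7.2f}%  "
            f"vs docker {diff_relative_to_docker(native, docker):+7.2f}%  "
            f"vs native {diff_relative_to_native(native, docker):+7.2f}%"
        )
```

Printing both conventions side by side makes it easy to see how much the choice of denominator matters for the rows with large differences.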

Slide 25

Slide 25 text

Overview of Experiment II-(a)

(a) MEGADOCK-Azure [2]

Measurement
• megadock-dp exec. time
• time command (3 runs, median)
Dataset
• ZDOCK benchmark 1.0 [1] (59 * 59 = 3481 pairs)
Options (OpenMP, OpenMPI)
• MPI: 12 threads / 4 MPI processes / 1 node
All file input/output on Local SSD

[Figure: multiple virtual machines, each running 4 MPI processes; one process is the master and the others are workers.]

[1] R. Chen, et al. “A protein-protein docking benchmark,” Proteins: Structure, Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003.
[2] Masahito Ohue, et al. “MEGADOCK-Azure: High-performance protein-protein interaction prediction system on Microsoft Azure HPC”, IIBMP2016.

Slide 26

Slide 26 text

Overview of Experiment II-(b)

(b) MEGADOCK + Docker on Microsoft Azure

Measurement
• megadock-dp exec. time
• time command (3 runs, median)
Dataset
• ZDOCK benchmark 1.0 [1] (59 * 59 = 3481 pairs)
Options (OpenMP, OpenMPI)
• MPI: 12 threads / 4 MPI processes / 1 node
All file input/output on Local SSD
Docker Swarm
• All containers in 1 overlay network

[Figure: multiple virtual machines, each running a Docker container with 4 MPI processes; the containers are connected through Docker Swarm (Docker Network); one process is the master and the others are workers.]

[1] R. Chen, J. Mintseris, J. Janin, and Z. Weng, “A protein-protein docking benchmark,” Proteins: Structure, Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003.

Slide 27

Slide 27 text

VM Instance/Software Specification

Software Env.   Virtual Machine                  Docker
OS (image)      SUSE Linux Enterprise Server 12  ubuntu:14.04
Linux Kernel    3.12.43                          3.12.43
GCC             4.8.3                            4.8.4
FFTW            3.3.4                            3.3.5
OpenMPI         1.10.2                           1.6.5
Docker Engine   1.12.6                           N/A

VM Instance   Standard_D14_v2
CPU           Intel Xeon E5-2673, 2.40 [GHz] × 16 [core]
Memory        112 [GB]
Local SSD     800 [GB]

Slide 28

Slide 28 text

Execution time

Time [sec] by number of VMs:

# of VMs      1        5       10      20     30
VM            145,534  25,515  13,132  6,006  4,098
Docker on VM  117,219  25,145  12,331  6,344  3,971

Figure annotation: “May be a measurement mistake”

Slide 29

Slide 29 text

Scalability (Strong Scaling, based on VM=1)

[Figure: speed-up versus number of worker cores for Ideal, VM, and Docker on VM, with points at VM=1, 5, 10, 20, and 30. Annotation: comparable scalability.]
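Strong-scaling speed-up relative to the single-VM run can be recomputed from the execution times reported for Experiment II (145,534 s down to 4,098 s for VM; 117,219 s down to 3,971 s for Docker on VM). This is a minimal sketch with hypothetical names. It uses the number of VMs as the scale variable, whereas the figure plots worker cores, which the transcript does not list, so the curves are not reproduced exactly. One measurement in the execution-time figure is flagged as a possible mistake, and any error in a 1-VM baseline would shift that whole series.

```python
# Execution times in seconds for 1, 5, 10, 20 and 30 VMs (Experiment II).
NUM_VMS = [1, 5, 10, 20, 30]
TIME_VM = [145534, 25515, 13132, 6006, 4098]
TIME_DOCKER_ON_VM = [117219, 25145, 12331, 6344, 3971]


def speedup(times):
    """Speed-up of each run relative to the first (1-VM) run."""
    baseline = times[0]
    return [baseline / t for t in times]


if __name__ == "__main__":
    vm = speedup(TIME_VM)
    docker = speedup(TIME_DOCKER_ON_VM)
    print(f"{'VMs':>4s} {'VM':>8s} {'Docker on VM':>14s}")
    for n, s_vm, s_docker in zip(NUM_VMS, vm, docker):
        print(f"{n:4d} {s_vm:8.2f} {s_docker:14.2f}")
```

Because each series is normalized by its own 1-VM time, the two speed-up columns can differ even where the absolute execution times are nearly identical.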

Slide 30

Slide 30 text

Result & Discussion

Experiment I
• MEGADOCK + Docker on Physical Machine showed 6.32% lower performance.
  • Docker can cause a 0-4% drop in compute performance [1]
  • Communications go through Docker NAT (Network Address Translation)
• MEGADOCK (GPU) + NVIDIA-Docker on Physical Machine showed performance comparable to native.
  • GPU computation is independent of container virtualization
  • Container virtualization has little overhead on memory bandwidth

Experiment II
• MEGADOCK + Docker on Microsoft Azure showed comparable scalability.
  • Container virtualization overhead is smaller than other cloud-environment factors

[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers”, IEEE International Symposium on Performance Analysis of Systems and Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)

Slide 31

Slide 31 text

Conclusion

• The performance overhead of Docker container virtualization is small.
  • suitable for GPU-accelerated applications and cloud environments
• Container virtualization can isolate the application environment from the host environment.
  • the same container image can be used on various machines
    • Physical machine in a local environment
    • Virtual machine in a cloud environment
• Docker is useful for computational research work

Slide 32

Slide 32 text

Future Work

Multi-Node & Multi-GPU Evaluation on Cloud
• NVIDIA-Docker is not available in Docker Swarm mode
• Kubernetes [1] officially supports 1 GPU per node
  • (experimental feature: multi-GPU support)

Container-based Task Distribution
• Container-based distribution, as in web service applications
• easy to scale computing resources
• easy to extend to multiple tasks (e.g. GHOST-MP, MEGADOCK)

[1] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and Kubernetes,” acmqueue, vol. 14, no. 1, p. 24, 2016.