Performance evaluation of MEGADOCK protein-protein interaction prediction system implemented with distributed containers on a cloud computing environment
PDPTA’19 - The 25th International Conference on Parallel and Distributed Processing Techniques and Applications
Kento Aoyama1,2, Yuki Yamamoto1, Masahito Ohue1, Yutaka Akiyama1
1 Dept. of Computer Science, School of Computing, Tokyo Institute of Technology
2 AIST – Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology
July 29, 2019
Introduction
Applications in computational science:
• are mostly data-processing-like, requiring heavy memory I/O and file I/O
• contain calculations that strongly depend on the data itself, producing load-imbalanced tasks
• require various dependent software/tools written in different languages
• require scientific reproducibility of the results

Three requirements:
• Reproducibility: the application result can be reproduced by other researchers
• Performance: the application shows good performance and good parallel scalability
• Portability: the application is easily available on other machine environments (availability, portability)
Target Application of this Study
MEGADOCK: a protein-protein interaction prediction application
• FFT-grid based docking approach (FFTW/CUFFT)
• Hybrid parallelization using MPI/GPU/OpenMP
• Available on GitHub: akiyamalab/MEGADOCK
(A minimal sketch of the FFT-correlation idea is given after this slide.)
Related projects:
• MEGADOCK-Azure [M. Ohue+ 2017]
• MEGADOCK 5.0 (work in progress)
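For illustration, here is a minimal, self-contained sketch of the FFT-correlation idea behind grid docking, using FFTW real-to-complex 3-D transforms. The grid size, the single score channel, and the omitted voxelization step are simplified assumptions and do not reflect MEGADOCK's actual scoring model or implementation.

```cpp
// Minimal FFT-correlation sketch (simplified assumption; not MEGADOCK's code).
// Build with: g++ fft_dock_sketch.cpp -lfftw3
#include <fftw3.h>
#include <cstdio>
#include <vector>

int main() {
    const int N = 64;                        // grid points per axis (placeholder)
    const int real_n = N * N * N;
    const int cplx_n = N * N * (N / 2 + 1);  // size of r2c output

    std::vector<double> receptor(real_n, 0.0), ligand(real_n, 0.0), score(real_n);
    // ... voxelize receptor/ligand structures onto the grids (omitted) ...

    fftw_complex *R = fftw_alloc_complex(cplx_n);
    fftw_complex *L = fftw_alloc_complex(cplx_n);
    fftw_plan pr = fftw_plan_dft_r2c_3d(N, N, N, receptor.data(), R, FFTW_ESTIMATE);
    fftw_plan pl = fftw_plan_dft_r2c_3d(N, N, N, ligand.data(),   L, FFTW_ESTIMATE);
    fftw_execute(pr);
    fftw_execute(pl);

    // Correlation theorem: multiply R element-wise by the complex conjugate of L.
    for (int i = 0; i < cplx_n; ++i) {
        const double re = R[i][0] * L[i][0] + R[i][1] * L[i][1];
        const double im = R[i][1] * L[i][0] - R[i][0] * L[i][1];
        R[i][0] = re;
        R[i][1] = im;
    }

    // One inverse FFT yields the correlation score for every ligand translation.
    fftw_plan pb = fftw_plan_dft_c2r_3d(N, N, N, R, score.data(), FFTW_ESTIMATE);
    fftw_execute(pb);

    // Pick the best-scoring translation (normalization by real_n omitted).
    int best = 0;
    for (int i = 1; i < real_n; ++i)
        if (score[i] > score[best]) best = i;
    std::printf("best translation index: %d\n", best);

    fftw_destroy_plan(pr); fftw_destroy_plan(pl); fftw_destroy_plan(pb);
    fftw_free(R); fftw_free(L);
    return 0;
}
```

In the real application this correlation is repeated over many ligand rotations, which is what makes the workload compute-intensive and easy to parallelize across rotations and docking pairs.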
Core Inspiration
Application requirements in computational science vs. advantages of container-based virtualization:
• Performance: containers perform better than VMs
• Portability: complex software dependencies, deployment cost, and availability requirements are addressed by packaging application dependencies
• Reproducibility: packaged applications contribute to the reproducibility of the application result
→ The requirements of the computational science field are matched by the advantages of container-based virtualization!
Container-based Virtualization
• A container runs on the same host kernel
  • its process namespace is isolated from the global namespace
• Executable, portable application package
  • application dependencies are packaged
  • containers run as native processes on the host kernel
✔ Advantages: good performance (near-native); good portability and availability
Concerns: security and user isolation (vs. VMs); learning cost
[Figure: hardware / Linux kernel / root file system ('/') hosting plain apps alongside a container runtime whose containers each have their own root.]
(A minimal namespace-isolation sketch follows this slide.)
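As a small illustration of the "isolated process namespace" point above, the sketch below (added here for explanation, not from the slides) clones a child into a new Linux PID namespace with the raw clone(2) interface that container runtimes build on. It needs root privileges or an unprivileged user namespace to run.

```cpp
// Minimal PID-namespace sketch (illustrative assumption; container runtimes
// add mount/network/user namespaces, cgroups, and an image root on top).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <csignal>
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];   // stack for the cloned child

static int child_fn(void *) {
    // Inside the new PID namespace the child sees itself as PID 1,
    // just like a container's init process.
    std::printf("child: pid inside namespace = %d\n", getpid());
    return 0;
}

int main() {
    pid_t pid = clone(child_fn, child_stack + sizeof(child_stack),
                      CLONE_NEWPID | SIGCHLD, nullptr);
    if (pid < 0) {
        std::perror("clone (insufficient privileges?)");
        return 1;
    }
    std::printf("parent: child pid on the host = %d\n", pid);
    waitpid(pid, nullptr, 0);
    return 0;
}
```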
HPC Containers [Canon+ 2015] [L. Benedicic+ 2017] [S. Hykes, 2013]

|                                | Docker                      | Singularity                 | Shifter/Sarus                    |
| Privilege model                | Root daemon                 | SUID/UserNS                 | SUID                             |
| Support for GPU                | nvidia-docker               | Yes                         | Yes                              |
| Support for MPI                | Yes                         | Yes                         | Yes                              |
| Image compatibility for Docker | -                           | Yes                         | Yes                              |
| Image sharing service/location | Docker Hub, local file      | Singularity Hub, local file | Local service / local repository |
| Image format (general)         | .tar layers (OCIv1 compat.) | .sif/.simg (OCIv1 compat.)  | squashfs                         |
| Performance                    | good                        | good                        | good                             |
System Architecture on MS Azure (1/2)
Previous approach: MEGADOCK is deployed on Azure VMs using custom VM images.
😢 Bad:
1. Does not address vendor lock-in issues (VM images, internal services, etc.)
2. Increasing maintenance cost for custom VM images
[Figure: the client deploys VMs via the Azure CLI into a resource group (master VM + worker VMs on a virtual network, with Azure Storage and a public interface), then submits jobs over SSH; MEGADOCK runs on the VMs with MPI communication.]
System Architecture on MS Azure (2/2)
Proposed approach: VMs are still deployed by the CLI, but all VMs run a Docker container for the MEGADOCK calculation (container images are pulled from a public registry service).
✔ Good:
1. No vendor lock-in: container images can run on other platforms
2. Small maintenance cost: only the container images need to be maintained
Question: how about performance issues?
Ex1: Experimental Setup
To evaluate the overhead of the Docker container, we performed the MEGADOCK calculations under the following conditions:
a. MEGADOCK-MPI
b. MEGADOCK-MPI + Docker container
c. MEGADOCK-GPU
d. MEGADOCK-GPU + Docker container
Dataset: 100 pairs of docking calculations (obtained from the KEGG pathway database [M. Kanehisa+ 1996])
Measurement: median of execution time (3 runs)
[Figure: conditions (a)-(d) on a bare-metal machine; (a) MPI processes, (b) MPI processes inside a Docker container, (c) a process with GPU, (d) a process with GPU inside a Docker container.]
Ex2: Experimental Setup
Dataset: 3481 pairs of docking calculations
Measurement: median of 3 docking-calculation runs ('time' command)
Runtime configurations:
• OMP_NUM_THREADS=4
• 4 MPI processes per node
• all file input/output stored on a data volume on local SSD
(A minimal MPI+OpenMP distribution sketch follows this slide.)
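For illustration only, the sketch below shows one simple way that independent docking pairs can be spread over MPI ranks, with OpenMP threads working inside each process as in the 4-processes-per-node, OMP_NUM_THREADS=4 configuration above. The round-robin assignment and the dock_pair stub are hypothetical and are not MEGADOCK's actual task scheduler.

```cpp
// Hybrid MPI+OpenMP distribution sketch (hypothetical; not MEGADOCK's scheduler).
// Build with: mpicxx -fopenmp pair_dist_sketch.cpp
#include <mpi.h>
#include <omp.h>
#include <cstdio>

// Stand-in for one receptor-ligand docking calculation.
static void dock_pair(int pair_id) {
    #pragma omp parallel
    {
        // ... per-thread share of the rotation/FFT work for this pair ...
    }
    std::printf("pair %d done\n", pair_id);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int num_pairs = 100;                   // e.g. the 100-pair KEGG dataset
    for (int p = rank; p < num_pairs; p += size) // static round-robin assignment
        dock_pair(p);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) std::printf("all pairs finished\n");
    MPI_Finalize();
    return 0;
}
```

A static assignment like this suffers from the load imbalance noted in the introduction, since docking times vary with protein size; a dynamic master-worker distribution mitigates it.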
Ex2: Performance Result
✔ The speed-ups were almost equivalent in this experiment.
[Figure: execution time [sec] vs. number of VMs (16 cores per VM), and speed-up vs. number of worker cores, comparing VM with Docker on VM against the ideal line.]
Speed-up values:
• VM: 1.0 (#VM1), 5.7 (#VM5), 11.1 (#VM10), 24.2 (#VM20), 35.5 (#VM30)
• Docker on VM: 1.2 (#VM1), 5.8 (#VM5), 11.8 (#VM10), 22.9 (#VM20), 36.6 (#VM30)
Ex2: Discussion
• Good speed-up was obtained up to 476 worker cores (30 VMs).
• The MEGADOCK execution load is mainly composed of independent 3-D fast Fourier transform (FFT) convolutions on each single node, even in the MPI version, so it tends to be a compute-intensive workload rather than a data-I/O- or network-intensive one.
Discussion
a. GPU+MPI parallelization with containers was not demonstrated in this study; work is in progress to demonstrate GPU+MPI parallelization with Singularity containers on multiple HPC environments.
b. In Experiment 2, one data point (VM=1) took particularly long for an unknown reason, which affected the scalability result; the cause should be investigated.
c. Alternative approaches for MEGADOCK's task-distribution framework: MapReduce frameworks (e.g. Hadoop, Spark) and container frameworks (e.g. Kubernetes, Mesosphere).
   • needed features: fault tolerance, automatic recovery from failure, …
Conclusion (1/2)
• Almost no performance difference (approximately ±0.5% in time) between the with/without-container GPU cases (c, d).
• Small performance degradation of approximately +6.3% in time for (b) Docker container + MEGADOCK-MPI.
[Figure: conditions (a)-(d) as in Ex1, annotated with >4.5x speed, >4.8x speed, ≤+6.3% time, and ≈±0.5% time.]
Conclusion (2/2)
• The speed-up of Docker on VM was almost equivalent to the direct VM calculation case (VM).
• Our approach can be applied to other environments, e.g. Singularity + HPC environments; Docker + laptops, cloud environments, local servers.