
Performance evaluation of MEGADOCK protein-protein interaction prediction system implemented with distributed containers on a cloud computing environment


PDPTA’19 - The 25th International Conference on Parallel and Distributed Processing Techniques and Applications

metaVariable

July 29, 2019



Transcript

  1. Performance evaluation of MEGADOCK protein-protein interaction prediction system implemented with distributed containers on a cloud computing environment
     PDPTA’19 - The 25th International Conference on Parallel and Distributed Processing Techniques and Applications
     Kento Aoyama (1,2), Yuki Yamamoto (1), Masahito Ohue (1), Yutaka Akiyama (1)
     (1) Dept. of Computer Science, School of Computing, Tokyo Institute of Technology
     (2) AIST – Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology
     July 29, 2019
  2. Introduction: Three important factors for computational science applications
     Because such applications:
     • are largely data-processing workloads that require heavy memory I/O and file I/O
     • have calculations that strongly depend on the data itself, leading to load-imbalanced tasks
     • require various dependent software/tools written in different languages
     • require scientific reproducibility of the results
     The three factors:
     • Performance: the application shows good performance and good parallel scalability
     • Portability: the application is easily available on other machine environments (availability, portability)
     • Reproducibility: the application result can be reproduced by other researchers
  3. Target Application of this Study: MEGADOCK [M. Ohue+ 2014]
     • A high-performance protein-protein interaction (PPI) prediction application
     • FFT-grid-based docking approach (FFTW/CUFFT)
     • Hybrid parallelization using MPI/GPU/OpenMP
     • Available on GitHub: akiyamalab/MEGADOCK
     Related projects: MEGADOCK-Azure [M. Ohue+ 2017], MEGADOCK 5.0 (WIP)
  4. Core Inspiration: application requirements in computational science vs. advantages of container-based virtualization
     • Performance: data-processing workloads and task load imbalance ↔ near-native performance, better than VMs
     • Portability: complex software dependencies, deployment cost, availability requirements ↔ good portability by packaging application dependencies
     • Reproducibility: reproducibility of the application result ↔ packaged applications contribute to result reproducibility
     The requirements of the computational science field match the advantages of container-based virtualization!
  5. Our Container Workflow over Multiple Environments
     • Build: the MEGADOCK Docker image is built on top of official Docker images on Docker Hub (e.g. ubuntu, centos, cuda)
     • Deploy: the image is published via Docker Hub / GitHub (e.g. akiyamalab/megadock:gpu), then run with Docker on a local PC and public clouds, and with Singularity, Shifter, Sarus, etc. on HPC systems (a usage sketch follows below)
     Talk outline: 1. Container-based Virtualization, 2. System Architecture Design, 3. Experiments and Performance Results
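     On the HPC side of this workflow, the same Docker Hub image can be converted and run with Singularity. A minimal sketch, assuming the akiyamalab/megadock:gpu tag from the slide; the resulting image filename and the command line inside the container are illustrative assumptions, not taken from the deck:

```sh
# Pull the published Docker image and convert it into a Singularity image
singularity pull docker://akiyamalab/megadock:gpu

# Run it on a GPU compute node; --nv exposes the host NVIDIA driver.
# The image filename and the in-container binary/options are placeholders.
singularity exec --nv megadock_gpu.sif megadock  # [options omitted]
```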
  6. Container-based Virtualization: a lightweight, process-level virtualization
     • Kernel-shared virtualization: container processes run on the same host kernel, and the process namespace is isolated from the global namespace (see the sketch below)
     • Executable, portable application package: application dependencies are packaged; containers run as native processes on the host kernel
     ✔ Advantages: good performance (near-native); good portability and availability
     Concerns: security and user isolation (compared with VMs); learning cost
     (Diagram: hardware / Linux kernel / root file system ('/'), with containers and a container runtime running alongside host applications)
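     As a quick illustration of the kernel-shared, namespace-isolated model described above (not from the slides), one can compare the process view inside and outside a container; the ubuntu:14.04 image matches the one used later in the experiments:

```sh
# Inside the container: the PID namespace is isolated, so only the
# container's own processes are visible (the container command is PID 1).
docker run --rm ubuntu:14.04 ps aux

# On the host: the same containerized process appears as an ordinary
# process in the host's process table, because it shares the host kernel.
docker run -d --name ns-demo ubuntu:14.04 sleep 60
ps aux | grep "sleep 60"
docker rm -f ns-demo
```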
  7. Linux Containers for HPC Environments
     Compared runtimes: Docker [S. Hykes, 2013], Singularity [G.M. Kurtzer+ 2017], Shifter [R.S. Canon+ 2015], Sarus [L. Benedicic+ 2017]
     • Privilege model: Docker — root daemon; Singularity — SUID/UserNS; Shifter/Sarus — SUID
     • GPU support: Docker — nvidia-docker; Singularity — yes; Shifter/Sarus — yes
     • MPI support: yes for all
     • Image compatibility with Docker: Docker — (native); Singularity — yes; Shifter/Sarus — yes
     • Image sharing service/location: Docker — Docker Hub, local file; Singularity — Singularity Hub, local file; Shifter/Sarus — local service / local repository
     • Image format (general): Docker — .tar layers (OCIv1 compatible); Singularity — .sif/.simg (OCIv1 compatible); Shifter/Sarus — squashfs
     • Performance: good for all
  8. Portability of Docker Containers
     • Containers isolate software dependencies from the host environment and make it easy to share images with others
     • A container registry (e.g. Docker Hub) helps image portability between users
     • A container recipe (e.g. Dockerfile) improves application maintainability
     (Diagram: a Dockerfile (apt-get install …, wget …, make) generates an image bundling the app with its bins/libs; the image is pushed to Docker Hub, then pulled and run on different hosts, e.g. Ubuntu or CentOS, each running the Docker Engine on the Linux kernel. A sketch of this lifecycle follows below.)
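     A minimal sketch of the build-push-pull-run lifecycle shown in the diagram. The Dockerfile contents and image name are hypothetical placeholders following the slide's outline (apt-get install …, wget …, make), not the actual MEGADOCK recipe:

```sh
# Write a container recipe (Dockerfile); contents are illustrative only
cat > Dockerfile <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y build-essential wget
# ... wget the application source and build it with `make` here, as in the slide's recipe ...
CMD ["echo", "containerized app"]
EOF

# Generate the image, push it to a registry (e.g. Docker Hub), and run it anywhere
docker build -t youruser/app:latest .
docker push youruser/app:latest      # share via Docker Hub
docker pull youruser/app:latest      # on another host (Ubuntu, CentOS, ...)
docker run --rm youruser/app:latest
```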
  9. System Architecture on MS Azure (1/2): conventional, straightforward implementation
     • Clients deploy computational resources (VMs, storage) using custom VM images
     😢 Drawbacks:
     1. Does not address vendor lock-in issues (VM images, internal services, etc.)
     2. Increases the maintenance cost of custom VM images
     (Diagram: a client uses the Azure CLI to deploy VMs into a resource group on Microsoft Azure — a master VM and worker VMs on a virtual network with storage and a public interface — then submits jobs over SSH; MEGADOCK runs with MPI communication across the VMs.)
  10. System Architecture on MS Azure (2/2): proposed system implementation using Docker containers
     • VMs are still deployed via the CLI, but every VM runs a Docker container for the MEGADOCK calculation (container images are pulled from a public registry service)
     ✔ Advantages:
     1. No vendor lock-in: container images can run on other platforms
     2. Small maintenance cost: only the container images need to be maintained
     Open question: what about performance? (A rough per-VM sketch follows below.)
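     A rough sketch of what each worker VM does in the proposed design, assuming the published akiyamalab/megadock:cpu image. Host networking for MPI, the mounted data path, and the MEGADOCK-MPI command line are assumptions for illustration, not the actual deployment scripts:

```sh
# On every Azure VM (master and workers): fetch the image from the public registry
docker pull akiyamalab/megadock:cpu

# Launch the calculation inside a container. --net=host is assumed here so that
# MPI processes can communicate across VMs over the virtual network; the binary
# name and its options are placeholders, not taken from the deck.
docker run --rm --net=host -v /mnt/data:/data akiyamalab/megadock:cpu \
  mpirun -np 4 megadock-dp  # [docking-table and output options omitted]
```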
  11. Experiments and Performance Results
     • Experiment 1: Container performance on a bare-metal node
     • Experiment 2: HPC performance of PPI calculations on Microsoft Azure
  12. Experiment 1: Experimental Setup
     To measure the performance effect of Docker containers, we ran the MEGADOCK calculations under the following conditions:
     a. MEGADOCK-MPI (processes directly on the bare-metal machine)
     b. MEGADOCK-MPI + Docker container
     c. MEGADOCK-GPU (process + GPU directly on the bare-metal machine)
     d. MEGADOCK-GPU + Docker container
     Dataset: 100 pairs of docking calculations (obtained from the KEGG pathway database [M. Kanehisa+ 1996])
     Measurement: median execution time over 3 runs
     (Illustrative launch commands for the four conditions are sketched below.)
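     Illustrative launch commands for conditions (a)–(d), consistent with the tools listed on the next slide (OpenMPI, Docker Engine, NVIDIA Docker 1.0). The MEGADOCK binary names, image tags, and options are assumptions for illustration only:

```sh
# (a) MEGADOCK-MPI directly on the bare-metal machine (8 CPU cores assumed)
mpirun -np 8 ./megadock-dp           # [docking-table options omitted]

# (b) MEGADOCK-MPI inside a Docker container
docker run --rm -v "$PWD/data:/data" megadock:mpi \
  mpirun -np 8 megadock-dp           # [docking-table options omitted]

# (c) MEGADOCK-GPU directly on the bare-metal machine
./megadock                           # [receptor/ligand options omitted]

# (d) MEGADOCK-GPU inside a container via NVIDIA Docker 1.x
nvidia-docker run --rm -v "$PWD/data:/data" megadock:gpu \
  megadock                           # [receptor/ligand options omitted]
```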
  13. Experiment 1: Hardware/Software Specification
     Software versions (Bare-metal / Docker / NVIDIA Docker (GPU)):
     • OS (image): CentOS 7.2.1511 / ubuntu:14.04 / nvidia/cuda:8.0-devel
     • Linux kernel: 3.10.0 / 3.10.0 / 3.10.0
     • GCC: 4.8.5 / 4.8.4 / 4.8.4
     • FFTW: 3.3.5 / 3.3.5 / 3.3.5
     • OpenMPI: 1.10.0 / 1.6.5 / N/A
     • Docker Engine: 1.12.3 / N/A / N/A
     • NVCC: 8.0.44 / N/A / 8.0.44
     • NVIDIA Docker: 1.0.0 rc.3 / N/A / N/A
     • NVIDIA Driver: 367.48 / N/A / 367.48
     Bare-metal machine (local): CPU Intel Xeon E5-1630, 3.7 GHz × 8 cores; Memory 32 GB; Local SSD 128 GB; GPU NVIDIA Tesla K40
  14. Experiment 1: Performance Result
     ✔ MEGADOCK-GPU in a Docker container (d) showed almost the same performance as the bare-metal environment (c) (≈ ±0.5% time)
     ✔ MEGADOCK-MPI in a Docker container (b) was approximately 6.3% slower than the bare-metal environment (a) (≤ +6.3% time)
     (Chart: execution time [sec] — bare-metal: 7,354 (MPI) and 1,646 (GPU); Docker: 7,851 (MPI) and 1,638 (GPU). The GPU versions were roughly 4.5–4.8× faster than the corresponding MPI (CPU-only) runs.)
  15. Experiment 2: Experimental Setup
     Dataset: ZDOCK benchmark 1.0 [R. Chen+ 2003] (59 × 59 = 3,481 pairs of docking calculations)
     Measurement: median of 3 docking-calculation runs (measured with the 'time' command)
     Runtime configuration (see the sketch below):
     • OMP_NUM_THREADS=4
     • 4 MPI processes per node
     • All file input/output stored on a data volume on the local SSD
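     A sketch of the runtime configuration listed above for one measurement, assuming OpenMPI-style options; the hostfile, total process count, binary name, and omitted docking options are illustrative assumptions:

```sh
export OMP_NUM_THREADS=4        # 4 OpenMP threads per MPI process

NUM_VMS=30                      # e.g. the largest configuration measured (30 VMs)

# 4 MPI processes per node (16 cores / 4 threads each), across all worker VMs;
# execution time is taken with the `time` command, as on the slide.
time mpirun -npernode 4 -np $((4 * NUM_VMS)) -hostfile hosts.txt \
  ./megadock-dp                 # [docking-table and I/O options omitted; data on local SSD]
```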
  16. Experiment 2: Hardware/Software Specification
     Software versions (VM / Docker on VM):
     • OS (image): SUSE Linux Enterprise Server 12 / ubuntu:14.04
     • Linux kernel: 3.12.43 / 3.12.43
     • GCC: 4.8.3 / 4.8.4
     • FFTW: 3.3.4 / 3.3.5
     • OpenMPI: 1.10.2 / 1.6.5
     • Docker Engine: 1.12.6 / N/A
     VM instance: Standard_D14_v2 (MS Azure): CPU Intel Xeon E5-2673, 2.40 GHz × 16 cores; Memory 112 GB; Local SSD 800 GB; GPU N/A
  17. Experiment 2: Performance Result
     ✔ The speed-ups of 'VM' and 'Docker on VM' were almost equivalent in this experiment.
     Execution time [sec] by number of VMs (16 cores per VM):
     • 1 VM: VM 145,534; Docker on VM 117,219
     • 5 VMs: VM 25,515; Docker on VM 25,145
     • 10 VMs: VM 13,132; Docker on VM 12,331
     • 20 VMs: VM 6,006; Docker on VM 6,344
     • 30 VMs: VM 4,098; Docker on VM 3,971
     Speed-up relative to the single-VM 'VM' run (vs. ideal scaling over worker cores): VM 1.0 / 5.7 / 11.1 / 24.2 / 35.5; Docker on VM 1.2 / 5.8 / 11.8 / 22.9 / 36.6
  18. Experiment 2: Discussion
     • Both 'VM' and 'Docker on VM' achieved good speed-ups up to 476 worker cores (30 VMs).
     • The MEGADOCK execution load mainly consists of independent 3D fast Fourier transform (FFT) convolutions on each single node, even in the MPI version, so it tends to be a compute-intensive workload rather than data-I/O- or network-intensive.
     (Chart: same speed-up plot as on the previous slide.)
  19. Code Availability
     The entire source code of MEGADOCK is open source under the GPL v3 license.
     • Webpage: https://www.bi.cs.titech.ac.jp/megadock/
     • GitHub (akiyamalab/MEGADOCK): https://github.com/akiyamalab/MEGADOCK — 'Dockerfile' available; 'Singularity Definition' file will be available soon (WIP)
     • Docker Hub (akiyamalab/megadock): https://hub.docker.com/r/akiyamalab/megadock — tags akiyamalab/megadock:cpu and akiyamalab/megadock:gpu
     (Example commands for obtaining the code and images are sketched below.)
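     A sketch for obtaining the code and images from the locations above; the pull commands use the tags listed on the slide, while the build step assumes the Dockerfile sits at the repository root (check the repository for the actual path and build instructions):

```sh
# Prebuilt images from Docker Hub
docker pull akiyamalab/megadock:cpu
docker pull akiyamalab/megadock:gpu

# Or build from source using the Dockerfile in the GitHub repository
git clone https://github.com/akiyamalab/MEGADOCK.git
cd MEGADOCK
docker build -t megadock:local .   # assumed root-level Dockerfile
```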
  20. Discussion
     a. We did not perform a multiple-GPU-node experiment in this study; work is in progress to demonstrate GPU+MPI parallelization with Singularity containers on multiple HPC environments.
     b. In Experiment 2, the single-VM data point took particularly long for a reason that remains unknown, which affected the measured scalability; this should be investigated.
     c. Alternative task-distribution frameworks for MEGADOCK could be considered: MapReduce frameworks (e.g. Hadoop, Spark) or container orchestration frameworks (e.g. Kubernetes, Mesosphere). Needed features: fault tolerance, automatic recovery from failures, …
  21. Conclusion (1/2)
     • GPU parallelization effectively accelerates the MEGADOCK calculation both with and without a container (c, d): roughly 4.5–4.8× faster than the MPI (CPU-only) runs.
     • Only a small performance degradation was observed: approximately +6.3% for (b) MEGADOCK-MPI in a Docker container, and roughly ±0.5% for (d) MEGADOCK-GPU in a Docker container.
     (Diagram: the four Experiment 1 configurations (a)–(d), as on slide 12.)
  22. Conclusion (2/2)
     • The parallel performance of our system (Docker on VM) is almost equivalent to the direct VM calculation case (VM).
     • Our approach can be applied to other environments, e.g. Singularity on HPC environments; Docker on laptops, cloud environments, and local servers.
     (Chart: same speed-up plot as on slide 17.)