Improvements of the Shifter at Swiss National Supercomputing Centre (CSCS)

Internship Presentation: Improvements of the Shifter Workflow at Swiss National Supercomputing Centre (CSCS)

(2019-04-05)
Kento Aoyama, Ph.D. Student
Akiyama Laboratory, Dept. of Computer Science, Tokyo Institute of Technology

metaVariable

April 05, 2019
Transcript

  1. Improvements of the Shifter Workflow at Swiss National Supercomputing Centre (CSCS)

    Kento Aoyama, Ph.D. Student, Akiyama Laboratory, Dept. of Computer Science, Tokyo Institute of Technology
  2. Table of Contents

    1. Introduction
        • Container Virtualization and Computational Science
        • About Shifter
        • Contributions of this work
    2. Design and Implementations
        • New architecture design
        • New Docker-like interface
        • Performance improvements and others
    3. Performance Evaluation
        • Comparison on code-level
        • Comparison on user experience
    4. Summary
  3. Container Virtualization (focused on packaging)

    • Lightweight virtualization for packaging applications
        • Faster than virtual machines
        • Smaller in data size

    [Figure: software stacks compared. Virtual Machines: Hardware, Hypervisor, then per virtual machine a Guest OS, Bins/Libs, and App. Containers: Hardware, Linux Kernel, then per container Bins/Libs and App.]
  4. Container Virtualization and Computational Science

    • Reproducibility
        • Different versions of a library can produce different results
        • e.g. genomic analysis pipeline [Paolo, 2016]
    • Dependency conflict
        • Different applications can require different versions of the same library

    [Figure, Application Reproducibility: Application A with Library version 1.2 gives Result A; Application A with Library version 1.3 gives Result A' (different result); shown as Container A and Container A'.]
    [Figure, Dependency Isolation: Application A needs Library A version >= 1.2 and Application B needs version < 1.1 (conflict); shown as Container A and Container B.]
  5. Example Report: Containers in Genomic Center

    A. Paolo, D. Tommaso, A. B. Ramirez, E. Palumbo, C. Notredame, and D. Gruber, “Benchmark Report: Univa Grid Engine, Nextflow, and Docker for running Genomic Analysis Workflows.” @ Centre for Genomic Regulation (Barcelona, Spain)

    • Univa Grid Engine (Job Scheduler)
    • Nextflow (Workflow manager)
    • Docker (Linux Container)

    [Figure labels: User Script, job scheduler, allocate, pull image from registry, running job on node, Docker Container, run on container]

    • Keep application reproducibility
    • Easy to build & test application
  6. About Shifter

    • “Shifter: Containers for HPC” [R. S. Canon, 2016]
        • Provides a container environment on HPC systems
        • Open source on GitHub (NERSC/Shifter)
        • Available on Piz Daint at CSCS, and others

    Key Features
    • Creation of software environments (containers)
    • Native performance of custom HPC hardware (MPI/GPU)
    • No root privilege required for users
    • Compatible with Docker containers

    [Figure labels: User, Build, Docker Containers, Upload, Registry Service, Run, Other System]
  7. Shifter’s Problems, Challenges

    > Shifter is the first HPC-targeted solution for the user-defined stack with a good workflow and direct resource access. However, it relies on trusted and privileged operations, its resource manager integration increases complexity, and it requires servers and daemons for the image gateway.

    Jie Zhang, Xiaoyi Lu, and Dhabaleswar K. Panda. “Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds?”, In Proceedings of the 10th International Conference on Utility and Cloud Computing (UCC '17), pp. 151-160, 2017.
  8. Problem: Complex System Architecture

    • The image gateway requires servers and daemons
    • This increases deployment complexity and maintenance cost
  9. Problem: Performance of Shifter Image Gateway

    Label               Download  Expansion  Convert   Other     TOTAL
    Elapsed Time [sec]   373.441    769.786  820.882  63.186  2027.295

    • Time to pull a container image from Docker Hub
    • Data size: 6 GB (32 layers)
    • Machine: GPU node on TDS (dom.cscs.ch)
    • Pulling sometimes takes over 30 minutes
    • Low reliability: once a failure occurs, the pull restarts from scratch

    [Figure: stacked bar of elapsed time (Download, Expansion, Convert, Other), total 2027.295 sec]
  10. Contributions of this work

    • New Shifter architecture
        • Single executable, no daemons
        • Integration of Shifter and Shifter Image Gateway
    • New Shifter Image Manager
        • Performance improvements
        • Robust and reliable pulling process
        • Support for private repositories, 3rd party registries, etc.
    • New Docker-like interface

        $ shifter pull <image><:tag>
        $ shifter run <image><:tag> <args>
        $ shifter images
        etc.
  11. Old Shifter’s Architecture Problems

    • Low performance, low reliability
    • Complexity of deployment and maintenance cost

    [Figure annotations: Different executable, Performance issue, Different service]
  12. Example: Pulling Workflow (old)

    1. $ shifterimg pull ubuntu
    2. $ shifter --image=ubuntu

    [Figure labels: pull request, download layers, expand & convert, finished!]
  13. Example: Running Workflow (old)

    1. $ shifterimg pull ubuntu
    2. $ shifter --image=ubuntu

    [Figure labels: shifter, lookup, load, exec]
  14. New Shifter’s Architecture

    • Single executable, no daemons

    [Figure labels: shifter image manager, shifter (CLI), shifter runtime, HPC System, Workload Manager, Shifter]
  15. New Shifter’s Architecture

    • Single executable, no daemons
    • Improved performance using faster filesystems

    [Figure labels: shifter image manager, shifter (CLI), shifter runtime, HPC System, Workload Manager]
    [Figure annotations: single executable, no daemon, improved performance using faster filesystems, user friendly interface]
  16. Example: Pulling Workflow of New Shifter

    1. $ shifter pull ubuntu
    2. $ shifter run ubuntu

    [Figure labels: shifter image manager, shifter (CLI), shifter runtime, HPC System, Workload Manager, pull, download layers, expand & convert, finished! (user quota)]
  17. Example: Running Workflow of New Shifter

    1. $ shifter pull ubuntu
    2. $ shifter run ubuntu

    [Figure labels: shifter image manager, shifter (CLI), shifter runtime, HPC System, Workload Manager, run, lookup, load, exec]
  18. Shifter (CLI) - New Shifter’s Component

    • Available commands list

        # run container image
        $ shifter run [options] <image>[<:tag>] <args>
        # pull image
        $ shifter pull [options] <image>[<:tag>]
        # show list of images
        $ shifter images
        # remove image
        $ shifter rmi <image>[<:tag>]
        # import image
        $ shifter import <archive_path> <image>[<:tag>]
  19. Examples of Command Output

    • e.g. shifter images, shifter run

        $ shifter images
        REPOSITORY      TAG     DIGEST           CREATED              SIZE       SERVER
        library/alpine  latest  2a69dc48d968cd1  2017-12-11T17:42:24  1.91 MB    index.docker.io
        library/centos  latest  358bf47a7a6443d  2017-12-14T10:08:53  68.40 MB   index.docker.io
        library/ubuntu  latest  736f02dfa6f4e92  2017-12-14T10:07:09  44.22 MB   index.docker.io
        nvidia/caffe    17.12   feddbaa20c2f4bb  2017-12-14T10:33:52  1.23 GB    nvcr.io
        partners/mapd   3.2.2   499ff9befa4a162  2017-12-14T10:22:02  665.31 MB  nvcr.io

        $ shifter run centos cat /etc/os-release
        NAME="CentOS Linux"
        VERSION="7 (Core)"
        ID="centos"
        ID_LIKE="rhel fedora"
        VERSION_ID="7"
        PRETTY_NAME="CentOS Linux 7 (Core)"
        ANSI_COLOR="0;31"
        …

    *under development
  20. About Shifter Image Manager

    • Main Features
        • Pulling container images from a Docker registry service
        • Storing the Shifter image
        • Providing utilities for the local repository
    • Repository Policy
        • Repository images are owned by the user (user’s local repository)
    • New Features and Improvements
        • Performance improvement
            • Parallel layer download
            • Remove-After-Write expansion
        • Robust pulling (automatic connection retry)
        • Support for private repositories
        • Support for 3rd party registry services
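The "parallel layer download" item can be pictured with a small thread-pool sketch. This is not the Shifter Image Manager code: `download_layers`, `fetch_layer`, and `max_workers` are hypothetical names, and the actual HTTP transfer is left to whatever callable the caller passes in.

```python
from concurrent.futures import ThreadPoolExecutor


def download_layers(digests, fetch_layer, max_workers=4):
    """Fetch every layer concurrently and return {digest: result}.

    fetch_layer is a caller-supplied callable (hypothetical here) that
    downloads one layer given its digest and returns the result.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as the input digests.
        results = list(pool.map(fetch_layer, digests))
    return dict(zip(digests, results))
```

Threads suit this step because layer downloads are I/O bound; the worker count is a tuning knob, not a value taken from the slides.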
  21. Optimization on Expansion Process for Fast-Storage (1/2)

    • Old Expansion Process
        • Combines all layer archives to create a container image
    • The old algorithm aims to reduce file I/O
        • Concept: filter unnecessary items before writing
        • Much time is spent checking dependencies on deleted items
        • Especially when a container has many items
    • Poorly performing implementation (heavy Python loops, etc.)

    [Figure: layer1, layer2 and layer3 are combined into the output. Labels: file1, file2, file3, deleted, file4; output: file1, file2, file4; note: "file2 will be filtered"]
  22. Optimization on Expansion Process for Fast-Storage (2/2)

    • New Expansion Process
        • Concept: delete unnecessary items after writing
    • The new algorithm aims for a simple implementation
        • Performance depends little on the number of items
    • The new Shifter architecture uses fast storage (tmpfs) on the compute node

    [Figure: layer1, layer2 and layer3 are written to the output in order. Labels: file1, file2, file3, deleted, file4; output: file1, file2, file4; notes: "overwrite & delete", "overwrite …"]
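The write-then-delete idea on the two slides above can be sketched in a few lines of Python. This is an illustration, not the prototype's `simple_expand_layers`: the function names are hypothetical, and it assumes deletions are encoded as Docker/OCI-style whiteout entries (`.wh.<name>`), which the slides do not spell out. Symlinks, permissions, opaque whiteouts and path sanitising are deliberately left out.

```python
import io
import os
import shutil
import tarfile

WHITEOUT_PREFIX = ".wh."  # assumption: Docker/OCI-style whiteout markers


def make_layer(files):
    """Build an in-memory gzip'd tar layer from {path: bytes} (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def remove_path(path):
    """Delete a file or directory tree if it exists."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    elif os.path.lexists(path):
        os.remove(path)


def expand_layers(layer_archives, target_dir):
    """Apply layers in order: write everything, delete on whiteout markers."""
    os.makedirs(target_dir, exist_ok=True)
    for archive in layer_archives:
        with tarfile.open(fileobj=archive, mode="r:*") as tar:
            for member in tar:
                parent, name = os.path.split(member.name)
                dest = os.path.join(target_dir, member.name)
                if name.startswith(WHITEOUT_PREFIX):
                    # Deletion happens on the filesystem, after earlier writes.
                    hidden = name[len(WHITEOUT_PREFIX):]
                    remove_path(os.path.join(target_dir, parent, hidden))
                elif member.isdir():
                    os.makedirs(dest, exist_ok=True)
                elif member.isfile():
                    os.makedirs(os.path.dirname(dest), exist_ok=True)
                    # Later layers simply overwrite earlier files.
                    with tar.extractfile(member) as src, open(dest, "wb") as out:
                        shutil.copyfileobj(src, out)
```

No cross-layer bookkeeping is needed, so the cost per item stays roughly constant; the price is extra writes, which is why the approach pairs well with tmpfs.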
  23. Robust Download Process

    • Automatic download retry (up to a specified number of times)
      If any failure is caught, the layer download is retried automatically
        • case: Failed to connect to the registry
        • case: Failed to download a layer archive
        • case: Failed to validate the archive checksum
        • case: Authorization token expired, etc.
    • Example of failure

        $ shifter pull nvidia/cuda:latest
        …
        > pulling : sha256:1d8592394ba1ae81037e16fac3382...
        [ERROR ] Download stream error: Failed to read response body
        > failed : sha256:9b7c55367bee78ee5af95e7bc7d91...
        > retry : sha256:9b7c55367bee78ee5af95e7bc7d91...
        > completed : sha256:1d8592394ba1ae81037e16fac3382...
        …
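A per-layer retry loop with checksum validation might look like the sketch below. It is an assumption-laden illustration rather than Shifter's implementation: `fetch_with_retry`, `fetch`, `max_retries`, and `delay` are hypothetical names, and the set of retried exceptions is chosen only for the example.

```python
import hashlib
import time


def fetch_with_retry(fetch, digest, max_retries=3, delay=0.0):
    """Download one layer, retrying on transfer or checksum failures.

    fetch is a caller-supplied callable (hypothetical) returning the layer
    bytes for a digest of the form 'sha256:<hex>'; it raises OSError on
    connection or stream errors.
    """
    expected = digest.split(":", 1)[1]
    last_error = None
    for _attempt in range(max_retries):
        try:
            data = fetch(digest)
            if hashlib.sha256(data).hexdigest() != expected:
                raise ValueError("checksum mismatch for " + digest)
            return data
        except (OSError, ValueError) as err:
            last_error = err
            time.sleep(delay)  # back off before the next attempt
    raise RuntimeError("giving up on " + digest) from last_error
```

Because only the failed layer is retried, a transient error no longer forces the whole pull to restart from scratch.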
  24. Support Authentication & 3rd party registry

    • Authentication option for private repositories
        • “--login” option for basic authentication

        $ shifter pull user/privateRepo:tag --login
        username : user
        password : …

    • Support for 3rd party registry services
        • $ shifter pull <server>/<namespace>/<image><:tag>
        • e.g. NVIDIA GPU Cloud (DGX-1 Container Registry)

        $ shifter pull nvcr.io/nvidia/caffe:17.12 --login
        username : $oauthtoken
        password : …
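How a `<server>/<namespace>/<image><:tag>` reference splits into its parts can be sketched as follows. The parser is hypothetical, not Shifter's: the defaults `index.docker.io`, `library`, and `latest` are inferred from the `shifter images` listing shown earlier in the deck, and a two-part name is always read as `<namespace>/<image>`.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    server: str
    namespace: str
    image: str
    tag: str


def parse_image_ref(ref, default_server="index.docker.io",
                    default_namespace="library", default_tag="latest"):
    """Split '<server>/<namespace>/<image><:tag>' into its components."""
    name, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        # No tag given (a ':' followed by '/' would be a server port).
        name, tag = ref, default_tag
    parts = name.split("/")
    if len(parts) == 1:
        return ImageRef(default_server, default_namespace, parts[0], tag)
    if len(parts) == 2:
        return ImageRef(default_server, parts[0], parts[1], tag)
    return ImageRef(parts[0], "/".join(parts[1:-1]), parts[-1], tag)
```

With these rules, `ubuntu` resolves to the default server and namespace, while `nvcr.io/nvidia/caffe:17.12` keeps its explicit server.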
  25. Evaluation: Comparison on Code-level

    Old Shifter Image Gateway code vs New prototype Python code

    Label             Download  Expansion  Convert  Other   TOTAL  Speed-up
    Old (clone code)    286.90     394.96        -      -  681.86      1.00
    New (prototype)     131.26     195.31        -      -  326.57      2.09

    • Data size: 6 GB (32 layers)
    • Env.: Piz Daint (daint-gpu) using tmpfs for file I/O

    [Figure: stacked bars of Download and Expansion time. Old (clone code): 681.86; New (prototype): 326.57]
  26. Evaluation: Comparison on User Experience

    Old Shifter Image Gateway vs New Shifter Pull Command

    Elapsed time [sec]

    Label                Download  Expansion  Convert   Other     TOTAL  Speed-up
    Old                   373.441    769.786  820.882  63.186  2027.295      1.00
    New ($HOME + tmpfs)   295.189     58.120  126.364   3.603   483.275      4.20

    • Data size: 6 GB (32 layers)
    • Env.: GPU node on TDS (dom.cscs.ch)

    [Figure: stacked bars of Download, Expansion, Convert and Other time. Old: 2027.295; New ($HOME + tmpfs): 483.275]
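The per-phase gains behind the overall figure can be recomputed from the table with a few lines of Python. The timings are copied from the slide; the function name `speedups` is only for illustration.

```python
# Elapsed times in seconds, copied from the user-experience comparison table.
OLD = {"Download": 373.441, "Expansion": 769.786, "Convert": 820.882,
       "Other": 63.186, "TOTAL": 2027.295}
NEW = {"Download": 295.189, "Expansion": 58.120, "Convert": 126.364,
       "Other": 3.603, "TOTAL": 483.275}


def speedups(old, new):
    """Return old/new elapsed-time ratios per label."""
    return {label: old[label] / new[label] for label in old}


if __name__ == "__main__":
    for label, ratio in speedups(OLD, NEW).items():
        print(f"{label:<10} x{ratio:.2f}")
```

The total ratio comes out near 4.19, which the slides report as 4.20. Most of the gain comes from Expansion (about 13x) and Convert (about 6.5x), while Download improves by only about 1.3x.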
  27. Summary

    • Create New Architecture
        • Single executable, no daemons
    • Improve Performance
        • maximum: x 4.20 faster (using 6 GB image)
    • Provide User Friendly Interface
        • shifter pull, shifter run, shifter images, etc.
    • Provide New Features
        • support for authentication, 3rd party registries, etc.
  28. Comparison of User Experience: Small image

    Old Shifter Image Gateway vs New Shifter Pull Command

    Label                Download  Expansion  Convert   Other    TOTAL  Speed-up
    Old                    88.035     55.534   57.484  19.514  220.567      1.00
    New ($HOME + tmpfs)    36.427      9.333   11.324   2.126   59.210      3.73

    • Data size: 1 GB (6 layers)
    • Env.: login node on TDS using tmpfs for file I/O

    [Figure: stacked bars of Download, Expansion, Convert and Other time. Old: 220.567; New ($HOME + tmpfs): 59.210]
  29. Profile data: Prototype Python Code

    $ python3 -m cProfile $HOME/toy-python/main.py --namespace=jereviendrai --repo=cntk-mpich-gpu-demjan --tag=latest --N=1 --test_pull --test_expand --skip_original --log_level=2 --tmp_dir=/tmp/layer_cache --expand_dir=/tmp/image > perf.out

    #  ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
    1  183/1       0.006        0  369.873  369.873  {built-in method exec}
    2  1               0        0  369.873  369.873  main.py:3(<module>)
    3  1               0        0  369.579  369.579  main.py:723(main)
    4  1               0        0  229.467  229.467  main.py:630(testExpansion)
    5  1           0.144    0.144  227.818  227.818  main.py:393(simple_expand_layers)
    6  17816449   27.825        0  193.458        0  gzip.py:349(read)
    7  7697462     8.883        0  157.148        0  gzip.py:425(_read)
    8  1               0        0  140.111  140.111  main.py:511(testDownload)
    …

    gzip buffered I/O consumed 84.3% of the expansion time in the prototype code
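The 84.3% figure is the cumulative time of `gzip.py:349(read)` (193.458 s) divided by that of `testExpansion` (229.467 s). A similar measurement can be reproduced in-process with `cProfile` and `pstats`; the sketch below profiles chunked reads from a gzip stream. The function name, payload, and chunk size are made up for illustration and are not taken from the prototype.

```python
import cProfile
import gzip
import io
import pstats


def profile_gzip_read(payload, chunk_size=1024):
    """Profile chunked reads of a gzip stream; return (bytes_read, report)."""
    compressed = gzip.compress(payload)

    def read_all():
        total = 0
        with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
        return total

    profiler = cProfile.Profile()
    total = profiler.runcall(read_all)

    # Sort by cumulative time, as in the table above, and keep the top rows.
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)
    return total, report.getvalue()
```

Small chunk sizes multiply the number of `read` calls, which is one plausible reason the per-call overhead dominates in the profile above.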