
Hybrid Parallel on Kubernetes/hybrid-parallel-program-on-kubernetes

ryo nakamaru
November 19, 2018

Slides from my talk at JAWS HPC #14. The code and steps used in the demo are available at https://github.com/pottava/docker-openmpi

Transcript

  1. Does it still work as expected in containers?
    Hybrid Parallel on Kubernetes
    JAWS-UG HPC #14 Nov 19, 2018
    Ryo NAKAMARU, SUPINF Inc.

  2. It works, more or less! But...

  3. DEMO

  4. Himeno benchmark with Docker
    As long as Docker is installed, you need neither gcc nor OpenMPI. It's a one-liner!
      $ docker run --rm -it pottava/openmpi:4.0 \
          bash -c "apt-get install -y unzip lhasa make >/dev/null \
          && wget --quiet http://i.riken.jp/wp-content/uploads/2015/07/cc_himenobmtxp_mpi.zip \
          && unzip -q cc_himenobmtxp_mpi.zip && lha xqw=/opt/himeno cc_himenobmtxp_mpi.lzh \
          && cd /opt/himeno && mv Makefile.sample Makefile \
          && chmod +x ./paramset.sh && ./paramset.sh S 1 1 1 && make >/dev/null 2>&1 \
          && su -c 'mpirun -np 1 /opt/himeno/bmt' mpiuser"
      Sequential version array size
       mimax = 65 mjmax = 65 mkmax = 129
      ..
      MFLOPS measured : 3078.922728

  5. Hybrid parallel processing on a single Mac
    Start the OpenMPI master and slave nodes as containers
      # Start the slave processes as SSH servers
      $ docker run --name 02-node01 -d --cpuset-cpus 0,1 openmpi/samples:02-hybrid-parallel
      $ docker run --name 02-node02 -d --cpuset-cpus 2,3 openmpi/samples:02-hybrid-parallel
      # Start the master process
      $ docker run --rm -it -u mpiuser \
          --link 02-node01:node01 --link 02-node02:node02 \
          openmpi/samples:02-hybrid-parallel \
          mpirun -np 2 --host node01,node02 -x OMP_NUM_THREADS=2 ./hybrid
      Hello from thread 0 out of 2 from process 0 out of 2 on 5329fecf93f4
      Hello from thread 1 out of 2 from process 0 out of 2 on 5329fecf93f4
      Hello from thread 0 out of 2 from process 1 out of 2 on ca3d85c87284
      Hello from thread 1 out of 2 from process 1 out of 2 on ca3d85c87284

  6. The code and steps used in the demo are available here:
    https://github.com/pottava/docker-openmpi

  7. Topics
    • Considerations for running HPC applications on Docker
    • Getting a hybrid parallel app running on EC2
    • Usage example and challenges on Kubernetes

  8. Considerations for running HPC applications on Docker
    HPC requirements / How Docker works

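  9. Characteristics of HPC applications
    • Use every last bit of the hardware resources
    ‣ Large-scale clusters, with nodes used exclusively
    ‣ The environment is strictly managed at the hardware level
    ‣ Restrictions on device or network access are a problem
    ‣ Even the overhead of abstraction is a real concern
    • Security geared toward an "in-house compute environment"
    ‣ Strict yet flexible management of who runs computations, with looser controls inside the cluster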

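  10. How Docker works & why it is awkward for HPC
    • Isolation of the compute space via namespaces
    ‣ But we want the whole node to ourselves...
    ‣ Dead weight for inter-process communication, too
    • Control of compute resources via cgroups
    ‣ No limits needed, thanks
    ‣ OOM kills? Please don't add more problems...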
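
As a rough illustration of the two mechanisms on this slide, the sketch below contrasts a container that shares the host's network, IPC, and PID namespaces with one that has explicit cgroup limits. The flags are standard `docker run` options. The `run` dry-run helper and the two function names are hypothetical, and the image name is borrowed from the demo only as a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Image name borrowed from the demo; treat it as a placeholder.
IMAGE="openmpi/samples:02-hybrid-parallel"

# Share the host's network, IPC and PID namespaces, so the process is not
# placed in container-private namespaces for those resources.
hpc_run() {
  run docker run --rm --net=host --ipc=host --pid=host "$IMAGE" "$@"
}

# For contrast: explicit cgroup limits (two pinned CPUs, 4 GiB of memory).
# A process that exceeds the memory limit can be OOM-killed.
limited_run() {
  run docker run --rm --cpuset-cpus 0,1 --memory 4g "$IMAGE" "$@"
}

hpc_run ./hybrid
limited_run ./hybrid
```

With `DRY_RUN` left at its default, the script only prints the two commands, which makes it safe to try on a machine without Docker.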

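  11. Continued: why Docker is awkward for HPC
    • The user that runs processes is designed casually
    ‣ Casually root
    ‣ Once file sharing enters the picture, you want to give up
    • It all depends on how each ISV responds...
    ‣ Heavy dependence on commercial software
    ‣ Is access control to the license server going to be OK?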

  12. That said, once you have a Docker image, portability goes up!
    (Converting it to Singularity is quick, too)

  13. Considerations for running HPC applications on Docker
    How MPI works, and the Dockerfile

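  14. What matters when you Dockerize
    • Understand the application's specification and behavior
    ‣ How does it actually run? Identify every dependency
    ‣ How does it communicate?
    • Decide how far to containerize
    ‣ Leave SSH to the host? Use the host's MPI as well?
    ‣ Or put everything in the container?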

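  15. OpenMPI
    • Connects to each node over SSH
    ‣ One option is to run the compute containers as SSH servers that wait for instructions
    ‣ They can also be launched directly by the container's startup command
    • How do you keep OpenMPI versions aligned?
    ‣ If SSH + OpenMPI are left to the host, the container has to match the host's version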
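
One way to check the version question above is to compare what `mpirun --version` reports on the host and inside the container. This is a minimal sketch, assuming the first line of the output has the usual Open MPI form `mpirun (Open MPI) 4.0.0`. The function names `ompi_version` and `same_ompi_version` are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Extract the version from the first line of `mpirun --version`,
# assumed to look like: "mpirun (Open MPI) 4.0.0".
ompi_version() {
  sed -n '1s/.*(Open MPI) *//p'
}

# Succeeds only when both version strings are non-empty and identical.
same_ompi_version() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage sketch (needs mpirun on the host and Docker; image tag from the slides):
#   host=$(mpirun --version | ompi_version)
#   cont=$(docker run --rm -u mpiuser pottava/openmpi:4.0 mpirun --version | ompi_version)
#   same_ompi_version "$host" "$cont" || echo "Open MPI mismatch: $host vs $cont" >&2

# Demonstration on a canned sample of the expected output format.
printf 'mpirun (Open MPI) 4.0.0\n\nReport bugs to ...\n' | ompi_version
```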

  16. Here is how I built it
    https://github.com/pottava/docker-openmpi/blob/master/versions/4.0/Dockerfile
      FROM debian:stretch-slim
      RUN apt-get update && apt-get install -y gcc ssh wget curl \
       && apt-get install -y openssh-server \
       ..
      ENV OPENMPI_VERSION=4.0.0
      RUN apt-get install -y build-essential \
       && repo="https://www.open-mpi.org/software/ompi/v4.0/downloads" \
       && curl --location --silent --show-error --output openmpi.tar.gz \
       "${repo}/openmpi-${OPENMPI_VERSION}.tar.gz" \
       ..
       && ./configure --prefix=/usr/local && make && make install
    Note: the SSH server is included too

  17. Getting a hybrid parallel app running on EC2

  18. Use the host's SSH server as well (--net=host)
    Computation over the EC2 host network
    [Diagram: two EC2 instances, eth0 10.0.0.10 and eth0 10.0.0.12, each running an SSH server. The hostfile lists 10.0.0.10 and 10.0.0.12.]
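
The hostfile mentioned on this slide can be produced with a few lines of shell. This is a minimal sketch, assuming Open MPI's `<address> slots=<n>` hostfile syntax. `make_hostfile`, the `./hostfile` path, and the slot count are illustrative choices, not values from the talk.

```shell
#!/usr/bin/env bash
# Print an Open MPI hostfile: one "<address> slots=<n>" line per host.
# Usage: make_hostfile SLOTS HOST...
make_hostfile() {
  slots=$1
  shift
  for host in "$@"; do
    printf '%s slots=%s\n' "$host" "$slots"
  done
}

# The two EC2 addresses shown on the slide, one MPI process per instance.
make_hostfile 1 10.0.0.10 10.0.0.12

# Usage sketch, to be run inside a container started with --net=host
# (the ./hostfile path is a placeholder):
#   make_hostfile 1 10.0.0.10 10.0.0.12 > ./hostfile
#   mpirun -np 2 --hostfile ./hostfile -x OMP_NUM_THREADS=2 ./hybrid
```

The same helper works unchanged for the container addresses (172.17.0.2, 172.17.0.4) when the SSH servers run inside containers instead.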

  19. Run the SSH server as a container too
    Computation over Docker's virtual network
    [Diagram: two EC2 instances (eth0 10.0.0.10 and 10.0.0.12), each with a docker0 bridge and a veth pair to a container whose eth0 is 172.17.0.2 or 172.17.0.4. The containers are reached over SSH. The hostfile lists 172.17.0.2 and 172.17.0.4.]

  20. Usage example and challenges on Kubernetes
    I tried it

  21. Push the application to ECR beforehand
    [Diagram: Build, then Push]
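
The push step can be sketched as follows. The registry host format `<account>.dkr.ecr.<region>.amazonaws.com` is how ECR names registries. The account ID, region, repository name `hybrid`, and the two helper functions are placeholders for illustration, not values from the talk.

```shell
#!/usr/bin/env bash
# Build the registry host for an ECR account and region.
# Usage: ecr_registry ACCOUNT_ID REGION
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Build the full image URI.
# Usage: ecr_image_uri ACCOUNT_ID REGION REPOSITORY TAG
ecr_image_uri() {
  printf '%s/%s:%s\n' "$(ecr_registry "$1" "$2")" "$3" "$4"
}

# Placeholder values, not taken from the talk.
ACCOUNT_ID=123456789012
REGION=ap-northeast-1
ecr_image_uri "$ACCOUNT_ID" "$REGION" hybrid latest

# Build-and-push sketch (needs Docker, a recent AWS CLI and an existing repository):
#   registry=$(ecr_registry "$ACCOUNT_ID" "$REGION")
#   aws ecr get-login-password --region "$REGION" \
#     | docker login --username AWS --password-stdin "$registry"
#   docker build -t "$registry/hybrid:latest" .
#   docker push "$registry/hybrid:latest"
```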

  22. Deploy the compute nodes first
    • Start pods as SSH servers
    • Use node affinity
    • This time, a DaemonSet for simplicity
    [Diagram: the YAML that defines the job is applied to EKS (management nodes); three c5.large compute nodes each run an SSH server pod; ECR holds the image.]
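
A DaemonSet along the lines of the bullets above might look like the manifest printed by this sketch. It is not the manifest used in the talk. The names `mpi-node` and `sshd`, the `role=compute` node label, and the image URI are assumptions, and the image is assumed to start an SSH server by default, as the demo image does.

```shell
#!/usr/bin/env bash
# Print a DaemonSet manifest that runs one SSH-server pod on every node
# carrying the (assumed) label role=compute.
# Usage: render_daemonset IMAGE
render_daemonset() {
  image=$1
  cat <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mpi-node
spec:
  selector:
    matchLabels:
      app: mpi-node
  template:
    metadata:
      labels:
        app: mpi-node
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role
                operator: In
                values: ["compute"]
      containers:
      - name: sshd
        image: ${image}
        ports:
        - containerPort: 22
EOF
}

# Placeholder image URI.
render_daemonset "123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/hybrid:latest"

# To apply it to a cluster: render_daemonset "$IMAGE" | kubectl apply -f -
```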

  23. Submit the master process as a Job
    [Diagram: two compute nodes, each with eth0, a docker0 bridge, and a veth pair to a container whose eth0 is 172.17.0.2 or 172.17.0.4. The containers are reached over SSH. The hostfile lists 172.17.0.2 and 172.17.0.4.]
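
The master process could be submitted with a Job manifest like the one printed below. This is a sketch under assumptions, not the manifest from the talk. It passes the pod addresses with `--host` instead of a hostfile, the names `mpi-master` and `master` are made up, and the run-as user and SSH key setup are left out.

```shell
#!/usr/bin/env bash
# Print a Job manifest that runs mpirun once against the given hosts.
# Usage: render_job IMAGE HOSTS NP
#   HOSTS is a comma-separated list of pod IPs, e.g. "172.17.0.2,172.17.0.4".
render_job() {
  image=$1
  hosts=$2
  np=$3
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: mpi-master
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: master
        image: ${image}
        command:
        - mpirun
        - "-np"
        - "${np}"
        - "--host"
        - "${hosts}"
        - "-x"
        - "OMP_NUM_THREADS=2"
        - "./hybrid"
EOF
}

# Addresses from the slide; the image URI is a placeholder.
render_job "123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/hybrid:latest" \
  "172.17.0.2,172.17.0.4" 2

# To submit it: render_job "$IMAGE" "$HOSTS" 2 | kubectl apply -f -
```

Setting `backoffLimit: 0` together with `restartPolicy: Never` keeps a failed computation from being retried automatically, which is usually what you want while debugging.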

  24. Presented by

  25. Profile
    Ryo Nakamaru (中丸 良) @pottava
    • CTO at SUPINF Inc
    • Solutions Architect at Rescale, Inc.
    • AWS Certified SA / DevOps Engineer - Pro

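  26. Containerize your app!
    • Contract development and operations, and consulting, with cloud and containers as our strengths
    • Running Docker in production since 2015, with plenty of CI / CD case studies
    • The name is read "スピンフ" (supinfu)...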

  27. Cloud HPC with
    • Provides a cloud HPC simulation platform
    • Founded in early 2011, with investment from Peter Thiel, Microsoft, and others
    • Scalable simulation and machine learning!

  28. Thank you for your attention :)
    References:
    • Getting Started with Amazon EKS ( https://docs.aws.amazon.com/ja_jp/eks/latest/userguide/getting-started.html )