Hybrid Parallel on Kubernetes / hybrid-parallel-program-on-kubernetes

ryo nakamaru
November 19, 2018

Slides from my talk at JAWS HPC #14. The code and steps used in the demo are at https://github.com/pottava/docker-openmpi

Transcript

  1. Does it run as expected in containers too? Hybrid Parallel on Kubernetes. JAWS-UG HPC #14, Nov 19, 2018

    Ryo NAKAMARU, SUPINF Inc.
  2. It works, more or less! But...

  3. DEMO

  4. Himeno benchmark with Docker
    As long as Docker is installed, you need neither gcc nor OpenMPI. It's a one-liner!

      $ docker run --rm -it pottava/openmpi:4.0 \
          bash -c "apt-get install -y unzip lhasa make >/dev/null \
          && wget --quiet http://i.riken.jp/wp-content/uploads/2015/07/cc_himenobmtxp_mpi.zip \
          && unzip -q cc_himenobmtxp_mpi.zip && lha xqw=/opt/himeno cc_himenobmtxp_mpi.lzh \
          && cd /opt/himeno && mv Makefile.sample Makefile \
          && chmod +x ./paramset.sh && ./paramset.sh S 1 1 1 && make >/dev/null 2>&1 \
          && su -c 'mpirun -np 1 /opt/himeno/bmt' mpiuser"
      Sequential version array size
       mimax = 65 mjmax = 65 mkmax = 129
      ..
      MFLOPS measured : 3078.922728
  5. Hybrid parallel processing on a single Mac
    Start the OpenMPI master and slave nodes as containers.

      // Start the slave processes as SSH servers
      $ docker run --name 02-node01 -d --cpuset-cpus 0,1 openmpi/samples:02-hybrid-parallel
      $ docker run --name 02-node02 -d --cpuset-cpus 2,3 openmpi/samples:02-hybrid-parallel
      // Start the master process
      $ docker run --rm -it -u mpiuser \
          --link 02-node01:node01 --link 02-node02:node02 \
          openmpi/samples:02-hybrid-parallel \
          mpirun -np 2 --host node01,node02 -x OMP_NUM_THREADS=2 ./hybrid
      Hello from thread 0 out of 2 from process 0 out of 2 on 5329fecf93f4
      Hello from thread 1 out of 2 from process 0 out of 2 on 5329fecf93f4
      Hello from thread 0 out of 2 from process 1 out of 2 on ca3d85c87284
      Hello from thread 1 out of 2 from process 1 out of 2 on ca3d85c87284
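
The demo pins each node container to two CPUs and runs two OpenMP threads per MPI process. A minimal sketch of that pairing follows. `cpuset_for` is a hypothetical helper, not part of the demo repository, and it assumes each node gets a contiguous, non-overlapping block of CPU IDs, one per thread. The loop only prints the `docker run` commands; it does not start containers.

```shell
# Sketch only: cpuset_for is a hypothetical helper, not from the demo repo.
# Prints the comma-separated CPU IDs for node <index> (0-based) when every
# node runs <threads> OpenMP threads on its own block of CPUs.
cpuset_for() {
  index="$1"; threads="$2"
  cpu=$((index * threads))
  end=$((cpu + threads))
  list=""
  while [ "$cpu" -lt "$end" ]; do
    list="${list:+$list,}$cpu"
    cpu=$((cpu + 1))
  done
  printf '%s\n' "$list"
}

# Print (do not run) the commands for two nodes with two threads each.
for i in 0 1; do
  printf 'docker run --name 02-node%02d -d --cpuset-cpus %s openmpi/samples:02-hybrid-parallel\n' \
    "$((i + 1))" "$(cpuset_for "$i" 2)"
done
```

With two nodes and two threads this yields the `0,1` and `2,3` assignments used in the demo.
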
  6. The code and steps used in the demo are here: https://github.com/pottava/docker-openmpi

  7. Topics
    • Considerations for running HPC applications in Docker
    • Getting a hybrid parallel application running on EC2
    • A usage example on Kubernetes, and open issues
  8. Considerations for running HPC applications in Docker
    HPC requirements / How Docker works
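  9. Characteristics of HPC applications
    • Use the hardware resources to the very last drop
      ‣ Large-scale clusters, with nodes occupied exclusively
      ‣ The environment is managed strictly, down to the hardware level
      ‣ Restrictions on device or network access are a problem
      ‣ Even the overhead of an abstraction layer is a real concern
    • Security designed for an "in-house compute environment"
      ‣ Strict yet flexible management of who may run computations, with looser controls inside the cluster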
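  10. How Docker works, and the headaches of using it for HPC
    • namespaces isolate the compute space
      ‣ Well, we want the whole node to ourselves...
      ‣ For inter-process communication, too, the isolation is just dead weight
    • cgroups control compute resources
      ‣ We do not need the limits
      ‣ OOM kills? Please don't give us more problems...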
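  11. Continued: the headaches of using Docker for HPC
    • The user that runs the process is designed casually
      ‣ Casually root
      ‣ Add file sharing to the picture and you want to throw in the towel
    • It depends on how each ISV responds...
      ‣ Heavy dependence on commercial software
      ‣ Access control for the license server: is it OK?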
  12. That said, once it is a Docker image, portability goes up! (Converting it to Singularity is quick, too.)

  13. Considerations for running HPC applications in Docker
    How MPI works, and the Dockerfile

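  14. What matters when you Dockerize
    • Understand the application's specification and behavior
      ‣ How does it actually run? Identify every dependency
      ‣ How does it communicate?
    • Decide how far to containerize
      ‣ Leave SSH to the host? Use the host's MPI as well?
      ‣ Put everything in the container??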
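  15. OpenMPI
    • It connects to each node over SSH
      ‣ One option is to run each compute container as an SSH server that waits for instructions
      ‣ The process can also be launched directly by the container's startup command
    • How do you keep OpenMPI versions in sync?
      ‣ If SSH + OpenMPI are left to the host, the container must match the host's version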
  16. Here is how I built it
    https://github.com/pottava/docker-openmpi/blob/master/versions/4.0/Dockerfile

      FROM debian:stretch-slim
      RUN apt-get update \
       && apt-get install -y gcc ssh wget curl \
       && apt-get install -y openssh-server \
       ..
      ENV OPENMPI_VERSION=4.0.0
      RUN apt-get install -y build-essential \
       && repo="https://www.open-mpi.org/software/ompi/v4.0/downloads" \
       && curl --location --silent --show-error --output openmpi.tar.gz \
          "${repo}/openmpi-${OPENMPI_VERSION}.tar.gz" \
       ..
       && ./configure --prefix=/usr/local && make && make install

    Note: the SSH server is installed in the image as well.
  17. Getting a hybrid parallel application running on EC2

  18. Computing over the EC2 host network
    The hosts' own SSH servers are used as well (--net=host).
    Diagram: two EC2 instances, eth0 10.0.0.10 and eth0 10.0.0.12, each running an SSH server.
    The hostfile lists 10.0.0.10 and 10.0.0.12.
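
A minimal sketch of such a hostfile, assuming one MPI rank per EC2 host. `hostfile_lines` is a hypothetical helper and the `slots=1` value is an assumption; neither comes from the slides. The `mpirun` line in the comments reuses the flags from the single-Mac demo and is not executed here.

```shell
# Sketch only: hostfile_lines is a hypothetical helper, not from the demo repo.
# Prints one "<address> slots=<n>" line per node, a format that Open MPI's
# --hostfile option accepts.
hostfile_lines() {
  slots="$1"; shift
  for addr in "$@"; do
    printf '%s slots=%s\n' "$addr" "$slots"
  done
}

# The two EC2 private addresses from the slide, one MPI rank per host.
hostfile_lines 1 10.0.0.10 10.0.0.12

# Assumed usage on the master (not run here):
#   hostfile_lines 1 10.0.0.10 10.0.0.12 > hostfile
#   mpirun -np 2 --hostfile hostfile -x OMP_NUM_THREADS=2 ./hybrid
```
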
  19. Computing over Docker's virtual network
    The SSH server also runs as a container.
    Diagram: on each EC2 instance (eth0 10.0.0.10 and eth0 10.0.0.12), docker0 connects through a veth pair to the container's eth0 (172.17.0.2 and 172.17.0.4), where SSH runs.
    The hostfile lists 172.17.0.2 and 172.17.0.4.
  20. A usage example on Kubernetes, and open issues
    I tried it out

  21. Push the application to ECR beforehand
    Build, then push

  22. Deploy the compute nodes first
    Apply the YAML that defines the job.
    • Start each pod as an SSH server
    • Use node affinity
    • For simplicity, a DaemonSet this time
    Diagram: EKS (management node), ECR, and compute nodes (c5.large, c5.large, c5.large, …), each running SSH.
  23. Submit the master process as a Job
    Diagram: on each compute node, docker0 connects through a veth pair to the pod's eth0 (172.17.0.2 and 172.17.0.4), where SSH runs.
    The hostfile lists 172.17.0.2 and 172.17.0.4.
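
The slides do not show how the pod addresses reach the hostfile. One possible approach, sketched below, is to query the pod IPs with `kubectl` and write one address per line. `ips_to_hostfile` is a hypothetical helper, and the `app=mpi-node` label is an assumption, not something taken from the demo.

```shell
# Sketch only: ips_to_hostfile is a hypothetical helper, not from the demo repo.
# Reads whitespace-separated IP addresses on stdin and prints one per line.
ips_to_hostfile() {
  tr -s ' \t' '\n\n' | sed '/^$/d'
}

# Stand-in for the kubectl output, using the pod addresses from the slide.
printf '172.17.0.2 172.17.0.4\n' | ips_to_hostfile

# Assumed usage wherever kubectl is available (not run here);
# the label app=mpi-node is an assumption:
#   kubectl get pods -l app=mpi-node -o jsonpath='{.items[*].status.podIP}' \
#     | ips_to_hostfile > hostfile
```
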
  24. Presented by

  25. Profile: Ryo Nakamaru (中丸 良) @pottava
    • CTO at SUPINF Inc
    • Solutions Architect at Rescale, Inc.
    • AWS Certified SA / DevOps Engineer - Pro
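  26. Containerize your app!
    • Contract development and operations, plus consulting, built on cloud and container expertise
    • Running Docker in production since 2015, with an extensive CI / CD track record
    • The name is pronounced "Supinfu" (スピンフ)...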
  27. Cloud HPC with
    • Provides a cloud HPC simulation platform
    • Founded in early 2011, with investment from Peter Thiel, Microsoft, and others
    • Scalable simulation and machine learning!
  28. Thank you for listening :)
    References:
    • Getting Started with Amazon EKS ( https://docs.aws.amazon.com/ja_jp/eks/latest/userguide/getting-started.html )