Slide 1

Slide 1 text

ScaleShift: Machine learning environments on premises and in the cloud
June 2019

Slide 2

Slide 2 text

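ScaleShift
A Docker-based, open-source web client application.
• Model building phase
  - Fetch machine learning Docker images from NGC or an in-house repository with one click
  - Launch any of those Docker images as a Jupyter notebook container
• Model training phase
  - Bundle the libraries used during model building into a Docker image and save it to a repository
  - Submit large-scale computation tasks to a Kubernetes cluster or Rescale with just a click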

Slide 3

Slide 3 text

Basic operation: How does it work?

Slide 4

Slide 4 text

Starting ScaleShift
A web server starts up on the local machine.

Slide 5

Slide 5 text

Installing machine learning software
One-click download from NGC or a private registry.

Slide 6

Slide 6 text

Model building in Jupyter notebook
Containers wrapped with Jupyter start up easily.
Ports and workspaces are isolated per container, so each environment is clean.
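The per-container isolation of ports and workspaces could be pictured roughly as follows. This is only a sketch, not ScaleShift's actual implementation: the function names `next_free_port` and `workspace_dir`, the upper port bound, and the directory layout are all assumptions made for illustration. The starting port 30000 mirrors the default of `SS_JUPYTER_MINIMUM_PORT` listed on slide 21.

```python
from pathlib import PurePosixPath
from typing import Iterable


def next_free_port(used_ports: Iterable[int],
                   minimum_port: int = 30000,
                   maximum_port: int = 65535) -> int:
    """Return the lowest port >= minimum_port that no container uses yet.

    Hypothetical helper: ScaleShift's real allocation logic is not
    described in the slides.
    """
    used = set(used_ports)
    for port in range(minimum_port, maximum_port + 1):
        if port not in used:
            return port
    raise RuntimeError("no free port available in the configured range")


def workspace_dir(host_dir: str, container_name: str) -> str:
    """Give each notebook container its own directory under the host workspace.

    The layout <host_dir>/<container_name> is an assumption.
    """
    return str(PurePosixPath(host_dir) / container_name)


if __name__ == "__main__":
    # Two notebook containers already hold the first two ports.
    print(next_free_port([30000, 30001]))
    print(workspace_dir("/data/scaleshift", "notebook-1"))
```

Keeping the allocator a pure function of the ports already in use makes it easy to test without opening sockets; a real tool would also need to check that the port is actually free on the host.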

Slide 7

Slide 7 text

Wrapping for large-scale computation
Dependent libraries and source code are gathered and bundled into a single image.
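One way to picture the bundling step is a generated Dockerfile that layers the extra libraries and the source tree on top of the image used for model building. The slides do not say how ScaleShift builds the image, so everything below is an assumption for illustration: the helper `render_dockerfile`, the base image name, the use of `pip`, and the `/workspace` path are hypothetical.

```python
from typing import List


def render_dockerfile(base_image: str,
                      requirements: List[str],
                      source_dir: str = "src") -> str:
    """Render a Dockerfile that bundles libraries and source code.

    Hypothetical sketch: the base image, package manager, and paths
    are placeholders, not values taken from ScaleShift.
    """
    lines = [f"FROM {base_image}"]
    if requirements:
        # Install the libraries that were added during model building.
        lines.append("RUN pip install --no-cache-dir " + " ".join(requirements))
    # Copy the source tree into the image so the task is self-contained.
    lines.append(f"COPY {source_dir}/ /workspace/")
    lines.append("WORKDIR /workspace")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("registry.example.local/ml-base:latest",
                            ["numpy", "pandas"]))
```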

Slide 8

Slide 8 text

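Submitting computation tasks to an in-house cluster or the cloud
The required API is called according to the submission target.
Decide how many resources to use, then submit the task to the cluster.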

Slide 9

Slide 9 text

Integration with a Kubernetes cluster

Slide 10

Slide 10 text

Machine learning and Kubernetes
k8s has become the de facto standard for container orchestration, mainly in the web industry. Container use is also growing in machine learning, and the number of application examples is increasing.
- NVIDIA officially announced support [ GTC 2018 Keynote, March 27 ]
- Mercari ML Ops Night Vol.1 [ Mercari, Inc. / May 23, 2018 ]
  https://mercari.connpass.com/event/85931/presentation/
- A platform for deploying machine learning to production services using only Jupyter [ Recruit Lifestyle Co., Ltd. ]
  https://engineer.recruit-lifestyle.co.jp/techblog/2018-10-04-ml-platform/
- Taking on a machine learning platform with Kubernetes [ Preferred Networks, Inc. / Dec 4, 2018 ]
  https://www.slideshare.net/pfi/kubernetes-125013757

Slide 11

Slide 11 text

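ScaleShift + Kubernetes: example configuration
[Diagram labels: storage, management node, compute nodes, in-house network, NGC, DockerHub, private registry, Kubernetes, research / development team, local machines with ScaleShift installed]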

Slide 12

Slide 12 text

1. Selecting machine learning software
Just pick from the GUI and the download starts.
[Diagram: same configuration as slide 11]

Slide 13

Slide 13 text

2. Model building
ScaleShift launches the notebook.
[Diagram: same configuration as slide 11]

Slide 14

Slide 14 text

3. Transferring the execution environment and input data
ScaleShift performs the necessary transfers internally.
[Diagram: same configuration as slide 11]

Slide 15

Slide 15 text

4. Requesting the large-scale computation
The computation conditions are sent as a Kubernetes Job.
[Diagram: same configuration as slide 11]
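A task submitted as a Kubernetes Job, with the amount of resources chosen by the user, might look roughly like the manifest built below. The field names follow the standard `batch/v1` Job schema, but the helper `build_job_manifest`, the job name, the image, the command, and the resource values are hypothetical; the slides do not show what ScaleShift actually sends.

```python
import json
from typing import Any, Dict, List


def build_job_manifest(name: str,
                       image: str,
                       command: List[str],
                       cpu: str = "1",
                       memory: str = "4Gi",
                       gpus: int = 0) -> Dict[str, Any]:
    """Build a batch/v1 Job manifest as a plain dictionary.

    Hypothetical sketch: names and default values are placeholders.
    """
    limits: Dict[str, Any] = {"cpu": cpu, "memory": memory}
    if gpus > 0:
        # Extended resource name exposed by the NVIDIA device plugin.
        limits["nvidia.com/gpu"] = gpus
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        "resources": {"limits": limits},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest(
        "train-example",
        "registry.example.local/team/model:latest",
        ["python", "train.py"],
        gpus=1,
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`, since `kubectl` accepts JSON as well as YAML manifests.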

Slide 16

Slide 16 text

5. Running the large-scale computation
[Diagram: same configuration as slide 11]

Slide 17

Slide 17 text

6. Checking the computation results
[Diagram: same configuration as slide 11]

Slide 18

Slide 18 text

Kubernetes settings / task execution screens

Slide 19

Slide 19 text

ScaleShift configuration

Slide 20

Slide 20 text

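External integrations

Integration | Function | Settings
NVIDIA GPU CLOUD | List and get details of the machine learning Docker images managed by NVIDIA; download images | API key & user settings
Private registry | List the machine learning Docker images managed in house; download images | Endpoint & user settings
AWS | Download machine learning Docker images; exchange data between the local file system and S3 | (implementation planned)
Kubernetes | Run large-scale computation on an in-house cluster or in the cloud | kubecfg
Rescale | Run large-scale computation on the Rescale platform | Region & API key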

Slide 21

Slide 21 text

Startup options (excerpt)

Setting | Description | Default
SS_JUPYTER_MINIMUM_PORT | First port number for dynamically assigned container connection ports | 30000
SS_LOG_LEVEL | Application log output level | warn
SS_WORKSPACE_HOST_DIR | Host-side storage area for working data | none (required)
SS_NGC_REGISTRY_ENDPOINT | NGC endpoint | https://registry.nvidia.com
SS_NGC_REGISTRY_USER_NAME | NGC user name | $oauthtoken
SS_RESCALE_SINGULARITY_VERSION | Singularity runtime version on Rescale | 3.2.0
SS_RESCALE_JOB_WALLTIME | Maximum task run time on Rescale | 3600

Write the settings in docker-compose.yml, then start ScaleShift.
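To make the role of the defaults concrete, the sketch below resolves these options from environment variables, which is how values set in docker-compose.yml would typically reach a container. The variable names and defaults are copied from the table above; the loader itself (`load_settings`, `DEFAULTS`, `REQUIRED`) is a hypothetical illustration and not ScaleShift's code.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as listed on the slide (all kept as strings, like env values).
DEFAULTS: Dict[str, str] = {
    "SS_JUPYTER_MINIMUM_PORT": "30000",
    "SS_LOG_LEVEL": "warn",
    "SS_NGC_REGISTRY_ENDPOINT": "https://registry.nvidia.com",
    "SS_NGC_REGISTRY_USER_NAME": "$oauthtoken",
    "SS_RESCALE_SINGULARITY_VERSION": "3.2.0",
    "SS_RESCALE_JOB_WALLTIME": "3600",
}

# Options with no default that the user must always provide.
REQUIRED = ("SS_WORKSPACE_HOST_DIR",)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge environment values over the defaults and check required keys.

    Hypothetical helper written for this example only.
    """
    source = os.environ if env is None else env
    settings = {key: source.get(key, default) for key, default in DEFAULTS.items()}
    for key in REQUIRED:
        value = source.get(key)
        if not value:
            raise ValueError(f"{key} must be set")
        settings[key] = value
    return settings


if __name__ == "__main__":
    example_env = {"SS_WORKSPACE_HOST_DIR": "/data/scaleshift",
                   "SS_LOG_LEVEL": "debug"}
    for key, value in sorted(load_settings(example_env).items()):
        print(f"{key}={value}")
```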