ScaleShift feature overview and integration with Kubernetes clusters
ScaleShift: a machine learning environment for on-premises and cloud
June, 2019
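Slide 2: ScaleShift
ScaleShift is a Docker-based, open-source web client application.
• Model building phase
- Pull machine learning Docker images from NGC or an in-house repository with one click
- Launch any of those Docker images as a Jupyter notebook container
• Model training phase
- Package the libraries used during model building into a Docker image and save it to a repository
- Submit large-scale compute tasks to a Kubernetes cluster or Rescale with just a few clicks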
Slide 3: Basic operation
How does it work?
Slide 4: Starting ScaleShift
A web server starts on the local machine.
Slide 5: Installing machine learning software
Download from NGC or a private registry with one click.
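Slide 6: Building models in Jupyter notebook
Containers wrapped with Jupyter start up easily.
[Screenshot labels: port, working area]
Each container provides its own clean environment.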
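The startup options slide lists SS_JUPYTER_MINIMUM_PORT as the first port number for dynamically assigned container ports. One plausible reading of that setting is sketched below: pick the lowest port at or above the minimum that is not already in use. The function name `next_free_port` and the upper bound are illustrative assumptions; the slides do not show how ScaleShift actually assigns ports.

```python
def next_free_port(used_ports, minimum=30000, maximum=65535):
    """Return the lowest port in [minimum, maximum] that is not in used_ports.

    Hypothetical helper: ScaleShift's real allocation logic is not shown
    in the slides; only the configurable starting number is.
    """
    for port in range(minimum, maximum + 1):
        if port not in used_ports:
            return port
    raise RuntimeError("no free port available in the configured range")


# Ports 30000 and 30001 are taken by two running notebooks,
# so the next notebook container would get 30002.
print(next_free_port({30000, 30001, 30003}))  # → 30002
```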
Slide 7: Wrapping for large-scale computation
Dependent libraries and source code are bundled together and packaged into a single image.
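Slide 8: Submitting compute tasks to an in-house cluster or the cloud
The required API is called according to the submission target.
Choose the amount of resources to use, then submit the task to the cluster.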
Slide 9: Integration with a Kubernetes cluster
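Slide 10: Machine learning and Kubernetes
Kubernetes (k8s) has become the de facto standard for container orchestration, led by the web community. Containers are now widely used in machine learning as well, and the number of applied examples is growing.
- NVIDIA officially announced support [GTC 2018 Keynote, March 27]
- Mercari ML Ops Night Vol.1 [Mercari, Inc. / May 23, 2018] https://mercari.connpass.com/event/85931/presentation/
- A platform for deploying machine learning to production services with Jupyter alone [Recruit Lifestyle Co., Ltd.] https://engineer.recruit-lifestyle.co.jp/techblog/2018-10-04-ml-platform/
- Taking on the challenge of a machine learning platform built on Kubernetes [Preferred Networks, Inc. / Dec 4, 2018] https://www.slideshare.net/pfi/kubernetes-125013757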
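Slide 11: ScaleShift + Kubernetes configuration example
[Diagram labels: storage, management node, compute nodes, in-house network, NGC, DockerHub, private registry, Kubernetes, research / development team, local environment with ScaleShift installed]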
Slide 12: 1. Selecting machine learning software
Selecting an item in the GUI is all it takes to start the download.
[Same configuration diagram as slide 11]
Slide 13: 2. Building the model
ScaleShift starts the notebook.
[Same configuration diagram as slide 11]
Slide 14: 3. Transferring the execution environment and input data
ScaleShift performs the necessary transfers internally.
[Same configuration diagram as slide 11]
Slide 15: 4. Requesting the large-scale computation
The computation conditions are submitted as a Kubernetes Job.
[Same configuration diagram as slide 11]
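To make "submitted as a Kubernetes Job" concrete, the sketch below builds a minimal `batch/v1` Job manifest as a plain dictionary. It is an illustration only, not ScaleShift's actual manifest: the job name, image reference, command, and GPU count are hypothetical, and the `nvidia.com/gpu` resource limit assumes the cluster runs the NVIDIA device plugin.

```python
import json


def build_job_manifest(name, image, command, gpus=1):
    """Return a minimal batch/v1 Job manifest as a plain dict.

    All argument values are supplied by the caller; nothing here is taken
    from ScaleShift itself.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry a failed training run automatically.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Jobs require restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            "command": list(command),
                            # Assumes the NVIDIA device plugin is installed.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


# Hypothetical image and command, for illustration only.
manifest = build_job_manifest(
    name="train-example",
    image="registry.example.com/team/train:latest",
    command=["python", "train.py"],
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` to try it against a test cluster.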
Slide 16: 5. Running the large-scale computation
[Same configuration diagram as slide 11]
Slide 17: 6. Checking the computation results
[Same configuration diagram as slide 11]
Slide 18: Kubernetes settings / task execution screen
Slide 19: ScaleShift configuration
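Slide 20: External integrations
Each integration is listed with its features and the settings it needs.
• NVIDIA GPU CLOUD
- Features: list and fetch details of the machine learning Docker images managed by NVIDIA; download images
- Settings: API key and user settings
• Private registry
- Features: list the machine learning Docker images managed in-house; download images
- Settings: endpoint and user settings
• AWS
- Features: download machine learning Docker images; data transfer between the local file system and S3 (planned)
• Kubernetes
- Features: run large-scale computations on an in-house cluster or in the cloud
- Settings: kubecfg
• Rescale
- Features: run large-scale computations on the Rescale platform
- Settings: region and API key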
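Slide 21: Startup options (excerpt)
Each setting is listed with its description and default value.
- SS_JUPYTER_MINIMUM_PORT: first port number for dynamically assigned container connection ports (default: 30000)
- SS_LOG_LEVEL: application log output level (default: warn)
- SS_WORKSPACE_HOST_DIR: directory on the host where working data is stored (no default; must be specified)
- SS_NGC_REGISTRY_ENDPOINT: NGC endpoint (default: https://registry.nvidia.com)
- SS_NGC_REGISTRY_USER_NAME: NGC user name (default: $oauthtoken)
- SS_RESCALE_SINGULARITY_VERSION: Singularity runtime version used on Rescale (default: 3.2.0)
- SS_RESCALE_JOB_WALLTIME: maximum task run time on Rescale (default: 3600)
Write these settings in docker-compose.yml to start ScaleShift with them.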
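As a rough illustration of how the defaults above combine with user-supplied values, the following sketch merges an environment mapping with the listed defaults and rejects a configuration that omits the one required setting. The function name `resolve_options` and the validation behaviour are assumptions made for this example; the slides do not describe how ScaleShift itself parses these variables.

```python
# Defaults copied from the startup options slide.
DEFAULTS = {
    "SS_JUPYTER_MINIMUM_PORT": "30000",
    "SS_LOG_LEVEL": "warn",
    "SS_NGC_REGISTRY_ENDPOINT": "https://registry.nvidia.com",
    "SS_NGC_REGISTRY_USER_NAME": "$oauthtoken",
    "SS_RESCALE_SINGULARITY_VERSION": "3.2.0",
    "SS_RESCALE_JOB_WALLTIME": "3600",
}

# The slide marks this setting as having no default (must be specified).
REQUIRED = ("SS_WORKSPACE_HOST_DIR",)


def resolve_options(env):
    """Merge an environment mapping with the defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    options = dict(DEFAULTS)
    # Only keys known from the slide are picked up; anything else is ignored.
    for key, value in env.items():
        if key in DEFAULTS or key in REQUIRED:
            options[key] = value
    return options


# Example: override the log level and set the required workspace directory.
options = resolve_options(
    {"SS_WORKSPACE_HOST_DIR": "/data/workspace", "SS_LOG_LEVEL": "debug"}
)
for key in sorted(options):
    print(f"{key}={options[key]}")
```

When a value such as `$oauthtoken` is written directly in docker-compose.yml, note that Compose performs variable interpolation on `$`, so a literal dollar sign is normally written as `$$`.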