Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Cloud Deep Learning for everyone

Cloud Deep Learning for everyone

2018.12.14 Naver TechTalk 발표자료입니다.

Eunju Amy Sohn

December 14, 2018
Tweet

More Decks by Eunju Amy Sohn

Other Decks in Programming

Transcript

  1. ߊ಴੗ ࣗѐ • ઔসਸ р੺൤ ߄ۄח ؀೟ࢤ • झఋ౟সীࢲ Devops

    ѐߊ੗۽ ੌ ೮যਃ • ബਯ੸ਵ۽ ੌೞח Ѫਸ ب৬઱ח ӝ ٜࣿਸ ࢎی೤פ׮. • ࢎप ੉Ѫ੷Ѫ ׮ ೧ࠁח Ѫب જই ೤פ׮. EJSohn
  2. ݽف੄ ௿ۄ਋٘ ٩۞׬ ݽف੄ ௿ۄ਋٘ ٩۞׬ CNN, RNN, GAN, GPU,

    tensorflow, cuda, keras, Deep Reinforcement Learning..
  3. ݽف੄ • ૒੢ੋ, ѐੋ, ೟ࢤ ѐߊ੗? • ૘઺ೞח ࠗ࠙ী ର੉о

    ੓णפ׮. • ೞ૑݅ ҕా੸ਵ۽ - ݽ؛ਸ ٜ݅Ҋ - ೟णदఈפ׮. (೟णҗ పझ౟৬ ࣻ੿੄ ߈ࠂ ӒܻҊ ߈ࠂ..)
  4. ݽف੄ • ૒੢ੋ, ѐੋ, ೟ࢤ ѐߊ੗? • ૘઺ೞח ࠗ࠙ী ର੉о

    ੓णפ׮. • ೞ૑݅ ҕా੸ਵ۽ - ݽ؛ਸ ٜ݅Ҋ - ೟णदఈפ׮. (೟णҗ పझ౟৬ ࣻ੿੄ ߈ࠂ ӒܻҊ ߈ࠂ..) ੉ җ੿ਸ ബਯ੸ਵ۽ ݅ٚ׮ݶ?
  5. ௿ۄ਋٘ যڃ Ѫਸ ೡ ࣻ ੓ਸөਃ? • ѐߊ ജ҃ ࢸ੿

    • ࢲ࠺झ ജ҃ ࢸ੿ • ࢲ࠺झܳ ਤೠ GPU ࠺ਊ хࣗ • ೟ण ঌܿ
  6. ѐߊ ജ҃ ࢸ੿ • ѐߊ੗о ѐߊী݅ ૘઺ೡ ࣻ ੓ח ജ҃

    ઑࢿ • ٩۞׬ ѐߊ੗ - दр ࠺ऱਃ - ॳח ݠनب ࠺ऱਃ • ߈ࠂ੸ੋ ࠗ࠙ਸ ઁѢೞחؘ ૘઺
  7. ѐߊ ജ҃ ࢸ੿ 1. AWS Deep Learning AMI ࢶఖ 2.

    EC2 ੋझఢझ ࢤࢿ 3. Jupyter Notebook ҳز 4. SSH ఠօ݂ ࢸ੿ 5. Jupyter Notebook ੽ࣘ 6. ޙઁ ࢤӝݶ, 3ߣࠗఠ ߈ࠂ 7. ೙ਃೠ ݽ؛ ৘ઁ ௏٘ ࠂࢎ 8. ؘ੉ఠܳ ֍য ࠁ؛ ള۲ ૓೯ 9. ӝ ࢠ೒ী ള۲ػ ݽ؛ ഝਊ Amazon EC2 + Deep Learning AMI ੌ߈੸ਵ۽ ௿ۄ਋٘ীࢲ ٩۞׬ ജ҃ਸ ઑࢿೞח ߑߨ
  8. ੉ ࠗ࠙ਸ ੗زച೤פ׮ ৬ ೣԋ ѐߊ ജ҃ ࢸ੿ 1. AWS

    Deep Learning AMI ࢶఖ 2. EC2 ੋझఢझ ࢤࢿ 3. Jupyter Notebook ҳز 4. SSH ఠօ݂ ࢸ੿ 5. Jupyter Notebook ੽ࣘ 6. ޙઁ ࢤӝݶ, 3ߣࠗఠ ߈ࠂ 7. ೙ਃೠ ݽ؛ ৘ઁ ௏٘ ࠂࢎ 8. ؘ੉ఠܳ ֍য ࠁ؛ ള۲ ૓೯ 9. ӝ ࢠ೒ী ള۲ػ ݽ؛ ഝਊ Amazon EC2 + Deep Learning AMI
  9. Nvidia-Docker Containerizing GPU applications provides several benefits, among them: -

    Ease of deployment - Isolation of individual devices - Run across heterogeneous driver/toolkit environments - Requires only the NVIDIA driver to be installed on the host - Facilitate collaboration: reproducible builds, reproducible performance, reproducible results.
  10. ѐߊ ജ҃ ࢸ੿ 1. Dockerfile ੘ࢿ • ೙ਃೠ Dependencyо ࢸ஖غয

    ੓ח ӝࠄ بழ ੉޷૑ ҳࢿ - devel/prodীࢲ ҕా੸ਵ۽ ࢎਊ - Cloud/IDCীࢲ ҕా੸ਵ۽ ࢎਊ • OS Dependency ࢸ஖ • Deep Learning packages ࢸ஖ • Ӓ ৻ ೙ਃೠ ࢸ੿
  11. ѐߊ ജ҃ ࢸ੿ 1. Dockerfile ੘ࢿ • nvidia/cuda • CUDA

    and cuDNN images from gitlab.com/nvidia/cuda • Ubuntu, Centos ӝ߈ cuda ಁః૑ ૑ਗ (2018.12ӝળ ߡઊ ~10)
  12. ѐߊ ജ҃ ࢸ੿ 1. Dockerfile ੘ࢿ nvidia/cuda install DeepLearning packages

    install pip packages 1st layer 2nd layer 3rd layer • 3rd layerө૑ Dockerfile উী ੘ࢿೞח Ѫب оמ • ೞ૑݅ Ansible, Packer э਷ ో۽ ੄ઓࢿ ઱ੑਸ ೧ࢲ ࢎਊೞח Ѫ੉ ੉੼੉ ఁפ׮
  13. ѐߊ ജ҃ ࢸ੿ 2. AMI ࢤࢿ • Packer is a

    tool for building identical machine images for multiple platforms from a single source configuration. • Ansible: Simple, agentless IT automation that anyone can use • Ansible۽ ੉޷૑ ਤী ௏٘ܳ ࢸ஖ ೞҊ, Packer۽ ੉ܳ AWS AMI (Amazon Machine Image)ী স۽ ٘೤פ׮ • Packer, Ansible হ੉ب AMIח ࢤ ࢿೡ ࣻ ੓णפ׮
  14. ѐߊ ജ҃ ࢸ੿ 2. AMI ࢤࢿ • AWS AMI? •

    Amazon Machine Image • ೙ਃೠ ࣗ೐౟ਝযо ੉޷ ҳࢿ غয ੓ח మ೒݁ (৘: OS, App Server, App) • AMIীࢲ EC2 ੋझఢझܳ ߄۽ द੘ೡ ࣻ ੓णפ׮.
  15. ѐߊ ജ҃ ࢸ੿ 3. ࢎਊ • ਗೞח ੋझఢझ ࢶఖ •

    ੋझఢझо द੘غݶ, ઺૑ೠ റ scale-up/scale-down оמ
  16. ࢲ࠺झ ജ҃ ࢸ੿ • ౟ې೗ী ٮܲ ੸੺ೠ ࣻ੄ ੋझఢझܳ ઁҕ

    • AWS EC2ীࢲ GPU Utilization ࣻ૘ ߂ CloudWatch Metric ਵ۽ ੹࣠ • GPU Utilizationਸ ӝળਵ۽ ೠ Auto Scailing
  17. ࢲ࠺झ ജ҃ ࢸ੿ • Agent python code ( NVML API

    ࢎਊ ) • ઱ӝ੸ਵ۽ ࣻ૘ೠ ؘ੉ఠܳ CloudWatchী স۽٘
  18. ࢲ࠺झ ജ҃ ࢸ੿ Service Auto Scailing Group ASGী ؀ೠ GPU

    Utilization CloudWatch metric ࣻ૘ GPU util metricী ؀ೠ CloudWatch Alarm ࢤࢿ CloudWatch Alarmী ؀ೠ ASG Scailing Policy ࢤࢿ (৘: GPU Util > 60% then scale 1 out)
  19. GPU ࠺ਊ ୭੸ച • ࣗ઺ೠ ૑цਸ ૑ఃӝ ਤೠ ୭ࢶ GPU

    Instance ৡ٣ݔ٘ ਃӘ झ౽ ਃӘ p2.xlarge $1.465 $0.4395 p2.8xlarge $11.72 $11.72 p2.16xlarge $23.44 $23.44 p3.2xlarge $4.981 $1.4943 p3.8xlarge $19.924 $5.9772 p3.16xlarge $39.848 $39.848 시간당 온디맨드 대비 스팟 가격 2018.10.10 2pm
  20. GPU ࠺ਊ ୭੸ച • झ౽ ੋझఢझۆ? • ৡ٣ݔ٘ оѺࠁ׮ ੷۴ೠ

    ࠺ਊਵ۽ ࢎਊೡ ࣻ ੓ח ޷ࢎਊ EC2 ੋझఢझ • ୭؀ 90% ೡੋػ Әঘਵ۽ ੉ਊ • दр׼ оѺী ؀ೠ ੑ଴ ߑधਵ۽ (1) EC2 оਊ ਊ۝੉ ࠗ઒ೞѢ ա, (2) झ౽ оѺ੉ ୭Ҋ оѺਸ ୡҗೞݶ ઺ױ
  21. GPU ࠺ਊ ୭੸ച • ਷Ӕ൤ ੗઱ ߊࢤೞח ࢚ క ௏٘

    - capacity-not-available - system-error - instance-terminated- no-capacity
  22. GPU ࠺ਊ ୭੸ച • ਷Ӕ൤ ੗઱ ߊࢤೞח ࢚క ௏٘ -

    capacity-not-available - system-error - instance-terminated-no- capacity • झ౽ ਃ୒ࠗఠ पಁ оמ • (द੘ റ) ঱ઁٚ ઺ױ оמ • (झ౽ оѺ)=(ৡ٣ݔ٘ оѺ) ੉؊ۄب ਤ೷ࢿ ઓ੤
  23. GPU ࠺ਊ ୭੸ച झ౽੉ ࢿҕ੸ਵ۽ द੘غ঻ਸ ҃਋ (ࢿҕೠ झ౽ ਃ୒)

    ݅ఀ੄ on-demand ੋझఢझܳ ઙܐ झ౽੉ ਃ୒ী पಁೞѢա ੋझఢझ੄ ઺ױ ঌۈਸ ߉ਸ ҃਋, (ਗೞח झ౽ ѐࣻ - അ੤ ੿࢚ प೯઺ੋ झ౽ ѐࣻ) ݅ఀ੄ on-demand ੋझఢझܳ द੘
  24. GPU ࠺ਊ ୭੸ച • ࢎਊೞח ઱ਃ ੋझఢझ ఋੑ ೡੋਯী ٮۄ

    ௼Ҋ ੘਷ ࠺ਊ ੺х੉ ੉ܖয૗ • GPU ࢎਊ दр੉ ݆ਸ ࣻ۾ ௾ ࠺ਊ ੺х • 1/5 ~ 1/10
  25. ೟ण ঌܿ • ঱ઁ ՘զ૑ ݽܰח Training.. • ӝ׮ܻӝীח ߎ੉

    Ө঻Ҋ Ӓր فӝীח җӘ੉ ف۰਍ ࢚ടੌ ٸ • рױೠ Lambda ௒ ೞա۽ ೧ߑؼ ࣻ ੓णפ׮
  26. ٩۞׬ • ੽Ӕࢿ੉ ֫ই૓ ٩۞׬ ೟ण • ੋఠ֔ ъ੄ (Udacity,

    Coursera, Udemy, etc..) • Google • ؀೟, ؀೟ਗ ъ੄ • ҙ۲ ࢲ੸
  27. ٩۞׬ • “ݽف੄ ٩۞׬” • दп੢গੋ উղೞח ‘AI’…Ҋ2о ੉ౣ݅ী ٜ݅঻׮

    (https://news.joins.com/ article/22957572) • ‘ই-‘ೞݶ 4ୡ݅ী ౵అट߽ ૓ױ೧ਃ (https://www.bloter.net/archives/320524) • धޛ߽ ૓ױೞח ੋҕ૑מ জ ݅ٚ ৈҊࢤ ѐߊ੗ (https://smartaedi.tistory.com/ 354?fbclid=IwAR131LoJvmOeVmssFKiBJdUuz49INTAUsosWi8cM71ntLzfd0FPpjXNFOZc)
  28. ٩۞׬ • Ӓۢীب ࠛҳೞҊ.. • ࣻ೟, NN ӝ߈૑ध੄ ೂࠗೠ ੉೧

    ೙ਃ • ޖѢ਍ ়ؕ੉৬ ֫਷ ੋղब, ૒ҙ੉ ೙ਃ • খਵ۽੄ ߊ੹੉ ӝ؀غח ৔৉