Build Image Classification service with Amazon ECS and GPU instances

Yuichiro Someya

November 22, 2016

Transcript

  1. Build Image Classification service with AWS ECS and GPU instances

    Yuichiro Someya @ Cookpad
  2. echo `whoami`
    • Yuichiro Someya • Master's degree, Department of Computer Science, Tokyo Institute of Technology graduate school • Joined Cookpad as a 2016 new graduate • github.com/ayemos • twitter.com/kumasan_com
  3. Agenda: how we run an automatic food-photo collection service using • a food image recognition model built on CaffeNet[1] • a data flow built on Amazon SQS/S3 • Amazon ECS (GPU instances)

    [1] https://github.com/BVLC/caffe
  5. Cookpad • Recipes: more than 2.5 million • Monthly users: more than 60 million

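  6. Ryori Kiroku (料理きろく, "cooking record") • Automatically collects only the food photos from the photos on a smartphone • Currently in limited release to some users
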
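  7. CookpadNet • A model made by fine-tuning CaffeNet for food / non-food classification • The model is trained with Caffe[1] and loaded with Chainer's Caffe emulator
 ref: http://docs.chainer.org/en/stable/reference/caffe.html

    • The output categories were changed to food / non-food, and the model was trained on food photos from Cookpad
 [1] http://caffe.berkeleyvision.org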
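The slide says only that CaffeNet's output categories were changed to food / non-food. A minimal sketch of how two output scores could become the `is_food` boolean the service reports, assuming the network emits raw scores in the order (non-food, food) and that a 0.5 probability threshold is used; the class order, the threshold, and the function names are assumptions, and loading the Caffe model through Chainer is not shown.

```python
import math

# Assumed order of the two outputs of the fine-tuned final layer.
CLASSES = ("non_food", "food")


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_food(scores, threshold=0.5):
    """Turn the two raw scores into the boolean reported as result {is_food}."""
    probs = softmax(scores)
    return probs[CLASSES.index("food")] >= threshold


print(is_food([0.2, 2.3]))   # True
print(is_food([3.0, -1.0]))  # False
```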
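  8. Data flow / workflow • Where should CookpadNet run its classification, and how and to where should the result be delivered? • Run the model on the client? The model is large (100 MB or more), so this is not realistic (we are researching smaller models) • Put the classification component outside the client • An HTTP server?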
  9. Diagram: Client (Android, iOS), API Server (Ruby), Classification Server (Python, Chainer)

  10. (Diagram, continued) Client -> API Server: POST classify {photo: binary}

  11. (Diagram, continued) API Server -> Classification Server: POST classify {photo: binary}

  12. (Diagram, continued) Classification Server -> API Server: result {is_food: bool}

  13. (Diagram, continued) API Server -> Client: result {is_food: bool}

  14. (Diagram, continued) The steps are labeled: image upload, image processing, classification

  15. (Diagram, continued) Total: >>> 300~500 ms <<<
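  16. Data flow / workflow • Image processing and model inference each take time (300~500 ms) • So the API server cannot call the classifier synchronously (the Unicorn workers would run out) • Instead: an asynchronous classification workflow using Amazon S3 and SQS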
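Why a synchronous call exhausts Unicorn workers can be made concrete with Little's law: each Unicorn worker is a single-threaded process that serves one request at a time, so a request waiting 300~500 ms on classification holds a worker for that long. A rough sketch; the load of 40 requests per second is a hypothetical figure, not from the talk.

```python
import math


def busy_workers(requests_per_sec, ms_per_request):
    """Little's law: mean requests in service = arrival rate x time in service."""
    return requests_per_sec * ms_per_request / 1000


def workers_needed(requests_per_sec, ms_per_request):
    """Lower bound on Unicorn workers tied up, since each serves one request at a time."""
    return math.ceil(busy_workers(requests_per_sec, ms_per_request))


# Hypothetical load of 40 synchronous classify requests per second.
for ms in (300, 500):
    print(ms, workers_needed(40, ms))  # "300 12", then "500 20"
```

This is only the average; bursts need headroom above it, and every worker held here is one not serving the rest of the API.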
  17. Diagram: Client (Android, iOS), API Server (Ruby), Classification Worker (Python, Chainer), Amazon S3 (Storage), Amazon SQS (Queue), DB

  18. (Diagram, continued) [Upload photo to classify]

  19. (Diagram, continued) enqueue {key_on_s3: string}

  20. (Diagram, continued) dequeue {key_on_s3: string}

  21. (Diagram, continued) [Download image]

  22. (Diagram, continued) POST result {key_on_s3: string, result: {is_food: bool}}

  23. (Diagram, continued) POST is_photo {key_on_s3: string}

  24. (Diagram, continued) result {is_food: bool}

  25. (Diagram, complete) Classification is processed asynchronously
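A toy, single-process sketch of the asynchronous workflow above, assuming in-memory stand-ins for each AWS piece: a dict for the S3 bucket, `queue.Queue` for SQS, and a dict for the DB. The message shapes follow the slides (`key_on_s3`, `is_food`); the function names, the key layout, and which component performs the upload are assumptions, and `classify` is a toy rule standing in for CookpadNet.

```python
import queue
import uuid

# In-memory stand-ins (assumptions): S3 bucket, SQS queue, API server's result table.
storage = {}
jobs = queue.Queue()
results_db = {}


def upload_and_enqueue(photo_bytes):
    """Store the photo under a new key and enqueue {key_on_s3}; returns at once."""
    key = f"photos/{uuid.uuid4()}.jpg"  # hypothetical key layout
    storage[key] = photo_bytes
    jobs.put({"key_on_s3": key})
    return key


def post_result(key, is_food):
    """API server side of POST result {key_on_s3, result: {is_food}}."""
    results_db[key] = {"is_food": is_food}


def is_photo(key):
    """API server side of POST is_photo {key_on_s3}: None until a result exists."""
    return results_db.get(key)


def classify(image_bytes):
    """Placeholder for preprocessing + inference (the slow part). Toy rule, NOT the model."""
    return b"food" in image_bytes


def worker_step():
    """One iteration of the classification worker; returns False when the queue is empty."""
    try:
        message = jobs.get_nowait()  # dequeue {key_on_s3}
    except queue.Empty:
        return False
    key = message["key_on_s3"]
    image = storage[key]               # download the image
    post_result(key, classify(image))  # report the verdict to the API server
    jobs.task_done()
    return True


# Usage: the uploader gets a key immediately and polls until the worker has answered.
key = upload_and_enqueue(b"a plate of food")
print(is_photo(key))  # None
while worker_step():
    pass
print(is_photo(key))  # {'is_food': True}
```

With real SQS, a received message is hidden for the visibility timeout and is removed only by an explicit DeleteMessage call, so a worker would typically delete it after posting the result; a job whose worker crashes then reappears instead of being lost.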
  26. Agenda: how we run an automatic food-photo collection service using • a food image recognition model built on CaffeNet[1] • a data flow built on Amazon SQS/S3 • Amazon ECS

    [1] https://github.com/BVLC/caffe
  27. ECS, GPU, and Docker... • ECS: Amazon EC2 Container Service • Places Docker containers (as Tasks) on a cluster made of EC2 instances • github.com/eagletmt/hako • Manages the ECS configuration in YAML files
  28. Diagram (AWS VPC): Developer -> docker push -> Docker Registry; Developer -> hako deploy -> ECS; ECS -> docker pull -> Task (cookpadnet-worker)

      # cookpadnet-worker.yml
      scheduler:
        type: ecs
        region: ap-northeast-1
        cluster: hako-production-g2
        desired_count: 1
      app:
        image: cookpadnet-worker-gpu
        cpu: 128
        memory: 3072
        memory_reservation: 2048
        env:
          AWS_REGION: ap-northeast-1
          COOKPADNET_ENV: production
          ...

  29. (Same slide) Deploy the Dockerized worker and manage its configuration with hako
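ECS task definitions use their own field names rather than hako's YAML keys. A sketch of how the `app:` section above could map onto ECS container-definition fields, including the ECS rule that `memory` (hard limit, MiB) must be greater than `memoryReservation` (soft limit, MiB); this is not hako's code, and the container name `app` is hypothetical. With 1024 CPU units per vCPU, `cpu: 128` reserves one eighth of a vCPU.

```python
def to_container_definition(name, app):
    """Map a hako-style `app:` section onto ECS container-definition fields.

    Illustrative sketch only; this is not hako's implementation.
    """
    memory = app["memory"]                       # hard limit, MiB
    reservation = app.get("memory_reservation")  # soft limit, MiB
    # ECS requires the hard limit to be greater than the soft limit.
    if reservation is not None and reservation >= memory:
        raise ValueError("memory must be greater than memory_reservation")
    definition = {
        "name": name,
        "image": app["image"],
        "cpu": app["cpu"],  # CPU units; 1024 units correspond to one vCPU
        "memory": memory,
        # ECS expects environment variables as a list of name/value string pairs.
        "environment": [
            {"name": k, "value": str(v)} for k, v in sorted(app.get("env", {}).items())
        ],
    }
    if reservation is not None:
        definition["memoryReservation"] = reservation
    return definition


app = {
    "image": "cookpadnet-worker-gpu",
    "cpu": 128,
    "memory": 3072,
    "memory_reservation": 2048,
    "env": {"AWS_REGION": "ap-northeast-1", "COOKPADNET_ENV": "production"},
}
definition = to_container_definition("app", app)
print(definition["cpu"] / 1024)  # 0.125
```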
  30. GPU • The worker uses a GPU • A …x performance difference compared with CPU instances in the same price range • Docker + GPU

  31. Virtualization vs. the kernel • Drivers are required: • the nvidia-driver kernel module • user-level drivers of the same version • To operate GPU devices from inside a Docker container, the container needs appropriate Linux capability settings
  32. Ideal (diagram): ubuntu host, Docker container, NVIDIA GPU, user-level driver, kernel modules

  34. (Diagram, continued) The driver path differs depending on the OS

  35. NVIDIA Docker • A thin wrapper around the Docker CLI • Automatically mounts the required volumes at `docker run` time

  36. (Same slide) Not supported on Amazon ECS

  38. Ideal (diagram): ubuntu host, Docker container, NVIDIA GPU, user-level driver, kernel modules (same version)
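The "same version" requirement between the kernel module and the user-level driver can be checked mechanically. A sketch, assuming the host exposes the kernel module version in the text format of `/proc/driver/nvidia/version` and the image ships versioned libraries such as `libcuda.so.<version>`; the version number 367.57 and the helper names are hypothetical.

```python
import re


def kernel_module_version(proc_text):
    """Version of the loaded NVIDIA kernel module, from /proc/driver/nvidia/version-style text."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    if match is None:
        raise ValueError("no NVIDIA kernel module version found")
    return match.group(1)


def library_version(lib_filename):
    """Version suffix of a versioned shared-library name such as libcuda.so.367.57."""
    match = re.search(r"\.so\.(\d+(?:\.\d+)+)$", lib_filename)
    if match is None:
        raise ValueError(f"unversioned library name: {lib_filename}")
    return match.group(1)


def drivers_match(proc_text, lib_filenames):
    """True if every user-level driver library matches the kernel module version."""
    kernel = kernel_module_version(proc_text)
    return all(library_version(name) == kernel for name in lib_filenames)


# Hypothetical values: host kernel module vs. libraries baked into the image.
host = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  <build date>"
print(drivers_match(host, ["libcuda.so.367.57", "libnvidia-ml.so.367.57"]))  # True
print(drivers_match(host, ["libcuda.so.352.99"]))                            # False
```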

  41. Virtualization vs. the kernel • GPU devices exist as special files • Accessing them requires specific capability settings

      docker run \
        --device /dev/nvidia0:/dev/nvidia0 \
        --device /dev/nvidia-uvm:/dev/nvidia-uvm \
        gpu-worker
  42. (Same slide as 41) The device option is not supported in ECS task definitions
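Slide 41's `--device` invocation, built as an argument list, for example from a launcher script running outside ECS. A sketch; the helper name and its defaults are assumptions, the defaults mirror the two devices on the slide, and many setups also pass `/dev/nvidiactl`.

```python
import shlex


def docker_run_with_gpus(image, gpu_indices=(0,), extra_devices=("/dev/nvidia-uvm",)):
    """Build a `docker run` argv that passes NVIDIA device files through --device.

    Each device is mapped to the same path inside the container.
    """
    argv = ["docker", "run"]
    devices = [f"/dev/nvidia{i}" for i in gpu_indices] + list(extra_devices)
    for dev in devices:
        argv += ["--device", f"{dev}:{dev}"]
    argv.append(image)
    return argv


print(shlex.join(docker_run_with_gpus("gpu-worker")))
# docker run --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm gpu-worker
```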
  44. Virtualization vs. the kernel • GPU devices exist as special files • Accessing them requires specific capability settings

      docker run --privileged gpu-worker

  45. Virtualization vs. the kernel

      docker run --privileged gpu-worker

    • All capabilities are granted • The process holds root inside a container run by a dockerd that itself runs as root, so it can do all sorts of things

      docker run --privileged alpine:latest date -s …
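What `--privileged` changes is visible in a process's effective capability mask, the `CapEff` line of `/proc/<pid>/status`. A sketch that decodes it; the capability numbers come from `linux/capability.h` (`CAP_SYS_TIME`, which setting the clock with `date -s` requires, is 25), and the two sample masks are typical values assumed for illustration: Docker's default set and a full set as seen under `--privileged`.

```python
# Capability bit numbers from linux/capability.h.
CAP_SYS_ADMIN = 21
CAP_SYS_TIME = 25  # needed to set the system clock, e.g. `date -s`


def effective_caps(status_text):
    """Return the CapEff bitmask from /proc/<pid>/status-style text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_cap(mask, cap):
    """True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def can_set_clock(status_text):
    return has_cap(effective_caps(status_text), CAP_SYS_TIME)


# Sample values (assumed): Docker's default set vs. a full set under --privileged.
default = "Name:\tdate\nCapEff:\t00000000a80425fb\n"
privileged = "Name:\tdate\nCapEff:\t0000003fffffffff\n"
print(can_set_clock(default), can_set_clock(privileged))  # False True
```

The realtime clock is shared with the host, so a privileged container that sets it changes the host's time as well.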
  46. Virtualization vs. the kernel • GPU devices exist as special files • Accessing them requires specific capability settings

      docker run --privileged gpu-worker

    • We decided to run as a user other than root • `USER runner` in the Dockerfile
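A hypothetical startup guard that keeps the non-root decision from being undone by accident: the worker refuses to run if its effective UID is 0. This is not from the talk; `USER runner` in the Dockerfile is the actual measure, and the guard only fails fast if that line is ever dropped. Call `ensure_not_root()` once when the worker starts.

```python
import os


def ensure_not_root(euid=None):
    """Fail fast if the worker process is running as root (effective UID 0)."""
    if euid is None:
        euid = os.geteuid()  # Unix only
    if euid == 0:
        raise RuntimeError("worker must not run as root; set a non-root USER in the Dockerfile")
    return euid
```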