
Build Image Classification service with Amazon ECS and GPU instances

Yuichiro Someya

November 22, 2016

Transcript

  1. Build Image Classification service with AWS ECS and GPU instances

    Yuichiro Someya @ Cookpad
  2. echo `whoami`
     • Yuichiro Someya
     • Master's degree, Department of Computer Science, Tokyo Institute of Technology graduate school
     • Joined Cookpad as a new graduate in 2016
     • github.com/ayemos
     • twitter.com/kumasan_com
  3. Agenda
     How we run an automatic food-photo collection service using:
     • a food image recognition model built on CaffeNet [1]
     • a data flow built with Amazon SQS/S3
     • Amazon ECS (GPU instances)
     [1] https://github.com/BVLC/caffe

  4. Agenda
     How we run an automatic food-photo collection service using:
     • a food image recognition model built on CaffeNet [1]
     • a data flow built with Amazon SQS/S3
     • Amazon ECS (GPU instances)
     [1] https://github.com/BVLC/caffe
  5. Cookpad
     • Recipes: over 2.5 million
     • Monthly users: over 60 million

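  6. 料理きろく (Ryori Kiroku, a food photo log)
     • Automatically collects only the food photos from the photos on a smartphone
     • Currently released on a limited basis to a subset of users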

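  7. CookpadNet
     • A model made by fine-tuning CaffeNet for food / non-food classification
     • The model trained with Caffe [1] is loaded with Chainer's Caffe emulator
       ref: http://docs.chainer.org/en/stable/reference/caffe.html
     • The output categories were changed to food / non-food, and the model was trained on food photos from Cookpad
     [1] http://caffe.berkeleyvision.org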
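
A minimal sketch of the loading step on slide 7, assuming Chainer is installed and a fine-tuned `.caffemodel` file is available. `CaffeFunction` is Chainer's Caffe importer; the file name `cookpadnet.caffemodel`, the output layer name `fc8`, the class order (index 1 = food), and the 0.5 threshold are hypothetical placeholders, not values from the talk. Image preprocessing is left out, and how inference mode is selected depends on the Chainer version.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_food(scores, food_index=1, threshold=0.5):
    """Binary food / non-food decision from two raw class scores."""
    return softmax(scores)[food_index] >= threshold


def load_model(path="cookpadnet.caffemodel"):
    """Load a Caffe-trained model through Chainer's Caffe importer."""
    # Imported lazily so the helpers above work without Chainer installed.
    from chainer.links.caffe import CaffeFunction
    return CaffeFunction(path)


def classify(model, batch, output_layer="fc8"):
    """Run a forward pass on a preprocessed float32 batch of shape (N, 3, H, W)."""
    # CaffeFunction takes a dict of input blobs and a list of output blob names.
    (out,) = model(inputs={"data": batch}, outputs=[output_layer])
    return [is_food(row.tolist()) for row in out.data]
```

Keeping the decision rule (`is_food`) separate from the framework call makes it testable without a GPU or a model file.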
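  8. Data flow / workflow
     • Where should CookpadNet run classification, and where and how should the result be delivered?
     • Option: put the model on the client and classify there
       • The model is large (100 MB and up), so this is not practical
       • (a smaller model is under research)
     • Option: put the classifying component outside the client
       • An HTTP server?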
  9. Diagram: Client (Android / iOS), API Server (ruby), Classification Server (python, chainer)

  10. Adds Client -> API Server: POST /classify {photo: binary}

  11. Adds API Server -> Classification Server: POST /classify {photo: binary}

  12. Adds Classification Server -> API Server: result: {is_food: bool}

  13. Adds API Server -> Client: result: {is_food: bool}

  14. Adds stage labels: image upload, image processing, classification

  15. Adds the timing annotation: >>> 300~500 ms <<<
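  16. Data flow / workflow
     • Image processing plus model inference takes a fair amount of time (300~500 ms)
     • The API server cannot call it synchronously (the Unicorn workers would be exhausted)
     • Hence an asynchronous classification workflow using Amazon S3 and SQS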
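
A rough back-of-the-envelope calculation of why a synchronous call is a problem: if every request blocks a Unicorn worker for the whole classification, throughput is capped at workers divided by latency. The worker count of 16 is a hypothetical figure, not from the talk; only the 300~500 ms range comes from the slides.

```python
def max_sync_throughput(workers, latency_seconds):
    """Upper bound on requests/second when each worker blocks per request."""
    return workers / latency_seconds


workers = 16  # hypothetical Unicorn worker count, not from the talk
for latency in (0.3, 0.5):
    rate = max_sync_throughput(workers, latency)
    print(f"{latency * 1000:.0f} ms -> {rate:.1f} req/s")
# 300 ms -> 53.3 req/s
# 500 ms -> 32.0 req/s
```

Any burst above that rate queues behind blocked workers and delays unrelated API requests too, which is what moving classification off the request path avoids.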
  17. Diagram: Client (Android / iOS), API Server (ruby), Classification Worker (python, chainer), Amazon S3 (Storage), Amazon SQS (Queue), DB

  18. Adds [Upload photo to classify]

  19. Adds enqueue {key_on_s3: string}

  20. Adds dequeue {key_on_s3: string}

  21. Adds [Download Image]

  22. Adds POST /result {key_on_s3: string, result: {is_food: bool}}

  23. Adds POST /is_photo {key_on_s3: string}

  24. Adds result: {is_food: bool}

  25. Same diagram, captioned: asynchronous classification
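
The worker side of the asynchronous flow on slides 17 to 25 can be sketched as below. This is an illustration under stated assumptions, not the talk's implementation: `queue.Queue`, a `dict`, and a list stand in for Amazon SQS, Amazon S3, and the API server's `/result` endpoint, and `handle_one` and the toy classifier are hypothetical names. It also assumes the worker is the component that posts the result back.

```python
import queue


def handle_one(jobs, storage, classify, post_result):
    """Process a single queued job; return False when the queue is empty."""
    try:
        message = jobs.get_nowait()          # dequeue {key_on_s3: string}
    except queue.Empty:
        return False
    key = message["key_on_s3"]
    image = storage[key]                     # [Download Image]
    is_food = classify(image)                # image processing + inference
    post_result({"key_on_s3": key, "result": {"is_food": is_food}})
    return True


# In-memory stand-ins for Amazon SQS, Amazon S3 and the API server.
jobs = queue.Queue()
storage = {"photos/1.jpg": b"ramen", "photos/2.jpg": b"cat"}
received = []

for key in storage:
    jobs.put({"key_on_s3": key})             # enqueue {key_on_s3: string}

# Toy classifier: a placeholder for the real CookpadNet forward pass.
while handle_one(jobs, storage, lambda img: img == b"ramen", received.append):
    pass

print(received)
```

Only the S3 key travels through the queue, so messages stay small and the photo itself is fetched by the worker when it is ready to process it.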
  26. Agenda
     How we run an automatic food-photo collection service using:
     • a food image recognition model built on CaffeNet [1]
     • a data flow built with Amazon SQS/S3
     • Amazon ECS
     [1] https://github.com/BVLC/caffe
  27. ECS, GPU, Docker, and…
     • ECS: Amazon EC2 Container Service
       • Places Docker containers (Tasks) on a cluster made up of EC2 instances
     • github.com/eagletmt/hako
       • Manages ECS configuration in YAML files
  28. Deployment diagram (inside an AWS VPC): Developer -> docker push -> Docker Registry; Developer -> hako deploy -> ECS; ECS -> docker pull -> Task (cookpadnet-worker)

      # cookpadnet-worker.yml
      scheduler:
        type: ecs
        region: ap-northeast-1
        cluster: hako-production-g2
        desired_count: 1
      app:
        image: cookpadnet-worker-gpu
        cpu: 128
        memory: 3072
        memory_reservation: 2048
        env:
          AWS_REGION: ap-northeast-1
          COOKPADNET_ENV: production
          ...

  29. Same cookpadnet-worker.yml and deployment diagram, captioned: the Dockerized worker is deployed and its configuration managed with hako
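  30. GPU
     • The worker uses a GPU
     • An N-fold performance difference compared with CPU instances in the same price range (the multiplier did not survive in the transcript)
     • Docker and GPU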

  31. Virtualization vs. kernel
     • A driver is required
       • the nvidia-driver kernel module
       • user-level drivers of the same version
     • To operate GPU devices from a Docker container, the container needs appropriate Linux capability settings
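
The "same version" requirement can be checked mechanically. The sketch below is an assumption-laden illustration: it parses text in the style of `/proc/driver/nvidia/version` on the host and compares it with the version suffix of a user-level library file name such as `libcuda.so.367.57`. The sample strings and function names are illustrative, not taken from the talk.

```python
import re


def kernel_module_version(proc_text):
    """Extract the driver version from /proc/driver/nvidia/version content."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def user_level_version(library_names):
    """Extract the version suffix from a name such as libcuda.so.367.57."""
    for name in library_names:
        match = re.fullmatch(r"libcuda\.so\.(\d+\.\d+(?:\.\d+)*)", name)
        if match:
            return match.group(1)
    return None


def versions_match(proc_text, library_names):
    """True only when both versions are found and are identical."""
    kernel = kernel_module_version(proc_text)
    return kernel is not None and kernel == user_level_version(library_names)


# Illustrative inputs; real values come from the host and the container image.
sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016\n"
print(versions_match(sample, ["libcuda.so", "libcuda.so.1", "libcuda.so.367.57"]))
```

Running a check like this at container start turns a version mismatch into an explicit error instead of an obscure CUDA initialization failure.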
  32. Diagram (ideal): ubuntu docker container, user-level driver, kernel modules, nvidia GPU

  33. Diagram (ideal): ubuntu docker container, user-level driver, kernel modules, nvidia GPU

  34. Diagram (ideal): ubuntu docker container, user-level driver, kernel modules, nvidia GPU. Note: the driver path differs depending on the OS

  35. NVIDIA Docker
     • A thin wrapper around the Docker CLI
     • Automatically mounts the volumes needed at `docker run` time

  36. NVIDIA Docker
     • A thin wrapper around the Docker CLI
     • Automatically mounts the volumes needed at `docker run` time
     Not supported on Amazon ECS
  37. Diagram (ideal): ubuntu docker container, user-level driver, kernel modules, nvidia GPU

  38. Diagram (ideal): ubuntu docker container, user-level driver, kernel modules (same version), nvidia GPU

  39. Diagram (ideal): ubuntu docker container, user-level driver, kernel modules (same version), nvidia GPU. Note: mostly solved

  40. Virtualization vs. kernel
     • A driver is required
       • the nvidia-driver kernel module
       • user-level drivers of the same version
     • To operate GPU devices from a Docker container, the container needs appropriate Linux capability settings
  41. Virtualization vs. kernel
     • GPU devices exist as special files
     • Accessing them requires specific capability settings

      docker run \
        --device /dev/nvidia0:/dev/nvidia0 \
        --device /dev/nvidia-uvm:/dev/nvidia-uvm \
        gpu-worker
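
A small sketch of how the `--device` flags above could be generated instead of typed by hand. `docker_run_args` is a hypothetical helper; it only builds the argument list from a list of device file names and does not run Docker. Passing every `nvidia*` entry under `/dev` is an assumption about which devices the container needs.

```python
def docker_run_args(image, device_names):
    """Build a `docker run` argv that exposes NVIDIA device files."""
    args = ["docker", "run"]
    for name in sorted(device_names):
        if name.startswith("nvidia"):
            path = f"/dev/{name}"
            # Map the host device file to the same path in the container.
            args += ["--device", f"{path}:{path}"]
    return args + [image]


# On a GPU host the names would typically come from os.listdir("/dev").
print(" ".join(docker_run_args("gpu-worker", ["nvidia0", "nvidia-uvm", "null"])))
```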
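  42. Virtualization vs. kernel
     • GPU devices exist as special files
     • Accessing them requires specific capability settings

      docker run \
        --device /dev/nvidia0:/dev/nvidia0 \
        --device /dev/nvidia-uvm:/dev/nvidia-uvm \
        gpu-worker

     The device option is not supported in ECS Task definitions

  43. Virtualization vs. kernel
     • GPU devices exist as special files
     • Accessing them requires specific capability settings

      docker run \
        --device /dev/nvidia0:/dev/nvidia0 \
        --device /dev/nvidia-uvm:/dev/nvidia-uvm \
        gpu-worker

     The device option is not supported in ECS Task definitions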
  44. Virtualization vs. kernel
     • GPU devices exist as special files
     • Accessing them requires specific capability settings

      docker run --privileged gpu-worker
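
In an ECS task definition, the counterpart of `docker run --privileged` is the `privileged` field of a container definition. The sketch below builds such an entry in Python using the values from cookpadnet-worker.yml; the container name `app` and the helper itself are hypothetical, and whether hako emits exactly this structure is an assumption.

```python
import json


def container_definition(name, image, cpu, memory, memory_reservation, env):
    """Build one entry of an ECS task definition's containerDefinitions."""
    return {
        "name": name,
        "image": image,
        "cpu": cpu,
        "memory": memory,
        "memoryReservation": memory_reservation,
        "privileged": True,  # counterpart of `docker run --privileged`
        "environment": [{"name": k, "value": v} for k, v in sorted(env.items())],
    }


definition = container_definition(
    name="app",  # hypothetical container name
    image="cookpadnet-worker-gpu",
    cpu=128,
    memory=3072,
    memory_reservation=2048,
    env={"AWS_REGION": "ap-northeast-1", "COOKPADNET_ENV": "production"},
)
print(json.dumps(definition, indent=2))
```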

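  45. Virtualization vs. kernel
     • GPU devices exist as special files
     • Accessing them requires specific capability settings

      docker run --privileged gpu-worker

     • Opens up every capability
     • The process is root inside a container running on a dockerd that itself runs as root, so it can do all sorts of things

      docker run --privileged alpine:latest date -s …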
  46. Virtualization vs. kernel
     • GPU devices exist as special files
     • Accessing them requires specific capability settings

      docker run --privileged gpu-worker

     • Decision: run as a user other than root
     • In the Dockerfile: `USER runner`
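
As a complement to `USER runner`, the worker itself could refuse to start as root, so that a misconfigured image fails loudly. This is a hypothetical defensive check, not something shown in the talk; `ensure_not_root` is an illustrative name, and `os.geteuid` is only available on Unix-like systems.

```python
import os


def ensure_not_root(euid=None):
    """Raise if the process runs as root (uid 0); otherwise return the uid."""
    euid = os.geteuid() if euid is None else euid
    if euid == 0:
        raise PermissionError("refusing to run the worker as root")
    return euid


def main():
    """Entry point sketch: check privileges before touching the GPU."""
    uid = ensure_not_root()
    print(f"worker starting as uid {uid}")
```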
  47. Agenda
     How we run an automatic food-photo collection service using:
     • a food image recognition model built on CaffeNet [1]
     • a data flow built with Amazon SQS/S3
     • Amazon ECS (GPU instances)
     [1] https://github.com/BVLC/caffe