Slide 1

Slide 1 text

Build Image Classification service with AWS ECS and GPU instances
Yuichiro Someya @ Cookpad

Slide 2

Slide 2 text

echo `whoami`
• Yuichiro Someya
• Tokyo Institute of Technology Graduate School, Department of Computer Science, master's degree
• Joined Cookpad as a new graduate in 2016
• github.com/ayemos
• twitter.com/kumasan_com

Slide 3

Slide 3 text

Agenda
• An automatic food-photo collection service,
• the food image recognition model built on CaffeNet [1],
• the data flow built with Amazon SQS/S3,
• and how we run it all on Amazon ECS (GPU instance)
[1] https://github.com/BVLC/caffe

Slide 4

Slide 4 text

Agenda
• An automatic food-photo collection service,
• the food image recognition model built on CaffeNet [1],
• the data flow built with Amazon SQS/S3,
• and how we run it all on Amazon ECS (GPU instance)
[1] https://github.com/BVLC/caffe

Slide 5

Slide 5 text

Cookpad
• Recipes: over 2.5 million
• Monthly users: over 60 million

Slide 6

Slide 6 text

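料理きろく (Ryori Kiroku, "cooking log")
• Automatically collects only the food photos from the photos on a smartphone
• Currently in limited release to a subset of users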

Slide 7

Slide 7 text

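CookpadNet
• A model made by fine-tuning CaffeNet for food / non-food classification
• The model trained with Caffe [1] is loaded through Chainer's Caffe emulator
  ref: http://docs.chainer.org/en/stable/reference/caffe.html
• The output categories are changed to food / non-food, and the model is trained on food photos from Cookpad
[1] http://caffe.berkeleyvision.org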
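The food / non-food decision described above can be illustrated with a minimal sketch. It assumes the fine-tuned network emits two raw scores, that index 1 is the food class, and that 0.5 is the cut-off; none of these details appear in the slides, and the names `FOOD_INDEX`, `softmax`, and `is_food` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: index 0 = non-food, index 1 = food (the slides do not state the order).
FOOD_INDEX = 1


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_food(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True when the food probability reaches the (hypothetical) threshold."""
    return softmax(scores)[FOOD_INDEX] >= threshold
```

A worker could call `is_food` on the two-element output of the model and report the boolean as the `is_food` field used later in the deck.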

Slide 8

Slide 8 text

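Data flow / Workflow
• Where does CookpadNet run the classification, and how and where is the result delivered?
• Put the classification model on the client and classify there
• The model is large (100 MB or more), so this is not realistic
• (We are researching smaller models)
• Put the component that classifies outside the client
• HTTP server?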

Slide 9

Slide 9 text

Client (Android, iOS) | API Server (ruby) | Classification Server (python, chainer)

Slide 10

Slide 10 text

Client (Android, iOS) | API Server (ruby) | Classification Server (python, chainer)
POST /classify {photo: binary}

Slide 11

Slide 11 text

Client (Android, iOS) | API Server (ruby) | Classification Server (python, chainer)
POST /classify {photo: binary}
POST /classify {photo: binary}

Slide 12

Slide 12 text

Client (Android, iOS) | API Server (ruby) | Classification Server (python, chainer)
POST /classify {photo: binary}
POST /classify {photo: binary}
result: {is_food: bool}

Slide 13

Slide 13 text

Client (Android, iOS) | API Server (ruby) | Classification Server (python, chainer)
POST /classify {photo: binary}
POST /classify {photo: binary}
result: {is_food: bool}
result: {is_food: bool}

Slide 14

Slide 14 text

Client (Android, iOS) | API Server (ruby) | Classification Server (python, chainer)
POST /classify {photo: binary}
POST /classify {photo: binary}
result: {is_food: bool}
result: {is_food: bool}
Image upload
Image processing and classification

Slide 15

Slide 15 text

Client (Android, iOS) | API Server (ruby) | Classification Server (python, chainer)
POST /classify {photo: binary}
POST /classify {photo: binary}
result: {is_food: bool}
result: {is_food: bool}
Image upload
Image processing and classification
>>> 300~500 ms <<<

Slide 16

Slide 16 text

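Data flow / Workflow
• Image processing and model inference take a fair amount of time (300~500 ms)
• The API server cannot call the classifier synchronously (the Unicorn workers would be exhausted)
• An asynchronous classification workflow using Amazon S3 and SQS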
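The worker-exhaustion concern can be made concrete with a small back-of-the-envelope sketch. It assumes every classification request blocks one worker process for the whole call; the worker count used below is hypothetical, as the slides give only the 300~500 ms latency.

```python
def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when each request blocks one worker for its full duration."""
    if workers <= 0 or seconds_per_request <= 0:
        raise ValueError("workers and seconds_per_request must be positive")
    return workers / seconds_per_request
```

With a hypothetical pool of 16 workers, a 0.5 s call caps the server at 32 requests per second for all endpoints combined, which is why pushing the slow step behind a queue is attractive.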

Slide 17

Slide 17 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB

Slide 18

Slide 18 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB
[Upload photo to classify]

Slide 19

Slide 19 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB
[Upload photo to classify]
enqueue {key_on_s3: string}

Slide 20

Slide 20 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB
[Upload photo to classify]
enqueue {key_on_s3: string}
dequeue {key_on_s3: string}

Slide 21

Slide 21 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB
[Upload photo to classify]
enqueue {key_on_s3: string}
dequeue {key_on_s3: string}
[Download Image]

Slide 22

Slide 22 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB
[Upload photo to classify]
enqueue {key_on_s3: string}
dequeue {key_on_s3: string}
[Download Image]
POST /result {key_on_s3: string, result: {is_food…

Slide 23

Slide 23 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB
[Upload photo to classify]
enqueue {key_on_s3: string}
dequeue {key_on_s3: string}
[Download Image]
POST /result {key_on_s3: string, result: {is_food…
POST /is_photo {key_on_s3: string}

Slide 24

Slide 24 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue) | DB
[Upload photo to classify]
enqueue {key_on_s3: string}
dequeue {key_on_s3: string}
[Download Image]
POST /result {key_on_s3: string, result: {is_food…
POST /is_photo {key_on_s3: string}
result: {is_food: bool}

Slide 25

Slide 25 text

Client (Android, iOS) | API Server (ruby) | Classification Worker (python, chainer) | Amazon S3 (Storage) | Amazon SQS (Queue)
[Upload photo to classify]
enqueue {key_on_s3: string}
dequeue {key_on_s3: string}
[Download Image]
POST /result {key_on_s3: string, result: {is_food: bool}}
POST /is_photo {key_on_s3: string}
result: {is_food: bool}
Classification is processed asynchronously
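The asynchronous workflow built up over the preceding slides can be sketched with in-memory stand-ins. This is an illustration only: `storage`, `jobs`, and `results` are hypothetical substitutes for the S3 bucket, the SQS queue, and the API server's DB, `fake_classifier` replaces the real model, and the slides do not say which component enqueues the message, so the sketch assumes the uploading side does.

```python
import json
import queue
from typing import Callable, Dict, List

storage: Dict[str, bytes] = {}  # stand-in for an S3 bucket
jobs = queue.Queue()            # stand-in for an SQS queue
results: List[dict] = []        # stand-in for results stored via the API server


def upload_and_enqueue(key: str, photo: bytes) -> None:
    """Store the photo, then enqueue a message that carries only its key."""
    storage[key] = photo
    jobs.put(json.dumps({"key_on_s3": key}))


def post_result(payload: dict) -> None:
    """Stand-in for the worker's POST back to the API server."""
    results.append(payload)


def run_worker(classify: Callable[[bytes], bool]) -> int:
    """Drain the queue: dequeue a key, fetch the photo, classify it, report the result."""
    handled = 0
    while True:
        try:
            message = jobs.get_nowait()
        except queue.Empty:
            return handled
        key = json.loads(message)["key_on_s3"]
        photo = storage[key]  # the "[Download Image]" step
        post_result({"key_on_s3": key, "result": {"is_food": classify(photo)}})
        handled += 1


def fake_classifier(photo: bytes) -> bool:
    """Hypothetical classifier used only to exercise the flow."""
    return photo.startswith(b"food")
```

Because the message carries only the storage key, the queue payload stays small and the API server never waits on inference; the client later asks for the stored result.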

Slide 26

Slide 26 text

Agenda
• An automatic food-photo collection service,
• the food image recognition model built on CaffeNet [1],
• the data flow built with Amazon SQS/S3,
• and how we run it all on Amazon ECS
[1] https://github.com/BVLC/caffe

Slide 27

Slide 27 text

ECS, GPU, Docker, and…
• ECS: Amazon EC2 Container Service
• Places Docker containers on a cluster made up of EC2 instances (Task)
• github.com/eagletmt/hako
• Manages the ECS configuration in YAML files

Slide 28

Slide 28 text

"8471$ # cookpadnet-worker.yml scheduler: type: ecs region: ap-northeast-1 cluster: hako-production-g2 desired_count: 1 app: image: cookpadnet-worker-gpu cpu: 128 memory: 3072 memory_reservation: 2048 env: AWS_REGION: ap-northeast-1 COOKPADNET_ENV: production ... %PDLFS3FHJTUSZ ։ൃऀ EPDLFSQVTI IBLPEFQMPZ &$4 EPDLFSQVMM 5BTL DPPLQBEOFUXPSLFS

Slide 29

Slide 29 text

"8471$ # cookpadnet-worker.yml scheduler: type: ecs region: ap-northeast-1 cluster: hako-production-g2 desired_count: 1 app: image: cookpadnet-worker-gpu cpu: 128 memory: 3072 memory_reservation: 2048 env: AWS_REGION: ap-northeast-1 COOKPADNET_ENV: production ... %PDLFS3FHJTUSZ ։ൃऀ EPDLFSQVTI IBLPEFQMPZ &$4 EPDLFSQVMM 5BTL DPPLQBEOFUXPSLFS DockerԽ͞ΕͨWorkerΛ
 hakoͰσϓϩΠ & ߏ੒؅ཧ

Slide 30

Slide 30 text

GPU
• The worker uses a GPU
• A […]x performance difference compared with CPU instances in the same price range
• Docker + GPU

Slide 31

Slide 31 text

Virtualization vs. the kernel
• Drivers are required
• The nvidia-driver kernel module
• User-level drivers of the same version
• To operate GPU devices from a Docker container, the container needs the appropriate Linux capability settings
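The requirement that the kernel module and the user-level drivers share one version can be checked with a sketch like the following. It assumes the kernel module reports itself in the text of `/proc/driver/nvidia/version` with a "Kernel Module <version>" phrase and that the user-level library is named `libcuda.so.<version>`; both formats and the helper names are assumptions for illustration, not something the slides specify.

```python
import re
from typing import Optional


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Extract the driver version from text shaped like /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def user_library_version(filename: str) -> Optional[str]:
    """Extract the version suffix from a name such as libcuda.so.<version>."""
    match = re.fullmatch(r"libcuda\.so\.(\d+(?:\.\d+)+)", filename)
    return match.group(1) if match else None


def drivers_match(proc_text: str, library_filename: str) -> bool:
    """True only when both versions were found and are identical."""
    kernel = kernel_module_version(proc_text)
    user = user_library_version(library_filename)
    return kernel is not None and kernel == user
```

A container entrypoint could run such a check before loading the model and fail fast with a clear message when the host driver was upgraded but the image was not.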

Slide 32

Slide 32 text

Ideal
ubuntu (docker container) | user-level driver | kernel modules | nvidia GPU

Slide 33

Slide 33 text

Ideal
ubuntu (docker container) | user-level driver | kernel modules | nvidia GPU

Slide 34

Slide 34 text

Ideal
ubuntu (docker container) | user-level driver | kernel modules | nvidia GPU
The driver path differs depending on the OS

Slide 35

Slide 35 text

NVIDIA Docker
• A thin wrapper around the Docker CLI
• Automatically mounts the required volumes at `docker run` time

Slide 36

Slide 36 text

NVIDIA Docker
• A thin wrapper around the Docker CLI
• Automatically mounts the required volumes at `docker run` time
Not supported on Amazon ECS

Slide 37

Slide 37 text

Ideal
ubuntu (docker container) | user-level driver | kernel modules | nvidia GPU

Slide 38

Slide 38 text

Ideal
ubuntu (docker container) | user-level driver | kernel modules | nvidia GPU
(same version)

Slide 39

Slide 39 text

Ideal
ubuntu (docker container) | user-level driver | kernel modules | nvidia GPU
(same version)
Mostly solved

Slide 40

Slide 40 text

Virtualization vs. the kernel
• Drivers are required
• The nvidia-driver kernel module
• User-level drivers of the same version
• To operate GPU devices from a Docker container, the container needs the appropriate Linux capability settings

Slide 41

Slide 41 text

Virtualization vs. the kernel
• GPU devices exist as special files
• Accessing them requires specific capability settings
docker run \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  gpu-worker

Slide 42

Slide 42 text

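Virtualization vs. the kernel
• GPU devices exist as special files
• Accessing them requires specific capability settings
docker run \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  gpu-worker
The --device option is not supported in ECS Task definitions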

Slide 43

Slide 43 text

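Virtualization vs. the kernel
• GPU devices exist as special files
• Accessing them requires specific capability settings
docker run \
  --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  gpu-worker
The --device option is not supported in ECS Task definitions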

Slide 44

Slide 44 text

Virtualization vs. the kernel
• GPU devices exist as special files
• Accessing them requires specific capability settings
docker run --privileged gpu-worker
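The two ways of granting GPU access discussed here, per-device mapping and `--privileged`, can be compared with a small sketch that only builds the `docker run` argument list without executing it. The helper name `docker_run_args` and the device paths in the usage note are hypothetical; `--device` and `--privileged` are standard `docker run` flags.

```python
from typing import List, Sequence


def docker_run_args(
    image: str,
    devices: Sequence[str] = (),
    privileged: bool = False,
) -> List[str]:
    """Build an argv list for `docker run`; nothing is executed here."""
    args = ["docker", "run"]
    if privileged:
        # Grants every capability and access to all host devices.
        args.append("--privileged")
    for device in devices:
        # Map the host device file to the same path inside the container.
        args += ["--device", f"{device}:{device}"]
    args.append(image)
    return args
```

For example, `docker_run_args("gpu-worker", devices=["/dev/nvidia0", "/dev/nvidia-uvm"])` yields the narrowly scoped form, while `docker_run_args("gpu-worker", privileged=True)` yields the broad form that the talk falls back to on ECS.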

Slide 45

Slide 45 text

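Virtualization vs. the kernel
• GPU devices exist as special files
• Accessing them requires specific capability settings
docker run --privileged gpu-worker
• Every capability is opened up
• You are root inside a container on a dockerd that itself runs as root, so you can do all sorts of things
docker run --privileged alpine:latest date -s […]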

Slide 46

Slide 46 text

Virtualization vs. the kernel
• GPU devices exist as special files
• Accessing them requires specific capability settings
docker run --privileged gpu-worker
• So the worker runs as a non-root user
• `USER runner` in the Dockerfile
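As a complement to `USER runner`, a worker can refuse to start when it finds itself running as root. The following is a minimal sketch of such a guard under the assumption that the worker is a Python process on a POSIX host; `ensure_not_root` is a hypothetical helper, not something shown in the slides.

```python
import os
from typing import Optional


def ensure_not_root(euid: Optional[int] = None) -> None:
    """Raise PermissionError when the effective UID is 0 (root)."""
    if euid is None:
        euid = os.geteuid()  # available on POSIX systems only
    if euid == 0:
        raise PermissionError("refusing to run the worker as root")
```

Calling `ensure_not_root()` at the top of the worker's entrypoint turns a forgotten `USER` line in the Dockerfile into an immediate, visible failure rather than a silently privileged process.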

Slide 47

Slide 47 text

Agenda
• An automatic food-photo collection service,
• the food image recognition model built on CaffeNet [1],
• the data flow built with Amazon SQS/S3,
• and how we run it all on Amazon ECS (GPU instance)
[1] https://github.com/BVLC/caffe