Build Image Classification service with Amazon ECS and GPU instances
Yuichiro Someya
November 22, 2016
Transcript
Build Image Classification service with AWS ECS and GPU instances
Yuichiro Someya @ Cookpad
echo `whoami`
• Yuichiro Someya
• Master's degree from a graduate school computer science program
• Joined Cookpad as a new graduate in 2016
• github.com/ayemos
• twitter.com/kumasan_com
Agenda
We operate a service that automatically collects food photos, using:
• a food image recognition model built on CaffeNet [1]
• a data flow built with Amazon SQS/S3
• Amazon ECS (GPU instances)
[1] https://github.com/BVLC/caffe
Cookpad
• Recipes: more than 2.5 million
• Monthly users: more than 60 million
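Ryori Kiroku (料理きろく, "cooking log")
• Automatically collects only the food photos from the photos on a smartphone
• Currently available to a limited set of users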
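CookpadNet
• A model obtained by fine-tuning CaffeNet for food / non-food classification
• The model is trained with Caffe [1] and loaded through Chainer's Caffe emulator
  ref: http://docs.chainer.org/en/stable/reference/caffe.html
• The classification categories were changed to food / non-food, and the model was trained on food photos from Cookpad
[1] http://caffe.berkeleyvision.org/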
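As a rough illustration of what a food / non-food output stage could look like, the sketch below turns two raw scores into probabilities with softmax and applies a threshold. The score values, the class order (non-food first, food second), and the 0.5 threshold are assumptions made for illustration and do not come from the slides. Loading the fine-tuned weights through Chainer is outside its scope.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_food(scores, threshold=0.5):
    """Return True when the assumed 'food' class (index 1) exceeds the threshold."""
    probs = softmax(scores)
    return probs[1] > threshold


# Hypothetical raw outputs of a two-class final layer: [non_food, food]
print(is_food([0.2, 2.3]))
print(is_food([3.0, -1.0]))
```

Raising the threshold would trade recall for precision, which may matter when only food photos should be collected.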
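Data flow / workflow
• Where should CookpadNet run the classification, and where and how should the result be delivered?
• Option: put the classification model on the client and classify there
  • The model is large (100 MB or more), so this is not realistic
  • (A smaller model is being researched)
• Option: put the component that runs the classification outside the client
  • An HTTP server?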
[Diagram, built up over several slides: synchronous classification over HTTP]
Client (Android / iOS) → API Server (ruby): POST /classify {photo: binary}
API Server (ruby) → Classification Server (python, chainer): POST /classify {photo: binary}
Classification Server → API Server → Client: result: {is_food: bool}
Image upload, image processing, and classification together take 300~500 ms.
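Data flow / workflow
• Image processing and model inference take a fair amount of time (300~500 ms)
• The API server cannot call the classifier synchronously (the Unicorn workers would be exhausted)
• Instead: an asynchronous classification workflow built on Amazon S3 and SQS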
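To make the worker-exhaustion concern concrete, here is a small back-of-the-envelope sketch. It assumes each Unicorn worker is blocked for the full duration of a synchronous classification call. The worker count of 16 is a made-up figure, not one from the slides; only the 300~500 ms latency range is.

```python
def max_sync_throughput(workers, seconds_per_request):
    """Upper bound on requests/second when each worker blocks for the whole request."""
    return workers / seconds_per_request


WORKERS = 16  # hypothetical Unicorn worker count, not from the slides

for latency in (0.3, 0.5):  # the 300~500 ms range quoted on the slide
    rps = max_sync_throughput(WORKERS, latency)
    print(f"{latency * 1000:.0f} ms per request -> at most {rps:.1f} req/s")
```

Any classification traffic near that bound would leave no workers for ordinary API requests, which is the motivation for moving the work off the request path.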
[Diagram, built up over several slides: asynchronous classification]
Components: Client (Android / iOS), API Server (ruby), Classification Worker (python, chainer), Amazon S3 (storage), Amazon SQS (queue), DB
1. Client → Amazon S3: [Upload photo to classify]
2. Client → API Server: POST /is_photo {key_on_s3: string}
3. API Server → Amazon SQS: enqueue {key_on_s3: string}
4. Amazon SQS → Classification Worker: dequeue {key_on_s3: string}
5. Amazon S3 → Classification Worker: [Download Image]
6. Classification Worker → API Server: POST /result {key_on_s3: string, result: {is_food: bool}}
7. API Server → Client: result: {is_food: bool}
Classification is processed asynchronously.
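The asynchronous flow in the diagram can be simulated in-process as a minimal sketch. Here a dict stands in for Amazon S3, `queue.Queue` stands in for Amazon SQS, and another dict stands in for the DB. All function names and the placeholder `classify` rule are hypothetical; the real worker would download from S3 and run the Chainer model.

```python
import queue

storage = {}          # stands in for Amazon S3: key -> image bytes
jobs = queue.Queue()  # stands in for Amazon SQS
results_db = {}       # stands in for the DB behind the API server


def client_upload(key, photo):
    """Client uploads the photo to storage, then notifies the API server."""
    storage[key] = photo
    api_enqueue(key)


def api_enqueue(key):
    """API server enqueues only the storage key, not the image itself."""
    jobs.put({"key_on_s3": key})


def api_post_result(key, is_food):
    """API server records the worker's result so the client can fetch it later."""
    results_db[key] = {"is_food": is_food}


def classify(photo):
    """Placeholder rule (hypothetical); the real worker runs model inference."""
    return photo.startswith(b"FOOD")


def worker_step():
    """Worker dequeues one message, downloads the image, classifies, reports back."""
    message = jobs.get()
    key = message["key_on_s3"]
    photo = storage[key]
    api_post_result(key, classify(photo))
    jobs.task_done()


client_upload("photos/1.jpg", b"FOOD-bytes")
client_upload("photos/2.jpg", b"other-bytes")
while not jobs.empty():
    worker_step()
print(results_db)
```

Because only the key travels through the queue, messages stay small and the API server never holds a worker open while inference runs.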
ECS, GPU, Docker, and...
• ECS: Amazon EC2 Container Service
• Places Docker containers (Tasks) on a cluster made up of EC2 instances
• github.com/eagletmt/hako
  • Manages the ECS configuration in a YAML file

[Diagram: deployment inside the AWS VPC]
Developer → docker push → Docker Registry
Developer → hako deploy → ECS
ECS → docker pull → Task (cookpadnet-worker)

    # cookpadnet-worker.yml
    scheduler:
      type: ecs
      region: ap-northeast-1
      cluster: hako-production-g2
      desired_count: 1
    app:
      image: cookpadnet-worker-gpu
      cpu: 128
      memory: 3072
      memory_reservation: 2048
      env:
        AWS_REGION: ap-northeast-1
        COOKPADNET_ENV: production
        ...

The Dockerized worker is deployed, and its configuration managed, with hako.
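To show roughly how settings like these relate to an ECS task, the sketch below maps an `app`-style dict onto ECS container definition fields (`image`, `cpu`, `memory`, `memoryReservation`, `environment`). This is not hako's actual implementation. The function name, the container name `"app"`, and the nesting of `env` under `app` are assumptions made for illustration.

```python
def to_container_definition(name, app):
    """Map a hako-style `app` section onto ECS container definition fields."""
    definition = {
        "name": name,
        "image": app["image"],
        "cpu": app["cpu"],
        "memory": app["memory"],  # hard limit, in MiB
        "environment": [
            {"name": key, "value": value}
            for key, value in sorted(app.get("env", {}).items())
        ],
    }
    if "memory_reservation" in app:
        # soft limit, in MiB
        definition["memoryReservation"] = app["memory_reservation"]
    return definition


# Values copied from the slide's cookpadnet-worker.yml
app = {
    "image": "cookpadnet-worker-gpu",
    "cpu": 128,
    "memory": 3072,
    "memory_reservation": 2048,
    "env": {"AWS_REGION": "ap-northeast-1", "COOKPADNET_ENV": "production"},
}
container = to_container_definition("app", app)
print(container)
```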
GPU
• The worker uses a GPU
• A multiple-fold performance difference compared with CPU instances in the same price range
• Docker + GPU
Virtualization vs. kernel
• A driver is required
  • the nvidia-driver kernel module
  • user-level drivers of the same version
• To operate GPU devices from a Docker container, the container needs appropriate Linux capability settings
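The version-matching requirement can be checked with a short sketch. It assumes the kernel module version is read from text in the style of `/proc/driver/nvidia/version`. The sample string, the version numbers, and both function names are illustrative, and the regular expression only covers the common "Kernel Module  <version>" layout.

```python
import re


def kernel_module_version(proc_text):
    """Extract the driver version from /proc/driver/nvidia/version style text."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def versions_match(proc_text, user_level_version):
    """True only when a kernel module version is found and equals the user-level one."""
    found = kernel_module_version(proc_text)
    return found is not None and found == user_level_version


# Illustrative sample text; real contents vary by driver release.
SAMPLE = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  "
    "Mon Oct  3 20:37:01 PDT 2016\n"
)

print(kernel_module_version(SAMPLE))
print(versions_match(SAMPLE, "367.57"))
print(versions_match(SAMPLE, "375.26"))
```

A check like this could run at container start so that a driver mismatch fails early instead of surfacing as an opaque CUDA error.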
[Diagram labels: ubuntu, docker container, physical, nvidia GPU, user-level driver, kernel modules]
The driver path differs depending on the OS.
NVIDIA Docker
• A thin wrapper around the Docker CLI
• Automatically mounts the required volumes at `docker run` time
• Not supported on Amazon ECS
[Diagram, repeated: ubuntu, docker container, physical, nvidia GPU, user-level driver, kernel modules (same version)]
Mostly solved.
Virtualization vs. kernel
• GPU devices exist as special files
• Specific capability settings are required to access them

    docker run \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      gpu-worker

• The device option is not supported in ECS Task definitions
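The two ways of exposing GPU devices to a container can be compared with a small sketch that builds the `docker run` argument list. The helper name and the exact device paths (`/dev/nvidia0`, `/dev/nvidia-uvm`) are assumptions. The paths follow the usual NVIDIA device node naming rather than anything legible on the slide.

```python
def docker_run_args(image, devices=(), privileged=False):
    """Build a `docker run` argument list exposing devices or full privileges."""
    args = ["docker", "run"]
    if privileged:
        args.append("--privileged")
    for device in devices:
        # Map the host device node to the same path inside the container.
        args += ["--device", f"{device}:{device}"]
    args.append(image)
    return args


# Fine-grained: expose only the GPU device nodes (assumed paths).
per_device = docker_run_args("gpu-worker", devices=["/dev/nvidia0", "/dev/nvidia-uvm"])
# Coarse: hand the container every capability.
privileged = docker_run_args("gpu-worker", privileged=True)

print(" ".join(per_device))
print(" ".join(privileged))
```

The per-device form grants much less than `--privileged`, which is why the lack of a device option in ECS Task definitions matters.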
Virtualization vs. kernel
• GPU devices exist as special files
• Specific capability settings are required to access them

    docker run --privileged gpu-worker

• All capabilities are opened up
• The process is root inside a container on a dockerd that itself runs as root, so it can do all sorts of things, for example:

    docker run --privileged alpine:latest date -s …

• So the worker is run as a user other than root
  • In the Dockerfile: `USER runner`
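As a complementary safeguard, a worker process could also check at startup that it is not running as root. The sketch below is an assumption about how such a guard might look, not something shown in the slides. The function name and message are hypothetical, and `os.geteuid` is available on Unix only.

```python
import os


def refuse_root(euid):
    """Return an error message when the effective uid is root, else None."""
    if euid == 0:
        return "refusing to run as root inside a privileged container"
    return None


# Report the check for the current process without exiting.
message = refuse_root(os.geteuid())
print(message or "ok: not running as root")
```

In a real worker the non-None case would typically abort startup, so a missing `USER` line in the image cannot go unnoticed.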
Agenda
We operate a service that automatically collects food photos, using:
• a food image recognition model built on CaffeNet [1]
• a data flow built with Amazon SQS/S3
• Amazon ECS (GPU instances)
[1] https://github.com/BVLC/caffe