Build Image Classification service with Amazon ECS and GPU instances

Yuichiro Someya

November 22, 2016

Transcript

  1. Build Image Classification service with
    AWS ECS and GPU instances
    Yuichiro Someya @ Cookpad

  2. echo `whoami`
    • Yuichiro Someya
    • Master's degree in Computer Science, Tokyo Institute of Technology graduate school
    • Joined Cookpad as a new graduate in 2016
    • github.com/ayemos
    • twitter.com/kumasan_com

  3. Agenda
    • A talk about running an automatic food-photo collection service using
    • a food image recognition model built on CaffeNet [1],
    • a data flow built on Amazon SQS/S3, and
    • Amazon ECS (GPU instances)
    [1] https://github.com/BVLC/caffe

  4. Agenda
    • A talk about running an automatic food-photo collection service using
    • a food image recognition model built on CaffeNet [1],
    • a data flow built on Amazon SQS/S3, and
    • Amazon ECS (GPU instances)
    [1] https://github.com/BVLC/caffe

  5. Cookpad
    • Recipes: more than 2.5 million
    • Monthly users: more than 60 million

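  6. Ryori Kiroku (料理きろく, "Cooking Record")
    • Automatically collects only the food photos from the photos on a smartphone
    • Currently available to a limited set of users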

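  7. CookpadNet
    • A model fine-tuned from CaffeNet for food / non-food classification
    • The model trained with Caffe [1] is loaded through Chainer's Caffe emulator
      ref: http://docs.chainer.org/en/stable/reference/caffe.html
    • The output categories were changed to food / non-food, and the model was trained on food photos from Cookpad
    [1] http://caffe.berkeleyvision.org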
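The food / non-food decision described for CookpadNet comes down to comparing two class scores. Below is a minimal sketch in plain Python, assuming the fine-tuned network emits one raw score per class in the order (non-food, food) and that a 0.5 threshold is applied. The function names, the class order, and the threshold are illustrative assumptions and are not taken from the talk.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_food(scores, threshold=0.5):
    """Return True when the 'food' probability exceeds the threshold.

    Assumes scores == [non_food_score, food_score] (hypothetical order).
    """
    _non_food_p, food_p = softmax(scores)
    return food_p > threshold


if __name__ == "__main__":
    print(is_food([0.2, 2.3]))   # the food score dominates
    print(is_food([1.5, -0.4]))  # the non-food score dominates
```

The boolean returned here corresponds to the `is_food` field that the later slides pass around as the classification result.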

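  8. Data flow / workflow
    • Where should CookpadNet run classification, and how and where should the result be delivered?
    • Put the classification model on the client and classify there
      • The model is large (100 MB and up), so this is not realistic
      • (Smaller models are being researched)
    • Put the component that runs classification outside the client
      • HTTP server?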

  9. Client (Android / iOS)
    API Server (ruby)
    Classification Server (python, chainer)

  10. Client (Android / iOS)
    API Server (ruby)
    Classification Server (python, chainer)
    Client → API Server: POST /classify {photo: binary}

  11. Client (Android / iOS)
    API Server (ruby)
    Classification Server (python, chainer)
    Client → API Server: POST /classify {photo: binary}
    API Server → Classification Server: POST /classify {photo: binary}

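  12. Client (Android / iOS)
    API Server (ruby)
    Classification Server (python, chainer)
    Client → API Server: POST /classify {photo: binary}
    API Server → Classification Server: POST /classify {photo: binary}
    Classification Server → API Server: result: {is_food: bool}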

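  13. Client (Android / iOS)
    API Server (ruby)
    Classification Server (python, chainer)
    Client → API Server: POST /classify {photo: binary}
    API Server → Classification Server: POST /classify {photo: binary}
    Classification Server → API Server: result: {is_food: bool}
    API Server → Client: result: {is_food: bool}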

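  14. Client (Android / iOS)
    API Server (ruby)
    Classification Server (python, chainer)
    Client → API Server: POST /classify {photo: binary}
    API Server → Classification Server: POST /classify {photo: binary}
    Classification Server → API Server: result: {is_food: bool}
    API Server → Client: result: {is_food: bool}
    Image upload
    Image processing, classification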

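  15. Client (Android / iOS)
    API Server (ruby)
    Classification Server (python, chainer)
    Client → API Server: POST /classify {photo: binary}
    API Server → Classification Server: POST /classify {photo: binary}
    Classification Server → API Server: result: {is_food: bool}
    API Server → Client: result: {is_food: bool}
    Image upload
    Image processing, classification
    >>> 300~500 ms <<<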

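  16. Data flow / workflow
    • Image processing and model inference take a fair amount of time (300~500 ms)
    • The API server cannot call the classifier synchronously (the Unicorn workers would be exhausted)
    • An asynchronous classification workflow built on Amazon S3 and SQS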
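The worker-exhaustion concern can be made concrete with a little arithmetic: if every request blocks a worker for the full classification time, the number of workers divided by that time caps the request rate. A rough sketch follows; the worker count of 8 is a hypothetical figure chosen for illustration, and only the 300~500 ms range comes from the slides.

```python
def max_sync_throughput(workers, latency_s):
    """Upper bound on requests per second when each worker blocks for latency_s."""
    return workers / latency_s


if __name__ == "__main__":
    workers = 8  # hypothetical Unicorn worker count, not from the talk
    for latency_ms in (300, 500):
        rps = max_sync_throughput(workers, latency_ms / 1000)
        print(f"{latency_ms} ms per call -> at most {rps:.1f} req/s")
```

Any ordinary API traffic has to share that same budget, which is one way to read the motivation for moving classification off the request path.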

  17. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB

  18. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB
    [Upload photo to classify]

  19. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB
    [Upload photo to classify]
    enqueue {key_on_s3: string}

  20. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB
    [Upload photo to classify]
    enqueue {key_on_s3: string}
    dequeue {key_on_s3: string}

  21. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB
    [Upload photo to classify]
    enqueue {key_on_s3: string}
    dequeue {key_on_s3: string}
    [Download Image]

  22. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB
    [Upload photo to classify]
    enqueue {key_on_s3: string}
    dequeue {key_on_s3: string}
    [Download Image]
    POST /result {key_on_s3: string, result: {is_food…

  23. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB
    [Upload photo to classify]
    enqueue {key_on_s3: string}
    dequeue {key_on_s3: string}
    [Download Image]
    POST /result {key_on_s3: string, result: {is_food…
    POST /is_photo {key_on_s3: string}

  24. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    DB
    [Upload photo to classify]
    enqueue {key_on_s3: string}
    dequeue {key_on_s3: string}
    [Download Image]
    POST /result {key_on_s3: string, result: {is_food…
    POST /is_photo {key_on_s3: string}
    result: {is_food: bool}

  25. Client (Android / iOS)
    API Server (ruby)
    Classification Worker (python, chainer)
    Amazon S3 (Storage)
    Amazon SQS (Queue)
    [Upload photo to classify]
    enqueue {key_on_s3: string}
    dequeue {key_on_s3: string}
    [Download Image]
    POST /result {key_on_s3: string, result: {is_food: bool}}
    POST /is_photo {key_on_s3: string}
    result: {is_food: bool}
    Classification is processed asynchronously
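The asynchronous workflow on these slides can be walked through with in-memory stand-ins. The sketch below uses a dict in place of Amazon S3, `queue.Queue` in place of Amazon SQS, and another dict in place of the DB; the classifier is a placeholder. All function names are hypothetical, and the assumption that the key is enqueued right after the upload is mine, since the slides do not say which component enqueues.

```python
import queue

storage = {}          # stand-in for Amazon S3: key -> image bytes
jobs = queue.Queue()  # stand-in for Amazon SQS
results = {}          # stand-in for the DB: key -> bool


def upload_photo(key, image_bytes):
    """Store the photo, then enqueue its key (assumed to follow the upload)."""
    storage[key] = image_bytes
    jobs.put({"key_on_s3": key})


def classify(image_bytes):
    """Placeholder for the real model: any non-empty image counts as food."""
    return len(image_bytes) > 0


def post_result(message):
    """API server side of POST /result: persist the worker's verdict."""
    results[message["key_on_s3"]] = message["result"]["is_food"]


def run_worker_once():
    """Worker side: dequeue one job, download the image, classify, report."""
    job = jobs.get()
    image = storage[job["key_on_s3"]]
    post_result({
        "key_on_s3": job["key_on_s3"],
        "result": {"is_food": classify(image)},
    })
    jobs.task_done()


def is_photo(key):
    """API server side of POST /is_photo: None means not classified yet."""
    if key not in results:
        return None
    return {"is_food": results[key]}


if __name__ == "__main__":
    upload_photo("photos/1.jpg", b"\xff\xd8")
    print(is_photo("photos/1.jpg"))  # not classified yet
    run_worker_once()
    print(is_photo("photos/1.jpg"))
```

The point of the shape is that the upload and the `is_photo` lookup both return immediately, while the slow classification step runs on the worker's own schedule.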

  26. Agenda
    • A talk about running an automatic food-photo collection service using
    • a food image recognition model built on CaffeNet [1],
    • a data flow built on Amazon SQS/S3, and
    • Amazon ECS
    [1] https://github.com/BVLC/caffe

  27. ECS, GPU, Docker, and…
    • ECS: Amazon EC2 Container Service
    • Places Docker containers (Tasks) on a cluster made up of EC2 instances
    • github.com/eagletmt/hako
    • Manages the ECS configuration in YAML files

  28. AWS VPC
    # cookpadnet-worker.yml
    scheduler:
      type: ecs
      region: ap-northeast-1
      cluster: hako-production-g2
      desired_count: 1
    app:
      image: cookpadnet-worker-gpu
      cpu: 128
      memory: 3072
      memory_reservation: 2048
      env:
        AWS_REGION: ap-northeast-1
        COOKPADNET_ENV: production
        ...
    Docker Registry
    Developer
    docker push
    hako deploy
    ECS
    docker pull
    Task
    cookpadnet-worker

  29. AWS VPC
    # cookpadnet-worker.yml
    scheduler:
      type: ecs
      region: ap-northeast-1
      cluster: hako-production-g2
      desired_count: 1
    app:
      image: cookpadnet-worker-gpu
      cpu: 128
      memory: 3072
      memory_reservation: 2048
      env:
        AWS_REGION: ap-northeast-1
        COOKPADNET_ENV: production
        ...
    Docker Registry
    Developer
    docker push
    hako deploy
    ECS
    docker pull
    Task
    cookpadnet-worker
    The Dockerized worker is deployed and its configuration managed with hako
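To relate the YAML on this slide to what ECS eventually receives, the sketch below maps an `app`-style mapping onto the field names of an ECS container definition (`image`, `cpu`, `memory`, `memoryReservation`, `environment`). This is an illustration only and is not hako's actual implementation; the function name and the container name `app` are assumptions.

```python
def to_container_definition(name, app):
    """Translate a hako-style `app` mapping into ECS container-definition keys."""
    env = app.get("env", {})
    return {
        "name": name,
        "image": app["image"],
        "cpu": app["cpu"],
        "memory": app["memory"],                         # hard limit in MiB
        "memoryReservation": app["memory_reservation"],  # soft limit in MiB
        "environment": [
            {"name": key, "value": env[key]} for key in sorted(env)
        ],
    }


if __name__ == "__main__":
    app = {
        "image": "cookpadnet-worker-gpu",
        "cpu": 128,
        "memory": 3072,
        "memory_reservation": 2048,
        "env": {"AWS_REGION": "ap-northeast-1", "COOKPADNET_ENV": "production"},
    }
    print(to_container_definition("app", app))
```

Keeping both `memory` and `memory_reservation` mirrors the hard-limit and soft-limit pair that the YAML specifies.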

  30. GPU
    • The worker uses a GPU
    • A …-fold performance difference compared with CPU instances in the same price range
    • Docker + GPU

  31. Virtualization vs. kernel
    • Drivers are required
    • The nvidia-driver kernel module
    • User-level drivers of the same version
    • To operate GPU devices from a Docker container, the container needs the appropriate Linux capability settings
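The "same version" requirement between the kernel module and the user-level drivers can be checked mechanically. The sketch below parses a version out of text shaped like `/proc/driver/nvidia/version` and out of a library file name such as `libcuda.so.<version>`, then compares them. The sample strings and version numbers are illustrative, and the exact file layout on a given host is an assumption.

```python
import re


def kernel_module_version(proc_text):
    """Extract the version from text shaped like /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def user_library_version(filename):
    """Extract the version suffix from a name such as libcuda.so.367.57."""
    match = re.search(r"\.so\.(\d+(?:\.\d+)+)$", filename)
    return match.group(1) if match else None


def versions_match(proc_text, filename):
    """True only when both versions are found and are identical."""
    kernel = kernel_module_version(proc_text)
    return kernel is not None and kernel == user_library_version(filename)


if __name__ == "__main__":
    # Illustrative sample text, not captured from a real host.
    sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016"
    print(versions_match(sample, "libcuda.so.367.57"))
    print(versions_match(sample, "libcuda.so.352.99"))
```

A check like this could run at container start-up so that a mismatch fails early rather than surfacing as an opaque CUDA error.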

  32. Ideal
    ubuntu
    docker container
    nvidia GPU
    user-level driver
    kernel modules

  33. Ideal
    ubuntu
    docker container
    nvidia GPU
    user-level driver
    kernel modules

  34. Ideal
    ubuntu
    docker container
    nvidia GPU
    user-level driver
    kernel modules
    The driver path differs depending on the OS

  35. NVIDIA Docker
    • A thin wrapper around the Docker CLI
    • Automatically mounts the volumes needed at `docker run` time

  36. NVIDIA Docker
    • A thin wrapper around the Docker CLI
    • Automatically mounts the volumes needed at `docker run` time
    Not supported on Amazon ECS

  37. Ideal
    ubuntu
    docker container
    nvidia GPU
    user-level driver
    kernel modules

  38. Ideal
    ubuntu
    docker container
    nvidia GPU
    user-level driver
    kernel modules
    (same version)

  39. Ideal
    ubuntu
    docker container
    nvidia GPU
    user-level driver
    kernel modules
    (same version)
    Mostly solved

  40. Virtualization vs. kernel
    • Drivers are required
    • The nvidia-driver kernel module
    • User-level drivers of the same version
    • To operate GPU devices from a Docker container, the container needs the appropriate Linux capability settings

  41. Virtualization vs. kernel
    • GPU devices exist as special files
    • Specific capability settings are required to access them
    docker run \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      gpu-worker

  42. Virtualization vs. kernel
    • GPU devices exist as special files
    • Specific capability settings are required to access them
    docker run \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      gpu-worker
    The --device option is not supported in ECS Task definitions

  43. Virtualization vs. kernel
    • GPU devices exist as special files
    • Specific capability settings are required to access them
    docker run \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      gpu-worker
    The --device option is not supported in ECS Task definitions

  44. Virtualization vs. kernel
    • GPU devices exist as special files
    • Specific capability settings are required to access them
    docker run --privileged gpu-worker
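The two ways of exposing the GPU shown on these slides, per-device `--device` flags versus a blanket `--privileged`, can be contrasted by building the `docker run` argument list in code. This is a sketch only: the helper name is hypothetical, the device paths in the usage lines are illustrative, and nothing is executed.

```python
def docker_run_args(image, devices=(), privileged=False):
    """Build the argv for `docker run`, exposing host devices to the container."""
    args = ["docker", "run"]
    if privileged:
        # Grants every capability and all host devices at once.
        args.append("--privileged")
    else:
        # Expose only the listed device files, mapped to the same path.
        for dev in devices:
            args += ["--device", f"{dev}:{dev}"]
    args.append(image)
    return args


if __name__ == "__main__":
    print(docker_run_args("gpu-worker", ["/dev/nvidia0", "/dev/nvidia-uvm"]))
    print(docker_run_args("gpu-worker", privileged=True))
```

The first form narrows access to the named device files; the second is the fallback the slides turn to because ECS Task definitions did not accept `--device`.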

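  45. Virtualization vs. kernel
    • GPU devices exist as special files
    • Specific capability settings are required to access them
    docker run --privileged gpu-worker
    • Opens up every capability
    • You are root inside a container running under a dockerd that itself runs as root, so you can do all sorts of things
    docker run --privileged alpine:latest date -s …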

  46. Virtualization vs. kernel
    • GPU devices exist as special files
    • Specific capability settings are required to access them
    docker run --privileged gpu-worker
    • Decided to run as a user other than root
    • `USER runner` in the Dockerfile
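Since the mitigation relies on the container process not being root, the worker could also verify this itself at start-up. A minimal sketch, assuming a Unix host where `os.geteuid()` is available; the function names and the idea of adding such a guard are illustrative and not described in the talk.

```python
import os


def running_as_root(euid=None):
    """Return True when the effective user id is 0 (root)."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def ensure_non_root(euid=None):
    """Raise if the process would run as root; intended as a start-up check."""
    if running_as_root(euid):
        raise RuntimeError(
            "refusing to run as root; set a non-root USER in the Dockerfile"
        )


if __name__ == "__main__":
    print(running_as_root(0))     # explicit uid 0
    print(running_as_root(1000))  # a typical unprivileged uid
```

With `USER runner` in the Dockerfile the check passes silently; if the image were rebuilt without it, the worker would stop instead of running privileged as root.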

  47. Agenda
    • A talk about running an automatic food-photo collection service using
    • a food image recognition model built on CaffeNet [1],
    • a data flow built on Amazon SQS/S3, and
    • Amazon ECS (GPU instances)
    [1] https://github.com/BVLC/caffe