
Using SageMaker at kurashiru

RytaroTsuji
October 15, 2018

aws loft ML night 2018/10/9

Transcript

  1. A Case Study of Using Amazon SageMaker

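  2. Company and Service Overview
    • dely, Inc.
    • Founded April 2014
    • 70 employees, 130 staff in total
    • kurashiru (クラシル)
    • February 2016: service launched
    • May 2016: app released
    • April 2017: nationwide TV commercials began airing
    • December 2017: passed 10 million cumulative downloads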

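  3. About Me
    • Ryutaro Tsuji (@kametaro) github/twitter
    • dely, Inc.
    • Engineer in the Development Department, in charge of machine learning
    • Hobbies
    • Number theory (currently studying elliptic curves and modular forms)
    • Background
    • Until last year I worked mainly as an app and server-side engineer. As a machine learning engineer I am still a novice.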

  4. The Problem with Our Recipe Recommendations
    Every user was shown the same set of recipes
    Recommendations did not match each person's individual tastes

  5. Ideal Recipe Recommendations
    Recommendations personalized to each person's tastes
    [Figure: three users, each shown their own 1st, 2nd, and 3rd ranked recipes]

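  6. Deciding to Adopt Amazon SageMaker
    • Achieving ideal recipe recommendations requires machine learning
    • We had only one machine learning engineer, but wanted to release as quickly as possible
    • SageMaker is a fully managed machine learning service
    • It covers model building, training, and deployment end to end
    • We reached production 1.5 months after starting development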

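  7. Implementation 1: Clustering
    User features
    • Favorites / search count
    • View count / watch time
    • Logged in or not
    • App launches on weekdays / holidays
    • App launches in the morning / afternoon / evening
    etc.
    Recipe features
    • Category; Japanese / Western / Chinese cuisine
    • Seasonal ingredients
    • Cooking time, number of ingredients
    • Calories, salt content
    • Spicy / sweet
    etc.
    Extract user and recipe features and cluster them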
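The clustering step can be illustrated with plain k-means (Lloyd's algorithm). The sketch below is a minimal pure-Python stand-in, not the deck's implementation (the deck used SageMaker's built-in k-means). It assumes each user or recipe has already been turned into a numeric feature vector scaled to comparable ranges; `kmeans` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Cluster feature vectors with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids with k distinct points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels
```

For example, `kmeans([[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [1.0, 0.8]], k=2)` puts the first two vectors in one cluster and the last two in the other. On the real user base, the built-in algorithm plays this role at scale.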

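  8. Implementation 2: Collaborative Filtering
    Collaborative filtering
    1. People similar to me should have tastes similar to mine!
    2. Recipes liked by people similar to me should appeal to me too, even if I have not seen them yet!
    Run inference to get ratings for the recipes each user cluster is likely to prefer, and store them in the content pool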
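One way to picture the cluster-level collaborative filtering is to treat each user cluster as a single "neighbourhood" and rate recipes by the share of its members who favourited them. This is a hypothetical baseline for illustration only: the deck obtains ratings by inference with Factorization Machines, and `build_content_pool`, the ids, and the favourites input are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def build_content_pool(labels: Dict[str, int],
                       favorites: Dict[str, Set[str]],
                       pool_size: int = 3) -> Dict[int, List[str]]:
    """Rank recipes per user cluster by how many members favourited them.

    labels:    user id -> cluster id (e.g. output of the clustering step)
    favorites: user id -> set of recipe ids that user favourited
    Returns cluster id -> top `pool_size` recipe ids (the content pool).
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    sizes = Counter(labels.values())  # number of users in each cluster
    for user, cluster in labels.items():
        counts[cluster].update(favorites.get(user, set()))
    pool = {}
    for cluster, counter in counts.items():
        # Rating = share of the cluster's members who liked the recipe
        ratings = {r: n / sizes[cluster] for r, n in counter.items()}
        # Highest rating first; ties broken by recipe id for reproducibility
        ranked = sorted(ratings, key=lambda r: (-ratings[r], r))
        pool[cluster] = ranked[:pool_size]
    return pool
```

A member who has not yet seen "ramen" still gets it in the pool if enough of their cluster liked it, which is exactly premise 2 on the slide.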

  9. Implementation 3: Optimizing the Content Pool
    Content wears out as time passes and as it is viewed repeatedly
    Refresh the content pool by swapping in unviewed recipes from the same cluster
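The refresh step can be sketched as follows, assuming a per-user pool, the cluster's recipes ranked best first, and the set of recipes the user has already viewed. `refresh_pool` is a hypothetical helper; it models wear only as "already viewed" and ignores the time-based decay the slide also mentions.

```python
from typing import List, Set


def refresh_pool(pool: List[str], ranked_candidates: List[str],
                 viewed: Set[str]) -> List[str]:
    """Replace already-viewed recipes in a pool with unviewed ones from the same cluster.

    pool:              recipes currently shown to the user, in display order
    ranked_candidates: the cluster's recipes, best first (e.g. by predicted rating)
    viewed:            recipes the user has already watched
    A slot is kept as-is when no unviewed candidate is left, so the pool never shrinks.
    """
    # Candidates not yet viewed and not already in the pool, best first
    replacements = iter([r for r in ranked_candidates
                         if r not in viewed and r not in pool])
    refreshed = []
    for recipe in pool:
        if recipe in viewed:
            # Worn-out slot: swap in the next-best unviewed recipe, if any remain
            refreshed.append(next(replacements, recipe))
        else:
            refreshed.append(recipe)
    return refreshed
```

For example, with pool `["a", "b", "c"]`, candidates `["a", "b", "c", "d", "e"]`, and `"b"` viewed, the refreshed pool is `["a", "d", "c"]`.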

  10. Data Flow up to Recipe Recommendation

  11. Data Flow up to Recipe Recommendation
    SageMaker
    1. Easy to plug in and easy to rearrange
    • Because the training container can be split out, job flows can be added or parallelized as needed

  12. [Architecture diagram of the data flow. Labels: log collection platform; data ETL, Machine Learning, and Service columns; development container VM (minikube); Kubernetes (kops) cronjobs running extract, transform, train, predict, and load steps; Amazon Athena; AWS Glue; Amazon SageMaker train job container and predict endpoint container, each configured by instance type and instance count; recommendation data in DynamoDB and an RDB; staging and production environments; CRR; regions ap-northeast-1 and us-east-1]

  13. [Same architecture diagram as slide 12, annotated as follows]
    SageMaker
    1. A flexible batch system
    • The load of training jobs can be offloaded to separate instances
    • Jobs can also be run asynchronously
    2. Expose models as endpoints freely
    • A persistent API returns inference results
    • Auto scaling is also available
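Calling a deployed endpoint from application code might look like the sketch below. It assumes the built-in Factorization Machines algorithm's JSON request format with dense features; the endpoint name and `fm_scores` are hypothetical, and the client is passed in (normally `boto3.client("sagemaker-runtime")`) so the function can be exercised without AWS credentials.

```python
import json
from typing import Any, List, Sequence


def fm_scores(runtime_client: Any, endpoint_name: str,
              rows: Sequence[Sequence[float]]) -> List[float]:
    """Score dense feature rows against a Factorization Machines endpoint.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    # JSON request format of the built-in Factorization Machines algorithm
    payload = {"instances": [{"features": list(row)} for row in rows]}
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # Body is a stream; read() returns the raw JSON bytes
    result = json.loads(response["Body"].read())
    return [p["score"] for p in result["predictions"]]
```

Because the endpoint is persistent, the application only pays for this HTTP call; auto scaling adjusts the number of instances behind the same endpoint name.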

  14. Using Amazon SageMaker
    • Analysis (notebook instances)
    • Training and inference (algorithms and containers)
    Below are the main places we stumbled in each

  15. Analysis ①
    Notebook instances
    ‣ Jupyter Notebook instances can be launched easily.
    ‣ The instance size can be changed after creation.

  16. Analysis ②
    Lifecycle configurations
    Install required libraries and similar setup right after the notebook instance starts.
    Example lifecycle configuration:
    #!/bin/bash
    set -e
    # Compilers for building native extensions
    sudo yum install -y gcc72 gcc72-c++
    # Make conda available in shells, then activate the python3 environment
    echo ". /home/ec2-user/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
    source ~/.bashrc
    conda activate python3
    # Install analysis libraries; --no-warn-conflicts suppresses dependency-conflict warnings
    pip install --upgrade pip
    pip install sshtunnel --no-warn-conflicts
    pip install pymysql --no-warn-conflicts
    pip install gensim --no-warn-conflicts
    pip install msgpack --no-warn-conflicts
    pip install janome --no-warn-conflicts
    pip install jupyter-emacskeys --no-warn-conflicts
    pip install fasttext --no-warn-conflicts
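Lifecycle configurations can also be registered from code instead of the console. This is a sketch assuming boto3's SageMaker client; the configuration name `install-libs` and the abbreviated script are hypothetical. The API expects the script base64-encoded, and `OnStart` scripts run on every start (`OnCreate` is the one-time alternative).

```python
import base64
from typing import Any

# Abbreviated, hypothetical stand-in for the full script on the slide
ON_START_SCRIPT = """#!/bin/bash
set -e
pip install --upgrade pip
"""


def create_lifecycle_config(sm_client: Any, name: str, script: str) -> str:
    """Register a notebook lifecycle configuration whose OnStart hook runs `script`.

    sm_client is expected to be boto3.client("sagemaker").
    Returns the ARN of the new lifecycle configuration.
    """
    # The API takes the script content base64-encoded
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    response = sm_client.create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName=name,
        OnStart=[{"Content": encoded}],
    )
    return response["NotebookInstanceLifecycleConfigArn"]
```

Keeping the script in version control and registering it this way makes it easier to review changes before they can break a notebook's startup.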

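  17. Analysis ③
    • Where we stumbled with notebooks
    If a notebook instance fails to start, it can no longer be started from the console.
    ‣ Cause: the lifecycle configuration script fails
    ‣ Cause: uploading large files fills up the instance's disk
    ‣ The instance is then left in a state where nothing can be done from the console
    ✓ Start it from the AWS CLI instead:
    aws sagemaker start-notebook-instance --notebook-instance-name my-note
    pip install in the lifecycle configuration is unreliable.
    ‣ This happens when the sagemaker Python package and pip start up at the same time and collide.
    ✓ Add this option: pip install numpy --no-warn-conflicts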
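The AWS CLI workaround has a boto3 equivalent that also waits until the instance is usable. This sketch assumes a boto3 SageMaker client is passed in; `start_notebook` is a hypothetical helper, and the built-in waiter raises an error if the instance fails or does not come up in time.

```python
from typing import Any


def start_notebook(sm_client: Any, name: str) -> str:
    """Start a stopped notebook instance and block until it reports InService.

    sm_client is expected to be boto3.client("sagemaker"); this mirrors
    `aws sagemaker start-notebook-instance --notebook-instance-name <name>`.
    """
    sm_client.start_notebook_instance(NotebookInstanceName=name)
    # boto3 waiter that polls DescribeNotebookInstance until the status is InService
    waiter = sm_client.get_waiter("notebook_instance_in_service")
    waiter.wait(NotebookInstanceName=name)
    described = sm_client.describe_notebook_instance(NotebookInstanceName=name)
    return described["NotebookInstanceStatus"]
```

Note that notebook instance names may contain only letters, digits, and hyphens, so a name such as `my-note` works where `my_note` would be rejected.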

  18. Training and Inference ①
    • Built-in algorithms
    k-means
    PCA
    LDA
    Factorization Machines
    Linear Learner
    Neural Topic Model
    Random Cut Forest
    Seq2Seq Modeling
    XGBoost
    Object Detection
    Image Classification
    DeepAR Forecasting
    BlazingText
    k-nearest-neighbor (k-NN)
    ‣ Factorization Machines => recommendations
    ‣ XGBoost => multiclass classification
    ‣ Image Classification => thumbnail image classification
    ‣ k-means => clustering

  19. Training and Inference ②
    • Where we stumbled with Factorization Machines
    Problem: the training dataset is too large to handle with numpy

  20. Training and Inference ②
    • Where we stumbled with Factorization Machines
    Solution: build a sparse matrix with scipy.sparse.lil_matrix
    Fill the large sparse matrix with 1s
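Filling a `scipy.sparse.lil_matrix` with 1s might look like this for a common Factorization Machines encoding: one-hot user columns followed by one-hot recipe columns. The column layout, the ids, and `build_fm_matrix` are assumptions for illustration. LIL is used because it supports cheap element-by-element assignment while the matrix is being built, and the float32 dtype matches what the built-in algorithm expects.

```python
from typing import List, Tuple

from scipy.sparse import lil_matrix


def build_fm_matrix(interactions: List[Tuple[int, int]],
                    n_users: int, n_recipes: int) -> lil_matrix:
    """One-hot encode (user, recipe) pairs for Factorization Machines.

    Row i has a 1 in column user_id and a 1 in column n_users + recipe_id;
    every other entry stays implicitly zero, so memory grows with rows, not columns.
    """
    X = lil_matrix((len(interactions), n_users + n_recipes), dtype="float32")
    for row, (user_id, recipe_id) in enumerate(interactions):
        X[row, user_id] = 1
        X[row, n_users + recipe_id] = 1
    return X
```

With millions of users and recipes, a dense numpy array of this shape would be mostly zeros, which is the memory problem the previous slide describes.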

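  21. Training and Inference ②
    • Where we stumbled with Factorization Machines
    Problem and solution: the volume of inference is large. numpy: 1 row per request -> csr: 10,000 rows per request (16 hours -> 20 minutes)
    Compress the data into a Compressed Sparse Row (csr) matrix; a csr matrix can be passed as input
    ※) Currently being migrated to a Batch Transform job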
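The speedup on this slide comes from sending 10,000 rows per request instead of one. The generator below sketches only the batching step; `iter_csr_batches` is hypothetical, and how each batch is serialized and sent depends on the SageMaker SDK predictor in use.

```python
from typing import Iterator

from scipy.sparse import csr_matrix


def iter_csr_batches(X, batch_size: int = 10000) -> Iterator[csr_matrix]:
    """Yield consecutive blocks of at most batch_size rows from X as CSR matrices."""
    # CSR makes row slicing cheap; the constructor also accepts lil or dense input
    X = csr_matrix(X)
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size]
```

Each yielded block keeps only its nonzero entries, so a 10,000-row request stays small even when the feature space is very wide.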

  22. Training and Inference ③
    • Where we stumbled with XGBoost
    Problem: how do you actually use hyperparameter tuning jobs?

  23. Training and Inference ③
    • Where we stumbled with XGBoost
    Solution: pass a ranges parameter as an argument to the hyperparameter tuning job
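With the low-level API, the ranges sit inside the tuning job configuration. The sketch below builds that structure for boto3's `create_hyper_parameter_tuning_job`, assuming a Bayesian search that maximizes `validation:auc`. `tuning_job_config`, the job limits, and the example XGBoost ranges are illustrative assumptions, and the `TrainingJobDefinition` (container image, data channels, instance settings) is omitted. In the SageMaker Python SDK the same ranges go to `HyperparameterTuner` through its `hyperparameter_ranges` argument.

```python
from typing import Dict, Tuple


def tuning_job_config(int_ranges: Dict[str, Tuple[int, int]],
                      continuous_ranges: Dict[str, Tuple[float, float]],
                      max_jobs: int = 20, max_parallel: int = 2) -> dict:
    """Build a HyperParameterTuningJobConfig for create_hyper_parameter_tuning_job.

    Ranges are given as name -> (min, max); the API expects the bounds as strings.
    """
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            "IntegerParameterRanges": [
                {"Name": n, "MinValue": str(lo), "MaxValue": str(hi)}
                for n, (lo, hi) in int_ranges.items()
            ],
            "ContinuousParameterRanges": [
                {"Name": n, "MinValue": str(lo), "MaxValue": str(hi)}
                for n, (lo, hi) in continuous_ranges.items()
            ],
        },
    }
```

For example, `tuning_job_config({"max_depth": (3, 10)}, {"eta": (0.05, 0.5)})` yields one integer range and one continuous range, each with string bounds.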

  24. Training and Inference ③
    • Where we stumbled with XGBoost
    Solution: run the hyperparameter tuning job

  25. Training and Inference ③
    • Where we stumbled with XGBoost
    Solution: check the hyperparameter tuning job in the console
    Objective metric: validation:auc

  26. Training and Inference ④
    • Where we stumbled with Image Classification
    Problem: how do you prepare the training dataset?
    The input is specified as MXNet rec files

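  27. Training and Inference ④
    • Where we stumbled with Image Classification
    Solution: create MXNet lst and rec files
    1. Clone https://github.com/apache/incubator-mxnet.git
    2. Download the thumbnail images to train on to a PC, then run:
    import os
    MXNET_HOME = os.path.expanduser('~/incubator-mxnet')  # cloned MXNet repository
    RESOURCE_DIR = os.path.expanduser('~/thumbnails')     # downloaded thumbnail images
    # Create target_train.lst and target_test.lst with an 80/20 split
    os.system('python {0}/tools/im2rec.py --list --recursive --train-ratio 0.8 --test-ratio 0.2 {1}/im2rec/target {1}'.format(MXNET_HOME, RESOURCE_DIR))
    # Pack each list into a .rec file (shorter side resized to 480 px, JPEG quality 95)
    os.system('python {0}/tools/im2rec.py --resize 480 --quality 95 --num-thread 64 {1}/im2rec/target_train {1}'.format(MXNET_HOME, RESOURCE_DIR))
    os.system('python {0}/tools/im2rec.py --resize 480 --quality 95 --num-thread 64 {1}/im2rec/target_test {1}'.format(MXNET_HOME, RESOURCE_DIR))
    3. Upload the generated rec files to the designated location in S3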

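  28. Training and Inference ⑤
    • Where we stumbled with k-means
    Problem and solution: how do you find the optimal number of clusters k? Hyperparameter tuning jobs cannot do this at present, so we work it out the hard way with the following methods:
    Elbow method, silhouette analysis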
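Both methods can be computed from the cluster assignments of k-means runs at different values of k. The sketch below is a hypothetical pure-Python helper set: `inertia` gives the within-cluster sum of squares to plot against k (look for the bend, or "elbow"), and `silhouette` gives the mean silhouette coefficient (higher is better, at most 1). It needs at least two clusters and is quadratic in the number of points, so on real data you would compute it on a sample.

```python
import math
from typing import Dict, List, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inertia(points: List[List[float]], labels: List[int]) -> float:
    """Within-cluster sum of squared distances to each cluster's mean (elbow method)."""
    clusters: Dict[int, List[List[float]]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(_dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points: List[List[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points; requires at least two clusters."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [_dist(p, q) for j, (q, l2) in enumerate(zip(points, labels))
                if l2 == lab and j != i]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        # Mean distance to the nearest other cluster
        b = min(sum(_dist(p, q) for q, l2 in zip(points, labels) if l2 == other)
                / labels.count(other)
                for other in set(labels) if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

In practice you would run k-means for, say, k = 2 to 10, then pick a k near the elbow of the inertia curve that also has a high silhouette score.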

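  29. ETL and Training Batch System
    • Why we chose Kubernetes (kops) as the foundation
    With Step Functions / AWS Batch, jobs and job flows cannot be managed together.
    Scheduling can be managed simply with cronjobs alone and changed easily from the command line.
    For online learning, the batch jobs had to be connected to the API.
    In the future it can be managed on EKS (Tokyo region).
    Step Functions and AWS Batch can still be used for parts of the system.
    Because SageMaker separates training into containers, the batch system can be designed flexibly.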

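  30. Impressions after Five Months with SageMaker
    • Analysis (notebook instances)
    Lifecycle configurations are convenient; environments can be set up easily
    If processing starts to feel heavy, the instance type can be changed later
    • Training and inference (algorithms and containers)
    Plenty of built-in algorithms, plus deep learning frameworks such as TensorFlow / Chainer
    Training runs in separate containers, so we don't have to worry about the resources of running jobs
    Notebooks can be shared by multiple people
    Models can easily be deployed as endpoints, with auto scaling
    Hyperparameter tuning jobs automatically find the best hyperparameters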

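  31. Future Plans
    • Recipe recommendations that account for ingredients likely to go unused
    1. Identify ingredients from previously viewed recipes that tend to be left over
    2. Recommend recipes that use up those ingredients efficiently
    • Personalized recipe recommendations
    1. Recipes that better fit each user's tastes and lifestyle, such as "spicy / sweet" or "rich / light"
    2. Recipes that can be made by adding a little something to leftover ingredients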

  32. dely Is Hiring Machine Learning Engineers!
    • The recipes created by kurashiru's chefs are truly delicious.
    Think they only look tasty? Not at all. If you would like to taste them and do some machine learning while you are at it, we would love to hear from you!
    • You can experience everything related to machine learning.
    Right now I handle everything on my own, from data analysis and service delivery to choosing learning algorithms and building and operating the infrastructure. You get a concentrated version of work that several people would share in a somewhat larger organization!