Using SageMaker at kurashiru

RytaroTsuji
October 15, 2018

AWS Loft ML Night, 2018/10/9


Transcript

  1. Amazon SageMaker use cases
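  2. Company and service introduction

    • dely, Inc.
      • Founded in April 2014
      • 70 full-time employees, 130 staff overall
    • kurashiru
      • February 2016: service launched
      • May 2016: app released
      • April 2017: nationwide TV commercials started
      • December 2017: passed 10 million cumulative downloads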
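  3. About me

    • Ryutaro Tsuji (@kametaro on GitHub/Twitter)
    • dely, Inc.
      • Engineer in the development department, in charge of machine learning
    • Hobbies
      • Number theory (currently studying elliptic curves and automorphic forms)
    • Background
      • Until last year I worked mainly as an app and server-side engineer. As a machine learning engineer I am still very much a beginner.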
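  4. Implementation 1: Clustering

    User features
    • Number of favorites / number of searches
    • Number of views / viewing time
    • Logged in or not
    • Number of app launches on weekdays / holidays
    • Number of app launches in the morning, at noon, and at night
    • etc.

    Recipe features
    • Category; Japanese, Western, or Chinese cuisine
    • Seasonal ingredients
    • Cooking time, number of ingredients
    • Calories, salt content
    • Spicy / sweet
    • etc.

    Feature values are extracted for users and for recipes, and both are clustered.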
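The clustering step on this slide can be sketched as follows. This is a minimal, pure-Python k-means under stated assumptions: the feature columns, their scaling to 0..1, and the sample values are hypothetical, and the deck itself points to SageMaker's built-in k-means rather than hand-written code.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels) for equal-length feature vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical user features, scaled to 0..1:
# [favorites, searches, views, weekday launches, holiday launches]
users = [
    [0.9, 0.8, 0.9, 0.7, 0.2],
    [0.8, 0.9, 0.8, 0.8, 0.1],
    [0.1, 0.2, 0.1, 0.1, 0.9],
    [0.2, 0.1, 0.2, 0.2, 0.8],
]
centroids, labels = kmeans(users, k=2)
```

The same function would apply to recipe feature vectors; in practice the features would need normalization so that no single column dominates the distance.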
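  5. Implementation 2: Collaborative filtering

    Collaborative filtering
    1. The tastes of people similar to me should be similar to my own tastes!
    2. A recipe liked by people similar to me should be one I like too, even if I have not seen it yet!

    For each user cluster, ratings for the recipes that cluster is likely to prefer are obtained by inference and stored in the content pool.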
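The two intuitions above can be sketched as neighborhood-based collaborative filtering. This is an illustration only: the cluster names, recipes, and ratings are hypothetical, and the deck indicates that the actual recommendations use SageMaker's built-in Factorization Machines, not this similarity-weighted scoring.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (missing items count as 0)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def score_unseen(target, ratings):
    """Score items the target has not rated by the similarity-weighted ratings of others."""
    scores = {}
    for other, items in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], items)
        if sim <= 0:
            continue
        for item, rating in items.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return scores

# Hypothetical implicit ratings per user cluster (for example, derived from views).
ratings = {
    "cluster_a": {"curry": 5, "gyoza": 4},
    "cluster_b": {"curry": 5, "gyoza": 4, "mapo_tofu": 5},
    "cluster_c": {"pancake": 5, "pudding": 4, "gyoza": 1},
}
scores = score_unseen("cluster_a", ratings)
# Highest-scoring unseen recipes first: a stand-in for the content pool.
content_pool = sorted(scores.items(), key=lambda kv: -kv[1])
```

Because cluster_b is far more similar to cluster_a than cluster_c is, the recipe only cluster_b has seen ranks first in the resulting pool.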
  6. Implementation 3: Optimizing the content pool

    Content wears out as time passes and as it is viewed repeatedly.
    ↓
    Worn-out items are swapped for unviewed recipes from the same cluster, which refreshes the content pool.
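A minimal sketch of the refresh step, assuming wear is measured only by a per-user view count with a hypothetical threshold `max_views`; the slide also mentions elapsed time, which is left out here. All names and data are illustrative.

```python
def refresh_pool(pool, viewed, cluster_recipes, max_views=3):
    """Swap recipes viewed max_views times or more for unviewed ones from the same cluster."""
    # Candidates: cluster recipes that are neither in the pool nor viewed yet.
    candidates = [
        r for r in cluster_recipes if r not in pool and viewed.get(r, 0) == 0
    ]
    refreshed = []
    for recipe in pool:
        if viewed.get(recipe, 0) >= max_views and candidates:
            # Worn out: replace with the next unviewed recipe.
            refreshed.append(candidates.pop(0))
        else:
            refreshed.append(recipe)
    return refreshed

pool = ["curry", "gyoza", "mapo_tofu"]
viewed = {"curry": 4, "gyoza": 1}
cluster_recipes = ["curry", "gyoza", "mapo_tofu", "fried_rice", "ramen"]
new_pool = refresh_pool(pool, viewed, cluster_recipes)
```

Here only "curry" has reached the threshold, so it is the only entry replaced; the pool keeps its size and order.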
  7. Data flow up to recipe suggestion

    SageMaker
    1. Easy to integrate, and easy to rearrange
      • Because the training container can be split out, job flows can be added or parallelized flexibly as the need arises.
  8. [Architecture diagram. Stages: log collection platform, data ETL, Machine Learning, Service. Labels: development Container vm (minicube); regions ap-northeast-1, us-east-1, ap-northeast-1; Amazon Athena; AWS Glue; kops cronjobs running [[etl]] steps extract, transform, train, predict, load; Amazon SageMaker with a train job container and a predict endpoint container (each with instance type and instance count); feature input; DynamoDB recommendation; RDB recommendation; CRR; staging apply and production apply; application endpoint.]
  9. [Same architecture diagram as slide 8, with the following notes.]

    SageMaker
    1. A flexible batch system
      • The load of a training job can be delegated to separate instances.
      • Jobs can also be run asynchronously.
    2. Turn models into endpoints freely
      • A persistent API returns the inference results.
      • Auto scaling is also available.
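As a sketch of delegating training to separate instances, the request for boto3's SageMaker `create_training_job` call can be assembled like this. The job name, image URI, role ARN, S3 paths, instance type, volume size, and runtime limit are all hypothetical placeholders and do not come from the deck; only the request field names follow the SageMaker API.

```python
def training_job_request(job_name, image_uri, role_arn, s3_train, s3_output,
                         instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for boto3's SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_train,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output},
        # The training load runs on these instances, not on the caller.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

# All values below are hypothetical placeholders.
request = training_job_request(
    job_name="recommend-train-example",
    image_uri="123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/recommend-train:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    s3_train="s3://example-bucket/features/train/",
    s3_output="s3://example-bucket/models/",
    instance_count=2,
)
# To submit (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_training_job(**request)
```

The submit call returns once the job is accepted rather than when training ends, which is what lets a cron-driven batch flow start jobs asynchronously and check their status later.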
  10. Analysis ②: Lifecycle configuration

    Right after the notebook instance starts, the script installs the required libraries and finishes other setup.

    Lifecycle configurations ex)

        #!/bin/bash
        set -e
        # Compilers needed to build some of the Python packages below.
        sudo yum install -y gcc72 gcc72-c++
        # Make conda available in the shell, then switch to the python3 environment.
        echo ". /home/ec2-user/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
        source ~/.bashrc
        conda activate python3
        pip install --upgrade pip
        pip install sshtunnel --no-warn-conflicts
        pip install pymysql --no-warn-conflicts
        pip install gensim --no-warn-conflicts
        pip install msgpack --no-warn-conflicts
        pip install janome --no-warn-conflicts
        pip install jupyter-emacskeys --no-warn-conflicts
        pip install fasttext --no-warn-conflicts
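  11. Analysis ③

    • Where I stumbled with notebooks
      • If a notebook fails to start, it can no longer be started from the console screen, and nothing can be operated at all.
        ‣ The lifecycle configuration fails.
        ‣ A large uploaded file fills up the instance's disk.
        ✓ Start it from awscli:
          # aws sagemaker start-notebook-instance --notebook-instance-name my_note
      • pip install in the lifecycle configuration is unstable.
        ‣ This happens when the sagemaker Python package and pip start at the same time and collide.
        ✓ Add the --no-warn-conflicts option:
          pip install numpy --no-warn-conflicts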
  12. Training and inference ①

    • Built-in algorithms
      k-means, PCA, LDA, Factorization Machines, Linear Learner, Neural Topic Model, Random Cut Forest, Seq2Seq Modeling, XGBoost, Object Detection, Image Classification, DeepAR Forecasting, BlazingText, k-nearest-neighbor (k-NN)

      ‣ Factorization Machines => recommendation
      ‣ XGBoost => multi-class classification
      ‣ Image Classification => thumbnail image classification
      ‣ k-means => clustering
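  13. Training and inference ④

    • Where I stumbled with Image Classification
      Countermeasure: create the lst and rec files for MXNet.

      1. https://github.com/apache/incubator-mxnet.git (the MXNet repository, which provides tools/im2rec.py)
      2. Download the thumbnail images to train on to a PC.
      3. Upload the generated rec files to the prescribed location in S3.

        import os

        MXNET_HOME = '~/incubator-mxnet/'
        RESOURCE_DIR = '~/thumbnails/'

        # Generate the .lst files (80% train, 20% test) by scanning RESOURCE_DIR recursively.
        os.system('python {0}/tools/im2rec.py --list --recursive --train-ratio 0.8 --test-ratio 0.2 {1}/im2rec/target {1}'.format(MXNET_HOME, RESOURCE_DIR))
        # Pack the listed images into .rec files, resizing the shorter edge to 480 px.
        os.system('python {0}/tools/im2rec.py --resize 480 --quality 95 --num-thread 64 {1}/im2rec/train {1}'.format(MXNET_HOME, RESOURCE_DIR))
        os.system('python {0}/tools/im2rec.py --resize 480 --quality 95 --num-thread 64 {1}/im2rec/test {1}'.format(MXNET_HOME, RESOURCE_DIR))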
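For the upload step, one way to lay the rec files out in S3 is one prefix per input channel. The sketch below only plans the destination keys; the bucket, prefix, and file names are hypothetical, and mapping the test split to a `validation` channel is an assumption about how the built-in Image Classification job would be configured, not something the deck states.

```python
from pathlib import PurePosixPath

def s3_upload_plan(rec_files, bucket, prefix):
    """Map local .rec files to (bucket, key) targets, one S3 prefix per channel."""
    plan = {}
    for channel, local_path in rec_files.items():
        key = str(PurePosixPath(prefix) / channel / PurePosixPath(local_path).name)
        plan[local_path] = (bucket, key)
    return plan

# Hypothetical file names, bucket, and prefix.
plan = s3_upload_plan(
    {
        "train": "thumbnails/im2rec/train.rec",
        "validation": "thumbnails/im2rec/test.rec",
    },
    bucket="example-bucket",
    prefix="image-classification",
)
# To perform the upload (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   for local_path, (bucket, key) in plan.items():
#       s3.upload_file(local_path, bucket, key)
```

Keeping the plan separate from the upload makes it easy to check the destination keys before any data is transferred.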
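  14. Future outlook

    • Recipe suggestions that take into account how easily ingredients end up as leftovers
      1. Identify the ingredients that tend to be left over among the recipes a user has viewed in the past.
      2. Suggest recipes that use up those ingredients efficiently.
    • Personalized recipe suggestions
      1. Suggest recipes that better match a user's tastes and lifestyle, such as "spicy / sweet" or "rich / light".
      2. Suggest recipes that can be made by adding a little something to leftover ingredients.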
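  15. dely is hiring machine learning engineers!

    • The recipes made by the kurashiru chefs really are delicious.
      "They only look delicious, right?" No, not at all. If you would like to taste them and do machine learning while you are at it, we look forward to hearing from you!
    • You can get experience in everything related to machine learning.
      At the moment one person handles all of it: data analysis, service delivery, choosing the learning algorithms, and building and operating the infrastructure. You get a condensed version of the work that several people would share in a somewhat larger organization!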