AWS Loft ML Night, 2018/10/9
Amazon SageMaker use cases
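Company and service overview
• dely, Inc.
  • Founded April 2014
  • 70 full-time employees, 130 staff overall
• kurashiru (クラシル)
  • February 2016: service launched
  • May 2016: app released
  • April 2017: nationwide TV commercials began airing
  • December 2017: passed 10 million cumulative downloads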
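About me
• @kametaro (GitHub / Twitter)
• dely, Inc.
  • Engineer in the development department, in charge of machine learning
• Hobbies
  • (Currently studying elliptic curves and automorphic forms)
• Background
  • Until last year I worked mainly as an app and server-side engineer. As a machine learning engineer I am still very much a beginner.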
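The problem we had with recipe recommendations
[Figure: one ranking, 1st to 6th, shown to everyone]
• The same set of recipes was displayed to all users
• We could not recommend recipes that matched each individual's preferences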
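The ideal recipe recommendation
• Personalized recommendations based on each individual's preferences
[Figure: a separate 1st / 2nd / 3rd ranking for each user]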
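We decided to adopt Amazon SageMaker
• Machine learning is essential to achieving the ideal recipe recommendation
• We had only one machine learning engineer, but wanted to release as fast as possible
• SageMaker is a fully managed machine learning service
• It covers model building, training, and deployment end to end
• We got it into the production environment 1.5 months after starting development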
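Implementation 1: Clustering
User features
• Number of favorites / number of searches
• Number of views / viewing time
• Logged in or not
• Number of app launches on weekdays / holidays
• Number of app launches in the morning, at noon, and at night
etc.
Recipe features
• Category; Japanese, Western, or Chinese cuisine
• Seasonal ingredients
• Cooking time, number of ingredients
• Calories, salt content
• Spicy / sweet
etc.
Extract feature values for users and recipes, then cluster them.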
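Implementation 2: Collaborative filtering
Collaborative filtering
1. The preferences of people similar to me should be similar to my own!
2. A recipe liked by people similar to me should appeal to me too, even if I have not seen it yet!
Use inference to obtain ratings for the recipes each user cluster is likely to prefer, and store them in the content pool.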
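The two assumptions above can be illustrated with a small user-based collaborative filtering sketch. This is not the production model (the deck later names Factorization Machines for recommendation); the rating matrix, the function name `predict_unseen`, and the choice of cosine similarity are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical ratings: rows are user clusters, columns are recipes.
# 0 means "not seen yet"; every value here is made up for illustration.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 5.0, 1.0],
    [1.0, 1.0, 2.0, 5.0],
])

def predict_unseen(ratings, target):
    """Score the target row's unseen recipes from the ratings of similar rows."""
    norms = np.linalg.norm(ratings, axis=1)
    # Cosine similarity between the target row and every row.
    sims = ratings @ ratings[target] / (norms * norms[target])
    sims[target] = 0.0  # the target does not vote for itself
    seen = ratings > 0
    # Similarity-weighted average over the rows that actually rated each recipe.
    weight_sum = (sims[:, None] * seen).sum(axis=0)
    scores = (sims[:, None] * ratings).sum(axis=0) / np.maximum(weight_sum, 1e-12)
    scores[ratings[target] > 0] = 0.0  # keep only recipes the target has not seen
    return scores

scores = predict_unseen(ratings, target=0)
best = int(np.argmax(scores))  # index of the recipe to put in the content pool
```

Row 1 rates recipes much like row 0, so its high rating for the recipe that row 0 has not seen dominates the weighted average.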
Implementation 3: Optimizing the content pool
Content wears out as time passes and as it is viewed repeatedly
↓
Swap in unwatched recipes from the same cluster to refresh the content pool
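The refresh step can be sketched as a small function. This is a minimal illustration, assuming that "worn out" simply means "already watched"; the function name `refresh_pool`, the recipe IDs, and the random choice of replacement are hypothetical, and the deck does not describe how time-based wear is measured.

```python
import random

def refresh_pool(pool, watched, cluster_recipes, rng=None):
    """Replace watched recipes in `pool` with unwatched ones from the same cluster."""
    rng = rng or random.Random()
    in_pool = set(pool)
    # Candidates: recipes of the same cluster that are neither watched nor already pooled.
    candidates = [r for r in cluster_recipes if r not in watched and r not in in_pool]
    rng.shuffle(candidates)
    refreshed = []
    for recipe in pool:
        if recipe in watched and candidates:
            refreshed.append(candidates.pop())  # swap in a fresh recipe
        else:
            refreshed.append(recipe)  # keep it (unwatched, or nothing left to swap in)
    return refreshed

pool = ["r1", "r2", "r3"]
watched = {"r1", "r3"}
cluster_recipes = ["r1", "r2", "r3", "r4", "r5", "r6"]
new_pool = refresh_pool(pool, watched, cluster_recipes, rng=random.Random(0))
```

The pool keeps its size and order, so whatever ranking was attached to each slot stays meaningful after the swap.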
Data flow up to the recipe recommendation
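Data flow up to the recipe recommendation
SageMaker
1. Easy to integrate, easy to rearrange
• Because training can be split out into its own containers, job flows can be added or parallelized flexibly as the need arises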
[Figure: system architecture. Components shown: log collection platform (Amazon Athena, AWS Glue); data ETL on kops cronjobs (extract, transform, train, predict, load); Machine Learning Service on Amazon SageMaker (train job container and predict endpoint container, each configured by instance type and instance count); recommendation stores in DynamoDB and an RDB; CRR between regions ap-northeast-1 and us-east-1; development (container VM, minicube), staging, and production environments; application endpoint.]
SageMaker
1. A flexible batch system
• The load of training jobs can be delegated to separate instances
• Jobs can be run asynchronously
2. Turn models into endpoints freely
• Inference results are returned from a persistent API
• Auto scaling is available
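How we use Amazon SageMaker
• Analysis (notebook instances)
• Training and inference (algorithms and containers)
The following slides mainly cover the points where we stumbled with each of these.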
Analysis ① Notebook instances
‣ A Jupyter Notebook instance can be launched easily.
‣ The instance size can be changed after creation.
Analysis ② Lifecycle configuration
Install the required libraries and do other setup right after the notebook instance starts.
Lifecycle configurations ex)

    #!/bin/bash
    set -e
    # Compilers needed by some of the packages below
    sudo yum install -y gcc72 gcc72-c++
    # Make conda available and switch to the python3 environment
    echo ". /home/ec2-user/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
    source ~/.bashrc
    conda activate python3
    pip install --upgrade pip
    pip install sshtunnel --no-warn-conflicts
    pip install pymysql --no-warn-conflicts
    pip install gensim --no-warn-conflicts
    pip install msgpack --no-warn-conflicts
    pip install janome --no-warn-conflicts
    pip install jupyter-emacskeys --no-warn-conflicts
    pip install fasttext --no-warn-conflicts
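Analysis ③
• Where we stumbled with notebooks
If a notebook instance fails to start, it can no longer be started from the console screen. pip install in the lifecycle configuration is not stable.
‣ The lifecycle configuration fails
‣ Uploading large files fills up the instance's disk
‣ The failure occurs when the startup of the sagemaker Python package and pip happen to collide.
  ✓ pip install numpy --no-warn-conflicts  # add this option
‣ Once this happens, nothing can be operated from the console
  ✓ Start the instance from awscli
  # aws sagemaker start-notebook-instance --notebook-instance-name my_note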
Training and inference ①
• Built-in algorithms
  k-means
  PCA
  LDA
  Factorization Machines
  Linear Learner
  Neural Topic Model
  Random Cut Forest
  Seq2Seq Modeling
  XGBoost
  Object Detection
  Image Classification
  DeepAR Forecasting
  BlazingText
  k-nearest-neighbor (k-NN)
What we use them for:
‣ Factorization Machines => recommendation
‣ XGBoost => multi-class classification
‣ Image Classification => thumbnail image classification
‣ k-means => clustering
Training and inference ②
• Where we stumbled with Factorization Machines
Problem: the training dataset is too large to handle with numpy
Training and inference ②
• Where we stumbled with Factorization Machines
Solution: build a sparse matrix with scipy.sparse.lil_matrix
Create the large sparse matrix, then fill in the 1s
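A minimal sketch of that approach, assuming the usual one-hot input layout for Factorization Machines (one row per interaction, user columns followed by recipe columns). The interaction pairs and sizes are made up; the actual feature layout used at dely is not shown in the slides.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Hypothetical (user_index, recipe_index) interaction pairs.
interactions = [(0, 2), (1, 0), (2, 1), (0, 1)]
n_users, n_recipes = 3, 3

# One row per interaction. Only the non-zero cells are stored, so the matrix
# stays small even when users x recipes would be too large for a dense array.
X = lil_matrix((len(interactions), n_users + n_recipes), dtype=np.float32)
for row, (user, recipe) in enumerate(interactions):
    X[row, user] = 1.0                # one-hot user feature
    X[row, n_users + recipe] = 1.0    # one-hot recipe feature
```

`lil_matrix` is a reasonable choice for this step because it supports cheap cell-by-cell assignment; it can be converted to another sparse format once it is filled.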
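Training and inference ②
• Where we stumbled with Factorization Machines
Problem and solution: the volume of inference is large
numpy: 1 row per request -> csr: 10,000 rows per request (16 hours -> 20 minutes)
Compress the data into a Compressed Sparse Row (CSR) matrix; a CSR matrix can be specified as input
*) We are currently moving this to a Batch transform job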
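The batching side of this change can be sketched as follows. Only the conversion to CSR and the slicing into 10,000-row chunks are shown; how each chunk is serialized and sent to the endpoint is not covered in the slides and is left out. The helper name `iter_csr_batches` and the matrix sizes are hypothetical.

```python
import numpy as np
from scipy.sparse import lil_matrix

def iter_csr_batches(matrix, batch_size=10000):
    """Convert a sparse matrix to CSR and yield it in row batches."""
    csr = matrix.tocsr()  # CSR supports fast row slicing
    for start in range(0, csr.shape[0], batch_size):
        yield csr[start:start + batch_size]

# Made-up input: 25,000 rows to score, almost all cells empty.
X = lil_matrix((25000, 50), dtype=np.float32)
X[0, 3] = 1.0
X[24999, 7] = 1.0

batches = list(iter_csr_batches(X))
sizes = [b.shape[0] for b in batches]
```

Sending one request per batch instead of one per row reduces the number of round trips by a factor of the batch size, which is consistent with the speed-up reported above.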
Training and inference ③
• Where we stumbled with XGBoost
Problem: how do you actually use a hyperparameter tuning job?
Training and inference ③
• Where we stumbled with XGBoost
Solution: pass the ranges parameter as an argument to the hyperparameter tuning job
Training and inference ③
• Where we stumbled with XGBoost
Solution: run the hyperparameter tuning job
Training and inference ③
• Where we stumbled with XGBoost
Solution: check the hyperparameter tuning job in the console
Objective metric: validation:auc
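The slides' own code for these steps did not survive extraction, so the following is only a sketch of what "passing ranges" can look like. It builds the tuning configuration in the shape of the `HyperParameterTuningJobConfig` argument of the SageMaker `CreateHyperParameterTuningJob` API. `eta` and `max_depth` are real XGBoost hyperparameters, but the ranges, the job limits, and the helper name `build_tuning_config` are assumptions, not the values used in the talk.

```python
def build_tuning_config(max_jobs=20, max_parallel_jobs=2):
    """Build a tuning job config that maximizes validation:auc (illustrative values)."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",  # the metric shown in the console slide
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel_jobs,
        },
        # The "ranges": which hyperparameters to search, and over what interval.
        # The API expects the bounds as strings.
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.5"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
            ],
            "CategoricalParameterRanges": [],
        },
    }

tuning_config = build_tuning_config()
```

A dictionary like this would be passed, together with a training job definition, when creating the tuning job; the finished job then shows the best training job by `validation:auc` in the console.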
Training and inference ④
• Where we stumbled with Image Classification
Problem: how do you prepare the training dataset?
You specify MXNet rec files
Training and inference ④
• Where we stumbled with Image Classification
Solution: create MXNet lst and rec files

    import os

    MXNET_HOME = '~/incubator-mxnet/'
    RESOURCE_DIR = '~/thumbnails/'
    # Generate the lst files with an 80/20 train/test split
    os.system('python {0}/tools/im2rec.py --list --recursive --train-ratio 0.8 --test-ratio 0.2 {1}/im2rec/target {1}'.format(MXNET_HOME, RESOURCE_DIR))
    # Pack the training and test images into rec files
    os.system('python {0}/tools/im2rec.py --resize 480 --quality 95 --num-thread 64 {1}/im2rec/train {1}'.format(MXNET_HOME, RESOURCE_DIR))
    os.system('python {0}/tools/im2rec.py --resize 480 --quality 95 --num-thread 64 {1}/im2rec/test {1}'.format(MXNET_HOME, RESOURCE_DIR))

1. Get the MXNet source from https://github.com/apache/incubator-mxnet.git
2. Download the thumbnail images to train on to a local PC
3. Upload the generated rec files to the designated location in S3
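Training and inference ⑤
• Where we stumbled with k-means
Problem and solution: how do you find the optimal number of clusters k?
At the moment this cannot be done with a hyperparameter tuning job, so we determine it the hard way with the following methods.
• Elbow method
• Silhouette analysis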
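As an illustration of the elbow method, here is a minimal sketch that runs a plain NumPy k-means for several values of k and records the within-cluster sum of squares (SSE). It does not use the SageMaker k-means algorithm; the synthetic data, the deterministic farthest-point initialization, and the function name `kmeans_sse` are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def kmeans_sse(points, k, n_iter=50):
    """Run a basic Lloyd's k-means and return the within-cluster sum of squares."""
    # Farthest-point initialization keeps this sketch deterministic.
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dists))])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest center, then move each center to the mean.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())

# Synthetic data: three well-separated blobs, so the elbow should appear at k = 3.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

sse = {k: kmeans_sse(points, k) for k in range(1, 7)}
# How much the SSE improves when going from k - 1 to k clusters.
drops = {k: sse[k - 1] - sse[k] for k in range(2, 7)}
```

The elbow is the value of k after which `drops` becomes small; plotting `sse` against k makes it visible as a bend in the curve. Silhouette analysis is a complementary check on the same candidate values of k.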
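ETL and training batch system
• Why we chose Kubernetes (kops) as the platform
  • With Step Functions / AWS Batch, jobs and job flows cannot be managed together.
  • The scheduler is just cronjobs, so it is simple to manage and can be changed easily with a command.
  • For online learning, we needed the batch side and the API to work together.
  • In the future it can be managed on EKS (Tokyo region).
  • Step Functions and AWS Batch can still be used for parts of the system.
  • With SageMaker, training can be split out into containers, so the batch system can be designed flexibly.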
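Impressions after using SageMaker for 5 months
• Analysis (notebook instances)
  • Lifecycle configuration is convenient; setting up an environment is quick and easy
  • When processing starts to feel heavy, the instance type can be changed afterwards
• Training and inference (algorithms and containers)
  • A rich selection of built-in algorithms and deep learning frameworks such as TensorFlow and Chainer
  • Training runs in separate containers, so there is no need to worry about the resources of jobs that are already running
  • Notebooks can be used by several people
  • Models can be deployed as endpoints easily, with auto scaling
  • Hyperparameter tuning jobs can automatically set the best hyperparameters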
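Future outlook
• Recipe recommendations that take into account how likely ingredients are to be left over
  1. Identify the ingredients that tend to be left over among the recipes a user has watched
  2. Recommend recipes that use up those ingredients efficiently
• Personalized recipe recommendations
  1. Recommend recipes that better match the user's tastes and lifestyle, such as "spicy / sweet" or "rich / light"
  2. Recommend recipes that can be made by adding a little something to leftover ingredients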
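dely is hiring machine learning engineers!
• The recipes made by the kurashiru chefs really are delicious.
  They only look delicious, you say? No, that is not the case at all. If you would like to do some machine learning while you are tasting them, we look forward to hearing from you!
• You can experience everything related to machine learning.
  At the moment, data analysis, service delivery, algorithm selection, and platform construction and operation are all handled by one person. You get a concentrated version of the work that would be split among several people in a somewhat larger organization!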