
NLP Behind a Community Service for Moms


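At Mamari, we run real-time NLP inference to filter (moderate) question posts.
This deck summarizes what we learned about making effective use of AWS managed services to handle Japanese natural language processing smoothly.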

Takanobu Nozawa

August 27, 2019

Transcript

  1. ■ What's "things to do"
     • Morphological analysis
     • Preprocessing: part-of-speech restriction, normalization, stopword removal
     • Dictionary management
     • Stopword management
     etc.
  2. ■ What's "things to do": things that can happen
     • Is the MeCab on my local machine really the same as the one in production?
     • Which notebook was it that built the model running in production?
     • I found a notebook that looks like the one, but I kept changing it little by little as I ran it, so is the preprocessing really fine as it is?
     etc.
  3. ＞ Overwhelmingly low psychological safety ＜
  4. What I will talk about today
  5. How we were able to build a smooth ML flow by making use of AWS services
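  6. ■ About Mamari
     • Mamari (app, Web): we are growing our user base around a Q&A community where moms consult one another about their worries.
     • SNS (Instagram, LINE, Facebook): users are connecting with each other more and more through "Mamari".
     • Articles: we deliver articles that are useful in moms' daily lives across a wide range of genres.
     • Selected as the No. 1 app for moms: among the "apps currently in use" chosen by … moms, it took first place on … items (would recommend to other moms, awareness, usage rate, convenience, favorability).
     • Number of articles: more than 6,000. Cumulative fans: about 850,000 (※). Monthly views: about 150 million (※). Monthly users: about 6.5 million (※).
     • With "worries" and "empathy" as our axes, we stay close to moms and run the service from multiple angles: app, Web, and SNS.
     ※ "Views" and "users" are the combined totals for the media site and the app (average for …).
     ※ "No. 1 app for moms" is according to an Intage survey (…); respondents were women who are pregnant or have a child up to 2 years 0 months old (n = …).
     ※ Combined total of Instagram followers, Facebook likes, and LINE friends, as of ….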
  7. ■ About Mamari
     [Chart: monthly time series from 2014/4 to 2018/8 on a vertical axis from 0 to 1,800,000, annotated with "TV commercial aired", total app downloads (…), and "… in …" ratios at several points]
     • Monthly posts: about 1.5 million
     • Active users who open the app at least … days a week: about 50 … (※)
     • "… in …" (※) of the moms who gave birth in … are using Mamari
     • Mamari has grown into a brand that boasts one of the largest scales in Japan.
     ※ Calculated from the number of users who set an expected delivery date in "Mamari" and the number of births in the "Vital Statistics" published by the Ministry of Health, Labour and Welfare.
     ※ Users who open the app at least … times a week.
  8. ■ Architecture: overall picture
     [Diagram components: RDS (Question Data), Glue, S3, StepFunctions, Fargate (Preprocessing Task), SageMaker (Training, Prediction EndPoint), Fargate (Flask API)]
  9. ■ Architecture: overall picture, "Training"
  10. ■ Architecture: overall picture, "Inference"
  11. ■ Architecture: overall picture, "In detail"
  12. ■ Architecture: overall picture, "About pre-training"
  14. ■ Architecture: overall picture, "About ETL and preprocessing"
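  15. ■ Architecture: ETL and preprocessing
      [Diagram: RDS, Glue, S3 (train.tsv, w2v model), StepFunctions, Fargate (Preprocessing Task)]
      • Inside a Fargate batch job, the language model (w2v model) and the training data (train.tsv) are used to preprocess and vectorize the text.
  16. Steps of the Preprocessing Task:
      1. Word segmentation (MeCab)
      2. Part-of-speech restriction [nouns, verbs, adjectives]
      3. Normalization
      4. Stopword computation and removal
      5. Dictionary creation
      6. Embedding Matrix creation
      7. Converting the text data into sequences
      8. Splitting the data into train and test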
  18. ■ Architecture: ETL and preprocessing (worked example)
      Input: 今日はAWS Loftで自然言語処理の登壇をします！ ("Today I am giving a talk on natural language processing at AWS Loft!")
  19. After word segmentation and part-of-speech restriction: [今日, AWS, Loft, 自然言語処理, 登壇, する]
  20. After normalization (here, lowercasing): [今日, aws, loft, 自然言語処理, 登壇, する]
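The part-of-speech restriction, normalization, and stopword steps in this example can be sketched in a few lines of Python. This is a minimal sketch, not the deck's implementation: the (base form, part of speech) pairs are hand-written stand-ins for MeCab output, NFKC folding is an assumed addition to the lowercasing shown on the slide, the "rare word" rule is an assumed reading of "low-frequency words", and the names `SAMPLE_TOKENS`, `ALLOWED_POS`, `compute_stopwords`, and `min_count` are hypothetical.

```python
import unicodedata
from collections import Counter

# Hand-written (base form, part of speech) pairs standing in for MeCab output.
SAMPLE_TOKENS = [
    ("今日", "名詞"), ("は", "助詞"), ("AWS", "名詞"), ("Loft", "名詞"),
    ("で", "助詞"), ("自然言語処理", "名詞"), ("の", "助詞"), ("登壇", "名詞"),
    ("を", "助詞"), ("する", "動詞"), ("ます", "助動詞"), ("！", "記号"),
]

ALLOWED_POS = {"名詞", "動詞", "形容詞"}  # noun, verb, adjective
GENERIC_WORDS = {"あれ", "する", "思う"}  # general words named in the deck


def normalize(word):
    """Fold full-width forms with NFKC (an assumption), then lowercase."""
    return unicodedata.normalize("NFKC", word).lower()


def restrict_and_normalize(tokens):
    """Keep only nouns, verbs, and adjectives, in normalized form."""
    return [normalize(base) for base, pos in tokens if pos in ALLOWED_POS]


def compute_stopwords(docs, min_count=2):
    """Generic words plus words seen fewer than min_count times (assumed rule)."""
    counts = Counter(word for doc in docs for word in doc)
    return GENERIC_WORDS | {w for w, c in counts.items() if c < min_count}


def remove_stopwords(doc, stopwords):
    """Drop every word that appears in the stopword set."""
    return [w for w in doc if w not in stopwords]


words = restrict_and_normalize(SAMPLE_TOKENS)
print(words)
print(remove_stopwords(words, GENERIC_WORDS))
```

In a real pipeline the stopword set would be computed once over the whole corpus and saved, so that training and inference remove exactly the same words.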
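  23. ■ Stopwords and the dictionary
      • Stopwords: unnecessary words
        e.g. general words (あれ, する, 思う), low-frequency words, and so on
      • Dictionary (tokenizer): a mapping that assigns an index to each word
        e.g. 今日はAWS Loftで自然言語処理の登壇をします！ → <今日>, <aws>, <loft>, <自然言語処理>, …

      # Create and save the tokenizer with Keras
      from keras.preprocessing.text import Tokenizer  # import assumed; not shown on the slide

      tokenizer = Tokenizer(lower=False)
      all_content = train['content'].values.tolist() + test['content'].values.tolist()
      tokenizer.fit_on_texts(all_content)
      # save_text_tokenizer is the author's helper; its body is not shown on the slide
      save_text_tokenizer(tokenizer, TOKENIZER_FILE_NAME)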
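  24. ■ Embedding Matrix
      • A matrix of distributed word representations (word vectors) used to initialize the embedding layer (Embedding Layer) of the neural network
        e.g. shape = (number of words, number of dimensions used when training the w2v model)
      [Illustration: one row of vector values per word (今日, aws, loft, 自然言語処理, 登壇, …)]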
  25. ■ Embedding Matrix
      • A matrix of distributed word representations used to initialize the embedding layer (Embedding Layer) of the neural network, e.g.

      import numpy as np

      # Set up embedding_matrix
      vocab_size = len(tokenizer.word_index) + 1  # largest index + 1 (row 0 is unused)
      embedding_vector_size = model_w2v.wv.vector_size  # dimensionality of the pre-trained model
      embedding_matrix = np.zeros((vocab_size, embedding_vector_size))
      for word, i in tokenizer.word_index.items():
          try:
              # If the w2v model knows the word, take its distributed representation
              embedding_vector = model_w2v.wv[word]
          except KeyError:
              # Unknown word: leave the row as zeros instead of reusing the previous vector
              continue
          # Store the vector in the row given by the word's index
          embedding_matrix[i] = embedding_vector
      # Save
      np.save(EMBEDDING_FILE_NAME, embedding_matrix)
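To make the relationship between the dictionary and the matrix concrete, here is a minimal pure-Python sketch. It assumes Keras-like indexing (indices start at 1, most frequent word first, row 0 left unused) and uses a plain dict named `vectors` as a hypothetical stand-in for the w2v model; `build_word_index` and `build_embedding_matrix` are illustrative names, not functions from the deck.

```python
from collections import Counter


def build_word_index(docs):
    """Assign indices from 1 upward, most frequent word first."""
    counts = Counter(word for doc in docs for word in doc)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        vector = vectors.get(word)
        if vector is not None:
            matrix[i] = list(vector)
    return matrix


docs = [["aws", "loft", "登壇"], ["aws", "登壇"], ["aws"]]
vectors = {"aws": [0.1, 0.2], "登壇": [0.3, 0.4]}  # hypothetical 2-dimensional vectors

word_index = build_word_index(docs)
matrix = build_embedding_matrix(word_index, vectors, dim=2)
print(word_index)
print(matrix)
```

The word "loft" has no vector in `vectors`, so its row stays all zeros, which is the behavior the `KeyError` branch in the slide's loop is meant to produce.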
  26. ■ Architecture: ETL and preprocessing (outputs)
      The Preprocessing Task reads train.tsv and the w2v model and produces the following files:
      • Training data (train.csv)
      • Test data (test.csv)
      • Dictionary (tokenizer.pkl)
      • Stopwords (stopwords.pkl)
      • Embedding Matrix (emb.npy)
  27. Managing these files by date makes it clear which data was used for each model, which guarantees the reproducibility of the model.
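One way to manage the artifacts by date is to write each run's files under a date-named prefix. The sketch below only builds object keys and does not call S3; the prefix `questions/preprocessed`, the `YYYY-MM-DD` folder format, and the function name `artifact_keys` are assumptions for illustration, since the deck does not show the actual layout.

```python
from datetime import date

# File names taken from the slide.
ARTIFACTS = ["train.csv", "test.csv", "tokenizer.pkl", "stopwords.pkl", "emb.npy"]


def artifact_keys(run_date, prefix="questions/preprocessed"):
    """Return one object key per artifact under a date-named folder."""
    folder = f"{prefix}/{run_date:%Y-%m-%d}"
    return {name: f"{folder}/{name}" for name in ARTIFACTS}


keys = artifact_keys(date(2019, 8, 27))
print(keys["emb.npy"])
```

Because the training job and the inference API can both be pointed at the same dated folder, the tokenizer, stopwords, and embedding matrix used together always come from the same preprocessing run.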
  28. ■ Architecture: ETL and preprocessing (automation)
      • The whole sequence of processing is automated with StepFunctions, so when updating the model the only hands-on work is training and deployment on SageMaker, described later.
  29. → We can focus on building the model.
  31. ■ Architecture: overall picture, "About training and deployment"
  32. ■ Script mode (notebook)
      • Passing the script file name as an argument of the estimator is all it takes, e.g.

      from sagemaker.tensorflow import TensorFlow

      # role, hyper_param, and s3_bucket are assumed to be defined elsewhere in the notebook
      estimator = TensorFlow(
          entry_point='clf_keras_lstm.py',
          role=role,
          framework_version='1.12.0',
          hyperparameters=hyper_param,
          train_instance_count=1,
          train_instance_type='ml.p3.2xlarge',
          script_mode=True,
          output_path='s3://' + s3_bucket + '/questions/model',
          code_location='s3://' + s3_bucket + '/questions/model',
          py_version='py3'
      )
  34. ■ Script mode (py script)
      • Set up the arguments under if __name__ == '__main__':, e.g.

      import argparse
      import os

      if __name__ == '__main__':
          parser = argparse.ArgumentParser()

          # Receive the hyperparameters
          parser.add_argument('--batch-size', type=int, default=512)
          parser.add_argument('--epochs', type=int, default=5)

          # SageMaker-specific arguments; the environment variables already hold default values
          # Destination for output files other than the model
          parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
          # Destination for the trained model
          parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
          # Path to the training data
          parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
          # Path to the test data
          parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])
          # Path to the embedding data
          parser.add_argument('--embedding', type=str, default=os.environ['SM_CHANNEL_EMBEDDING'])

          args, _ = parser.parse_known_args()

          # Train
          train(args)
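The `SM_CHANNEL_*` variables follow a naming rule: SageMaker sets `SM_CHANNEL_<NAME>`, with the channel name in upper case, to the directory where that input channel is mounted. Reading `os.environ[...]` directly raises `KeyError` when the script runs outside a training container, so a fallback can help when testing locally. The following is a minimal sketch under that assumption; `channel_dir` and the `./data/embedding` default are hypothetical and do not appear in the deck.

```python
import os


def channel_env_name(channel):
    """Name of the environment variable SageMaker sets for an input channel."""
    return "SM_CHANNEL_" + channel.upper()


def channel_dir(channel, environ=os.environ, default=None):
    """Look up a channel directory without raising when run outside SageMaker."""
    return environ.get(channel_env_name(channel), default)


# A fake environment standing in for a training container.
fake_env = {"SM_CHANNEL_TRAIN": "/opt/ml/input/data/train"}
print(channel_dir("train", fake_env))
print(channel_dir("embedding", fake_env, default="./data/embedding"))
```

Such a helper could supply the `default=` values of the `--train`, `--test`, and `--embedding` arguments so that the same script also starts on a laptop.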
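  36. ■ Script mode (py script)
      • Put the script you wrote into train(), e.g.

      from sklearn.model_selection import train_test_split
      from tensorflow.keras.callbacks import EarlyStopping  # import path assumed; not shown on the slide

      def train(args):
          # Loading X, y, and embedding_matrix from args.train / args.embedding is not shown on the slide
          X_train, X_valid, y_train, y_valid = train_test_split(
              X, y, test_size=0.3, random_state=0, stratify=y)
          model = build_model(emb_matrix=embedding_matrix, input_length=X_train.shape[1])
          model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
          # Stop early when the validation loss has not improved for 3 epochs
          es_cb = EarlyStopping(monitor='val_loss', patience=3, verbose=2, mode='auto')
          model.fit(
              x=X_train, y=y_train,
              validation_data=(X_valid, y_valid),
              batch_size=args.batch_size, epochs=args.epochs,
              verbose=2, callbacks=[es_cb]
          )
          # Save the model
          save(model, args.model_dir)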
  38. ■ Script mode (py script)
      • Save the model you built to S3 with save(), e.g.

      import os
      import tensorflow as tf
      from tensorflow.keras import backend as K  # import path assumed; not shown on the slide

      def save(model, model_dir):
          # Export the Keras session as a SavedModel under <model_dir>/model/1
          sess = K.get_session()
          tf.saved_model.simple_save(
              sess,
              os.path.join(model_dir, 'model/1'),
              inputs={'inputs': model.input},
              outputs={t.name: t for t in model.outputs})
  39. The only things you need to be conscious of are "setting up the arguments" and "saving the model".
  40. Easy.
  42. ■ Architecture: overall picture, "About real-time inference"
  43. ■ Architecture: inference
      [Diagram: S3 (stopwords.pkl, tokenizer.pkl), Fargate (Flask API)]
      • The Flask API receives raw data from the app, vectorizes it, and passes it to the SageMaker inference endpoint.
        e.g. 今日はAWS Loftで自然言語処理の登壇をします！ → 0, 13, 542, 9, 4723, 65
  44. • The inference endpoint returns the "probability of a violating post", e.g. 0.189.
  45. • If the probability is lower than the specified threshold, the post is returned to the app as a normal post (0); if it is higher, as a violating post (1).
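The thresholding step is small enough to sketch directly. The deck does not state the threshold value or what happens when the probability equals the threshold, so the default of 0.5 and the choice to treat equality as a normal post are assumptions, and `classify` is a hypothetical name.

```python
def classify(violation_probability, threshold=0.5):
    """Return 1 (violating post) above the threshold, otherwise 0 (normal post)."""
    if not 0.0 <= violation_probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    return 1 if violation_probability > threshold else 0


print(classify(0.189))  # the example probability from the slide
print(classify(0.93))
```

Keeping the threshold in the API layer rather than inside the model means it can be tuned for precision or recall without retraining or redeploying the endpoint.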
  47. ■ Architecture (summary)
      [Diagram: RDS, Glue, S3, StepFunctions, Fargate (Preprocessing Task), SageMaker (Training, EndPoint), Fargate (Flask API), annotated with the data at each stage]
      • ETL and preprocessing: train.tsv, w2v model → train.csv, test.csv, emb.npy, stopwords.pkl, tokenizer.pkl
      • Inference: 今日はAWS Loftで自然言語処理の登壇をします！ → 0, 13, 542, 9, 4723, 65, … → 0.189 → 0 or 1