NLP Behind a Community Service for Moms

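At Mamari, we run real-time NLP inference to moderate (filter) question posts.
This talk summarizes what we learned about making effective use of AWS managed services to run Japanese natural language processing smoothly.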

Takanobu Nozawa

August 27, 2019

Transcript

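  1. Supporting every mom's step forward: NLP behind a community service for moms. Connehito Inc., Takanobu Nozawa. ML@Loft NLP

  2. Let me start with a sudden question...

  3. Doesn't Japanese natural language processing involve a lot of "things to do"?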

  5. ■ What are the "things to do"?
     • Morphological analysis
     • Preprocessing: part-of-speech restriction, normalization, stop-word removal
     • Dictionary management
     • Stop-word management
     etc.
  6. ■ What are the "things to do"? What can go wrong:
     • Is MeCab on my local machine the same as the one in production?
     • Which notebook built the model that is running in production?
     • I found a notebook that looks like the right one, but I ran it while tweaking it here and there, so is the preprocessing really fine as it is?
     etc.
  7. ■ What are the "things to do"?
     Overwhelmingly low psychological safety
  8. ■ What are the "things to do"?
     What I will talk about today
  9. ■ What are the "things to do"?
     How making use of AWS services let us build a smooth ML workflow
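  10. On to the main topic (the slides will be published later!)

  11. ■ Agenda
      • Self-introduction
      • About Mamari
      • Challenges at Mamari and NLP use cases
      • Architecture
      • Operating the model

  12. Self-introduction

  13. ■ Self-introduction
      Name: Takanobu Nozawa (野澤哲照)
      Affiliation: Connehito Inc.
      Account: @takapy
      • Joined Connehito as an ML engineer
      • Mainly works on machine learning, and is also studying infrastructure
      • Does Kaggle, writes a blog (https://www.takapy.work), and plays baseball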
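  14. About Mamari

  15. ■ About Mamari
      Built around "worries" and "empathy", Mamari stays close to moms and offers its service across app, web, and SNS.
      • Mamari (app / web): a growing user base centered on a Q&A community where moms consult each other about their worries
      • SNS (Instagram, LINE, Facebook): users are connecting with each other more and more through "Mamari"
      • Articles: articles useful for moms' daily lives, delivered across a wide range of genres
      Selected as the No. 1 app for moms: among the "apps currently in use" chosen by moms, it ranked first on the items "want to recommend to other moms", awareness, usage rate, convenience, and favorability.
      Articles: more than 6,000. Cumulative fans: approx. 850,000 (*). Monthly views: approx. 150 million (*). Monthly users: approx. 6.5 million (*).
      * "Views" and "users" are combined totals for the media site and the app (monthly averages).
      * "No. 1 app for moms" is according to an Intage survey of women who are pregnant or have a child up to 2 years 0 months old.
      * "Fans" is the total of Instagram followers, Facebook likes, and LINE friends.
  16. ■ About Mamari
      [Chart: monthly values from 2014/4 to 2018/8, y-axis from 0 to 1,800,000, annotated with the TV commercial airing, total app downloads, and the share of moms using Mamari; the annotated figures did not survive extraction.]
      Monthly posts: approx. 1.5 million.
      A share of the moms who gave birth in the surveyed year are using Mamari (*). Mamari has grown into one of the largest brands of its kind in Japan.
      * Calculated from the number of users who set a due date in "Mamari" and the number of births in the "Vital Statistics" published by the Ministry of Health, Labour and Welfare.
      * Active users are users who launch the app a set number of times or more per week.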
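  17. Challenges at Mamari and NLP use cases

  18. ■ Challenges at Mamari and NLP use cases
      Challenge:

  19. ■ Challenges at Mamari and NLP use cases
      Challenge: [Diagram: questioner and answerer]

  20. ■ Challenges at Mamari and NLP use cases
      Challenge: inappropriate content is posted between questioner and answerer, e.g. "I'll teach you an easy way to make money"

  21. ■ Challenges at Mamari and NLP use cases
      Challenge: inappropriate content is posted, e.g. "I'll teach you an easy way to make money"
      → Moderation filter
  22. ■ Challenges at Mamari and NLP use cases
      Challenge: inappropriate content is posted, e.g. "I'll teach you an easy way to make money"
      → Moderation filter, using machine learning
  23. ■ Challenges at Mamari and NLP use cases
      Challenge: inappropriate content is posted, e.g. "I'll teach you an easy way to make money"
      → Moderation filter, using machine learning: NLP detects violating posts in real time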
  24. Architecture

  25. ■ Architecture: overview
      [Diagram: Glue extracts question data from RDS into S3; a Fargate preprocessing task, orchestrated by Step Functions, prepares the data; SageMaker runs training and hosts the prediction endpoint; a Flask API on Fargate serves predictions.]
  26. ■ Architecture: overview (training)
  27. ■ Architecture: overview (inference)
  28. ■ Architecture: overview (in detail)
  29. ■ Architecture: overview (about pre-training)
  30. ■ Architecture: pre-training
      • Build a language model with gensim (word2vec) and save it to S3 ("w2v model")
      • The corpus is data from within Mamari

  32. ■ Architecture: overview (about ETL and preprocessing)
  33. ■ Architecture: ETL and preprocessing
      [Diagram: RDS, Glue, S3, Fargate preprocessing task, Step Functions]

  34. ■ Architecture: ETL and preprocessing
      • Glue extracts the training data from RDS (train.tsv)
  35. ■ Architecture: ETL and preprocessing
      • A batch job on Fargate uses the language model (w2v model) and the training data (train.tsv) to preprocess and vectorize the text
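  36. ■ Architecture: ETL and preprocessing
      Steps in the Fargate preprocessing task (inputs: train.tsv, w2v model):
      (1) Word segmentation (MeCab)
      (2) Part-of-speech restriction [nouns, verbs, adjectives]
      (3) Normalization
      (4) Stop-word calculation and removal
      (5) Dictionary creation
      (6) Embedding matrix creation
      (7) Converting the text data into sequences
      (8) Splitting the data into train / test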
  38. ■ Architecture: ETL and preprocessing
      Example input: 「今日はAWS Loftで自然言語処理の登壇をします！」 ("Today I am giving a talk on natural language processing at AWS Loft!")
  39. ■ Architecture: ETL and preprocessing
      After word segmentation and part-of-speech restriction: [今日 AWS Loft 自然言語処理 登壇 する]
  40. ■ Architecture: ETL and preprocessing
      After normalization: [今日 aws loft 自然言語処理 登壇 する]
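The part-of-speech restriction, normalization, and stop-word removal steps can be sketched as follows. This is a minimal illustration and not the code used in the talk: it assumes the morphological analyzer (MeCab in the slides) has already produced (base form, part of speech) pairs, the pairs below are written by hand, and NFKC plus lowercasing is only one plausible reading of "normalization".

```python
import unicodedata

# Parts of speech kept by the restriction on the slide: noun, verb, adjective.
KEEP_POS = {"名詞", "動詞", "形容詞"}


def preprocess(morphemes, stopwords):
    """Filter and normalize (base_form, pos) pairs from a morphological analyzer."""
    tokens = []
    for base_form, pos in morphemes:
        if pos not in KEEP_POS:
            continue  # part-of-speech restriction
        # Normalization (assumed here to be NFKC + lowercasing).
        token = unicodedata.normalize("NFKC", base_form).lower()
        if token in stopwords:
            continue  # stop-word removal
        tokens.append(token)
    return tokens


# Hand-written analyzer output for the example sentence (an assumption,
# not real MeCab output).
morphemes = [
    ("今日", "名詞"), ("は", "助詞"), ("AWS", "名詞"), ("Loft", "名詞"),
    ("で", "助詞"), ("自然言語処理", "名詞"), ("の", "助詞"), ("登壇", "名詞"),
    ("を", "助詞"), ("する", "動詞"), ("ます", "助動詞"), ("！", "記号"),
]

print(preprocess(morphemes, stopwords=set()))
```

Passing `stopwords={"する"}` would additionally drop the general word する, which corresponds to the stop-word step that follows normalization in the list.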
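  43. ■ Stop words and dictionary
      • Stop words: unnecessary words, e.g. general words (あれ "that", する "do", 思う "think") and low-frequency words
      • Dictionary (tokenizer): a mapping that assigns an index to each word
        e.g. 「今日はAWS Loftで自然言語処理の登壇をします！」 → [今日] [aws] [loft] [自然言語処理] …

          # Create and save the tokenizer with keras
          from tensorflow.keras.preprocessing.text import Tokenizer

          tokenizer = Tokenizer(lower=False)
          all_content = train['content'].values.tolist() + test['content'].values.tolist()
          tokenizer.fit_on_texts(all_content)
          save_text_tokenizer(tokenizer, TOKENIZER_FILE_NAME)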
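The slide calls `save_text_tokenizer` without showing it. A minimal sketch of what such a helper could look like, assuming the tokenizer is simply pickled (the output file is named tokenizer.pkl in later slides); `load_text_tokenizer` is a hypothetical counterpart added here for the inference side:

```python
import pickle


def save_text_tokenizer(tokenizer, file_name):
    """Serialize a fitted tokenizer (any picklable object) to file_name."""
    with open(file_name, "wb") as f:
        pickle.dump(tokenizer, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_text_tokenizer(file_name):
    """Load a tokenizer previously written by save_text_tokenizer."""
    with open(file_name, "rb") as f:
        return pickle.load(f)
```

Saving the fitted object, instead of refitting it at inference time, keeps the word-to-index mapping identical between training and serving.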
  44. ■ Embedding matrix
      • A matrix of distributed word representations used to initialize the embedding layer of a neural network
        e.g. shape = (number of words, number of dimensions used when training the w2v model)
      [Table: one row of vector components per word: 今日, aws, loft, 自然言語処理, 登壇, ...]
  45. ■ Embedding matrix
      • A matrix of distributed word representations used to initialize the embedding layer of a neural network, e.g.

          # Build embedding_matrix
          import numpy as np

          vocab_size = len(tokenizer.word_index) + 1  # number of rows: largest word index + 1
          embedding_vector_size = model_w2v.wv.vector_size  # dimensionality of the pre-trained model
          embedding_matrix = np.zeros((vocab_size, embedding_vector_size))
          for word, i in tokenizer.word_index.items():
              try:
                  # Use the word's distributed representation if the w2v model has it
                  embedding_vector = model_w2v.wv[word]
              except KeyError:
                  continue  # words missing from the w2v model keep an all-zero row
              # Store the representation in the row given by the word's index
              embedding_matrix[i] = embedding_vector

          # Save
          np.save(EMBEDDING_FILE_NAME, embedding_matrix)
  46. ■ Architecture: ETL and preprocessing
      [Diagram: train.tsv and the w2v model are the inputs of the Fargate preprocessing task]
  47. ■ Architecture: ETL and preprocessing
      Output files of the preprocessing task:
      • Training data (train.csv)
      • Test data (test.csv)
      • Dictionary (tokenizer.pkl)
      • Stop words (stopwords.pkl)
      • Embedding matrix (emb.npy)
  48. ■ Architecture: ETL and preprocessing
      Managing these files by date makes it clear which data each model used
      → Model reproducibility is guaranteed
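Managing the artifacts by date could look like the sketch below. The key layout (`questions/preprocessed/<date>/<file>`) is an assumption made for illustration; the slides only state that the files are managed per date.

```python
from datetime import date

# Output files of the preprocessing task, as listed on the slide.
ARTIFACTS = ["train.csv", "test.csv", "tokenizer.pkl", "stopwords.pkl", "emb.npy"]


def artifact_keys(run_date, prefix="questions/preprocessed"):
    """Return a dict mapping each artifact name to a date-partitioned S3 key.

    The prefix and layout are hypothetical, not taken from the talk.
    """
    day = run_date.strftime("%Y-%m-%d")
    return {name: f"{prefix}/{day}/{name}" for name in ARTIFACTS}


for name, key in artifact_keys(date(2019, 8, 27)).items():
    print(name, "->", key)
```

With a layout like this, a model can be traced back to its inputs by recording only the date prefix it was trained from.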
  49. ■ Architecture: ETL and preprocessing
      • The whole series of steps is automated with Step Functions, so the only manual work when updating the model is the training and deployment on SageMaker described later
  50. ■ Architecture: ETL and preprocessing
      • The whole series of steps is automated with Step Functions, so the only manual work when updating the model is the training and deployment on SageMaker described later
      → We can focus on building the model
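One way to chain the Glue extraction and the Fargate preprocessing task in Step Functions is sketched below as an Amazon States Language definition built in Python. The talk does not show its state machine, so the state names, the function name, and all parameters are hypothetical, and whether the authors used these direct service integrations is not known.

```python
import json


def build_state_machine(glue_job_name, cluster_arn, task_definition_arn, subnets):
    """Build an Amazon States Language definition: Glue job, then Fargate task."""
    return {
        "Comment": "ETL with Glue, then text preprocessing on Fargate",
        "StartAt": "ExtractTrainingData",
        "States": {
            "ExtractTrainingData": {
                "Type": "Task",
                # Run the Glue job and wait for it to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "PreprocessText",
            },
            "PreprocessText": {
                "Type": "Task",
                # Run the ECS task on Fargate and wait for it to finish.
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            "Subnets": subnets,
                            "AssignPublicIp": "DISABLED",
                        }
                    },
                },
                "End": True,
            },
        },
    }


definition = build_state_machine(
    glue_job_name="example-extract-job",            # placeholder
    cluster_arn="arn:aws:ecs:REGION:ACCOUNT:cluster/example",            # placeholder
    task_definition_arn="arn:aws:ecs:REGION:ACCOUNT:task-definition/example:1",  # placeholder
    subnets=["subnet-EXAMPLE"],                     # placeholder
)
print(json.dumps(definition, indent=2))
```

Because each state waits for the previous one to complete, the preprocessing task only starts once train.tsv has been written.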
  52. ■ Architecture: overview (about training and deployment)
  53. ■ Architecture: training and deployment
      • Training and endpoint deployment are done on SageMaker
      • Training uses the pre-built TensorFlow container in script mode
      Inputs: train.csv, test.csv, emb.npy
  55. ■ Script mode
      • You do not need to pay much attention to SageMaker-specific coding conventions
        → A script file written locally or elsewhere can be reused (to some extent) as is
      • Running it is simple: just pass the script file as an argument when the training container starts
        → Updating the model only requires editing the script file
  56. ■ Script mode (notebook)
      • Just pass the script file name as an argument of the estimator, e.g.

          from sagemaker.tensorflow import TensorFlow

          estimator = TensorFlow(
              entry_point='clf_keras_lstm.py',
              role=role,
              framework_version='1.12.0',
              hyperparameters=hyper_param,
              train_instance_count=1,
              train_instance_type='ml.p3.2xlarge',
              script_mode=True,
              output_path='s3://' + s3_bucket + '/questions/model',
              code_location='s3://' + s3_bucket + '/questions/model',
              py_version='py3'
          )
  58. ■ Script mode (.py script)
      • Set up the arguments under if __name__ == '__main__':, e.g.

          import argparse
          import os

          if __name__ == '__main__':
              parser = argparse.ArgumentParser()

              # Receive the hyperparameters
              parser.add_argument('--batch-size', type=int, default=512)
              parser.add_argument('--epochs', type=int, default=5)

              # SageMaker-specific arguments; the environment variables already hold default values
              # Destination for output files other than the model
              parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
              # Destination for the trained model
              parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
              # Path to the training data
              parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
              # Path to the test data
              parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])
              # Path to the embedding data
              parser.add_argument('--embedding', type=str, default=os.environ['SM_CHANNEL_EMBEDDING'])

              args, _ = parser.parse_known_args()

              # Train
              train(args)
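The `SM_CHANNEL_TRAIN`, `SM_CHANNEL_TEST`, and `SM_CHANNEL_EMBEDDING` variables read by the script correspond to input channels named `train`, `test`, and `embedding` passed when training is started. The sketch below shows that naming relationship; the bucket name and prefix are placeholders, and `build_channels` and `channel_env_var` are hypothetical helpers rather than part of the SageMaker SDK.

```python
def channel_env_var(channel_name):
    """Name of the environment variable SageMaker sets for an input channel."""
    return "SM_CHANNEL_" + channel_name.upper()


def build_channels(bucket, prefix):
    """Map channel names to S3 URIs (the layout here is an assumption)."""
    return {
        name: f"s3://{bucket}/{prefix}/{name}"
        for name in ("train", "test", "embedding")
    }


channels = build_channels("example-bucket", "questions/preprocessed/2019-08-27")
for name, uri in channels.items():
    print(f"{name}: {uri} -> {channel_env_var(name)}")

# In the notebook, training would then be started with something like:
#   estimator.fit(channels)
```

Inside the training container each variable holds the local directory where the data for that channel is available, which is why the script can use them as argparse defaults.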
  60. ■ Script mode (.py script)
      • Put the training code you wrote inside train(), e.g.

          def train(args):
              # X_train, y_train and embedding_matrix are loaded from args.train and
              # args.embedding before this point (the loading code is not shown on the slide)
              X_train, X_valid, y_train, y_valid = train_test_split(
                  X_train, y_train, test_size=0.3, random_state=0, stratify=y_train)

              model = build_model(emb_matrix=embedding_matrix, input_length=X_train.shape[1])
              model.compile(loss="binary_crossentropy", optimizer='adam', metrics=['accuracy'])
              es_cb = EarlyStopping(monitor='val_loss', patience=3, verbose=2, mode='auto')
              model.fit(
                  x=X_train,
                  y=y_train,
                  validation_data=(X_valid, y_valid),
                  batch_size=args.batch_size,
                  epochs=args.epochs,
                  verbose=2,
                  callbacks=[es_cb]
              )

              # Save the model
              save(model, args.model_dir)
  62. ■ Script mode (.py script)
      • save() stores the trained model so that it ends up in S3, e.g.

          import os
          import tensorflow as tf
          from keras import backend as K  # or tensorflow.keras, matching the Keras used for the model

          def save(model, model_dir):
              sess = K.get_session()
              tf.saved_model.simple_save(
                  sess,
                  os.path.join(model_dir, 'model/1'),
                  inputs={'inputs': model.input},
                  outputs={t.name: t for t in model.outputs})
  63. ■ Script mode (.py script)
      The only things you need to be aware of are "setting up the arguments" and "saving the model"
  64. ■ Script mode (.py script)
      Simple
  66. ■ Architecture: overview (about real-time inference)
  67. ■ Architecture: inference
      [Diagram: the Flask API on Fargate loads stopwords.pkl and tokenizer.pkl from S3]

  68. ■ Architecture: inference
      • The Flask API receives raw data from the app, vectorizes it, and passes it to the SageMaker inference endpoint
      e.g. 「今日はAWS Loftで自然言語処理の登壇をします！」 → 0, 13, 542, 9, 4723, 65
  69. ■ Architecture: inference
      • The Flask API receives raw data from the app, vectorizes it, and passes it to the SageMaker inference endpoint
      • The inference endpoint returns the "probability that the post is a violation"
      e.g. 0, 13, 542, 9, 4723, 65 → 0.189
  70. ■ Architecture: inference
      • The Flask API receives raw data from the app, vectorizes it, and passes it to the SageMaker inference endpoint
      • The inference endpoint returns the "probability that the post is a violation"
      • If the probability is lower than the specified threshold the post is returned to the app as a normal post (0); if higher, as a violating post (1)
      e.g. 0, 13, 542, 9, 4723, 65 → 0.189 → 0 or 1
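The vectorization and thresholding done by the Flask API can be sketched in plain Python. The word indices, the sequence length of 6, and the threshold of 0.5 are assumptions chosen so that the output matches the slide's example; the real values are not given in the talk, and the call to the SageMaker endpoint is left out.

```python
def texts_to_sequence(tokens, word_index, max_len):
    """Map tokens to indices, drop unknown words, and left-pad with 0 to max_len."""
    seq = [word_index[t] for t in tokens if t in word_index][-max_len:]
    return [0] * (max_len - len(seq)) + seq


def to_label(violation_probability, threshold=0.5):
    """Return 1 (violating post) above the threshold, otherwise 0 (normal post)."""
    return 1 if violation_probability > threshold else 0


# Hypothetical dictionary; in the slides this comes from tokenizer.pkl.
word_index = {"今日": 13, "aws": 542, "loft": 9, "自然言語処理": 4723, "登壇": 65}
tokens = ["今日", "aws", "loft", "自然言語処理", "登壇", "する"]

sequence = texts_to_sequence(tokens, word_index, max_len=6)
print(sequence)

# 0.189 is the example probability returned by the endpoint on the slide.
print(to_label(0.189))
```

Keeping the threshold in the API rather than in the model means it can be tuned against production behavior without retraining.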
  71. ■ Architecture
      [Diagram: RDS, Glue, S3, Fargate preprocessing task, Step Functions, SageMaker training and prediction endpoint, Flask API on Fargate, question data]
  72. ■ Architecture
      [Diagram: the full flow with data. Training side: train.tsv and the w2v model → train.csv, test.csv, emb.npy, stopwords.pkl, tokenizer.pkl → SageMaker training and endpoint. Inference side: 「今日はAWS Loftで自然言語処理の登壇をします！」 → 0, 13, 542, 9, 4723, 65 → 0.189 → 0 or 1.]
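  73. Operating the model

  74. ■ Operating the model
      As we update the model regularly, reasonable accuracy on the training data is not enough; in the end it is the accuracy in production that matters

  75. ■ Operating the model
      As we update the model regularly, reasonable accuracy on the training data is not enough; in the end it is the accuracy in production that matters
      → We need to monitor how the model is behaving

  76. ■ Operating the model
      As we update the model regularly, reasonable accuracy on the training data is not enough; in the end it is the accuracy in production that matters

  77. ■ Operating the model
      We monitor daily, using the inference result logs from the Flask API and the data stored in RDS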
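A daily monitoring job of this kind could be sketched as below. The talk does not say which metrics are tracked or how the logs are structured, so the record layout, the use of precision and recall, and the idea that RDS holds the final moderation outcome for each post are all assumptions.

```python
from collections import defaultdict


def daily_metrics(predictions, ground_truth):
    """Compute per-day confusion counts, precision, and recall.

    predictions: iterable of (day, post_id, predicted_label) from the API logs.
    ground_truth: dict mapping post_id to the actual label (1 = violation).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for day, post_id, predicted in predictions:
        if post_id not in ground_truth:
            continue  # no final outcome recorded yet
        actual = ground_truth[post_id]
        key = ("t" if predicted == actual else "f") + ("p" if predicted == 1 else "n")
        counts[day][key] += 1

    report = {}
    for day, c in counts.items():
        flagged = c["tp"] + c["fp"]
        violations = c["tp"] + c["fn"]
        report[day] = {
            "precision": c["tp"] / flagged if flagged else None,
            "recall": c["tp"] / violations if violations else None,
            **c,
        }
    return report


logs = [
    ("2019-08-27", 1, 1), ("2019-08-27", 2, 1),
    ("2019-08-27", 3, 0), ("2019-08-27", 4, 0),
]
truth = {1: 1, 2: 0, 3: 1, 4: 0}
print(daily_metrics(logs, truth))
```

Tracking these values per day makes a drop after a model update visible, which is the production-side check the slides argue for.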

  78. ■ Operating the model

  79. ■ Operating the model

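  80. Summary

  81. ■ Summary
      Japanese natural language processing involves a lot of things to do, but by making effective use of the various AWS services you can build a smooth ML workflow

  82. Thank you for listening!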