Trying Out BigQuery ML

Yuya Matsumura
September 25, 2018

These are the slides from my talk at Machine Learning Casual Talks #6
(https://mlct.connpass.com/event/94911/).
I explained BigQuery ML and presented a case where we introduced it into a production service.

Transcript

  1. ©2018 Wantedly, Inc. Trying Out BigQuery ML: Examples of BigQuery ML in Wantedly People
    Machine Learning Casual Talks #6, 25.Sep.2018 - Yuya Matsumura - @yu-ya4
  2. Self Introduction
    • Yuya Matsumura (松村 優也)
    • Wantedly, Inc. (6 months into my working career)
    • Recommendation Engineer
    • From Kyoto
    • @yu-ya4 @yu__ya4 https://www.wantedly.com/users/2390451
  3. What is BigQuery ML??
  4. What is BigQuery ML??
    • Announced at Google Cloud Next 2018
    • BigQuery + ML (machine learning)
    • Two kinds of models are currently supported
      • Linear regression (estimates a numeric value from data)
      • Binary logistic regression (decides true/false from data)
    • Everything is done inside BigQuery, so it is easy!
      • No need to move data to a local environment, etc.
      • All you need is to be able to write SQL
  5. How to use BigQuery ML?
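  6. Wantedly People
    • News feature (Timeline)
    • Recommended news is announced by push notification twice a day (morning and midday)
  7. Motivations & Backgrounds
    • For users who do not open push notifications, receiving them twice a day may be a nuisance
    • We want to find the users for whom sending push notifications twice a day is fine
    • There are other higher-priority tasks, so we cannot spend much time on this
    • It looked quick to do, so we tried BigQuery ML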
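  8. Problem Definition
    "From the users active in the past month, predict the users who opened the push notification"
    • Explanatory variables: counts of various user actions in the period from 28 days before to 2 days before
      • Number of push notifications opened
      • Number of news articles viewed
      • Number of business cards scanned
      • …etc.
    • Objective variable: whether the user opened the push notification 1 day before (1 or 0)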
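
The date windows in the problem definition can be made concrete with a small Ruby sketch. The helper name `training_windows` and the reference date are assumptions for illustration only; the slides do not show how the windows were computed.

```ruby
require "date"

# Hypothetical helper, not from the slides: derives the date windows
# described in the problem definition from a reference date.
def training_windows(reference_date)
  {
    # Explanatory variables: action counts from 28 days before to 2 days before.
    feature_range: (reference_date - 28)..(reference_date - 2),
    # Objective variable: whether the push notification 1 day before was opened.
    label_date: reference_date - 1
  }
end

windows = training_windows(Date.new(2018, 9, 25))
puts "features: #{windows[:feature_range].first} to #{windows[:feature_range].last}"
puts "label:    #{windows[:label_date]}"
```

Keeping the label day outside the feature window means the outcome being predicted cannot leak into the explanatory variables.
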
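  9. Overview of architecture
    (diagram labels)
    • BigQuery
    • merge
    • Registered with the Scheduler
    • PREDICT query is executed
    • Push notification is sent to the matching users
    • SELECT user_id FROM `push_predict_results` WHERE prob > #{PROB_THRESHOLD}
    • Predict results are stored in BQ
    • CREATE MODEL query is executed
    • MODEL is stored in BQ
    • The ids of the matching users are read
    • Write a Job that runs the PREDICT query (almost pure SQL)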
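
The `#{PROB_THRESHOLD}` in the SELECT statement above is Ruby string interpolation. A minimal sketch of how such a query string could be built follows; the threshold value 0.5 and the helper name `target_users_sql` are assumptions, since the slides do not state them.

```ruby
# Assumed value for illustration; the slides do not state the threshold.
PROB_THRESHOLD = 0.5

# Hypothetical helper: builds the query that selects users to notify.
def target_users_sql(threshold)
  # Float() raises ArgumentError on non-numeric input,
  # so only a number is ever interpolated into the SQL.
  value = Float(threshold)
  "SELECT user_id FROM `push_predict_results` WHERE prob > #{value}"
end

puts target_users_sql(PROB_THRESHOLD)
```

Raising the threshold sends fewer notifications to users who are more likely to open them; lowering it does the opposite.
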
  10. Creating the MODEL
  11. CREATE
    • model_type: specifies the algorithm (linear regression or logistic regression)
    • label: the objective variable (given as a binary value)
    • weekly_open, timeline_show_count…: explanatory variables
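
The query on this slide was an image and is not in the transcript. As a rough sketch of what a BigQuery ML `CREATE MODEL` statement with these options looks like, the Ruby below only builds the SQL string. The dataset, model, and table names are hypothetical, and the helper `create_model_sql` is not from the talk. The sketch relies on BigQuery ML treating a column named `label` as the label column by default.

```ruby
# Builds a BigQuery ML CREATE MODEL statement as a string.
# All dataset, model, and table names used here are hypothetical.
def create_model_sql(model:, source_table:, features:, model_type: "logistic_reg")
  <<~SQL
    CREATE OR REPLACE MODEL `#{model}`
    OPTIONS (model_type = '#{model_type}') AS
    SELECT
      label,
      #{features.join(",\n  ")}
    FROM `#{source_table}`
  SQL
end

puts create_model_sql(
  model: "my_dataset.push_open_model",
  source_table: "my_dataset.push_training_data",
  features: %w[weekly_open timeline_show_count]
)
```

Passing `model_type: "linear_reg"` instead would request the linear regression model, the other type mentioned in the slides.
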
  12. EVALUATE
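
The evaluation query itself is not in the transcript. A minimal sketch, assuming a hypothetical held-out table, of an `ML.EVALUATE` call built from Ruby; for a logistic regression model this function returns classification metrics such as precision, recall, and ROC AUC.

```ruby
# Builds an ML.EVALUATE query as a string.
# Model and table names are hypothetical placeholders.
def evaluate_model_sql(model:, eval_table:)
  <<~SQL
    SELECT *
    FROM ML.EVALUATE(
      MODEL `#{model}`,
      (SELECT * FROM `#{eval_table}`)
    )
  SQL
end

puts evaluate_model_sql(
  model: "my_dataset.push_open_model",
  eval_table: "my_dataset.push_eval_data"
)
```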

  13. Registering the query with the scheduler
  14. BigQuery by Ruby
  15. Define Job
  16. Running PREDICT
  17. PREDICT
    • predicted_label_probs.prob: the predicted probability of each label
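
The prediction query on this slide was also an image. The sketch below builds one plausible `ML.PREDICT` query from Ruby. It assumes the label column is an integer named `label`, so that the output contains `predicted_label_probs` as an array of `(label, prob)` structs. The model and table names and the helper `predict_sql` are hypothetical.

```ruby
# Builds an ML.PREDICT query that keeps, per user, the predicted
# probability of label 1 (opened). Names are hypothetical placeholders.
def predict_sql(model:, input_table:)
  <<~SQL
    SELECT
      user_id,
      p.prob AS prob
    FROM ML.PREDICT(
      MODEL `#{model}`,
      (SELECT * FROM `#{input_table}`)
    ),
    UNNEST(predicted_label_probs) AS p
    WHERE p.label = 1
  SQL
end

puts predict_sql(
  model: "my_dataset.push_open_model",
  input_table: "my_dataset.push_predict_input"
)
```

Storing this result in a table such as `push_predict_results` would give the `user_id` and `prob` columns that the threshold query in the architecture overview reads.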

  18. Overview of architecture
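  19. Results
    • With an implementation period of (roughly) 1 day, the push notification open rate improved substantially
  20. pros.
    • No knowledge of ML frameworks, Python, etc. is required
    • You only need to write SQL, so the implementation cost is low
    • Data analysts and business-side members with little of that knowledge can also use it
    • No need to move data, so models can be developed faster
    • Unlimited resources
    • You just pay Google!
  21. cons.
    • Managing models is difficult
    • Saving the query used when a model was created
    • What do you do if a model gets deleted!? (Everyone can touch the dataset)
    • Updating models and running Predict periodically is awkward
    • Wantedly happens to have a nice scheduler mechanism, but…
    • Complex models cannot be developed yet
    • The choice of algorithms is limited
    • Model size is limited (… MB)
  22. Summary
    • BigQuery ML has arrived: ML with BigQuery alone
    • Handy when you just want to try a simple regression
    • Handy for the investigation before starting a full-scale ML project (checking whether the data is usable)
    • Cannot yet be used to build complex models
    • Managing queries and models takes some ingenuity
    • Looking forward to what comes next ✨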