Trying Out BigQuery ML (BigQuery ML を使ってみた話)

Yuya Matsumura
November 30, 2018

Slides presented at #bq_sushi tokyo #9, the 2018 year-end roundup (https://bq-sushi.connpass.com/event/106178/).

The talk explains BigQuery ML and presents cases where it was introduced into production services.

Transcript

  1. ©2018 Wantedly, Inc. Trying Out BigQuery ML (BigQuery ML を使ってみた話): Examples of BigQuery ML in Wantedly

    #bq_sushi Tokyo #9, 30 Nov. 2018. Yuya Matsumura (@yu-ya4)
  2. Self Introduction

    ✓ Yuya Matsumura (@yu-ya4, @yu__ya4)
    ✓ Software Engineer (search and recommendation engineer?)
    ✓ Wantedly, Inc. Recommendation Team (since April 2018)
    ✓ Interested in Information Retrieval, Machine Learning
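  3. What is BigQuery ML?

    ✓ Announced at Google Cloud Next 2018
    ✓ BigQuery + ML (machine learning)
      • Machine learning with existing SQL tools and skills
      • Democratization of machine learning (data analysts and business teams can use it too!)
    ✓ Easy to use because everything is done inside BigQuery
      • No need to move data to a local environment or elsewhere
      • Faster development
    https://cloud.google.com/bigquery/docs/bigqueryml-intro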
  4. What is BigQuery ML?

    ✓ Three model types are currently supported
      • Linear regression (estimates a numeric value from the data)
      • Binary logistic regression (decides true/false from the data)
      • Multiclass logistic regression (classifies the data into three or more classes)
    https://cloud.google.com/bigquery/docs/bigqueryml-intro
  5. Overview of architecture

    [Diagram: a job is written (almost plain SQL), merged, and registered with the Scheduler. The job (query) runs on BigQuery, and the query results are written to BigQuery or an RDB. Other labels: Logging, Read.]
  6. Overview of architecture

    Step: create a job by writing a BigQuery query in a Ruby file. [Architecture diagram repeated from slide 5.]
  7. Define Job: write the results of a BigQuery query to a table daily

        export do
          table :daily_company_scores
          columns %w[:day, :company_id, :score]
          mode :update, [:day]
        end

        schedule do
          frequency :daily
        end

        run :bq, <<SQL
        SELECT
          DATE(_WT_SCHEDULED_TIME, "+9") AS day,
          company_id,
          SUM(action_score) AS score
        FROM `log.company_actions*`
        WHERE _TABLE_SUFFIX = FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY), "+9")
        GROUP BY company_id
        SQL
  8. Define Job: write the results of a BigQuery query to a table daily

    Highlighted part of the job file from slide 7:

        export do
          table :daily_company_scores            # name of the destination table
          columns %w[:day, :company_id, :score]  # columns to output
          mode :update, [:day]                   # update or replace
        end
  9. Define Job: write the results of a BigQuery query to a table daily

    Highlighted part of the job file from slide 7:

        schedule do
          frequency :daily  # how often the job runs
        end
  10. Define Job: write the results of a BigQuery query to a table daily

    Highlighted part of the job file from slide 7:

        # Runs the BigQuery query
        run :bq, <<SQL
        SELECT
          DATE(_WT_SCHEDULED_TIME, "+9") AS day,
          company_id,
          SUM(action_score) AS score
        FROM `log.company_actions*`
        WHERE _TABLE_SUFFIX = FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY), "+9")
        GROUP BY company_id
        SQL
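
The deck shows the job file but not the tool that reads it. The following is only a rough sketch of how a Ruby DSL of this shape could be evaluated with `instance_eval`. The classes `JobDefinition`, `ExportSpec` and `ScheduleSpec`, their accessors, and the shortened SQL are all hypothetical and are not Wantedly's implementation. The sketch also passes the columns as `%i[...]` symbols, which is an assumption about what the DSL expects.

```ruby
# Hypothetical sketch: collects the settings declared in a job file.
# Class names, accessors and internals are assumptions for illustration.
class ExportSpec
  attr_reader :table_name, :column_names, :write_mode, :key_columns

  def table(name)
    @table_name = name
  end

  def columns(names)
    @column_names = names
  end

  def mode(write_mode, key_columns = [])
    @write_mode = write_mode
    @key_columns = key_columns
  end
end

class ScheduleSpec
  attr_reader :interval

  def frequency(interval)
    @interval = interval
  end
end

class JobDefinition
  attr_reader :export_spec, :schedule_spec, :engine, :query

  # Evaluates the job source so that export/schedule/run resolve to this object.
  def self.parse(source)
    job = new
    job.instance_eval(source)
    job
  end

  def export(&block)
    @export_spec = ExportSpec.new
    @export_spec.instance_eval(&block)
  end

  def schedule(&block)
    @schedule_spec = ScheduleSpec.new
    @schedule_spec.instance_eval(&block)
  end

  def run(engine, query)
    @engine = engine
    @query = query
  end
end

# A shortened job file in the style of slide 7 (the query is simplified).
JOB_SOURCE = <<~'RUBY'
  export do
    table :daily_company_scores
    columns %i[day company_id score]
    mode :update, [:day]
  end

  schedule do
    frequency :daily
  end

  run :bq, <<SQL
  SELECT company_id, SUM(action_score) AS score
  FROM `log.company_actions*`
  GROUP BY company_id
  SQL
RUBY

job = JobDefinition.parse(JOB_SOURCE)
puts job.export_spec.table_name
puts job.schedule_spec.interval
```

A scheduler built on such a definition would read `job.schedule_spec` to decide when to run and `job.export_spec` to decide where to write the rows returned by `job.query`.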
  11. Overview of architecture

    Step: add the job file, open a pull request, and merge it. The job is then registered automatically with the Kubernetes-based Scheduler and run periodically. [Architecture diagram repeated from slide 5.]
  12. Overview of architecture

    [Architecture diagram repeated from slide 5.]
  13. Overview of architecture

    Step: services use the query results. [Architecture diagram repeated from slide 5.]
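  14. BigQuery in Recommendation Team

    ✓ The current situation of the Wantedly Visit Recommendation Team
      • Several places remain that could be improved with simple ideas
      • The team is small (counting myself)
      • The service is not small (many users)
      • We cannot spend unlimited time on each individual improvement
      • We want to turn the improvement cycle quickly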
  15. Digression: Recommendation

    [Background content repeated from slide 14.]
  16. What is a recommender system? [Resnick+ 97]

    "It is often necessary to make choices without sufficient personal experience of the alternatives. In everyday life, we rely on recommendations from other people either by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys such as Zagat’s restaurant guides. Recommender systems assist and augment this natural social process."

    [Resnick 97] P. Resnick and H. R. Varian. Recommender systems. Communications of the ACM, Vol. 40, No. 3, pp. 56–58, 1997.
  17. What is a recommender system? [Konstan+ 03]

    "Recommenders: Tools to help identify worthwhile stuff"

    [Konstan 03] J. A. Konstan and J. Riedl. Recommender systems: Collaborating in commerce and communities. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, Tutorial, 2003.
  18. BigQuery in Recommendation Team

    [Content repeated from slide 14.]
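  19. For example…

    Using the information that users clicked on companies' job postings, we want to recommend job postings to a given user.

    [Figure: users × job postings matrix. ⭕ = clicked, - = no information. The task is to predict what the "?" cell will be.]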
  20. An example of Wantedly People

    • News feature (Timeline)
    • Recommended news is announced by push notification twice a day (morning and noon)
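  21. Motivations & Backgrounds

    ✓ For users who do not open push notifications, receiving two a day may be a nuisance
    ✓ We want to send two a day only to users who seem fine with receiving two push notifications a day
    ✓ There are other high-priority tasks, so we cannot spend much time on this
    ✓ It looked quick to do, so we tried BigQuery ML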
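  22. Problem Definition

    Task: among users active in the past month, predict the users who opened the push notification 1 day ago.

    Explanatory variables: counts of various user actions in the period from 2 to 28 days ago
      • Number of push notifications opened
      • Number of news articles viewed
      • Number of business cards scanned
      • …etc.

    Objective variable: whether the user opened the push notification 1 day ago (1 or 0)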
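  23. Problem Definition

    Task: among users active in the past month, predict the users who opened the push notification 1 day ago.

    Explanatory variables: counts of various user actions in the period from 2 to 28 days ago
      • Number of push notifications opened
      • Number of news articles viewed
      • Number of business cards scanned
      • …etc.

    Objective variable: whether the user opened the push notification 1 day ago (1 or 0)

    From the explanatory variables, predict whether the objective variable (label) is 0 or 1.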
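
To make the two time windows concrete, here is a small sketch of how one training row could be assembled from a user's event log. The deck does this in SQL and does not show the query; the event hash shape, the action names and the output column names are assumptions made for illustration.

```ruby
# Hypothetical event records: one hash per user action.
# :days_ago counts back from the day the training data is built.
FEATURE_WINDOW = (2..28) # explanatory variables: 2 to 28 days ago
LABEL_DAY = 1            # objective variable: 1 day ago

def build_training_row(events)
  # Only actions inside the feature window may feed the explanatory variables.
  feature_events = events.select { |e| FEATURE_WINDOW.cover?(e[:days_ago]) }
  counts = feature_events.group_by { |e| e[:action] }.transform_values(&:size)

  # The label looks only at the day after the feature window ends.
  opened = events.any? { |e| e[:days_ago] == LABEL_DAY && e[:action] == :push_open }

  {
    label: opened ? 1 : 0,
    push_open_count: counts.fetch(:push_open, 0),
    news_view_count: counts.fetch(:news_view, 0),
    scan_count: counts.fetch(:card_scan, 0)
  }
end

events = [
  { action: :push_open, days_ago: 1 },  # used for the label only
  { action: :push_open, days_ago: 3 },
  { action: :news_view, days_ago: 5 },
  { action: :news_view, days_ago: 30 }, # outside the window, ignored
  { action: :card_scan, days_ago: 10 }
]

row = build_training_row(events)
puts row
```

Keeping the label day outside the feature window matters: if day 1 actions leaked into the counts, the model would be trained on information that is not available at prediction time.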
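  24. Overview of architecture

    [Diagram labels]
    • Run the CREATE MODEL query; the created model is saved in BigQuery
    • Write a job that runs the PREDICT query (almost plain SQL), merge it, and register it with the Scheduler
    • Run the PREDICT query and store the Predict results in BigQuery
    • Read the ids of the matching users:

        SELECT user_id FROM `push_predict_results` WHERE prob > #{PROB_THRESHOLD}

    • Send push notifications to the matching users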
  25. Creating the MODEL

    [Architecture diagram repeated from slide 24.]
  26. Creating the MODEL

        CREATE MODEL `ml_models.push_open`
        OPTIONS (model_type='logistic_reg') AS
        SELECT
          IF(open IS NULL, 0, 1) AS label,
          weekly_open_count,
          timeline_show_count,
          scan_count,
          ...
        FROM `people.ironna_data*`
        WHERE ...
  27. Creating the MODEL

    Highlighted part of the query from slide 26:

        CREATE MODEL `ml_models.push_open`

    This declares that a model is created and gives the name of the model. Other forms:
      • CREATE MODEL IF NOT EXISTS
      • CREATE OR REPLACE MODEL
  28. Creating the MODEL

    Highlighted part of the query from slide 26:

        OPTIONS (model_type='logistic_reg') AS

    This specifies additional model options.
      • model_type takes 'logistic_reg' (logistic regression) or 'linear_reg' (linear regression)
      • Other options set, among other things, the amount of L1 and L2 regularization, the number of training steps, and how the input data is split into a training set and an evaluation set
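
For readers new to logistic regression, the sketch below shows what a trained model of this type computes at prediction time: a weighted sum of the features passed through the sigmoid function. It says nothing about how BigQuery ML trains or stores the model, and the weights and intercept are made-up values, not results from the deck.

```ruby
# Sigmoid maps any real-valued score to a probability between 0 and 1.
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# weights and features are hashes keyed by feature name.
# A feature missing from the input is treated as 0.
def predict_probability(weights, intercept, features)
  score = intercept + weights.sum { |name, weight| weight * features.fetch(name, 0) }
  sigmoid(score)
end

# Made-up coefficients for the feature columns used on slide 26.
weights = { weekly_open_count: 0.8, timeline_show_count: 0.1, scan_count: 0.05 }
intercept = -2.0

features = { weekly_open_count: 3, timeline_show_count: 4, scan_count: 0 }
prob = predict_probability(weights, intercept, features)
puts prob.round(4)
```

The regularization options mentioned on the slide act on training: they penalize large weights, which pulls coefficients such as those above toward zero.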
  29. Creating the MODEL

    Highlighted part of the query from slide 26:

        SELECT
          IF(open IS NULL, 0, 1) AS label,
          weekly_open_count,
          timeline_show_count,
          scan_count,
          ...
        FROM `people.ironna_data*`
        WHERE ...

    This is the query that produces the input data. The model predicts the column named label (the column name can be changed with an option).
  31. Evaluating MODEL performance

        SELECT * FROM ML.EVALUATE(MODEL `ml_models.push_open`, (
          SELECT
            IF(open IS NULL, 0, 1) AS label,
            weekly_open_count,
            timeline_show_count,
            scan_count,
            ...
          FROM `people.ironna_data*`
          WHERE ...
        ))

    This evaluates how well the created model performs (how accurately it predicts label).
  32. Evaluating MODEL performance

    Output of the ML.EVALUATE query from slide 31 for a logistic regression model. Note: the numbers shown have been altered arbitrarily.
      • precision: of the rows predicted as label 1, the fraction that are actually 1
      • recall: of the rows that are actually label 1, the fraction that were predicted as 1
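
The two definitions can be checked by hand on a tiny example. This sketch computes precision and recall from lists of actual and predicted labels; the sample labels are invented, and it is not meant to reproduce how ML.EVALUATE computes its output.

```ruby
# actual and predicted are arrays of 0/1 labels in the same order.
def precision_recall(actual, predicted)
  true_positives = actual.zip(predicted).count { |a, pred| a == 1 && pred == 1 }
  predicted_positives = predicted.count(1)
  actual_positives = actual.count(1)

  {
    # Of the rows predicted as 1, the fraction that are actually 1.
    precision: predicted_positives.zero? ? 0.0 : true_positives.to_f / predicted_positives,
    # Of the rows that are actually 1, the fraction predicted as 1.
    recall: actual_positives.zero? ? 0.0 : true_positives.to_f / actual_positives
  }
end

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]

metrics = precision_recall(actual, predicted)
puts metrics
```

For the push notification case, high precision means few users are wrongly judged as likely to open a notification, while high recall means few users who would open one are missed.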
  33. Evaluating MODEL performance (ROC_CURVE)

        SELECT * FROM ML.ROC_CURVE(MODEL `ml_models.push_open`, (
          SELECT
          ...

    The ROC curve can be output as well! Note: the numbers shown have been altered arbitrarily.
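
As a reminder of what an ROC curve contains, the sketch below computes one point per threshold from predicted probabilities: the false positive rate and the true positive rate (recall). The data and thresholds are invented, both classes are assumed to be present, and this is not a description of how ML.ROC_CURVE is implemented.

```ruby
# Returns one ROC point per threshold.
# Assumes actual contains at least one 1 and at least one 0.
def roc_points(actual, probs, thresholds)
  positives = actual.count(1)
  negatives = actual.count(0)

  thresholds.map do |threshold|
    predicted = probs.map { |prob| prob >= threshold ? 1 : 0 }
    pairs = actual.zip(predicted)
    true_positives = pairs.count { |a, pred| a == 1 && pred == 1 }
    false_positives = pairs.count { |a, pred| a == 0 && pred == 1 }

    {
      threshold: threshold,
      fpr: false_positives.to_f / negatives,
      tpr: true_positives.to_f / positives
    }
  end
end

actual = [1, 1, 0, 1, 0, 0]
probs  = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = roc_points(actual, probs, [0.0, 0.5, 1.0])
points.each { |point| puts point }
```

Sweeping the threshold this way is also a practical method for picking a cut-off such as the `PROB_THRESHOLD` used in the architecture slides.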
  34. Evaluating MODEL performance (CONFUSION_MATRIX)

        SELECT * FROM ML.CONFUSION_MATRIX(MODEL `ml_models.push_open`, (
          SELECT
          ...

    The confusion matrix can be output as well! Note: the numbers shown have been altered arbitrarily.
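
A confusion matrix simply counts each combination of actual and predicted label. The sketch below builds one for binary labels; the sample data is invented and the hash layout is a choice made here, not the output format of ML.CONFUSION_MATRIX.

```ruby
# Keys are [actual, predicted] pairs; values are row counts.
def confusion_matrix(actual, predicted)
  matrix = { [1, 1] => 0, [1, 0] => 0, [0, 1] => 0, [0, 0] => 0 }
  actual.zip(predicted).each { |pair| matrix[pair] += 1 }
  matrix
end

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]

matrix = confusion_matrix(actual, predicted)
puts "true positives:  #{matrix[[1, 1]]}"
puts "false negatives: #{matrix[[1, 0]]}"
puts "false positives: #{matrix[[0, 1]]}"
puts "true negatives:  #{matrix[[0, 0]]}"
```

Precision and recall can be read off these four counts, so the matrix is a useful check when the summary metrics look surprising.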
  35. Registering the prediction query with the scheduler

    [Architecture diagram repeated from slide 24.]
  36. Predicting results with the model

        SELECT * FROM ML.PREDICT(MODEL `ml_models.push_open`, (
          SELECT
            IF(open IS NULL, 0, 1) AS label,
            weekly_open_count,
            timeline_show_count,
            ...
        ))

    Note: the numbers shown have been altered arbitrarily.
  37. Predicting results with the model

    Output of the ML.PREDICT query from slide 36. The model is used to predict the value of label for the input data. Note: the numbers shown have been altered arbitrarily.
      • predicted_label: the predicted value
      • predicted_label_probs: the predicted probability for each label
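
The architecture slides select users with `WHERE prob > #{PROB_THRESHOLD}`. The sketch below shows the same selection in Ruby on rows shaped like the prediction output described above. The row layout (an array of label/prob hashes), the user ids and the threshold value are assumptions for illustration; the deck does not state the threshold it used.

```ruby
PROB_THRESHOLD = 0.6 # hypothetical value

# Picks the predicted probability that label is 1 from one prediction row.
def open_probability(row)
  entry = row[:predicted_label_probs].find { |e| e[:label] == 1 }
  entry ? entry[:prob] : 0.0
end

# Returns the ids of users whose open probability exceeds the threshold.
def users_to_notify(rows, threshold)
  rows.select { |row| open_probability(row) > threshold }
      .map { |row| row[:user_id] }
end

rows = [
  { user_id: 101, predicted_label: 1,
    predicted_label_probs: [{ label: 1, prob: 0.82 }, { label: 0, prob: 0.18 }] },
  { user_id: 102, predicted_label: 0,
    predicted_label_probs: [{ label: 1, prob: 0.35 }, { label: 0, prob: 0.65 }] },
  { user_id: 103, predicted_label: 1,
    predicted_label_probs: [{ label: 1, prob: 0.6 }, { label: 0, prob: 0.4 }] }
]

targets = users_to_notify(rows, PROB_THRESHOLD)
puts targets.inspect
```

Using the probability with an explicit threshold, rather than `predicted_label` alone, lets the sender trade precision against recall without retraining the model.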
  38. Running PREDICT

    [Architecture diagram repeated from slide 24.]
  39. Overview of architecture

    [Architecture diagram repeated from slide 24.]
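  40. Motivations & Backgrounds

    ✓ The number of times the "I want to go and hear more" (「話を聞きに行きたい」) button is pressed is one of our key metrics
    ✓ The result will not go into the service as it is, but there are many possible ways to use it
    ✓ We want to know which user behaviors lead to pressing the button
    ✓ It looked quick to try, so we used BigQuery ML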
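  41. Problem Definition

    Task: among users active in the past month, predict the users who pressed the "I want to go and hear more" button between 1 and 7 days ago.

    Explanatory variables: counts of various user actions in the period from 8 to 28 days ago
      • Number of times the "I want to go and hear more" button was pressed
      • Number of views of company job posting detail pages
      • Number of replies to scout messages from companies
      • …etc.

    Objective variable: whether the user pressed the "I want to go and hear more" button in the period from 1 to 7 days ago (1 or 0)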
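  42. A use case I found interesting: In Stack Overflow

    Predicted:
      • Probability of getting an answer
      • Time until an answer is received
      • Probability of receiving a bad rating

    Inputs:
      • Day of the week
      • Time
      • Length of the question text
      • First word of the question title
      • Whether the title ends with "?"
      • Tags used
      • When the account was created

    https://towardsdatascience.com/when-will-stack-overflow-reply-how-to-predict-with-bigquery-553c24b546a3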
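  43. Things to watch out for

    ✓ Managing models and queries
      • How do you manage the query used to create a model?
      • It hurts when a model is deleted or overwritten by mistake
    ✓ Updating models and running Predict periodically is awkward
      • Wantedly has a nice scheduler mechanism, but…
    ✓ When using it in a service, think carefully about the problem being tackled
      • It is not suited to every problem
      • Treat it as one option among several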
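  44. Things to watch out for

    ✓ Managing models and queries
      • How do you manage the query used to create a model?
      • It hurts when a model is deleted or overwritten by mistake
    ✓ Updating models and running Predict periodically is awkward
      • Wantedly has a nice scheduler mechanism, but…
      • A query scheduling feature has been released!
    ✓ When using it in a service, think carefully about the problem being tackled
      • It is not suited to every problem
      • Treat it as one option among several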
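  45. Maybe this is how it should be used?

    ✓ People who have never done machine learning get a hands-on feel for it
    ✓ Before starting a full-scale machine learning project, run a quick analysis with linear regression
    ✓ Quickly solve, with logistic regression, simple classification problems that have not been tackled yet and that would be hard to justify heavy resources for
    ✓ Build and publish toy projects with data that can be made open
    https://towardsdatascience.com/when-will-stack-overflow-reply-how-to-predict-with-bigquery-553c24b546a3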
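  46. Summary

    ✓ BigQuery ML has arrived: ML with nothing but BigQuery
      • No data movement
      • Machine learning just by writing SQL
      • Democratization of machine learning, faster development
    ✓ We tried it inside Wantedly
      • Solving simple problems
      • Investigation before starting a full-scale ML project (checking whether the data is usable)
    ✓ Looking forward to what comes next ✨ Random forest has been implemented…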