Trying Out BigQuery ML

Yuya Matsumura
November 30, 2018

This is the slide deck from my talk at #bq_sushi tokyo #9, 2018 Retrospective (https://bq-sushi.connpass.com/event/106178/).

I explained BigQuery ML and presented a case in which we introduced it into a production service.


Transcript

  1. ©2018 Wantedly, Inc. Trying Out BigQuery ML: Examples of BigQuery ML in Wantedly
    #bq_sushi Tokyo #9, 30.Nov.2018 - Yuya Matsumura - @yu-ya4
  2. Self Introduction
    ✓ Yuya Matsumura (松村 優也)
    ✓ Software Engineer (search/recommendation engineer?)
    ✓ Wantedly, Inc. Recommendation Team (since April 2018)
    ✓ Interested in Information Retrieval and Machine Learning
    @yu-ya4 @yu__ya4
  3. Plug: https://d3m.connpass.com/

  4. About this talk
    ■ What I will talk about
      • How we used BigQuery ML in our services
      • What I thought after trying BigQuery ML
    ■ What I will not talk about
      • Detailed usage of BigQuery ML
      • In-depth discussion of machine learning
  5. Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion
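  7. What is Wantedly? (https://www.wantedly.com/about/overview)
    Wantedly is a business social network that makes work more fun. People use it to meet the teams and jobs they were meant for, to build and manage their professional networks, and to gather business information.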

  8. Products of

  9. Products

  10. Products

  11. Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion
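  12. What is BigQuery ML? (https://cloud.google.com/bigquery/docs/bigqueryml-intro)
    ✓ Announced at Google Cloud Next 2018
    ✓ BigQuery + ML (machine learning)
      • You can do machine learning with your existing SQL tools and skills
      • Democratizes machine learning (data analysts and business teams can use it too!)
    ✓ Easy, because everything happens inside BigQuery!
      • No need to move data to a local environment or elsewhere
      • Faster development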
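  13. What is BigQuery ML? (https://cloud.google.com/bigquery/docs/bigqueryml-intro)
    ✓ Three model types are currently (as of November 2018) supported:
      • Linear regression (predicts a numeric value from data)
      • Binary logistic regression (classifies data as true/false)
      • Multiclass logistic regression (classifies data into three or more classes)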
  14. A brief explanation of machine learning

  15. Linear regression (figure: https://upload.wikimedia.org/wikipedia/commons/b/be/Normdist_regression.png)

  16. Linear regression: predict the numeric value of the target variable from the explanatory variables
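To make "predict a numeric value from explanatory variables" concrete, here is a minimal Ruby sketch of ordinary least squares with a single explanatory variable. It only illustrates the idea; it is not how BigQuery ML trains `linear_reg` models, and the function name and toy data are made up for illustration.

```ruby
# Ordinary least squares for y = intercept + slope * x with one explanatory variable.
def fit_simple_linear_regression(xs, ys)
  raise ArgumentError, "xs and ys must have the same length (>= 2)" if xs.size < 2 || xs.size != ys.size

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  variance = xs.sum { |x| (x - mean_x)**2 }
  raise ArgumentError, "all x values are identical" if variance.zero?

  slope = covariance / variance
  intercept = mean_y - slope * mean_x
  [intercept, slope]
end

# Toy data lying exactly on y = 1 + 2x.
intercept, slope = fit_simple_linear_regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
puts "y = #{intercept} + #{slope} * x" # prints "y = 1.0 + 2.0 * x"
```

With real data the points will not lie on a line, and the fitted slope and intercept minimize the sum of squared errors instead.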

  17. Logistic regression (figure: https://support.google.com/analytics/answer/7586738?hl=ja&ref_topic=3416089): predict whether the target variable (label) is 0 or 1 from the explanatory variables
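A minimal Ruby sketch of binary logistic regression: the model outputs sigmoid(w·x + b) as the probability that label is 1, and the weights are fit by plain batch gradient descent on the log loss. The learning rate, iteration count, and toy data are arbitrary assumptions; BigQuery ML's actual training procedure is not shown here.

```ruby
# p(label = 1 | x) = sigmoid(w . x + b)
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Batch gradient descent on the mean log loss.
def train_logistic_regression(rows, labels, learning_rate: 0.1, iterations: 1000)
  weights = Array.new(rows.first.size, 0.0)
  bias = 0.0
  n = rows.size.to_f
  iterations.times do
    grad_w = Array.new(weights.size, 0.0)
    grad_b = 0.0
    rows.zip(labels).each do |x, y|
      error = sigmoid(weights.zip(x).sum { |w, xi| w * xi } + bias) - y
      x.each_with_index { |xi, j| grad_w[j] += error * xi }
      grad_b += error
    end
    weights = weights.each_with_index.map { |w, j| w - learning_rate * grad_w[j] / n }
    bias -= learning_rate * grad_b / n
  end
  [weights, bias]
end

def predict_probability(weights, bias, x)
  sigmoid(weights.zip(x).sum { |w, xi| w * xi } + bias)
end

# Hypothetical toy data: one feature, label 1 for the larger values.
rows = [[0.0], [1.0], [2.0], [5.0], [6.0], [7.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(rows, labels)
p_low = predict_probability(weights, bias, [0.5])
p_high = predict_probability(weights, bias, [6.5])
puts format("p(label=1 | 0.5) = %.3f, p(label=1 | 6.5) = %.3f", p_low, p_high)
```

Predicting "0 or 1" then amounts to comparing this probability against a threshold such as 0.5.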

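  18. What is BigQuery ML? (https://cloud.google.com/bigquery/docs/bigqueryml-intro)
    Even without a deep understanding of machine learning, you can use SQL queries in BigQuery to easily create and run machine learning models such as linear regression and logistic regression, without moving your data.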
  19. Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion
  20. Why use BigQuery ML?
    ■ We already have an analysis and recommendation platform built on BigQuery
    ■ Our team uses BigQuery every day, so we are used to it
    ■ We had a problem that was just right to tackle
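  22. We already have an analysis and recommendation platform built on BigQuery
    ■ All application logs are stored in BigQuery (~ …)
    ■ The codebase for data analysis is kept separate from the application and runs on a systematized framework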
  23. Overview of architecture
    Diagram labels: BigQuery / merge / registered with the Scheduler / Job (query) execution / query results written to BigQuery or an RDB / Job description (mostly SQL) / Logging / Read
  24. Overview of architecture (same diagram): create a Job by writing a BigQuery query in a Ruby file
  25. Define Job: write the BigQuery results to a table every day

        export do
          table :daily_company_scores
          columns %i[day company_id score]
          mode :update, [:day]
        end

        schedule do
          frequency :daily
        end

        run :bq, <<~SQL
          SELECT
            DATE(_WT_SCHEDULED_TIME, "+9") AS day,
            company_id,
            SUM(action_score) AS score
          FROM `log.company_actions*`
          WHERE _TABLE_SUFFIX = FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY), "+9")
          GROUP BY company_id
        SQL
  26. Define Job: the export block
        export do
          table :daily_company_scores      # name of the output table
          columns %i[day company_id score] # columns to write out
          mode :update, [:day]             # update or replace
        end
  27. Define Job: the schedule block
        schedule do
          frequency :daily # how often the Job runs
        end
  28. Define Job: the run block (run :bq, <<~SQL … SQL) executes the BigQuery query written in the heredoc
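The WHERE clause of this Job picks one daily shard of the `log.company_actions*` tables. Assuming `_WT_SCHEDULED_TIME` is replaced by the Job's scheduled execution timestamp (the slides do not say exactly how Wantedly's scheduler fills it in), the two date expressions can be mirrored in Ruby like this, which is handy for checking which shard a given run will read. The helper names are hypothetical.

```ruby
require "date"

JST_OFFSET = "+09:00"

# Mirrors DATE(_WT_SCHEDULED_TIME, "+9"): the calendar date of the scheduled time in JST.
def scheduled_day(scheduled_time)
  scheduled_time.getlocal(JST_OFFSET).to_date
end

# Mirrors FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY), "+9"):
# subtract 24 hours, then format the JST date as YYYYMMDD (the _TABLE_SUFFIX of the daily shard).
def previous_day_suffix(scheduled_time)
  (scheduled_time - 24 * 60 * 60).getlocal(JST_OFFSET).strftime("%Y%m%d")
end

t = Time.utc(2018, 11, 29, 16, 0, 0) # 2018-11-30 01:00 in JST
puts scheduled_day(t)       # prints 2018-11-30
puts previous_day_suffix(t) # prints 20181129
```

A run scheduled shortly after midnight JST therefore writes `day = 2018-11-30` while aggregating the previous day's shard, `log.company_actions20181129`.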
  29. Overview of architecture (same diagram): add the Job file, open a pull request, and merge it; the Job is then automatically registered with the Kubernetes-based Scheduler and run periodically
  30. Overview of architecture (same diagram)
  31. Overview of architecture (same diagram): services read and use the query results
  32. For details: "Building a Data Analysis Platform with Ruby: The Evolution of Data Analysis in a Rails Application" by Sohei Takeno (https://speakerdeck.com/altech/ruby-dezuo-rudetafen-xi-ji-pan)
  33. Why use BigQuery ML?
    ■ We already have an analysis and recommendation platform built on BigQuery
    ■ Our team uses BigQuery every day, so we are used to it
    ■ We had a problem that was just right to tackle
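  34. BigQuery in Recommendation Team
    ✓ Current situation of the Wantedly Visit Recommendation Team
      • Several areas remain that could likely be improved with simple ideas
      • The team is small (… members, including me)
      • The service is not small (many users)
      • We cannot afford to spend a lot of time on each individual improvement
      • We want to run the improvement cycle quickly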
  35. By the way… (digression)

  36. Digression: Recommendation (repeats the team situation from slide 34)
  37. Do you know what recommendation is? (digression)

  38. What is a recommender system? [Resnick+ 97]
    "It is often necessary to make choices without sufficient personal experience of the alternatives. In everyday life, we rely on recommendations from other people either by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys such as Zagat's restaurant guides. Recommender systems assist and augment this natural social process."
    [Resnick+ 97] P. Resnick and H. R. Varian. Recommender systems. Communications of the ACM, Vol. 40, No. 3, pp. 56–58, 1997.
  39. What is a recommender system? [Konstan+ 03]
    "Recommenders: Tools to help identify worthwhile stuff"
    [Konstan+ 03] J. A. Konstan and J. Riedl. Recommender systems: Collaborating in commerce and communities. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, Tutorial, 2003.
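  40. Examples of information recommendation: recommending products that match a specific user's preferences

  41. Examples of information recommendation: recommending similar or related products (amazon.co.jp); ratings by other users

  42. Examples of information recommendation: recommending popular products (amazon.co.jp); recommending new arrivals

  43. In other words… (digression)

  44. Everything is a Recommendation (digression) ※ This is just my own personal claim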

  45. BigQuery in Recommendation Team (the team situation from slide 34, shown again)
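  46. For example… we want to recommend companies' job postings to a user, using information about which users clicked which postings.
    (Figure: a users × job postings matrix; ⭕ = clicked, - = no information; the task is to predict the value of "?")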
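  47. For example… compute similarities between users from their past clicks, and recommend the postings that similar users clicked! (user-based collaborative filtering)
    (Figure: the same matrix, with users marked as low or high similarity and the posting to recommend marked)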
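The user-based collaborative filtering idea can be sketched in a few lines of Ruby, assuming clicks are held as a hash from user id to the set of clicked posting ids. Cosine similarity between click sets is one common choice, used here as an assumption; the slides do not specify which similarity Wantedly uses, and the data is hypothetical.

```ruby
require "set"

# Cosine similarity between two users' click sets (treated as binary vectors).
def cosine_similarity(a, b)
  return 0.0 if a.empty? || b.empty?

  (a & b).size / Math.sqrt(a.size * b.size)
end

# Score each posting the target user has not clicked by summing the similarities
# of the other users who did click it, then return the best-scoring postings.
def recommend(clicks, target_user, top_n: 3)
  mine = clicks.fetch(target_user)
  scores = Hash.new(0.0)
  clicks.each do |other, theirs|
    next if other == target_user

    similarity = cosine_similarity(mine, theirs)
    next if similarity.zero?

    (theirs - mine).each { |posting| scores[posting] += similarity }
  end
  scores.sort_by { |posting, score| [-score, posting] }.first(top_n).map(&:first)
end

# Hypothetical click data: user id => posting ids the user clicked.
clicks = {
  "alice" => Set["p1", "p2", "p3"],
  "bob"   => Set["p1", "p2", "p4"],
  "carol" => Set["p5"],
  "dave"  => Set["p1", "p6"],
}
p recommend(clicks, "alice") # prints ["p4", "p6"]
```

Bob shares two of Alice's three clicks, so his posting p4 outranks Dave's p6; Carol shares nothing and contributes no score.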
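  48. It looks simple, but there is a lot to think about and do, so the improvement cycle is hard to get going:
    • Is execution speed OK? The data volume is huge
    • How do we do offline evaluation? And online evaluation?
    • Which implementation language?
    • Should we distribute the processing?
    • How do we design the data pipeline?
    • Do we have enough machine resources?
    • How do we validate the data?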
  49. Just do it all in BigQuery

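  50. BigQuery in Recommendation Team
    ■ To run the improvement cycle quickly, we drive improvements by making full use of BigQuery
      • For something like memory-based collaborative filtering, we just write a query first
    ■ We also make full use of BigQuery for other analysis
      • Managing and monitoring metrics such as KPIs with BI tools
      • Evaluating online tests such as A/B tests (confidence intervals, p-values, …)
    We are (I think) pretty used to writing BigQuery queries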
  51. Why use BigQuery ML?
    ■ We already have an analysis and recommendation platform built on BigQuery
    ■ Our team uses BigQuery every day, so we are used to it
    ■ We had a problem that was just right to tackle
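  52. We had a problem that was just right to tackle (we went looking for one)
    ■ It is not directly tied to KPIs, so its priority is low
    ■ It might be quick to do, but it is hard to get motivated…
    ■ It does not seem worth going out of our way to apply machine learning
    It's just right, so let's try it with BigQuery ML!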

  53. Why use BigQuery ML?
    ■ We already have an analysis and recommendation platform built on BigQuery
    ■ Our team uses BigQuery every day, so we are used to it
    ■ We had a problem that was just right to tackle
  54. Why use BigQuery ML?
    ■ We already have an analysis and recommendation platform built on BigQuery
    ■ Our team uses BigQuery every day, so we are used to it
    ■ We had a problem that was just right to tackle
    ■ It simply looked fun
  55. Digression: https://en-jp.wantedly.com/companies/wantedly/post_articles/129482

  56. Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion
  57. An example from Wantedly People
    • News feature (Timeline)
    • We send a push notification about recommended news twice a day (morning and midday)
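  58. Motivations & Backgrounds
    ✓ For users who do not open push notifications, receiving two a day may be a nuisance
    ✓ We want to send the second notification only to users who seem fine with receiving two a day
    ✓ We have other higher-priority tasks, so we cannot spend much time on this
    ✓ It looks like something we can do quickly, so let's try BigQuery ML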
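  59. Problem Definition
    [Among users active in the past month, predict which users opened the push notification one day ago]
    Explanatory variables: counts of the user's various actions between 2 and 28 days ago
      • Number of push notifications opened
      • Number of news articles viewed
      • Number of business cards scanned
      • …etc.
    Target variable: whether the user opened the push notification one day ago (1 or 0)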
  60. Problem Definition (same as slide 59): predict from the explanatory variables whether the target variable (label) is 0 or 1, i.e. a logistic regression problem
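As a rough illustration of this problem definition, the following Ruby sketch turns a user's event log into one training row: action counts over the window from 28 to 2 days ago as features, and whether a push notification was opened one day ago as the label (the same 1/0 convention as `IF(open IS NULL, 0, 1) AS label` in the model query). The `Event` struct, action names, and data are hypothetical.

```ruby
require "date"

# Hypothetical event record: one row per user action.
Event = Struct.new(:user_id, :action, :date)

# Features: per-action counts in the window [today - 28, today - 2].
# Label: 1 if the user opened a push notification on today - 1, else 0.
def build_example(events, user_id, today, actions)
  mine = events.select { |e| e.user_id == user_id }
  window = (today - 28)..(today - 2)
  features = actions.map do |action|
    mine.count { |e| e.action == action && window.cover?(e.date) }
  end
  label = mine.any? { |e| e.action == :push_open && e.date == today - 1 } ? 1 : 0
  [features, label]
end

today = Date.new(2018, 11, 30)
actions = %i[push_open news_view card_scan]
events = [
  Event.new(1, :push_open, Date.new(2018, 11, 29)), # one day ago: only affects the label
  Event.new(1, :push_open, Date.new(2018, 11, 20)), # inside the feature window
  Event.new(1, :news_view, Date.new(2018, 11, 28)), # two days ago: inside the window
  Event.new(1, :news_view, Date.new(2018, 10, 1)),  # older than 28 days: ignored
]
p build_example(events, 1, today, actions) # prints [[1, 1, 0], 1]
```

Keeping the label day out of the feature window is what stops the model from simply reading the answer off its own inputs.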
  61. Overview of architecture (diagram labels)
    • Job that runs the PREDICT query, described in (mostly) SQL → merge → registered with the Scheduler
    • Run the CREATE MODEL query; the created model is saved in BigQuery
    • Run the PREDICT query; the prediction results are stored in BigQuery
    • Read the ids of the matching users and send them a push notification:
        SELECT user_id FROM `push_predict_results` WHERE prob > #{PROB_THRESHOLD}
  62. Creating the MODEL (architecture diagram as in slide 61)
  63. Creating the MODEL
        CREATE MODEL `ml_models.push_open`
        OPTIONS (model_type='logistic_reg') AS
        SELECT
          IF(open IS NULL, 0, 1) AS label,
          weekly_open_count,
          timeline_show_count,
          scan_count,
          ...
        FROM `people.ironna_data*`
        WHERE ...
  64. Creating the MODEL: the CREATE MODEL clause
        CREATE MODEL `ml_models.push_open`
        # Declares that a model is being created and gives the name of the model to create.
        # Alternatives: CREATE MODEL IF NOT EXISTS / CREATE OR REPLACE MODEL
  65. Creating the MODEL: the OPTIONS clause
        OPTIONS (model_type='logistic_reg') AS
        # Specifies additional model options.
        # model_type is either 'logistic_reg' (logistic regression) or 'linear_reg' (linear regression).
        # You can also specify the amount of L1/L2 regularization, the number of training steps,
        # how to split the input data into training and evaluation sets, and more.
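To show what "the amount of L1/L2 regularization" controls, here is a generic Ruby sketch of a regularized logistic-regression objective: the mean log loss plus `l1` times the sum of absolute weights and `l2` times the sum of squared weights. The parameter names and the choice to leave the bias unpenalized are assumptions; this is not BigQuery ML's exact training objective.

```ruby
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Mean log loss plus L1 and L2 penalties on the weights (the bias is not penalized here).
def regularized_log_loss(weights, bias, rows, labels, l1: 0.0, l2: 0.0)
  eps = 1e-12 # keeps Math.log finite if a prediction is exactly 0 or 1
  total = rows.zip(labels).sum { |x, y|
    prob = sigmoid(weights.zip(x).sum { |w, xi| w * xi } + bias)
    -(y * Math.log(prob + eps) + (1 - y) * Math.log(1 - prob + eps))
  }
  data_loss = total / rows.size
  penalty = l1 * weights.sum(&:abs) + l2 * weights.sum { |w| w * w }
  data_loss + penalty
end

rows = [[1.0], [-1.0]]
labels = [1, 0]
plain = regularized_log_loss([2.0], 0.0, rows, labels)
penalized = regularized_log_loss([2.0], 0.0, rows, labels, l1: 0.5, l2: 0.1)
puts format("without penalty: %.4f, with penalty: %.4f", plain, penalized)
```

Larger `l1`/`l2` values make big weights more expensive, which pushes the model toward smaller (L2) or sparser (L1) weights and helps against overfitting.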
  66. Creating the MODEL: the input query
        SELECT
          IF(open IS NULL, 0, 1) AS label,
          weekly_open_count, timeline_show_count, scan_count, ...
        FROM `people.ironna_data*`
        WHERE ...
        # Write the query that outputs the input data.
        # The model learns to predict the column named label
        # (the column name can be changed with an option).
  68. Evaluating the MODEL
        SELECT *
        FROM ML.EVALUATE(MODEL `ml_models.push_open`, (
          SELECT
            IF(open IS NULL, 0, 1) AS label,
            weekly_open_count, timeline_show_count, scan_count, ...
          FROM `people.ironna_data*`
          WHERE ...
        ))
        # Evaluates how well the created model performs
        # (how accurately it predicts label).
  69. Evaluating the MODEL (same ML.EVALUATE query as slide 68) ※ The numbers on the slide were deliberately altered
        # For logistic regression, the output includes:
        # precision: of the rows predicted to have label 1, the fraction whose label actually is 1
        # recall: of the rows whose label actually is 1, the fraction predicted to be 1
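These precision and recall definitions can be checked with a small Ruby sketch that compares actual and predicted 0/1 labels. Returning 0.0 when a ratio is undefined (no predicted or no actual positives) is an arbitrary convention chosen here, and the data is made up.

```ruby
# Precision and recall for 0/1 labels, where 1 is the positive class.
def precision_recall(actual, predicted)
  pairs = actual.zip(predicted)
  tp = pairs.count { |act, pred| act == 1 && pred == 1 } # predicted 1, actually 1
  fp = pairs.count { |act, pred| act == 0 && pred == 1 } # predicted 1, actually 0
  fn = pairs.count { |act, pred| act == 1 && pred == 0 } # predicted 0, actually 1
  precision = (tp + fp).zero? ? 0.0 : tp.fdiv(tp + fp)
  recall = (tp + fn).zero? ? 0.0 : tp.fdiv(tp + fn)
  [precision, recall]
end

actual    = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 1, 0]
precision, recall = precision_recall(actual, predicted)
puts precision # prints 0.5
```

Here 2 of the 4 predicted positives are real (precision 0.5), and 2 of the 3 real positives were found (recall 2/3).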
  70. Evaluating the MODEL (ROC_CURVE) ※ The numbers on the slide were deliberately altered
        SELECT * FROM ML.ROC_CURVE(MODEL `ml_models.push_open`, (
          SELECT ...
        ))
        # You can output the ROC curve too!
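The idea behind an ROC curve can be illustrated with a Ruby sketch that sweeps the decision threshold over the predicted probabilities, records (false positive rate, true positive rate) pairs, and integrates the curve with the trapezoidal rule to get the AUC. The function names and toy data are illustrative only, this is not the exact set of thresholds ML.ROC_CURVE uses, and the sketch assumes both classes are present.

```ruby
# For each distinct score used as a threshold (highest first), predict 1 when
# score >= threshold and record [false positive rate, true positive rate].
def roc_curve(labels, scores)
  positives = labels.count(1)
  negatives = labels.size - positives
  pairs = labels.zip(scores)
  points = [[0.0, 0.0]]
  scores.uniq.sort.reverse_each do |threshold|
    tp = pairs.count { |y, s| y == 1 && s >= threshold }
    fp = pairs.count { |y, s| y == 0 && s >= threshold }
    points << [fp.fdiv(negatives), tp.fdiv(positives)]
  end
  points
end

# Area under the curve by the trapezoidal rule.
def auc(points)
  points.each_cons(2).sum { |(x0, y0), (x1, y1)| (x1 - x0) * (y0 + y1) / 2.0 }
end

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
points = roc_curve(labels, scores)
puts auc(points) # prints 0.75
```

An AUC of 0.75 matches the fraction of (positive, negative) pairs in which the positive gets the higher score: 3 of the 4 pairs here.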
  71. Evaluating the MODEL (CONFUSION_MATRIX) ※ The numbers on the slide were deliberately altered
        SELECT * FROM ML.CONFUSION_MATRIX(MODEL `ml_models.push_open`, (
          SELECT ...
        ))
        # You can output the confusion matrix too!
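A confusion matrix simply counts how often each actual label was predicted as each label. A minimal Ruby sketch, with rows for actual labels and columns for predicted labels (a layout chosen here for illustration; the data is made up):

```ruby
# matrix[i][j] counts rows whose actual label is labels[i]
# and whose predicted label is labels[j].
def confusion_matrix(actual, predicted, labels: [0, 1])
  matrix = Array.new(labels.size) { Array.new(labels.size, 0) }
  actual.zip(predicted).each do |act, pred|
    matrix[labels.index(act)][labels.index(pred)] += 1
  end
  matrix
end

p confusion_matrix([1, 1, 1, 0, 0], [1, 1, 0, 0, 0]) # prints [[2, 0], [1, 2]]
```

The diagonal holds the correct predictions; the off-diagonal cells are the false positives (top right) and false negatives (bottom left) that precision and recall are built from.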
  72. Registering the prediction query with the scheduler (architecture diagram as in slide 61)
  73. Predicting results with the model ※ The numbers on the slide were deliberately altered
        SELECT *
        FROM ML.PREDICT(MODEL `ml_models.push_open`, (
          SELECT
            IF(open IS NULL, 0, 1) AS label,
            weekly_open_count, timeline_show_count, ...
        ))
  74. Predicting results with the model (same ML.PREDICT query as slide 73) ※ The numbers on the slide were deliberately altered
        # Uses the model to predict the value of label for the input data.
        # predicted_label: the predicted value
        # predicted_label_probs: the predicted probability of each label
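The architecture then reads the users whose probability exceeds a threshold (`WHERE prob > #{PROB_THRESHOLD}`). Assuming each prediction row carries `predicted_label_probs` as a list of label/probability pairs and that `prob` means the probability of label 1, the selection step could look like this in Ruby. The row layout and the threshold value 0.6 are hypothetical; the slides do not state the real threshold.

```ruby
PROB_THRESHOLD = 0.6 # hypothetical; the real threshold is not given in the slides

# Probability assigned to label 1, or 0.0 if the row has no entry for label 1.
def positive_probability(row)
  entry = row[:predicted_label_probs].find { |e| e[:label] == 1 }
  entry ? entry[:prob] : 0.0
end

# Roughly: SELECT user_id FROM push_predict_results WHERE prob > PROB_THRESHOLD
def users_to_notify(rows, threshold = PROB_THRESHOLD)
  rows.select { |row| positive_probability(row) > threshold }.map { |row| row[:user_id] }
end

# Hypothetical prediction rows, one per user.
rows = [
  { user_id: 101, predicted_label_probs: [{ label: 1, prob: 0.82 }, { label: 0, prob: 0.18 }] },
  { user_id: 102, predicted_label_probs: [{ label: 1, prob: 0.35 }, { label: 0, prob: 0.65 }] },
  { user_id: 103, predicted_label_probs: [{ label: 1, prob: 0.61 }, { label: 0, prob: 0.39 }] },
]
p users_to_notify(rows) # prints [101, 103]
```

Using the probability rather than `predicted_label` lets the threshold be tuned independently of the model's default 0.5 cut-off.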
  75. Running PREDICT (architecture diagram as in slide 61)
  76. Overview of architecture (the diagram from slide 61, shown again)
  77. And the result…

  78. Results: with (roughly) one day of implementation work, the push notification open rate improved substantially

  79. Another example
    • Predicting whether a user will press the 「話を聞きに行きたい」 ("I'd like to hear more") button (https://www.wantedly.com/projects/221745)

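  80. Motivations & Backgrounds
    ✓ The number of presses of the "I'd like to hear more" button is one of our key metrics
    ✓ We will not put the model into the service as-is, but there are many conceivable ways to use it
    ✓ We want to know which user behaviors lead to pressing the button
    ✓ It looks like something we can try quickly, so let's use BigQuery ML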
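  81. Problem Definition
    [Among users active in the past month, predict which users pressed the "I'd like to hear more" button between 1 and 7 days ago]
    Explanatory variables: counts of the user's various actions between 8 and 28 days ago
      • Number of times the user pressed "I'd like to hear more"
      • Number of times the user viewed a company's job posting detail page
      • Number of replies to scout messages from companies
      • …etc.
    Target variable: whether the user pressed the "I'd like to hear more" button between 1 and 7 days ago (1 or 0)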
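  82. A use case I found interesting: In Stack Overflow (https://towardsdatascience.com/when-will-stack-overflow-reply-how-to-predict-with-bigquery-553c24b546a3)
    Predicted: • probability of getting an answer • time until an answer • probability of getting a negative rating
    Features: • day of the week • hour • length of the question body • first word of the question title • whether the title ends with "?" • tags used • when the account was created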
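Several of these features are simple transformations of a single question. The Ruby sketch below computes some of them; the method name and the input fields (`title`, `body`, `created_at`) are assumptions for illustration, not taken from the linked article.

```ruby
# A few of the listed features, computed from one question.
def question_features(title:, body:, created_at:)
  {
    weekday: created_at.wday, # 0 = Sunday ... 6 = Saturday
    hour: created_at.hour,
    body_length: body.length,
    first_title_word: title.split.first.to_s.downcase,
    title_ends_with_question_mark: title.strip.end_with?("?"),
  }
end

features = question_features(
  title: "How do I parse JSON in Ruby?",
  body: "I have a JSON string and want a Hash.",
  created_at: Time.utc(2018, 11, 30, 10, 0, 0), # a Friday
)
p features
```

In BigQuery the same columns would come from SQL expressions over the question table, and the resulting rows would feed a CREATE MODEL query like the one used for push notifications.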
  83. Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion
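  84. What was good
    ■ Going from starting on the problem to building the first model took almost no time at all
      • Not having to think about moving data is wonderful!
    ■ It can do more than you might expect
      • Regularization, learning-rate settings, early stopping, handling class imbalance…
    ■ Sharing results is easy
      • Post a link to the execution results, tell people where the table is, or just share the query
    ■ Unlimited resources
      • All you have to do is make an offering to Master Google…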
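  85. Things to watch out for
    ■ Managing models and queries
      • How do you manage the queries used to create models?
      • It hurts if a model is deleted or overwritten by mistake
    ■ Updating models and running Predict periodically is awkward
      • Wantedly has a nice scheduler system, but…
    ■ When using it in a service, think carefully about the problem you are tackling
      • It is not suited to every problem
      • Treat it as one option among others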
  86. Things to watch out for (as in slide 85, with one addition under the scheduling point)
      • BigQuery's query scheduling feature has been released!
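  87. Maybe this is how to use it?
    ✓ Let people who have never done machine learning get a hands-on feel for it
    ✓ Do a quick linear-regression analysis before starting a full-scale machine learning project
    ✓ Quickly solve simple classification problems with logistic regression: ones we have not touched yet, but that are hard to justify investing serious resources in
    ✓ Build toys with data that can be made public and publish them (https://towardsdatascience.com/when-will-stack-overflow-reply-how-to-predict-with-bigquery-553c24b546a3)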
  88. Impressions

  89. It was fun

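  90. Summary
    ■ BigQuery ML has arrived: ML with BigQuery alone
      • No data movement
      • Machine learning just by writing SQL
      • Democratizes machine learning and speeds up development
    ■ We tried it at Wantedly
      • Solving simple problems
      • Investigation before starting a full-scale ML project (checking whether the data is usable)
    ■ Looking forward to what comes next ✨ Random forests are apparently being implemented…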
  91. We are hiring!! https://www.wantedly.com/projects/221745