Slide 1

Slide 1 text

©2018 Wantedly, Inc.
Trying Out BigQuery ML: Examples of BigQuery ML in Wantedly
#bq_sushi Tokyo #9, 30 Nov. 2018
Yuya Matsumura (@yu-ya4)

Slide 2

Slide 2 text

Self Introduction
✓ Yuya Matsumura (松村 優也)
✓ Software Engineer (search and recommendation engineer?)
✓ Wantedly, Inc. Recommendation Team (since April 2018)
✓ Interested in Information Retrieval, Machine Learning
@yu-ya4 @yu__ya4

Slide 3

Slide 3 text

Promotion
https://d3m.connpass.com/

Slide 4

Slide 4 text

About this talk
✓ What I will cover
• How we used BigQuery ML in our services
• What I thought after trying BigQuery ML
✓ What I will not cover
• Detailed usage of BigQuery ML
• In-depth discussion of machine learning

Slide 5

Slide 5 text

Agenda
• What is Wantedly
• What is BigQuery ML
• Why use BigQuery ML
• How to use BigQuery ML
• Discussion

Slide 7

Slide 7 text

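What is Wantedly?
Wantedly is a business social network that makes work more engaging.
People use it to meet the team or job they were meant to find, to build and manage their professional network, and to gather business information.
https://www.wantedly.com/about/overview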

Slide 8

Slide 8 text

Products of Wantedly

Slide 9

Slide 9 text

Products

Slide 10

Slide 10 text

Products

Slide 11

Slide 11 text

Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion

Slide 12

Slide 12 text

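What is BigQuery ML?
✓ Announced at Google Cloud Next 2018
✓ BigQuery + ML (machine learning)
• Machine learning with existing SQL tools and skills
• Democratization of machine learning (data analysts and business teams can use it too!)
✓ Easy, because everything is done inside BigQuery
• No need to move data to a local environment or elsewhere
• Faster development
https://cloud.google.com/bigquery/docs/bigqueryml-intro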

Slide 13

Slide 13 text

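What is BigQuery ML?
✓ Three model types are supported at the time of this talk
• Linear regression (estimate a numeric value from data)
• Binary logistic regression (decide true/false from data)
• Multiclass logistic regression (classify data into three or more classes)
https://cloud.google.com/bigquery/docs/bigqueryml-intro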

Slide 14

Slide 14 text

A very brief explanation of machine learning

Slide 15

Slide 15 text

Linear regression
https://upload.wikimedia.org/wikipedia/commons/b/be/Normdist_regression.png

Slide 16

Slide 16 text

Linear regression
Predict the numeric value of the target variable from the explanatory variables.
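
The idea on this slide can be illustrated outside BigQuery. The following is a minimal Python sketch, not BigQuery ML's implementation: the function name `fit_line` and the toy data are assumptions made purely for illustration. It fits one explanatory variable to a numeric target by ordinary least squares and then predicts a new value.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 4.0 + intercept  # numeric prediction for an unseen x
```

With several explanatory variables the same principle applies, but the fit is usually done with matrix operations or an iterative optimizer.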

Slide 17

Slide 17 text

Logistic regression
Predict whether the target variable (label) is 0 or 1 from the explanatory variables.
https://support.google.com/analytics/answer/7586738?hl=ja&ref_topic=3416089
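
As a rough illustration of predicting a 0/1 label, here is a minimal Python sketch of logistic regression trained by batch gradient descent. It is not how BigQuery ML trains its models; the function names, learning rate, step count, and toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Fit P(label = 1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data: small x tends to label 0, large x to label 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, labels)
prob_low = sigmoid(w * 0.5 + b)   # probability of label 1 for a small x
prob_high = sigmoid(w * 4.5 + b)  # probability of label 1 for a large x
```

The model outputs a probability; turning it into a 0/1 decision requires a threshold, which is a separate design choice.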

Slide 18

Slide 18 text

What is BigQuery ML?
Even without a deep understanding of machine learning, you can use SQL queries in BigQuery to create and run machine learning models such as linear regression and logistic regression easily, without moving any data.
https://cloud.google.com/bigquery/docs/bigqueryml-intro

Slide 19

Slide 19 text

Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion

Slide 20

Slide 20 text

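Why use BigQuery ML?
✓ An analytics and recommendation platform built on BigQuery is already in place
✓ The team uses BigQuery day to day and is comfortable with it
✓ There happened to be a problem that looked like a good fit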

Slide 22

Slide 22 text

An analytics and recommendation platform built on BigQuery is in place
✓ All application logs are stored in BigQuery
✓ The code base for data analysis is separated out and systematized

Slide 23

Slide 23 text

Overview of architecture
Diagram labels:
• Write a Job (mostly SQL)
• merge
• Register with the Scheduler
• Run the Job (query) on BigQuery
• Write the query results out to BigQuery or an RDB
• Logging
• Read

Slide 24

Slide 24 text

Overview of architecture (same diagram as Slide 23)
Highlighted step: create a Job by writing a BigQuery query in a Ruby file.

Slide 25

Slide 25 text

Define Job
Write the BigQuery query results out to a table on a daily schedule.

    export do
      table :daily_company_scores
      columns %w[:day, :company_id, :score]
      mode :update, [:day]
    end

    schedule do
      frequency :daily
    end

    run :bq, <

Slide 29

Slide 29 text

Overview of architecture (same diagram as Slide 23)
Highlighted step: add the Job file, open a pull request, and merge it. The Job is then registered automatically with the Kubernetes-based Scheduler and runs periodically.

Slide 31

Slide 31 text

Overview of architecture (same diagram as Slide 23)
Highlighted step: the service uses the query results.

Slide 32

Slide 32 text

For details, see:
"Building a Data Analytics Platform with Ruby: The Evolution of Data Analysis in a Rails Application" by Sohei Takeno
https://speakerdeck.com/altech/ruby-dezuo-rudetafen-xi-ji-pan

Slide 33

Slide 33 text

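Why use BigQuery ML?
✓ An analytics and recommendation platform built on BigQuery is already in place
✓ The team uses BigQuery day to day and is comfortable with it
✓ There happened to be a problem that looked like a good fit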

Slide 34

Slide 34 text

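BigQuery in Recommendation Team
✓ The current situation of the Wantedly Visit Recommendation Team
• Several areas remain that could probably be improved with simple ideas
• The team is small (counting me)
• The service is not small: it has many users
• We cannot afford to spend unlimited time on each individual improvement
• We want to turn the improvement cycle quickly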

Slide 35

Slide 35 text

Aside: by the way…

Slide 36

Slide 36 text

Aside: "Recommendation"
(Repeats the team-situation list from Slide 34, calling out the word "Recommendation" in the team name.)

Slide 37

Slide 37 text

Aside: do you know what "Recommendation" means?

Slide 38

Slide 38 text

What is a recommender system? [Resnick+ 97]
"It is often necessary to make choices without sufficient personal experience of the alternatives. In everyday life, we rely on recommendations from other people either by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys such as Zagat's restaurant guides. Recommender systems assist and augment this natural social process."
[Resnick+ 97] P. Resnick and H. R. Varian. Recommender systems. Communications of the ACM, Vol. 40, No. 3, pp. 56–58, 1997.

Slide 39

Slide 39 text

What is a recommender system? [Konstan+ 03]
"Recommenders: Tools to help identify worthwhile stuff"
[Konstan+ 03] J. A. Konstan and J. Riedl. Recommender systems: Collaborating in commerce and communities. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, Tutorial, 2003.

Slide 40

Slide 40 text

Examples of recommendation
• Recommending products that match a particular user's preferences

Slide 41

Slide 41 text

Examples of recommendation (amazon.co.jp)
• Recommending similar or related products
• Ratings by other users

Slide 42

Slide 42 text

Examples of recommendation (amazon.co.jp)
• Recommending popular products
• Recommending newly arrived products

Slide 43

Slide 43 text

Aside: in other words…

Slide 44

Slide 44 text

Aside: Everything is a Recommendation
* This is just my own personal claim.

Slide 46

Slide 46 text

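For example…
Using the information that users clicked on companies' job postings, we want to recommend job postings to a given user.
(Figure: a postings × users matrix. ⭕ = clicked, - = no information. The task is to predict what the cell marked "?" will be.)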

Slide 47

Slide 47 text

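For example…
Compute the similarity between users from their past click data, then recommend job postings that similar users clicked! (user-based collaborative filtering)
(Figure: the same postings × users matrix, with users marked "low similarity" and "high similarity" and the recommended cell marked.)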
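
A minimal Python sketch of user-based collaborative filtering on click data follows. The talk does not say which similarity measure is used, so Jaccard similarity is an assumption here, as are the function names, the user and posting ids, and the choice to look only at the single most similar user.

```python
def jaccard(a, b):
    """Similarity of two click sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, clicks, top_k_users=1):
    """Recommend postings clicked by the users most similar to `target`."""
    others = [u for u in clicks if u != target]
    others.sort(key=lambda u: jaccard(clicks[target], clicks[u]), reverse=True)
    scores = {}
    for user in others[:top_k_users]:
        sim = jaccard(clicks[target], clicks[user])
        # Only postings the target has not clicked yet are candidates.
        for posting in clicks[user] - clicks[target]:
            scores[posting] = scores.get(posting, 0.0) + sim
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical click history: user -> set of clicked posting ids.
clicks = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p3", "p4"},
    "carol": {"p5"},
}

suggestions = recommend("alice", clicks)
```

Here "bob" is the most similar user to "alice", so the one posting he clicked that she has not seen becomes the recommendation.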

Slide 48

Slide 48 text

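It looks simple, but there is a lot to think about and do…
• Is execution speed acceptable?
• The data volume is huge
• How do we do offline evaluation?
• What about online evaluation?
• Which implementation language?
• Should we use distributed processing?
• How do we design the data pipeline?
• Do we have enough machine resources?
• How do we validate the data?
As a result, the improvement cycle is hard to turn.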

Slide 49

Slide 49 text

Just do it all in BigQuery

Slide 50

Slide 50 text

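BigQuery in Recommendation Team
✓ To turn the improvement cycle quickly, we drive improvements by making full use of BigQuery
• For something like memory-based collaborative filtering, we simply write it as a query
✓ We also make full use of BigQuery for other analyses
• Managing and monitoring metrics such as KPIs with BI tools
• Evaluating online tests such as A/B tests (confidence intervals, p-values, …)
We are fairly used to writing BigQuery queries (or so we like to think).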

Slide 51

Slide 51 text

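Why use BigQuery ML?
✓ An analytics and recommendation platform built on BigQuery is already in place
✓ The team uses BigQuery day to day and is comfortable with it
✓ There happened to be a problem that looked like a good fit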

Slide 52

Slide 52 text

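There happened to be a problem that looked like a good fit (we went looking for one)
✓ Low priority because it is not directly tied to a KPI
✓ It might be quick to do, but it is hard to get motivated…
✓ It does not seem worth a full machine learning effort
It is just the right size, so let's try it with BigQuery ML!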

Slide 54

Slide 54 text

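Why use BigQuery ML?
✓ An analytics and recommendation platform built on BigQuery is already in place
✓ The team uses BigQuery day to day and is comfortable with it
✓ There happened to be a problem that looked like a good fit
✓ It simply looked fun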

Slide 55

Slide 55 text

Aside
https://en-jp.wantedly.com/companies/wantedly/post_articles/129482

Slide 56

Slide 56 text

Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion

Slide 57

Slide 57 text

An example from Wantedly People
• News feature (Timeline)
• Recommended news is announced by push notification twice a day (morning and noon)

Slide 58

Slide 58 text

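Motivations & Backgrounds
✓ For users who do not open push notifications, receiving two a day may be a nuisance
✓ We want to send two a day only to users who seem fine with receiving two
✓ There are other high-priority tasks, so we cannot spend much time on this
✓ It looks quick to do, so we try BigQuery ML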

Slide 59

Slide 59 text

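Problem Definition
Among users active in the past month, predict the users who opened a push notification one day ago.
Explanatory variables: counts of various user actions during the period from 2 to 28 days ago
• Number of push notifications opened
• Number of news articles viewed
• Number of business cards scanned
…etc.
Target variable: whether the user opened the push notification one day ago (1 or 0)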

Slide 60

Slide 60 text

Problem Definition
Same definition as Slide 59, framed as: predict whether the target variable (label) is 0 or 1 from the explanatory variables.
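
To make the time windows concrete, here is a minimal Python sketch that turns one user's event log into explanatory variables (action counts from 2 to 28 days ago) and a 0/1 label (push opened one day ago). The event format and the action names such as `push_open` are hypothetical; the real pipeline in the talk is written as BigQuery queries.

```python
from collections import Counter


def build_example(events, feature_window=(2, 28), label_day=1):
    """events: list of (days_ago, action) tuples for a single user."""
    start, end = feature_window
    # Explanatory variables: how often each action occurred inside the window.
    features = Counter(
        action for days_ago, action in events if start <= days_ago <= end
    )
    # Target variable: 1 if a push notification was opened on the label day.
    label = int(any(
        days_ago == label_day and action == "push_open"
        for days_ago, action in events
    ))
    return dict(features), label


# Hypothetical event log for one user.
events = [
    (1, "push_open"),
    (3, "push_open"),
    (3, "news_view"),
    (10, "card_scan"),
    (40, "news_view"),  # outside the feature window, ignored
]

features, label = build_example(events)
```

Keeping the label day outside the feature window matters: otherwise the model would see the answer among its inputs.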

Slide 61

Slide 61 text

Overview of architecture
Diagram labels:
• Run the CREATE MODEL query; the created model is saved in BQ
• Write a Job (mostly SQL) that runs the PREDICT query
• merge
• Register with the Scheduler
• The PREDICT query runs
• The Predict results are stored in BQ
• Read the ids of the matching users:
    SELECT user_id FROM `push_predict_results` WHERE prob > #{PROB_THRESHOLD}
• Send push notifications to the matching users
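
The final selection step in the diagram is a simple threshold on the predicted probability. A minimal Python equivalent of that `SELECT ... WHERE prob > #{PROB_THRESHOLD}` filter is sketched below; the threshold value and the sample rows are assumptions, since the slides do not give them.

```python
PROB_THRESHOLD = 0.6  # hypothetical value; the slides do not state it


def users_to_notify(predictions, threshold=PROB_THRESHOLD):
    """Return the ids of users whose predicted open probability exceeds the threshold."""
    return [row["user_id"] for row in predictions if row["prob"] > threshold]


# Hypothetical rows shaped like the `push_predict_results` table.
predictions = [
    {"user_id": 1, "prob": 0.9},
    {"user_id": 2, "prob": 0.6},  # not strictly greater than the threshold
    {"user_id": 3, "prob": 0.2},
]

selected = users_to_notify(predictions)
```

Raising the threshold sends the second notification to fewer, more likely openers; lowering it reaches more users at the cost of more unwanted notifications.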

Slide 62

Slide 62 text

Creating the MODEL (same diagram as Slide 61)

Slide 63

Slide 63 text

Creating the MODEL

    CREATE MODEL `ml_models.push_open`
    OPTIONS (model_type='logistic_reg') AS
    SELECT
      IF(open IS NULL, 0, 1) AS label,
      weekly_open_count,
      timeline_show_count,
      scan_count,
      ...
    FROM `people.ironna_data*`
    WHERE ...

Slide 64

Slide 64 text

Creating the MODEL (same query as Slide 63; highlighted part)

    CREATE MODEL `ml_models.push_open`

    # Declares that a model is being created and names the model to create.
    # Other forms:
    #   CREATE MODEL IF NOT EXISTS
    #   CREATE OR REPLACE MODEL

Slide 65

Slide 65 text

Creating the MODEL (same query as Slide 63; highlighted part)

    OPTIONS (model_type='logistic_reg') AS

    # Specifies additional model options.
    # model_type is either 'logistic_reg' (logistic regression) or 'linear_reg' (linear regression).
    # You can also specify the amount of L1 and L2 regularization, the number of training steps,
    # how to split the input data into a training set and an evaluation set, and so on.

Slide 66

Slide 66 text

Creating the MODEL (same query as Slide 63; highlighted part)

    SELECT
      IF(open IS NULL, 0, 1) AS label,
      weekly_open_count,
      timeline_show_count,
      scan_count,
      ...
    FROM `people.ironna_data*`
    WHERE ...

    # The query that produces the input data.
    # The model predicts the column named label
    # (the column name can be changed with an option).

Slide 68

Slide 68 text

Evaluating the MODEL

    SELECT *
    FROM ML.EVALUATE(MODEL `ml_models.push_open`, (
      SELECT
        IF(open IS NULL, 0, 1) AS label,
        weekly_open_count,
        timeline_show_count,
        scan_count,
        ...
      FROM `people.ironna_data*`
      WHERE ...
    ))

    # Evaluates how well the created model performs
    # (how accurately it can predict label).

Slide 69

Slide 69 text

Evaluating the MODEL (same query as Slide 68)
* The figures shown have been altered.
For logistic regression, the output includes:
• precision: of the rows predicted as label 1, the fraction that are actually 1
• recall: of the rows that are actually label 1, the fraction that were predicted as 1
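
The two definitions above can be checked by hand. The following minimal Python sketch computes precision and recall from actual and predicted labels; the function name and the sample labels are assumptions for illustration and are unrelated to the (altered) figures on the slide.

```python
def precision_recall(actual, predicted):
    """Precision and recall for the positive class (label 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for eight users.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
```

In this toy data three users are predicted as 1 and two of them really are 1, while only two of the four actual openers are found.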

Slide 70

Slide 70 text

Evaluating the MODEL (ROC_CURVE)

    SELECT *
    FROM ML.ROC_CURVE(MODEL `ml_models.push_open`, (
      SELECT ...

    # The ROC curve can be output too!

* The figures shown have been altered.
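
For readers unfamiliar with ROC curves, the following minimal Python sketch shows how the points of such a curve are derived: for each threshold, compute the false positive rate and the true positive rate. The function name, labels, scores, and thresholds are assumptions for illustration, not output of `ML.ROC_CURVE`.

```python
def roc_points(labels, scores, thresholds):
    """Return (false_positive_rate, true_positive_rate) for each threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # A row is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = roc_points(labels, scores, thresholds=[1.0, 0.75, 0.5, 0.0])
```

Sweeping the threshold from high to low moves the curve from (0, 0) to (1, 1); a better model bends the curve toward the top-left corner.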

Slide 71

Slide 71 text

Evaluating the MODEL (CONFUSION_MATRIX)

    SELECT *
    FROM ML.CONFUSION_MATRIX(MODEL `ml_models.push_open`, (
      SELECT ...

    # The confusion matrix can be output too!

* The figures shown have been altered.
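
A confusion matrix simply counts each combination of actual and predicted label. The minimal Python sketch below builds one for binary labels; the layout (rows are actual labels 0 and 1, columns are predicted labels 0 and 1) and the sample data are assumptions for illustration and may not match the exact layout of the `ML.CONFUSION_MATRIX` output.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Rows: actual label 0, 1. Columns: predicted label 0, 1."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in (0, 1)] for a in (0, 1)]


# Hypothetical labels for eight users.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

matrix = confusion_matrix(actual, predicted)
```

The diagonal holds the correct predictions; the off-diagonal cells show which kind of mistake the model makes more often.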

Slide 72

Slide 72 text

Registering the prediction query with the scheduler (same diagram as Slide 61)

Slide 73

Slide 73 text

Predicting results with the model

    SELECT *
    FROM ML.PREDICT(MODEL `ml_models.push_open`, (
      SELECT
        IF(open IS NULL, 0, 1) AS label,
        weekly_open_count,
        timeline_show_count,
        ...
    ))

* The figures shown have been altered.

Slide 74

Slide 74 text

Predicting results with the model (same query as Slide 73)
* The figures shown have been altered.
The model is used to predict the value of label for the input data.
• predicted_label: the predicted value
• predicted_label_probs: the predicted probability for each label
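
Downstream code typically needs the probability of one particular label out of `predicted_label_probs`. The minimal Python sketch below assumes each prediction row has been loaded as a dictionary whose `predicted_label_probs` is a list of `{label, prob}` entries; that row shape, the helper name, and the sample values are assumptions modeled on the column descriptions above.

```python
def prob_of_label(row, label=1):
    """Return the predicted probability of `label`, or None if it is absent."""
    for entry in row["predicted_label_probs"]:
        if entry["label"] == label:
            return entry["prob"]
    return None


# Hypothetical prediction row for one user.
row = {
    "user_id": 42,
    "predicted_label": 1,
    "predicted_label_probs": [
        {"label": 1, "prob": 0.83},
        {"label": 0, "prob": 0.17},
    ],
}

open_probability = prob_of_label(row)
```

A value extracted this way is what a threshold such as `PROB_THRESHOLD` would be compared against.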

Slide 75

Slide 75 text

Running PREDICT (same diagram as Slide 61)

Slide 77

Slide 77 text

The result…

Slide 78

Slide 78 text

Results
With (roughly) one day of implementation, the open rate of push notifications improved substantially.

Slide 79

Slide 79 text

Another example
• Predicting whether a user will press the "話を聞きに行きたい" ("I'd like to go and hear more") button
https://www.wantedly.com/projects/221745

Slide 80

Slide 80 text

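Motivations & Backgrounds
✓ The number of times the "I'd like to go and hear more" button is pressed is one of our key metrics
✓ This will not go into production as is, but there are many conceivable ways to use it
✓ We want to know which user behaviors lead to pressing the button
✓ It looks quick to try, so we use BigQuery ML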

Slide 81

Slide 81 text

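Problem Definition
Among users active in the past month, predict the users who pressed the "I'd like to go and hear more" button between 1 and 7 days ago.
Explanatory variables: counts of various user actions during the period from 8 to 28 days ago
• Number of times the "I'd like to go and hear more" button was pressed
• Number of views of companies' job posting detail pages
• Number of replies to scout messages from companies
…etc.
Target variable: whether the user pressed the "I'd like to go and hear more" button during the period from 1 to 7 days ago (1 or 0)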

Slide 82

Slide 82 text

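A use case I found interesting: In Stack Overflow
Predicted:
• Probability of getting an answer
• Time until an answer arrives
• Probability of receiving a bad rating
Inputs:
• Day of the week
• Time
• Length of the question text
• First word of the question title
• Whether the title ends with "?"
• Tags used
• When the account was created
https://towardsdatascience.com/when-will-stack-overflow-reply-how-to-predict-with-bigquery-553c24b546a3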

Slide 83

Slide 83 text

Agenda: What is Wantedly / What is BigQuery ML / Why use BigQuery ML / How to use BigQuery ML / Discussion

Slide 84

Slide 84 text

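What went well
✓ Going from starting on the problem to having a first model took almost no time
• Not having to think about moving data is wonderful!
✓ It can do more than expected
• Regularization, learning rate settings, early stopping, handling class imbalance…
✓ Sharing results is easy
• Send a link to the query results, point to the table, or just send the query
✓ Unlimited resources
• You only have to make your offering to Google…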

Slide 85

Slide 85 text

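Things to watch out for
✓ Managing models and queries
• How do you manage the queries used to create models?
• It hurts when a model is deleted or overwritten by mistake
✓ Updating models and running Predict periodically is awkward
• Wantedly happens to have a nice scheduler mechanism, but…
✓ When using it in a service, think carefully about which problem to tackle
• It is not suited to every problem
• Treat it as one option among several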

Slide 86

Slide 86 text

Things to watch out for (same list as Slide 85, with one addition)
✓ Updating models and running Predict periodically is awkward
• Wantedly happens to have a nice scheduler mechanism, but…
• A query scheduling feature has now been released!

Slide 87

Slide 87 text

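Maybe this is how it should be used?
✓ People who have never done machine learning can get a hands-on feel for it
✓ Before starting a full-scale machine learning project, run a quick analysis with linear regression
✓ Quickly solve, with logistic regression, simple classification problems that nobody has touched yet and that would not justify a heavy investment of resources
✓ Build toy projects on data that can be made open and publish them
https://towardsdatascience.com/when-will-stack-overflow-reply-how-to-predict-with-bigquery-553c24b546a3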

Slide 88

Slide 88 text

Impressions

Slide 89

Slide 89 text

It was fun

Slide 90

Slide 90 text

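Summary
✓ BigQuery ML has arrived: machine learning using BigQuery alone
• No data movement
• Machine learning just by writing SQL
• Democratization of machine learning, faster development
✓ We tried it at Wantedly
• Solving simple problems
• Investigation before starting a full-scale ML project (checking whether the data is usable)
✓ Looking forward to what comes next ✨ (random forest is implemented…)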

Slide 91

Slide 91 text

We are hiring!!
https://www.wantedly.com/projects/221745