Upgrade to Pro — share decks privately, control downloads, hide ads and more …

ウェブ企業の非研究者ポジションで行うサイエンス

 ウェブ企業の非研究者ポジションで行うサイエンス

ウェブサイエンス研究会オープンセミナーvol.6 (2018/05/26)

Taro Takaguchi

May 30, 2018
Tweet

More Decks by Taro Takaguchi

Other Decks in Science

Transcript

  1. © LINE Data Labs ࣗݾ঺հ ߴޱ ଠ࿕ʢ͔͙ͨͪ ͨΖ͏ʣ LINE Data

    Labs σʔλαΠΤϯςΟετʢ2017/10 -ʣ ΞΧσϛΞͰͷܦྺ 2010 - 2013 ౦ژେֶେֶӃ ৘ใཧ޻ֶܥݚڀՊ ത࢜՝ఔ 2013 - 2016 ࠃཱ৘ใֶݚڀॴ ಛ೚ݚڀһ 2016 - 2017 ৘ใ௨৴ݚڀػߏ ςχϡΞτϥοΫݚڀһ ΞΧσϛΞͰͷઐ໳෼໺ ωοτϫʔΫՊֶ ର৅ɿ࣮ࣾձͷͭͳ͕Γߏ଄ʢιʔγϟϧάϥϑɺ΢Σϒʣ खஈɿཧ࿦Ϟσϧ + σʔλ෼ੳ Network data from Rocha et al., PLOS Comput. Biol. 7, e1001109 (2011)
  2. © LINE Data Labs ΞΧσϛΞͰͷݚڀ׆ಈ 2 2 2 1 1

    1 3 3 4 4 3 4 • ࣮σʔλ෼ੳ • ఻೻Ϟσϧʢײછ঱ɺ৘ใ֦ࢄʣͷཧ࿦Ϟσϧղੳ ࣌ؒతʹมԽ͢ΔιʔγϟϧɾωοτϫʔΫͷɿ ࣌ࠁ • ࠪಡ෇͖࿦จ ܭ22ຊ
 ʢ૯Ҿ༻ճ਺ 300+ ճ in Google Scholarʣ • ࠃࡍձٞ NetSci 2017 ϓϩάϥϜҕһ • ࠃࡍڞಉݚڀɿε΢ΣʔσϯɺϕϧΪʔɺϑϥϯεɺΠΪϦε
  3. © LINE Data Labs DATA LABS ͱ͸ Data Labs Data

    Science Machine Learning Engineer Planner • LINE ؔ࿈֤αʔϏεͷσʔλΛԣஅతʹѻ͏ઐ໳૊৫ • 2016/04 ઃஔ LINE ϑΝϛϦʔαʔϏε ࣄۀ෦ Ϗδωε্ͷ՝୊ σʔλΛ׆༻ͨ͠ ղܾࡦ
  4. © LINE Data Labs 1. ؔ܎ݚڀ෼໺ʹ͓͚ΔΞΧσϛΞ
 ͔ΒϏδωε΁ͷҠ੶ͷಈ޲ 2. ͍ͪ “ݚڀऀ”

    ͔ΒΈͨΞΧσϛΞ ͱ΢Σϒاۀ
 2-1. ҧ͏఺
 2-2. ڞ௨͢Δ఺ 3. σʔλαΠΤϯεݚڀͱ΢Σϒاۀ ͱͷ਌࿨ੑ Agenda
  5. © LINE Data Labs ΞΧσϛΞ͔ΒϏδωε΁ͷҠ੶ͷಈ޲ • ؔ܎ݚڀ෼໺ɿωοτϫʔΫՊֶ • ΞϝϦΧͰ͸ 2010

    ೥ࠒ͔Βগͳ͔Βͣྫ͕͋Δ Ϗδωε΁Ҡ੶ͨ͠ւ֎ݚڀऀͷࣄྫʢҰ෦ɺॱෆಉʣ • A ڭत: C େֶ → Y ࣾ → M ࣾ • B ।ڭत: M େֶ → F ࣾ • C ।ڭत: S େֶ → P ࣾ (part-time) • D ݚڀһ: O େֶ → F ࣾ ਪଌͱͯ͠ҰൠʹݴΘΕΔҠ੶ͷཧ༝ • େن໛ͳ࣮σʔλΛѻ͑Δ͜ͱ • ଴۰ͷྑ͞
  6. © LINE Data Labs ΞΧσϛΞ͔ΒϏδωε΁ͷҠ੶ͷಈ޲ 㱺 ඞͣ͠΋ͦΕ͚͕ͩཧ༝Ͱ͸ͳ͍ͷͰ͸ʁ • ؔ܎ݚڀ෼໺ɿωοτϫʔΫՊֶ •

    ΞϝϦΧͰ͸ 2010 ೥ࠒ͔Βগͳ͔Βͣྫ͕͋Δ Ϗδωε΁Ҡ੶ͨ͠ւ֎ݚڀऀͷࣄྫʢҰ෦ɺॱෆಉʣ ਪଌͱͯ͠ҰൠʹݴΘΕΔҠ੶ͷཧ༝ • େن໛ͳ࣮σʔλΛѻ͑Δ͜ͱ • ଴۰ͷྑ͞ • A ڭत: C େֶ → Y ࣾ → M ࣾ • B ।ڭत: M େֶ → F ࣾ • C ।ڭत: S େֶ → P ࣾ (part-time) • D ݚڀһ: O େֶ → F ࣾ
  7. © LINE Data Labs • ΞΧσϛΞ = ౷ܭ෺ཧΑΓͷωοτϫʔΫՊֶ • Ϗδωεɹ

    = ݱॴଐ෦ॺ • ݴ໌Λ্هҎ্ʹҰൠԽ͢Δҙਤ͸͋Γ·ͤΜ ΞΧσϛΞͱϏδωεͱͷҧ͍
  8. © LINE Data Labs • ΞΧσϛΞ = ౷ܭ෺ཧΑΓͷωοτϫʔΫՊֶ • Ϗδωεɹ

    = ݱॴଐ෦ॺ • ݴ໌Λ্هҎ্ʹҰൠԽ͢Δҙਤ͸͋Γ·ͤΜ ΞΧσϛΞͱϏδωεͱͷҧ͍ ҧ͏͜ͱͩΒ͚
  9. © LINE Data Labs • ΞΧσϛΞ = ౷ܭ෺ཧΑΓͷωοτϫʔΫՊֶ • Ϗδωεɹ

    = ݱॴଐ෦ॺ • ݴ໌Λ্هҎ্ʹҰൠԽ͢Δҙਤ͸͋Γ·ͤΜ ΞΧσϛΞͱϏδωεͱͷҧ͍ ΞΧσϛΞ Ϗδωε 1.υΩϡϝϯτɾίʔυͷ
 ެ։ൣғ ڞಉݚڀऀͲ͏͠ ݚڀࣨ಺ جຊతʹશࣾ಺
 ʢػີϨϕϧґଘʣ 2. νʔϜߏ੒ ओʹಉ෼໺ͷݚڀऀ ϓϩδΣΫτ୯ҐͰ ଟछଟ༷ 3. ࡞ۀऀͷΫϨδοτ ඞਢ ڧௐ͞Εͳ͍ 4. ֶज़తͳ৽نੑ ඞਢ ॏࢹ͞Εͳ͍
  10. © LINE Data Labs ΞΧσϛΞͱϏδωεͱͷڞ௨఺ 1. ࿦ཧతࢥߟͱ՝୊ղܾͷํ๏࿦ɾϑϨʔϜϫʔΫͷॏཁੑ ޷ح৺ ઌߦݚڀαʔϕΠ ݚڀσΟεΧογϣϯ

    طଘɾ৽نͷ ਺ཧతख๏ ౷ܭతख๏ CONTENTS ೔ৗͷ՝୊ҙࣝ ࣄۀଆ͔Βͷґཔ ݪஶ࿦จ ֶձൃද ϓϨθϯςʔγϣϯ վળͷ࣮૷ • Ϗδωε΁ͷస޲Ͱ͸ઐ໳஌͕ࣝॏࢹ͞Ε͕͕ͪͩɺҰ࿈ͷαΠ ΫϧΛεϜʔζʹ܁Γฦ࣮͠ߦͰ͖Δ͜ͱͦ͜ॏཁ • ৘ใऩूɾ໰୊ઃఆɾίϛϡχέʔγϣϯͷೳྗ Πϯϓοτ Ξ΢τϓοτ ઐ໳஌ࣝ ΞΧσϛΞ Ϗδωε
  11. © LINE Data Labs ݱঢ়ཧղ ৽نͷاը ΞΧσϛΞͱϏδωεͱͷڞ௨఺ 2. ݁Ռͷ࠶ݱੑ͕ॏཁ ޮՌͷࣄޙ෼ੳ

    αʔϏε࣮૷ ࣄલ෼ੳ ςετͷઃܭ ݱঢ়ཧղ ৽نͷاը • ෼ੳ݁Ռ͸௕ظతͳҙࢥܾఆʹӨڹΛٴ΅͢ • ِཅੑɾِӄੑͷΤϥʔΛ੍ޚ͢ΔͨΊʹ౷ܭͷ஌͕ࣝඞཁ • ୲౰ऀ͕มΘͬͯ΋ܧଓͰ͖Δ࢓૊ΈԽ΋ඞཁ ౷ܭతख๏ͷཞ༻ɾޡ༻ͷ໰୊΋ڞ௨ • P-Hacking • ዞҙతͳՄࢹԽʹىҼ͢ΔϛεϦʔυ • ೔ৗޠతͳදݱʹΑΔղऍͷᐆດԽ
  12. © LINE Data Labs ະ஌ͷࣄ৅ʹ͍ͭͯ ୳ٻ͍ͨ͠ ݚڀऀͷࢤ޲ ΢Σϒاۀͷಛੑ αʔϏεɾϢʔβʔɾ ࣾձ͕มԽ͢ΔݶΓ

    ະ஌ͷࣄ৅͹͔Γ ࠷৽ͷ / ৽نͳ ਺ཧతख๏Λ׆༻ ͍ͨ͠ ม਺ͷ૬ؔͰ͸ͳ͘ ϝΧχζϜ͕஌Γ͍ͨ ఏҊͨ͠಺༰͕ࣾձʹ ༩͑ΔΠϯύΫτΛ ݟ͍ͨ ࣄۀʹߩݙ͢ΔݶΓʹ͓͍ͯ਌࿨ੑ͸ߴ͍ ίϯτϩʔϧ͞Εͨ
 ϥϯμϜԽςετ͕࣮ ࢪՄೳ ఏҊ͕࣮૷͞Εͯ
 Ϣʔβʔʹར༻͞ΕΔ ·Ͱͷϥά͕୹͍ ͦͷख๏ʹΑͬͯ ॴ๬ͷ੒Ռ͕ୡ੒͞Ε ΔͳΒ͹ OK
  13. © LINE Data Labs ཧ૝తͳݚڀࣄྫ Ugander et al., “Structural diversity

    in social contagion” PNAS 109, 5962 (2012) • ๭ SNS ͷϢʔβʔσʔλΛ෼ੳ • ओ݁Ռɿ͋ΔϢʔβʔʹͱͬͯɺ༑ͩͪಉ࢜ͷͭͳ͕Γߏ଄͕
 - ট଴Λड͚ͯ SNS ʹࢀՃ͢Δ֬཰ʢrecruitmentʣ
 - ܧଓͯ͠ SNS Λར༻͢Δ֬཰ʢengagementʣ
 ʹޮ͘ 1. ֶज़తʹՁ஋ͷ͋Δ஌ݟʢιʔγϟϧάϥϑͷܗ੒ϓϩηεʣ 2. SNS ͷϏδωεతʹ΋༗ҙٛ 3. ͋ΓಘΔ༑ͩͪಉ࢜ͷͭͳ͕Γߏ଄͸ແ਺ͷՄೳੑ
 → Ϗοάσʔλ͕͋ͬͯॳΊͯ෼ੳՄೳ ྑ͍ͱ͜Ζ
  14. © LINE Data Labs ૬ؔͰ͸ͳ͘ϝΧχζϜ͕஌Γ͍ͨ ؍࡯ʮιʔγϟϧάϥϑͰͭͳ͕ͬͨ̎ਓ͕ಉ͡ঢ়ଶʹมԽͨ͠ʯ ࣾձత఻೻
 (social contagion) ྨࣅऀબ޷


    (homophily) ڞ௨ͷ֎෦ཁҼ Shalizi & Thomas, “Homophily and contagion are generically cofounded in observational social network studies” Social Methods Res. 40, 211 (2011) • ؍࡯σʔλͷΈ͔ΒωοτϫʔΫతͳҼՌΛಛఆ͢Δ͜ͱ͸ɺ
 ະ؍ଌɾ؍ଌෆೳͳม਺͕͋ΔݶΓඇৗʹ೉͍͠ • ݴ͍׵͑Ε͹ɺద੾ʹઃܭɾ౷੍͞ΕͨϥϯμϜԽςετ͕ඞཁ
  15. © LINE Data Labs ֶज़తͳ৽نੑͷॏཁ౓ ੒Ռ͕ධՁ͞ΕΔ ࣌ؒεύϯ ඞਢ ॏࢹ͠ͳ͍ ௕ظ

    ୹ظ ཧ૝తͳ
 ΞΧσϛΞ ݱ࣮ͷ
 ΞΧσϛΞʁ ݱঢ়ͷ ࣗݾೝࣝ Ϗδωεͷ ෼ੳλεΫ
  16. © LINE Data Labs ·ͱΊ • େن໛ͳ࣮σʔλΛѻ͑Δ͜ͱ • ଴۰ͷྑ͞ ͚ͩͰ͸ͳ͘

    • ՝୊ղܾͷϑϨʔϜϫʔΫͱ࠶ݱੑॏࢹͷ࢟੎͕ධՁ͞ΕΔ͜ͱ • ະ஌ͷࣄ৅ʹ͍ͭͯ୳ٻͰ͖Δ͜ͱ • ม਺ͷ૬ؔͰ͸ͳ͘ϝΧχζϜΛ஌Δ͜ͱ͕Ͱ͖ΔՄೳੑ • ఏҊͨ͠಺༰͕ࣾձʹ͋ͨ͑ΔΠϯύΫτΛݟΒΕΔ͜ͱ σʔλαΠΤϯεؔ࿈ͷݚڀऀ͕΢Σϒاۀ΁ҠΔಈػ͚ͮ • ݸਓ͝ͱͷࢤ޲΍૊৫͝ͱͷ؀ڥʹඇৗʹେ͖ͳ෼ࢄ͕͋Δ • ʮΞΧσϛΞ vs. ϏδωεʯͷΑ͏ͳա౓ʹ୯७Խͨ͠ೋݩ࿦͸ ͋·Γ༗ӹͰͳ͍ ݸਓతʹେࣄͩͱࢥ͏͜ͱ