keisu_special_lecture_20210511.pdf

Taro Takaguchi

May 10, 2021

Transcript

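  1. Taro Takaguchi (高口 太朗)
    LINE Corporation, Data Science Center: Senior Data Scientist / Manager
    Career
    - Until 2013: PhD program, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo (4th Laboratory for Mathematical Informatics)
    - Until 2017: postdoctoral researcher at national research institutes
    Field of expertise at the time: network science (especially networks that change over time)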
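  2. What prompted my move to industry
    "It looks useful, and yet the distance to the real world is still large…"
    @ LINE DEVELOPER DAY 2019
    I wanted to see, and take part in, the front line of businesses that put data to use.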
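  3. What is a data scientist in the first place?
    - The definition varies by company, era, and community.
    - The same job title can cover different work, and different job titles can cover the same work.
    One example of a classification of data scientists:
    - Analytics: define and measure metrics / tell stories / build tools
    - Algorithms: implement machine-learning methods in products and services
    - Inference: establish causal relationships with statistical methods
    Ref. https://www.linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal/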
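  4. The organizational structure also matches these areas of responsibility
    Data Science Center
    - Data Science: Analytics, Inference
    - Machine Learning: Algorithms (machine learning engineers)
    - Machine Learning Research: basic research and its application to the business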
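  5. Concrete work of the analytics and inference types
    Timeline: before / in progress / after (a campaign, the addition of a new feature, a change to an existing feature, etc.)
    - Selecting the conditions of a campaign
    - Estimating the demand for a new feature
    - Estimating the impact of a feature change
    - etc.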
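  6. Concrete work of the analytics and inference types
    Timeline: before / in progress / after (a campaign, the addition of a new feature, a change to an existing feature, etc.)
    - Online A/B tests
    - Building dashboards (monitoring tables of key business metrics)
    - Detecting anomalous changes
    - etc.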
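  7. Concrete work of the analytics and inference types
    Timeline: before / in progress / after (a campaign, the addition of a new feature, a change to an existing feature, etc.)
    - Verifying the effect of a measure
    - Causal inference
    - Decomposing long-term changes into their factors
    - etc.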
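  8. "What does a typical day of work look like?"
    (Figure: Project A, Project B, and Project C laid out along the timeline of one particular day.)
    Tasks progress intermittently, with gaps in between, e.g. waiting on another team's progress or handling sudden requests.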
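  9. Step 1. Observation
    Cycle: observation → setting hypotheses and issues → planning a solution → verification
    "Has the number of active users been stagnating recently?"
    (Figure: dashboard, a monitoring table of key business metrics, for February through May.)
    Note: all exchanges and numbers are fictitious.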
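  10. Step 2. Setting hypotheses and issues
    Cycle: observation → setting hypotheses and issues → planning a solution → verification
    "Is this trend in the number of active users something we should act on?"
    - Is it the usual seasonal variation?
    - Is it a change within particular user segments (new / existing / returning)?
    - How is the number of users of other features changing?
    → "The retention rate of new users is declining. Restoring it to its original level would have the effect of an increase of N × 10,000 users."
    Note: all exchanges and numbers are fictitious.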
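  11. Step 3. Planning a solution
    Cycle: observation → setting hypotheses and issues → planning a solution → verification
    - "Let's send new users a notification that prompts them to log in."
    - "We want to test the notification frequency."
    "I will design the test."
    - Decide the metric used to judge success or failure
    - Calculate the sample size the test requires
    - Agree on the criterion for judging success or failure
    Note: all exchanges and numbers are fictitious.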
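  12. Step 4. Verification
    Cycle: observation → setting hypotheses and issues → planning a solution → verification
    "I will analyze the test results."
    - Confirm that the data were collected correctly
    - Run hypothesis tests on the metrics
    - Suggest further improvements
    - Provide overall reporting
    "We will adopt the plan that sends a single notification on the day after sign-up."
    Note: all exchanges and numbers are fictitious.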
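  13. Step 1 (2). Observation
    Cycle: observation → setting hypotheses and issues → planning a solution → verification
    "I will keep monitoring the retention rate of new users."
    Add the item to the dashboard.
    (Figure: number of active users and new-user retention rate, February through June.)
    Note: all exchanges and numbers are fictitious.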
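  14. Analytics and inference tasks: the problem-solving cycle
    Cycle: observation → setting hypotheses and issues → planning a solution → verification
    - Analytics and inference tasks progress through communication with the people involved.
    - Using expert knowledge such as statistics is one element within the whole cycle.
    - The final decision makers are the people responsible for the business, the product, or the project.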
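  15. Do you use specialized mathematical knowledge?
    “LIFE AND MATHS”, © Pearls of Raw Nerdism, http://pearlsofrawnerdism.com/life-and-maths/
    My view:
    - There are more occasions where you can get by without specialized mathematics.
    - With expert knowledge, the quality of each step of problem solving goes up.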
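  16. There are more occasions where you can get by without specialized mathematics
    (Conceptual diagram: frequency of occurrence, high to low, against mathematical complexity, low to high.)
    The role of the analytics and inference types:
    △ Carrying out complicated things
    △ Doing things that are theoretically novel
    ✓ Providing, in an appropriate way, findings that are useful to the business
    The axes "contribution to the business" and "cost of carrying it out" are hidden in the diagram.
    Mathematical difficulty ≠ difficulty of solving the business problem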
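  17. Acquiring a concept changes how the world looks
    Problem: we want to know the result of adding 2 ten thousand times.
    - Without the concept of multiplication: 2 + 2 + 2 + 2 + …… "This cannot be solved in a realistic amount of time." (an unsolvable problem)
    - Knowing multiplication: 2 × 10,000 = 20,000 (a solvable problem)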
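  18. Aim for the region where "producing an answer helps the business"
    Map the data scientist's work tasks onto two dimensions:
    - Linked to business value / hard to link to business value
    - An answer can be produced / an answer cannot be produced
    How to move toward that region: acquiring expert knowledge, understanding the business, dialogue with the people involved.
    Cf. Kazuto Ataka (安宅和人), 『イシューからはじめよ 知的生産の「シンプルな本質」』, Eiji Press (2010)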
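  19. 2. Share your assumptions
    Scientific knowledge in the empirical sciences is:
    ✗ a collection of absolute, unchanging truths
    ✓ conclusions inferred from observations and assumptions
    What should we do?
    - State the assumptions clearly: "We assume that the growth rate of the number of users is the same as last month."
    - In the verification step, also verify the validity of the earlier assumptions: "The growth rate of the number of users turned out to be ~ compared with last month."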
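  20. 3. Decide the decision criterion before looking at the numbers
    Converting a quantity into everyday language is ambiguous.
    "If this metric rises 'sufficiently', let's judge the test a success."
    "The metric rose by +3%." (Is +3% "sufficient"…?)
    What should we do?
    - Decide the criterion quantitatively in advance.
    - Base the criterion on objective grounds, for example:
      - the expected contribution to the business target
      - recovery of the cost invested
      - results of similar past cases
    Note: all numbers are fictitious.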
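  21. Concrete work of analytics and inference (recap)
    Timeline: before / in progress / after (a campaign, the addition of a new feature, a change to an existing feature, etc.)
    Online A/B tests:
    1. Sample size calculation
    2. Multiple comparisons
    3. Estimating the habituation effect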
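  22. Why calculate the sample size even for a web service
    1. With an excessively large sample size, even small changes tend to become significant. "Statistically significant" is an expression that leaves a strong impression.
    2. Users are affected needlessly, especially when the tested variant is not good.
    3. SUTVA (Stable Unit Treatment Value Assumption), "a user's behavior is not affected by the assignment of other users", becomes easier to violate. (Examples: the test screen gets shared, or gets picked up by the media.)
    4. Room for P-Hacking remains: "There was no significant difference, so let's enlarge the sample size and test again."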
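  23. The basic form of sample size calculation
    Setting
    - A test of the means of two independent groups of samples
    - The population variance is the same in both groups and known
    - A sample size of several thousand to several tens of thousands is attainable
    Hypotheses being tested
    - Null hypothesis $H_0$: $\mu_1 - \mu_2 = 0$
    - Alternative hypothesis $H_1$: $\mu_1 - \mu_2 \neq 0$
    Parameters needed to determine the sample size $n$
    - Significance level: $\alpha$
    - Power: $1 - \beta$
    - Effect size: $\delta = \mu_1 - \mu_2$ (when $H_1$ is true)
    - Population variance: $\sigma^2 < +\infty$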
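  24. Review of sample size calculation (1)
    Figure reconstructed from: Gerald van Belle, “Statistical Rules of Thumb” (2nd edition), Wiley, 2008
    (Figure: distribution of the difference in sample means $\bar{x}_1 - \bar{x}_2$ under $H_0: \mu_1 - \mu_2 = 0$, centered at $0$, with an area of $\alpha/2$ in each tail.)
    By the reproductive property of the normal distribution, $\mathrm{S.E.} = \sigma\sqrt{2/n}$.
    Significance level $\alpha$: the probability of adopting $H_1$ when $H_0$ is true (false positive).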
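  25. Review of sample size calculation (2)
    Figure reconstructed from: Gerald van Belle, “Statistical Rules of Thumb” (2nd edition), Wiley, 2008
    (Figure: distributions of the difference in sample means $\bar{x}_1 - \bar{x}_2$ under $H_0: \mu_1 - \mu_2 = 0$, centered at $0$, and under $H_1: \mu_1 - \mu_2 = \delta$, centered at $\delta$, each with $\mathrm{S.E.} = \sigma\sqrt{2/n}$.)
    $\beta$: the probability of adopting $H_0$ when $H_1$ is true (false negative).
    $\beta = 1 - \text{power}$, i.e. the power is $1 - \beta$.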
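The relationship between $\beta$, the power, and the sample size can be sketched numerically. The following is a minimal illustration, assuming a two-sided z-test with known common variance and $n$ samples per group (consistent with $\mathrm{S.E.} = \sigma\sqrt{2/n}$). The function name `approx_power` and the input values are hypothetical, and the negligible probability of rejecting in the wrong direction is ignored.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided two-sample z-test.

    Assumes n samples per group and a known, common standard deviation sigma.
    """
    std_normal = NormalDist()
    # Standard error of the difference in sample means.
    se = sigma * sqrt(2.0 / n)
    # Rejection threshold under H0, measured in units of the standard error.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the observed difference exceeds the threshold when the
    # true difference is delta (the far tail is ignored).
    return std_normal.cdf(abs(delta) / se - z_crit)


# Illustrative inputs only; they do not come from the slides.
power = approx_power(delta=0.1, sigma=1.0, n=1570)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With these illustrative inputs the power comes out at roughly 0.80, and it rises as `n` grows.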
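  26. Review of sample size calculation (3)
    Figure reconstructed from: Gerald van Belle, “Statistical Rules of Thumb” (2nd edition), Wiley, 2008
    (Figure: distributions of the difference in sample means $\bar{x}_1 - \bar{x}_2$ centered at $0$ and at $\delta$, each with $\mathrm{S.E.} = \sigma\sqrt{2/n}$, with tail areas $\alpha/2$ and $\beta$.)
    With $z_p$ the quantile function of the standard normal distribution:
    $z_{1-\alpha/2}\,\sigma\sqrt{2/n^*} = \delta - z_{1-\beta}\,\sigma\sqrt{2/n^*}$
    $\Rightarrow\ n^* = \dfrac{2\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\delta^2}$
    $n^*$ is the minimum sample size needed to satisfy the requirements.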
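The step from the critical-point condition to the closed form for $n^*$ is short. Written out, with the same symbols as the slide and no additional assumptions:

```latex
\begin{align*}
z_{1-\alpha/2}\,\sigma\sqrt{\frac{2}{n^*}}
  &= \delta - z_{1-\beta}\,\sigma\sqrt{\frac{2}{n^*}} \\
\left(z_{1-\alpha/2} + z_{1-\beta}\right)\sigma\sqrt{\frac{2}{n^*}}
  &= \delta \\
\sqrt{n^*}
  &= \frac{\sqrt{2}\,\sigma\left(z_{1-\alpha/2} + z_{1-\beta}\right)}{\delta} \\
n^*
  &= \frac{2\sigma^2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\delta^2}
\end{align*}
```

The left-hand side of the first line is the rejection threshold under $H_0$. The right-hand side is the point below which the distribution under $H_1$ has probability $\beta$.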
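  27. Supplementary notes on sample size
    Minimum sample size needed to satisfy the requirements:
    $n^* = \dfrac{2\sigma^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\delta^2}$
    Factors that make $n^*$ larger
    1. The variance $\sigma^2$ of the true distribution is large.
    2. False positives and false negatives are to be kept low, i.e. $\alpha$ and $\beta$ are small.
    3. The effect size $\delta$ to be detected is small.
    How to choose the parameters (one example)
    - $\alpha, \beta$: conventional values, or somewhat stricter
    - $\sigma^2$: the most recent measured value (check its validity after the test)
    - $\delta$: a conventional value, or an effect that exceeds the cost, or the results of similar past tests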
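The formula for $n^*$ can be evaluated directly. Below is a minimal sketch using only the Python standard library. The function name `required_sample_size`, the default values of `alpha` and `power`, and the example inputs are assumptions for illustration; the result is the size per group, since the formula is built on $\mathrm{S.E.} = \sigma\sqrt{2/n}$.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(delta: float, sigma: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Minimum sample size per group for a two-sided two-sample z-test.

    Implements n* = 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2,
    where power = 1 - beta and sigma is the known common standard deviation.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    n_star = 2 * sigma ** 2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2
    return ceil(n_star)  # round up to a whole number of samples


# Illustrative inputs only: an effect of 0.1 standard deviations.
n_base = required_sample_size(delta=0.1, sigma=1.0)
# Halving the effect size to detect roughly quadruples the requirement.
n_half_effect = required_sample_size(delta=0.05, sigma=1.0)
print(n_base, n_half_effect)
```

Because $\delta$ enters as $1/\delta^2$, the effect size to be detected usually dominates the requirement, which is one reason to fix it before the test starts.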
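  28. What is the problem with repeating tests?
    Hypotheses being tested (case of 4 groups)
    - Null hypothesis $H_0$: $\theta_1 = \theta_2 = \theta_3 = \theta_4$
    - Alternative hypothesis $H_1$: $\theta_i \neq \theta_j$ ($i \neq j$) for at least one pair among $\{\theta_i\}_{i=1,2,3,4}$
    Effective significance level (Family-Wise Error Rate), with one test for each of the $\binom{4}{2} = 6$ pairs:
    $\bar{\alpha} = 1 - (1 - \alpha)^6 \geq \alpha$
    Viewed as a whole, the false positive rate goes up.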
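The size of this inflation is easy to check numerically. The following is a minimal sketch; the function name `family_wise_error_rate` and the value `alpha = 0.05` are assumptions for illustration, and the formula treats the pairwise tests as independent, as the expression on the slide does.

```python
from math import comb


def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


num_groups = 4
num_pairs = comb(num_groups, 2)  # number of pairwise comparisons among 4 groups
fwer = family_wise_error_rate(alpha=0.05, num_tests=num_pairs)
print(f"{num_pairs} tests at alpha = 0.05 give FWER = {fwer:.3f}")
```

At a per-test level of 0.05, six comparisons already push the overall false positive rate above one in four.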
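  29. Bonferroni correction
    Adjust the significance level of each pairwise test to the value divided by the number of tests $m$: $\alpha \to \alpha / m$
    The Family-Wise Error Rate then satisfies $\bar{\alpha} \leq \alpha$, so the significance level is preserved overall.
    Drawback: when $m$ is large, the correction tends to be conservative (the false negative rate rises).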
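A minimal sketch of the correction follows. The function names and the p-values are hypothetical and not taken from the slides, and the family-wise error rate check assumes independent tests.

```python
def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / num_tests


def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject each null hypothesis whose p-value is at most alpha / m."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


num_tests = 6  # e.g. all pairwise comparisons among 4 groups
adjusted_alpha = bonferroni_alpha(0.05, num_tests)
# Family-wise error rate with the adjusted level, assuming independent tests.
fwer_after = 1 - (1 - adjusted_alpha) ** num_tests

# Hypothetical p-values for the six pairwise tests.
p_values = [0.001, 0.02, 0.04, 0.30, 0.008, 0.60]
decisions = bonferroni_reject(p_values)
print(f"adjusted alpha = {adjusted_alpha:.4f}, FWER = {fwer_after:.4f}")
print(decisions)
```

The p-values 0.02 and 0.04 would pass an uncorrected 0.05 threshold but fail the corrected one, which illustrates the conservativeness mentioned on the slide.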
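  30. Modeling the habituation effect with a difference-in-differences
    Split the test period into a first half and a second half.
    (Figure: metric values $y_{C,1}$, $y_{T,1}$ in the 1st half and $y_{C,2}$, $y_{T,2}$ in the 2nd half, for Control and Treatment.)
    Assumption: effects other than the habituation effect are identical in the two groups (parallel trends and common shocks).
    Difference-in-differences statistic:
    $\delta = (y_{T,2} - y_{C,2}) - (y_{T,1} - y_{C,1})$
    Regression model:
    $y = \beta_0 + \beta_1 T + \beta_2 S + \beta_3 T S + \varepsilon$
    where $T$ is the Treatment / Control dummy and $S$ is the 1st / 2nd half dummy.
    Since $\hat{\beta}_3 = \hat{\delta}$, check the fit of the regression model and the significance of the coefficient.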
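The identity between the interaction coefficient and the difference-in-differences of the cell means can be checked on a toy data set. This is a minimal sketch with hypothetical observations, using a small hand-written least-squares solver so that it runs on the standard library alone. It estimates the coefficients only and does not compute the standard errors needed for the significance check.

```python
from statistics import mean

# Hypothetical observations (T, S, y):
# T = 1 for Treatment, 0 for Control; S = 1 for the 2nd half, 0 for the 1st half.
observations = [
    (0, 0, 10.0), (0, 0, 12.0),
    (1, 0, 13.0), (1, 0, 15.0),
    (0, 1, 11.0), (0, 1, 13.0),
    (1, 1, 12.0), (1, 1, 14.0),
]


def cell_mean(t: int, s: int) -> float:
    """Mean of y within one (group, period) cell."""
    return mean(y for T, S, y in observations if T == t and S == s)


# Difference-in-differences statistic from the four cell means.
did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))


def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def ols(rows: list[list[float]], ys: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Design matrix columns: intercept, T, S, and the interaction T * S.
design = [[1.0, float(T), float(S), float(T * S)] for T, S, _ in observations]
targets = [y for _, _, y in observations]
beta0, beta1, beta2, beta3 = ols(design, targets)
print(f"DiD from cell means = {did:.3f}, beta3 from regression = {beta3:.3f}")
```

In practice a statistics library would normally be used for the regression, since it also reports the standard errors and fit statistics that the slide asks to check.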
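  31. A message to students
    Commit yourselves to:
    - Research in your undergraduate and graduate programs (the problem-solving cycle). A university is a precious place where you can learn from world-class experts.
    Keep learning:
    - Expert knowledge in mathematics and engineering
    - Programming
    - Languages
    - Engineering ethics, legal systems, history, …
    Do not get too caught up in job titles or trending topics; please aim to become someone who can solve problems with expert knowledge.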