juju1008


Presentation slides for the 46th TokyoWebMining

jujudubai

May 30, 2015

Transcript

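  1. Agenda
     1. What is Uber…
     2. The very basics of Bayesian modeling
     3. Predicting a user's destination with Bayesian modeling
        1. Overview of the data
        2. Model design (designing the prior and the likelihood)
        3. Estimating the posterior probability (MAP estimation)
        4. Model results
     4. Summary of this talk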
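  2. Profile
     • Keio University Graduate School, Master's degree (2015)
     • Analysis of traffic data (Probe-Car Data), advertising-related data, sports data, and various other things…
     • A little Spatial Statistics and Bayesian Statistics
     • Python/R/HDFS/Impala/Hive etc… Still studying other tools as well…
     • GIS-related work…
     • Working as an engineer in the "Ad Technology" field.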
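  3. Before starting Bayesian estimation… Comparison of maximum likelihood estimation and Bayesian estimation (simplified)
     Maximum likelihood estimation
     • The parameter value is unknown, but it is treated as a constant that exists independently of human intuition.
     • Given the observed data, the parameter value that maximizes the probability of obtaining such data is taken as the best estimate.
       $\hat{\theta} = \operatorname*{argmax}_{\theta} \left[ P(x^{(n)}; \theta) \right]$ (from [1])
     Bayesian estimation
     • The parameter is treated as a random variable with a known prior distribution.
     • Once observations are obtained, the prior distribution changes into the posterior distribution, and the degree of belief in the parameter values is revised.
       $\hat{\theta} = \operatorname*{argmax}_{\theta} \left[ P(\theta \mid x^{(n)}) \right]$ (from [1])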
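  4. Maximum likelihood estimation: a review of the very basics
     "The observed data we actually obtained are the realization of the outcome with the highest probability." (the maximum likelihood principle)
     Put simply: $\hat{\theta} = \operatorname*{argmax}_{\theta} \left[ P(x^{(n)}; \theta) \right]$
     Maximum likelihood estimation is a point estimate that pins down a single estimated value of the parameter θ (it is based on frequentism).
     • Behind the observed data there is a (single) true statistical model.
     • Within the formulated model, the true statistical model is determined by fixing θ.
     Figure 1. Image of statistical modeling based on frequentism [4] (from [1])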
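  5. Bayes' theorem. Bayesian estimation: a review of the very basics
     * No discussion of detailed modeling or MCMC here.
     • Prior distribution P(θ): the probability distribution over θ
     • Likelihood P(y|θ): the probability (density) function of y given θ
     • Posterior distribution P(θ|y): the probability (density) function of θ given y
     • Normalizing constant P(y) (probability of the data): the probability of obtaining the data under any hypothesis
     $P(\theta \mid y) = \frac{P(y \mid \theta) P(\theta)}{P(y)} \propto P(y \mid \theta) P(\theta)$
     ➡ The posterior distribution is proportional to the product of the likelihood and the prior distribution!
     (based on [1])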
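Bayes' theorem on this slide can be checked numerically for a discrete set of hypotheses. The following is a minimal sketch, not taken from the talk: the three candidate values of θ and the numbers in `prior` and `likelihood` are made up for illustration.

```python
import numpy as np

def posterior(prior, likelihood):
    """Return P(theta | y) over a discrete set of hypotheses."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    unnormalized = likelihood * prior   # P(y | theta) * P(theta)
    evidence = unnormalized.sum()       # P(y), the normalizing constant
    return unnormalized / evidence

# Hypothetical numbers: three candidate values of theta.
prior = [0.5, 0.3, 0.2]          # P(theta)
likelihood = [0.1, 0.4, 0.7]     # P(y | theta) for the observed y
post = posterior(prior, likelihood)
print(post)
```

Dividing by `evidence` only rescales the values, so the ranking of hypotheses is already fixed by likelihood times prior.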
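  6. Advantages of Bayesian estimation (from [1] and [7])
     • The prior distribution (subjective probability) can be set freely (no need to assume normality for the distribution of the estimate).
     • The model is strengthened through Bayesian updating (in theory, estimation accuracy improves as the data are updated…).
     • It predicts the probability distribution of the event of interest itself.
     • By using a point estimate, it can also return the same kind of result as a frequentist model.
     • The true value (result) does not have to be a single one.
     • When the number of observations n is small and the prior is set appropriately, "Bayesian estimation + posterior maximization" has the advantage.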
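The small-n point can be illustrated with a coin-flip model. This sketch is an illustration only and is not from the talk: it assumes a Bernoulli likelihood with a Beta(a, b) prior, for which the MAP estimate is the mode of the Beta posterior; the function names and the choice a = b = 2 are hypothetical.

```python
def bernoulli_mle(k, n):
    """Maximum likelihood estimate of the success probability: k successes in n trials."""
    return k / n

def bernoulli_map(k, n, a=2.0, b=2.0):
    """MAP estimate under an assumed Beta(a, b) prior.

    The posterior is Beta(a + k, b + n - k); its mode is used here,
    which is valid when a + k > 1 and b + n - k > 1.
    """
    return (k + a - 1.0) / (n + a + b - 2.0)

# Three successes in three trials: the MLE jumps to 1.0,
# while the prior pulls the MAP estimate back toward 0.5.
mle = bernoulli_mle(3, 3)
map_ = bernoulli_map(3, 3)
print(mle, map_)
```

As n grows, the data dominate the prior and the two estimates converge.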
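  7. Disadvantages of Bayesian estimation (from [1] and [7])
     • It has become faster, but estimation still takes a long time…
     • Complex models often fail to converge…
     • The way of thinking differs from the frequentism taught in early education, so there is some learning cost…
     • There are many theories and languages to learn in order to do the estimation… (MCMC and related tools such as Stan/Jags etc…)
     Whether to solve a problem with maximum likelihood estimation or with Bayesian estimation is often debated, but each can simply be used according to the purpose and the cost trade-off.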
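  8. Building the prior: set three kinds of prior distributions → use a mixture of normal distributions
     1. Where does this particular user tend to go? = Rider Prior
        → "The user's history"
     2. Where do Uber users tend to go overall? = Uber Prior
        → "Trends across Uber"
     3. Which places are generally popular in this area? = Popular Place Prior
        → "Data on popular places"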
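  9. Building the prior: 2. "Trends across Uber" (from [9])
     • Uses the tendency of Uber users to go to particular places
     • Uses the (normalized) number of visits by Uber users to each place
     $P_{\mathrm{Uber}}(D = i) = P(D = i \mid \text{is Uber user})$
     $P_{\mathrm{Uber}}(D = i)$: the normalized number of visits by Uber users to each place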
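"Normalized visit counts" can be read as a plain categorical distribution over places. The sketch below makes that assumption; the place names and the `visit_prior` helper are hypothetical and not part of the source.

```python
from collections import Counter

def visit_prior(visits):
    """Turn a list of visited place IDs into normalized visit frequencies."""
    counts = Counter(visits)
    total = sum(counts.values())
    return {place: count / total for place, count in counts.items()}

# Hypothetical drop-off log across all riders.
visits = ["airport", "airport", "downtown", "stadium"]
p_uber = visit_prior(visits)
print(p_uber)
```

The same helper applied to a single rider's own history would give a Rider Prior in the same form.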
  10. Building the prior: 3. "Data on popular places" (from [9])
     • Takes into account the tendencies of places in SF
     • Uses data covering roughly 1,000 commercial establishments
     • Restaurants, nightspots, hotels, shopping, museums etc…
     • Presumably scored on the basis of some kind of rating on the web…!?
       (the normalized number of reviews left for a business establishment on the site.)
     $P_{\mathrm{Popular\,Place}}(D = i)$
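  11. Combining the priors (from [9])
     $P(D = i) = \alpha P_{\mathrm{Popular\,Place}}(D = i) + \beta P_{\mathrm{Uber}}(D = i) + (1 - \alpha - \beta) P_{\mathrm{Rider}}(D = i)$
     Destination Prior = Popular Place Prior (.3) + Uber Prior (.3) + Rider Prior (.4)
     ← This is set as the prior distribution! (The actual values are not known…)
     The weights are hyperparameters.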
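The weighted combination of the three priors can be sketched as follows. It assumes the mixing weights are α, β and 1 − α − β, with the .3/.3/.4 values shown on the slide (which the slide itself notes are not the real ones); the per-destination probabilities are invented for the example.

```python
import numpy as np

def destination_prior(p_popular, p_uber, p_rider, alpha=0.3, beta=0.3):
    """Mix three categorical priors over the same list of candidate destinations."""
    if not (0.0 <= alpha and 0.0 <= beta and alpha + beta <= 1.0):
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    p_popular = np.asarray(p_popular, dtype=float)
    p_uber = np.asarray(p_uber, dtype=float)
    p_rider = np.asarray(p_rider, dtype=float)
    return alpha * p_popular + beta * p_uber + (1.0 - alpha - beta) * p_rider

# Hypothetical priors over three candidate destinations.
p_popular = [0.5, 0.3, 0.2]
p_uber = [0.2, 0.5, 0.3]
p_rider = [0.1, 0.1, 0.8]
prior = destination_prior(p_popular, p_uber, p_rider)
print(prior)
```

Because the weights sum to 1 and each input sums to 1, the mixture is again a valid probability distribution.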
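  12. Building the likelihood: P(Y = y | D = i)
     Riders often tend to get out at a place that differs from their final destination.
     * Haversine distance: details omitted.
     Figure 3. Distribution of the distance between the drop-off location and the final destination [9]
     In both suburbs and city centers, traffic jams, intersections and various other factors create a gap between the destination and the drop-off point.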
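The slide skips the Haversine distance, so here is a minimal sketch of the standard formula for the great-circle distance between two latitude/longitude points. The mean Earth radius of 6,371 km and the function name are assumptions of this example, not values from the source.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# One degree of longitude along the equator is roughly 111 km.
d = haversine_m(0.0, 0.0, 0.0, 1.0)
print(d)
```

Applied to a drop-off point and a candidate destination, this yields the distance y used in P(Y = y | D = i).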
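  13. Building the likelihood: P(Y = y | D = i) (from [9])
     • Computing the probability distribution of the distance between the destination and the drop-off point
     The distance is assumed to follow a Gaussian distribution, and the maximum likelihood estimates $\hat{\mu}_{MLE}$ and $\hat{\sigma}^2_{MLE}$ are used:
     $P(Y = y \mid D = i) = N(Y = y \mid \mu, \sigma^2)$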
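  14. Building the likelihood: P(Y = y | D = i) (from [9], based on [9])
     • Computing the estimates of the mean and the variance
       Mean: $\hat{\mu}_{Z=z} = \frac{1}{\sum_{k=1}^{n} \mathbf{1}(Z = z)} \sum_{k=1}^{n} x_k \, \mathbf{1}(Z = z)$
       Variance: $\hat{\sigma}^2_{Z=z} = \frac{1}{\sum_{k=1}^{n} \mathbf{1}(Z = z)} \sum_{k=1}^{n} (x_k - \hat{\mu}_{Z=z})^2$
     • Probability distribution of the distance between the destination and the drop-off point (the familiar Gaussian density):
       $P(Y = y \mid D = i) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2_{Z=z}}} \exp\left[ -\frac{(x_k - \hat{\mu}_{Z=z})^2}{2\hat{\sigma}^2_{Z=z}} \right]$
     (slide label: uniform distribution)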
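The mean and variance estimates can be computed per group as in the sketch below. The transcript does not say what Z stands for; this example assumes it is a categorical label (here an area type) attached to each observed distance, and it restricts the variance sum to the same group as the mean. All data values are made up.

```python
import numpy as np

def fit_gaussian_mle(x, z, group):
    """MLE of the mean and variance of x over observations whose label z equals group."""
    x = np.asarray(x, dtype=float)
    mask = np.asarray(z) == group          # indicator 1(Z = z)
    n = mask.sum()
    mu = x[mask].sum() / n
    var = ((x[mask] - mu) ** 2).sum() / n  # MLE divides by n, not n - 1
    return mu, var

def gaussian_pdf(y, mu, var):
    """Gaussian density N(y | mu, var)."""
    return np.exp(-(y - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Hypothetical drop-off distances in meters with an assumed area label.
x = [10.0, 20.0, 30.0, 100.0]
z = ["urban", "urban", "urban", "suburban"]
mu, var = fit_gaussian_mle(x, z, "urban")
print(mu, var, gaussian_pdf(25.0, mu, var))
```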
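  15. Completing the likelihood (from [9])
     $P(X = x \mid D = i) = P(Y = y, T = t \mid D = i) = P(Y = y \mid D = i) \, P(T = t \mid D = i)$
     Prior:
     • Mix the three probability distributions "the individual's history", "the tendency of Uber users" and "the tendency of SF".
     Likelihood P(X = x | D = i):
     • P(T = t | D = i), the number of rides in each time slot: a categorical distribution generated from the counts per time slot
     • P(Y = y | D = i): the probability distribution of the distance between the destination and the drop-off point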
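The factorized likelihood can be sketched by multiplying a Gaussian distance term by a categorical time-slot term. The time-slot names, counts and Gaussian parameters below are hypothetical, and the helper names are not from the source.

```python
import math

def gaussian_pdf(y, mu, var):
    """Gaussian density for the drop-off distance, P(Y = y | D = i)."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def time_pmf(counts_by_slot):
    """Categorical distribution over time slots built from ride counts."""
    total = sum(counts_by_slot.values())
    return {slot: count / total for slot, count in counts_by_slot.items()}

def likelihood(y, t, mu, var, counts_by_slot):
    """P(X = x | D = i) = P(Y = y | D = i) * P(T = t | D = i)."""
    return gaussian_pdf(y, mu, var) * time_pmf(counts_by_slot).get(t, 0.0)

# Hypothetical ride counts per time slot for one candidate destination.
counts = {"morning": 6, "evening": 3, "night": 1}
lik = likelihood(y=50.0, t="morning", mu=50.0, var=400.0, counts_by_slot=counts)
print(lik)
```

Multiplying the two terms relies on the factorization shown on the slide, which treats distance and time as independent given the destination.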
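  16. Results and conclusion
     Test method
     1. For each user taking a ride, compute a list of candidate predicted destinations (within 100 m).
     2. Select the candidate with the maximum posterior probability (MAP estimation).
     3. Check whether the address of that candidate matches the true destination.
     Model comparison
     1. Comparison of the naive baseline and the smart baseline
        1. naive baseline: selects a candidate at random and achieves 40% accuracy
        2. smart baseline: selects the nearest candidate and achieves 44% accuracy
     The criterion used for accuracy is not entirely clear…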
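  17. Posterior maximization: Maximum a posteriori (MAP). Computing the posterior probability (MAP estimation)
     $\theta^{*} = \operatorname*{argmax}_{\theta} \log\left[ P(y \mid \theta) P(\theta) \right]$
     → Find the parameter θ that maximizes the posterior probability rather than the likelihood.
     Note: MAP estimation uses Bayes' theorem, but because it is a point estimate it is not regarded as full-fledged Bayesian modeling.
     Figure 4. Selecting the maximum posterior probability [1]: among the posterior probabilities 0.777, 0.182 and 0.041, the maximum is chosen as the point estimate. (based on [1])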
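The MAP rule amounts to taking the argmax of log-likelihood plus log-prior over the candidates. Here is a minimal sketch under that reading; the candidate priors and likelihoods are invented numbers, not the ones behind Figure 4.

```python
import numpy as np

def map_estimate(likelihood, prior):
    """Return the index of the MAP candidate and the normalized posterior."""
    log_post = np.log(likelihood) + np.log(prior)  # log[P(y | theta) P(theta)]
    best = int(np.argmax(log_post))
    weights = np.exp(log_post - log_post.max())    # subtract the max for numerical stability
    return best, weights / weights.sum()

# Hypothetical values for three candidate destinations.
prior = np.array([0.25, 0.28, 0.47])
likelihood = np.array([0.02, 0.001, 0.0005])
best, post = map_estimate(likelihood, prior)
print(best, post)
```

The candidate with the largest prior loses here because its likelihood is much smaller, which is the difference between MAP and simply picking the most popular place.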
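  18. References & reference URLs
     1. Kenichiro Ishii et al., "Easy-to-Understand Pattern Recognition, Continued: An Introduction to Unsupervised Learning", Ohmsha, 2014/10/30
     2. Hisashi Kashima, "Advanced Topics in Mathematical Informatics I [Machine Learning and Data Mining]: Regression (2)", URL: (www.geocities.co.jp/Technopolis/5893/2-2.pdf)
     3. Tomoyuki Furutani, "Bayesian Statistical Data Analysis - R & WinBUGS -", Asakura Shoten, 2008/09/15
     4. Tomohiro Ando, "Bayesian Statistical Modeling", Asakura Shoten, 2010/02/25
     5. Allen B Downey, "Think Bayes" (Japanese edition, subtitled "Introduction to Bayesian Statistics for Programmers"), O'Reilly, 2014/9
     6. aidiary, "Fragmentary Notes on Artificial Intelligence", 'Maximum likelihood estimation, MAP estimation, Bayesian estimation', URL: (http://aidiary.hatenablog.com/entry/20100404/1270359720), posted on 2010/04/04
     7. noriume, "Sunny side up", 'Differences between conventional estimation methods and Bayesian estimation', URL: (http://norimune.net/708), posted on 2013/02/26
     8. Masayuki Isobe, "An Introduction to Bayesian Estimation with as Few Formulas as Possible", URL: (https://speakerdeck.com/chiral/shu-shi-wonarubekushi-wanaibeizutui-ding-ru-men), posted on 2013/2
     9. Uber, "Making a Bayesian Model to Infer Uber Rider Destinations", URL: (http://blog.uber.com/passenger-destinations)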