An Attempt to Apply Statistical Network Analysis to Street Snap Data

TokyoR #78 LT

saltcooky

May 25, 2019

Transcript

  1. An Attempt to Apply Statistical Network Analysis to Street Snap Data
     TokyoR #78
     @saltcooky

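  2. Who am I?
     • @saltcooky
     • R experience: about [illegible] years
     • Workplace: an IT company in Harajuku
     • Work: using R in an R&D-type department
       ・Data analysis ([illegible] share)
       ・Machine learning ([illegible] share)
       ・Preprocessing ([illegible] share)
     • Hobbies: clothes, fashion, visiting art museums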
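  3. What is network analysis?
     An analysis method based on graph theory that examines relationships and structures such as human relationships, relationships between companies, relationships between organisms, and computer networks.
     (Source: https://www.slideshare.net/kashitan/tidygraphggraph)
     I studied with this book: (https://www.amazon.co.jp/exec/obidos/ASIN/4320019288)
     @kashitan also presented on this topic at a recent TokyoR.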
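  4. Network analysis
     The common tasks are computing network metrics and extracting structure:
     - Centrality: which people are the central figures?
     - Community detection: what groups does the network split into?
     - Correlation coefficient: are two networks similar or not?
     - Core extraction: the densely connected central part of the network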
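  5. Statistical network analysis
     An edge between two vertices (i, j) of a network is assumed to occur stochastically with probability p_ij.
     p_ij can be expressed by a logistic model with parameter θ.
     The probability that edges form both between vertices (i, j) and between vertices (j, k) can be expressed as p_ij × p_jk.
     (Figure: three vertices i, j, k)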
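The two statements on this slide can be written out as follows. This is a sketch, not the slide's own formula: the covariate vector x_{ij} is an assumption introduced here, since the slide only says that p_{ij} follows a logistic model with parameter θ, and the product form assumes the two edges occur independently.

```latex
\operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} = \theta^{\top} x_{ij}
\quad\Longleftrightarrow\quad
p_{ij} = \frac{1}{1 + \exp(-\theta^{\top} x_{ij})}

P(Y_{ij} = 1,\; Y_{jk} = 1) = p_{ij} \times p_{jk}
```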
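  6. Statistical network analysis: exponential random graph model
     A model in which the probability of obtaining a particular graph structure y from a random graph Y is expressed as a power of the probability that each edge forms.
     (Formula labels: number of edges in y; parameter; normalizing constant; probability of the edge configuration of the whole network)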
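Read against the labels on this slide (number of edges in y, parameter, normalizing constant), the formula is presumably the edges-only form of the model. The sketch below writes it as such; the symbols L(y), N, p and κ(θ) are notation chosen here, with N the number of vertex pairs and p a common edge probability.

```latex
P(Y = y) = \frac{\exp\{\theta\, L(y)\}}{\kappa(\theta)}

% Link to "a power of the edge probability": with independent edges,
P(Y = y) = p^{L(y)} (1 - p)^{N - L(y)}
         = (1 - p)^{N} \exp\!\left\{ L(y) \log\frac{p}{1 - p} \right\}
% so that
\theta = \log\frac{p}{1 - p}, \qquad \kappa(\theta) = (1 - p)^{-N} = (1 + e^{\theta})^{N}
```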
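  7. Statistical network analysis: exponential random graph model (p* model)
     A model in which the edge probabilities of a random graph Y are determined stochastically by various elements.
     Elements:
     - Node features: age, weight, department…
     - Edge features: length of relationship, preferences…
     - Features of the relationship between nodes: age difference, difference in length of service…
     - Structural features: number of k-star structures…
     (Formula labels: network components; number of elements)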
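In the usual notation for this family of models, the elements listed on this slide enter as network statistics z_k(y), each with its own parameter θ_k. The following is a sketch in that standard form; K, z_k and the sum over all graphs y' are notation assumed here rather than taken from the slide.

```latex
P(Y = y) = \frac{\exp\left\{ \sum_{k=1}^{K} \theta_k\, z_k(y) \right\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y'} \exp\left\{ \sum_{k=1}^{K} \theta_k\, z_k(y') \right\}
```

With K = 1 and z_1(y) equal to the number of edges, this reduces to the edges-only model.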
  8. Data used

  9. Data used
     Age, occupation, shooting location, brands worn

  10. Motivation
     Edges represent the degree of overlap in the brands worn.
     Can this express the nature of brand choice? (Admittedly quite a stretch.)
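One way such a brand-overlap network could be built is sketched below. The deck does not show how graph_matrix was constructed, so the toy incidence matrix, the names brand_incidence and shared_brands, and the "at least one shared brand" rule are all assumptions made for illustration.

```r
# Hypothetical toy data: rows are snapped people, columns are brands (1 = worn).
brand_incidence <- matrix(
  c(1, 1, 0, 0,
    0, 1, 1, 0,
    0, 0, 0, 1,
    1, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("person", 1:4),
                  c("brandA", "brandB", "brandC", "brandD"))
)

# Number of brands shared by each pair of people (person x person).
shared_brands <- tcrossprod(brand_incidence)

# Assumed rule: connect two people if they share at least one brand.
graph_matrix <- (shared_brands >= 1) * 1
diag(graph_matrix) <- 0  # no self-loops, in line with loops = FALSE

print(graph_matrix)
```

Keeping shared_brands instead of thresholding it would give a weighted network, which is closer to a "degree of overlap" but is not what a binary ERGM takes as its response.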

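  11. Data collection
     • Built an RStudio + RSelenium environment with Docker on GCP
     • Scraped the pages with the rvest package
     • Fetched pages at intervals following a Poisson distribution (for no particular reason)
     • Collected about one year of snap data ([illegible] people)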
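The Poisson-distributed fetch interval can be sketched as follows. The function names, the mean wait of 5 seconds, and the example URLs are hypothetical; the deck gives neither the rate nor the scraping code. In real use fetch_fun would be something like rvest::read_html, and sleep_fun would stay at its default.

```r
# Draw waiting times (in seconds) from a Poisson distribution.
# `mean_wait` is an assumed value; the slides do not state the rate used.
poisson_waits <- function(n, mean_wait = 5) {
  stats::rpois(n, lambda = mean_wait)
}

# Fetch each URL in turn, pausing for a Poisson-distributed interval
# after every request. `fetch_fun` downloads one page.
fetch_with_waits <- function(urls, fetch_fun, mean_wait = 5,
                             sleep_fun = Sys.sleep) {
  waits <- poisson_waits(length(urls), mean_wait)
  pages <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    pages[[i]] <- fetch_fun(urls[[i]])
    sleep_fun(waits[i])
  }
  pages
}

# Dry run with a stand-in fetcher and no real sleeping.
set.seed(78)
demo_urls <- c("https://example.com/snap/1", "https://example.com/snap/2")
demo_pages <- fetch_with_waits(demo_urls,
                               fetch_fun = toupper,
                               sleep_fun = function(seconds) invisible(NULL))
```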
  12. Checking the data
     Ranking of brands worn; network of brands worn

  13. Building the model (example)
     Exponential random graph models can be implemented with the statnet package.
     library(statnet)
     # Create the network object
     network <- as.network(x = graph_matrix, directed = FALSE, loops = FALSE)
     # Add the explanatory variable (age) to each vertex
     network %v% "Age" <- snap_info$Age
     # Compute the age difference for each pair of vertices
     diff.age <- abs(sweep(matrix(snap_info$Age, nrow = 638, ncol = 638), 2, snap_info$Age))
     # Fit the model
     model <- ergm(network ~ edges + edgecov(diff.age) + nodecov("Age"))
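To see what the sweep() call on this slide produces, here is a small base-R sketch on made-up ages (the deck itself uses snap_info$Age with 638 entries). It also shows that outer() gives the same pairwise age-difference matrix; the variable names are chosen here for illustration.

```r
# Hypothetical ages for four people.
age <- c(20, 25, 31, 25)
n <- length(age)

# Approach from the slide: matrix() fills column by column, so entry [i, j]
# starts as age[i]; sweep() then subtracts age[j] from column j.
diff_age_sweep <- abs(sweep(matrix(age, nrow = n, ncol = n), 2, age))

# Same matrix via outer(): entry [i, j] is |age[i] - age[j]|.
diff_age_outer <- abs(outer(age, age, "-"))

stopifnot(isTRUE(all.equal(diff_age_sweep, diff_age_outer)))
print(diff_age_outer)
```

The result is symmetric with a zero diagonal, which is the shape edgecov() expects for an undirected network of the same size.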

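  14. Building the model
     Exponential random graph models can be implemented with the statnet package.
     # Fit the p* model for the street snaps
     snap_net_model <- ergm(snap_net ~
       edges +                     # number of edges
       nodecov("Age") +            # age
       edgecov(diff.age) +         # age difference
       nodematch("Occupation") +   # occupation
       nodematch("Point"))         # shooting location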

  15. Looking at the results
     > summary(snap_net_model)
     < omitted >
     Monte Carlo MLE Results:
                            Estimate  Std. Error  MCMC %  z value  Pr(>|z|)
     edges                -5.2066393   0.2692526       0  -19.337    <1e-04 ***
     edgecov.diff.age     -0.0015763   0.0094767       0   -0.166    0.8679
     nodecov.Age          -0.0003136   0.0061215       0   -0.051    0.9591
     nodematch.Occupation -0.0453192   0.0842853       0   -0.538    0.5908
     nodematch.Point       0.1491330   0.0628610       0    2.372    0.0177 *
     < omitted >
     AIC: 13485 BIC: 13536 (Smaller is better.)
     The shooting location appears to influence edge formation.
     Variables can be selected using AIC/BIC.
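The table can be read a little further with base R. The sketch below recomputes the z values and p-values from the reported estimates and standard errors, and converts each coefficient to an odds ratio with an approximate 95% Wald interval. It relies on the usual reading of an ERGM coefficient as a change in the conditional log-odds of an edge; the variable names are chosen here, and only the numbers come from the slide.

```r
# Values copied from the summary() table on the slide.
estimate <- c(edges = -5.2066393,
              edgecov.diff.age = -0.0015763,
              nodecov.Age = -0.0003136,
              nodematch.Occupation = -0.0453192,
              nodematch.Point = 0.1491330)
std_error <- c(edges = 0.2692526,
               edgecov.diff.age = 0.0094767,
               nodecov.Age = 0.0061215,
               nodematch.Occupation = 0.0842853,
               nodematch.Point = 0.0628610)

# Wald z statistics and two-sided p-values, recomputed from the table.
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))

# Odds ratios with approximate 95% Wald intervals.
odds_ratio <- exp(estimate)
ci_lower <- exp(estimate - qnorm(0.975) * std_error)
ci_upper <- exp(estimate + qnorm(0.975) * std_error)

print(round(cbind(z_value, p_value, odds_ratio, ci_lower, ci_upper), 4))
```

For nodematch.Point the interval lies above 1, which matches the slide's reading that a shared shooting location goes with a somewhat higher chance of an edge; the other covariate intervals all include 1.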
  16. Looking at the results
     Simulation using the model
     Solid line: values from the simulation
     Box plot: values from the actual data
     The fit is not good…

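  17. Summary
     • With this snap data, the relationships among the brands worn could not be expressed well with an exponential random graph model.
     • Statistical network analysis is quite interesting, so give it a try.
     • I would like to keep studying statistical network analysis myself.
     • So if you know the subject well, I would appreciate your guidance.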

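  18. References
     • Tsutomu Suzuki, "Network Analysis, 2nd ed." (ネットワーク分析 第2版), Kyoritsu Shuppan
       https://www.amazon.co.jp/exec/obidos/ASIN/4320019288
     • Modern network analysis with {tidygraph} and {ggraph}
       https://www.slideshare.net/kashitan/tidygraphggraph
     • A summary of network analysis with R: network metrics edition
       https://qiita.com/saltcooky/items/[item ID illegible]
     • A summary of network analysis with R: statistical network analysis edition
       https://qiita.com/saltcooky/items/[item ID illegible]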