Improving "People You May Know" on Directed Social Graph

Slides from my talk at machine learning graph pitch #1 (https://machine-learning-pitch.connpass.com/event/130083/).
I talked about graph embeddings on a directed graph and about connection recommendation built on top of them.

Agata Naomichi

May 13, 2019

Transcript

  1. ©2019 Wantedly, Inc. Improving “People You May Know” on Directed Social Graph
    - Predicting bidirectional connections with graph embeddings
    - Machine Learning Graph Pitch #1, May 13, 2019 - Naomichi Agata
  2. Self-introduction: Naomichi Agata
    - 2018/04~ Wantedly, Inc.
    - Machine learning engineer for Wantedly People
    - Organizing member of Data Driven Developer Meetup
    - https://d3m.connpass.com
    - @agatan_
  3. Agenda
    • What are we trying to solve?
    • Graph Embeddings
    • Making use of connection types
    • Summary
  4. What are we trying to solve? The problem and its premises

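  5. The problem: we want to increase the number of "connections"!
    - Wantedly People is an app for managing "connections".
      - (Photograph a business card or send a connection request) → connected!
    - You get information related to the people you are connected with!
      - For example, connecting with a Wantedly employee lets you see news related to "Wantedly, Inc."
    - The more users connect with each other, the better the experience we should be able to provide.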
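  6. Strictly speaking... we want to increase the number of bidirectional "connections"!
    - The Wantedly People connection model is a directed graph, so one-way connections also exist.
      - "Connection requests": for example, when A photographs B's business card, a request is sent from A to B.
    - What we want to increase is bidirectional connections.
      - We are not trying to increase one-sided connection requests.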
  7. How do we solve it?
    - We want to show "Could this be someone you know?"
    - “People You May Know”
    - Link prediction: we want to predict the probability that users A and B will connect.
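  8. Early days
    - We started with a rule-based approach.
      - Rules such as "same company", "N or more connections in common", and "someone who scanned your business card".
    - Rule complexity, accuracy, and the number of recommendations became problems, so we moved to an improvement phase.
      - Some users had several connections but matched no rule, so no recommendations could be shown to them.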
  9. Graph Embeddings

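  10. Graph Embeddings?
    - Here, this means obtaining a vector representation for each node in the graph.
    - Train so that connected nodes end up close together and unconnected nodes end up far apart.
    - "Similarity" is defined by cosine similarity or the inner product.
    - (Figure: nodes A, B, C.) $\mathrm{similarity}(A, B) = \vec{a} \cdot \vec{b}$, $\mathrm{similarity}(A, C) = \vec{a} \cdot \vec{c}$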
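
As a minimal sketch of the similarity definitions on this slide, the following pure-Python snippet computes the inner product and the cosine similarity for three nodes. The 2-dimensional vectors and the names `dot`, `cosine`, and `emb` are made up for illustration; they are not values or code from the talk.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    """Cosine similarity: the inner product of the length-normalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical embeddings: A and B point in similar directions, C does not.
emb = {"A": [1.0, 0.0], "B": [0.8, 0.6], "C": [-1.0, 0.0]}

sim_ab = dot(emb["A"], emb["B"])  # similarity(A, B)
sim_ac = dot(emb["A"], emb["C"])  # similarity(A, C)
```

With these toy vectors, `sim_ab` is larger than `sim_ac`, which is the property training is supposed to produce for connected versus unconnected nodes.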
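  11. Why use graph embeddings?
    - They can (we hope) represent complex relationships.
      - It would be great to handle the information in a complex data structure like a graph as fixed-dimensional vectors!
      - A graph with hundreds of millions of edges (counting one-way edges too) can be reduced to a simple representation.
    - Searching for recommendation candidates can be made fast.
      - Even an exhaustive search only takes inner products.
    - (Application to other tasks)
      - The learned embeddings can be used as input to downstream tasks.
      - Link prediction is itself one such downstream task.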
  12. Recommendation overview: Embeddings → nearest-neighbor search → regression model → recommendation

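  13. Recommendation overview: Embeddings → nearest-neighbor search → regression model → recommendation
    - Enumerating and scoring every pair is computationally too expensive.
      - Use the embedding information to narrow down the recommendation candidates.
      - The regression model plays the role of a re-ranker.
    - Approximate Nearest Neighbors search can be used, so this scales.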
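
The two-stage flow described here (narrow the candidates by embedding, then re-rank) might look roughly like the sketch below. It substitutes an exact brute-force search for an approximate nearest neighbor index, and `top_k_candidates`, `recommend`, `rerank_score`, and the toy vectors are all hypothetical names and data, not the production implementation.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def top_k_candidates(user, embeddings, k):
    """Return the k other users with the largest inner product to `user`.

    This is an exact brute-force scan; at scale an approximate nearest
    neighbor index would take the place of this loop.
    """
    query = embeddings[user]
    scored = (
        (dot(query, vec), other)
        for other, vec in embeddings.items()
        if other != user
    )
    return [other for _, other in heapq.nlargest(k, scored)]


def recommend(user, embeddings, k, rerank_score):
    """Stage 1: narrow candidates by embedding. Stage 2: re-rank them."""
    candidates = top_k_candidates(user, embeddings, k)
    return sorted(candidates, key=lambda c: rerank_score(user, c), reverse=True)


# Hypothetical toy embeddings.
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.5, 0.5],
    "D": [-1.0, 0.0],
}
```

Here `rerank_score` stands in for the regression model: any callable that maps a user pair to a score can be passed in.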
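  14. Recommendation overview: Embeddings → nearest-neighbor search → regression model → recommendation
    - A model that predicts the "connection probability" from a pair of users.
      - Each user's vector representation is used as features.
      - Other information (degree, number of connections in common, and so on) can also be used as features.
    - Even without this model, we could sort by the embedding inner product and recommend the top results.
      - Accuracy was better with the model.
      - The output can be interpreted as a probability, which makes it easier to work with.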
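
One way the features for such a pairwise model could be assembled is sketched below. The exact feature set used in the talk is not given, so the layout here (both embeddings, their inner product, each user's degree, and the common-connection count) and the names `pair_features` and `neighbors` are assumptions for illustration.

```python
def pair_features(src, dst, embeddings, neighbors):
    """Build a flat feature vector for the user pair (src, dst).

    `embeddings` maps user -> list of floats.
    `neighbors` maps user -> set of connected users.
    """
    a, b = embeddings[src], embeddings[dst]
    inner = sum(x * y for x, y in zip(a, b))
    common = len(neighbors[src] & neighbors[dst])
    # Embedding of src, embedding of dst, then the scalar graph features.
    return a + b + [inner, len(neighbors[src]), len(neighbors[dst]), common]


# Hypothetical toy data.
embeddings = {"A": [1.0, 0.0], "B": [0.5, 0.5]}
neighbors = {"A": {"C", "D"}, "B": {"C"}}

features = pair_features("A", "B", embeddings, neighbors)
```

A vector like `features` would be one training row for the regression model, labeled by whether the pair actually connected.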
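  15. Recommendation overview: Embeddings → nearest-neighbor search → regression model → recommendation
    - This is the crucial part!
      - If the users who are likely to connect are not actually located nearby, they drop out of the candidate set.
      - If the graph information is not embedded well, the regression model cannot reach good accuracy.
      - Users for whom no embedding can be defined cannot receive recommendations at all.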
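  16. Trying it out
    - First we trained using only bidirectional connections.
    - A simple DeepWalk: DeepWalk: Online Learning of Social Representations [B. Perozzi, KDD].
      - Trained in the same way as Word2Vec on node sequences obtained by random walks over the graph.
    - Because more users could receive recommendations, this produced about twice as many connections as before!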
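
The random-walk step of DeepWalk can be sketched as follows. This is only the corpus-building half: the walks would then be passed to a Word2Vec-style skip-gram trainer, which is omitted. The toy graph, the uniform choice of the next node, and the names `random_walk` and `build_corpus` are assumptions for illustration.

```python
import random


def random_walk(graph, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, length, rng))
    return corpus


# Hypothetical graph of bidirectional connections, as adjacency lists.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
corpus = build_corpus(graph, walks_per_node=2, length=5)
```

Each walk plays the role of a sentence and each node the role of a word, which is why a Word2Vec-style trainer applies directly.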
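  17. Hypothesis
    - On the other hand, because we discard edges that are not bidirectional, we are not making use of:
      - the fact that a connection request was not accepted;
      - the information that users are weakly connected via one-way connections.
    - If we could use these, could we do even better?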
  18. Making use of connection types

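  19. How to represent the connection type
    - When the inner product or cosine similarity is used as the score, $\mathrm{similarity}(A, B) = \vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a} = \mathrm{similarity}(B, A)$, so "there is an edge A → B but no edge B → A" cannot be represented.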
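  20. How to represent the connection type
    - So, when there is an edge of type $r$ from A to B, we represent it as follows: $\mathrm{score}(A, r, B) = \vec{a} \cdot f_r(\vec{b})$ and $\mathrm{score}(B, r, A) = \vec{b} \cdot f_r(\vec{a})$.
    - $f_r$ is a function that transforms a vector, defined separately for each edge type.
    - (Figure: edge $r$ from A to B.) Train so that $\vec{a} \cdot f_r(\vec{b})$ becomes large, and also so that $\vec{b} \cdot f_r(\vec{a})$ becomes small.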
  21. How to represent the connection type
    - This time we applied the following transformation, modeled on Complex Embeddings: Complex Embeddings for Simple Link Prediction [T. Trouillon, ICML].
        real, imag = embedding[:dim // 2], embedding[dim // 2:]
        concat(real * W_real - imag * W_imag, real * W_imag + imag * W_real)
    - W_real and W_imag are learned parameters.
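
A runnable version of the transformation on this slide, written as a sketch in pure Python. It assumes that `*` in the slide's pseudocode is elementwise multiplication and that `W_real` and `W_imag` each have `dim // 2` entries; the function names `transform` and `score` and the numeric values are made up for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def transform(embedding, w_real, w_imag):
    """Treat the two halves of `embedding` as real and imaginary parts and
    multiply elementwise by the complex weights (w_real + i * w_imag)."""
    half = len(embedding) // 2
    real, imag = embedding[:half], embedding[half:]
    parts = list(zip(real, imag, w_real, w_imag))
    new_real = [r * wr - i * wi for r, i, wr, wi in parts]
    new_imag = [r * wi + i * wr for r, i, wr, wi in parts]
    return new_real + new_imag  # the concat(...) from the slide


def score(src, dst, w_real, w_imag):
    """Directed score: only the dst embedding is transformed."""
    return dot(src, transform(dst, w_real, w_imag))


# Hypothetical values, dim = 2 (one real and one imaginary component).
a, b = [1.0, 0.0], [0.0, 1.0]
w_real, w_imag = [0.0], [-1.0]

forward = score(a, b, w_real, w_imag)   # A -> B
backward = score(b, a, w_real, w_imag)  # B -> A
```

With these values `forward` is positive and `backward` is negative even though the plain inner product of `a` and `b` is the same in both directions, which is the asymmetry the untransformed score cannot express.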
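  22. Experiment
    - Represent both "bidirectional connections" and "one-way connections" in a single graph and train embeddings on it.
    - Train a LightGBM model that uses the learned vector representations as features, and evaluate its accuracy.
    - Compare against the performance obtained when only "bidirectional connections" are used.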
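  23. Representation in this experiment: bidirectional
    - (Figure: A and B, labeled $\vec{a} \cdot \vec{b}$ and $\vec{b} \cdot \vec{a}$.)
    - No transformation on either src or dst.
    - The result is the same whichever direction you look from: $\mathrm{score}(A, \mathrm{bidirected}, B) = \vec{a} \cdot \vec{b}$ and $\mathrm{score}(B, \mathrm{bidirected}, A) = \vec{b} \cdot \vec{a}$.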
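  24. Representation in this experiment: one-way
    - (Figure: A and B, labeled $\vec{a} \cdot f(\vec{b})$ and $\vec{b} \cdot f(\vec{a})$.)
    - Transform the dst side.
    - Make it possible to represent "A → B is positive and B → A is negative":
      - make $\vec{a} \cdot f(\vec{b})$ large (A → B is a one-way connection);
      - make $\vec{b} \cdot f(\vec{a})$ small (B → A does not exist);
      - make $\vec{a} \cdot \vec{b}$ small (A and B are not bidirectionally connected).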
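
The three "make large / make small" targets for a one-way edge can be turned into a loss in many ways; the talk does not say which one was used. The sketch below assumes a logistic loss on each term, and `rotate` is a hypothetical fixed stand-in for the learned transform $f$.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def softplus(x):
    """log(1 + exp(x)), computed in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def one_way_loss(a, b, f):
    """Logistic loss for a one-way edge A -> B (assumed form, see lead-in).

    Pushes a . f(b) up, and pushes b . f(a) and a . b down.
    """
    forward = dot(a, f(b))   # A -> B exists: treated as a positive
    backward = dot(b, f(a))  # B -> A does not exist: treated as a negative
    mutual = dot(a, b)       # not a bidirectional connection: negative
    return softplus(-forward) + softplus(backward) + softplus(mutual)


def rotate(v):
    """Hypothetical stand-in for the learned transform f (2-d rotation)."""
    return [v[1], -v[0]]


a, b = [1.0, 0.0], [0.0, 1.0]
loss_correct = one_way_loss(a, b, rotate)   # embeddings agree with A -> B
loss_reversed = one_way_loss(b, a, rotate)  # same embeddings, edge flipped
```

With these toy vectors `loss_correct` is smaller than `loss_reversed`, so minimizing the loss favors embeddings that encode the edge direction.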
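  25. Results
    - The number of users for whom "a user they go on to connect with can be recommended within the Top0" increased by 10.3%.
    - For users who could already receive recommendations, recommendation quality was roughly the same (AUC: 0.819 → 0.823).
      - We did not get the result that "using one-way edges also raises quality".
      - It seems fair to say that widening coverage did not cost accuracy.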
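  26. Other experiments and next steps
    - Experiments we tried that did not yield good results...
      - Treating information such as "via business card" and "via request" as edge labels.
      - Treating rejected requests as edges too.
    - We have not yet run an evaluation restricted to the users who newly became able to receive recommendations.
      - Now that one-way edges are used as well, the graph contains many more low-degree users.
    - Applying the embeddings to other tasks.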
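  27. Summary
    - By treating bidirectional and one-way connections separately, we were able to embed richer information (?).
      - Transform the vector representation on the dst side.
      - The number of users who received an effective recommendation increased by 10.3%!
    - Future work
      - Edges carry much more information, and we want to make use of it.
      - Using the embeddings for tasks other than link prediction.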