Rethinking ○○2vec

Slides for the builderscon 2019 talk "Rethinking ○○2vec" (○○2vec 再考).

Agata Naomichi

August 31, 2019

Transcript

 1. ©2019 Wantedly, Inc.
  Rethinking ○○2vec
  Builderscon 2019
  Aug 31, 2019 - Naomichi Agata
  Photo by Mika Baumeister on Unsplash

 2. Self-introduction
  Naomichi Agata
  @agatan_
  - 2018/04~ Wantedly, Inc.
  - Machine learning engineer on Wantedly People
  - Organizing member of Data Driven Developer Meetup
  - https://d3m.connpass.com

 3. Poll
  I do machine learning!

 4. Poll
  I have heard the term word2vec!
  The one famous for “Paris - France + Japan = Tokyo”

 5. Agenda
  Overview of ○○2vec
  ○○2vec using graph structure
  Using it effectively
  Summary

 6. Agenda
  Overview of ○○2vec
  ○○2vec using graph structure
  Using it effectively
  Summary

 7. Agenda: Overview of ○○2vec
  - What is ○○2vec?
  - Why is it useful?
  - What applications can it actually be used for?

 8. Agenda: Overview of ○○2vec
  - What is ○○2vec?
  - Why is it useful?
  - What applications can it actually be used for?

 9. ○○2vec
  word2vec: converts words into low-dimensional vectors
  [Figure: x-y plot of the word vectors for "oishii" (tasty), "umai" (delicious), and "engineer"]

 10. What is ○○2vec?
  word2vec converts "words" into vectors that satisfy "words with similar meanings get similar vectors".
  [Figure: Paris, Tokyo, France → word2vec → vectors]
  ( 0.1, 0.3, -0.2, …)
  (-0.7, -0.6, 0.9, …)
  ( 0.8, 0.4, -0.7, …)

 11. What is ○○2vec?
  ○○2vec converts "○○" into vectors that satisfy "some property".
  [Figure: Entity 1, Entity 2, Entity 3 → ○○2vec → vectors]
  ( 0.1, 0.3, -0.2, …)
  (-0.7, -0.6, 0.9, …)
  ( 0.8, 0.4, -0.7, …)

 12. Agenda: Overview of ○○2vec
  - On the definition of ○○2vec
  - Why is it useful?
  - What applications can it actually be used for?

 13. Overview of ○○2vec: why is it useful?
  With ○○2vec, complex and abstract information can be converted into vectors that are easy to handle mechanically!

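 14. Overview of ○○2vec: why is it useful?
  - Complex properties and relationships become easier to handle
  - Complex, abstract things such as “the meaning of a word” are hard to handle
  - “Vectors” are just numbers, so they are easy!
  - Properties of entities and relationships between entities can be expressed as numbers
  - The vector has (or should have) the entity's properties embedded in it
  - Operations between vectors can express relationships between entities
  - Operations such as sorting words by similarity become possible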
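The two vector operations named on this slide, arithmetic between vectors and sorting by similarity, can be sketched in a few lines. This is only an illustration: the 3-dimensional vectors below are hypothetical toy values chosen by hand, not output from a trained word2vec model, and `cosine` and `rank_by_similarity` are names made up for this sketch.

```python
import math

# Hypothetical hand-made toy vectors (NOT learned by word2vec).
# Real word vectors are learned from text and have far more dimensions.
VECTORS = {
    "Paris": [0.9, 0.1, 0.8],
    "France": [0.9, 0.1, 0.1],
    "Tokyo": [0.1, 0.9, 0.8],
    "Japan": [0.1, 0.9, 0.1],
    "sushi": [0.1, 0.8, -0.6],
}


def cosine(a, b):
    """Cosine similarity: inner product divided by the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, vectors, exclude=()):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [
        (name, cosine(query, vec))
        for name, vec in vectors.items()
        if name not in exclude
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Vector arithmetic: Paris - France + Japan
query = [
    p - f + j
    for p, f, j in zip(VECTORS["Paris"], VECTORS["France"], VECTORS["Japan"])
]

# Sort the remaining words by similarity to the result.
ranking = rank_by_similarity(query, VECTORS, exclude={"Paris", "France", "Japan"})
print(ranking[0][0])
```

With these toy values the top-ranked word is "Tokyo", because the vectors were constructed so that the capital-minus-country offset is the same for both pairs. A trained model only approximates this.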

 15. Agenda: Overview of ○○2vec
  - On the definition of ○○2vec
  - Why is it useful?
  - What applications can it actually be used for?

 16. Face image search
  - So that images showing the same person get similar vectors
  [Figure: face images → CNN → vectors]
  ( 0.1, 0.3, -0.2, …)
  (-0.7, -0.6, 0.9, …)
  (-0.7, -0.5, 0.7, …)

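 17. Face image search
  - Comparing vectors is enough to judge whether the subject is the same person
  - Compared with comparing the images directly…
  - Computation is fast (just take an inner product)
  - Differences such as shooting conditions can be absorbed
  [Figure: images → CNN → vectors → Index → search for similar vectors]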
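As a rough sketch of the "index, then search for similar vectors by inner product" idea on this slide: the class below is a brute-force index written for illustration only. `VectorIndex`, the file names, and the 3-dimensional vectors are all hypothetical; in a real system the vectors would come from a CNN, and a large index would usually rely on an approximate nearest-neighbor library rather than a linear scan.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class VectorIndex:
    """Brute-force index (illustrative): stores unit vectors, ranks by inner product."""

    def __init__(self):
        self._items = []  # list of (label, unit_vector)

    def add(self, label, vec):
        self._items.append((label, normalize(vec)))

    def search(self, query, k=1):
        q = normalize(query)
        scored = [
            (label, sum(a * b for a, b in zip(q, vec)))
            for label, vec in self._items
        ]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings of two already-indexed face photos.
index = VectorIndex()
index.add("person_a.jpg", [0.2, 0.9, -0.1])
index.add("person_b.jpg", [-0.8, -0.5, 0.9])

# Hypothetical embedding of a new photo; find the most similar indexed photo.
hits = index.search([-0.7, -0.5, 0.8], k=1)
print(hits)
```

Whether two photos count as "the same person" would then be decided by comparing the score against a threshold, which has to be tuned on real data.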

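 18. Document classification
  - ○○2vec is also often used in combination with another algorithm
  - Example: classifying documents into categories with Doc2vec + NN
  [Figure: “builderscon is a festival for all geeks who love technology, with the theme of 'hearing what you didn't know'.” → Doc2vec → ( 0.1, 0.3, -0.2, …) → Neural Network → category prediction]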

 19. Overview of ○○2vec: why is it useful? (recap)
  With ○○2vec, complex and abstract information can be converted into vectors that are easy to handle mechanically!

 20. Agenda
  Overview of ○○2vec
  ○○2vec using graph structure
  Using it effectively
  Summary

 21. Graph structure
  A structure in which nodes are connected to each other by edges
  Highly general: many kinds of information can be represented as a graph

 22. Graph structure
  For example…

 23. Graph structure
  Users that a user follows

 24. Graph structure
  Companies that users belong to

 25. Graph structure
  Articles that users liked

 26. Graph structure
  Companies that appear in news

 27. Graph structure
  Highly expressive… but hard to work with

 28. ○○2vec using graph structure
  - Graph structure: can represent all kinds of complex real-world data
  - ○○2vec: can make complex data easy to handle
  Combining the two looks useful

 29. Graph structure
  For example…
  ○○2vec using graph structure

 30. ○○2vec using graph structure
  Converts "each node in the graph" into vectors that satisfy "adjacent nodes get similar vectors".
  [Figure: graph nodes → ○○2vec → vectors]
  ( 0.1, 0.3, -0.2, …)
  (-0.7, -0.6, 0.9, …)
  ( 0.8, 0.4, -0.7, …)

 31. How is it trained?
  [Figure: graph with node pairs labeled "adjacent" and "far apart"]
  Many methods are being researched, and the field moves quickly.
  Here I only describe a simple picture of an implementation.

 32. How is it trained?
  Start from arbitrary vectors
  [Figure: node vectors on x-y axes; the legend marks which nodes are adjacent and which are far apart]

 33. How is it trained?
  Move the vectors of adjacent nodes closer together
  [Figure: node vectors on x-y axes; the legend marks which nodes are adjacent and which are far apart]

 34. How is it trained?
  Move the vectors of nodes that are far apart away from each other
  [Figure: node vectors on x-y axes; the legend marks which nodes are adjacent and which are far apart]

 35. How is it trained?
  Repeat this until it converges
  [Figure: node vectors on x-y axes; the legend marks which nodes are adjacent and which are far apart]

 36. How is it trained?
  Repeat this until it converges
  [Figure: node vectors on x-y axes; the legend marks which nodes are adjacent and which are far apart]

 37. How is it trained?
  Repeat this until it converges
  [Figure: node vectors on x-y axes; the legend marks which nodes are adjacent and which are far apart]

 38. How is it trained?
  A node's vector is determined by its relationships with the surrounding nodes

 39. How is it trained?
  Nodes that are connected to similar nodes end up with similar vectors
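The training picture on the preceding slides (start from arbitrary vectors, pull adjacent nodes together, push distant nodes apart, repeat) can be sketched as below. This is a deliberately naive illustration, not the method of any particular paper or library: the function name, the margin-based push, the fixed epoch count used in place of a real convergence check, and the six-node example graph are all assumptions made for this sketch.

```python
import math
import random


def train_node_vectors(nodes, edges, dim=2, epochs=200, lr=0.1, margin=1.0, seed=0):
    """Naive node embedding: attract adjacent pairs, repel non-adjacent pairs."""
    rng = random.Random(seed)
    # Step 1: start from arbitrary (random) vectors.
    vec = {n: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for n in nodes}

    adjacent = {frozenset(e) for e in edges}
    non_adjacent = [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if frozenset((u, v)) not in adjacent
    ]

    # Step 4: repeat (a fixed number of epochs stands in for "until convergence").
    for _ in range(epochs):
        # Step 2: move the vectors of adjacent nodes closer together.
        for u, v in edges:
            for d in range(dim):
                step = lr * (vec[v][d] - vec[u][d])
                vec[u][d] += step
                vec[v][d] -= step
        # Step 3: move non-adjacent nodes apart while they are closer than `margin`.
        for u, v in non_adjacent:
            dist = math.dist(vec[u], vec[v])
            if 1e-9 < dist < margin:
                scale = lr * (margin - dist) / dist
                for d in range(dim):
                    step = scale * (vec[u][d] - vec[v][d])
                    vec[u][d] += step
                    vec[v][d] -= step
    return vec


# Hypothetical graph: two triangles with no edge between them.
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]

vectors = train_node_vectors(NODES, EDGES)
for node in NODES:
    print(node, [round(x, 2) for x in vectors[node]])
```

After training, nodes inside each triangle sit close together while the two triangles are kept apart. Practical methods differ mainly in how they sample the "far apart" pairs and in the loss they optimize.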

 40. Applications of ○○2vec using graph structure
  What can it do?

 41. Applications of ○○2vec using graph structure
  We want to recommend users to follow!

 42. Applications of ○○2vec using graph structure
  Follow relationships between users
  user2vec

 43. Applications of ○○2vec using graph structure
  - People who share many connections get similar vectors
  - Recommend users whose vectors are similar

 44. Applications of ○○2vec using graph structure
  We want to recommend news to users!

 45. Applications of ○○2vec using graph structure
  Like relationships between users and news
  user2vec & news2vec

 46. Applications of ○○2vec using graph structure
  - Vectors can be compared directly even between nodes of different types!
  - Recommend to each user the "articles whose vectors are similar"
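Because users and news articles live in the same vector space, a single ranking function can serve both the follow recommendation and the news recommendation described above. The sketch below assumes nodes are keyed by a hypothetical `(type, id)` tuple and uses hand-made 2-dimensional vectors; the `recommend` function and all names in it are illustrative, not part of the talk.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(source, target_type, vectors, exclude=(), k=3):
    """Rank nodes of `target_type` by similarity to `source`, skipping `exclude`."""
    skip = set(exclude) | {source}
    scored = [
        (node, cosine(vectors[source], vec))
        for node, vec in vectors.items()
        if node[0] == target_type and node not in skip
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [node for node, _ in scored[:k]]


# Hypothetical embeddings; users and news share one vector space.
vectors = {
    ("user", "alice"): [0.9, 0.1],
    ("user", "bob"): [0.8, 0.2],
    ("user", "carol"): [0.7, 0.3],
    ("user", "dave"): [-0.5, 0.9],
    ("news", "n1"): [0.85, 0.2],
    ("news", "n2"): [-0.6, 0.8],
}

alice = ("user", "alice")
# Follow recommendation: similar users, excluding someone alice already follows.
follow_recs = recommend(alice, "user", vectors, exclude={("user", "bob")}, k=1)
# News recommendation: articles whose vectors are close to alice's vector.
news_recs = recommend(alice, "news", vectors, k=1)
print(follow_recs, news_recs)
```

The only difference between the two use cases is the target node type and the exclusion list, which is one practical consequence of embedding different node types together.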

 47. Applications of ○○2vec using graph structure
  Wouldn't users be interested in the news their followers have read?

 48. Applications of ○○2vec using graph structure
  Like relationships between users and news
  Follow relationships between users
  user2vec & news2vec

 49. Applications of ○○2vec using graph structure
  Perhaps users are also interested in news about their followers' companies?

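 50. Applications of ○○2vec using graph structure
  Like relationships between users and news
  Follow relationships between users
  Relationships between companies and related news
  Affiliation relationships between users and companies
  user2vec & news2vec & company2vec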

 51. Agenda
  Overview of ○○2vec
  ○○2vec using graph structure
  Using it effectively
  Summary

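 52. Strengths of ○○2vec using graph structure
  - Above all, it is highly general. Most things can at least be represented.
  - Even if the input graph changes, the algorithm can stay the same
  - Training can take many factors into account at once
  - Connections, affiliated companies, relationships between articles and companies, …
  - The result is a ○○2vec that can be reused for many subtasks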

 53. To use it correctly
  • There are also quite a few pain points
    - It is hard to tell which parts of the graph are contributing, and how
    - Updating and retraining are hard (versioning)
    - (Leakage accidents in subtasks happen easily)

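 54. It is hard to tell which parts of the graph are contributing, and how
  • Training on one giant graph with everything thrown in gives the feeling of having built recommendations that account for influences between all kinds of nodes
  • Is that relationship really useful for "the problem we want to solve now"?
  • The only way to find out is to use domain knowledge, run experiments, and evaluate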

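 55. Updating and retraining are painful (versioning)
  • Every time new nodes or edges are added, the whole thing is retrained
  • Incremental-update approaches are being researched, but they are hard
  • The vectors obtained before and after retraining are completely different things
  • Even for the same entity, they are entirely different
  • Accidentally computing with an old vector and a new vector breaks things
  • If another model is connected downstream of ○○2vec, that model must be retrained at the same time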
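One way to guard against accidentally mixing vectors from different training runs is to carry a version tag with every vector and refuse to combine mismatched ones. The sketch below is only one possible design: `VersionedVector`, the date-style version strings, and the vector values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedVector:
    """A vector tagged with the (hypothetical) ID of the training run that produced it."""
    version: str
    values: tuple


def dot(a: VersionedVector, b: VersionedVector) -> float:
    """Inner product that refuses to mix vectors from different training runs."""
    if a.version != b.version:
        raise ValueError(
            f"cannot compare vectors from different runs: {a.version!r} vs {b.version!r}"
        )
    return sum(x * y for x, y in zip(a.values, b.values))


old_user = VersionedVector("run-2019-08-01", (0.1, 0.3))
new_user = VersionedVector("run-2019-08-31", (-0.7, 0.9))
new_news = VersionedVector("run-2019-08-31", (0.5, 0.5))

print(dot(new_user, new_news))  # same run: allowed

try:
    dot(old_user, new_news)  # different runs: rejected
except ValueError as err:
    print("rejected:", err)
```

The same tag can be stored alongside any downstream model, so that a mismatch between the embedding version and the model version is caught before serving.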

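 56. To use it more effectively
  • Use it as a baseline that is quick to build and reasonably good
  • Maximize results with limited resources
  • Set things up so that one piece of work pays off N times
  • Accumulate data in a form that is easy to treat as a graph
  • Make sure the same entity can be treated as "the same"
  • Build a Knowledge Base for the domain
  • Define an appropriate subgraph for each task
  • Do not share models or graphs indiscriminately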
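Defining a subgraph per task can be as simple as storing typed edges and selecting the relation types each task should see. The sketch below is an assumption-laden illustration: the `Edge` type, the relation names, the example nodes, and the task-to-relation mapping are all hypothetical, and which relations actually help a given task has to be decided with domain knowledge and experiments.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


# Hypothetical knowledge-base-style edge list covering the relations from the talk.
GRAPH = [
    Edge("user:alice", "follows", "user:bob"),
    Edge("user:alice", "likes", "news:n1"),
    Edge("user:bob", "belongs_to", "company:acme"),
    Edge("company:acme", "mentioned_in", "news:n1"),
]

# Hypothetical mapping from task to the relation types it trains on.
TASK_RELATIONS = {
    "follow_recommendation": {"follows"},
    "news_recommendation": {"follows", "likes"},
}


def subgraph_for(task, graph=GRAPH):
    """Keep only the edges whose relation type is enabled for the given task."""
    allowed = TASK_RELATIONS[task]
    return [edge for edge in graph if edge.relation in allowed]


for task in TASK_RELATIONS:
    print(task, [tuple(edge) for edge in subgraph_for(task)])
```

Keeping the full edge list in one place while training each model on an explicit subset makes it easier to run the experiments needed to see which relations matter.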

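 57. Summary
  Make complex things easy to handle with ○○2vec
  - The meaning of a word, information about the subject of a photo, …
  - Once they are vectors, they are easy to handle mechanically!
  ○○2vec using graph structure is very useful
  - Very general
  - Easy to reuse for various subtasks
  - A strong baseline model can be built quickly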

 58. References
  - Efficient Estimation of Word Representations in Vector Space
    - Tomas Mikolov, et al., 2013, ICLR
    - https://arxiv.org/abs/1301.3781
  - FaceNet: A Unified Embedding for Face Recognition and Clustering
    - Florian Schroff, et al., 2015, CVPR
    - https://arxiv.org/abs/1503.03832
  - PyTorch-BigGraph: A Large-scale Graph Embedding Framework
    - Adam Lerer, et al., 2019, SysML
    - https://arxiv.org/abs/1903.12287
  - StarSpace: Embed All The Things!
    - Ledell Wu, et al.
    - https://arxiv.org/abs/1709.03856