Rethinking ○○2vec

Slides for the builderscon 2019 talk "Rethinking ○○2vec".

Agata Naomichi

August 31, 2019

Transcript

  1. 1.

    ©2019 Wantedly, Inc. Rethinking ○○2vec

    - builderscon 2019, Aug 31, 2019
    - Naomichi Agata
    - Photo by Mika Baumeister on Unsplash
  2. 2.

    Self-introduction: Naomichi Agata (@agatan_)

    - 2018/04~ Wantedly, Inc.
    - Machine learning engineer for Wantedly People
    - Organizing member of Data Driven Developer Meetup
    - https://d3m.connpass.com
  3. 7.

    Agenda: ○○2vec overview

    - What is ○○2vec?
    - What is it good for?
    - What applications can it actually be used for?
  4. 8.
  5. 9.

    ○○2vec: Word2vec

    word2vec converts words into low-dimensional vectors.

    [Figure: 2-D plot (axes x, y) showing the words "delicious" (おいしい), "tasty" (うまい), and "engineer" (エンジニア) as points]
  6. 10.

    What is ○○2vec?

    word2vec converts "words" into vectors that satisfy the property "words with similar meanings get similar vectors".

    [Figure: Paris, Tokyo, France → word2vec → ( 0.1, 0.3, -0.2, …), (-0.7, -0.6, 0.9, …), ( 0.8, 0.4, -0.7, …)]
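"Similar vectors" is usually made concrete with a similarity measure. The following is a minimal sketch using cosine similarity, which is one common choice and not necessarily the one used in the talk. The 3-dimensional vectors and the romanized keys are invented for illustration; real word2vec vectors are learned from a corpus and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, made up for this example only.
vectors = {
    "oishii": [0.9, 0.1, 0.0],    # "delicious"
    "umai": [0.8, 0.2, 0.1],      # "tasty"
    "engineer": [0.0, 0.1, 0.9],
}

# Words with similar meanings should score higher than unrelated words.
sim_close = cosine_similarity(vectors["oishii"], vectors["umai"])
sim_far = cosine_similarity(vectors["oishii"], vectors["engineer"])
print(sim_close > sim_far)
```

With these made-up values, the two food-related words score close to 1 while the unrelated pair scores close to 0.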
  7. 11.

    What is ○○2vec?

    ○○2vec converts "○○" into vectors that satisfy "some property".

    [Figure: Entity 1, Entity 2, Entity 3, … → ○○2vec → ( 0.1, 0.3, -0.2, …), (-0.7, -0.6, 0.9, …), ( 0.8, 0.4, -0.7, …), …]
  8. 14.

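    ○○2vec overview: What is it good for?

    - Complex properties and relationships become easier to handle
    - Complex, abstract things such as "the meaning of a word" are hard to work with
    - A "vector" is just numbers, so it is simple!
    - Properties of entities and relationships between entities can be expressed as numbers
    - The entity's properties are (supposedly) embedded in its vector
    - Relationships between entities can be expressed through operations on vectors
    - Operations such as sorting words by similarity become possible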
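The point that operations on vectors can express relationships between entities can be sketched with plain list arithmetic. The vectors below are hand-constructed so that the "capital of" relation is an exact offset; learned embeddings only approximate this. The entries `Japan` and `engineer`, and all numeric values, are assumptions added for the example.

```python
def add(a, b):
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical vectors; dimensions loosely mean
# [is a capital, is a country, relates to France, relates to Japan].
vectors = {
    "Paris": [1.0, 0.0, 1.0, 0.0],
    "France": [0.0, 1.0, 1.0, 0.0],
    "Tokyo": [1.0, 0.0, 0.0, 1.0],
    "Japan": [0.0, 1.0, 0.0, 1.0],
    "engineer": [0.2, 0.2, 0.2, 0.2],
}

# The offset from a capital to its country, applied to another capital.
capital_to_country = subtract(vectors["France"], vectors["Paris"])
target = add(vectors["Tokyo"], capital_to_country)

# Rank the remaining entities by distance to the target vector.
candidates = [w for w in vectors if w not in ("Paris", "France", "Tokyo")]
ranked = sorted(candidates, key=lambda w: squared_distance(vectors[w], target))
print(ranked)
```

Sorting by distance to a target vector is the same mechanism as sorting words by similarity; only the target changes.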
  9. 17.

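    Face image search

    - Comparing vectors is enough to judge whether two images show the same subject
    - Compared with comparing the images directly…
    - Computation is fast (just take an inner product)
    - Differences such as the shooting environment are absorbed

    [Figure: face images → CNN → vectors → Index → search for similar vectors]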
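The search step can be sketched as a brute-force index over unit-length vectors, where the score is just an inner product. This is an illustrative sketch only: the CNN is out of scope, so the embeddings are hypothetical hand-written values, and `VectorIndex` is a made-up helper, not the API of any real library. A production system would typically use an approximate nearest-neighbor index instead of a linear scan.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


class VectorIndex:
    """Hypothetical brute-force index: stores unit vectors, scans all on search."""

    def __init__(self):
        self._items = []

    def add(self, label, vector):
        self._items.append((label, normalize(vector)))

    def search(self, query, k=1):
        q = normalize(query)
        scored = [(inner_product(q, v), label) for label, v in self._items]
        scored.sort(reverse=True)
        return [(label, score) for score, label in scored[:k]]


# Stand-ins for CNN outputs; the values are invented for illustration.
index = VectorIndex()
index.add("person_a", [0.9, 0.1, 0.2])
index.add("person_b", [0.1, 0.9, 0.3])
index.add("person_c", [0.2, 0.2, 0.9])

# A new photo whose embedding should be closest to person_a.
results = index.search([0.8, 0.2, 0.1], k=2)
print(results[0][0])
```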
  10. 18.

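    Document classification

    - ○○2vec is often used in combination with another algorithm
    - Example: classifying documents into categories with Doc2vec + NN

    [Figure: "builderscon is a festival for all geeks who love technology, with the theme 'hearing what you didn't know'." → Doc2vec → ( 0.1, 0.3, -0.2, …) → Neural Network → category prediction]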
  11. 30.

    Graph-based ○○2vec

    Converts "each node in a graph" into vectors that satisfy the property "adjacent nodes get similar vectors".

    [Figure: graph → ○○2vec → ( 0.1, 0.3, -0.2, …), (-0.7, -0.6, 0.9, …), ( 0.8, 0.4, -0.7, …), …]
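The slides do not say which algorithm produces these node vectors. One well-known option, shown here purely as an assumption, is the DeepWalk-style approach: sample random walks over the graph and treat each walk as a "sentence" for a word2vec-style trainer, so that nodes appearing near each other in walks end up with similar vectors. The sketch below covers only the walk sampling; the graph and the function name `random_walks` are hypothetical.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample fixed-length random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical undirected graph given as adjacency lists.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = random_walks(adjacency, walk_length=5, walks_per_node=2)
print(len(walks))
```

Each walk could then be passed to a word2vec implementation in place of a tokenized sentence.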
  12. 50.

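    Applications of graph-based ○○2vec: user2vec & news2vec & company2vec

    - "like" relations between users and news
    - "follow" relations between users
    - Relations between companies and their related news
    - Affiliation relations between users and companies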
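The four relations above can be held in a single typed edge list, which is roughly the shape of input that multi-relation graph embedding tools consume. This is a sketch under assumptions: the `type:id` naming scheme, the relation names, and the sample entities are all invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


# Hypothetical edges covering the four relation types on the slide.
edges = [
    Edge("user:alice", "like", "news:1"),
    Edge("user:alice", "follow", "user:bob"),
    Edge("company:acme", "related_news", "news:1"),
    Edge("user:bob", "belongs_to", "company:acme"),
]


def entities_by_type(edges):
    """Group every node that appears in an edge by its 'type:' prefix."""
    grouped = defaultdict(set)
    for edge in edges:
        for node in (edge.source, edge.target):
            entity_type = node.split(":", 1)[0]
            grouped[entity_type].add(node)
    return {t: sorted(nodes) for t, nodes in grouped.items()}


grouped = entities_by_type(edges)
print(grouped)
```

Because users, news, and companies live in one graph, a single training run yields vectors for all three entity types.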
  13. 52.

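    Strengths of graph-based ○○2vec

    - Above all, it is highly general. Most things can at least be represented.
    - The algorithm can stay the same even when the input graph changes
    - Training can take many factors into account at once
    - Connections, affiliated companies, relations between articles and companies, …
    - The result is a ○○2vec that can be reused for many subtasks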
  14. 55.

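    Pain points: updating and retraining, and versioning

    - The whole model is retrained every time a new node or edge is added
    - Incremental-update approaches are being researched, but they are hard
    - The vectors obtained before and after retraining are completely different things
    - Even the same entity gets an entirely different vector
    - Accidentally computing with an old vector and a new vector breaks things
    - If another model is connected downstream of ○○2vec, that model must be retrained at the same time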
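One way to guard against mixing old and new vectors, offered here as a suggestion rather than something the slides prescribe, is to carry the model version alongside each vector and refuse cross-version operations. `VersionedVector`, the version strings, and the values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedVector:
    model_version: str  # identifies the training run that produced the vector
    values: tuple


def dot(a, b):
    """Inner product that fails loudly when the vectors come from different models."""
    if a.model_version != b.model_version:
        raise ValueError(
            f"cannot compare vectors from {a.model_version} and {b.model_version}"
        )
    return sum(x * y for x, y in zip(a.values, b.values))


old = VersionedVector("v1", (0.2, 0.4))
new = VersionedVector("v2", (1.0, 0.5))
also_new = VersionedVector("v2", (0.5, 0.0))

print(dot(new, also_new))  # same version: allowed

try:
    dot(old, new)  # different versions: rejected
except ValueError as error:
    print(error)
```

The same version tag can be checked by a downstream model, so that it only accepts vectors from the run it was trained on.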
  15. 56.

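    Using it more effectively

    - Use it as a baseline that is quick to build and reasonably good
    - Maximize results with limited resources
    - Aim for a state where one asset pays off N times over
    - Accumulate data in a form that is easy to treat as a graph
    - Make sure the same entity can be treated as "the same"
    - Build a knowledge base for the domain
    - Define an appropriate subgraph for each task
    - Do not share models or graphs indiscriminately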
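Defining a subgraph per task can be as simple as selecting which relation types a task is allowed to see. The sketch below assumes edges stored as `(source, relation, target)` tuples; the relation names, entities, and the choice of relations for the example task are all hypothetical.

```python
def subgraph(edges, relations):
    """Keep only the edges whose relation type is in `relations`."""
    allowed = set(relations)
    return [edge for edge in edges if edge[1] in allowed]


# Hypothetical knowledge-base edges: (source, relation, target).
edges = [
    ("user:alice", "like", "news:1"),
    ("user:alice", "follow", "user:bob"),
    ("company:acme", "related_news", "news:1"),
    ("user:bob", "belongs_to", "company:acme"),
]

# Example task: news recommendation, assumed to need only these relations.
news_edges = subgraph(edges, {"like", "related_news"})
print(news_edges)
```

Keeping the full edge list as the shared knowledge base and deriving task-specific subgraphs from it avoids handing every task the entire graph.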
  16. 57.

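    Summary

    Graph-based ○○2vec is very useful

    - Very high generality
    - Easy to reuse for many subtasks
    - Good baseline models can be built quickly

    Use ○○2vec to make complex things easier to handle

    - The meaning of words, information about a photographed subject, …
    - Once they are vectors, they are easy to handle mechanically!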
  17. 58.

    References

    - Efficient Estimation of Word Representations in Vector Space. Tomas Mikolov, et al., 2013, ICLR. https://arxiv.org/abs/1301.3781
    - FaceNet: A Unified Embedding for Face Recognition and Clustering. Florian Schroff, et al., 2015, CVPR. https://arxiv.org/abs/1503.03832
    - PyTorch-BigGraph: A Large-scale Graph Embedding Framework. Adam Lerer, et al., 2019, SysML. https://arxiv.org/abs/1903.12287
    - StarSpace: Embed All The Things! Ledell Wu, et al. https://arxiv.org/abs/1709.03856