A Proposed Method for Acquiring Distributed Representations with Topic Models

Slides presented at the 2016 meeting of the Association for Natural Language Processing (言語処理学会).

Kento Nozawa

March 09, 2016

Transcript

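  1. Distributed representations of words
    • Each word is represented as a point in a dense vector space.
    • Arithmetic between words becomes possible (king - man + woman = queen).
    If the words that characterize each dimension can also be obtained, the representation becomes richer.
    Figure 1: Example of distributed representations in a two-dimensional space.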
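  2. Representing a word type as a multiset of co-occurring words
    Each word type is represented by the multiset of the n words before and after each of its occurrences.
    (Tokens below are glossed from Japanese; the particles の, は, と and the copula だ are romanized as no, wa, to, da.)
    Document 1: Aomori no apple juice wa delicious ("Aomori apple juice is delicious.")
    Document 2: Oracle to apple wa America no company da ("Oracle and Apple are American companies.")
    Document 3: [Yamanashi no grape juice wa drink easy] ("Yamanashi grape juice is easy to drink.")
    ↓
    apple = [Aomori, no, juice, wa, Oracle, to, wa, America]
    Example: a window of 2 words before and after.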
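  3. Representing a word type as a multiset of co-occurring words
    By the distributional hypothesis, similar words tend to have similar multisets.
    Document 1: Aomori no apple juice wa delicious
    Document 2: Oracle to apple wa America no company da
    Document 3: [Yamanashi no grape juice wa drink easy]
    ↓
    apple = [Aomori, no, juice, wa, Oracle, to, wa, America]
    grape = [Yamanashi, no, juice, wa]
    Example: a window of 2 words before and after.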
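The multiset construction on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `context_multisets` and the English or romanized tokens are assumptions made here, and the three documents are the ones from the slides.

```python
from collections import Counter


def context_multisets(documents, window=2):
    """Map each word type to the multiset of words within `window` positions.

    `documents` is a list of token lists. Contexts from every occurrence of a
    word type are pooled into one Counter (a multiset).
    """
    contexts = {}
    for tokens in documents:
        for i, word in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.setdefault(word, Counter()).update(left + right)
    return contexts


# Glossed versions of the three example documents (tokens are assumptions).
documents = [
    "Aomori no apple juice wa delicious".split(),
    "Oracle to apple wa America no company da".split(),
    "Yamanashi no grape juice wa drink easy".split(),
]

contexts = context_multisets(documents, window=2)
print(sorted(contexts["apple"].elements()))
print(sorted(contexts["grape"].elements()))
```

Both "apple" and "grape" end up sharing the context words "no", "juice", and "wa", which is the overlap the distributional hypothesis relies on.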
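  4. Latent Dirichlet Allocation
    • A probabilistic generative model of documents [Blei+, 2003]
    • Discovers topics from the co-occurrence of words within documents
    • A probability distribution over all words, conditioned on a topic
    • A probability distribution over topics, conditioned on the words of a document
    Figure: a document ("Machine learning: machine learning is a research topic in artificial intelligence that aims to realize on a computer the same kind of learning ability that humans exercise naturally…") is linked to topics, each shown with its words (research, problem, knowledge, scientist, …; machine learning, artificial intelligence, model, sample, …). Labels: document data, topics, topic probabilities.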
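  5. Training LDA (with 3 topics)
    Each word, represented as a multiset, is treated as one document, and LDA is trained on these documents.
    • Example: apple = [Aomori, no, juice, wa, Oracle, to, wa, America]
    1. A multinomial distribution over topics for each word (multiset)
      • It is treated both as a probability distribution and as a vector.
      Example: apple = [0.3, 0.6, 0.1], Oracle = [0.6, 0.1, 0.3], grape = [0.1, 0.7, 0.2]
    2. A multinomial distribution over words for each topic
      Example: topic 1 = [software: 0.3, PC: 0.2, program: 0.1, …]
               topic 2 = [juice: 0.4, pie: 0.2, fruit: 0.1, …]
               topic 3 = [ga: 0.2, no: 0.2, ni: 0.1, …]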
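To make the training step concrete, here is a small collapsed Gibbs sampler for LDA in pure Python, in which each word's context multiset plays the role of one document. The slides do not say which inference algorithm or library was used, so the sampler, the name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the iteration count are all assumptions chosen for illustration. The Oracle multiset is derived here from document 2 with a window of 2; it is not given on the slides.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, symmetric priors).

    `docs` maps a name (here: a word type) to a Counter of context words.
    Returns (theta, phi): per-document topic distributions and per-topic
    word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for bag in docs.values() for w in bag})
    vocab_size = len(vocab)
    tokens = {name: list(bag.elements()) for name, bag in docs.items()}

    doc_topic = {name: [0] * num_topics for name in docs}   # n_dk
    topic_word = [Counter() for _ in range(num_topics)]     # n_kw
    topic_total = [0] * num_topics                          # n_k
    assign = {}

    # Random initial topic assignment for every token.
    for name, toks in tokens.items():
        assign[name] = []
        for w in toks:
            k = rng.randrange(num_topics)
            assign[name].append(k)
            doc_topic[name][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for name, toks in tokens.items():
            for i, w in enumerate(toks):
                # Remove the token's current assignment from the counts.
                k = assign[name][i]
                doc_topic[name][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample a topic from the full conditional distribution.
                weights = [
                    (doc_topic[name][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[name][i] = k
                doc_topic[name][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = {
        name: [(doc_topic[name][t] + alpha) / (len(toks) + num_topics * alpha)
               for t in range(num_topics)]
        for name, toks in tokens.items()
    }
    phi = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
         for w in vocab}
        for t in range(num_topics)
    ]
    return theta, phi


# Context multisets from the slides (Oracle's is derived, see the lead-in).
docs = {
    "apple": Counter(["Aomori", "no", "juice", "wa", "Oracle", "to", "wa", "America"]),
    "grape": Counter(["Yamanashi", "no", "juice", "wa"]),
    "Oracle": Counter(["to", "apple"]),
}
theta, phi = lda_gibbs(docs, num_topics=3)
for name, dist in theta.items():
    print(name, [round(p, 2) for p in dist])
```

Each row of `theta` is the word's distributed representation in this scheme. With a corpus this small the learned topics are not meaningful; the sketch only shows the shape of the inputs and outputs.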
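  6. Interpreting the dimensions
    • The higher a word's value in its topic distribution, the more strongly the word carries that topic.
      Example: apple = [0.3, 0.6, 0.1]
    • The higher a word's value in a topic's word distribution, the more that word characterizes the topic.
      topic 1 = [software: 0.3, PC: 0.2, program: 0.1, …]
      topic 2 = [juice: 0.4, pie: 0.2, fruit: 0.1, …]
      topic 3 = [ga: 0.2, no: 0.2, ni: 0.1, …]
    apple is characterized by words such as juice, pie, and fruit.
    grape = [0.1, 0.7, 0.2] is characterized by the same topic.
    grape is similar to apple in the sense of topic 2.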
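The reading of a dimension described above can be written out directly. The sketch below uses the example numbers from the slides; the helper name `characterize` is hypothetical, and only the three listed words per topic are included because the remaining entries are elided on the slide.

```python
# Topic distributions per word (example values from the slides).
theta = {
    "apple": [0.3, 0.6, 0.1],
    "Oracle": [0.6, 0.1, 0.3],
    "grape": [0.1, 0.7, 0.2],
}

# Word distributions per topic; entries beyond the first three are elided
# on the slide, so only the listed ones appear here.
phi = [
    {"software": 0.3, "PC": 0.2, "program": 0.1},
    {"juice": 0.4, "pie": 0.2, "fruit": 0.1},
    {"ga": 0.2, "no": 0.2, "ni": 0.1},
]


def characterize(word, top_n=3):
    """Return the word's strongest topic (1-based) and that topic's top words."""
    dist = theta[word]
    topic = max(range(len(dist)), key=dist.__getitem__)
    top_words = sorted(phi[topic], key=phi[topic].get, reverse=True)[:top_n]
    return topic + 1, top_words


for word in theta:
    print(word, characterize(word))
```

Both apple and grape resolve to topic 2 and its fruit-related words, which is the sense in which the slide calls them similar.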
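  7. Evaluating the distributed representations
    Evaluation methods
    • word similarity: evaluated with a rank correlation coefficient
      • apple tree 0.2
      • apple orange 0.7
    • analogy: evaluated by accuracy
      • man king woman queen
    Compared methods: CBoW and Skip-gram [Mikolov+, 2013]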
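The two metrics can be sketched as follows. The slide only says "rank correlation coefficient"; Spearman's coefficient is assumed here as the usual choice for word similarity benchmarks. The third word pair, all model scores, and the analogy predictions are hypothetical values invented for the example, not results from the talk.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def analogy_accuracy(predicted, gold):
    """Fraction of analogy questions whose predicted word matches the answer."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Human scores: the first two pairs are from the slide, the third is hypothetical.
human = [0.2, 0.7, 0.5]   # (apple, tree), (apple, orange), (hypothetical pair)
model = [0.1, 0.6, 0.8]   # hypothetical model similarities for the same pairs
print(spearman(human, model))

# Hypothetical predictions for two analogy questions.
print(analogy_accuracy(["queen", "paris"], ["queen", "tokyo"]))
```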
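  8. Results of the comparison experiments
    • On word similarity, a difference of about 0.2 to 0.3
    • On analogy, a difference of about 0.4 to 0.5
    cos: cosine similarity
    js: Jensen-Shannon divergence
    Table 1: Results of the comparison experiments. Columns: word similarity datasets (WS, WSS, WSR, MEN, MT, RW) and analogy datasets (Google, MSR). Rows: CBoW, Skip-gram, proposed method (cos), proposed method (js). The cell values are not preserved in the transcript.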
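The two comparison functions named on this slide, cosine similarity and Jensen-Shannon divergence, can be sketched for topic distributions as below. The vectors are the illustrative values from the earlier slides, not experimental data, and the base-2 logarithm is an assumption (it bounds the divergence between 0 and 1).

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (lower means more similar)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


apple = [0.3, 0.6, 0.1]
oracle = [0.6, 0.1, 0.3]
grape = [0.1, 0.7, 0.2]

print(cosine_similarity(apple, grape), cosine_similarity(apple, oracle))
print(js_divergence(apple, grape), js_divergence(apple, oracle))
```

The two measures point in opposite directions: cosine is a similarity and Jensen-Shannon is a divergence, so a ranking based on js has to sort ascending.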
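  9. Interpreting topics: python
    • Focus on the 3 topics with the highest probability θ for "python"
    • For each of these topics, the 10 words with the highest probability
    • 880: snakes and "Monty Python"
    • 145: programming
    • 732: high-frequency words
    Top 10 words per topic (the θ and α values in the table are not preserved in the transcript):
    topicID   880       145           732
              circus    archive       the
              snake     software      i
              monty     web           it
              cobra     programming   you
              sketch    database      be
              palin     based         have
              lizard    video         a
              evening   server        but
              grail     linux         if
              viper     interface     can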
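  10. Interpreting topics: bow
    • Focus on the 3 topics with α < 0.2 and the highest probability θ for "bow"
    • For each of these topics, the 10 words with the highest probability
    • 389: ships, equipment, and similar
    • 547: ships
    • 919: clothing and body parts
    Top 10 words per topic (the θ and α values in the table are not preserved in the transcript):
    topicID   389        547        919
              aircraft   ship       face
              speed      ships      worn
              weight     merchant   cap
              gun        patrol     shoes
              built      boats      wears
              ship       naval      tie
              machine    vessels    wore
              power      crew       wear
              drive      cargo      shirt
              steam      vessel     shoulder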