
A Proposed Method for Acquiring Distributed Representations Using Topic Models



Presentation slides from the 2016 meeting of the Association for Natural Language Processing (言語処理学会)


Kento Nozawa

March 09, 2016



Transcript

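  1. Distributed representations of words
    • Represent each word as a point in a dense vector space
    • Arithmetic between words becomes possible (king - man + woman = queen)
    If the words that characterize each dimension can be obtained, the representation becomes richer.
    Figure 1: Example of distributed representations in a 2-dimensional space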
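  2. Representing a word type as a multiset of co-occurring words
    Each word type is represented by the multiset of the n words before and after it.
    Document 1: Aomori no apple juice wa delicious ("Aomori's apple juice is delicious")
    Document 2: Oracle to apple wa America no company da ("Oracle and Apple are American companies")
    Document 3: [Yamanashi no grape juice wa drink easy] ("Yamanashi's grape juice is easy to drink")
    ↓
    apple = [Aomori, no, juice, wa, Oracle, to, wa, America]
    Example: a window of 2 words before and after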
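  3. Representing a word type as a multiset of co-occurring words
    By the distributional hypothesis, similar words tend to have similar multisets.
    Document 1: Aomori no apple juice wa delicious ("Aomori's apple juice is delicious")
    Document 2: Oracle to apple wa America no company da ("Oracle and Apple are American companies")
    Document 3: [Yamanashi no grape juice wa drink easy] ("Yamanashi's grape juice is easy to drink")
    ↓
    apple = [Aomori, no, juice, wa, Oracle, to, wa, America]
    grape = [Yamanashi, no, juice, wa]
    Example: a window of 2 words before and after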
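The multiset construction on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `context_multisets` and the English token glosses are assumptions, and the multiset is stored as a `collections.Counter`.

```python
from collections import Counter, defaultdict

def context_multisets(documents, window=2):
    """Map each word type to the multiset of tokens within `window` positions of it."""
    contexts = defaultdict(Counter)
    for tokens in documents:
        for i, word in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts[word].update(left + right)
    return contexts

# The three example documents from the slides, glossed token by token.
documents = [
    ["Aomori", "no", "apple", "juice", "wa", "delicious"],
    ["Oracle", "to", "apple", "wa", "America", "no", "company", "da"],
    ["Yamanashi", "no", "grape", "juice", "wa", "drink", "easy"],
]

contexts = context_multisets(documents, window=2)
print(sorted(contexts["apple"].elements()))
print(sorted(contexts["grape"].elements()))
```

With a window of 2, `contexts["apple"]` holds the same eight tokens as the slide example, with `wa` counted twice, and `contexts["grape"]` holds the four tokens around "grape".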
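  4. Latent Dirichlet Allocation
    • A probabilistic generative model of documents [Blei+, 2003]
    • Discovers topics from the co-occurrence of words within documents
    • A probability distribution over all words, conditioned on a topic
    • A probability distribution over topics, conditioned on the words of a document
    [Figure: document data (a passage titled "Machine learning": "Machine learning is one of the research problems in artificial intelligence; it realizes on a computer the same learning ability that humans exercise naturally…") mapped to topics and their probabilities, with example topics {research, problem, knowledge, scientist, …} and {machine learning, artificial intelligence, model, sample, …}]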
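  5. Training LDA (with 3 topics)
    Treat each word, represented as a multiset, as one document and train LDA on these documents.
    • Example: apple = [Aomori, no, juice, wa, Oracle, to, wa, America]
    1. A multinomial distribution over topics for each word (multiset)
      • Treated both as a probability distribution and as a vector
        Example: apple = [0.3, 0.6, 0.1], Oracle = [0.6, 0.1, 0.3], grape = [0.1, 0.7, 0.2]
    2. A multinomial distribution over words for each topic
        Example: topic 1 = [software: 0.3, PC: 0.2, program: 0.1, …]
                 topic 2 = [juice: 0.4, pie: 0.2, fruit: 0.1, …]
                 topic 3 = [ga: 0.2, no: 0.2, ni: 0.1, …] (Japanese particles)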
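As a rough sketch of the training step above, the following runs LDA on "documents" that are really the context multisets of individual words. The slides do not say which inference algorithm was used; collapsed Gibbs sampling is swapped in here because it fits in a few lines, and the function name `train_lda` and the hyperparameters `alpha`, `beta`, and `iterations` are illustrative assumptions.

```python
import random
from collections import Counter

def train_lda(word_docs, num_topics=3, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (an assumed choice of inference).

    word_docs maps a word type to the list of its context tokens.
    Returns (theta, phi): per-word topic distributions and per-topic token distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({t for tokens in word_docs.values() for t in tokens})
    doc_topic = {w: [0] * num_topics for w in word_docs}
    topic_token = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignment = {}

    # Assign a random initial topic to every context token.
    for w, tokens in word_docs.items():
        assignment[w] = [rng.randrange(num_topics) for _ in tokens]
        for t, k in zip(tokens, assignment[w]):
            doc_topic[w][k] += 1
            topic_token[k][t] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for w, tokens in word_docs.items():
            for i, t in enumerate(tokens):
                # Remove the token's current assignment, then resample its topic.
                k = assignment[w][i]
                doc_topic[w][k] -= 1
                topic_token[k][t] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[w][j] + alpha)
                    * (topic_token[j][t] + beta)
                    / (topic_total[j] + beta * len(vocab))
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignment[w][i] = k
                doc_topic[w][k] += 1
                topic_token[k][t] += 1
                topic_total[k] += 1

    theta = {
        w: [(c + alpha) / (len(word_docs[w]) + alpha * num_topics) for c in counts]
        for w, counts in doc_topic.items()
    }
    phi = [
        {t: (topic_token[k][t] + beta) / (topic_total[k] + beta * len(vocab)) for t in vocab}
        for k in range(num_topics)
    ]
    return theta, phi

# Context multisets with a window of 2; "Oracle" is derived from document 2 the same way.
word_docs = {
    "apple": ["Aomori", "no", "juice", "wa", "Oracle", "to", "wa", "America"],
    "grape": ["Yamanashi", "no", "juice", "wa"],
    "Oracle": ["to", "apple"],
}
theta, phi = train_lda(word_docs, num_topics=3)
print(theta["apple"])  # a 3-dimensional vector that is also a probability distribution
```

On a corpus this small the learned topics are not meaningful; the point is only that `theta[word]` plays the role of the word vector and `phi[k]` lists the tokens that characterize dimension `k`.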
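  6. Interpreting the dimensions
    • The higher a value in a word's topic distribution, the more strongly the word carries that topic
      Example: apple = [0.3, 0.6, 0.1]
    • The higher a value in a topic's word distribution, the more that word characterizes the topic
      topic 1 = [software: 0.3, PC: 0.2, program: 0.1, …]
      topic 2 = [juice: 0.4, pie: 0.2, fruit: 0.1, …]
      topic 3 = [ga: 0.2, no: 0.2, ni: 0.1, …] (Japanese particles)
    apple is characterized by words such as juice, pie, and fruit.
    grape = [0.1, 0.7, 0.2] is characterized by the same topic.
    grape is similar to apple in the sense of topic 2.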
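The reading of dimensions described above amounts to two lookups: find a word's strongest topic, then list that topic's highest-probability words. The sketch below uses the example numbers from the slide; the helper name `describe` and the dictionary layout are hypothetical, and topics are indexed from 0 in the code while the slide numbers them from 1.

```python
# Example values from the slide: per-word topic distributions and per-topic word distributions.
word_topics = {
    "apple": [0.3, 0.6, 0.1],
    "Oracle": [0.6, 0.1, 0.3],
    "grape": [0.1, 0.7, 0.2],
}
topic_words = [
    {"software": 0.3, "PC": 0.2, "program": 0.1},
    {"juice": 0.4, "pie": 0.2, "fruit": 0.1},
    {"ga": 0.2, "no": 0.2, "ni": 0.1},
]

def describe(word, top_n=3):
    """Return (strongest topic number, that topic's top_n characteristic words)."""
    theta = word_topics[word]
    k = max(range(len(theta)), key=lambda j: theta[j])
    top = sorted(topic_words[k], key=topic_words[k].get, reverse=True)[:top_n]
    return k + 1, top  # report the topic number as on the slide (1-based)

for word in word_topics:
    print(word, describe(word))
```

Both "apple" and "grape" resolve to topic 2, which is the sense in which the slide calls them similar, while "Oracle" resolves to topic 1.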
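  7. Evaluating the distributed representations
    Evaluation methods
    • word similarity: evaluated with a rank correlation coefficient
      • apple tree 0.2
      • apple orange 0.7
    • analogy: evaluated by accuracy
      • man king woman queen
    Baselines
    CBoW and Skip-gram [Mikolov+, 2013]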
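The two evaluation protocols above can be sketched as follows. The slide only says "rank correlation coefficient", so Spearman's coefficient is an assumption here, as is answering an analogy by taking the nearest cosine neighbour of `b - a + c`. All scores and vectors below are made-up toy values, not data from the experiments.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions where b - a + c is nearest to expected."""
    correct = 0
    for a, b, c, expected in questions:
        target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
        candidates = [w for w in vectors if w not in (a, b, c)]
        predicted = max(candidates, key=lambda w: cosine(vectors[w], target))
        correct += predicted == expected
    return correct / len(questions)

# Word similarity: human scores versus hypothetical model scores for three word pairs.
human_scores = [0.2, 0.7, 0.9]
model_scores = [0.1, 0.5, 0.8]
similarity_score = spearman(human_scores, model_scores)

# Analogy: toy 2-dimensional vectors chosen so that king - man + woman lands on queen.
vectors = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "apple": [0.0, 2.0],
}
accuracy = analogy_accuracy(vectors, [("man", "king", "woman", "queen")])
print(similarity_score, accuracy)
```

The three query words are excluded from the candidates, a common convention for this kind of test; whether the original evaluation did the same is not stated on the slide.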
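  8. Results of the comparison experiments
    • On word similarity, a gap of about 0.2 to 0.3
    • On analogy, a gap of about 0.4 to 0.5
    cos: cosine similarity
    js: Jensen-Shannon divergence
    Table 1: Results of the comparison experiments (the numeric cells are missing from the transcript)
                         word similarity              analogy
    dataset              WS  WSS  WSR  MEN  MT  RW    Google  MSR
    CBoW
    Skip-gram
    Proposed method (cos)
    Proposed method (js)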
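The two measures named above, cos and js, can both be applied to the topic-distribution vectors. Below is a minimal sketch using the example vectors from the earlier slides; the base-2 logarithm (which bounds the divergence by 1) and the function names are assumptions rather than details given in the deck.

```python
import math

def cosine(p, q):
    """Cosine similarity: larger means more similar."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: smaller means more similar, 0 for identical inputs."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

apple = [0.3, 0.6, 0.1]
oracle = [0.6, 0.1, 0.3]
grape = [0.1, 0.7, 0.2]

print("cos(apple, grape) =", cosine(apple, grape))
print("cos(apple, Oracle) =", cosine(apple, oracle))
print("js(apple, grape) =", js_divergence(apple, grape))
print("js(apple, Oracle) =", js_divergence(apple, oracle))
```

Because the vectors are probability distributions, the Jensen-Shannon divergence is well defined for them, which is not true of arbitrary real-valued embeddings. Note that it is a dissimilarity, so it must be negated or otherwise inverted before being ranked against human similarity scores.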
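  9. Interpreting topics: python
    • Focus on the 3 topics with the highest probability θ for python
    • For each topic, the 10 words with the highest probability
    • 880: snakes and "Monty Python"
    • 145: programming
    • 732: high-frequency words
    Top 10 words per topic (the θ and α values for each topic ID are missing from the transcript):
    topic 880: circus, snake, monty, cobra, sketch, palin, lizard, evening, grail, viper
    topic 145: archive, software, web, programming, database, based, video, server, linux, interface
    topic 732: the, i, it, you, be, have, a, but, if, can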
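  10. Interpreting topics: bow
    • Among the topics of bow, focus on the 3 with α < 0.2 and the highest probability θ
    • For each topic, the 10 words with the highest probability
    • 389: ships, equipment, and so on
    • 547: ships
    • 919: clothing and body parts
    Top 10 words per topic (the θ and α values for each topic ID are missing from the transcript):
    topic 389: aircraft, speed, weight, gun, built, ship, machine, power, drive, steam
    topic 547: ship, ships, merchant, patrol, boats, naval, vessels, crew, cargo, vessel
    topic 919: face, worn, cap, shoes, wears, tie, wore, wear, shirt, shoulder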