

Efficient Estimation of Word Representations in Vector Space

Explanatory slides, presented at a deep learning study group, for:

Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space, ICLR 2013

Mamoru Komachi

December 09, 2014

Transcript

  1. Efficient Estimation of Word Representations in Vector Space
     Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean, ICLR 2013
     ※ All figures and tables in these slides are quoted from the paper.
     Mamoru Komachi [email protected]
     Deep Learning Study Group @ Tokyo Metropolitan University, 2014/12/01
  2. Efficiently learning word vectors from large-scale data with a vocabulary of one million words
     | A method released as word2vec
     | Vector representations that capture multiple degrees of similarity
       { Similarity in meaning (semantic relations)
       { Similarity in word endings (syntactic relations)
     | Vector operations that preserve linear regularities between words
       { vector("King") – vector("Man") + vector("Woman") = vector("Queen")
         → see the sketch below
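To illustrate the analogy operation above, here is a minimal sketch using numpy; the toy 3-dimensional embeddings and the `most_similar` helper are hypothetical, made up for illustration rather than taken from the paper (real word2vec vectors have hundreds of dimensions).

```python
import numpy as np

# Hypothetical toy embeddings, invented for illustration only.
emb = {
    "king":  np.array([0.8, 0.65, 0.1]),
    "man":   np.array([0.7, 0.1,  0.2]),
    "woman": np.array([0.6, 0.15, 0.8]),
    "queen": np.array([0.7, 0.7,  0.7]),
    "apple": np.array([0.1, 0.9,  0.3]),
}

def most_similar(query, exclude):
    """Return the vocabulary word whose vector has the highest
    cosine similarity to the query vector."""
    best, best_sim = None, -1.0
    for word, vec in emb.items():
        if word in exclude:
            continue
        sim = vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# vector("King") - vector("Man") + vector("Woman") ≈ vector("Queen")
target = emb["king"] - emb["man"] + emb["woman"]
print(most_similar(target, exclude={"king", "man", "woman"}))  # -> queen
```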
  3. Prior work representing words as continuous vectors: the NNLM
     | Neural network language model (NNLM) (Bengio et al., JMLR 2003)
       Combines a feedforward NN having a linear projection layer and a nonlinear hidden layer, learning the word vector representation and a statistical language model simultaneously
     | NNLM variant (Mikolov et al., ICASSP 2009)
       First learns word vectors with an NN with a single hidden layer, then trains the NNLM using the learned word vectors (the two are trained separately, not simultaneously)
       → This work follows the latter approach, proposing a simple method for learning word vectors
  4. Comparison with NNLM methods that learn distributed word representations
     | Besides the NNLM, LSA and LDA can also learn continuous word vectors, but prior work has shown that NNLMs outperform LSA/LDA, so this work compares only against NNLMs
       { LDA cannot (naively) be applied to large-scale data
     | Training complexity of a model: O = E × T × Q (a worked example follows below)
       { E: number of training epochs (3–50)
       { T: total number of words in the training data (up to one billion)
       { Q: per-example cost, defined for each model architecture
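Written out, the total training cost is just the product of these three factors; a minimal sketch, with illustrative values for E and T from the ranges above and a placeholder per-example cost Q:

```python
# Total training complexity from the slide: O = E * T * Q.
E = 50               # training epochs (slide's range: 3-50)
T = 1_000_000_000    # words in the training data (up to one billion)
Q = 500_000          # placeholder per-example cost; model-dependent (next slides)

O = E * T * Q
print(f"total operations: {O:.2e}")   # ~2.5e+16 for these values
```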
  5. Parameters and computational complexity of the feedforward NNLM
     | Bengio et al. (JMLR 2003)
       { Input layer: the previous N words (e.g., N = 10), each in 1-of-V encoding (V = vocabulary size)
       { Projection layer: P, of dimensionality N × D (500–2000 dimensions), using a shared projection matrix
       { Hidden layer: H (500–1000 dimensions)
       { Output layer: V dimensions
     | Computational complexity per training example: Q = N × D + N × D × H + H × V
       → Representing V as a binary tree reduces the last term to log(V)
       → The bottleneck is then the N × D × H term (see the sketch below)
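The relative size of the three terms is easy to check numerically; a minimal sketch with representative sizes (the exact values below are illustrative, chosen from the ranges on the slide):

```python
import math

# Per-example cost of the feedforward NNLM; sizes are illustrative.
N, D, H, V = 10, 100, 500, 1_000_000

projection  = N * D               # input -> projection layer
hidden      = N * D * H           # projection -> hidden layer
output      = H * V               # hidden -> output (full softmax over V)
output_tree = H * math.log2(V)    # output cost with a binary-tree softmax

print(f"Q (full softmax) = {projection + hidden + output:,}")
print(f"Q (binary tree)  ≈ {projection + hidden + output_tree:,.0f}")
# With the tree, H*V shrinks to H*log2(V), and N*D*H dominates.
```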
  6. Reducing the computational cost of the feedforward NNLM
     | Computational complexity per training example: Q = N × D + N × D × H + H × V
       → Representing V as a binary tree reduces the H × V term to log(V)
       → The bottleneck is then the N × D × H term
     | Speedup techniques
       { Hierarchical softmax
       { Using models that are not normalized
     | Building a Huffman tree reduces the H × V term further, to log(Unigram_perplexity(V)) (see the sketch below)
       → For a vocabulary of one million words, roughly a 2× speedup
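The log(Unigram_perplexity(V)) figure follows from Huffman coding: the frequency-weighted average depth of a Huffman tree is close to the unigram entropy, which is log2 of the unigram perplexity. A minimal sketch with a toy vocabulary (the counts are made up):

```python
import heapq
import math

# Toy unigram counts; a real vocabulary would have ~10**6 entries.
counts = {"the": 50, "of": 30, "cat": 10, "sat": 6, "on": 3, "mat": 1}
total = sum(counts.values())

# Build a Huffman tree with a min-heap: repeatedly merge the two least
# frequent nodes, so frequent words end up close to the root.
heap = [(c, i, w) for i, (w, c) in enumerate(counts.items())]
heapq.heapify(heap)
parent, next_id = {}, len(heap)
while len(heap) > 1:
    c1, _, n1 = heapq.heappop(heap)
    c2, _, n2 = heapq.heappop(heap)
    parent[n1] = parent[n2] = next_id
    heapq.heappush(heap, (c1 + c2, next_id, next_id))
    next_id += 1

def depth(node):
    d = 0
    while node in parent:
        node, d = parent[node], d + 1
    return d

# Expected code length ~= unigram entropy = log2(unigram perplexity),
# which is below log2(V) whenever the distribution is skewed.
avg_depth = sum(c * depth(w) for w, c in counts.items()) / total
entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
print(f"average Huffman depth:    {avg_depth:.2f}")
print(f"log2(unigram perplexity): {entropy:.2f}")
print(f"log2(V):                  {math.log2(len(counts)):.2f}")
```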
  7. RNNLM: a model that can take past history into account
     | Language model based on a recurrent neural network (recurrent neural net language model)
       { Input layer
       { Projection layer: none
       { Hidden layer: has a recurrent matrix with time-delayed connections
         → Short-term memory: the current state is updated from the previous state (see the sketch below)
       { Output layer
     | Computational complexity of the RNN model: Q = H × H + H × V
       → V can be sped up with a binary tree, so the bottleneck is the H × H term
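A minimal sketch of the recurrent hidden-state update described above, assuming a sigmoid activation and illustrative dimensions (neither is specified on the slide):

```python
import numpy as np

H, V = 4, 10  # hidden size and vocabulary size, illustrative only
rng = np.random.default_rng(0)
W_in  = rng.normal(scale=0.1, size=(H, V))  # input (1-of-V) -> hidden
W_rec = rng.normal(scale=0.1, size=(H, H))  # hidden -> hidden (recurrent)

def step(word_id, h_prev):
    """Update the hidden state from the current word and the previous
    state; this recurrence is the model's 'short-term memory'."""
    x = np.zeros(V)
    x[word_id] = 1.0
    return 1.0 / (1.0 + np.exp(-(W_in @ x + W_rec @ h_prev)))

h = np.zeros(H)
for w in [3, 1, 4]:   # a toy word-id sequence
    h = step(w, h)    # each step costs O(H*H) for the recurrence
print(h)
```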
  8. Neural networks can be trained with parallel distributed processing
     | DistBelief (Dean et al., NIPS 2012)
       { A large-scale distributed framework
       { Runs replicas of the same model in parallel; parameter updates are synchronized through a central server
       { Mini-batch asynchronous gradient descent with AdaGrad
       { Using 100 or more replicas is common (a toy analogue is sketched below)
     | The NNLM and word2vec models were trained using this framework
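A toy, single-machine analogue of this scheme: several replica threads compute gradients independently and apply AdaGrad-scaled updates to shared parameters. This is not DistBelief itself; the objective, sizes, and learning rate below are all invented for illustration.

```python
import threading

import numpy as np

dim, lr = 5, 0.1
theta = np.zeros(dim)                  # shared "parameter server" state
grad_sq = np.full(dim, 1e-8)           # AdaGrad accumulator
lock = threading.Lock()
target = np.arange(dim, dtype=float)   # toy objective: reach this vector

def replica(steps):
    for _ in range(steps):
        # Each replica reads possibly stale parameters (asynchrony).
        grad = theta - target          # gradient of 0.5 * ||theta - target||^2
        with lock:                     # the "server" applies the update
            grad_sq[:] += grad ** 2
            theta[:] -= lr * grad / np.sqrt(grad_sq)

workers = [threading.Thread(target=replica, args=(200,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(theta)  # converges toward `target` despite asynchronous updates
```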
  9. Continuous Skip-gram model
     | Similar to CBOW, but instead of predicting the current word from its context, it predicts the surrounding words from the current word (maximizing classification accuracy)
       → This model is also not "Deep"
     | Widening the context length C improves the quality of the word vectors but increases the computational cost, and the farther a word is from the current word, the less related it tends to be, so context words are downsampled according to distance (see the sketch below)
       → Perhaps captures some semantic and syntactic information?
     | Computational complexity: Q = C × (D + D × log(V))
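The distance-based downsampling can be implemented, as in the paper, by sampling an effective window size R uniformly from [1, C] for each position, so a context word at distance d is kept with probability (C − d + 1)/C. A minimal sketch:

```python
import random

def skipgram_pairs(tokens, C, seed=0):
    """Generate (current word, context word) training pairs for the
    Skip-gram model, downsampling distant context words by drawing
    the effective window size R uniformly from [1, C]."""
    rng = random.Random(seed)
    pairs = []
    for i, center in enumerate(tokens):
        R = rng.randint(1, C)  # effective window for this position
        for j in range(max(0, i - R), min(len(tokens), i + R + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sent = "the cat sat on the mat".split()
for pair in skipgram_pairs(sent, C=2):
    print(pair)
```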
  10. Summary: a new way to build word vectors → CBOW and Skip-gram
      | Proposed two neural network language models, CBOW and Skip-gram
        { Simpler architectures than existing neural network language models
        { Parallel distributed computation makes even very large corpora tractable (no need to restrict the vocabulary size)
      | Greatly improved performance on SemEval-2012 Task 2 (predicting semantic and syntactic relations between words) compared with other publicly available word vectors