
Fuwafuwa (Fluffy) Sequence Labeling / ner 2018

himkt
January 16, 2018


Slides from the Retrieva seminar "Fuwafuwa Sequence Labeling".


Transcript

  1. Self-introduction
     • Name: Makoto Hiramatsu (@himkt)
     • Affiliation: University of Tsukuba, first-year master's student
     • Used sequence labeling for my undergraduate thesis => it was fun!!
  2. Sequence labeling in natural language processing
     • POS tagging: everyone uses it
     • Named entity recognition: e.g., extracting compound expressions
     • Chunking: processing data in phrase (bunsetsu) units
     • Semantic role labeling: deeper sentence analysis
     Example of NER: the tokens 君 / の / 名 / は / は / 最 / 高 tagged B I I I O O O (extracted named entity: "君の名は")
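The BIO example above can be decoded mechanically: B begins an entity, I continues it, O is outside. A minimal sketch in Python (the tokens and tags follow the slide's example; the helper name is my own):

```python
def extract_entities(tokens, tags):
    """Collect spans labeled B (begin) / I (inside); O means outside."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                 # a new entity starts here
            if current:
                entities.append("".join(current))
            current = [token]
        elif tag == "I" and current:   # continue the current entity
            current.append(token)
        else:                          # O (or a stray I): close any open span
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities

# The slide's example: 君 / の / 名 / は / は / 最 / 高 -> B I I I O O O
print(extract_entities(list("君の名はは最高"), list("BIIIOOO")))  # ['君の名は']
```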
  3. Sequence labeling
     • => Consider a (probabilistic) model that can capture the structure of the sequence
     • HMM, MEMM, CRF
     [Figure: the observed sequence x_{t-2} … x_{t+2} and the label sequence to predict y_{t-2} … y_{t+2}]
  4. Hidden Markov Model
     • An HMM learns the probability that the input sequence and the label sequence are generated (a generative model)
     • => It suffices to consider the following joint probability:
     P(y_1:n, x_1:n) = P(y_1:n) P(x_1:n | y_1:n) = Π_{i=1..n} P(y_i | y_{i-1}) P(x_i | y_i)
     [Figure: the observed sequence x_1:n and the label sequence y_1:n to predict]
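The joint probability above can be computed directly for a toy HMM. A sketch with made-up transition and emission tables (all numbers are illustrative, not from the slides):

```python
# Toy HMM: P(y_1:n, x_1:n) = prod_i P(y_i | y_{i-1}) P(x_i | y_i)
trans = {  # transition probabilities P(y_i | y_{i-1}); "<s>" is the start state
    ("<s>", "N"): 0.6, ("<s>", "V"): 0.4,
    ("N", "N"): 0.3, ("N", "V"): 0.7,
    ("V", "N"): 0.8, ("V", "V"): 0.2,
}
emit = {  # emission probabilities P(x_i | y_i)
    ("N", "dog"): 0.5, ("N", "barks"): 0.1,
    ("V", "dog"): 0.1, ("V", "barks"): 0.6,
}

def joint_prob(words, tags):
    prob, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        prob *= trans[(prev, tag)] * emit[(tag, word)]
        prev = tag
    return prob

p = joint_prob(["dog", "barks"], ["N", "V"])  # 0.6*0.5 * 0.7*0.6 = 0.126
```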
  5. Finding the highest-probability sequence efficiently
     • With sequence length n and k label types, there are k^n candidate label sequences Y
     • => The Viterbi algorithm
     • A form of dynamic programming: finds the highest-probability path in O(k^2 n)
     [Figure: the label sequence y_{t-2} … y_{t+2} to predict, with k choices at each position]
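The O(k^2 n) dynamic program can be sketched as follows (the HMM tables are illustrative; scores are kept as probabilities rather than log-probabilities for clarity):

```python
def viterbi(words, labels, trans, emit, start="<s>"):
    """Find the label sequence maximizing prod_i P(y_i|y_{i-1}) P(x_i|y_i)."""
    # best[y] = (probability, path) of the best path ending in label y
    best = {start: (1.0, [])}
    for word in words:
        new_best = {}
        for y in labels:                        # k candidate labels ...
            cands = [
                (p * trans[(prev, y)] * emit[(y, word)], path + [y])
                for prev, (p, path) in best.items()
            ]                                   # ... times k predecessors
            new_best[y] = max(cands)
        best = new_best
    return max(best.values())

labels = ["N", "V"]
trans = {("<s>", "N"): 0.6, ("<s>", "V"): 0.4,
         ("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.8, ("V", "V"): 0.2}
emit = {("N", "dog"): 0.5, ("N", "barks"): 0.1,
        ("V", "dog"): 0.1, ("V", "barks"): 0.6}
prob, path = viterbi(["dog", "barks"], labels, trans, emit)
# path == ["N", "V"]
```

Each step keeps only the best path into each of the k labels, so the work per position is k x k instead of enumerating all k^n sequences.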
  6. Maximum Entropy Markov Model
     • A discriminative model
     • Introduces a feature function f
     P(y_1:n | x_1:n) = Π_{i=1..n} P(y_i | y_{i-1}, x_1:n) = Π_{i=1..n} exp(w^T f(y_i, y_{i-1}, x_1:n)) / Z(y_{i-1}, x_1:n)
     [Figure: the observed sequence x_1:n and the label sequence to predict]
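Each factor above is a softmax over the k labels, normalized locally at one position. A minimal sketch (the feature scores stand in arbitrarily for w^T f):

```python
from math import exp

labels = ["B", "I", "O"]

def local_prob(y, y_prev):
    """P(y_i | y_{i-1}, x_{1:n}): a softmax over the labels at one position."""
    def s(lab):  # stand-in for w^T f(lab, y_prev, x_{1:n}); values arbitrary
        return 1.0 if lab == y_prev else 0.5
    z = sum(exp(s(lab)) for lab in labels)  # Z(y_{i-1}, x_{1:n}): local normalizer
    return exp(s(y)) / z

# The k probabilities at each position sum to 1 on their own; this
# per-position normalization is what causes label bias (see slide 8).
total = sum(local_prob(y, "B") for y in labels)
```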
  7. About the feature function f
     • Based on a number of predefined rules, extract the features (a vector) for predicting the i-th label
     • The collection of predefined rules is called a feature template
     Examples: whether the i-th word starts with a capital letter (0/1); whether the i-th word ends in -ing (0/1)
     f(y_i, y_{i-1}, x_1:n) = (1, 0, 1, …, 0, 1, 0)^T
  8. Maximum Entropy Markov Model: cons
     • The MEMM normalizes locally at each position
     • => This causes the label bias and length bias problems
     P(y_1:n | x_1:n) = Π_{i=1..n} exp(w^T f(y_i, y_{i-1}, x_1:n)) / Z(y_{i-1}, x_1:n)
     [Figure: same model diagram as slide 6]
  9. Label bias, length bias
     Kudo, Taku, Kaoru Yamamoto, and Yuji Matsumoto. "Applying Conditional Random Fields to Japanese Morphological Analysis." EMNLP. Vol. 4. 2004.
  10. Conditional Random Field
     • => A probability that takes the whole sequence into account (resolving label bias) through global normalization:
     P(y_1:n | x_1:n) = (1/Z(x_1:n)) Π_{i=1..n} ψ(y_i, y_{i-1}, x_1:n) = (1/Z(x_1:n)) Π_{i=1..n} exp(w^T f(y_i, y_{i-1}, x_1:n))
     Z(x_1:n) = Σ_{y'} Π_{i=1..n} exp(w^T f(y'_i, y'_{i-1}, x_1:n))
     [Figure: the observed sequence x_1:n and the label sequence to predict]
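The effect of the global partition function Z(x) can be checked by brute force on a tiny example. A sketch, with an arbitrary scoring function standing in for w^T f:

```python
from itertools import product
from math import exp

labels = ["B", "I", "O"]

def score(y, y_prev):
    """Stand-in for w^T f(y_i, y_{i-1}, x_{1:n}); the values are arbitrary."""
    return 1.0 if y == y_prev else 0.5

def path_score(seq):
    """exp of the summed scores along one label sequence ("<s>" is the start)."""
    total, prev = 0.0, "<s>"
    for y in seq:
        total += score(y, prev)
        prev = y
    return exp(total)

def crf_prob(ys, n=3):
    """P(y | x) = path_score(y) / Z(x), Z summed over ALL k^n label sequences."""
    z = sum(path_score(seq) for seq in product(labels, repeat=n))
    return path_score(ys) / z

# One global Z makes the distribution over whole sequences sum to 1
total = sum(crf_prob(seq) for seq in product(labels, repeat=3))
```

Enumerating Z this way is O(k^n); in practice it is computed in O(k^2 n) with the forward algorithm, the sum-product counterpart of Viterbi.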
  11. The evolution of sequence-labeling models
     • Models that are easy to train (HMM)
     • Models that allow flexible feature design (MEMM, CRF, structured perceptron, running classifiers sequentially, …)
     • The rise of neural networks, which decide and extract features automatically [since Collobert+, 2011?]
     This talk introduces models built on RNNs.
     Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12.Aug (2011): 2493-2537.
  12. Huang+, arXiv 2015 (arXiv only?) Bidirectional LSTM-CRF Models for Sequence Tagging
     First to apply an LSTM + CRF model to sequence labeling. Three kinds of features (spelling feature + context feature + word embedding):
     • spelling feature (12 types): whether the word starts with a capital letter, whether the word is all caps, …
     • context feature: uni-gram, bi-gram, tri-gram
     • word embedding: initialized with SENNA [Collobert+, 2011]
     • feature connection trick (MEMM (CRF?)-like handling of features)
     Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12.Aug (2011): 2493-2537.
  13. Network architecture
     • Everything except the word embedding is fed directly into the CRF (the feature connection trick)
     • Less an LSTM-centric model than a conventional CRF with an LSTM attached?
     [Figure from the paper: spelling feature, context feature, word embedding]
     Speaking of feature extraction: CNNs!
  14. Chiu+, TACL 2016 Named Entity Recognition with Bidirectional LSTM-CNNs
     BiLSTM + CNN
     • Two kinds of word features (word embedding + additional feature)
     • Two kinds of character features (char embedding x CNN + additional feature)
     Uses a CNN to extract character-based features for each word
  15. Network architecture
     • Two kinds of word features
       • Embedding
       • Additional feature: lowercased + stemmed word, uppercased word, lowercased word
     • Character features extracted by the CNN
       • Embedding
       • Additional feature
     [Figure from the paper]
  16. Lample+, NAACL-HLT 2016 Neural Architectures for Named Entity Recognition
     A BiLSTM-CRF that uses no external labeled data:
     character-based word features learned on the training data
     +
     word embeddings trained on a large unlabeled corpus [Ling15]
     Extracts character-based features for each word
     Ling, Wang, et al. "Not all contexts are created equal: Better word representations with variable attention." Proc. EMNLP (2015).
  17. Ma+, ACL 2016 End-to-End Sequence Labeling via Bidirectional LSTM-CNNs-CRF
     • Word features (GloVe 100d [Pennington2014])
     • Character features (character CNN only)
     The LSTM-CNN-CRF model itself has grown quite complex. Can it be combined with other tasks or data?
     [Figure from the paper]
     Pennington, Jeffrey, Richard Socher, and Christopher Manning. "GloVe: Global vectors for word representation." Proc. EMNLP. 2014.
  18. Peters+, ACL 2017 Semi-supervised sequence tagging with bidirectional language models
     A BiGRU + CRF whose features are: a pretrained language model [Chelba+, 2013] + character-based features built with a CNN/RNN + word embeddings
     [Figure from the paper]
     Chelba et al. "One billion word benchmark for measuring progress in statistical language modeling." arXiv preprint arXiv:1312.3005 (2013).
  19. Liu+, AAAI 2018 Empower Sequence Labeling with Task-Aware Neural Language Model
     • Jointly trains a language model that predicts words from character information
     • Adopts highway networks [Srivastava+, 2015] to pass information efficiently across layers
     • Requires no additional labeled data
     [Figure from the paper]
     Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
  20. Word vectors used in each study
     • Everyone uses publicly released pretrained embeddings
     • SENNA -> GloVe, Word2Vec (skip-gram? CBoW?)
     • Word embeddings tend to be relatively low-dimensional: 50d or 100d (*)

     Paper | Word embedding
     1 Collobert2011 | SENNA 50d
     2 Huang2015 | SENNA 50d
     3 Chiu2016 | SENNA 50d (beat GloVe, Word2Vec)
     4 Lample2016 | skip-n-gram (en: 100d, other: 64d)
     5 Ma2016 | GloVe 100d
     6 Peters2017 | SENNA 50d
     7 Rei2017 | Word2Vec (Google News 300d, PubMed+PMC 200d)
     8 Liu2018 | GloVe 100d

     * Only 50-dimensional SENNA vectors are publicly available.
  21. Summary
     • Sequence labeling with HMMs
     • We wanted discriminative models with flexible features
       • Discriminative models such as MEMM and CRF were proposed
       • Features could be incorporated flexibly (the era of feature engineering)
     • Neural sequence labeling was born
       • Toward an era of automatic feature extraction
     • Semi-supervised models and multi-task learning
       • Exploit data that needs no labels
       • Do more with the labeled data at hand
  22. Resources consulted while preparing these slides (besides the papers)
     • A survey of recent trends in sequence labeling in NLP (from p.15):
       • https://www.slideshare.net/stairlab/2017-73465873
     • A resource organized as classification -> sequence labeling, logistic regression -> CRF:
       • http://www.ism.ac.jp/editsec/toukei/pdf/64-2-179.pdf
     • Resources explaining the progression HMM -> MEMM -> CRF:
       • http://www.cs.stanford.edu/~nmramesh/crf
       • https://ssli.ee.washington.edu/people/duh/projects/CRFintro.pdf
       • https://abicky.net/2010/06/21/082851/
  23. Bonus: publicly available implementations
     • Ma+, 2016: https://github.com/XuezheMax/NeuroNLP2
     • Lample+, 2016: https://github.com/glample/tagger
     • Rei+, 2017: https://github.com/marekrei/sequence-labeler
     • Liu+, 2018: https://github.com/LiyuanLucasLiu/LM-LSTM-CRF
  24. Bonus: assorted comparisons
     Liu+'s language model (conditioned on the characters of the preceding words): P(x_i | c_{0,·}, c_{0,1}, …, c_{i-1,·})
     An ordinary language model: P(x_i | x_1, …, x_{i-1})

     Chiu+'s loss: E = -Σ_{t=1..T} log P(y_t | d_t)
     Loss of the other (CRF-based) models: E = -s(y) + log Σ_{ỹ∈Ỹ} exp s(ỹ)
     Liu+'s and Rei+'s loss adds language-model terms:
     E = -s(y) + log Σ_{ỹ∈Ỹ} exp s(ỹ) - γ (Σ_{t=1..T-1} log P(w_{t+1} | →m_t) + Σ_{t=2..T} log P(w_{t-1} | ←m_t))
     (the γ(…) part is the language-model term)