Fuwafuwa Sequence Labeling (ふわふわ系列ラベリング) / NER 2018

himkt
January 16, 2018

Slides from the Retrieva seminar talk "Fuwafuwa Sequence Labeling."

Transcript

1. Fuwafuwa Sequence Labeling • Retrieva Seminar #2018/01/17 • Makoto Hiramatsu

2. About me • Name: Makoto Hiramatsu (@himkt) • Affiliation: University of Tsukuba, first-year master's student • Used sequence labeling in my undergraduate thesis research => it was fun!!
3. Today's topics • What sequence labeling is: a brief explanation • Pre-neural sequence labeling: a comparison of several models • Neural sequence labeling: focusing on models I personally found interesting
4. Sequence labeling • Predict a label for each element of some sequential data • => given an observed sequence X, find the label sequence y that maximizes P(y_{1:n} \mid x_{1:n}) [Figure: an observed sequence x_{t-2}, ..., x_{t+2} aligned with the label sequence y_{t-2}, ..., y_{t+2} to be predicted]
5. Sequence labeling in NLP • POS tagging: everyone uses it • Named entity recognition: extracting compound terms, etc. • Chunking: processing data at the phrase level • Semantic role labeling: deeper sentence analysis [Example of NER: the characters 君 / の / 名 / は / は / 最 / 高 are tagged B I I I O O O; the extracted named entity is "君の名は"]
6. Sequence labeling • => consider (probabilistic) models that can capture the structure of the sequence • HMM, MEMM, CRF [Figure: observed sequence x and label sequence y, as before]
7. Hidden Markov Model • the label at time t depends only on the label at time t-1 • the word at time t depends only on the label at time t [Figure: the HMM graphical model over the observed sequence x_{t-2}, ..., x_{t+2} and the label sequence y_{t-2}, ..., y_{t+2}]
8. Hidden Markov Model • => it suffices to consider the joint probability below • the HMM learns the probability that the input sequence and the label sequence are generated together (a generative model); see the sketch after this slide
P(y_{1:n}, x_{1:n}) = P(y_{1:n}) P(x_{1:n} \mid y_{1:n}) = \prod_{i=1}^{n} P(y_i \mid y_{i-1}) P(x_i \mid y_i)
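Below is a minimal Python sketch of this factorization. The transition and emission tables hold toy values invented for illustration; they are not numbers from the talk.

```python
# Sketch of the HMM joint probability:
# P(y_1:n, x_1:n) = prod_i P(y_i | y_{i-1}) * P(x_i | y_i).
# "<s>" marks the start of the sequence; all values are toy placeholders.

transition = {  # P(y_i | y_{i-1})
    ("<s>", "B"): 0.6, ("<s>", "O"): 0.4,
    ("B", "I"): 0.5, ("B", "O"): 0.5,
    ("I", "I"): 0.4, ("I", "O"): 0.6,
    ("O", "B"): 0.3, ("O", "O"): 0.7,
}
emission = {  # P(x_i | y_i)
    ("B", "John"): 0.2, ("I", "Smith"): 0.1, ("O", "runs"): 0.05,
}

def joint_probability(words, labels):
    """P(y_1:n, x_1:n) under the first-order HMM above."""
    prob, prev = 1.0, "<s>"
    for word, label in zip(words, labels):
        prob *= transition.get((prev, label), 0.0)
        prob *= emission.get((label, word), 0.0)
        prev = label
    return prob

print(joint_probability(["John", "Smith", "runs"], ["B", "I", "O"]))
```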
9. Finding the maximum-probability sequence efficiently • with sequence length n and k label types there are k^n candidate label sequences Y • => the Viterbi algorithm • a form of dynamic programming: finds the maximum-probability path in O(k^2 n)
10. The flavor of the Viterbi algorithm [Figure: for each step from time t to t+1, which incoming edge has the lowest transition cost? k^2 computations per time step, k^2 n over the whole sequence; here k = 3 labels, sequence length n = 3, with example edge costs 1.2, 4.9, 2.2] A runnable sketch follows this slide.
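A compact, runnable sketch of the decoding described above: for each time step we keep only the best-scoring path ending in each of the k labels, so the search costs O(k^2 n) instead of enumerating all k^n sequences. The label set and log-probability tables are illustrative placeholders.

```python
import math

def viterbi(words, labels, trans, emit, start="<s>"):
    """Return (log-score, path) of the best label sequence for `words`."""
    # best[l] = (score of the best path ending in label l, that path)
    best = {l: (trans.get((start, l), -math.inf) +
                emit.get((l, words[0]), -math.inf), [l]) for l in labels}
    for word in words[1:]:
        new_best = {}
        for l in labels:                      # k options for the current label
            score, path = max(                # ... times k options for the previous one
                (best[p][0] + trans.get((p, l), -math.inf) +
                 emit.get((l, word), -math.inf), best[p][1])
                for p in labels)
            new_best[l] = (score, path + [l])
        best = new_best
    return max(best.values())

labels = ["B", "I", "O"]
trans = {("<s>", "B"): math.log(0.6), ("<s>", "O"): math.log(0.4),
         ("B", "I"): math.log(0.5), ("B", "O"): math.log(0.5),
         ("I", "O"): math.log(0.6), ("O", "O"): math.log(0.7)}
emit = {("B", "John"): math.log(0.2), ("I", "Smith"): math.log(0.1),
        ("O", "runs"): math.log(0.05)}
print(viterbi(["John", "Smith", "runs"], labels, trans, emit))
```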
11. Hidden Markov Model: cons • the HMM is a generative model that learns the joint probability of X and Y • do we really need to model the input word sequence as well? • => we want a discriminative model • it is hard to enrich an HMM with flexible features • we want surface-form and positional information as features • => MEMM, CRF, structured perceptron, etc.
12. The evolution of sequence labeling models: models that are easy to train (HMM) -> models whose features can be designed flexibly (MEMM, CRF, structured perceptron, running classifiers sequentially, ...)
13. Maximum Entropy Markov Model • a discriminative model • introduces a feature function f (a sketch of the local normalization follows this slide)
P(y_{1:n} \mid x_{1:n}) = \prod_{i=1}^{n} P(y_i \mid y_{i-1}, x_{1:n}) = \prod_{i=1}^{n} \frac{\exp(w^T f(y_i, y_{i-1}, x_{1:n}))}{Z(y_{i-1}, x_{1:n})}
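A minimal sketch of the MEMM's locally normalized distribution, with a toy scorer standing in for the learned dot product w^T f; note that the partition function Z is computed per position, over the candidate labels only.

```python
import math

# P(y_i | y_{i-1}, x_{1:n}) = exp(score(y_i, y_{i-1}, x, i)) / Z(y_{i-1}, x),
# where Z sums over candidate labels at position i only (local normalization).

LABELS = ["B", "I", "O"]

def local_prob(score, y, prev_y, x, i):
    z = sum(math.exp(score(l, prev_y, x, i)) for l in LABELS)  # local Z
    return math.exp(score(y, prev_y, x, i)) / z

# Toy scorer: reward predicting "B" for capitalized words.
toy_score = lambda y, p, x, i: 1.0 if (y == "B" and x[i][0].isupper()) else 0.0
print(local_prob(toy_score, "B", "O", ["Paris", "is", "big"], 0))  # ~0.576
```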
14. About the feature function f • based on a set of predefined rules, extract the feature vector used to predict the i-th label • the collection of predefined rules is called a feature template (a sketch follows this slide) • e.g. whether the i-th word starts with a capital letter (0/1), whether the i-th word ends in -ing (0/1)
f(y_i, y_{i-1}, x_{1:n}) = (1, 0, 1, \ldots, 0, 1, 0)^T
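A sketch of such a feature template in Python. The rules below are illustrative examples, not the exact template from the slide or the papers.

```python
# Each rule maps position i of sentence x to one 0/1 component of the
# feature vector; together the rules form a feature template.

RULES = [
    lambda x, i: x[i][0].isupper(),        # i-th word starts with a capital
    lambda x, i: x[i].endswith("ing"),     # i-th word ends with -ing
    lambda x, i: x[i].isdigit(),           # i-th word is a number
    lambda x, i: i == 0,                   # i-th word begins the sentence
]

def extract_features(x, i):
    """The observation part of f(y_i, y_{i-1}, x_{1:n}) at position i."""
    return [int(rule(x, i)) for rule in RULES]

print(extract_features(["Running", "is", "fun"], 0))  # [1, 1, 0, 1]
```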
15. Maximum Entropy Markov Model: cons • the MEMM normalizes locally at each position • => this causes the label bias and length bias problems
P(y_{1:n} \mid x_{1:n}) = \prod_{i=1}^{n} P(y_i \mid y_{i-1}, x_{1:n}) = \prod_{i=1}^{n} \frac{\exp(w^T f(y_i, y_{i-1}, x_{1:n}))}{Z(y_{i-1}, x_{1:n})}
16. Label bias and length bias — see Kudo, Taku, Kaoru Yamamoto, and Yuji Matsumoto. "Applying Conditional Random Fields to Japanese Morphological Analysis." EMNLP. Vol. 4. 2004.
17. Conditional Random Field • => a probability that considers the whole sequence (resolving label bias), via global normalization (a brute-force sketch of Z(x) follows this slide)
P(y_{1:n} \mid x_{1:n}) = \frac{1}{Z(x_{1:n})} \prod_{i=1}^{n} \psi(y_i, y_{i-1}, x_{1:n}) = \frac{1}{Z(x_{1:n})} \prod_{i=1}^{n} \exp(w^T f(y_i, y_{i-1}, x_{1:n}))
Z(x_{1:n}) = \sum_{y'} \prod_{i=1}^{n} \exp(w^T f(y'_i, y'_{i-1}, x_{1:n}))  (global normalization)
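To contrast with the MEMM, here is a brute-force sketch of the CRF's global partition function Z(x), which sums over every complete label sequence. The exponential enumeration is only to make the definition concrete; real CRFs compute Z(x) with the forward algorithm in O(k^2 n). The scorer is the same toy stand-in as before.

```python
import math
from itertools import product

LABELS = ["B", "I", "O"]

def sequence_score(ys, x, score):
    """Sum of local scores along one complete label sequence."""
    total, prev = 0.0, "<s>"
    for i, y in enumerate(ys):
        total += score(y, prev, x, i)
        prev = y
    return total

def crf_prob(ys, x, score):
    # Global Z(x): sum over ALL candidate label sequences y'.
    z = sum(math.exp(sequence_score(cand, x, score))
            for cand in product(LABELS, repeat=len(x)))
    return math.exp(sequence_score(ys, x, score)) / z

toy_score = lambda y, p, x, i: 1.0 if (y == "B" and x[i][0].isupper()) else 0.0
print(crf_prob(["B", "O", "O"], ["Paris", "is", "big"], toy_score))
```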
18. The evolution of sequence labeling models: easy-to-train models (HMM) -> models with flexibly designed features (MEMM, CRF, structured perceptron, sequential classifiers, ...) -> neural networks that decide and extract features automatically, popular since [Collobert+, 2011?]. This talk introduces models built on RNNs.
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12.Aug (2011): 2493-2537.
19. The first application of an LSTM + CRF model to sequence labeling, using three kinds of features (spelling features + context features + word embeddings) • spelling features (12 kinds): whether the word starts with a capital letter, whether the word is all capitals, ... • context features: uni-grams, bi-grams, tri-grams • word embeddings: initialized with SENNA [Collobert+, 2011] • a "feature connection trick" (an MEMM- (CRF-?) like treatment of the features)
Huang+, arXiv 2015 (arXiv only?): Bidirectional LSTM-CRF Models for Sequence Tagging
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12.Aug (2011): 2493-2537.
20. Network architecture • everything except the word embeddings is fed directly into the CRF (the feature connection trick) • arguably less an LSTM-centered model than a conventional CRF with an LSTM bolted on? (a PyTorch sketch of the BiLSTM part follows this slide) [Figure from the paper: spelling feature, context feature, word embedding] Speaking of feature extraction: CNNs!
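A minimal PyTorch sketch of the BiLSTM portion of such a tagger: a bidirectional LSTM turns word embeddings into per-token emission scores, which a CRF layer (omitted here, as are the hand-crafted spelling/context features) would combine with transition scores. All layer sizes are arbitrary placeholders, not Huang+'s settings.

```python
import torch
import torch.nn as nn

class BiLSTMEmitter(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=50, hidden_dim=100, num_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)  # fwd + bwd states

    def forward(self, word_ids):                  # (batch, seq_len)
        states, _ = self.lstm(self.embed(word_ids))
        return self.proj(states)                  # (batch, seq_len, num_tags)

emissions = BiLSTMEmitter()(torch.randint(0, 1000, (2, 7)))
print(emissions.shape)  # torch.Size([2, 7, 5])
```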
21. Chiu+, TACL 2016: Named Entity Recognition with Bidirectional LSTM-CNNs • BiLSTM + CNN • two kinds of word features (word embedding + additional features) • two kinds of character features (char embedding x CNN + additional features) • uses a CNN to extract character-level features of each word
22. Network architecture • two kinds of word features • embeddings • additional features (lowercasing + stemming, the capitalization pattern of the word, the lowercased word) • character features extracted by a CNN • embeddings • additional features [Figure from the paper]
23. Character-based features • can capture prefix/suffix-like information • additional char features: uppercase, lowercase, punctuation, other [Figure from the paper] • very good performance — but can we build a model that relies even less on hand-crafted features? (a char-CNN sketch follows this slide)
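A sketch of a character-CNN word-feature extractor in the spirit of Chiu+ (2016): embed each character of a word, run a 1-d convolution over the character sequence, and max-pool over time, which is what lets the model pick up the prefix/suffix-like patterns mentioned above. Dimensions are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    def __init__(self, num_chars=100, char_dim=25, num_filters=30, width=3):
        super().__init__()
        self.embed = nn.Embedding(num_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size=width,
                              padding=width // 2)

    def forward(self, char_ids):                    # (batch, word_len)
        e = self.embed(char_ids).transpose(1, 2)    # (batch, char_dim, word_len)
        return self.conv(e).max(dim=2).values       # max-pool over characters

word_feature = CharCNN()(torch.randint(0, 100, (4, 9)))
print(word_feature.shape)  # torch.Size([4, 30])
```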
24. A BiLSTM-CRF that uses no external labeled data • character-based word features learned from the training data + word embeddings trained on a large unlabeled corpus [Ling15]
Lample+, NAACL-HLT 2016: Neural Architectures for Named Entity Recognition
Ling, Wang, Yulia Tsvetkov, Chu-Cheng Lin, et al. "Not all contexts are created equal: Better word representations with variable attention." Proc. EMNLP (2015).
25. Network architecture • word-level + character-level features, both extracted with BiLSTMs [Figure from the paper] • CNN+LSTM and LSTM+CRF have already appeared; there is also CNN+LSTM+CRF (a char-BiLSTM sketch follows this slide)
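For comparison with the char-CNN above, a sketch of a character-based word representation in the style of Lample+ (2016): run a BiLSTM over a word's characters and concatenate the final forward and backward hidden states. Dimensions are again illustrative placeholders.

```python
import torch
import torch.nn as nn

class CharBiLSTM(nn.Module):
    def __init__(self, num_chars=100, char_dim=25, hidden_dim=25):
        super().__init__()
        self.embed = nn.Embedding(num_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, char_ids):               # (batch, word_len)
        _, (h, _) = self.lstm(self.embed(char_ids))
        # h[0] = final forward state, h[1] = final backward state
        return torch.cat([h[0], h[1]], dim=1)  # (batch, 2 * hidden_dim)

print(CharBiLSTM()(torch.randint(0, 100, (4, 9))).shape)  # torch.Size([4, 50])
```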
26. Ma+, ACL 2016: End-to-End Sequence Labeling via Bidirectional LSTM-CNNs-CRF [Figure from the paper] • word features (GloVe 100d [Pennington2014]) • character features (a character CNN only) • LSTM + CNN + CRF: the models themselves have become quite complex — can they be combined with other tasks and other data?
Pennington, Jeffrey, Richard Socher, and Christopher Manning. "GloVe: Global vectors for word representation." Proc. EMNLP. 2014.
27. Classical sequence labeling -> neural sequence labeling • we want complex features extracted automatically => use neural networks • can we combine the task with other tasks and other data? => multi-task learning and semi-supervised learning
28. A BiGRU + CRF whose features are a pretrained language model [Chelba+, 2013] + character-based features built with CNNs/RNNs + word embeddings [Figure from the paper]
Peters+, ACL 2017: Semi-supervised sequence tagging with bidirectional language models
Chelba et al. "One billion word benchmark for measuring progress in statistical language modeling." arXiv preprint arXiv:1312.3005 (2013).
29. Solve the problems of predicting the next word and the previous word at the same time • no additional labeled data required (a sketch of the combined objective follows this slide)
\tilde{E} = E + \gamma(\overrightarrow{E} + \overleftarrow{E})
Rei+, ACL 2017: Semi-supervised Multitask Learning for Sequence Labeling [Figure from the paper]
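A sketch of the multitask objective above: the tagging loss E is combined with the forward and backward language-modeling losses, weighted by gamma. The tensors below are placeholders standing in for values a tagger and two LM heads would produce on the same batch.

```python
import torch

def multitask_loss(tagging_loss, fwd_lm_loss, bwd_lm_loss, gamma=0.1):
    """E~ = E + gamma * (E_fwd + E_bwd), as in Rei (2017)."""
    return tagging_loss + gamma * (fwd_lm_loss + bwd_lm_loss)

loss = multitask_loss(torch.tensor(2.3), torch.tensor(5.1), torch.tensor(4.8))
print(loss)  # tensor(3.2900)
```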
30. Jointly train a language model that predicts each word from its character information • adopts highway networks [Srivastava+, 2015] to pass information efficiently across layers (a sketch follows this slide) • no additional labeled data required
Liu+, AAAI 2018: Empower Sequence Labeling with Task-Aware Neural Language Model [Figure from the paper]
Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
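A sketch of the highway layer [Srivastava+, 2015] the slide mentions, with placeholder dimensions: a learned transform gate t mixes a nonlinear transform H(x) with the untouched input x, y = t * H(x) + (1 - t) * x, which is what lets information flow across layers.

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    def __init__(self, dim=50):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # H(x)
        self.gate = nn.Linear(dim, dim)        # t(x)

    def forward(self, x):
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))        # how much of H(x) to let through
        return t * h + (1 - t) * x

print(Highway()(torch.randn(4, 50)).shape)  # torch.Size([4, 50])
```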
31. Experimental results (CoNLL03 NER, F1) [as reported in Liu2018; table from the paper]

32. • everyone uses publicly released pretrained embeddings • SENNA -> GloVe, Word2Vec (skip-gram? CBoW?) • word embeddings are mostly of comparatively low dimension, 50 or 100 (*)
Word vectors used in each study:
1. Collobert2011 — SENNA 50d
2. Huang2015 — SENNA 50d
3. Chiu2016 — SENNA 50d (beat GloVe and Word2Vec in their comparison)
4. Lample2016 — skip-n-gram (en: 100d, others: 64d)
5. Ma2016 — GloVe 100d
6. Peters2017 — SENNA 50d
7. Rei2017 — Word2Vec (Google News 300d, PubMed+PMC 200d)
8. Liu2018 — GloVe 100d
(*) SENNA is only released as 50-dimensional vectors.
33. Summary • sequence labeling with HMMs • we wanted discriminative models and flexible features • discriminative models such as MEMM and CRF were proposed • features could now be incorporated flexibly (ushering in feature engineering) • neural sequence labeling was born • into the era of automatic feature extraction • semi-supervised models and multi-task learning • exploit data that needs no labeling • get more out of the supervised data at hand
34. Materials consulted besides the papers while preparing these slides • a survey of recent trends in sequence labeling in NLP (from p.15): https://www.slideshare.net/stairlab/2017-73465873 • a write-up that goes from classification to sequence labeling, and from logistic regression to CRFs: http://www.ism.ac.jp/editsec/toukei/pdf/64-2-179.pdf • materials explaining the progression HMM -> MEMM -> CRF: • http://www.cs.stanford.edu/~nmramesh/crf • https://ssli.ee.washington.edu/people/duh/projects/CRFintro.pdf • https://abicky.net/2010/06/21/082851/
35. Bonus: publicly available implementations • Ma+, 2016: https://github.com/XuezheMax/NeuroNLP2 • Lample+, 2016: https://github.com/glample/tagger • Rei+, 2017: https://github.com/marekrei/sequence-labeler • Liu+, 2018: https://github.com/LiyuanLucasLiu/LM-LSTM-CRF
36. Bonus: assorted comparisons
• Liu+'s language model predicts each word from characters: P(x_i \mid c_{0,0}, c_{0,1}, \ldots, c_{i-1,\cdot}) — versus an ordinary language model, which predicts each word from words: P(x_i \mid x_1, \ldots, x_{i-1})
• Chiu+'s loss function: E = -\sum_{t=1}^{T} \log P(y_t \mid d_t)
• the loss function of the other models (the CRF negative log-likelihood): E = -s(y) + \log \sum_{\tilde{y} \in \tilde{Y}} \exp s(\tilde{y})
• Liu+'s and Rei+'s loss functions, where the \gamma(\cdot) part is the language-model term: E = -s(y) + \log \sum_{\tilde{y} \in \tilde{Y}} \exp s(\tilde{y}) - \gamma \left( \sum_{t=1}^{T-1} \log P(w_{t+1} \mid \overrightarrow{m}_t) + \sum_{t=2}^{T} \log P(w_{t-1} \mid \overleftarrow{m}_t) \right)