
Named Entity Recognition When Complete Annotations Are Unavailable

Koga Kobayashi
September 18, 2019

Transcript

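  1. Data usable for NER training: annotated corpora
     • The datasets most commonly used for the NER task
     • Every word is assigned its corresponding label
     Drawbacks
     • A new corpus has to be built for every domain
     • Annotation is expensive
     Example: Donald/B-PER John/I-PER Trump/E-PER is/O president/O of/O the/O US/S-LOC the/O
     Goal: reduce, or eliminate, the cost of building annotated corpora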
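In the BIOES scheme used in this example, B, I and E mark the beginning, inside and end of a multi-token entity, S marks a single-token entity, and O marks tokens outside any entity. The short Python sketch below derives such tags from entity spans. The `(start, end, type)` span format and the name `spans_to_bioes` are illustrative assumptions, not part of any paper discussed here.

```python
from typing import List, Tuple

# Hypothetical span format for this sketch: (start, end_exclusive, entity_type)
Span = Tuple[int, int, str]


def spans_to_bioes(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert entity spans to BIOES tags; tokens outside every span get O."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"  # single-token entity
        else:
            tags[start] = f"B-{etype}"  # first token of the entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"  # tokens strictly inside the entity
            tags[end - 1] = f"E-{etype}"  # last token of the entity
    return tags


tokens = "Donald John Trump is president of the US the".split()
tags = spans_to_bioes(len(tokens), [(0, 3, "PER"), (7, 8, "LOC")])
for tok, tag in zip(tokens, tags):
    print(tok, tag)
```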
  2. Data usable for NER training: partially annotated corpora
     • Corpora in which only some of the words are annotated
     • Annotators do not have to label words they are unsure about
     Drawbacks
     • Training needs some extra work (a standard CRF cannot be trained on them)
     Example ("-" = no label): Donald/- John/B-PER Trump/E-PER is/- president/- of/O the/- US/- the/O
     References on partially annotated corpora:
 - Eraldo R. Fernandes and Ulf Brefeld. 2011. Learning from partially annotated sequences. In Proceedings of ECML PKDD.
 - Andrew Carlson, Scott Gaffney, and Flavian Vasile. 2009. Learning a named entity tagger from gazetteers with the partial perceptron. In Proceedings of the AAAI Spring Symposium: Learning by Reading and Learning to Read.
 - Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, and Linlin Li. 2019. Better Modeling of Incomplete Annotations for Named Entity Recognition. In Proceedings of NAACL.
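One common way to train a CRF on such data is the CRF-PA approach that appears later in this deck. It maximizes the total probability of every label sequence consistent with the partial annotation, log Σ_{y ∈ C(x)} p(y | x), which a forward pass can compute once disallowed labels are masked out.

The minimal numpy sketch below makes several assumptions: a linear-chain CRF with emission and transition scores, no start/stop transitions, and unannotated ("-") tokens passed as `None`. `log_partition` and `crf_pa_nll` are hypothetical names, not an API from the papers above.

```python
import itertools

import numpy as np


def log_partition(emissions, transitions, allowed=None):
    """Forward algorithm: log of the summed exp(path score) over label sequences.

    emissions: (T, K) per-token label scores; transitions: (K, K), score of label i -> j.
    allowed: optional (T, K) boolean mask; masked-out labels get -inf, so only
    sequences that use an allowed label at every position are summed.
    """
    scores = emissions.astype(float)
    if allowed is not None:
        scores = np.where(allowed, scores, -np.inf)
    alpha = scores[0]
    for t in range(1, scores.shape[0]):
        # alpha_new[j] = logsumexp_i(alpha[i] + transitions[i, j]) + scores[t, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + transitions, axis=0) + scores[t]
    return np.logaddexp.reduce(alpha)


def crf_pa_nll(emissions, transitions, partial_labels):
    """-log sum over consistent y of p(y | x); partial_labels[t] is a label index or None."""
    allowed = np.ones(emissions.shape, dtype=bool)
    for t, label in enumerate(partial_labels):
        if label is not None:
            allowed[t] = False
            allowed[t, label] = True  # an annotated token may only take its given label
    return log_partition(emissions, transitions) - log_partition(emissions, transitions, allowed)


# Brute-force check on a tiny random CRF (T = 4 tokens, K = 3 labels).
rng = np.random.default_rng(0)
T, K = 4, 3
emissions = rng.normal(size=(T, K))
transitions = rng.normal(size=(K, K))


def path_score(y):
    return sum(emissions[t, y[t]] for t in range(T)) + sum(
        transitions[y[t - 1], y[t]] for t in range(1, T))


partial = [None, 1, None, 2]  # tokens 0 and 2 are unannotated
all_paths = list(itertools.product(range(K), repeat=T))
consistent = [y for y in all_paths if all(p is None or p == lab for p, lab in zip(partial, y))]
brute = np.logaddexp.reduce([path_score(y) for y in all_paths]) - np.logaddexp.reduce(
    [path_score(y) for y in consistent])
print(crf_pa_nll(emissions, transitions, partial), brute)  # equal up to float error
```

With every token annotated, the loss reduces to the ordinary CRF negative log-likelihood. With no token annotated, it is zero, which is why partially annotated data needs this modification rather than a standard CRF.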
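  3. Data usable for NER training: dictionaries
     • Dictionaries in the ordinary sense, although the definitions and meanings of entries are rarely used
     • They often already exist in many fields
     • e.g. person-name dictionaries, pharmaceutical terminology dictionaries, substance and materials databases
     NER by dictionary matching suffers from problems such as:
     • Words that are not in the dictionary cannot be detected
     • Named entities receive wrong labels
     Example, dictionary matching with a person dictionary {John, Michael}: Donald/- John/S-PER Trump/- is/- president/- of/- the/- US/- the/-
     (Donald and Trump are missed, and John, which is part of a longer name, is labeled as a single-token entity.)
     This has led to methods called Distantly Supervised NER, which use a dictionary together with a raw corpus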
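The dictionary-matching example can be reproduced with a greedy longest-match labeler. Longest matching is the strategy named on slide 8; using it here is an assumption.

In this minimal Python sketch, `dictionary_match` and the tuple-keyed dictionary format are hypothetical. Unmatched tokens are left unlabeled as `-`, and each match is written in BIOES form.

```python
from typing import Dict, List, Tuple


def dictionary_match(tokens: List[str], dictionary: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedy left-to-right longest match; unmatched tokens stay unlabeled ("-").

    dictionary maps a tuple of tokens to an entity type, e.g. {("John",): "PER"}.
    """
    max_len = max(len(entry) for entry in dictionary)
    labels = ["-"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # longest candidate first
            etype = dictionary.get(tuple(tokens[i:i + n]))
            if etype is not None:
                if n == 1:
                    labels[i] = f"S-{etype}"
                else:
                    labels[i:i + n] = [f"B-{etype}"] + [f"I-{etype}"] * (n - 2) + [f"E-{etype}"]
                i += n
                break
        else:  # no dictionary entry starts at token i
            i += 1
    return labels


person_dictionary = {("John",): "PER", ("Michael",): "PER"}
tokens = "Donald John Trump is president of the US the".split()
print(dictionary_match(tokens, person_dictionary))
# ['-', 'S-PER', '-', '-', '-', '-', '-', '-', '-']
```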
  4. Papers related to this area
     Partially annotated corpora
     • Better Modeling of Incomplete Annotations for Named Entity Recognition
     Distantly Supervised NER
     • Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
     • Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
     • Learning Named Entity Tagger using Domain-Specific Dictionary
     • Distant supervision for relation extraction without labeled data
     • Distant supervision for relation extraction via piecewise convolutional neural networks
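  5. Better Modeling of Incomplete Annotations for Named Entity Recognition
     Proposes a model that performs NER on data with incomplete annotations
     Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li. NAACL 2019
     Background
     • When modeling incomplete annotation, removing labels at the word level is unrealistic (A.1)
     • O tags are never annotated explicitly (A.1, A.2)
     The authors argue that the label loss that actually occurs in practice corresponds to A.3
     Reimplementation by the presenter: https://github.com/kajyuuen/Incomplete-NER-Methods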
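  6. Better Modeling of Incomplete Annotations for Named Entity Recognition
     Figure: a standard CRF compared with the CRF of the proposed method
     Proposed method: a model that extends the CRF to take into account the possible combinations of labels
     Estimating the probability distribution q
     • Hard: assigns probability 1 to the most likely label sequence
     • Soft: assigns a probability to every possible label sequence
     • Both distributions are estimated by k-fold cross-validation
     With q = 1 the model is CRF-PA; other choices of q give the proposed method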
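As we read the paper, the three objectives for a sentence x are as follows, where C is the set of label sequences consistent with the partial annotation:

- Standard CRF: log p(y | x) for one fully labeled y.
- CRF-PA: log Σ_{y ∈ C} p(y | x).
- Proposed model: log Σ_{y ∈ C} q(y | x) p(y | x).

Setting q = 1 recovers CRF-PA. A Hard q reduces the objective to the standard CRF log-likelihood of a single sequence.

The brute-force Python sketch below checks both facts on a 3-token toy CRF. All names are hypothetical. The stand-in for q is a simplification: the paper estimates q with k-fold cross-validation, not from the model being trained.

```python
import itertools
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_probs(emissions, transitions):
    """log p(y | x) for every label sequence y of a tiny linear-chain CRF (brute force)."""
    T, K = len(emissions), len(emissions[0])
    scores = {}
    for y in itertools.product(range(K), repeat=T):
        s = sum(emissions[t][y[t]] for t in range(T))
        s += sum(transitions[y[t - 1]][y[t]] for t in range(1, T))
        scores[y] = s
    log_z = logsumexp(list(scores.values()))
    return {y: s - log_z for y, s in scores.items()}


def consistent(y, partial):
    """True if y agrees with every annotated position (None = unannotated)."""
    return all(p is None or p == lab for p, lab in zip(partial, y))


def weighted_log_likelihood(lp, partial, q):
    """log sum_{y in C} q(y) p(y | x); sequences missing from q get weight 0."""
    terms = [math.log(q[y]) + lp[y] for y in lp if consistent(y, partial) and q.get(y, 0) > 0]
    return logsumexp(terms)


emissions = [[0.5, -0.2, 0.1], [0.0, 1.0, -0.5], [0.3, 0.2, 0.9]]
transitions = [[0.1, 0.0, -0.3], [0.2, 0.4, 0.0], [-0.1, 0.3, 0.2]]
partial = [None, 1, None]  # only the middle token is annotated
lp = log_probs(emissions, transitions)
C = [y for y in lp if consistent(y, partial)]

# q = 1 for every consistent sequence: the objective becomes CRF-PA's.
q_ones = {y: 1.0 for y in C}
# Hard: probability 1 on one sequence. For illustration only, the best consistent
# sequence under this same toy CRF (the paper uses k-fold cross-validation instead).
y_star = max(C, key=lambda y: lp[y])
q_hard = {y_star: 1.0}
# Soft: a distribution over all consistent sequences (here p(y | x) renormalized over C).
z_c = sum(math.exp(lp[y]) for y in C)
q_soft = {y: math.exp(lp[y]) / z_c for y in C}

for name, q in [("q=1 (CRF-PA)", q_ones), ("hard", q_hard), ("soft", q_soft)]:
    print(name, weighted_log_likelihood(lp, partial, q))
```

Brute-force enumeration is only for checking the algebra on tiny inputs. Real sentences need a forward-style dynamic program.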
  7. Better Modeling of Incomplete Annotations for Named Entity Recognition
     Results
     Setting: 50% of the labels removed, plus all O tags removed
     • Performance stays reasonably close to that obtained with complete annotations
     • Clearly beats Simple, the baseline that treats every unlabeled position as O
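The experimental setting on this slide (50% of labels removed, all O tags removed) can be simulated roughly as follows. This Python sketch is an assumption about the procedure, not the paper's script:

- It removes whole entities rather than individual words, in line with the slide 5 argument that word-level removal is unrealistic.
- It drops exactly `round(drop_rate * number_of_entities)` randomly chosen entities.
- It marks unannotated tokens with `-`.

`entity_spans` and `make_incomplete` are hypothetical names.

```python
import random


def entity_spans(tags):
    """(start, end_exclusive) spans of the entities in a BIOES tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("S-"):
            spans.append((i, i + 1))
        elif tag.startswith("B-"):
            start = i
        elif tag.startswith("E-") and start is not None:
            spans.append((start, i + 1))
            start = None
    return spans


def make_incomplete(tags, drop_rate=0.5, seed=0):
    """Un-annotate a fraction of whole entities and every O tag ("-" = no label)."""
    rng = random.Random(seed)
    spans = entity_spans(tags)
    dropped = rng.sample(spans, round(drop_rate * len(spans)))
    out = ["-" if tag == "O" else tag for tag in tags]
    for start, end in dropped:
        out[start:end] = ["-"] * (end - start)
    return out


tags = ["B-PER", "I-PER", "E-PER", "O", "O", "O", "O", "S-LOC", "O"]
print(make_incomplete(tags))  # one of the two entities and all O tags become "-"
```

Fixing the seed makes the removal reproducible, which relates to the reproducibility concern raised in the survey impressions on slide 12.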
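  8. Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
     Minlong Peng, Xiaoyu Xing, Qi Zhang, Jinlan Fu, Xuanjing Huang. ACL 2019
     Performs NER using only a dictionary and raw text, via PU learning
     Proposed method
     • Annotates a raw corpus from the dictionary using longest matching
     • Trains with the labeled data as Positive and everything else as Unlabeled
     • Not using a labeling scheme such as BIO or BIOES reduces the annotation errors introduced by the dictionary
     • Builds one PU classifier per class and adopts the class with the highest predicted probability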
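A per-class PU classifier needs a loss that treats unlabeled tokens as a mixture of positives and negatives. One well-known estimator is the non-negative PU risk of Kiryo et al. (2017). The estimator actually used by Peng et al. may differ, so treat the Python sketch below as a generic illustration.

It assumes a sigmoid loss, raw classifier scores as input and a known class prior `prior`. `nn_pu_risk` and `predict_types` are hypothetical names.

```python
import numpy as np


def sigmoid_loss(scores, label):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for a label y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(label * scores))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk (Kiryo et al., 2017) for one binary entity-type classifier.

    pos_scores: classifier outputs on tokens the dictionary labeled with this type.
    unl_scores: outputs on all remaining (unlabeled) tokens.
    prior: assumed class prior pi = P(y = +1); it must be supplied or estimated.
    """
    risk_pos = prior * sigmoid_loss(pos_scores, +1).mean()
    # Negative-class risk: unlabeled data scored as negative, minus the share of positives in it.
    risk_neg = sigmoid_loss(unl_scores, -1).mean() - prior * sigmoid_loss(pos_scores, -1).mean()
    return risk_pos + max(0.0, risk_neg)  # the clamp keeps the estimate non-negative


def predict_types(type_scores):
    """One PU classifier per entity type: pick the type whose classifier scores highest.

    The sigmoid is monotonic, so the highest score is also the highest predicted probability.
    """
    return np.argmax(type_scores, axis=-1)


pos = np.array([2.0, 1.5, 3.0])
unl = np.array([-1.0, 0.5, -2.0, 2.5])
print(nn_pu_risk(pos, unl, prior=0.2))
print(predict_types(np.array([[0.1, 0.9, 0.3], [2.0, -1.0, 0.0]])))  # [1 0]
```

In Kiryo et al.'s training procedure, the case where the negative-class term falls below zero is also handled specially in the gradient step. This sketch only evaluates the risk value.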
  9. Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
     Results (models trained on an annotated corpus vs. models trained on a dictionary and raw corpus)
     • Substantially improves performance over dictionary matching
  10. Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
     Yaosheng Yang, Wenliang Chen, Zhenghua Li, Zhengqiu He, Min Zhang. COLING 2018
     NER combining distant supervision with reinforcement learning
     Proposed method
     • Removes noisy labels using reinforcement learning
     • Uses CRF-PA so that the model can also learn from incomplete sequences
     Background
     • Training data created by dictionary matching (distant supervision) suffers from incomplete and noisy labels
  11. Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
     Results
     • Achieves higher extraction performance than dictionary matching or LSTM-CRF-PA alone
     Datasets in the results: ℋ, a small annotated corpus, and training data created by distant supervision
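  12. Survey impressions
     • Models that learn from partially annotated corpora came up frequently
     • In Distantly Supervised NER research, such models are very often used or cited
     • The question of which dataset to use
       • Many papers build their own
       • Some randomly remove labels from CoNLL2003
       • Either way, it is hard to remove labels in a reproducible way
     • The topic does seem to be gaining popularity
     • Many papers tackle complex task settings such as nested NER and DS NER
     • There seem to be many people who would like to do NER with nothing but a dictionary
     • Are ordinary sequence-labeling tasks reaching their limits?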