
Named Entity Recognition When Complete Annotations Are Unavailable

Koga Kobayashi
September 18, 2019


Transcript

 1. Named Entity Recognition When Complete Annotations Are Unavailable

  Koga Kobayashi (@kajyuuen)

 2. Self-introduction

  Koga Kobayashi
  ID: @kajyuuen
  First-year master's student, Graduate School, University of Tsukuba
  I write a blog about NER

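 3. Data usable for training NER
  Annotated corpus
  • The kind of dataset generally used for NER tasks
  • Every word is assigned its corresponding label
  Disadvantages
  • It has to be rebuilt for each domain
  • Annotation is costly
  Example:
  Token: Donald  John   Trump  is  president  of  the  US     the
  Label: B-PER   I-PER  E-PER  O   O          O   O    S-LOC  O
  Goal: reduce or eliminate the cost of building annotated corpora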
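The BIOES labels in the example above can be derived mechanically from entity spans. The following is a minimal sketch, assuming spans are given as hypothetical `(start, end, type)` triples with an inclusive `end`, and using a shortened version of the slide's example sentence.

```python
from typing import List, Tuple


def spans_to_bioes(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, inclusive end, type) into BIOES tags."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if start == end:
            # A one-token entity gets the S- (single) prefix.
            tags[start] = f"S-{etype}"
        else:
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"
            tags[end] = f"E-{etype}"
    return tags


# Shortened version of the slide's example (an assumption for illustration).
tokens = ["Donald", "John", "Trump", "is", "president", "of", "the", "US"]
spans = [(0, 2, "PER"), (7, 7, "LOC")]
tags = spans_to_bioes(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```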

 4. Data usable for training NER
  Partially annotated corpus
  • A corpus in which only some of the words are annotated
  • Annotators can skip words they are not confident about
  Disadvantages
  • Training needs some extra care (an ordinary CRF cannot be trained on it)

  Example of a partially annotated corpus ("-" marks a missing label):
  Token: Donald  John   Trump  is  president  of  the  US  the
  Label: -       B-PER  E-PER  -   -          O   -    -   O

  - Eraldo R. Fernandes and Ulf Brefeld. 2011. Learning from partially annotated sequences. In Proceedings of ECML-KDD.
  - Andrew Carlson, Scott Gaffney, and Flavian Vasile. 2009. Learning a named entity tagger from gazetteers with the partial perceptron. In Proceedings of AAAI Spring Symposium: Learning by Reading and Learning to Read.
  - Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, and Linlin Li. 2019. Better Modeling of Incomplete Annotations for Named Entity Recognition. In Proceedings of NAACL.

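 5. Data usable for training NER

  Dictionary
  • A dictionary in the ordinary sense, although the definitions and meanings of its entries are rarely used
  • Often already exists in many fields
  • Examples: person-name dictionaries, pharmaceutical terminology dictionaries, substance and materials databases
  NER by dictionary matching runs into problems such as:
  • Words that are not in the dictionary cannot be detected
  • Named entities receive wrong labels
  Example of dictionary matching (Person Dictionary: John, Michael):
  Token: Donald  John   Trump  is  president  of  the  US  the
  Label: -       S-PER  -      -   -          -   -    -   -
  Distantly supervised methods, which use a dictionary plus a raw corpus, have emerged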
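The two failure modes listed above show up even in a very small sketch of token-level dictionary matching. The function name `dictionary_match` and the shortened token list are assumptions for illustration; the dictionary contents follow the slide.

```python
from typing import List, Set


def dictionary_match(tokens: List[str], dictionary: Set[str], label: str) -> List[str]:
    """Label every token found in the dictionary; leave the rest as "-"."""
    return [label if token in dictionary else "-" for token in tokens]


person_dictionary = {"John", "Michael"}
tokens = ["Donald", "John", "Trump", "is", "president", "of", "the", "US"]
matched = dictionary_match(tokens, person_dictionary, "S-PER")
for token, tag in zip(tokens, matched):
    print(f"{token}\t{tag}")
```

"Donald" and "Trump" are missed because they are not in the dictionary, and "John" is tagged as a complete single-token person although it is only the middle of a longer name.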

 6. Papers related to this topic
  Partially annotated corpora
  • Better Modeling of Incomplete Annotations for Named Entity Recognition
  Distantly Supervised NER
  • Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
  • Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
  • Learning Named Entity Tagger using Domain-Specific Dictionary
  • Distant supervision for relation extraction without labeled data
  • Distant supervision for relation extraction via piecewise convolutional neural networks

 7. Better Modeling of Incomplete Annotations for Named Entity Recognition
  Proposes a model that performs NER from incompletely annotated data

  Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li
  NAACL 2019
  Background
  • When assuming incomplete annotation, removing labels at the word level is unrealistic (A.1)
  • Nobody explicitly annotates O tags (A.1, A.2)
  The authors argue that the label loss that can occur in real-world use looks like A.3
  Reimplementation: https://github.com/kajyuuen/Incomplete-NER-Methods

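 8. Better Modeling of Incomplete Annotations for Named Entity Recognition
  Proposed method
  An extension of the CRF that takes the possible label combinations into account
  [Equations not captured in the transcript: ordinary CRF, proposed CRF. Labels on the figure: q, q = 1, CRF-PA, proposed method]
  Estimating the probability distribution q
  Hard: assign probability 1 to the most likely label sequence
  Soft: assign a probability to every possible label sequence
  These distributions are estimated by k-fold cross-validation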
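The equations for this slide did not survive in the transcript, so the following is only a sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the loss is the negative log of the total probability of all label sequences compatible with the partial annotation, with each sequence weighted by q. The toy label set, the scores in `emit` and `trans`, and all function names are hypothetical. Sequences are enumerated by brute force for clarity; a practical implementation would use a constrained forward algorithm instead.

```python
import itertools
import math

LABELS = ["O", "B", "E"]  # toy label set, assumed for illustration


def sequence_score(y, emit, trans):
    """Unnormalized linear-chain CRF score: emissions plus transitions."""
    score = sum(emit[t][label] for t, label in enumerate(y))
    score += sum(trans[(a, b)] for a, b in zip(y, y[1:]))
    return score


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def partial_crf_nll(emit, trans, partial, q=None):
    """-log sum over compatible y of q(y) * p(y | x); q=None means weight 1."""
    all_seqs = list(itertools.product(LABELS, repeat=len(emit)))
    log_z = log_sum_exp([sequence_score(y, emit, trans) for y in all_seqs])
    terms = []
    for y in all_seqs:
        # Skip sequences that contradict an annotated position.
        if any(p is not None and p != label for p, label in zip(partial, y)):
            continue
        weight = 1.0 if q is None else q.get(y, 0.0)
        if weight > 0.0:
            terms.append(math.log(weight) + sequence_score(y, emit, trans) - log_z)
    if not terms:
        raise ValueError("no compatible label sequence has positive weight")
    return -log_sum_exp(terms)


emit = [
    {"O": 0.2, "B": 1.5, "E": -0.5},
    {"O": 0.1, "B": -0.3, "E": 1.2},
    {"O": 1.0, "B": 0.0, "E": -1.0},
]
trans = {(a, b): (1.0 if (a, b) == ("B", "E") else 0.0) for a in LABELS for b in LABELS}

full_nll = partial_crf_nll(emit, trans, ["B", "E", "O"])        # ordinary CRF loss
partial_nll = partial_crf_nll(emit, trans, [None, "E", None])   # weight 1 everywhere
hard_nll = partial_crf_nll(emit, trans, [None, "E", None], q={("B", "E", "O"): 1.0})
print(full_nll, partial_nll, hard_nll)
```

With every position annotated the loss reduces to the ordinary CRF negative log-likelihood, and a "Hard" q that puts probability 1 on a single completion gives the same value as training on that completion directly.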

 9. Better Modeling of Incomplete Annotations for Named Entity Recognition
  Results
  Setting: 50% of the labels are missing, and all O tags are removed
  • Performance is reasonably good even compared with training on complete annotations
  • A decisive win over Simple, the baseline that treats every unlabeled position as O
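The experimental setting above can be simulated from a fully annotated BIOES sequence. This is a minimal sketch, assuming labels are dropped per entity rather than per word and that a fixed seed is used to make the dropping repeatable; the exact procedure used in the paper is not given in the slides. The function names and the `keep_ratio` parameter are hypothetical. The `simple_baseline` function reflects the slide's description of Simple.

```python
import random
from typing import List, Optional, Tuple


def extract_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, inclusive end) of every entity in a BIOES sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("S-"):
            spans.append((i, i))
        elif tag.startswith("B-"):
            start = i
        elif tag.startswith("E-") and start is not None:
            spans.append((start, i))
            start = None
    return spans


def drop_labels(tags: List[str], keep_ratio: float = 0.5, seed: int = 0) -> List[Optional[str]]:
    """Remove all O tags and keep only a fraction of the entities (rounded down)."""
    rng = random.Random(seed)  # fixed seed so the same labels are dropped each run
    spans = extract_spans(tags)
    kept = rng.sample(spans, int(len(spans) * keep_ratio))
    partial: List[Optional[str]] = [None] * len(tags)
    for start, end in kept:
        partial[start:end + 1] = tags[start:end + 1]
    return partial


def simple_baseline(partial: List[Optional[str]]) -> List[str]:
    """The Simple baseline: treat every unlabeled position as O."""
    return [tag if tag is not None else "O" for tag in partial]


gold = ["B-PER", "I-PER", "E-PER", "O", "O", "O", "O", "S-LOC"]
partial = drop_labels(gold, keep_ratio=0.5, seed=0)
print(partial)
print(simple_baseline(partial))
```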

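 10. Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
  Performs NER with PU learning, using only a dictionary and raw text

  Minlong Peng, Xiaoyu Xing, Qi Zhang, Jinlan Fu, Xuanjing Huang
  ACL 2019
  Proposed method
  • Annotate the raw corpus from the dictionary using longest matching
  • Train with the labeled data as Positive and everything else as Unlabeled
  • Not using a labeling scheme such as BIO or BIOES reduces the annotation errors caused by the dictionary
  • Build one PU classifier per class and adopt the class with the highest predicted probability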
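The first and last steps above can be sketched as follows. This is an illustration under assumptions, not the authors' code: `longest_match` marks each token as Positive (1) or Unlabeled (0) without any BIO or BIOES prefix, and `pick_class` chooses among hypothetical per-class probabilities. The `max_len` limit and the 0.5 threshold below which a token is treated as a non-entity are assumptions that do not come from the slides.

```python
from typing import Dict, List, Set, Tuple


def longest_match(tokens: List[str], dictionary: Set[Tuple[str, ...]], max_len: int = 4) -> List[int]:
    """Greedy left-to-right longest matching: 1 = Positive, 0 = Unlabeled."""
    flags = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in dictionary:
                matched = length
                break
        if matched:
            for j in range(i, i + matched):
                flags[j] = 1
            i += matched
        else:
            i += 1
    return flags


def pick_class(probs: Dict[str, float], threshold: float = 0.5) -> str:
    """Adopt the class whose PU classifier gives the highest probability."""
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "O"


tokens = ["Donald", "John", "Trump", "visited", "New", "York"]
dictionary = {("John",), ("Donald", "John", "Trump"), ("New", "York")}
flags = longest_match(tokens, dictionary)
print(flags)
print(pick_class({"PER": 0.9, "LOC": 0.2}))
```

Because the longest entry wins, "Donald John Trump" is matched as a whole instead of only the shorter entry "John".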

 11. Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
  Results
  [Results table not captured in the transcript; column groups: annotated corpus, dictionary and raw corpus]
  Performance improved substantially over dictionary matching

 12. Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
  NER that combines distant supervision with reinforcement learning

  Yaosheng Yang, Wenliang Chen, Zhenghua Li, Zhengqiu He, Min Zhang
  COLING 2018
  Proposed method
  • Use reinforcement learning to remove noisy labeling
  • Use CRF-PA so that training is possible even on incomplete sequences
  Background
  • Training data created by dictionary matching (distant supervision) suffers from incomplete and noisy labeling

 13. Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
  Results
  Higher extraction performance than dictionary matching or LSTM-CRF-PA alone
  [Results table not captured in the transcript. Its legend distinguishes a smaller annotated corpus from training data created by distant supervision; the two symbols were lost in extraction]

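 14. Survey impressions
  • Models that learn from partially annotated corpora appeared frequently
  • They are used or cited in a large share of Distantly Supervised NER research
  • The "what do we do about datasets?" problem
  • Building one's own dataset is very common
  • Some work randomly drops labels from CoNLL2003
  • Either way, dropping labels in a reproducible way is hard
  • The area does feel like it is gaining popularity
  • Many papers have complex task settings such as Nested NER and DS NER
  • Many people probably want to do NER with a dictionary alone
  • Is the plain sequence-labeling task reaching its limits?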