Named Entity Recognition When Complete Annotations Are Unavailable

Koga Kobayashi
September 18, 2019

Transcript

  1. Named Entity Recognition When Complete Annotations Are Unavailable

    Koga Kobayashi @kajyuuen

  2. Self-introduction

    Koga Kobayashi
    ID: @kajyuuen
    University of Tsukuba Graduate School, first-year master's student
    I write a blog about NER

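  3. Data that can be used to train NER
    Annotated corpus
    • The dataset type most commonly used for NER tasks
    • Every word is assigned its corresponding label
    Drawbacks
    • Must be rebuilt for each domain
    • Annotation is expensive
    Example (token / label):
      Donald  John   Trump  is  president  of  the  US     the
      B-PER   I-PER  E-PER  O   O          O   O    S-LOC  O
    Goal: reduce, or eliminate, the cost of building an annotated corpus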
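The BIOES labels in the annotated-corpus example can be produced mechanically from entity spans. The following is a minimal sketch, not taken from the slides: the helper name `spans_to_bioes`, the span format `(start, end_inclusive, type)`, and the natural word order of the example sentence are all assumptions made for illustration.

```python
def spans_to_bioes(num_tokens, spans):
    """Convert entity spans (start, end_inclusive, type) into BIOES tags.

    Hypothetical helper for illustration; tokens outside every span get "O".
    """
    tags = ["O"] * num_tokens
    for start, end, etype in spans:
        if start == end:
            # Single-token entity
            tags[start] = f"S-{etype}"
        else:
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"
            tags[end] = f"E-{etype}"
    return tags


# Assumed word order for the slide's example sentence.
tokens = ["Donald", "John", "Trump", "is", "the", "president", "of", "the", "US"]
spans = [(0, 2, "PER"), (8, 8, "LOC")]
tags = spans_to_bioes(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Every token receives exactly one tag, which is what makes this kind of corpus expensive: the annotator has to commit to a decision for every word.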

  4. Data that can be used to train NER
    Partially annotated corpus
    • A corpus in which only some of the words are annotated
    • Annotators can skip words they are not confident about
    Drawbacks
    • Training requires some extra work (a standard CRF cannot be trained on it)

    Example of a partially annotated corpus (token / label, "-" = no annotation):
      Donald  John   Trump  is  president  of  the  US  the
      -       B-PER  E-PER  -   -          O   -    -   O

    - Eraldo R. Fernandes and Ulf Brefeld. 2011. Learning from partially annotated sequences. In Proceedings of ECML-KDD.

    - Andrew Carlson, Scott Gaffney, and Flavian Vasile. 2009. Learning a named entity tagger from gazetteers with the partial perceptron.
    In Proceedings of AAAI Spring Symposium: Learning by Reading and Learning to Read.

    - Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, and Linlin Li. 2019. Better Modeling of Incomplete Annotations for Named Entity
    Recognition. In Proceedings of NAACL.
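One way to see why a standard CRF cannot be trained directly on such data is to count how many full label sequences agree with a partial annotation. The sketch below is an illustration only: the label inventory `LABELS` (BIOES over PER and LOC) and the helper names are assumptions, and the tags are copied in the order they appear in the example above.

```python
import math

# Assumed label inventory: "O" plus BIOES prefixes for two entity types.
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ("PER", "LOC") for prefix in "BIES"]


def allowed_label_sets(partial_tags, labels, unknown="-"):
    """Map each position to the set of labels it may take.

    An annotated position allows only its label; an unannotated
    position ("-") allows every label.
    """
    return [set(labels) if tag == unknown else {tag} for tag in partial_tags]


def count_compatible_sequences(allowed):
    """Number of full label sequences consistent with the partial annotation."""
    return math.prod(len(options) for options in allowed)


partial_tags = ["-", "B-PER", "E-PER", "-", "-", "O", "-", "-", "O"]
allowed = allowed_label_sets(partial_tags, LABELS)
print(count_compatible_sequences(allowed))
```

With five unannotated positions and nine labels there is no single gold sequence to maximize, only a large set of candidates, so the training objective has to be defined over that set.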

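  5. Data that can be used to train NER

    Dictionaries
    • Dictionaries in the ordinary sense, although the definitions and meanings of the entries are rarely used
    • They often already exist in many fields
    • Examples: person-name dictionaries, pharmaceutical term dictionaries, substance and materials databases
    NER by dictionary matching has problems such as:
    • Words that are not in the dictionary cannot be detected
    • Named entities can be assigned the wrong label
    Example of dictionary matching (Person Dictionary: John, Michael):
      Donald  John   Trump  is  president  of  the  US  the
      -       S-PER  -      -   -          -   -    -   -
    Distantly supervised methods, which use a dictionary plus a raw corpus, have emerged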
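The dictionary-matching example can be reproduced with a few lines of code. This is a minimal sketch under assumptions: the slide's person dictionary is read as two single-token entries ("John" and "Michael"), and `dictionary_match` is a hypothetical helper, not a function from any library.

```python
def dictionary_match(tokens, dictionary, etype="PER", unknown="-"):
    """Tag each token found in the dictionary as a single-token entity.

    Hypothetical helper: tokens outside the dictionary stay unannotated.
    """
    return [f"S-{etype}" if token in dictionary else unknown for token in tokens]


tokens = ["Donald", "John", "Trump", "is", "the", "president", "of", "the", "US"]
person_dictionary = {"John", "Michael"}  # assumed to be two separate entries

tags = dictionary_match(tokens, person_dictionary)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Both problems listed on the slide show up here: "Donald" and "Trump" are missed because they are not in the dictionary, and "John" is labeled as a complete single-token person even though it sits inside a longer name.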

  6. Papers related to this topic
    Partially annotated corpora
    • Better Modeling of Incomplete Annotations for Named Entity
    Recognition
    Distantly Supervised NER
    • Distantly Supervised Named Entity Recognition using Positive-Unlabeled
    Learning
    • Distantly Supervised NER with Partial Annotation Learning and
    Reinforcement Learning
    • Learning Named Entity Tagger using Domain-Specific Dictionary
    • Distant supervision for relation extraction without labeled data
    • Distant supervision for relation extraction via piecewise convolutional
    neural networks

  7. Better Modeling of Incomplete Annotations for Named Entity Recognition

    Proposes a model that performs NER from data with incomplete annotations

    Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li
    NAACL 2019
    Background
    • When assuming incomplete annotations, removing labels at the word level is unrealistic (A.1)
    • In practice, O tags are never annotated explicitly (A.1, A.2)
    The authors argue that label loss in real-world use looks like A.3
    Reimplementation: https://github.com/kajyuuen/Incomplete-NER-Methods

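  8. Better Modeling of Incomplete Annotations for Named Entity Recognition

    Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li
    NAACL 2019
    Proposed method
    An extension of the CRF that takes into account the possible combinations of labels
    Equations on the slide: standard CRF; CRF of the proposed method
    Equation labels: q = 1 (CRF-PA); proposed method
    Estimating the probability distribution q
    Hard: assign probability 1 to the most likely label sequence
    Soft: assign a probability to every possible label sequence
    These distributions are estimated with k-fold cross-validation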
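The q = 1 case (CRF-PA) sums the probability of every label sequence that agrees with the partial annotation. The following is a minimal pure-Python sketch of that quantity using a constrained forward algorithm. It is an illustration under assumptions, not the authors' implementation: the score tables, the function names, and the omission of start/end transitions are all choices made here, and the q-weighted version from the slide is not covered.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_path_sum(emissions, transitions, allowed):
    """log of the summed exp-scores of all sequences y with y[t] in allowed[t].

    emissions[t][y] scores label y at position t; transitions[p][y] scores
    moving from label p to label y. Every allowed[t] must be non-empty.
    """
    num_labels = len(transitions)
    neg_inf = float("-inf")
    alpha = [emissions[0][y] if y in allowed[0] else neg_inf for y in range(num_labels)]
    for t in range(1, len(emissions)):
        new_alpha = []
        for y in range(num_labels):
            if y not in allowed[t]:
                new_alpha.append(neg_inf)  # label forbidden by the annotation
                continue
            scores = [alpha[p] + transitions[p][y]
                      for p in range(num_labels) if alpha[p] != neg_inf]
            new_alpha.append(emissions[t][y] + logsumexp(scores))
        alpha = new_alpha
    return logsumexp([a for a in alpha if a != neg_inf])


def partial_log_likelihood(emissions, transitions, allowed):
    """log P(any sequence compatible with the partial annotation | x)."""
    num_labels = len(transitions)
    everything = [set(range(num_labels))] * len(emissions)
    return (log_path_sum(emissions, transitions, allowed)
            - log_path_sum(emissions, transitions, everything))


# Toy scores for 3 positions and 3 labels (made-up numbers).
emissions = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.1, 0.2, 0.9]]
transitions = [[0.1, -0.3, 0.2], [0.0, 0.4, -0.2], [-0.5, 0.3, 0.1]]
allowed = [{0, 1, 2}, {1}, {0, 2}]  # position 0 unannotated, position 1 fixed

ll = partial_log_likelihood(emissions, transitions, allowed)
print(ll)
```

When every position is fully annotated, each set in `allowed` has one element and the quantity reduces to the ordinary CRF log-likelihood of that single sequence.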

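  9. Better Modeling of Incomplete Annotations for Named Entity Recognition

    Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li
    NAACL 2019
    Results
    Setup: 50% of the labels removed, plus all O tags removed
    • Performs reasonably well even compared with training on complete annotations
    • Beats by a wide margin the Simple baseline, which treats every unlabeled position as O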
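The experimental setup (half of the labels removed, all O tags removed) can be simulated from a fully annotated sequence. The sketch below is a guess at one reasonable procedure, not the paper's script: removing labels per entity rather than per token, the `keep_ratio` parameter, and the fixed `seed` are assumptions introduced here.

```python
import random


def drop_labels(tags, keep_ratio=0.5, seed=0, unknown="-"):
    """Keep a random subset of entities and mark everything else as unannotated.

    Assumes BIOES tags. All O tags are removed, and whole entities (not
    individual tokens) are dropped; both choices are assumptions of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the same labels are dropped every run

    # Collect (start, end_inclusive) spans from the BIOES prefixes.
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix = tag[0]
        if prefix in ("B", "S"):
            start = i
        if prefix in ("E", "S") and start is not None:
            spans.append((start, i))
            start = None

    num_keep = round(len(spans) * keep_ratio)
    kept = set(rng.sample(range(len(spans)), num_keep))

    masked = [unknown] * len(tags)
    for index, (span_start, span_end) in enumerate(spans):
        if index in kept:
            masked[span_start:span_end + 1] = tags[span_start:span_end + 1]
    return masked


tags = ["B-PER", "I-PER", "E-PER", "O", "O", "O", "O", "O", "S-LOC"]
masked = drop_labels(tags, keep_ratio=0.5, seed=0)
print(masked)
```

Fixing the seed is what makes the label removal repeatable across runs, which matters when comparing methods on artificially degraded data.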

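  10. Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning

    Minlong Peng, Xiaoyu Xing, Qi Zhang, Jinlan Fu, Xuanjing Huang
    ACL 2019
    Performs NER with only a dictionary and raw text by using PU learning
    Proposed method
    • Annotate the raw corpus from the dictionary using longest matching
    • Train with the labeled data as Positive and everything else as Unlabeled
    • Not using a labeling scheme such as BIO or BIOES reduces annotation errors caused by the dictionary
    • Build one PU classifier per class and adopt the class with the highest predicted probability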
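The first three bullets (longest matching, Positive versus Unlabeled, no BIO/BIOES scheme) can be sketched as a greedy left-to-right matcher that emits one boolean per token. This is an illustration under assumptions: the function name, the greedy forward strategy, and the toy dictionary are not taken from the paper, and the PU classifier itself is not shown.

```python
def longest_match_positive(tokens, dictionary):
    """Flag tokens covered by a dictionary entry, preferring the longest match.

    Returns True for Positive tokens and False for Unlabeled ones. No B/I/E/S
    prefixes are produced, only membership in a match.
    """
    entries = {tuple(entry.split()) for entry in dictionary}
    flags = [False] * len(tokens)
    if not entries:
        return flags
    max_len = max(len(entry) for entry in entries)

    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in entries:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                flags[j] = True
            i += matched
        else:
            i += 1
    return flags


tokens = ["Donald", "John", "Trump", "is", "the", "president", "of", "the", "US"]
person_dictionary = {"John", "Donald Trump", "Donald John Trump"}  # toy entries

flags = longest_match_positive(tokens, person_dictionary)
print(flags)
```

Running one such matcher per entity class gives each class its own Positive/Unlabeled split, which is the input a per-class PU classifier would need.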

  11. Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning

    Minlong Peng, Xiaoyu Xing, Qi Zhang, Jinlan Fu, Xuanjing Huang
    ACL 2019
    Results
    Settings compared in the results table: annotated corpus; dictionary and raw corpus
    Greatly improves performance over dictionary matching

  12. Distantly Supervised NER with
    Partial Annotation Learning and Reinforcement Learning

    Yaosheng Yang, Wenliang Chen, Zhenghua Li, Zhengqiu He, Min Zhang
    COLING 2018
    NER that combines distant supervision with reinforcement learning
    Proposed method
    • Use reinforcement learning to remove noisy labels
    • Use CRF-PA so that training is possible even on incomplete sequences
    Background
    • Training data created by dictionary matching (distant supervision) suffers from incomplete and noisy labeling

  13. Distantly Supervised NER with
    Partial Annotation Learning and Reinforcement Learning

    Yaosheng Yang, Wenliang Chen, Zhenghua Li, Zhengqiu He, Min Zhang
    COLING 2018
    Results
    Achieves higher extraction performance than dictionary matching or LSTM-CRF-PA alone
    Data in the results table: a smallish annotated corpus, and training data created by distant supervision

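  14. Survey impressions
    • Models that learn from partially annotated corpora came up frequently
    • In distantly supervised NER research they are used or cited very often
    • The "what do we do about datasets?" problem
    • Building your own dataset is very common
    • Randomly dropping labels from CoNLL2003 is also seen
    • Either way, dropping labels in a reproducible way is difficult
    • The area does seem to be gaining popularity
    • Many papers use complex task settings such as Nested NER or DS NER
    • There seem to be many people who want to do NER with a dictionary alone
    • Is ordinary sequence labeling reaching its limits?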