
Japanese Zero Anaphora Resolution Can Benefit from Parallel Texts Through Neural Transfer Learning

Slides for a paper accepted to Findings of EMNLP 2021

Masato Umakoshi

November 12, 2021

Transcript

  1. Japanese Zero Anaphora Resolution Can Benefit from Parallel Texts Through Neural Transfer Learning
     Masato Umakoshi, Yugo Murawaki and Sadao Kurohashi, Kyoto University
  2. Zero Anaphora Resolution (ZAR)
     • Detect omitted expressions, i.e., zero pronouns (ZPs)
     • Identify their antecedents
     • SOTA: BERT-based multi-task learning [Ueda+, 2020]
     Example (a toy rendition as task input/output follows below):
     妻が 息子に いくつか おもちゃを 買ってあげた。 ϕᵢ=NOM 赤い 車を 特に 気に入っている。
     wife=NOM son=DAT several toy=ACC buy.GER=give.PST / ϕᵢ=NOM red car=ACC especially like.GER=be.NPST
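To make the two subtasks concrete, here is a toy rendition of the slide's example. The field names and structure are hypothetical illustrations, not the paper's actual data format:

```python
# Toy rendition of the ZAR task on the slide's example.
# Field names are hypothetical, not the paper's data format.
example = {
    "text": "妻が 息子に いくつか おもちゃを 買ってあげた。ϕ 赤い 車を 特に 気に入っている。",
    # Step 1: detect that the NOM argument of 気に入っている is omitted (the ZP).
    "zero_pronoun": {"predicate": "気に入っている", "case": "NOM"},
    # Step 2: identify its antecedent, here 息子 ("son").
    "antecedent": "息子",
}
```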
  3. Parallel Texts as Implicit Annotations for ZAR
     • Some expressions omitted in Japanese are obligatory in English
     妻が 息子に いくつか おもちゃを 買ってあげた。 ϕᵢ=NOM 赤い 車を 特に 気に入っている。
     wife=NOM son=DAT several toy=ACC buy.GER=give.PST / ϕᵢ=NOM red car=ACC especially like.GER=be.NPST
     My wife got my son several toys. He especially likes the red car.
     Here the omitted ϕᵢ surfaces as the obligatory English pronoun "He".
  4. Existing Approaches
     • A line of research applied rule-based methods to annotate translation pairs with ZAR annotations (Nakaiwa, 1999; Furukawa+, 2017)
     • These rely on word alignment, dependency parsing, and coreference resolution
     • However, they hit a limit due to error propagation
     [Figure: word alignment, dependency (nsubj), and coreference links between the Japanese example and "My wife got my son several toys. He especially likes the red car."]
  5. Idea: Neural Transfer Learning
     • We propose neural transfer learning from MT
     • By generating the English sentence, the MT model is forced to implicitly recover ZPs
     • To bridge the discrepancy between MT (encoder-decoder) and ZAR (encoder-only), we adopt a BERT-based method
     • To mitigate catastrophic forgetting, we test incorporating masked language modeling (MLM) during MT training (see the sketch below)
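A minimal sketch of this MT stage, assuming a pretrained Japanese BERT encoder feeding a Transformer decoder. The class name `MtWithMlm`, the checkpoint, and the 1:1 loss weighting are illustrative assumptions, not the paper's exact setup:

```python
# Sketch of MT training with an auxiliary MLM loss (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class MtWithMlm(nn.Module):  # hypothetical name
    def __init__(self, bert_name="cl-tohoku/bert-base-japanese", tgt_vocab=32000):
        super().__init__()
        # BERT encoder: later reused for ZAR fine-tuning (the transfer target).
        self.encoder = AutoModel.from_pretrained(bert_name)
        d = self.encoder.config.hidden_size
        layer = nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.tgt_embed = nn.Embedding(tgt_vocab, d)
        self.mt_head = nn.Linear(d, tgt_vocab)                        # English tokens
        self.mlm_head = nn.Linear(d, self.encoder.config.vocab_size)  # masked Japanese tokens

    def forward(self, src_ids, src_mask, tgt_in, tgt_out, mlm_labels):
        # Encode the (partially masked) Japanese source.
        h = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        # MT loss: generating the English side forces the encoder to recover ZPs.
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt_in.size(1)).to(tgt_in.device)
        dec = self.decoder(self.tgt_embed(tgt_in), h, tgt_mask=causal)
        mt_loss = F.cross_entropy(self.mt_head(dec).transpose(1, 2), tgt_out,
                                  ignore_index=-100)
        # MLM loss on the source side mitigates catastrophic forgetting.
        mlm_loss = F.cross_entropy(self.mlm_head(h).transpose(1, 2), mlm_labels,
                                   ignore_index=-100)
        return mt_loss + mlm_loss  # equal weighting is an assumption
```

The point of the design is that only the encoder is carried over to ZAR, while the MLM term keeps it close to its pretrained distribution during MT training.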
  6. Baseline Method
     • Argument selection, which is analogous to head selection in dependency parsing [Ueda+, 2020] (a sketch follows below)
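A minimal sketch of argument selection, under the assumption that each case slot of a predicate selects one token from the encoded sequence, mirroring head selection in dependency parsing. The class name, the bilinear scorer, and the case inventory are illustrative, not necessarily Ueda et al.'s exact architecture:

```python
# Sketch of argument selection over BERT encoder states (assumptions noted above).
import torch
import torch.nn as nn

class ArgumentSelector(nn.Module):  # hypothetical name
    def __init__(self, hidden=768, cases=("NOM", "ACC", "DAT")):
        super().__init__()
        # One bilinear scorer per case slot, scoring (predicate, candidate) pairs.
        self.scorers = nn.ModuleDict({c: nn.Bilinear(hidden, hidden, 1) for c in cases})

    def forward(self, enc, pred_idx):
        """enc: (L, H) encoder states for one sentence; pred_idx: predicate position.
        Returns, per case, a distribution over all L tokens as the argument;
        special [NULL]/exophora candidates are assumed to be part of the input."""
        L, H = enc.shape
        pred = enc[pred_idx].expand(L, H)  # pair the predicate with every candidate
        return {case: scorer(pred, enc).squeeze(-1).softmax(dim=-1)
                for case, scorer in self.scorers.items()}
```

Training then reduces to cross-entropy against the gold antecedent position, just as head-selection parsers train a softmax over candidate heads.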
  7. Results
                             Web    News
     Ueda et al. (2020)      70.3   56.7
     + MT                    70.5   57.7
     + MT w/ MLM             71.9   58.3
     • MT contributes to ZAR
     • Incorporating MLM leads to further gains
  8. Example
     The ZP was translated correctly ("the school") and its antecedent was successfully identified.
     Source text:
     第七十四回 全国 高校 ラグビー フットボール 大会 準決勝の 五日、近鉄花園ラグビー場の スタンドでは 大阪 朝鮮 高級学校 ラグビー 部員 二十五人が 青い ウインドブレーカー 姿で 観戦した。 ϕᵢ=NOM 同 ラグビー 場から わずか 一・五キロの 至近距離に あり ながら、..
     (74 ORD CLF national high.school rugby football tournament semifinal=GEN 5day Kintetsu Hanazono rugby field=GEN stands=LOC=TOP Osaka Korea high.school rugby club.member 25 CLF=NOM blue windbreaker appearance=INS watch=do.PST / ϕᵢ=NOM same rugby field=ABL only 1.5km=GEN distance=DAT be.GER but ..)
     Translation by the model:
     "Twenty-two students of Osaka Korean pro-Pyongyang Korean high school students watched the final at Kintetsu National High School's Hanazono Stadium on Sunday. Although the school is only about five kilometers away from the stadium, .."
     [Figure legend: OURS / BASELINE / GOLD antecedent annotations]
  9. Conclusion
     • We propose neural transfer learning from MT
     • Our experiments show that ZAR can benefit from parallel texts, thanks to the flexibility of neural networks
     • Our framework revives the old idea (Nakaiwa, 1999!)
     • In addition, we find that further gains can be obtained by incorporating MLM