Split-and-Rephrase-ja.pdf

Split and Rephrase
S. Narayan, C. Gardent, S. B. Cohen, and A. Shimorina, EMNLP, pp. 606–616, 2017.
Nagaoka University of Technology, Natural Language Processing Laboratory
Takumi Maruyama

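Introduction
➢ Split-and-Rephrase
• Split a complex sentence into shorter sentences (splitting and rephrasing).
• Input: "The appointed time came, but she did not show up, so I called her."
• Output:
  1. "She has not come even though the appointed time has arrived."
  2. "I called her."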

Introduction
➢ Split-and-Rephrase
• Word deletion and lexical simplification are not performed.

Introduction
➢ Contributions of the paper
• Proposes the "Split-and-Rephrase" task
• Builds a sentence-splitting corpus: https://github.com/shashiongithub/Split-and-Rephrase
• Evaluates "Split-and-Rephrase" with five models

Building the WEBSPLIT Benchmark

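Building the WEBSPLIT Benchmark
➢ Uses the WEBNLG dataset (Gardent et al., 2017)
• Each entry consists of RDF triples and one or more corresponding texts.
➢ RDF (Resource Description Framework) triple
• A framework for describing resources on the Web
• Has three elements: (subject | property | object)
• Example: "The sky is blue" is written as (sky | has color | blue), with subject = sky, property = has color, object = blue.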
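The (subject | property | object) structure can be written down as a small data type. The following is only an illustrative sketch: the class name `Triple` and the field names `prop` and `obj` are assumptions made for this example, not part of WEBNLG or any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF triple in (subject | property | object) form (hypothetical type)."""
    subject: str
    prop: str  # the property, e.g. "has color"
    obj: str   # the object, e.g. "blue"

    def __str__(self) -> str:
        return f"({self.subject} | {self.prop} | {self.obj})"


# "The sky is blue" from the slide.
sky = Triple(subject="sky", prop="has color", obj="blue")

# A meaning representation is modelled here as a set of triples.
meaning = frozenset({sky})

print(sky)  # (sky | has color | blue)
```

Using a frozen set for the meaning representation makes set operations such as union and subset tests available directly.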

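Building the WEBSPLIT Benchmark
➢ Main idea
• Use RDF triples as the meaning representation (M) of a text (T).
• Texts are matched through their meaning representations, so that "[…] and […]" or "[…] and […] + […]" become sentence-split pairs.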

Building the WEBSPLIT Benchmark
➢ Construction procedure
1. Split the WEBNLG dataset into sentences.
  − Uses Stanford CoreNLP
  − Corrected manually
2. Align texts based on their RDF triples and number of sentences.
3. Reorder $T_1, \dots, T_n$ based on the RDF triples.
• $C$: complex text; $M_C$: meaning representation of $C$
• $T_1, \dots, T_n$: texts; $M_1, \dots, M_n$: their meaning representations
• $M_1, \dots, M_n$ are independent of one another
• $M_C = M_1 \cup \dots \cup M_n$
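The matching condition $M_C = M_1 \cup \dots \cup M_n$ can be illustrated with a brute-force search. This is a rough sketch under several assumptions: the toy entries and the function name `find_split_pairs` are invented for illustration, "independent" is read as "pairwise disjoint", and the sentence-count criterion of step 2 and the reordering of step 3 are left out.

```python
from itertools import combinations

# Hypothetical toy entries: (meaning representation, text).
# A meaning representation is a frozenset of (subject, property, object) triples.
ENTRIES = [
    (frozenset({("John", "birthPlace", "London"), ("John", "occupation", "Pilot")}),
     "John, who was born in London, is a pilot."),
    (frozenset({("John", "birthPlace", "London")}), "John was born in London."),
    (frozenset({("John", "occupation", "Pilot")}), "John is a pilot."),
]


def find_split_pairs(entries, max_parts=3):
    """Return (C, [T_1, ..., T_n]) pairs whose meanings partition M_C."""
    pairs = []
    for m_c, c in entries:
        # Only texts whose triples are a proper subset of M_C can be parts.
        parts = [(m, t) for m, t in entries if m < m_c]
        for n in range(2, max_parts + 1):
            for combo in combinations(parts, n):
                meanings = [m for m, _ in combo]
                union = frozenset().union(*meanings)
                # Pairwise disjoint: no triple is counted twice in the union.
                disjoint = sum(len(m) for m in meanings) == len(union)
                if disjoint and union == m_c:
                    pairs.append((c, [t for _, t in combo]))
    return pairs


for complex_text, split_texts in find_split_pairs(ENTRIES):
    print(complex_text, "->", split_texts)
```

On the toy data this yields a single pair: the complex sentence together with the two simple sentences whose triples add up to its meaning representation.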

Building the WEBSPLIT Benchmark
➢ WEBSPLIT Benchmark
• Dataset size: 1,100,166 pairs
• Vocabulary size: 3,311 words

Sentence Splitting Models

Sentence Splitting Models
➢ Problem formulation
• $C$: complex sentence
• $M_C$: meaning representation of $C$
• $T_1, \dots, T_n$: split texts
• $M_1, \dots, M_n$: meaning representations of $T_1, \dots, T_n$
• $\theta$: parameters of the sentence splitting model
• Models: HYBRID SIMPL (Narayan and Gardent, 2014), SEQ2SEQ (Luong et al., 2015), MULTISEQ2SEQ (Zoph and Knight, 2016), SPLIT-MULTISEQ2SEQ, SPLIT-SEQ2SEQ

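Sentence Splitting Models
➢ HYBRID SIMPL (Narayan and Gardent, 2014)
• Models $P(T \mid C; \theta)$
• Simplification model based on phrase-based SMT (PB-SMT)
• Sentence splitting that uses discourse structure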

Sentence Splitting Models
➢ SEQ2SEQ (Luong et al., 2015)
• Models $P(T \mid C; \theta)$
• Encoder-decoder with local attention and the input-feeding approach
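As a reminder of what local attention does, here is a minimal sketch in plain Python. It follows the general idea in Luong et al. (2015) of attending only to a window around an aligned source position and weighting that window with a Gaussian. The function name, the dot-product scoring, and passing the aligned position in as an argument (rather than predicting it) are simplifying assumptions, and this is not the configuration used in the paper's experiments.

```python
import math


def local_attention(query, keys, position, window=2):
    """Attend only to source vectors within `window` of `position`.

    query: list of floats (decoder state).
    keys: list of equally sized float lists (one per source word).
    Returns (weights, context); weights cover only the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    lo = max(0, position - window)
    hi = min(len(keys), position + window + 1)
    sigma = window / 2.0  # Gaussian standard deviation, set to D / 2

    # Dot-product scores, restricted to the window.
    scores = [sum(q * k for q, k in zip(query, keys[s])) for s in range(lo, hi)]

    # Numerically stable softmax over the window.
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    align = [e / total for e in exps]

    # A Gaussian centred on `position` favours nearby source words.
    weights = [a * math.exp(-((s - position) ** 2) / (2 * sigma ** 2))
               for a, s in zip(align, range(lo, hi))]

    # Context vector: weighted sum of the source vectors in the window.
    context = [sum(w * keys[s][d] for w, s in zip(weights, range(lo, hi)))
               for d in range(len(query))]
    return weights, context


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
weights, context = local_attention([1.0, 0.0], keys, position=2, window=1)
print(len(weights))  # 3 source positions fall inside the window
```

Because of the Gaussian factor the weights no longer sum to one; words far from the aligned position are simply down-weighted.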

Sentence Splitting Models
➢ MULTISEQ2SEQ (Zoph and Knight, 2016)
• Models $P(T \mid C; M_C; \theta)$
• Multi-source encoder-decoder model
• Takes as input the complex sentence $C$ and its meaning representation $M_C$ (linearized depth-first)
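Depth-first linearization turns the RDF graph into a token sequence that an encoder can read. The slide gives no details of the output format, so the sketch below makes its own choices: the function name, the bracket tokens, and the choice of root are all assumptions for illustration.

```python
def linearize_depth_first(triples, root):
    """Linearize (subject, property, object) triples depth-first from `root`.

    Brackets mark where each property/object subtree starts and ends
    (a hypothetical format chosen for readability).
    """
    children = {}
    for subj, prop, obj in triples:
        children.setdefault(subj, []).append((prop, obj))

    tokens = []
    visited = set()

    def visit(node):
        tokens.append(node)
        if node in visited:  # do not expand a node twice (guards against cycles)
            return
        visited.add(node)
        for prop, obj in children.get(node, []):
            tokens.extend(["(", prop])
            visit(obj)
            tokens.append(")")

    visit(root)
    return tokens


# Hypothetical toy triples; a list keeps the traversal order deterministic.
triples = [
    ("John", "birthPlace", "London"),
    ("London", "country", "England"),
    ("John", "occupation", "Pilot"),
]
print(" ".join(linearize_depth_first(triples, "John")))
# John ( birthPlace London ( country England ) ) ( occupation Pilot )
```

The resulting token sequence would serve as the second source sequence next to the complex sentence in a multi-source encoder.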

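Sentence Splitting Models
➢ SPLIT-MULTISEQ2SEQ
• Generates the split texts $T_1, \dots, T_n$ conditioned on the complex sentence $C$ and its meaning representation $M_C$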

Experimental Results

Experimental Results
➢ Example outputs

Summary
➢ Proposes the "Split-and-Rephrase" task
➢ Builds a sentence-splitting corpus: https://github.com/shashiongithub/Split-and-Rephrase
➢ Evaluates "Split-and-Rephrase" with five models

References
[1] C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini, “Creating Training Corpora for NLG Micro-Planning,” Proceedings of ACL, 2017.
[2] S. Narayan and C. Gardent, “Hybrid Simplification using Deep Semantics and Machine Translation,” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 435–445, 2014.
[3] M. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421, 2015.
[4] B. Zoph and K. Knight, “Multi-Source Neural Translation,” Proceedings of NAACL-HLT, pp. 30–34, 2016.
