Split-and-Rephrase-en.pdf

Split and Rephrase
S. Narayan, C. Gardent, S. B. Cohen, and A. Shimorina, EMNLP, pp. 606–616, 2017.
Takumi Maruyama, Nagaoka University of Technology

Introduction
➢ Split-and-Rephrase task
Complex sentence: John Clancy is a labour politician who leads Birmingham, where architect John Madin, who designed 103 Colmore Row, was born.
Split and rephrased:
1. Labour politician John Clancy is the leader of Birmingham.
2. John Madin was born in this city.
3. He was the architect of 103 Colmore Row.


Introduction
➢ Split-and-Rephrase task
Splitting a complex sentence into shorter sentences while preserving meaning

Introduction
➢ Contributions
• Proposing a new simplification task (Split-and-Rephrase)
• Creating and releasing a benchmark for Split-and-Rephrase systems: https://github.com/shashiongithub/Split-and-Rephrase
• Providing five models to gauge the difficulty of this task

Creating the WEBSPLIT Benchmark

Creating the WEBSPLIT Benchmark
➢ WEBNLG dataset (Gardent et al., 2017)
• Each item consists of a set of RDF triples and one or more texts
➢ RDF (Resource Description Framework) triple
• Framework for representing information on the Web
• Format: (subject | property | object)
• Example: “John was born in New York.”: (John | Birth place | New York), with subject John, property Birth place, and object New York
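The (subject | property | object) format can be handled with a few lines of Python. This is a minimal sketch: the `Triple` type and `parse_triple` helper are illustrative names, not part of the WEBNLG tooling.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF triple in the (subject | property | object) format."""
    subject: str
    prop: str
    obj: str


def parse_triple(text: str) -> Triple:
    """Parse a string such as '(John | Birth place | New York)'."""
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # Fields are separated by '|'; surrounding whitespace is not significant.
    parts = [part.strip() for part in inner.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return Triple(*parts)


triple = parse_triple("(John | Birth place |New York)")
print(triple.subject, "/", triple.prop, "/", triple.obj)
```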

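Creating the WEBSPLIT Benchmark
➢ Main idea
• Using RDF triples as the meaning representation of a text
• A complex sentence and a sequence of shorter texts whose RDF triples together match those of the complex sentence are likely to be a pair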

Creating the WEBSPLIT Benchmark
➢ 3 steps of creating the WEBSPLIT dataset
1. Sentence segmentation
2. Pairing using semantic information
3. Ordering the sequences of texts

Creating the WEBSPLIT Benchmark
➢ 3 steps of creating the WEBSPLIT dataset
1. Sentence segmentation
− Applied to the 13,308 verbalisations contained in the WEBNLG corpus
− Using the Stanford CoreNLP pipeline
2. Pairing using semantic information
3. Ordering the sequences of texts

Creating the WEBSPLIT Benchmark
➢ 3 steps of creating the WEBSPLIT dataset
1. Sentence segmentation
2. Pairing using semantic information
− $T_1, \dots, T_n$: sequence of texts
− $M_1, \dots, M_n$: meaning representations of $T_1, \dots, T_n$
− $C$: single complex sentence
− $M_C$: meaning representation of $C$
− $C$ is paired with $T_1, \dots, T_n$ when $M_C = M_1 \cup \dots \cup M_n$
3. Ordering the sequences of texts
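The pairing condition can be sketched as a search over combinations of candidate texts whose triple sets cover the meaning representation of the complex sentence. The function name, the `max_parts` limit, and the example triples and sentences are illustrative assumptions. The sketch also requires the triple sets to be disjoint, which is stricter than the union condition alone and is likewise an assumption.

```python
from itertools import combinations


def find_split_pairs(complex_mr, candidates, max_parts=3):
    """Return tuples of texts whose triple sets jointly equal complex_mr.

    candidates is a list of (text, frozenset_of_triples) items.
    """
    target = frozenset(complex_mr)
    # Only texts whose triples all belong to the target can take part.
    usable = [(text, mr) for text, mr in candidates if mr and mr <= target]
    results = []
    for n in range(2, max_parts + 1):
        for combo in combinations(usable, n):
            mrs = [mr for _, mr in combo]
            union = frozenset().union(*mrs)
            # Assumption: no triple is verbalised twice (the sets are disjoint).
            disjoint = sum(len(mr) for mr in mrs) == len(union)
            if union == target and disjoint:
                results.append(tuple(text for text, _ in combo))
    return results


# Illustrative triples for the Birmingham example.
t1 = ("Birmingham", "leaderName", "John Clancy")
t2 = ("John Madin", "birthPlace", "Birmingham")
t3 = ("103 Colmore Row", "architect", "John Madin")
candidates = [
    ("Labour politician John Clancy is the leader of Birmingham.", frozenset({t1})),
    ("John Madin was born in this city.", frozenset({t2})),
    ("He was the architect of 103 Colmore Row.", frozenset({t3})),
    ("John Madin, born in Birmingham, designed 103 Colmore Row.", frozenset({t2, t3})),
]
pairs = find_split_pairs({t1, t2, t3}, candidates)
for texts in pairs:
    print(len(texts), "texts:", " ".join(texts))
```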

Creating the WEBSPLIT Benchmark
➢ 3 steps of creating the WEBSPLIT dataset
1. Sentence segmentation
2. Pairing using semantic information
3. Ordering the sequences of texts
− The order corresponds to a left-to-right depth-first traversal of the RDF tree
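The ordering step can be sketched as follows, assuming the triples form a tree whose edges run from subject to object and that "left-to-right" means the order in which the triples are listed. The function names and the example triples and sentences are illustrative.

```python
def dfs_triple_order(triples):
    """Return the triples in depth-first, left-to-right order."""
    children = {}
    objects = set()
    for subj, prop, obj in triples:
        children.setdefault(subj, []).append((subj, prop, obj))
        objects.add(obj)
    # Roots are subjects that never occur as an object.
    roots = [subj for subj in children if subj not in objects]
    order, seen = [], set()

    def visit(node):
        for triple in children.get(node, []):
            if triple in seen:
                continue
            seen.add(triple)
            order.append(triple)
            visit(triple[2])  # descend into the object before the next sibling

    for root in roots:
        visit(root)
    return order


def order_texts(texts_with_mr, triples):
    """Sort (text, triple_set) items by the earliest triple they verbalise."""
    rank = {triple: i for i, triple in enumerate(dfs_triple_order(triples))}
    return sorted(texts_with_mr, key=lambda item: min(rank[t] for t in item[1]))


birth = ("John", "birthPlace", "New York")
job = ("John", "occupation", "Architect")
country = ("New York", "country", "USA")
traversal = dfs_triple_order([birth, job, country])
ordered = order_texts(
    [
        ("He is an architect.", {job}),
        ("New York is in the USA.", {country}),
        ("John was born in New York.", {birth}),
    ],
    [birth, job, country],
)
print([text for text, _ in ordered])
```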

Creating the WEBSPLIT Benchmark
➢ WEBSPLIT Benchmark
• Data size: 1,100,166 pairs
• Vocabulary size: 3,311

Split-and-Rephrase models

Split-and-Rephrase models
➢ Problem formulation
− $C$: complex sentence
− $T$: simplified text
− $M_C$: meaning representation of $C$
− $M_{1:n}$: set of $M_1, \dots, M_n$
− $\theta$: model parameters

Split-and-Rephrase models
➢ Problem formulation
• Models: HYBRID SIMPL (Narayan and Gardent, 2014), SEQ2SEQ (Luong et al., 2015), MULTI-SEQ2SEQ (Zoph and Knight, 2016), SPLIT-MULTI-SEQ2SEQ, SPLIT-SEQ2SEQ
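One way to write the formulation out, with $C$ the complex sentence, $T = T_1, \dots, T_n$ the simplified text, $M_C$ the meaning representation of $C$, $M_{1:n}$ the set $\{M_1, \dots, M_n\}$, and $\theta$ the model parameters. This is a sketch under assumptions: the product over $T_i$ assumes each text is generated independently given $C$ and its own $M_i$.

```latex
\begin{align}
\hat{\theta} &= \arg\max_{\theta} P(T \mid C; \theta) \\
P(T \mid C; \theta) &= P(T \mid C; M_C; \theta)
  \quad \text{when } M_C \text{ is known} \\
P(T \mid C; M_C; \theta) &= \sum_{M_{1:n}}
  P(M_{1:n} \mid C; M_C; \theta)\,
  \prod_{i=1}^{n} P(T_i \mid C; M_i; \theta)
\end{align}
```

Read this way, HYBRID SIMPL and SEQ2SEQ would model the first line (text only), MULTI-SEQ2SEQ the second (text plus $M_C$), and the two SPLIT models the third (partition, then generate).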

Split-and-Rephrase models
➢ HYBRID SIMPL (Narayan and Gardent, 2014)
• Simplification model for splitting and deletion
• Uses phrase-based statistical machine translation
• Exploits discourse representation structures

Split-and-Rephrase models
➢ SEQ2SEQ (Luong et al., 2015)
• Encoder-decoder model
− Local attention
− Input-feeding approach
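The two mechanisms can be sketched in plain Python. The sketch assumes the predictive-alignment form of local attention, in which softmax weights over a window of half-width `window` around a predicted centre `p_t` are scaled by a Gaussian with standard deviation `window / 2`. Whether the presented system used exactly this variant is not stated here, and the function names are illustrative.

```python
import math


def local_attention_weights(scores, p_t, window):
    """Attention weights restricted to [p_t - window, p_t + window]."""
    lo = max(0, math.ceil(p_t - window))
    hi = min(len(scores) - 1, math.floor(p_t + window))
    sigma = window / 2.0
    # Softmax over the alignment scores inside the window only.
    top = max(scores[lo:hi + 1])
    exps = [math.exp(score - top) for score in scores[lo:hi + 1]]
    total = sum(exps)
    weights = [0.0] * len(scores)
    for offset, value in enumerate(exps):
        pos = lo + offset
        # Positions close to the predicted centre p_t are favoured.
        gauss = math.exp(-((pos - p_t) ** 2) / (2 * sigma ** 2))
        weights[pos] = (value / total) * gauss
    return weights


def context_vector(encoder_states, weights):
    """Weighted sum of encoder hidden states."""
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]


def input_feed(embedding, prev_attention_vector):
    """Input feeding: concatenate the previous attentional vector
    with the embedding of the next decoder input."""
    return list(embedding) + list(prev_attention_vector)


weights = local_attention_weights([0.0] * 5, p_t=2.0, window=1)
print(weights)
```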

Split-and-Rephrase models
➢ MULTI-SEQ2SEQ (Zoph and Knight, 2016)
• Multi-source encoder-decoder model
• Inputs to encode:
− Complex sentence ($C$)
− Meaning representation ($M_C$), linearised by a depth-first, left-to-right traversal of the RDF tree

Split-and-Rephrase models
➢ SPLIT-MULTI-SEQ2SEQ
− $P(M_1, \dots, M_n \mid C; M_C; \theta)$: probabilistic model
− $P(T_i \mid C; M_i; \theta)$: MULTI-SEQ2SEQ
➢ SPLIT-SEQ2SEQ
− $P(M_1, \dots, M_n \mid C; M_C; \theta)$: probabilistic model
− $P(T_i \mid C; M_i; \theta)$: SEQ2SEQ
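The two-stage structure of the SPLIT models (partition the meaning representation, then generate one text per part) can be sketched as a small pipeline. Everything here is a stand-in: `one_triple_per_part` and `template_generate` are hypothetical toy components that take the place of the learned probabilistic partition model and the neural generator.

```python
def split_and_rephrase(complex_sentence, complex_mr, partition_model, generate):
    """Stage 1 picks M_1..M_n given C and M_C; stage 2 produces T_i from C and M_i."""
    parts = partition_model(complex_sentence, complex_mr)
    return [generate(complex_sentence, part) for part in parts]


def one_triple_per_part(complex_sentence, complex_mr):
    """Toy partition model (hypothetical): every triple becomes its own part."""
    return [frozenset([triple]) for triple in sorted(complex_mr)]


def template_generate(complex_sentence, part):
    """Toy generator (hypothetical): verbalise each triple with a fixed template."""
    return " ".join(f"{subj} {prop} {obj}." for subj, prop, obj in sorted(part))


mr = frozenset({
    ("John Madin", "was born in", "Birmingham"),
    ("John Clancy", "leads", "Birmingham"),
})
outputs = split_and_rephrase(
    "John Clancy leads Birmingham, where John Madin was born.",
    mr,
    one_triple_per_part,
    template_generate,
)
print(outputs)
```

Passing the two stages in as functions keeps the sketch agnostic about which generator is plugged in for the second stage.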


Results


Results
➢ Example outputs from different models

Summary
➢ Proposing a new simplification task (Split-and-Rephrase)
➢ Creating and releasing a benchmark for Split-and-Rephrase systems: https://github.com/shashiongithub/Split-and-Rephrase
➢ Providing five models to gauge the difficulty of this task

References
[1] C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini, “Creating Training Corpora for NLG Micro-Planning,” Proceedings of ACL, 2017.
[2] S. Narayan and C. Gardent, “Hybrid Simplification using Deep Semantics and Machine Translation,” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 435–445, 2014.
[3] M.-T. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421, 2015.
[4] B. Zoph and K. Knight, “Multi-Source Neural Translation,” Proceedings of NAACL-HLT, pp. 30–34, 2016.
