Split-and-Rephrase-en.pdf

MARUYAMA
February 27, 2018


Transcript

  1. Split and Rephrase
    S. Narayan, C. Gardent, S. B. Cohen, and A. Shimorina, EMNLP, pp. 606–616, 2017.
    Nagaoka University of Technology, Takumi Maruyama
  2. Introduction: Split-and-Rephrase task
    Complex sentence: John Clancy is a labour politician who leads Birmingham, where architect John Madin, who designed 103 Colmore Row, was born.
    1. Labour politician, John Clancy is the leader of Birmingham.
    2. John Madin was born in this city.
    3. He was the architect of 103 Colmore Row.
  3. Introduction: Split-and-Rephrase task

  4. Introduction: Split-and-Rephrase task
    Splitting a complex sentence into shorter sentences while preserving meaning
  5. Introduction: Contributions
    • Proposing a new simplification task (Split-and-Rephrase)
    • Creating and releasing a benchmark for Split-and-Rephrase systems: https://github.com/shashiongithub/Split-and-Rephrase
    • Providing five models to gauge the difficulty of this task
  6. Creating the WEBSPLIT Benchmark

  7. Creating the WEBSPLIT Benchmark: WEBNLG dataset (Gardent et al., 2017)
    Each item consists of a set of RDF triples and one or more texts.
    RDF (Resource Description Framework) triple:
    • Framework for representing information on the Web
    • Format: (subject | property | object)
    • Example: “John was born in New York.”: (John | Birth place | New York), with subject John, property Birth place, and object New York
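The (subject | property | object) layout above can be sketched as a small data structure. This is only an illustration of the format on the slide: the `Triple` type, its field names, and the `parse_triple` helper are hypothetical and are not part of the WEBNLG tooling.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF triple in the (subject | property | object) layout."""
    subject: str
    prop: str  # the "property" slot; shortened to avoid shadowing the builtin
    obj: str   # the "object" slot; shortened for the same reason


def parse_triple(text: str) -> Triple:
    """Parse a string such as '(John | Birth place | New York)'."""
    inner = text.strip()
    # Drop the surrounding parentheses if present.
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    parts = [part.strip() for part in inner.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return Triple(*parts)


triple = parse_triple("(John | Birth place |New York)")
print(triple)  # Triple(subject='John', prop='Birth place', obj='New York')
```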
  8. Creating the WEBSPLIT Benchmark: Main idea
    Using RDF triples as the meaning representation of text.
    A single text and a combination of texts that carry the same RDF triples are likely to be a pair.
  9. Creating the WEBSPLIT Benchmark: 3 steps for creating the WEBSPLIT dataset
    1. Sentence segmentation
    2. Pairing using semantic information
    3. Ordering the sequences of texts
  10. Creating the WEBSPLIT Benchmark: 3 steps for creating the WEBSPLIT dataset
    1. Sentence segmentation
      − 13,308 verbalisations contained in the WEBNLG corpus
      − Using the Stanford CoreNLP pipeline
    2. Pairing using semantic information
    3. Ordering the sequences of texts
  11. Creating the WEBSPLIT Benchmark: 3 steps for creating the WEBSPLIT dataset
    1. Sentence segmentation
    2. Pairing using semantic information
    3. Ordering the sequences of texts
    Benchmark item: $(M_C, C, \{(M_1, T_1), \dots, (M_n, T_n)\})$
    • $T_1, \dots, T_n$: sequence of texts
    • $M_1, \dots, M_n$: meaning representations of $T_1, \dots, T_n$
    • $C$: single complex sentence
    • $M_C$: meaning representation of $C$
    • $M_C = M_1 \cup \dots \cup M_n$
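The pairing condition (the triples of the shorter texts together make up the triples of the complex sentence) can be sketched as a brute-force search. This is a minimal sketch and not the authors' implementation: `find_pairs`, `max_texts`, and the property names in the example triples are hypothetical. Only the union condition is checked, so candidates with overlapping triples are not excluded.

```python
from itertools import combinations


def find_pairs(complex_meaning, candidates, max_texts=3):
    """Return tuples of texts whose triple sets union to complex_meaning.

    complex_meaning: iterable of triples for the complex sentence.
    candidates: list of (text, triples) for shorter texts.
    """
    target = frozenset(complex_meaning)
    # Keep only texts whose non-empty meaning is a proper subset of the target.
    usable = [
        (text, frozenset(meaning))
        for text, meaning in candidates
        if meaning and frozenset(meaning) < target
    ]
    results = []
    for size in range(2, max_texts + 1):
        for combo in combinations(usable, size):
            covered = frozenset().union(*(meaning for _, meaning in combo))
            if covered == target:  # the union of the parts equals the whole
                results.append(tuple(text for text, _ in combo))
    return results


# Illustrative triples only; the property names are assumptions.
t1 = ("Birmingham", "leaderName", "John_Clancy")
t2 = ("John_Madin", "birthPlace", "Birmingham")
t3 = ("103_Colmore_Row", "architect", "John_Madin")

candidates = [
    ("Labour politician, John Clancy is the leader of Birmingham.", {t1}),
    ("John Madin was born in this city.", {t2}),
    ("He was the architect of 103 Colmore Row.", {t3}),
]
pairs = find_pairs({t1, t2, t3}, candidates)
print(pairs)
```

The search is exponential in the number of candidates, so `max_texts` caps the size of each combination.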
  12. Creating the WEBSPLIT Benchmark: 3 steps for creating the WEBSPLIT dataset
    1. Sentence segmentation
    2. Pairing using semantic information
    3. Ordering the sequences of texts
      − Corresponds to a left-to-right, depth-first traversal of the RDF triples
  13. Creating the WEBSPLIT Benchmark: WEBSPLIT benchmark
    • Data size: 1,100,166 pairs
    • Vocabulary size: 3,311
  14. Split-and-Rephrase models

  15. Split-and-Rephrase models: Problem formulation
    • $M_C$: meaning representation of $C$
    • $M_{1-n}$: set of $M_1, \dots, M_n$
    • $T$: simplified text
    • $C$: complex sentence
    • $\theta$: model parameters
  16. Split-and-Rephrase models: Problem formulation
    • $M_C$: meaning representation of $C$
    • $M_{1-n}$: set of $M_1, \dots, M_n$
    • $T$: simplified text
    • $C$: complex sentence
    • $\theta$: model parameters
    Models: MULTISEQ2SEQ (Zoph and Knight, 2016), HYBRID SIMPL (Narayan and Gardent, 2014), SEQ2SEQ (Luong et al., 2015), SPLIT-MULTISEQ2SEQ, SPLIT-SEQ2SEQ
  17. Split-and-Rephrase models: HYBRID SIMPL (Narayan and Gardent, 2014)
    • Simplification model for splitting and deletion
    • Uses phrase-based statistical machine translation
    • Exploits discourse representation structures
  18. Split-and-Rephrase models: SEQ2SEQ (Luong et al., 2015)
    • Encoder-decoder model
      − Local attention
      − Input-feeding approach
  19. Split-and-Rephrase models: MULTISEQ2SEQ (Zoph and Knight, 2016)
    • Multi-source encoder-decoder model
    • Inputs to encode:
      − Complex sentence ($C$)
      − Meaning representation ($M_C$), which is linearized by a depth-first, left-to-right traversal of the RDF tree
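A depth-first, left-to-right linearization of a set of RDF triples can be sketched as follows. The exact token format fed to the encoder is not given on the slide, so the flat subject/property/object output, the `linearize` helper, and the example property names are assumptions made for illustration.

```python
def linearize(triples):
    """Flatten RDF triples into tokens by depth-first, left-to-right traversal.

    triples: list of (subject, property, object) tuples; list order decides
    the left-to-right order among the edges leaving the same subject.
    """
    children = {}
    for subject, prop, obj in triples:
        children.setdefault(subject, []).append((prop, obj))

    # A root is a subject that never appears as the object of another triple.
    objects = {obj for _, _, obj in triples}
    roots = [subject for subject in children if subject not in objects]
    if not roots:  # fully cyclic graph: fall back to the first subject seen
        roots = list(children)[:1]

    tokens, visited = [], set()

    def visit(node):
        if node in visited:  # guard against cycles and shared nodes
            return
        visited.add(node)
        for prop, obj in children.get(node, []):
            tokens.extend([node, prop, obj])
            visit(obj)  # descend before moving on to the next sibling edge

    for root in roots:
        visit(root)
    return tokens


# Illustrative triples only; the property names are assumptions.
example = [
    ("Birmingham", "leaderName", "John_Clancy"),
    ("John_Madin", "birthPlace", "Birmingham"),
    ("103_Colmore_Row", "architect", "John_Madin"),
]
tokens = linearize(example)
print(" ".join(tokens))
```

Here the traversal starts at 103_Colmore_Row, the only subject that is never an object, and follows the chain down to John_Clancy.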
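  20. Split-and-Rephrase models: SPLIT-MULTISEQ2SEQ and SPLIT-SEQ2SEQ
    SPLIT-MULTISEQ2SEQ:
    • $P(M_1, \dots, M_n \mid C; M_C; \theta)$: probabilistic model
    • $P(T_i \mid C; M_i; \theta)$: MULTISEQ2SEQ
    SPLIT-SEQ2SEQ:
    • $P(M_1, \dots, M_n \mid C; M_C; \theta)$: probabilistic model
    • $P(T_i \mid C; M_i; \theta)$: SEQ2SEQ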
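One way to read the two terms for SPLIT-MULTISEQ2SEQ is as a partition-then-generate factorization: first split the meaning representation $M_C$ into subsets $M_1, \dots, M_n$, then generate one text $T_i$ per subset. The combined form below is an assumption inferred from the two terms on the slide. It is not copied from the slide, and it takes the sum to range over the candidate partitions of $M_C$.

```latex
P(T \mid C; M_C; \theta)
  = \sum_{M_1, \dots, M_n}
      P(M_1, \dots, M_n \mid C; M_C; \theta)
      \prod_{i=1}^{n} P(T_i \mid C; M_i; \theta)
```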
  21. Split-and-Rephrase models

  22. Results

  23. Results

  24. Results: Example outputs from different models

  25. Summary
    • Proposing a new simplification task (Split-and-Rephrase)
    • Creating and releasing a benchmark for Split-and-Rephrase systems: https://github.com/shashiongithub/Split-and-Rephrase
    • Providing five models to gauge the difficulty of this task
  26. References
    [1] C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini, “Creating Training Corpora for NLG Micro-Planners,” Proceedings of ACL, 2017.
    [2] S. Narayan and C. Gardent, “Hybrid Simplification using Deep Semantics and Machine Translation,” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 435–445, 2014.
    [3] M. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421, 2015.
    [4] B. Zoph and K. Knight, “Multi-Source Neural Translation,” Proceedings of NAACL-HLT, pp. 30–34, 2016.