Split-and-Rephrase-en.pdf

MARUYAMA
February 27, 2018

Transcript

  1. Split and Rephrase
     S. Narayan, C. Gardent, S. B. Cohen, and A. Shimorina, EMNLP, pp. 606–616, 2017.
     Takumi Maruyama, Nagaoka University of Technology
  2. Introduction
     • Split-and-Rephrase task
       − Complex sentence: John Clancy is a labour politician who leads Birmingham, where architect John Madin, who designed 103 Colmore Row, was born.
       − Simplified text:
         1. Labour politician John Clancy is the leader of Birmingham.
         2. John Madin was born in this city.
         3. He was the architect of 103 Colmore Row.
  3. Introduction
     • Contributions
       − Proposing a new simplification task (Split-and-Rephrase)
       − Creating and releasing a benchmark for Split-and-Rephrase systems: https://github.com/shashiongithub/Split-and-Rephrase
       − Providing five models to gauge the difficulty of this task
  4. Creating the WEBSPLIT Benchmark
     • WEBNLG dataset (Gardent, 2017)
       − Each item consists of a set of RDF triples and one or more texts
     • RDF (Resource Description Framework) triple
       − Framework for representing information on the Web
       − Format: (subject | property | object)
       − Example: “John was born in New York.”: (John | Birth place | New York), with subject John, property Birth place, and object New York
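The (subject | property | object) format above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type, its field names, and the `parse_triple` helper are hypothetical and are not part of the WEBNLG tooling.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF triple in the (subject | property | object) format."""
    subj: str
    prop: str
    obj: str


def parse_triple(text: str) -> Triple:
    """Parse a '(subject | property | object)' string into a Triple."""
    # Drop the surrounding parentheses, split on '|', trim each field.
    fields = [field.strip() for field in text.strip().strip("()").split("|")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    return Triple(*fields)


# The example from the slide.
born = parse_triple("(John | Birth place |New York)")
```

Using a named tuple keeps triples hashable, so a meaning representation can be stored as a plain set of triples.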
  5. Creating the WEBSPLIT Benchmark
     • Main idea: using RDF triples as the meaning representation of text
       − A complex sentence and a combination (“+”) of shorter texts that cover the same RDF triples are likely to be a pair.
  6. Creating the WEBSPLIT Benchmark
     • 3 steps of creating the WEBSPLIT dataset
       1. Sentence segmentation
       2. Pairing using semantic information
       3. Ordering the sequences of texts
  7. Creating the WEBSPLIT Benchmark
     • 3 steps of creating the WEBSPLIT dataset
       1. Sentence segmentation
          − 13,308 verbalisations contained in the WEBNLG corpus
          − Using the Stanford CoreNLP pipeline
       2. Pairing using semantic information
       3. Ordering the sequences of texts
  8. Creating the WEBSPLIT Benchmark
     • 3 steps of creating the WEBSPLIT dataset
       1. Sentence segmentation
       2. Pairing using semantic information
       3. Ordering the sequences of texts
     • Notation
       − $T_1, \dots, T_n$: sequence of texts
       − $M_1, \dots, M_n$: meaning representations of $T_1, \dots, T_n$
       − $C$: single complex sentence
       − $M_C$: meaning representation of $C$
       − $M_C = M_1 \cup \dots \cup M_n$
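A minimal sketch of the pairing condition, assuming each text comes with its set of RDF triples: a group of shorter texts is paired with the complex sentence when the union of their triples equals the triples of the complex sentence. The function name `find_splits`, the `max_parts` limit, and the triple labels are assumptions made for illustration; the texts are the ones from the introduction slide.

```python
from itertools import combinations


def find_splits(m_c, candidates, max_parts=3):
    """Return groups of 2..max_parts texts whose triples union to m_c.

    m_c        -- set of triples of the complex sentence
    candidates -- list of (text, set_of_triples) for shorter texts
    """
    target = frozenset(m_c)
    # Keep only candidates that cover a non-empty proper subset of m_c.
    usable = [(text, frozenset(m)) for text, m in candidates
              if m and frozenset(m) < target]
    splits = []
    for n in range(2, max_parts + 1):
        for group in combinations(usable, n):
            covered = frozenset().union(*(m for _, m in group))
            if covered == target:
                splits.append([text for text, _ in group])
    return splits


# Illustrative triples (the property labels are made up for this sketch).
t1 = ("Birmingham", "leader", "John Clancy")
t2 = ("John Madin", "birth place", "Birmingham")
t3 = ("103 Colmore Row", "architect", "John Madin")

candidates = [
    ("Labour politician John Clancy is the leader of Birmingham.", {t1}),
    ("John Madin was born in this city.", {t2}),
    ("He was the architect of 103 Colmore Row.", {t3}),
]
splits = find_splits({t1, t2, t3}, candidates)
```

Here the only group whose triples add up to the full set is the one containing all three short texts. The sketch checks the union only; it does not require the subsets to be disjoint.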
  9. Creating the WEBSPLIT Benchmark
     • 3 steps of creating the WEBSPLIT dataset
       1. Sentence segmentation
       2. Pairing using semantic information
       3. Ordering the sequences of texts
          − Corresponds to a left-to-right depth-first traversal of the RDF triples
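The ordering step can be illustrated with a small sketch. It assumes the triples form a tree (or forest) in which each triple is an edge from subject to object, and that "left-to-right" means the order in which triples are listed. The helper names `depth_first_order` and `order_texts` and the placeholder node and text names are hypothetical.

```python
def depth_first_order(triples):
    """Order triples by a left-to-right depth-first traversal.

    Roots are subjects that never appear as an object; this sketch
    returns nothing for inputs that have no such root.
    """
    children, objects = {}, set()
    for subj, prop, obj in triples:
        children.setdefault(subj, []).append((prop, obj))
        objects.add(obj)
    roots = [subj for subj in children if subj not in objects]

    ordered, seen = [], set()

    def visit(node):
        for prop, obj in children.get(node, []):
            edge = (node, prop, obj)
            if edge in seen:  # guard against revisiting shared nodes
                continue
            seen.add(edge)
            ordered.append(edge)
            visit(obj)

    for root in roots:
        visit(root)
    return ordered


def order_texts(texts_with_triples, triples):
    """Sort (text, triples) pairs by the earliest traversal position."""
    position = {t: i for i, t in enumerate(depth_first_order(triples))}
    ranked = sorted(texts_with_triples,
                    key=lambda pair: min(position[t] for t in pair[1]))
    return [text for text, _ in ranked]


# Placeholder tree: A has children B and C, B has child D.
triples = [("A", "p", "B"), ("A", "q", "C"), ("B", "r", "D")]
order = depth_first_order(triples)
texts = order_texts(
    [("text about C", {("A", "q", "C")}),
     ("text about D", {("B", "r", "D")}),
     ("text about B", {("A", "p", "B")})],
    triples,
)
```

The traversal descends into B's subtree before moving on to C, so the text about D is placed before the text about C.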
  10. Creating the WEBSPLIT Benchmark
      • WEBSPLIT benchmark
        − Data size: 1,100,166 pairs
        − Vocabulary size: 3,311
  11. Split-and-Rephrase models
      • Problem formulation
        − $M_C$: meaning representation of $C$
        − $M_{1-n}$: set of $M_1, \dots, M_n$
        − $T$: simplified text
        − $C$: complex sentence
        − $\theta$: model parameters
  12. Split-and-Rephrase models
      • Problem formulation
        − $M_C$: meaning representation of $C$
        − $M_{1-n}$: set of $M_1, \dots, M_n$
        − $T$: simplified text
        − $C$: complex sentence
        − $\theta$: model parameters
      • Models
        − MULTISEQ2SEQ (Zoph and Knight, 2016)
        − HYBRID SIMPL (Narayan and Gardent, 2014)
        − SEQ2SEQ (Luong, 2015)
        − SPLIT-MULTISEQ2SEQ, SPLIT-SEQ2SEQ
  13. Split-and-Rephrase models
      • HYBRID SIMPL (Narayan and Gardent, 2014)
        − Simplification model for splitting and deletion
        − Uses phrase-based statistical machine translation
        − Exploits discourse representation structures
  14. Split-and-Rephrase models
      • SEQ2SEQ (Luong, 2015)
        − Encoder-decoder model
          − Local attention
          − Input-feeding approach
  15. Split-and-Rephrase models
      • MULTISEQ2SEQ (Zoph and Knight, 2016)
        − Multi-source encoder-decoder model
        − Inputs to encode:
          − Complex sentence ($C$)
          − Meaning representation ($M_C$), linearized by a depth-first, left-to-right traversal of the RDF tree
  16. Split-and-Rephrase models
      • SPLIT-MULTISEQ2SEQ
        − $P(M_1, \dots, M_n \mid C; M_C; \theta)$: probabilistic model
        − $P(T_i \mid C; M_i; \theta)$: MULTISEQ2SEQ
      • SPLIT-SEQ2SEQ
        − $P(M_1, \dots, M_n \mid C; M_C; \theta)$: probabilistic model
        − $P(T_i \mid C; M_i; \theta)$: SEQ2SEQ
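One way to read how the two components of the SPLIT- models fit together is as a pipeline: first split the meaning representation, then generate one text per part. The expression below is a sketch of that reading, not a formula taken from the slides. It assumes that a single split $M_1, \dots, M_n$ is chosen and that each $T_i$ is generated independently given its $M_i$.

```latex
P(T \mid C; M_C; \theta) \approx
  P(M_1, \dots, M_n \mid C; M_C; \theta)
  \times \prod_{i=1}^{n} P(T_i \mid C; M_i; \theta)
```

Under this reading, the first factor is the splitting model and each term of the product is the rephrasing model (MULTISEQ2SEQ or SEQ2SEQ).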
  17. Summary
      • Proposing a new simplification task (Split-and-Rephrase)
      • Creating and releasing a benchmark for Split-and-Rephrase systems: https://github.com/shashiongithub/Split-and-Rephrase
      • Providing five models to gauge the difficulty of this task
  18. References
      [1] C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini, “Creating Training Corpora for NLG Micro-Planners,” Proceedings of ACL, 2017.
      [2] S. Narayan and C. Gardent, “Hybrid Simplification using Deep Semantics and Machine Translation,” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 435–445, 2014.
      [3] M. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421, 2015.
      [4] B. Zoph and K. Knight, “Multi-Source Neural Translation,” Proceedings of NAACL-HLT, pp. 30–34, 2016.