Simplification_Using_Paraphrases_and_Context-based_Lexical_Substitution_.pdf

MARUYAMA
June 20, 2018

Transcript

  1. Simplification Using Paraphrases and Context-based Lexical Substitution
    Nagaoka University of Technology, MARUYAMA Takumi
    Literature review: Reno Kriz, Eleni Miltsakaki, Marianna Apidianaki and Chris Callison-Burch, Proceedings of NAACL-HLT 2018, pages 207–217
  2. Abstract
    ➢ Lexical simplification involves identifying complex words and recommending simpler substitutes.
    ➢ This paper proposed a complex word identification model and a simplification mechanism which relies on a word-embedding lexical substitution model.
  3. Abstract (repeats slide 2; callout: complex word identification)
  4. Abstract (repeats slide 2; callout: lexical substitution)
  5. Identifying Complex Words

  6. Identifying Complex Words
    ➢ Gold-standard data
      • 200 texts from the Newsela corpus
      • Nine crowdsourced annotators labeled the complex words
      • 17,318 labeled tokens
  7. Identifying Complex Words
    ➢ Methods
      • Support Vector Machine classifier (Shardlow, 2013)
      • Random Forest classifier (Shardlow, 2013)
    ➢ Features
      • Word-specific features, e.g. word length, number of syllables, word frequency (Google Web1T corpus), number of WordNet synonyms
      • Context-specific features, e.g. average word length in the sentence, average number of syllables, average word frequency, average number of WordNet synonyms, sentence length
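The feature list on slide 7 can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `TOY_FREQ` is a hypothetical stand-in for Google Web1T counts, the syllable counter is a rough vowel-group heuristic, and the WordNet synonym features are left out to keep the example free of external dependencies. It is not the feature extractor used in the paper.

```python
import re

# Hypothetical unigram counts standing in for the Google Web1T corpus.
TOY_FREQ = {"the": 1000, "cat": 120, "sat": 80, "ubiquitous": 3}


def count_syllables(word):
    """Rough syllable estimate: number of groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def word_features(word, freq=TOY_FREQ):
    """Word-specific features: length, syllable estimate, corpus frequency."""
    return [len(word), count_syllables(word), freq.get(word.lower(), 0)]


def context_features(tokens, freq=TOY_FREQ):
    """Context-specific features: sentence averages plus sentence length."""
    rows = [word_features(t, freq) for t in tokens]
    n = len(rows)
    averages = [sum(column) / n for column in zip(*rows)]
    return averages + [n]


def feature_vector(tokens, index, freq=TOY_FREQ):
    """Concatenate the target word's features with its sentence's features."""
    return word_features(tokens[index], freq) + context_features(tokens, freq)


tokens = ["The", "ubiquitous", "cat", "sat"]
print(feature_vector(tokens, 1))
```

A vector of this shape is what an SVM or Random Forest classifier would consume, one vector per token, with the annotators' complex/simple label as the target.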
  8. Identifying Complex Words
    ➢ Comparison systems
      • All-Complex: labels every word as complex
      • Token Length: thresholds on word length; the best-performing length threshold was 7
      • n-gram Frequency: thresholds on word frequency using Google n-gram counts; the best-performing frequency threshold was 19,950,000
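The three comparison systems on slide 8 are simple enough to sketch directly. The thresholds below are the ones reported on the slide. The direction and strictness of each comparison (length at or above the threshold counts as complex, frequency below the threshold counts as complex) are assumptions, as is the hypothetical `counts` dictionary standing in for Google n-gram counts.

```python
LENGTH_THRESHOLD = 7          # best-performing length threshold from the slide
FREQ_THRESHOLD = 19_950_000   # best-performing frequency threshold from the slide


def all_complex(tokens):
    """All-Complex baseline: every token is labeled complex."""
    return [True for _ in tokens]


def token_length_baseline(tokens, threshold=LENGTH_THRESHOLD):
    """Assumption: a token is complex when its length reaches the threshold."""
    return [len(t) >= threshold for t in tokens]


def ngram_frequency_baseline(tokens, counts, threshold=FREQ_THRESHOLD):
    """Assumption: a token is complex when its count falls below the threshold.

    Tokens missing from `counts` are treated as having a count of zero.
    """
    return [counts.get(t.lower(), 0) < threshold for t in tokens]


# Hypothetical counts, for illustration only.
counts = {"the": 23_000_000_000, "cat": 50_000_000, "ubiquitous": 2_000_000}
tokens = ["the", "ubiquitous", "cat"]
print(token_length_baseline(tokens))
print(ngram_frequency_baseline(tokens, counts))
```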
  9. Identifying Complex Words
    ➢ Evaluation

  10. Lexical substitution

  11. Lexical substitution
    ➢ Data
      • WordNet, PPDB, SimplePPDB
    ➢ In-context ranking and substitution
      • s, t: word embeddings of the substitute and the target
      • C: the set of context embeddings
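The scoring formula itself did not survive in the transcript. Since later slides name the ranker AddCos, the sketch below assumes the additive cosine form: the cosine between substitute and target, plus the cosine between the substitute and each context embedding, divided by the number of context embeddings plus one. The function names and toy vectors are hypothetical, and the embeddings would in practice come from a pretrained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_cos(substitute, target, context):
    """Assumed AddCos form: (cos(s, t) + sum over c in C of cos(s, c)) / (|C| + 1)."""
    total = cosine(substitute, target)
    total += sum(cosine(substitute, c) for c in context)
    return total / (len(context) + 1)


def rank_substitutes(candidates, target, context):
    """Order candidate words by decreasing AddCos score.

    `candidates` maps each candidate word to its embedding.
    """
    return sorted(
        candidates,
        key=lambda word: add_cos(candidates[word], target, context),
        reverse=True,
    )


# Toy 2-d embeddings, for illustration only.
target = [1.0, 0.0]
context = [[1.0, 0.0], [0.0, 1.0]]
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
print(rank_substitutes(candidates, target, context))
```

Averaging over the target and the context words means a substitute has to fit both the word it replaces and the sentence around it, which is the point of ranking in context.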
  12. Lexical substitution
    ➢ Comparison systems
      • WordNet frequency: all WordNet synonyms of a complex word are ranked in decreasing order of Google n-gram frequency
      • SimplePPDB Score: all SimplePPDB synonyms of a complex word are ranked in decreasing order of their SimplePPDB score
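Both comparison systems on slide 12 reduce to sorting a candidate list by a context-independent score. A minimal sketch, assuming the candidates and their scores have already been looked up; the example words and numbers are hypothetical, and no WordNet or SimplePPDB API is called here.

```python
def rank_by_score(candidates, scores):
    """Rank candidates in decreasing order of score.

    `scores` maps a candidate to its n-gram frequency (WordNet frequency
    baseline) or to its SimplePPDB score (SimplePPDB Score baseline).
    Candidates without a score are treated as scoring zero.
    """
    return sorted(candidates, key=lambda word: scores.get(word, 0.0), reverse=True)


# Hypothetical synonym list and frequencies, for illustration only.
synonyms = ["commence", "begin", "start"]
frequencies = {"commence": 1_200_000, "begin": 85_000_000, "start": 210_000_000}
print(rank_by_score(synonyms, frequencies))
```

Unlike the in-context ranker, these baselines return the same ordering for a complex word in every sentence.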
  13. Lexical substitution
    ➢ Evaluation

  14. Overall Simplification System

  15. Overall Simplification System
    ➢ Evaluation
      • Complex word identification: SVM-context
      • Lexical substitution: AddCos-SimplePPDB
  16. Summary
    ➢ This paper proposed a complex word identification model and a simplification mechanism which relies on a word-embedding lexical substitution model.
      • Complex word identification: SVM-context
      • Lexical substitution: AddCos-SimplePPDB