Simplification_Using_Paraphrases_and_Context-based_Lexical_Substitution_.pdf

MARUYAMA
June 20, 2018

Transcript

  1. Simplification Using Paraphrases and Context-based Lexical Substitution

    Nagaoka University of Technology, MARUYAMA Takumi
    Literature review: Reno Kriz, Eleni Miltsakaki, Marianna Apidianaki and Chris Callison-Burch
    Proceedings of NAACL-HLT 2018, pages 207–217
  2. Abstract

    • Lexical simplification involves identifying complex words and recommending simpler substitutes.
    • This paper proposed a complex word identification model and a simplification mechanism that relies on a word-embedding lexical substitution model.
  5. Identifying Complex Words

    Gold-standard data:
    • 200 texts from the Newsela corpus
    • Nine crowdsourced annotators labeled complex words
    • 17,318 labeled tokens
  6. Identifying Complex Words

    Methods:
    • Support Vector Machine classifier (Shardlow, 2013)
    • Random Forest classifier (Shardlow, 2013)

    Features:
    • Word-specific features, e.g. word length, number of syllables, word frequency (Google Web1T corpus), number of WordNet synonyms
    • Context-specific features, e.g. average length of words in the sentence, average number of syllables, average word frequency, average number of WordNet synonyms, sentence length
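The word-specific and context-specific features listed above could be computed roughly as follows. This is a minimal sketch, not the paper's implementation: the function names, the vowel-run syllable heuristic, and the `freq` dictionary (standing in for Google Web1T counts) are all assumptions made for illustration, and the WordNet synonym counts are left out because they need an external resource.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def word_features(word: str, freq: dict) -> dict:
    """Word-specific features; `freq` is a hypothetical word -> count table."""
    return {
        "length": len(word),
        "syllables": count_syllables(word),
        "frequency": freq.get(word.lower(), 0),
    }

def context_features(tokens: list, freq: dict) -> dict:
    """Context-specific features: averages over the sentence the word occurs in."""
    if not tokens:
        raise ValueError("sentence must contain at least one token")
    n = len(tokens)
    return {
        "avg_length": sum(len(t) for t in tokens) / n,
        "avg_syllables": sum(count_syllables(t) for t in tokens) / n,
        "avg_frequency": sum(freq.get(t.lower(), 0) for t in tokens) / n,
        "sentence_length": n,
    }
```

The two dictionaries can be merged into one feature vector per token and passed to whichever SVM or Random Forest implementation is at hand.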
  7. Identifying Complex Words

    Comparison systems:
    • All-Complex: labels all words as complex
    • Token Length: thresholds on word length; the best-performing length threshold was 7
    • n-gram Frequency: thresholds on word frequency using Google n-gram counts; the best-performing frequency threshold was 19,950,000
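The three comparison systems are simple enough to sketch in a few lines. The thresholds 7 and 19,950,000 come from the slide; everything else is an assumption for illustration, in particular the direction and inclusiveness of each comparison (length at or above the threshold, frequency strictly below it), the function names, and the `counts` dictionary standing in for Google n-gram counts.

```python
LENGTH_THRESHOLD = 7               # best-performing length threshold reported on the slide
FREQUENCY_THRESHOLD = 19_950_000   # best-performing frequency threshold reported on the slide

def all_complex(tokens: list) -> list:
    """All-Complex baseline: every token is labeled complex."""
    return [True] * len(tokens)

def token_length_baseline(tokens: list, threshold: int = LENGTH_THRESHOLD) -> list:
    """Label a token complex when it is at least `threshold` characters long.
    Using >= rather than > is an assumption; the slide does not say."""
    return [len(t) >= threshold for t in tokens]

def frequency_baseline(tokens: list, counts: dict,
                       threshold: int = FREQUENCY_THRESHOLD) -> list:
    """Label a token complex when its corpus count falls below `threshold`.
    `counts` is a hypothetical word -> n-gram count table; unseen words count as 0."""
    return [counts.get(t.lower(), 0) < threshold for t in tokens]
```

For example, `token_length_baseline(["cat", "complex"])` marks only the seven-letter word as complex under these assumptions.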
  8. Lexical substitution

    Data:
    • WordNet, PPDB, SimplePPDB

    In-context ranking and substitution:
    • s, t: word embeddings of the substitute and the target
    • C: the set of context embeddings
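The scoring formula itself did not survive in the transcript. Assuming the AddCos measure named on the summary slide takes its commonly cited form, and writing s and t for the substitute and target word embeddings and C for the set of context embeddings, the score would be AddCos(s, t, C) = (cos(s, t) + Σ_{c ∈ C} cos(s, c)) / (|C| + 1). A minimal sketch under that assumption, with plain Python lists standing in for the embeddings and hypothetical function names:

```python
import math

def cosine(u: list, v: list) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def add_cos(s: list, t: list, C: list) -> float:
    """Assumed AddCos form: average of cos(s, t) and cos(s, c) for each c in C."""
    return (cosine(s, t) + sum(cosine(s, c) for c in C)) / (len(C) + 1)

def rank_substitutes(candidates: dict, t: list, C: list) -> list:
    """Sort candidate words (word -> embedding) by decreasing AddCos score."""
    return sorted(candidates, key=lambda w: add_cos(candidates[w], t, C), reverse=True)
```

A substitute scores highly only when it is close both to the target word and to the surrounding context, which is what makes the ranking context-sensitive.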
  9. Lexical substitution

    Comparison systems:
    • WordNet frequency: all WordNet synonyms of a complex word are ranked in decreasing order of Google n-gram frequency
    • SimplePPDB score: all SimplePPDB synonyms of a complex word are ranked in decreasing order of their SimplePPDB score
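Both comparison systems reduce to sorting a candidate list by a context-independent score. A minimal sketch, assuming a hypothetical `scores` dictionary that holds either Google n-gram frequencies or SimplePPDB scores; the numbers in the usage note are placeholders, not real counts:

```python
def rank_by_score(candidates: list, scores: dict) -> list:
    """Rank candidates in decreasing order of score; unscored words get 0."""
    return sorted(candidates, key=lambda w: scores.get(w, 0), reverse=True)
```

With placeholder scores `{"help": 300, "aid": 200, "assist": 100}`, the candidates `["aid", "assist", "help"]` come back as `["help", "aid", "assist"]`. Unlike in-context ranking, the order is the same for every sentence in which the complex word appears.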
  10. Summary

    This paper proposed a complex word identification model and a simplification mechanism that relies on a word-embedding lexical substitution model.
    • Complex word identification: SVM-context
    • Lexical substitution: AddCos-SimplePPDB