Simplification_Using_Paraphrases_and_Context-based_Lexical_Substitution_.pdf

Simplification Using Paraphrases and Context-based Lexical Substitution
Nagaoka University of Technology, MARUYAMA Takumi
Literature review: Reno Kriz, Eleni Miltsakaki, Marianna Apidianaki and Chris Callison-Burch. Proceedings of NAACL-HLT 2018, pages 207–217.

Abstract
➢ Lexical simplification involves identifying complex words and recommending simpler substitutes.
➢ This paper proposes a complex word identification model and a simplification mechanism that relies on a word-embedding lexical substitution model.


Identifying Complex Words

➢ Gold-standard data
  • 200 texts from the Newsela corpus
  • Nine crowdsourced annotators labeled complex words
  • 17,318 labeled tokens

➢ Methods
  • Support Vector Machine classifier (Shardlow, 2013)
  • Random Forest classifier (Shardlow, 2013)
➢ Features
  • Word-specific features, e.g. word length, number of syllables, word frequency (Google Web1T corpus), number of WordNet synonyms
  • Context-specific features, e.g. average word length in the sentence, average number of syllables, average word frequency, average number of WordNet synonyms, sentence length
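The feature list above can be sketched as a small extraction routine. This is a minimal illustration, not the authors' implementation: the syllable counter is a vowel-group heuristic, and `freq` and `n_synonyms` are hypothetical lookup tables standing in for Google Web1T counts and WordNet synonym counts. The resulting vectors would then be passed to an SVM or Random Forest classifier.

```python
import re


def count_syllables(word):
    """Rough syllable count: number of vowel groups (a heuristic, not a lexicon lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def word_features(word, freq, n_synonyms):
    """Word-specific features.

    `freq` and `n_synonyms` are hypothetical dicts standing in for
    Google Web1T counts and WordNet synonym counts.
    """
    key = word.lower()
    return [
        len(word),                # word length
        count_syllables(word),    # number of syllables (heuristic)
        freq.get(key, 0),         # corpus frequency, 0 if unseen
        n_synonyms.get(key, 0),   # number of synonyms, 0 if unknown
    ]


def context_features(sentence, freq, n_synonyms):
    """Context-specific features: per-sentence averages plus sentence length."""
    rows = [word_features(w, freq, n_synonyms) for w in sentence]
    n = len(sentence)
    averages = [sum(column) / n for column in zip(*rows)]
    return averages + [n]


def token_features(sentence, index, freq, n_synonyms):
    """Full feature vector for the token at `index` in a non-empty `sentence`."""
    return (word_features(sentence[index], freq, n_synonyms)
            + context_features(sentence, freq, n_synonyms))
```

Each token yields nine values here: four word-specific features followed by four sentence averages and the sentence length.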

➢ Comparison systems
  • All-Complex: labels every word as complex
  • Token Length: thresholds on word length; the best-performing length threshold was 7
  • n-gram Frequency: thresholds on word frequency using Google n-gram counts; the best-performing frequency threshold was 19,950,000
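The three comparison systems are simple enough to sketch directly. The thresholds are the values given above; the direction of each comparison (long words and rare words count as complex, with inclusive or exclusive bounds) is an assumption, and `counts` is a hypothetical dict standing in for Google n-gram counts.

```python
LENGTH_THRESHOLD = 7               # best-performing length threshold from the slides
FREQUENCY_THRESHOLD = 19_950_000   # best-performing frequency threshold from the slides


def all_complex(tokens):
    """All-Complex baseline: every token is labeled complex."""
    return [True for _ in tokens]


def token_length_baseline(tokens, threshold=LENGTH_THRESHOLD):
    """Assumption: a token is complex when its length is at least `threshold`."""
    return [len(token) >= threshold for token in tokens]


def ngram_frequency_baseline(tokens, counts, threshold=FREQUENCY_THRESHOLD):
    """Assumption: a token is complex when its count falls below `threshold`.

    `counts` is a hypothetical lookup table; unseen tokens count as 0.
    """
    return [counts.get(token.lower(), 0) < threshold for token in tokens]
```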

➢ Evaluation

Lexical substitution

➢ Data
  • WordNet, PPDB, SimplePPDB
➢ In-context ranking and substitution
  • s, t: word embeddings of the substitute and target
  • C: the set of context embeddings
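The scoring formula itself did not survive in this transcript. The sketch below assumes the additive cosine measure (AddCos) as it is commonly defined for embedding-based lexical substitution: the cosine between substitute and target, plus the cosines between the substitute and each context embedding, divided by |C| + 1. The function names and the toy vectors are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_cos(s, t, C):
    """Assumed AddCos score: (cos(s, t) + sum of cos(s, c) for c in C) / (|C| + 1).

    s, t: embeddings of the substitute and the target word.
    C:    list of context embeddings.
    """
    return (cosine(s, t) + sum(cosine(s, c) for c in C)) / (len(C) + 1)


def rank_substitutes(candidates, t, C):
    """Rank candidate words (dict: word -> embedding) by decreasing AddCos score."""
    return sorted(candidates, key=lambda w: add_cos(candidates[w], t, C), reverse=True)
```

A candidate scores highly only when it is close to both the target word and the surrounding context, which is what makes the ranking context-sensitive.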

➢ Comparison systems
  • WordNet frequency: all WordNet synonyms of a complex word are ranked in decreasing order of Google n-gram frequency
  • SimplePPDB score: all SimplePPDB synonyms of a complex word are ranked in decreasing order of their SimplePPDB score
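Both comparison systems reduce to sorting a candidate list by a context-independent score. A minimal sketch, where `score` is a hypothetical dict holding either Google n-gram frequencies (WordNet frequency baseline) or SimplePPDB scores (SimplePPDB score baseline):

```python
def rank_by_score(candidates, score):
    """Rank candidates in decreasing order of a context-independent score.

    `score` is a hypothetical lookup table; candidates missing from it
    are treated as having score 0.
    """
    return sorted(candidates, key=lambda word: score.get(word, 0), reverse=True)
```

Unlike an in-context ranker, this ordering is the same for every sentence in which the complex word appears.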

➢ Evaluation

Overall Simplification System

➢ Evaluation
  • Complex word identification: SVM-context
  • Lexical substitution: AddCos-SimplePPDB
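How the two components might be chained can be sketched as follows. This is an assumed pipeline shape, not the paper's code: `is_complex`, `candidates_for`, and `score` are hypothetical callables standing in for the SVM-context classifier, a SimplePPDB candidate lookup, and the AddCos ranker.

```python
def simplify(tokens, is_complex, candidates_for, score):
    """Replace each token flagged as complex with its best-scoring candidate.

    is_complex(tokens, i)   -> bool       (stand-in for the SVM-context classifier)
    candidates_for(token)   -> list[str]  (stand-in for a SimplePPDB lookup)
    score(cand, tokens, i)  -> float      (stand-in for in-context AddCos ranking)
    """
    output = []
    for i, token in enumerate(tokens):
        if is_complex(tokens, i):
            candidates = candidates_for(token)
            if candidates:
                # Keep the original token when no substitute is available.
                token = max(candidates, key=lambda c: score(c, tokens, i))
        output.append(token)
    return output
```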

Summary
➢ This paper proposes a complex word identification model and a simplification mechanism that relies on a word-embedding lexical substitution model.
  • Complex word identification: SVM-context
  • Lexical substitution: AddCos-SimplePPDB
