BLEU_is_Not_Suitable_for_the_Evaluation_of_Text_Simplification.pdf

MARUYAMA
September 10, 2018

Transcript

  1. BLEU is Not Suitable for the Evaluation of Text Simplification
    Elior Sulem, Omri Abend, Ari Rappoport
    Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
    Literature review by Takumi Maruyama, Nagaoka University of Technology
  2. Abstract
    ➢ Text simplification (TS) [...]
    ➢ BLEU [...] TS [...]
    ➢ [...]
      • [...]
      • [...]
  3. Introduction
    ➢ [...]
      • Wiki-8REF (Xu et al. 2016)
        − [...]
        − [...]
      • HSplit
        − [...]
        − [...]
    ➢ [...] MT-based simplification systems and sentence splitting systems [...]
    ➢ [...]
  4. Gold-Standard Splitting Corpus

  5. Gold-Standard Splitting Corpus
    ➢ Wiki-8REF (Xu et al. 2016) [...]
    ➢ 4 annotators (native: 2, native-like proficiency: 2) [...]
    ➢ 2 guidelines
      1. [...]
      2. [...]
  6. Gold-Standard Splitting Corpus
    ➢ Four corpora (359 sentences each)
      • HSplit1: annotator = native, guideline = 1
      • HSplit2: annotator = native-like, guideline = 1
      • HSplit3: annotator = native, guideline = 2
      • HSplit4: annotator = native-like, guideline = 2
  7. Experiments

  8. Experimental Setup (Metrics)
    ➢ BLEU = BP · exp(Σ_n w_n log p_n)
      • BP: brevity penalty
      • w_n: weights (usually uniform)
      • p_n: modified n-gram precisions
    ➢ iBLEU = α · BLEU(O, R) − (1 − α) · BLEU(O, I)
      • I, O, R: input text, output text, reference text
      • α: parameter
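The BLEU and iBLEU definitions on this slide can be sketched in a few lines of Python. This is a minimal, unsmoothed sentence-level version for illustration only, not the scoring script used in the paper. The function names `ngrams`, `bleu`, and `ibleu`, the whitespace tokenization, the 4-gram limit, and the default `alpha=0.9` are all assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(output, references, max_n=4):
    """Sentence-level BLEU = BP * exp(sum_n w_n * log p_n), uniform w_n = 1/max_n."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        out_counts = ngrams(output, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in out_counts.items())
        total = sum(out_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing: a zero precision makes the score zero
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: compare with the reference length closest to the output.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(output)), length))
    bp = 1.0 if len(output) > ref_len else math.exp(1 - ref_len / len(output))
    return bp * math.exp(log_precision_sum)


def ibleu(input_tokens, output, references, alpha=0.9):
    """iBLEU = alpha * BLEU(O, R) - (1 - alpha) * BLEU(O, I)."""
    return (alpha * bleu(output, references)
            - (1 - alpha) * bleu(output, [input_tokens]))


# Hypothetical example sentences, tokenized on whitespace.
source = "the cat sat on the mat because it was tired".split()
copied = list(source)
print(bleu(copied, [source]), ibleu(source, copied, [source]))
```

Because the second iBLEU term subtracts the overlap with the input, an output that merely copies the input scores lower under iBLEU than under plain BLEU.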
  9. Experimental Setup (Metrics)
    ➢ Flesch-Kincaid Grade Level (FK)
    ➢ SARI
    ➢ Levenshtein distance (LD_SC)
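As a rough illustration of the Levenshtein distance listed on this slide, the sketch below uses the standard dynamic-programming recurrence. The slide does not say whether the distance is taken over characters or tokens, so the hypothetical `levenshtein` function accepts any two sequences (strings or token lists).

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target` (any two sequences)."""
    # previous[j] = distance between the processed prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_item in enumerate(source, start=1):
        current = [i]  # turning source[:i] into an empty target takes i deletions
        for j, t_item in enumerate(target, start=1):
            cost = 0 if s_item == t_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Character-level and token-level use of the same function.
print(levenshtein("kitten", "sitting"))
print(levenshtein("he left because it rained".split(),
                  "he left . it rained .".split()))
```

A small distance between a system output and its input means the system changed very little, so the measure can serve as an indicator of how conservative a system is.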
  10. Experimental Setup (Human evaluation)
    ➢ 70 sentences, 3 annotators
    ➢ [...]
      • Grammaticality (G)
      • Meaning preservation (M)
      • Simplicity (S)
      • Structural Simplicity (StS)
    ➢ 5-point scale
  11. Experimental Setup (Systems)
    ➢ Standard Reference Setting
      • Systems/Corpora without splits
        − Wiki-8REF ([...])
        − Six MT-based simplification systems (NTS, SBMT-SARI, etc.)
      • All systems/Corpora
        − Wiki-8REF + HSplit
        − Six MT-based simplification systems (NTS, SBMT-SARI, etc.)
    ➢ HSplit as Reference Setting
        − HSplit
        − Six sentence splitting systems (DSS, SEMoses, etc.)
  12. Results with Standard Reference Setting
    Sentence-level Spearman correlation (and p-values)
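The results slides report sentence-level Spearman correlations between metric scores and human ratings. As a minimal sketch of that statistic, the code below ranks both score lists (averaging ranks over ties) and takes the Pearson correlation of the ranks. The function names and the toy scores are hypothetical, and p-values are not computed here.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data (invented for illustration): one metric score and one human
# simplicity rating per sentence.
metric_scores = [0.91, 0.75, 0.60, 0.42, 0.30]
human_simplicity = [-1, 0, 0, 1, 2]
print(spearman(metric_scores, human_simplicity))
```

A negative value on data like this would mean that sentences the metric scores highly tend to be the ones humans rate as less simple.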
  13. Results with Standard Reference Setting
    BLEU [...] S, StS [...]
    Sentence-level Spearman correlation (and p-values)
  14. Results with Standard Reference Setting
    [...] G, M [...]
    Sentence-level Spearman correlation (and p-values)
  15. Results with Standard Reference Setting
    [...] BLEU [...]
    Spearman correlation between BLEU-8ref and LD: 0.86
    Spearman correlation between BLEU-8ref and LD: 0.82
    Spearman correlation between BLEU-8ref and LD: 0.52
    Spearman correlation between BLEU-8ref and LD: 0.55
    Sentence-level Spearman correlation (and p-values)
  16. Results with Standard Reference Setting
    SARI [...]
    Sentence-level Spearman correlation (and p-values)
  17. Experimental Setup (Systems)
    ➢ Standard Reference Setting
      • Systems/Corpora without splits
        − Wiki-8REF ([...])
        − Six MT-based simplification systems (NTS, SBMT-SARI, etc.)
      • All systems/Corpora
        − Wiki-8REF + HSplit
        − Six MT-based simplification systems (NTS, SBMT-SARI, etc.)
    ➢ HSplit as Reference Setting
        − HSplit
        − Six sentence splitting systems (DSS, SEMoses, etc.)
  18. Results with HSplit as Reference Setting
    BLEU [...] S, StS [...]
    Sentence-level Spearman correlation (and p-values)
  19. Conclusion
    ➢ BLEU [...] text simplification [...]
      • BLEU [...] Simplicity [...]
      • BLEU [...] Grammaticality, Meaning preservation [...]
        Levenshtein distance [...] Grammaticality, Meaning preservation [...]
    ➢ “BLEU should not be used for the evaluation of text simplification in general and sentence splitting”