BLEU_is_Not_Suitable_for_the_Evaluation_of_Text...

MARUYAMA
September 10, 2018

 BLEU_is_Not_Suitable_for_the_Evaluation_of_Text_Simplification.pdf


Transcript

  1. BLEU is Not Suitable for the Evaluation of Text Simplification

    Elior Sulem, Omri Abend, Ari Rappoport. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).
    Literature review: Takumi Maruyama, Nagaoka University of Technology
  2. Abstract

    Ø Text simplification (TS)
    Ø BLEU, TS
  3. Introduction

    • Wiki-8REF (Xu et al. 2016)
    • HSplit
    Ø MT-based simplification system, sentence splitting system
  4. Gold-Standard Splitting Corpus

    Ø Wiki-8REF (Xu et al. 2016)
    Ø 4 annotators (Native: 2, Native-like proficiency: 2)
    Ø 2 guidelines
  5. Gold-Standard Splitting Corpus

    Ø 4 references (359 sentences)
      • HSplit1: annotator=Native, guideline=1
      • HSplit2: annotator=Native-like, guideline=1
      • HSplit3: annotator=Native, guideline=2
      • HSplit4: annotator=Native-like, guideline=2
  6. Experimental Setup (Metrics)

    Ø BLEU
      • BP: brevity penalty
      • $w_n$: weights (usually uniform)
      • $p_n$: modified n-gram precisions
    Ø iBLEU
      • $I$, $O$, $R$: input text, output text, reference text
      • $\alpha$: parameter
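The formulas themselves did not survive in the transcript. Assuming the usual definitions, BLEU = BP · exp(Σ_n w_n log p_n) with weights w_n and modified n-gram precisions p_n, and iBLEU = α · BLEU(O, R) − (1 − α) · BLEU(O, I) for input I, output O and reference R, the two metrics can be sketched as below. This is a minimal, unsmoothed sentence-level version and not the scoring script used in the paper; the function names, the whitespace tokenization and the default `alpha=0.9` are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision p_n of a candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the single reference where it is most frequent.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """BP = 1 if the candidate is longer than the closest reference, else exp(1 - r/c)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1.0 - r / c)


def bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights w_n = 1 / max_n."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 in this case
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


def ibleu(output, references, source, alpha=0.9):
    """iBLEU = alpha * BLEU(O, R) - (1 - alpha) * BLEU(O, I); alpha is an assumed default."""
    return alpha * bleu(output, references) - (1.0 - alpha) * bleu(output, [source])


# Hypothetical example: an output that simply copies the input sentence.
source = "the cat sat on the mat".split()
references = ["the cat sat on the mat".split(), "a cat sat on a mat".split()]
print(bleu(source, references), ibleu(source, references, source))
```

In this sketch the second term of `ibleu` lowers the score of an output that stays close to the input, which is the only difference from plain `bleu`.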
  7. Experimental Setup (Human evaluation)

    Ø 70 sentences, 3 annotators
      • Grammaticality (G)
      • Meaning preservation (M)
      • Simplicity (S)
      • Structural Simplicity (StS)
    Ø 5-point scale
  8. Experimental Setup (Systems)

    Ø Standard Reference Setting
      • Systems/Corpora without splits
        − Wiki-8REF
        − Six MT-based simplification systems (NTS, SBMT-SARI etc.)
      • All systems/Corpora
        − Wiki-8REF + HSplit
        − Six MT-based simplification systems (NTS, SBMT-SARI etc.)
    Ø HSplit as Reference Setting
        − HSplit
        − Six sentence splitting systems (DSS, SEMoses etc.)
  9. Results with Standard Reference Setting

    BLEU and S, StS
    Sentence-level Spearman correlation (and p-values)
  10. Results with Standard Reference Setting

    G, M
    Sentence-level Spearman correlation (and p-values)
  11. Results with Standard Reference Setting

    BLEU
    BLEU-8ref and LD, Spearman correlation: 0.86, 0.82, 0.52, 0.55
    Sentence-level Spearman correlation (and p-values)
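The result slides report sentence-level Spearman correlations between metric scores and human ratings. As a rough sketch of how such a coefficient is obtained, the code below ranks both score lists (tied values share their mean rank) and takes the Pearson correlation of the ranks. The function names are illustrative, and p-values are not computed here.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sentence scores: one metric value and one human rating each.
metric_scores = [0.31, 0.78, 0.55, 0.90]
human_ratings = [2, 4, 3, 5]
print(spearman(metric_scores, human_ratings))
```

Because only the ranks enter the computation, the coefficient is unchanged by any monotonic rescaling of either score list.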
  12. Experimental Setup (Systems)

    Ø Standard Reference Setting
      • Systems/Corpora without splits
        − Wiki-8REF
        − Six MT-based simplification systems (NTS, SBMT-SARI etc.)
      • All systems/Corpora
        − Wiki-8REF + HSplit
        − Six MT-based simplification systems (NTS, SBMT-SARI etc.)
    Ø HSplit as Reference Setting
        − HSplit
        − Six sentence splitting systems (DSS, SEMoses etc.)
  13.

    Ø BLEU, Text simplification
      • BLEU and Simplicity
      • BLEU and Grammaticality, Meaning preservation; Levenshtein distance and Grammaticality, Meaning preservation
    Ø “BLEU should not be used for the evaluation of text simplification in general and sentence splitting”
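The Levenshtein distance mentioned above counts the minimum number of insertions, deletions and substitutions needed to turn one sequence into another. The sketch below is the standard dynamic-programming formulation; whether the paper computes it over characters or tokens is not recoverable from the slides, so the function accepts any sequence, and the example sentences are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences; each insertion, deletion or substitution costs 1."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # delete item_a
                               current[j - 1] + 1,       # insert item_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


# Character-level and token-level use of the same function.
print(levenshtein("kitten", "sitting"))
print(levenshtein("the cat sat".split(), "the cat sat down".split()))
```

A distance of 0 means the output is identical to the sentence it is compared with, so a small distance to the input indicates a conservative system that changes little.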