Slide 3
Introduction
● Metrics based on surface similarity (BLEU, ROUGE)
○ measure the surface similarity between the reference and the candidate
○ correlate poorly with human judgment
● Metrics based on learned components
○ High correlation with human judgment
○ Fully learned metrics (BEER, RUSE, ESIM)
■ are trained end-to-end, and rely on handcrafted features and/or
learned embeddings
■ offer great expressivity
○ Hybrid metrics (YiSi, BERTScore)
■ combine trained elements, e.g., contextual embeddings, with
handwritten logic, e.g., token alignment rules
■ offer robustness
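
To make the first bullet concrete, here is a minimal sketch of a surface-similarity score: clipped n-gram precision, which is one ingredient of BLEU. It is not full BLEU (no brevity penalty, no averaging over n-gram orders), and the function names and example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(reference, candidate, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the "clipping" used in BLEU).
    Tokenization is plain whitespace splitting, an assumption of this sketch.
    """
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


# Hypothetical example sentences, chosen only to illustrate the point.
reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"   # same meaning, different words
copy_with_error = "the cat sat on the hat"    # different meaning, same words

print(f"{clipped_precision(reference, paraphrase):.2f}")       # 0.17
print(f"{clipped_precision(reference, copy_with_error):.2f}")  # 0.83
```

The paraphrase keeps the meaning but shares only one token with the reference, so it scores far below the candidate that changes the meaning. This is the kind of mismatch with human judgment that motivates the learned and hybrid metrics listed above.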