Slide 35
References
[1] Learning an Unreferenced Metric for Online Dialogue Evaluation, K. Sinha et al., ACL 2020, https://arxiv.org/abs/2005.00583
[2] Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation, W. Liang et al., ACL 2020, https://arxiv.org/abs/2005.10716
[3] Evaluating Dialogue Generation Systems via Response Selection, S. Sato et al., ACL 2020, https://arxiv.org/abs/2004.14302
[4] Speaker Sensitive Response Evaluation Model, J. Bak et al., ACL 2020, https://arxiv.org/abs/2006.07015
[5] USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation, S. Mehri et al., ACL 2020, https://arxiv.org/abs/2005.00456
[6] Designing Precise and Robust Dialogue Response Evaluators, T. Zhao et al., ACL 2020, https://arxiv.org/abs/2004.04908
[7] uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems, T. Yuma et al., ACL 2020, https://www.aclweb.org/anthology/2020.acl-srw.27/
[8] Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation, B. Pang et al., ACL 2020, https://www.aclweb.org/anthology/2020.acl-main.333/
[9] RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems, C. Tao et al., AAAI 2018, https://arxiv.org/abs/1701.03079