Slide 97
References
P.68
[Pan+, 2020] Pan, Y., Yao, T., Li, Y., & Mei, T. X-Linear Attention Networks for Image Captioning. CVPR2020.
[Ziegler+, 2019] Ziegler, D. M., Stiennon, N., Wu, J., Brown, T. B., Radford, A., Amodei, D., Christiano, P., & Irving, G.
Fine-Tuning Language Models from Human Preferences. arXiv. http://arxiv.org/abs/1909.08593
[Stiennon+, 2020] Stiennon, N., Ouyang, L., Wu, J., Ziegler, D. M., Lowe, R., Voss, C., Radford, A., Amodei, D., &
Christiano, P. Learning to summarize from human feedback. NeurIPS2020.
P.72
[Bengio+, 2015] Bengio, S., Vinyals, O., Jaitly, N., & Shazeer, N. Scheduled Sampling for Sequence Prediction
with Recurrent Neural Networks. NIPS2015, 1171–1179.
P.78
[Schulman+, 2017] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal Policy Optimization
Algorithms. arXiv, 2017.
P.79
[Choshen+, 2020] Choshen, L., Fox, L., Aizenbud, Z., & Abend, O. On the Weaknesses of
Reinforcement Learning for Neural Machine Translation. ICLR2020.
P.84
[Benotti+, 2021] Benotti, L., & Blackburn, P. Grounding as a Collaborative Process. EACL2021, 515–531.
P.86
[Nguyen+, 2019] Nguyen, K., & Daumé III, H. Help, Anna! Visual Navigation with Natural Multimodal Assistance
via Retrospective Curiosity-Encouraging Imitation Learning. EMNLP2019.