Dumitru Erhan. Show and tell: A neural image caption generator. CVPR 2015.
[Agrawal+, 2016] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh. VQA: Visual Question Answering. ICCV 2015.
[Das+, 2018] Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra. Embodied Question Answering. CVPR 2018.
[Xu+, 2018] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. CVPR 2018.
[Bisk+, 2016] Yonatan Bisk, Deniz Yuret, Daniel Marcu. Natural Language Communication with Robots. NAACL 2016.

P.6
[Wang+, 2019] Yujia Wang, Wenguan Wang, Wei Liang, Lap-Fai Yu. Comic-Guided Speech Synthesis. SIGGRAPH Asia 2019.
[Bojanowski+, 2015] Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid. Weakly-Supervised Alignment of Video With Text. ICCV 2015.
[Li+, 2017] Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang. Person Search with Natural Language Description. CVPR 2017.