Slide 28
References
P.5
[Vinyals+, 2015] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and Tell: A Neural Image
Caption Generator. CVPR 2015.
[Agrawal+, 2016] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence
Zitnick, and Devi Parikh. VQA: Visual Question Answering. ICCV 2015.
[Das+, 2018] Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, and Dhruv Batra. Embodied
Question Answering. CVPR 2018.
[Xu+, 2018] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, and Xiaodong He.
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. CVPR 2018.
[Bisk+, 2016] Yonatan Bisk, Deniz Yuret, and Daniel Marcu. Natural Language Communication with Robots. NAACL 2016.
P.6
[Wang+, 2019] Yujia Wang, Wenguan Wang, Wei Liang, and Lap-Fai Yu. Comic-Guided Speech Synthesis. SIGGRAPH
Asia 2019.
[Bojanowski+, 2015] Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, and
Cordelia Schmid. Weakly-Supervised Alignment of Video With Text. ICCV 2015.
[Li+, 2017] Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, and Xiaogang Wang. Person Search with
Natural Language Description. CVPR 2017.