Slide 71
Slide 71 text
• Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, Tom Yeh: VizWiz: nearly real-time answers to visual questions. UIST 2010: 333-342
• Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey P. Bigham: VizWiz Grand Challenge: Answering Visual Questions From Blind People. CVPR 2018: 3608-3617
• Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach: Towards VQA Models That Can Read. CVPR 2019: 8317-8326
• Minesh Mathew, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar: DocVQA: A Dataset for VQA on Document Images. WACV 2021
• Lu Chen, Xingyu Chen, Zihan Zhao, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, Kai Yu: WebSRC: A Dataset for Web-Based Structural Reading Comprehension. CoRR abs/2101.09465 (2021)
• Ryota Tanaka, Kyosuke Nishida, Sen Yoshida: VisualMRC: Machine Reading Comprehension on Document Images. AAAI 2021
• Oleksii Sidorov, Ronghang Hu, Marcus Rohrbach, Amanpreet Singh: TextCaps: A Dataset for Image Captioning with Reading Comprehension. ECCV (2) 2020: 742-758
• Tsu-Jui Fu, William Yang Wang, Daniel J. McDuff, Yale Song: DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents. CoRR abs/2101.11796 (2021)
• Yang Li, Gang Li, Luheng He, Jingjie Zheng, Hong Li, Zhiwei Guan: Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements. EMNLP (1) 2020: 5495-5510
References: Linguistic Information Contained in Visual Information (Datasets)