Images Using 1 Million Captioned Photographs. NIPS 2011: 1143-1151
• Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick: Microsoft COCO: Common Objects in Context. ECCV (5) 2014: 740-755
• Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick: Microsoft COCO Captions: Data Collection and Evaluation Server. CoRR abs/1504.00325 (2015)
• Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei: Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. Int. J. Comput. Vis. 123(1): 32-73 (2017)
• Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut: Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning. ACL (1) 2018: 2556-2565
• Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei A. F. Florêncio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo: TAP: Text-Aware Pre-training for Text-VQA and Text-Caption. CoRR abs/2012.04638 (2020)
• Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut: Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts. CoRR abs/2102.08981 (2021)
• Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork: WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning. CoRR abs/2103.01913 (2021)
Large-scale image caption datasets