Slide 26
References
Mathew, M., Karatzas, D., Jawahar, C.V. (2021). DocVQA: A Dataset for VQA on Document Images. WACV 2021.
Huang, Z. et al. (2019). ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction. ICDAR 2019.
Park, S. et al. (2019). CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. NeurIPS 2019 Workshop.
Yu, W. et al. (2025). BBox-DocVQA: Bounding-Box-Grounded Dataset for Document VQA. arXiv:2511.15090.
Nourbakhsh, A. et al. (2025). Where is this coming from? Making groundedness count in Document VQA. NAACL Findings 2025.
Onami, E. et al. (2024). JDocQA: Japanese Document Question Answering Dataset for Generative Language Models. LREC-COLING 2024.
Fujitake, M. (2024). JaPOC: Japanese Post-OCR Correction Benchmark. arXiv:2409.19948.
Liu, Y. et al. (2024). OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models. Science China Information Sciences (SCIS).
Fu, L. et al. (2025). OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning. arXiv:2501.00321.
Acknowledgements: overwhelming thanks to everyone who helped with annotation 🙏
🙏🙏
Gen Sato
Nanami Kato
Yusuke Tanimiya