R. et al. (2023). SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images. arXiv preprint arXiv:2301.04883.
[2] Onami, E. et al. (2024). JDocQA: Japanese Document Question Answering Dataset for Generative Language Models. arXiv preprint arXiv:2403.19454.
[3] Suri, M. et al. (2025). VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation. arXiv preprint arXiv:2412.10704.