Slide 46
Copyright 2022 NTT CORPORATION
Copyright 2024 NTT CORPORATION
Challenge 1: Document image understanding
• The task of visually understanding and reading real-world documents (as images)
VisualMRC [Tanaka&Nishida+, AAAI’21]
PubLayNet [Zhong+, ICDAR’19]
Screen2Words [Wang+, UIST’21]
Zhong+, PubLayNet: largest dataset ever for document layout analysis, ICDAR’19
Tanaka+, VisualMRC: Machine Reading Comprehension on Document Images, AAAI’21
Wang+, Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning, UIST’21