Slide 70
References (Visual Reading Comprehension Models)
- Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, “LayoutLM: Pre-training of Text
and Layout for Document Image Understanding”, in KDD20
- Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, “LayoutLMv2: Multi-modal
Pre-training for Visually-Rich Document Understanding”, in ACL21
- Ryota Tanaka, Kyosuke Nishida, Shuichi Nishioka, “VisualMRC: Machine Reading Comprehension on
Document Images”, in AAAI21
- Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park, “BROS: A
Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from
Documents”, in AAAI22
- Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si, “StructuralLM: Structural
Pre-training for Form Understanding”, in ACL21
- Geewook Kim, Teakgyu Hong, Moonbin Yim, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo
Yun, Dongyoon Han, Seunghyun Park, “Donut: Document Understanding Transformer without OCR”, in
arXiv:2111.15664
- Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, R. Manmatha, “DocFormer:
End-to-End Transformer for Document Understanding”, in ICCV21
- Rafał Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka,
“Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer”, in
ICDAR21
- Xiang Deng, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Huan Sun, “DOM-LM: Learning
Generalizable Representations for HTML Documents”, in arXiv:2201.10608