Slide 17
Related Work: Pretrained V&L Models
• Two-stream transformer
  • LXMERT [Tan+, 2019], ViLBERT [Lu+, 2019]
• Single-stream transformer
  • VisualBERT [Li+, 2020], VL-BERT [Su+, 2020]
• Using entities
  • CMR [Zheng+, 2020]
• Using object detection-based objectives
  • UNITER [Chen+, 2019], Unicoder-VL [Li+, 2020]
• Video-language task
  • VideoBERT [Sun+, 2019]
  • CBT [Sun+, 2019]
2021/7/31 17
Novelty of the proposed method
Self-supervised learning requires no text data at all