Coarse-to-Fine Vision-Language Pre-training
with Fusion in the Backbone
Zi-Yi Dou*‡, Aishwarya Kamath*♮, Zhe Gan*†♠, Pengchuan Zhang§, Jianfeng Wang†, Linjie Li†,
Zicheng Liu†, Ce Liu†, Yann LeCun♮, Nanyun Peng‡, Jianfeng Gao†, Lijuan Wang†
†Microsoft ‡University of California, Los Angeles ♮New York University
*Equal Technical Contribution ♠Project Lead §Work done while at Microsoft
Keio University, Komei Sugiura Laboratory, Shumpei Hatanaka (畑中駿平)
Dou, Zi-Yi, et al. "Coarse-to-fine vision-language pre-training with fusion in the backbone." NeurIPS 2022.