Aishwarya Kamath*♮, Zhe Gan*†♠, Pengchuan Zhang§, Jianfeng Wang†, Linjie Li†, Zicheng Liu†, Ce Liu†, Yann LeCun♮, Nanyun Peng‡, Jianfeng Gao†, Lijuan Wang†

†Microsoft ‡University of California, Los Angeles ♮New York University
*Equal Technical Contribution ♠Project Lead §Work done while at Microsoft

Keio University, Komei Sugiura Laboratory, Shumpei Hatanaka (畑中駿平)

Dou, Zi-Yi, et al. "Coarse-to-fine vision-language pre-training with fusion in the backbone." NeurIPS 2022.