[Lin+, ECCV14]: Lin, Tsung-Yi, et al. "Microsoft COCO: Common Objects in Context.", ECCV14
[Plummer+, ICCV15]: Plummer, Bryan A., et al. "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models.", ICCV15
[Karan+, CVPR21]: Desai, Karan, and Justin Johnson. "VirTex: Learning Visual Representations from Textual Annotations.", CVPR21
[Zhang+, CVPR20]: Zhang, Qi, et al. "Context-Aware Attention Network for Image-Text Retrieval.", CVPR20
[Chen+, CVPR21]: Chen, Jiacheng, et al. "Learning the Best Pooling Strategy for Visual Semantic Embedding.", CVPR21
[Frome+, NIPS13]: Frome, Andrea, et al. "DeViSE: A Deep Visual-Semantic Embedding Model.", NIPS13
[Song+, CVPR19]: Song, Yale, and Mohammad Soleymani. "Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval.", CVPR19
[Chun+, CVPR21]: Chun, Sanghyuk, et al. "Probabilistic Embeddings for Cross-Modal Retrieval.", CVPR21
[Locatello+, NeurIPS20]: Locatello, Francesco, et al. "Object-Centric Learning with Slot Attention.", NeurIPS20
[DL Reading Group] Object-Centric Learning with Slot Attention, https://www.slideshare.net/DeepLearningJP2016/dlobjectcentric-learning-with-slot-attention
References