Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever (OpenAI)
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020, 2021.
Komei Sugiura Lab (杉浦孔明研究室), 上田雄斗
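Zero-shot label probabilities (Label / Prob [%]):
Cat example
  a photo of four cats         84.57
  a photo of two cats           0.05
Carrot example
  a photo of three carrots     12.65
  a photo of two carrots       82.47
  a photo of five carrots       4.88
These results suggest that CLIP does not work well for tasks that require counting the objects in an image.
[Proposed improvement] Fine-tune CLIP using a separately prepared set of number-related labels.
https://github.com/openai/CLIP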
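The label probabilities in the example come from CLIP's zero-shot classification: each candidate caption is embedded, and a softmax is taken over the scaled cosine similarities between the image embedding and each caption embedding (the openai/CLIP README example uses a scale of 100). The pure-Python sketch below reproduces only that scoring step. It assumes hypothetical toy 3-dimensional embeddings in place of the real image and text encoders. The count-prompt helper and the cross-entropy `count_loss` are illustrative assumptions: `count_loss` is one plausible objective for the fine-tuning idea, not something the slide specifies.

```python
import math
from typing import List, Sequence

# Hypothetical candidate counts (plural nouns only, so "one" is omitted).
NUMBER_WORDS = ["two", "three", "four", "five"]


def count_prompts(obj: str, words: Sequence[str] = NUMBER_WORDS) -> List[str]:
    """Build one caption per candidate count, using the slide's template."""
    return [f"a photo of {w} {obj}" for w in words]


def normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a (nonzero) embedding vector."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def label_probs(image_emb: Sequence[float],
                text_embs: Sequence[Sequence[float]],
                logit_scale: float = 100.0) -> List[float]:
    """Softmax over scaled cosine similarities (CLIP-style zero-shot scoring)."""
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
              for t in text_embs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def count_loss(probs: Sequence[float], gt_index: int) -> float:
    """Cross-entropy against the ground-truth count label (assumed objective)."""
    return -math.log(probs[gt_index])


if __name__ == "__main__":
    prompts = count_prompts("carrots")
    # Toy embeddings standing in for CLIP's encoders (hypothetical values).
    text_embs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 1.0, 0.0], [0.0, 0.6, 0.8]]
    image_emb = [0.9, 0.5, 0.0]
    probs = label_probs(image_emb, text_embs)
    for p, prob in zip(prompts, probs):
        print(f"{p:28s} {100 * prob:6.2f}")
    print("loss if GT is 'three carrots':", count_loss(probs, gt_index=1))
```

Fine-tuning with number labels would, under this assumption, push the probability of the correct count caption up, which lowers `count_loss`. In the real model the gradients would flow into the encoders rather than into fixed toy vectors.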