Slide 10
© GO Inc.
● Data quality and diversity are critical for training CLIP models [24, 26, 57]
● Obtained 6.6M images covering 440K species from the Encyclopedia of Life project
● Also used 2.7M images of 10K species from iNat21 and the insect dataset BIOSCAN-1M
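The three sources above are combined into TreeOfLife-10M. A minimal sketch of the tally, using the image counts from the slide (the ~1M figure for BIOSCAN-1M is an assumption inferred from its name, not stated on the slide):

```python
# Approximate composition of TreeOfLife-10M, per the slide.
# BIOSCAN-1M's image count is assumed from its name (not given on the slide).
sources = {
    "Encyclopedia of Life": {"species": 440_000, "images": 6_600_000},
    "iNat21":               {"species": 10_000,  "images": 2_700_000},
    "BIOSCAN-1M":           {"species": None,    "images": 1_000_000},  # assumed
}

# Summing the per-source image counts lands near 10M images,
# which is where the "10M" in the dataset name comes from.
total_images = sum(s["images"] for s in sources.values())
print(f"total images: {total_images:,}")
```

Under these assumed counts the total is roughly 10.3M images, consistent with the dataset's name.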
The TreeOfLife-10M dataset
[Figure: sample images from the Encyclopedia of Life project; taxa included in TreeOfLife-10M]
[24] Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, and Ludwig Schmidt. Data determines distributional robustness in contrastive language image pre-training (CLIP). In International Conference on Machine Learning, pages 6216–6234, 2022.
[26] Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, et al. DataComp: In search of the next generation of multimodal datasets. arXiv preprint arXiv:2304.14108, 2023.
[57] Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, and Ludwig Schmidt. Quality not quantity: On the interaction between dataset design and robustness of CLIP. In Advances in Neural Information Processing Systems, pages 21455–21469, 2022.