Slide 37
CLaSP: CLIP-like multimodal learning for materials science
• Computer vision has fostered large-scale datasets of images with textual annotations (e.g., ImageNet,
MS-COCO), enabling multimodal learning between text and images (CLIP, 2021).
• Materials science lacks such resources, mainly because such annotations are difficult to crowdsource.
• Instead, we leverage a public database of 400k materials with publication metadata (titles
and abstracts) to enable contrastive learning between text and structure.
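The CLIP-like objective referenced above can be sketched as a symmetric InfoNCE loss over paired text and structure embeddings. This is a minimal NumPy illustration of the general CLIP recipe, not the authors' actual implementation; the function name, temperature value, and embedding shapes are illustrative assumptions.

```python
import numpy as np

def clip_style_loss(text_emb, struct_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss, CLIP-style.

    text_emb, struct_emb: (B, D) arrays where row i of each is a
    matched text/structure pair. Matched pairs are pulled together,
    mismatched pairs in the batch are pushed apart.
    Note: names and temperature are illustrative, not from the paper.
    """
    # L2-normalize so the dot product is cosine similarity
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    s = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)

    logits = t @ s.T / temperature  # (B, B) pairwise similarities
    idx = np.arange(logits.shape[0])  # correct pair sits on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average the text-to-structure and structure-to-text directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In training, the text encoder would embed publication titles and abstracts while a structure encoder embeds the crystal, and this loss aligns the two embedding spaces.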
Yuta Suzuki, Tatsunori Taniai, Ryo Igarashi, Kotaro Saito, Naoya Chiba, Yoshitaka Ushiku, Kanta Ono. Bridging Text and Crystal Structures: Literature-driven
Contrastive Learning for Materials Science. Machine Learning: Science and Technology (2025). Also appeared at NeurIPS 2024 AI4Mat and CVPR 2025 MM4Mat.