Slide 10
Slide 10 text
CLIP: Contrastive Language-Image Pre-training
CLIP is a bridge between NLP and computer vision, connecting text and images.
It pairs a text encoder with an image encoder, trained jointly on 400 million image-text pairs.
● DALL-E, DALL-E 2
● Stable Diffusion
● Imagen, Imagen 2, Imagen 3
Paper: Learning Transferable Visual Models From Natural Language Supervision
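Sketch: a contrastive objective for the two encoders
The text and image encoders described on this slide are trained so that matching image-text pairs land close together in a shared embedding space. The NumPy sketch below illustrates a symmetric contrastive loss of that kind. It is an illustration under assumptions, not the paper's training code: random vectors stand in for encoder outputs, and the function name clip_style_loss and the fixed temperature of 0.07 are hypothetical choices made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (n, d) arrays; row i of each is a matching pair.
    # temperature is an assumed fixed value chosen for this sketch.
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    logits = image_emb @ text_emb.T / temperature  # (n, n) similarity matrix
    idx = np.arange(logits.shape[0])               # correct pairs sit on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return (loss_i2t + loss_t2i) / 2

# Stand-ins for encoder outputs: 4 pairs, 8-dimensional embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))
text_emb = rng.normal(size=(4, 8))

random_loss = clip_style_loss(image_emb, text_emb)    # unrelated embeddings
aligned_loss = clip_style_loss(image_emb, image_emb)  # perfectly matched pairs
orthogonal_loss = clip_style_loss(np.eye(4), np.eye(4))
```

When the two sets of embeddings agree (the aligned and orthogonal cases), each diagonal entry dominates its row and column, so the loss falls toward zero. Unrelated embeddings give a noticeably larger value.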