2”: images generated from text [Ramesh (OpenAI)+, arXiv '22.04]
"vibrant portrait painting of Salvador Dalí with a robotic half face"
"a shiba inu wearing a beret and black turtleneck"
Generation with CLIP Latents. CoRR abs/2204.06125 (2022)
2. Jean-Baptiste Alayrac et al.: Flamingo: a Visual Language Model for Few-Shot Learning. NeurIPS 2022
3. Tom B. Brown et al.: Language Models are Few-Shot Learners. NeurIPS 2020
4. Alec Radford et al.: Learning Transferable Visual Models From Natural Language Supervision. ICML 2021
5. Guy Tevet et al.: MotionCLIP: Exposing Human Motion Generation to CLIP Space. ECCV 2022
6. Jason Wei et al.: Finetuned Language Models Are Zero-Shot Learners. ICLR 2022
7. Wenliang Dai et al.: InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. CoRR abs/2305.06500 (2023)
8. Haotian Liu et al.: Visual Instruction Tuning. NeurIPS 2023
9. Long Ouyang et al.: Training language models to follow instructions with human feedback. NeurIPS 2022
10. OpenAI: GPT-4 Technical Report. CoRR abs/2303.08774 (2023)
11. Zhengyuan Yang et al.: The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision). CoRR abs/2309.17421 (2023)