Slide 136
Fine-tuning diffusion models
"An astronaut riding a horse in a photorealistic style"
"Teddy bears shopping for groceries in the style of ukiyo-e"
SORA
(OpenAI, 2024)
Diffusion model
DALL·E: [Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever: Zero-Shot Text-to-Image Generation. ICML 2021.]
DALL·E 2: [Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen: Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv:2204.06125.]
Optimizing distribution
[Rafailov et al. 2024]
• Post-training, e.g., preference optimization
• Bayesian inference
• Reinforcement learning
E.g., DPO, Bayes filtering
𝜇_ref: pretrained diffusion model (the reference distribution)
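As a rough illustration of preference optimization against a pretrained reference, the sketch below computes the DPO loss of Rafailov et al. for a single preference pair. It assumes log-likelihoods of the preferred and rejected samples are available under both the fine-tuned model and the reference 𝜇_ref; for diffusion models these usually have to be approximated, which this sketch does not attempt. The function names and the default `beta` are illustrative choices, not taken from the slide.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)); only exponentiates non-positive values."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (preferred, rejected) pair.

    logp_w, logp_l         : log-likelihoods under the model being fine-tuned
    ref_logp_w, ref_logp_l : log-likelihoods under the frozen reference (mu_ref)
    beta                   : strength of the pull toward the reference (assumed value)
    """
    # Implicit reward margin: how much more the model favors the preferred
    # sample over the rejected one, relative to the reference model.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    # Model identical to the reference: the margin is zero.
    print(dpo_loss(-5.0, -6.0, -5.0, -6.0))
    # Model shifted toward the preferred sample: the loss decreases.
    print(dpo_loss(-4.0, -7.0, -5.0, -6.0))
```

When the fine-tuned model coincides with the reference, the margin is zero and the loss equals log 2; it falls below that value as the model shifts probability toward the preferred sample.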