Synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Muller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach
Stability AI, UK; the original paper behind Stable Diffusion v3

Image and Video Dept. / Generation team
Shunsuke Kitada, Ph.D.
HP: shunk031.me / 𝕏: @shunk031

Note: the figures and equations shown in this presentation are quoted from the paper under discussion and the accompanying blog post: stability.ai/news/stable-diffusion-3
vector at inference: even with this padding, simple prompts are still rendered without breaking down
◦ Because the three text encoders are randomly zero-filled during training, the model can perhaps compensate to some extent?

Comparison results 4/4

Prompts:
“A burger patty, with the bottom bun and lettuce and tomatoes. "COFFEE" written on it in mustard”
“A monkey holding a sign reading "Scaling transformer models is awesome!"”
“A mischievous ferret with a playful grin squeezes itself into a large glass jar, surrounded by colorful candy. The jar sits on a wooden table in a cozy kitchen, and warm sunlight filters through a nearby window”

[Figure: generated images for each prompt; panels: All text-encoders / without T5]
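
The random zero-filling of text encoders described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `drop_text_encoders` and `zero_out`, the dictionary keys, the flat-list embeddings, and the drop probability of 0.5 are all assumptions made for the example.

```python
import random


def drop_text_encoders(embeddings, drop_prob, rng):
    """Training-time sketch: independently zero-fill each encoder's output.

    embeddings: dict mapping a (hypothetical) encoder name to a list of floats.
    drop_prob:  probability of zeroing each encoder (illustrative value only).
    rng:        a random.Random instance, passed in for reproducibility.
    """
    out = {}
    for name, emb in embeddings.items():
        if rng.random() < drop_prob:
            # Replace the embedding with zeros of the same length.
            out[name] = [0.0] * len(emb)
        else:
            out[name] = list(emb)
    return out


def zero_out(embeddings, name):
    """Inference-time sketch: zero-fill one chosen encoder (e.g. T5)."""
    out = {key: list(emb) for key, emb in embeddings.items()}
    out[name] = [0.0] * len(out[name])
    return out


# Toy embeddings; real encoder outputs are large tensors, not short lists.
example = {
    "clip_l": [0.1, 0.2],
    "clip_g": [0.3, 0.4],
    "t5": [0.5, 0.6, 0.7],
}

rng = random.Random(0)
train_view = drop_text_encoders(example, drop_prob=0.5, rng=rng)
infer_view = zero_out(example, "t5")
```

Because the model sees zero-filled inputs for every encoder during training, the `infer_view` case (T5 zeroed, the other two intact) is a condition it has already encountered, which is one plausible reading of why the "without T5" results hold up on simple prompts.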