GenAI for Visual Content Creation - DCOC Summit 2024

Slide 1

Slide 1 text

Developer Creators and Online Communities Summit Generative AI for Visual Content Creation Margaret Maynard-Reid #GoogleDCOC 1

Slide 2

Slide 2 text

@margaretmz #GoogleDCOC ML GDE (Google Developer Expert) 3D artist Fashion Designer Instructor at UW About me margaretmz.art 2

Slide 3

Slide 3 text

Content: 01 Intro to Generative AI 02 Design workflow 03 Create images 04 Create music 05 Create videos/animations 06 Future trends #GoogleDCOC 3

Slide 4

Slide 4 text

1. Intro to GenAI #GoogleDCOC 4

Slide 5

Slide 5 text

@margaretmz #GoogleDCOC What is Generative AI? Train ML models to generate new content: ● Text ● Image ● Video/animations ● Audio, e.g. music ● Code… The focus of my talk today is visual! “Generative models: take a machine, observe many samples from a distribution and generate more samples from that same distribution”. - Ian Goodfellow 2016 5
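
The Goodfellow quote above can be illustrated with a deliberately tiny sketch. It assumes the "distribution" is a one-dimensional Gaussian and the "machine" is nothing more than an estimate of its mean and standard deviation; real generative models for images or music learn far richer distributions. The names `fit_gaussian` and `generate` are hypothetical and chosen only for this illustration.

```python
import random
import statistics


def fit_gaussian(samples):
    """Observe samples and estimate the distribution's mean and standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate(mean, stdev, n, rng):
    """Generate n new samples from the fitted distribution."""
    return [rng.gauss(mean, stdev) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# "Observe many samples from a distribution" (here: a Gaussian with mean 5, stdev 2).
observed = [rng.gauss(5.0, 2.0) for _ in range(10_000)]

# "Generate more samples from that same distribution."
mean, stdev = fit_gaussian(observed)
new_samples = generate(mean, stdev, 5, rng)

print(f"fitted mean={mean:.2f}, stdev={stdev:.2f}")
print(new_samples)
```

The new samples are not copies of the observed ones; they are fresh draws from the distribution the model estimated, which is the core idea behind the model families listed on the next slide.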

Slide 6

Slide 6 text

@margaretmz #GoogleDCOC Types of Generative Models ● Generative Adversarial Networks (GANs) ● Variational Autoencoders (VAEs) ● Flow-based models ● Diffusion models ● Large Language Models (LLMs) Source: Lilian Weng blog (link) 6

Slide 7

Slide 7 text

@margaretmz #GoogleDCOC Catalysts of the current generative AI craze ● Stable Diffusion (released by Stability AI) - August 10, 2022 ● ChatGPT - Nov 30, 2022 ● The rise of LLMs! 7

Slide 8

Slide 8 text

@margaretmz #GoogleDCOC How to use generative AI? So many options: ● Use a tool with GUI ○ Web browser e.g. Bard, ChatGPT ○ Discord, e.g. Midjourney ○ Desktop applications: Photoshop or Illustrator, e.g. Firefly ○ Cloud: Google, Microsoft, Amazon… ● Use existing open-source code ● Fine-tune model with your own data ● Hugging Face - all of the above! ← Hugging Face and Google partner for open AI collaboration 8

Slide 9

Slide 9 text

2. The Design Workflow #GoogleDCOC 9

Slide 10

Slide 10 text

@margaretmz #GoogleDCOC The visual design workflow ● Theme & moodboard => color palette ● Sketches ● High fidelity Images ● Music ● Image to Video ● Upload video ● Social 10

Slide 11

Slide 11 text

@margaretmz #GoogleDCOC Step by step ● Generate single image ● Generate music ● Create video: image + music ⇒ video ● Upload to YouTube ● Generate thumbnail and end screen ● Generate social media posting (text) 11
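
The "image + music ⇒ video" step above can be sketched outside of any GUI tool. The talk does not name a tool for this step, so the use of the ffmpeg command-line program here is an assumption, as are the file names and the hypothetical helper `build_ffmpeg_command`. The sketch only builds and prints the command; it does not run ffmpeg.

```python
import shlex


def build_ffmpeg_command(image_path, audio_path, output_path):
    """Return an ffmpeg argument list that shows one still image for the length of the audio."""
    return [
        "ffmpeg",
        "-loop", "1",           # repeat the single input image as video frames
        "-i", image_path,       # input 0: the generated image
        "-i", audio_path,       # input 1: the generated music
        "-c:v", "libx264",      # encode video as H.264
        "-tune", "stillimage",  # encoder tuning for static content
        "-c:a", "aac",          # encode audio as AAC
        "-pix_fmt", "yuv420p",  # pixel format widely supported by players
        "-shortest",            # stop when the shortest stream (here, the audio) ends
        output_path,
    ]


# File names below are placeholders for illustration only.
cmd = build_ffmpeg_command("cover.png", "track.mp3", "video.mp4")
print(shlex.join(cmd))
```

To actually produce the file, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.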

Slide 12

Slide 12 text

Slide 13

Slide 13 text

@margaretmz #GoogleDCOC ColorMagic https://colormagic.app/ Type a prompt describing the colors you would like to have, such as “Beautiful sunflowers”. A color palette of 5 colors is then generated automatically. 13

Slide 14

Slide 14 text

3. 2D / 3D Images #GoogleDCOC 14

Slide 15

Slide 15 text

@margaretmz #GoogleDCOC Tools for generating images 2D images ● Midjourney ● ChatGPT + DALL·E ● Bing Image Creator ● Meta AI Generator ● Adobe Firefly… ● Google’s Imagen 2 (restricted General Availability only) 3D images ● Luma AI UI/UX Design ● Adobe Express ● Canva ● Microsoft Designer 15

Slide 16

Slide 16 text

@margaretmz #GoogleDCOC Generate image ● Prompt + Reference image ● Choose a tool ● How to get a prompt? ○ Lexica.art ○ Midjourney /describe ○ Use a model to caption the image 16

Slide 17

Slide 17 text

@margaretmz #GoogleDCOC Lexica.art Lexica.art is one of my favorite places to get a prompt! ● First I type a description of what I want ● Select an image that I like ● Then copy the prompt which I can use in another GenAI tool… 17

Slide 18

Slide 18 text

@margaretmz #GoogleDCOC Prompt Engineering Best Practices Guides: ● Prompt Design Strategies (Google AI for Developers) ● Prompt Engineering Best Practices (Laurence Moroney) ● Prompt Engineering Guide (OpenAI) Hands-on ● Prompt Design - Best Practices (Google Cloud GenAI for Developers Learning Path, Course #7 Generative AI Explorer - Vertex AI ) 18

Slide 19

Slide 19 text

@margaretmz #GoogleDCOC Midjourney - Text to Image Here is how I created the image shown on the previous slide: ● First I created a few images with text-to-image in Midjourney, with the prompt = “Prompt Engineering Best Practices” ● I liked the second image best, so I downloaded it; it has a size of 1024x1024 19

Slide 20

Slide 20 text

@margaretmz #GoogleDCOC Adobe Firefly (in Photoshop) - Generative Fill Now I use Photoshop’s Generative Fill to expand the image: ● Increase canvas size ● Generative fill without prompt ● Image expanded! 20
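
The "increase canvas size" step can be worked out numerically before opening Photoshop. The sketch below is an illustration under stated assumptions: the slide does not give the target size, so the 16:9 ratio is assumed (a common video format), and `expand_canvas` is a hypothetical helper, not part of any Adobe API. It computes the new canvas size and where the original 1024x1024 image sits so that the empty margins can be filled by Generative Fill.

```python
def expand_canvas(width, height, target_ratio):
    """Return (new_width, new_height, pad_left, pad_top).

    The new canvas matches target_ratio (width / height), fully contains the
    original image, and keeps it centered. The padding is the empty area
    that a generative fill step would need to paint.
    """
    if width / height < target_ratio:
        # Image is too narrow for the target: widen the canvas.
        new_width, new_height = round(height * target_ratio), height
    else:
        # Image is too wide (or already matches): make the canvas taller.
        new_width, new_height = width, round(width / target_ratio)
    pad_left = (new_width - width) // 2
    pad_top = (new_height - height) // 2
    return new_width, new_height, pad_left, pad_top


# 1024x1024 is the Midjourney download size from the previous slide;
# the 16:9 target is an assumption made for this example.
print(expand_canvas(1024, 1024, 16 / 9))
```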

Slide 21

Slide 21 text

@margaretmz #GoogleDCOC Bing Image Creator https://www.bing.com/images/ ● Powered by OpenAI’s DALL·E ● Has a great understanding of what a qipao is ● Keeps track of my creation history ● Seamless integration with Microsoft Designer 21

Slide 22

Slide 22 text

@margaretmz #GoogleDCOC Firefly Image 2 https://firefly.adobe.com/ ● Not always good with full body images ● Great with close-up portraits ● High fidelity, realistic kind of like Adobe Stock photos. 22

Slide 23

Slide 23 text

@margaretmz #GoogleDCOC Imagine with Meta AI https://imagine.meta.com/ ● Pretty impressive results ● Qipao looks like a flared dress ● The design deviation still looks beautiful ● Love the vibrant colors 23

Slide 24

Slide 24 text

@margaretmz #GoogleDCOC Google Imagen 2 (Preview) 12/13/2023 - Imagen 2 on Vertex AI is now generally available ● Text to image generation ● Logo generation ● Image editing Check out the documentation here Prompt="A view of Google Campus in Singapore in 2050, futuristic" Image source: image generated by calling API. 24

Slide 25

Slide 25 text

@margaretmz #GoogleDCOC IP-Adapter-FaceID Plus demo - Hugging Face https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID ● Drag and drop a photo of yourself ● Type a text prompt ● Use "Preserve Face Structure" to set how closely the generated image resembles you 25

Slide 26

Slide 26 text

@margaretmz #GoogleDCOC 3D art - Luma AI Create 3D objects in < 10 seconds ● Text to 3D ● Camera capture ● Video to 3D API ● Download 3D models, which can be imported into software such as Blender or Clo3D for editing. Prompt: “realistic colorful Christmas earrings” https://lumalabs.ai/ 26

Slide 27

Slide 27 text

4. Music | Audio #GoogleDCOC 27

Slide 28

Slide 28 text

@margaretmz #GoogleDCOC Music generation ● Riffusion (https://www.riffusion.com/) ● Google’s MusicFX ● Stable Audio (https://www.audiocipher.com/post/stable-audio-ai) ● MAGNeT 28

Slide 29

Slide 29 text

@margaretmz #GoogleDCOC Riffusion Just enter a text prompt. Music is generated with lyrics. 29

Slide 30

Slide 30 text

@margaretmz #GoogleDCOC Google’s MusicFX Available via Google’s Test Kitchen Powered by MusicLM Many new features are added: - Generate longer tracks: 30, 50, 70 seconds (previously only 20 seconds) - Music genre added 30

Slide 31

Slide 31 text

@margaretmz #GoogleDCOC MAGNeT Masked Audio Generation using a Single Non-Autoregressive Transformer ● Open source ● Text to music ● Text to audio ● On par with SOTA models ● 7x faster Source: https://pages.cs.huji.ac.il/adiyoss-lab/MAGNeT/ 31

Slide 32

Slide 32 text

5. Video & Animation #GoogleDCOC 32

Slide 33

Slide 33 text

@margaretmz #GoogleDCOC Tools for videos & animations Short videos for entertainment ● Runway ML ● Stable Video Diffusion ● Kaiber.ai ● Pika.art ● Meta’s Fairy ● Google VideoPoet Longer videos for marketing or training ● Synthesia ● Heygen 33

Slide 34

Slide 34 text

@margaretmz #GoogleDCOC Stable Video Diffusion Hugging Face Community Demo https://huggingface.co/spaces/multimodalart/stable-video-diffusion Input: ● Prompt ● Image (drag & drop) 34

Slide 35

Slide 35 text

@margaretmz #GoogleDCOC Kaiber.ai example ● Start with a simple reference ● Beautiful designs ● Great lighting & animation ● Can add music 35

Slide 36

Slide 36 text

@margaretmz #GoogleDCOC Pika.art example ● Text to video ● Image to video ● Can simulate walking ● Keep adding to video length 36

Slide 37

Slide 37 text

@margaretmz #GoogleDCOC Google VideoPoet A large language model for zero-shot video generation Multimodal: ● Text to video ● Image to video ● Stylization ● Outpainting ● Video to audio Source: https://sites.research.google/videopoet/ 37

Slide 38

Slide 38 text

@margaretmz #GoogleDCOC Heygen https://app.heygen.com/ ● Create an instant avatar, which can be modified freely ● Provide a script (text) ● A video is generated after a few minutes! 38

Slide 39

Slide 39 text

@margaretmz #GoogleDCOC Synthesia 39

Slide 40

Slide 40 text

6. Future Trends #GoogleDCOC 40

Slide 41

Slide 41 text

@margaretmz #GoogleDCOC The Future of Generative AI ● Multimodal, not limited to just text or images ○ Animation ○ Video ○ 3D objects ● Integration of GenAI models into applications ● Efficient, smaller and on-device: ○ SDXL Turbo ○ LoRA ○ aMuse ● (Easier) customization 41

Slide 42

Slide 42 text

Thank you! Follow me on LinkedIn, X (Twitter), Medium, Github => @margaretmz #GoogleDCOC 42