GenAI for Visual Content Creation - DCOC Summit 2024

Developer Creators and Online Communities Summit Generative AI for Visual
Content Creation Margaret Maynard-Reid #GoogleDCOC 1

@margaretmz #GoogleDCOC About me • ML GDE (Google Developer Expert) • 3D artist • Fashion
Designer • Instructor at UW margaretmz.art 2

Content 01 Intro to Generative AI 02 Design workflow 03 Create images 04 Create music
05 Create videos/animations 06 Future trends #GoogleDCOC 3

1. Intro to GenAI #GoogleDCOC 4

@margaretmz #GoogleDCOC What is Generative AI? Train ML models to
generate new content: • Text • Image • Video/animations • Audio, e.g. music • Code… The focus of my talk today is visual! “Generative models: take a machine, observe many samples from a distribution and generate more samples from that same distribution”. - Ian Goodfellow 2016 5
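
To make Goodfellow's definition concrete, here is a minimal sketch in plain Python. It is a toy and not how image or music models are actually built: the "distribution" is an assumed 1-D Gaussian, and the "model" is just an estimated mean and standard deviation. The function names are hypothetical and chosen for illustration only.

```python
import random
import statistics


def fit_gaussian(samples):
    """Estimate the mean and standard deviation of the observed samples."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate(mean, std, n, rng):
    """Draw n new samples from the fitted Gaussian."""
    return [rng.gauss(mean, std) for _ in range(n)]


rng = random.Random(0)

# "Observe many samples from a distribution" (assumed here: mean 5, std 2).
observed = [rng.gauss(5.0, 2.0) for _ in range(10_000)]

# "Training" in this toy case is just estimating two parameters.
mean, std = fit_gaussian(observed)

# "Generate more samples from that same distribution".
new_samples = generate(mean, std, 10_000, rng)
```

GANs, VAEs, and diffusion models follow the same observe-then-sample idea, but learn far richer distributions (for example over images) with neural networks.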

@margaretmz #GoogleDCOC Types of Generative Models • Generative Adversarial Networks
(GANs) • Variational Autoencoders (VAEs) • Flow-based models • Diffusion models • Large Language Models (LLMs) Source: Lilian Weng blog (link) 6

@margaretmz #GoogleDCOC Catalysts of the current generative AI craze •
Stable Diffusion (released by Stability AI) - August 10, 2022 • ChatGPT - Nov 30, 2022 • The rise of LLMs! 7

@margaretmz #GoogleDCOC How to use generative AI? So many options:
• Use a tool with GUI ◦ Web browser e.g. Bard, ChatGPT ◦ Discord, e.g. Midjourney ◦ Desktop applications: Photoshop or Illustrator, e.g. Firefly ◦ Cloud: Google, Microsoft, Amazon… • Use existing open-source code • Fine-tune model with your own data • Hugging Face - all of the above! ← Hugging Face and Google partner for open AI collaboration 8

2. The Design Workflow #GoogleDCOC 9

@margaretmz #GoogleDCOC The visual design workflow • Theme & moodboard
=> color palette • Sketches • High fidelity Images • Music • Image to Video • Upload video • Social 10

@margaretmz #GoogleDCOC Step by step • Generate single image •
Generate music • Create video: image + music ⇒ video • Upload to YouTube • Generate thumbnail and end screen • Generate social media posting (text) 11
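
The "image + music ⇒ video" step can be sketched as follows. The slides do not say which tool was used for this step; this example assumes the `ffmpeg` command-line tool is installed and uses its standard options for looping a still image over an audio track. The function name and file names are hypothetical.

```python
import shlex


def build_ffmpeg_command(image_path, audio_path, output_path):
    """Build an ffmpeg command that turns one image plus one audio file into a video."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the still image as video frames
        "-i", image_path,        # input 0: the generated image
        "-i", audio_path,        # input 1: the generated music
        "-c:v", "libx264",       # H.264 video
        "-tune", "stillimage",   # encoder tuning for static content
        "-c:a", "aac",           # AAC audio
        "-pix_fmt", "yuv420p",   # pixel format that most players accept
        "-shortest",             # stop when the shorter input (the audio) ends
        output_path,
    ]


cmd = build_ffmpeg_command("image.png", "music.mp3", "video.mp4")
print(shlex.join(cmd))
# To actually render the video (requires ffmpeg on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list, rather than one string, avoids shell-quoting problems when file names contain spaces.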

@margaretmz #GoogleDCOC ColorMagic https://colormagic.app/ Type a prompt describing the colors
you would like, such as “Beautiful sunflowers”. A palette of 5 colors is then generated automatically. 13

3. 2D / 3D Images #GoogleDCOC 14

@margaretmz #GoogleDCOC Tools for generating images 2D images • Midjourney
• ChatGPT + DALLE • Bing Image Creator • Meta AI Generator • Adobe Firefly… • Google’s Imagen 2 (restricted General Availability only) 3D images • Luma AI UI/UX Design • Adobe Express • Canva • Microsoft Designer 15

@margaretmz #GoogleDCOC Generate image • Prompt + Reference image •
Choose a tool • How to get a prompt? ◦ Lexica.art ◦ Midjourney /describe ◦ Use a model to caption the image 16

@margaretmz #GoogleDCOC Lexica.art Lexica.art is one of my favorite places
to get a prompt! • First I type a description of what I want • Select an image that I like • Then copy the prompt which I can use in another GenAI tool… 17

@margaretmz #GoogleDCOC Prompt Engineering Best Practices Guides: • Prompt Design
Strategies (Google AI for Developers) • Prompt Engineering Best Practices (Laurence Moroney) • Prompt Engineering Guide (OpenAI) Hands-on • Prompt Design - Best Practices (Google Cloud GenAI for Developers Learning Path, Course #7 Generative AI Explorer - Vertex AI ) 18

@margaretmz #GoogleDCOC Midjourney - Text to Image Here is how
I created the image shown on the previous slide: • First I created a few images with text-to-image in Midjourney, using the prompt “Prompt Engineering Best Practices” • I liked the second image best, so I downloaded it; its size is 1024x1024 19

@margaretmz #GoogleDCOC Adobe Firefly (in Photoshop) - Generative Fill Now
I use Photoshop’s Generative Fill to expand the image: • Increase canvas size • Generative fill without prompt • Image expanded! 20
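
The "increase canvas size" step comes down to simple arithmetic, sketched below. The slides do not state the target size; this example assumes the square 1024x1024 Midjourney image is being widened to a 16:9 canvas (a common slide and video ratio), with the original image centered and the empty margins left for Generative Fill. The function name is hypothetical.

```python
def expand_canvas(width, height, ratio_w, ratio_h):
    """Return (new_width, new_height, offset_x, offset_y).

    The new canvas keeps the image's longer-fitting side, grows the other
    side to reach (at least) the ratio_w:ratio_h aspect ratio, and centers
    the original image at (offset_x, offset_y).
    """
    if width * ratio_h < height * ratio_w:
        # Image is too narrow for the target ratio: widen the canvas.
        new_width = -(-height * ratio_w // ratio_h)  # ceiling division
        new_height = height
    else:
        # Image is too wide (or already matches): grow the height if needed.
        new_width = width
        new_height = -(-width * ratio_h // ratio_w)  # ceiling division
    offset_x = (new_width - width) // 2
    offset_y = (new_height - height) // 2
    return new_width, new_height, offset_x, offset_y


new_w, new_h, off_x, off_y = expand_canvas(1024, 1024, 16, 9)
print(new_w, new_h, off_x, off_y)
```

The margins on either side of the offset are the regions that Generative Fill then synthesizes to match the original image.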

@margaretmz #GoogleDCOC Bing Image Creator https://www.bing.com/images/ • Powered by OpenAI’s
DALLE • Has a great understanding of what a qipao is • Keeps track of my creation history • Seamless integration with Microsoft Designer 21

@margaretmz #GoogleDCOC Firefly Image 2 https://firefly.adobe.com/ • Not always good
with full body images • Great with close-up portraits • High fidelity and realistic, much like Adobe Stock photos. 22

@margaretmz #GoogleDCOC Imagine with Meta AI https://imagine.meta.com/ • Pretty impressive
results • Qipao looks like a flared dress • The design deviation still looks beautiful • Love the vibrant colors 23

@margaretmz #GoogleDCOC Google Imagen 2 (Preview) 12/13/2023 - Imagen 2
on Vertex AI is now generally available • Text to image generation • Logo generation • Image editing Check out the documentation here Prompt="A view of Google Campus in Singapore in 2050, futuristic" Image source: image generated by calling API. 24
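
The slide says the image was generated by calling the API but does not show the call. Below is a hedged sketch of what such a request might look like. It only builds the URL and JSON body for a Vertex AI `predict` request and sends nothing. The project ID, region, and model ID are placeholders and assumptions, not values from the talk, and the request shape should be checked against the current Vertex AI documentation before use.

```python
import json


def build_imagen_request(project_id, location, model_id, prompt, sample_count=1):
    """Build the URL and JSON body for a Vertex AI image-generation predict call.

    Nothing is sent; this only assembles the request under the assumptions
    stated above.
    """
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model_id}:predict"
    )
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count},
    }
    return url, body


url, body = build_imagen_request(
    project_id="my-project",         # hypothetical project ID
    location="us-central1",          # assumed region
    model_id="imagegeneration@005",  # assumed model ID; verify in the docs
    prompt="A view of Google Campus in Singapore in 2050, futuristic",
)
print(url)
print(json.dumps(body, indent=2))
```

A real call would POST this body with an OAuth access token in the `Authorization` header, and access to the model may need to be granted first, as the restricted availability noted earlier suggests.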

@margaretmz #GoogleDCOC IP-Adapter-FaceID Plus demo - Hugging Face https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID
• Drag and drop a photo of yourself • Type a text prompt • Set "Preserve Face Structure" to control how closely the generated image resembles you 25

@margaretmz #GoogleDCOC 3D art - Luma AI Create 3D objects
in < 10 seconds • Text to 3D • Camera capture • Video to 3D API • Download 3D models, which can be imported into software such as Blender or Clo3D for editing. Prompt: “realistic colorful Christmas earrings” https://lumalabs.ai/ 26

4. Music | Audio #GoogleDCOC 27

@margaretmz #GoogleDCOC Music generation • Riffusion (https://www.riffusion.com/) • Google’s MusicFX
• Stable Audio (https://www.audiocipher.com/post/stable-audio-ai) • MAGNeT 28

@margaretmz #GoogleDCOC Riffusion Just enter text prompt Music generated with
lyrics 29

@margaretmz #GoogleDCOC Google’s MusicFX Available via Google’s Test Kitchen Powered
by MusicLM Many new features are added: - Generate longer tracks: 30, 50, 70 seconds (previously only 20 seconds) - Music genre added 30

@margaretmz #GoogleDCOC MAGNeT Masked Audio Generation using a Single Non-Autoregressive
Transformer • Open source • Text to music • Text to audio • On par with SOTA models • 7x faster Source: https://pages.cs.huji.ac.il/adiyoss-lab/MAGNeT/ 31

5. Video & Animation #GoogleDCOC 32

@margaretmz #GoogleDCOC Tools for videos & animations Short videos for
entertainment • Runway ML • Stable Video Diffusion • Kaiber.ai • Pika.art • Meta’s Fairy • Google VideoPoet Longer videos for marketing or training • Synthesia • Heygen 33

@margaretmz #GoogleDCOC Stable Video Diffusion Hugging Face Community Demo https://huggingface.co/spaces/multimodalart/stable-video-diffusion
Input: • Prompt • Image (drag & drop) 34

@margaretmz #GoogleDCOC Kaiber.ai example • Start with a simple reference
• Beautiful designs • Great lighting & animation • Can add music 35

@margaretmz #GoogleDCOC Pika.art example • Text to video • Image
to video • Can simulate walking • Keep adding to video length 36

@margaretmz #GoogleDCOC Google VideoPoet A large language model for zero-shot
video generation Multimodal: • Text to video • Image to video • Stylization • Outpainting • Video to audio Source: https://sites.research.google/videopoet/ 37

@margaretmz #GoogleDCOC Heygen https://app.heygen.com/ • Create instant avatar which can
be modified freely • Provide a script (text) • A video is generated after a few minutes! 38

@margaretmz #GoogleDCOC Synthesia 39

6. Future Trends #GoogleDCOC 40

@margaretmz #GoogleDCOC The Future of Generative AI • Multimodal, not
limited to just text or images ◦ Animation ◦ Video ◦ 3D objects • Integration of GenAI models into applications • Efficient, smaller and on-device: ◦ SDXL Turbo ◦ LoRA ◦ aMuse • (Easier) customization 41

Thank you! Follow me on LinkedIn, X (Twitter), Medium, GitHub
=> @margaretmz #GoogleDCOC 42
