GenAI for Visual Content Creation - DCOC Summit 2024

This talk was presented at the Developer Creators and Online Communities Summit on January 26, 2024 in Singapore.

The magic of generative AI is revolutionizing the way we create visual content. This talk explores the fundamentals of Generative AI and showcases how to use Generative AI to create stunning graphics, animations, videos, music and more. Artists, designers, content creators, developers and anyone interested in the intersection of technology and creativity - all welcome! Whether you are a curious beginner or seasoned expert, you will gain valuable insights to unleash your creativity with generative AI!

Margaret Maynard-Reid

January 26, 2024

Transcript

  1. Developer Creators and Online Communities Summit • Generative AI for Visual Content Creation • Margaret Maynard-Reid • #GoogleDCOC
  2. About me (@margaretmz, #GoogleDCOC) • ML GDE (Google Developer Expert) • 3D artist • Fashion Designer • Instructor at UW • margaretmz.art
  3. Content • 01 Intro to Generative AI • 02 Design workflow • 03 Create images • 04 Create music • 05 Create videos/animations • 06 Future trends
  4. What is Generative AI? Train ML models to generate new content: • Text • Image • Video/animations • Audio, e.g. music • Code… The focus of my talk today is visual! “Generative models: take a machine, observe many samples from a distribution and generate more samples from that same distribution.” - Ian Goodfellow, 2016
  5. Types of Generative Models • Generative Adversarial Networks (GANs) • Variational Autoencoders (VAEs) • Flow-based models • Diffusion models • Large Language Models (LLMs) Source: Lilian Weng blog (link)
  6. Catalysts of the current generative AI craze • Stable Diffusion (released by Stability AI) - August 10, 2022 • ChatGPT - Nov 30, 2022 • The rise of LLMs!
  7. How to use generative AI? So many options: • Use a tool with a GUI ◦ Web browser, e.g. Bard, ChatGPT ◦ Discord, e.g. Midjourney ◦ Desktop applications: Photoshop or Illustrator, e.g. Firefly ◦ Cloud: Google, Microsoft, Amazon… • Use existing open-source code • Fine-tune a model with your own data • Hugging Face - all of the above! ← Hugging Face and Google partner for open AI collaboration
  8. The visual design workflow • Theme & moodboard => color palette • Sketches • High-fidelity images • Music • Image to video • Upload video • Social
  9. Step by step • Generate a single image • Generate music • Create video: image + music ⇒ video • Upload to YouTube • Generate thumbnail and end screen • Generate social media post (text)
  11. ColorMagic https://colormagic.app/ Type a prompt describing the colors you would like, such as “Beautiful sunflowers”. A palette of 5 colors is then generated automatically.
  12. Tools for generating images 2D images: • Midjourney • ChatGPT + DALLE • Bing Image Creator • Meta AI Generator • Adobe Firefly… • Google’s Imagen 2 (restricted General Availability only) 3D images: • Luma AI UI/UX design: • Adobe Express • Canva • Microsoft Designer
  13. Generate image • Prompt + reference image • Choose a tool • How to get a prompt? ◦ Lexica.art ◦ Midjourney /describe ◦ Use a model to caption the image
  14. Lexica.art Lexica.art is one of my favorite places to get a prompt! • First I type a description of what I want • Then I select an image that I like • Then I copy its prompt, which I can use in another GenAI tool…
  15. Prompt Engineering Best Practices Guides: • Prompt Design Strategies (Google AI for Developers) • Prompt Engineering Best Practices (Laurence Moroney) • Prompt Engineering Guide (OpenAI) Hands-on: • Prompt Design - Best Practices (Google Cloud GenAI for Developers Learning Path, Course #7 Generative AI Explorer - Vertex AI)
  16. Midjourney - Text to Image Here is how I created the image shown on the previous slides: • First I created a few images with text-to-image in Midjourney, using the prompt “Prompt Engineering Best Practices” • I liked the second image best, so I downloaded it; its size is 1024x1024
  17. Adobe Firefly (in Photoshop) - Generative Fill Now I use Photoshop’s Generative Fill to expand the image: • Increase canvas size • Generative Fill without a prompt • Image expanded!
  18. Bing Image Creator https://www.bing.com/images/ • Powered by OpenAI’s DALLE • Has a great understanding of what a qipao is • Keeps track of my creation history • Seamless integration with Microsoft Designer
  19. Firefly Image 2 https://firefly.adobe.com/ • Not always good with full-body images • Great with close-up portraits • High-fidelity and realistic, much like Adobe Stock photos
  20. Imagine with Meta AI https://imagine.meta.com/ • Pretty impressive results • Qipao looks like a flared dress • Despite the deviation, the design still looks beautiful • Love the vibrant colors
  21. Google Imagen 2 (Preview) 12/13/2023 - Imagen 2 on Vertex AI is now generally available • Text to image generation • Logo generation • Image editing Check out the documentation here Prompt="A view of Google Campus in Singapore in 2050, futuristic" Image source: image generated by calling the API.
  22. IP-Adapter-FaceID Plus demo - Hugging Face https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID • Drag and drop a photo of yourself • Type a text prompt • Use "Preserve Face Structure" to control how closely the image resembles you
  23. 3D art - Luma AI Create 3D objects in < 10 seconds • Text to 3D • Camera capture • Video to 3D API • Download 3D models, which can be imported into software such as Blender or Clo3D for editing. Prompt: “realistic colorful Christmas earrings” https://lumalabs.ai/
  24. Music generation • Riffusion (https://www.riffusion.com/) • Google’s MusicFX • Stable Audio (https://www.audiocipher.com/post/stable-audio-ai) • MAGNeT
  25. Google’s MusicFX Available via Google’s Test Kitchen. Powered by MusicLM. Many new features have been added: - Generate longer tracks: 30, 50, 70 seconds (previously only 20 seconds) - Music genre added
  26. MAGNeT Masked Audio Generation using a Single Non-Autoregressive Transformer • Open source • Text to music • Text to audio • On par with SOTA models • 7x faster Source: https://pages.cs.huji.ac.il/adiyoss-lab/MAGNeT/
  27. Tools for videos & animations Short videos for entertainment: • Runway ML • Stable Video Diffusion • Kaiber.ai • Pika.art • Meta’s Fairy • Google VideoPoet Longer videos for marketing or training: • Synthesia • Heygen
  28. Kaiber.ai example • Start with a simple reference • Beautiful designs • Great lighting & animation • Can add music
  29. Pika.art example • Text to video • Image to video • Can simulate walking • Can keep extending the video length
  30. Google VideoPoet A large language model for zero-shot video generation Multimodal: • Text to video • Image to video • Stylization • Outpainting • Video to audio Source: https://sites.research.google/videopoet/
  31. Heygen https://app.heygen.com/ • Create an instant avatar, which can be modified freely • Provide a script (text) • A video is generated after a few minutes!
  32. The Future of Generative AI • Multimodal, not limited to just text or images ◦ Animation ◦ Video ◦ 3D objects • Integration of GenAI models into applications • Efficient, smaller and on-device: ◦ SDXL Turbo ◦ LoRA ◦ aMuse • (Easier) customization
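
The definition quoted in item 4 of the transcript (observe many samples from a distribution, then generate more samples from that same distribution) can be illustrated with a toy sketch. This example is not from the talk. It assumes the simplest possible "model", a one-dimensional Gaussian fitted with Python's standard library, whereas the GANs, VAEs and diffusion models named in the talk learn far richer distributions over images, audio or video. The helper names fit_gaussian and generate are hypothetical, chosen only for illustration.

```python
import random
import statistics


def fit_gaussian(samples):
    """Estimate the mean and standard deviation of the observed samples."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate(mean, std, count, rng):
    """Draw new samples from the fitted Gaussian."""
    return [rng.gauss(mean, std) for _ in range(count)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# "Observe many samples from a distribution": here the hidden distribution
# is a Gaussian with mean 5 and standard deviation 2 (an assumed toy case).
observed = [rng.gauss(5.0, 2.0) for _ in range(10_000)]

# "Training": estimate the distribution's parameters from the data alone.
mean, std = fit_gaussian(observed)

# "Generation": draw new samples that were never in the observed data.
generated = generate(mean, std, 5, rng)

print(f"fitted mean={mean:.2f}, std={std:.2f}")
print("new samples:", [round(x, 2) for x in generated])
```

The fitted parameters should land close to the hidden ones, and the new samples follow the same distribution without copying any observed value. Image and music generators apply the same idea to a much higher-dimensional distribution, usually conditioned on a text prompt.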