What is multimodality, and can it generate captions for my Instagram photos?

2024 February Prague Python Pizza
https://prague.python.pizza/

Bilge Yücel

February 24, 2024

Transcript

  1. About me: Bilge Yücel
    • 🥑 Developer Relations Engineer at deepset
    • 🏗 Open source LLM Framework: Haystack
    • 📍 Istanbul, Turkey
    • 💃 Latin music/dances
  2. Multimodality
    Multimodality refers to the capacity of a machine learning model to process and understand information from various modalities, such as text, images, videos, and audio. This capability allows the model to perform tasks that require integrating different types of data, such as creating a video from a prompt or generating text based on an image.
  3. Image generation (text-to-image)
    Prompt: "A street cat playing a shrill pipe and drinking raki by the Bosphorus, photorealistic, natural light"
    • Gemini-Pro-Vision, Midjourney, Dall-E, Stable Diffusion…
  4. Caption generation (image-to-text)
    Example caption: "Bustling city square with street performers in the shadow of towering Gothic spires under a blue sky with wispy clouds."
    • OCR, image classification
    • GPT-4V, Gemini-Pro-Vision, CLIP…
  5. 🐳💦✨ Witnessing the majestic dance of a whale, leaping high above the cerulean sea. 🌊 #WhaleWatching #NatureLovers #Ocean
  6. Haystack
    • Fully open-source framework built in Python for custom LLM applications
    • Provides tools that developers need to build state-of-the-art NLP systems
  7. Haystack
    • Fully open-source framework built in Python for custom LLM applications
    • Provides tools that developers need to build state-of-the-art NLP systems
    • Building blocks: Pipelines & Components
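
The "Pipelines & Components" idea from the last slide can be illustrated with a small, library-free sketch. This is not Haystack's API: `ToyPipeline`, `stub_captioner`, and `instagram_formatter` are hypothetical names invented for illustration, and the captioner is a placeholder that returns a fixed string where a real image-to-text model (for example, one of the models named on the caption generation slide) would be called. The sketch only shows how small components can be chained so that one step's output feeds the next.

```python
from typing import Any, Callable, Dict, List, Tuple

# A component here is any callable that reads the shared state
# and returns a dict of new fields to merge into it.
Component = Callable[[Dict[str, Any]], Dict[str, Any]]


class ToyPipeline:
    """Hypothetical toy pipeline: runs components in the order they were added."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Component]] = []

    def add_component(self, name: str, component: Component) -> None:
        self._steps.append((name, component))

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        state = dict(data)  # copy so the caller's input is not mutated
        for _name, component in self._steps:
            state.update(component(state))
        return state


def stub_captioner(state: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: stands in for an image-to-text model call.
    # It does not look at the image; it only echoes the file path.
    return {"description": f"A photo loaded from {state['image_path']}"}


def instagram_formatter(state: Dict[str, Any]) -> Dict[str, Any]:
    # Turn topics such as "whale watching" into hashtags such as "#WhaleWatching".
    tags = " ".join("#" + topic.title().replace(" ", "") for topic in state["topics"])
    return {"caption": f"{state['description']} {tags}"}


pipeline = ToyPipeline()
pipeline.add_component("captioner", stub_captioner)
pipeline.add_component("formatter", instagram_formatter)

result = pipeline.run({"image_path": "whale.jpg", "topics": ["whale watching", "ocean"]})
print(result["caption"])
```

Swapping `stub_captioner` for a component that calls an actual multimodal model would leave the rest of the pipeline unchanged, which is the main appeal of composing an application from small building blocks.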