What is multimodality, and can it generate captions for my Instagram photos?

Multimodality Whatis AND CAN IT GENERATE CAPTIONS FOR MY INSTAGRAM
PHOTOS?

About me • 🥑 Developer Relations Engineer at deepset •
🏗 Open source LLM Framework: Haystack • 📍 Istanbul, Turkey • 💃 Latin music/dances Bilge Yücel

IMAGE-TO-TEXT TODAY'S agenda 01 02 03 04 TEXT-TO-IMAGE MULTIMODALITY TEXT-TO-TEXT
05 CAPTION GENERATOR

text-to-text

text-to-text Text generation • GPT-3.5-turbo • GPT-4,Gemini-Pro, Claude, Command, Mistral…

multimodality

Multimodality refers to the capacity of a machine learning model
to process and understand information from various modalities, such as text, images, videos, and audio. This capability allows the model to perform tasks that require integration of different types of data, such as creating video from a prompt or generating a text based on the image. Multimodality

text-to-image

A street cat playing a shrill pipe and drinking raki
by the Bosphorus, photorealistic, natural light text-to-image Image generation • Gemini-Pro-Vision, Midjourney, Dall-E, Stable Diffusion…

image-to-text

image-to-text CAPTION GENERATION Bustling city square with street performers in
the shadow of towering Gothic spires under a blue sky with wispy clouds. • OCR, image classification • GPT-4V, Gemini-Pro-Vision, CLIP…

Instagram Caption Generator

🐳💦✨Witnessing the majestic dance of a whale, leaping high above
the cerulean sea. 🌊 #WhaleWatching #NatureLovers #Ocean

Captionate 📸 • Generates Instagrammable captions • image-to-text + text-to-text
• Haystack + Gradio

Haystack • Fully open-source framework built in Python for custom
LLM applications

LLM applications • Provides tools that developers need to build state-of-the-art NLP systems

LLM applications • Provides tools that developers need to build state-of-the-art NLP systems • Building blocks: Pipelines & Components

Caption Generation Pipeline IMAGE-TO-TEXT TEXT-TO-TEXT

Caption Generation Pipeline

Caption Generation UI

Generative Question Answering

A wrap That's THANK YOU

Resources Captionate 📸 @bilgeycl Bilge Yücel Haystack @bilgeyucel Presentation

What is multimodality, and can it generate capt...

What is multimodality, and can it generate captions for my Instagram photos?

Bilge Yücel

More Decks by Bilge Yücel

Other Decks in Programming

Featured

Transcript

Multimodality Whatis AND CAN IT GENERATE CAPTIONS FOR MY INSTAGRAM

About me • 🥑 Developer Relations Engineer at deepset •

IMAGE-TO-TEXT TODAY'S agenda 01 02 03 04 TEXT-TO-IMAGE MULTIMODALITY TEXT-TO-TEXT

text-to-text

text-to-text Text generation • GPT-3.5-turbo • GPT-4,Gemini-Pro, Claude, Command, Mistral…

multimodality

Multimodality refers to the capacity of a machine learning model

text-to-image

A street cat playing a shrill pipe and drinking raki

image-to-text

image-to-text CAPTION GENERATION Bustling city square with street performers in

Instagram Caption Generator

🐳💦✨Witnessing the majestic dance of a whale, leaping high above

Captionate 📸 • Generates Instagrammable captions • image-to-text + text-to-text

Haystack • Fully open-source framework built in Python for custom

Haystack • Fully open-source framework built in Python for custom

Haystack • Fully open-source framework built in Python for custom

Caption Generation Pipeline IMAGE-TO-TEXT TEXT-TO-TEXT

Caption Generation Pipeline IMAGE-TO-TEXT TEXT-TO-TEXT

Caption Generation Pipeline IMAGE-TO-TEXT TEXT-TO-TEXT

Caption Generation Pipeline IMAGE-TO-TEXT TEXT-TO-TEXT

Caption Generation Pipeline IMAGE-TO-TEXT TEXT-TO-TEXT

Caption Generation Pipeline

Caption Generation Pipeline

Caption Generation UI

Generative Question Answering

A wrap That's THANK YOU

Resources Captionate 📸 @bilgeycl Bilge Yücel Haystack @bilgeyucel Presentation