Building Inclusive AI Experiences with Sign Language and GenAI

@cafonsomota Building Inclusive AI Experiences with Sign Language and GenAI
PHOTOGRAPH: VIKI-MELKIU @ WIRED


¡Hola 🌎! @cafonsomota

👨💻 Android GDE 🧙 Android/ KMP craftsman (and advocate) ✍
Author @kodecodev
🗺 Loves travel, photography and running
🇪🇸 Happy to be here in Spain! Suggestions of places to visit/eat? 🍻

Sign Language

Sign language is a visual-manual language that uses hands, body movements, and facial expressions to communicate. Different countries have distinct sign languages, such as American Sign Language (ASL)* or Língua Gestual Portuguesa (LGP).

* All the gestures that you'll see in this presentation follow American Sign Language (ASL) guidelines.

ASL: American Sign Language, with some variations, officially used in Nigeria

LGP: Língua Gestual Portuguesa

LIS: Lingua dei Segni Italiana (Italian Sign Language)
[Figure: fingerspelling alphabet, A to Z]

LSE: Lengua de Signos Española

LSC: Llengua de Signes Catalana

Sign Language
H • e • l • l • o
Saying h • e • l • l • o, one letter at a time 🇺🇸 ASL

Sign Language
Hello
Saying hello 🇺🇸 ASL

Sign Language
Hello in different sign languages
🇺🇸 ASL American Sign Language
🇵🇹 LGP Língua Gestual Portuguesa
🇮🇹 LIS Lingua dei Segni Italiana
https://sign.mt

Sign Language Challenges
• Linguistic complexity
  • Gestures, facial expressions, body movements, …
  • Multiple sign languages across the world (ASL, LGP, …)
• Technical
  • Hand shape, orientation and movement
  • Hands might overlap or move too fast
  • Hands, face and body need to be combined
  • Requires large datasets (which are currently missing)
• AI & Computer Vision
  • Gestures are not 2D
  • Signs need to be captured from different angles
  • Models can easily fail due to hand size, skin tone, etc.

Sign Language
SignGemma*
• New AI model
• Focus: translate ASL into English text
• Available: unknown (previously "later this year")
* Gemma is a family of open, lightweight, generative AI models created by Google.

GestuAl
github.com/cmota/GestuAl

Sign Language Challenges
Video processing
[Video: Veo3, person waving hello]

Uncompressed Full HD (1080p)
• 1920x1080 pixels = 2 073 600 pixels
• 3 color channels (RGB)
• 3x1920x1080 = 6 220 800 values to compute for a single frame
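The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the comparison with 21 hand landmarks of three coordinates each anticipates the landmark approach described later in the talk.

```python
def values_per_frame(width: int, height: int, channels: int = 3) -> int:
    """Number of raw values in one uncompressed frame."""
    return width * height * channels


# Full HD, RGB: the figure quoted on the slide.
full_hd_values = values_per_frame(1920, 1080)

# For comparison: 21 hand landmarks with x, y and z each.
landmark_values = 21 * 3

print(full_hd_values)   # raw values per frame
print(landmark_values)  # values per frame when using hand landmarks
```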

Sign Language Challenges
Video processing
Most of the data in the image is irrelevant
• The model needs to read the sign from the hand's shape
• Not the color of the speaker's sweater
In some scenarios, the model might associate the word hello with a blue sweater

Sign Language Challenges
Video processing
Lack of invariance
• If the person moves towards or away from the camera
  - The model may fail to recognize the sign
• If the gesture is made in the top-left vs. the bottom-right of the frame
  - The model may fail to recognize the sign
• If the camera is facing a (slightly) different angle
  - The model may fail to recognize the sign

Sign Language Challenges
Video processing
To overcome these limitations, a model trained with raw data needs an astronomical amount of data:
- Hundreds of different hands
- Various backgrounds
- Different light conditions (day, night, shadows, …)
- Multiple camera angles and distances
- Different hand sizes, skin tones, shapes, etc.
Good luck!

Sign Language Landmarks
Video processing using… landmarks
A set of points that identify specific features of an object:
- Face
- Hand
- Body

Sign Language Landmarks
[Video: Veo3, person waving hello, with hand landmarks 0 to 20 overlaid]
0 WRIST
1 THUMB_CMC
2 THUMB_MCP
3 THUMB_IP
4 THUMB_TIP
5 INDEX_FINGER_MCP
6 …
21 coordinates (one per landmark, 0 to 20)

Sign Language Landmarks
Each landmark provides:
- x and y coordinates: independent of the image size
- z coordinate: depth
- visibility: certainty about the landmark visibility
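Because x and y are independent of the image size, the same landmark can be mapped onto any resolution, and a hand can be described relative to its own wrist so that position and scale in the frame matter less. The sketch below illustrates that idea only; the helper names and the wrist-relative normalization are assumptions for illustration, not something the talk or MediaPipe prescribes.

```python
from typing import List, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z), with x and y in the 0..1 range


def to_pixels(landmark: Landmark, width: int, height: int) -> Tuple[int, int]:
    """Map normalized x/y onto a concrete image size."""
    x, y, _ = landmark
    return round(x * width), round(y * height)


def wrist_relative(landmarks: List[Landmark]) -> List[Landmark]:
    """Shift so the wrist (index 0) is the origin, then scale to a unit range.

    Hypothetical normalization: it makes the features independent of where
    the hand is in the frame and of how large it appears.
    """
    wx, wy, wz = landmarks[0]
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    scale = max(max(abs(v) for v in point) for point in shifted) or 1.0
    return [(x / scale, y / scale, z / scale) for x, y, z in shifted]


hand = [(0.5, 0.5, 0.0), (0.6, 0.4, 0.1), (0.7, 0.3, 0.2)]
print(to_pixels(hand[0], 1920, 1080))
print(wrist_relative(hand))
```

With this kind of normalization, the same hand shape produces (nearly) the same values whether it appears in the top-left or the bottom-right of the frame.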

MediaPipe
• Open-source framework
• Cross-platform
• Add-ons: computer vision and machine learning
• Highly performant on smartphones
  - Supports live video processing
• Pre-trained models for common computer vision tasks

MediaPipe
Tasks • Studio • Model Maker

MediaPipe Tasks
- High-level APIs for common tasks
- Triggered when it recognizes gestures, faces, etc.

• Face detection
• Face mesh
• Iris
• Hands
• Pose
• Gesture
• Selfie segmentation
• Hair segmentation
• Object detection

Two approaches:
• Spatial Frame Classification (static)
• Dynamic Sequence Recognition

MediaPipe Hand Gesture Recognition
[Figure: pipeline in four numbered steps; landmark values (22 21 … 61 92) map to the label "Hello"]
🍿 Easy on-device Machine Learning with MediaPipe

MediaPipe Tasks

val handLandmarkerListener = object : HandLandmarkerHelper.LandmarkerListener {
    override fun onError(error: String, errorCode: Int) {
        // Do something
    }

    override fun onResults(resultBundle: HandLandmarkerHelper.ResultBundle) {
        // Take the landmarks of the first result and pass them on for recognition
        detectDynamicSign(resultBundle.results.firstOrNull()?.landmarks())
    }
}

onResults$gestureCategories:
<Category "Hello" (displayName= score=0.65438545 index=-1)>

MediaPipe Studio
- Visualize and test models
- No code needed
- Web tool where you can test how a model performs
- Data can be uploaded or come from the webcam
- Built for prototyping and tweaking configurations
mediapipe-studio.webapps.google.com

MediaPipe Model Maker
- Customize existing models with your data
- You can test how a model performs
- Built for prototyping and tweaking configurations

Model Training

Spatial Frame Classification
Dynamic Sequence Recognition

Model Training
Training a new word
MediaPipe: Spatial Frame Classification (static)
- Best for poses, signs, static content
- Images as input (internally converted to landmarks)
- No concept of time
- Low complexity for training
[Figure: the letters H, E, L, L, O signed one at a time]
Single input: inference is triggered on a frame-by-frame basis.

Model Training
Spatial Frame Classification (static)
1 Android: takes a burst of photos*
* The more pictures we give the model, the better it learns and the more reliable it becomes.

Model Training
🙋 Training a new word: statically
1 Android: takes a burst of photos
• PreviewView and ImageAnalysis from CameraX

val preview = Preview.Builder()
    .build()
    .also { it.setSurfaceProvider(previewView.surfaceProvider) }

analyzer = ImageAnalysis.Builder()
    .setBackpressureStrategy(STRATEGY_KEEP_ONLY_LATEST)
    .setOutputImageFormat(OUTPUT_IMAGE_FORMAT_RGBA_8888)
    .build()

val selector = CameraSelector.DEFAULT_FRONT_CAMERA
val providerFuture = ProcessCameraProvider.getInstance(this)
cameraProvider = providerFuture.get()
cameraProvider.bindToLifecycle(
    this, selector, preview, analyzer
)

• Save images

analyzer.setAnalyzer(executor) { imageProxy ->
    val bitmap = imageProxy.toBitmap()
    saveBitmap(bitmap)
    ...
}

• Organize your data under subfolders
  📁 dataset/letter_h
  📁 dataset/letter_e
  📁 dataset/letter_l
• Export your datasets

Model Training
🙋 Training a new word: statically
1 Android: takes a burst of photos
  📁 dataset/letter_h
  📁 dataset/letter_e
  📁 dataset/letter_l
2 Colab: these files are imported into Google Colab
• Cloud-based platform
• You can write and execute Python code
• Provides access to GPU and TPU
• Built for machine learning/data analysis
• Free

Model Training
2 Colab: these files are imported into Google Colab
  📁 dataset/thumbs_up
  📁 dataset/palm_open
  📁 dataset/victory

import zipfile
from pathlib import Path

# Create the destination folder, then extract the uploaded archive into it
dest = Path("/content/dataset")
dest.mkdir(parents=True, exist_ok=True)
!unzip /content/dataset.zip -d /content/dataset
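The same import step can be done without the shell command, using only the standard library. This is a minimal sketch: the function name is hypothetical, and it assumes the archive contains one subfolder per label, as in the folder listing on the slides.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_dataset(zip_path: str, dest: str) -> List[str]:
    """Extract the archive into dest and return the label subfolders found."""
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Each subfolder is treated as one label (e.g. letter_h, letter_e, ...).
    return sorted(p.name for p in target.iterdir() if p.is_dir())


# Usage in Colab, with the paths from the slide:
# labels = extract_dataset("/content/dataset.zip", "/content/dataset")
# print(labels)
```

Listing the labels right after extraction is a quick way to catch an archive that was zipped one level too deep.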

Model Training
🙋 Training a new word: statically
1 Android: takes a burst of photos
  📁 dataset/letter_h
  📁 dataset/letter_e
  📁 dataset/letter_l
2 Colab: these files are imported into Google Colab
3 Colab: train the model
For static images use MediaPipe Tasks

Model Training
MediaPipe Tasks
Do you remember the Gesture task?

MediaPipe Tasks supports transfer learning, a machine learning technique that reduces training time and the amount of data needed.

[Figure: training from scratch vs. transfer learning. A pre-trained model (Victory, other words) provides the learned features for a new task: a model trained to identify the word hello]

Model Training
🙋 Training a new word: statically
  📁 dataset/letter_h
  📁 dataset/letter_e
  📁 dataset/letter_l
3 Colab: MediaPipe Tasks w/ transfer learning updates the .task file

!pip install -q mediapipe-model-maker

# The slides abbreviate the gesture_recognizer module as g_recognizer
from mediapipe_model_maker import gesture_recognizer as g_recognizer

# Load the images; each subfolder name becomes a label
data = g_recognizer.Dataset.from_folder(
    dirname=IMAGES_PATH,
    hparams=g_recognizer.HandDataPreprocessingParams()
)

# 80% for training, 20% for validation
train_data, val_data = data.split(0.8)

hparams = g_recognizer.HParams(export_dir="lgp_gestures")
options = g_recognizer.GestureRecognizerOptions(hparams=hparams)
model = g_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=val_data,
    options=options
)

# Writes gesture_recognizer.task to the export directory
model.export_model()

Output: gesture_recognizer.task
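To make the 0.8 split concrete, here is a small stand-alone sketch of an 80/20 split. It is not Model Maker's implementation; the function name, the shuffling and the fixed seed are assumptions chosen only to illustrate the idea of holding back part of the data for validation.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], fraction: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and cut it into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


train, val = split_dataset(range(10), 0.8)
print(len(train), len(val))
```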

Platform Deployment
1 Android: takes a burst of photos/records gestures
2 Colab: these files are imported into Google Colab
3 Colab: MediaPipe Tasks w/ transfer learning updates the .task file
4 Android: update the .task file

4 Android: add the generated file (.task or .tflite) to the assets folder
• Use PreviewView to see your camera feed
• Use ImageAnalysis to analyze each frame

val base = BaseOptions.builder()
    .setModelAssetPath("gesture_recognizer.task")
    .build()

val options = GestureRecognizer.GestureRecognizerOptions.builder()
    .setBaseOptions(base)
    .setRunningMode(RunningMode.LIVE_STREAM)
    .setNumHands(2)
    .setMinHandDetectionConfidence(0.5f)
    ...
    .setResultListener(this::onResults)
    .build()
return GestureRecognizer.createFromOptions(this, options)

private fun onResults(
    result: GestureRecognizerResult,
    input: MPImage
) {
    result.landmarks()         // hand landmarks for the frame
    result.gestures().first()  // recognized categories for the first hand
    ...
}

onResults$result.gestures().first():
<Category "H" (score=0.91343545)>
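Each result carries a label and a score, so the app still has to decide whether a score is high enough to act on. The sketch below shows one way to do that; the function name, the tuple layout and the 0.5 threshold are assumptions for illustration, not values from the talk.

```python
from typing import List, Optional, Tuple

Category = Tuple[str, float]  # (label, score)


def best_label(categories: List[Category], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if nothing is confident enough."""
    if not categories:
        return None
    label, score = max(categories, key=lambda category: category[1])
    return label if score >= min_score else None


print(best_label([("H", 0.91343545), ("E", 0.05)]))
print(best_label([("H", 0.30)]))
```

Returning None for low scores lets the caller ignore uncertain frames instead of showing a wrong letter.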

Model Training
Training a new word
Spatial Frame Classification (static)
- Images as input (internally converted to landmarks)
- No concept of time
- Low complexity for training
- Fails at dynamic movement

Model Training
Training a new word
Dynamic Sequence Recognition
MediaPipe + Keras LSTM + TFLite
- Best for gestures
- Input data should be landmarks
- Concept of time (50 frames)
- High complexity for training
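A sequence model expects every recording to have the same shape, so recordings of different lengths need to be brought to 50 frames. The sketch below is one possible way to do that; the 63 values per frame (21 landmarks with x, y, z), the truncation and the zero-padding are assumptions, since the talk only states the 50-frame window.

```python
from typing import List, Sequence

FRAMES = 50        # window length mentioned on the slide
FEATURES = 21 * 3  # assumed layout: 21 landmarks, each with x, y, z


def to_fixed_length(frames: Sequence[Sequence[float]],
                    length: int = FRAMES,
                    features: int = FEATURES) -> List[List[float]]:
    """Truncate or zero-pad a recording to exactly `length` frames."""
    for frame in frames:
        if len(frame) != features:
            raise ValueError(f"expected {features} values per frame, got {len(frame)}")
    clipped = [list(frame) for frame in frames[:length]]
    padding = [[0.0] * features for _ in range(length - len(clipped))]
    return clipped + padding


short_recording = [[0.1] * FEATURES for _ in range(10)]
window = to_fixed_length(short_recording)
print(len(window), len(window[0]))
```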

Model Training
🙋 Training a new word: dynamically
1 Android: record a new gesture
• PreviewView and ImageAnalysis from CameraX
• HandLandmarker from MediaPipe Tasks

val base = BaseOptions.builder()
    .setModelAssetPath("hand_landmarker.task")
    .build()

analyzer.setAnalyzer(executor) { imageProxy ->
    val bitmap = imageProxy.toBitmap()
    // Detect the hand landmarks in this frame and append them to the recording
    val result = handLandmarker.detect(bitmap)
    saveLandmarksToJson(result, jsonFile)
    ...
}

landmarks=[[
<Normalized Landmark (x=0.7980323 y=0.93040633 z=1.6768519E-7 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.84686965 y=0.7815488 z=0.17734309 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.83930504 y=0.67293304 z=0.27335492 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.8450331 y=0.5756166 z=0.32521075 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.8525987 y=0.50325805 z=0.37289098 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.66961986 y=0.5906027 z=0.33934474 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.7292199 y=0.5008519 z=0.41763493 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.78829324 y=0.49037883 z=0.4322687 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.8099678 y=0.47911656 z=0.4381774 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.6511261 y=0.55183184 z=0.25960848 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.7372778 y=0.448718 z=0.33917797 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.8074192 y=0.44839847 z=0.33002883 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.84728354 y=0.46005705 z=0.31662878 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.65492624 y=0.5027987 z=0.17971571 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.72593844 y=0.388903 z=0.25331002 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.7843007 y=0.37726077 z=0.25037652 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.8155069 y=0.37290576 z=0.23587881 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.66662866 y=0.44040278 z=0.1096873 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.7140896 y=0.33419773 z=0.17267345 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.7474869 y=0.2870987 z=0.19267684 visibility=Optional.empty presence=Optional.empty)>,
<Normalized Landmark (x=0.7668232 y=0.24446534 z=0.1966459 visibility=Optional.empty presence=Optional.empty)>
]]
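A dump like this is easier to train on once each frame is flattened into a plain list of numbers and stored with its label. The sketch below shows one possible file layout; the JSON structure and both function names are assumptions for illustration, not the format that saveLandmarksToJson actually writes.

```python
import json
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z)


def flatten_frame(landmarks: Sequence[Landmark]) -> List[float]:
    """Turn 21 (x, y, z) landmarks into one flat list of 63 values."""
    return [value for point in landmarks for value in point]


def recording_to_json(label: str, frames: Sequence[Sequence[Landmark]]) -> str:
    """Serialize one recorded gesture as a labelled list of flattened frames."""
    return json.dumps({"label": label, "frames": [flatten_frame(f) for f in frames]})


# One frame with 21 copies of the first landmark from the dump, for illustration.
frame = [(0.7980323, 0.93040633, 1.6768519e-7)] * 21
payload = recording_to_json("hello", [frame])
print(len(json.loads(payload)["frames"][0]))
```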

Model Training
🙋 Training a new word: dynamically
1 Android: record a new gesture
• PreviewView and ImageAnalysis from CameraX
• HandLandmarker from MediaPipe Tasks
• Organize your data under subfolders
  📁 dataset/weather
  📁 dataset/hello
  📁 dataset/cancel
• Repeat multiple times
  📁 dataset/cancel_00
  📁 dataset/cancel_01
  📁 dataset/cancel_02
  …
• Export your datasets

Model Training
Training a new word: dynamically
1 Android: record a new gesture
  🙆 📁 dataset/weather
  🙋 📁 dataset/hello
  🙅 📁 dataset/cancel
2 Colab: these files are imported into Google Colab
3 Colab: train the model
For dynamic gestures use TensorFlow/Keras

Keras
• High-level API
• Integrated with TensorFlow
• Easier and faster to build and train models

TensorFlow
• Deep learning framework
• Handles:
  - Low-level operations
  - Optimization
  - Deployment

DEMO! Training a new word: dynamically Model Training

Firebase Configuration

Firebase
Genkit • Cloud Functions

Firebase Cloud Functions
• Serverless compute service
• Supports HTTP requests
  - It can also be triggered from other Firebase services
• Firebase Authentication: user signs up → sends a welcome email
• Receives an HTTP request to get the weather → talks to Genkit to generate a user-friendly response

Firebase Genkit
• Open-source framework for building AI applications
• Genkit uses flows (sets of steps) to define a task:
  "How's the weather?" → generate a user-friendly response.
  1 Receive an HTTP request with latitude and longitude
  2 Query the OpenWeatherMap API for the weather
  3 With this information, ask Gemini to first map the coordinates to a city and then to create a user-friendly response
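The three steps of the flow can be sketched as plain functions, independent of Genkit. This is only an illustration of the sequence: the function names are made up, the prompt wording is paraphrased, and the HTTP call and the model call are passed in as callables so the sketch runs without network access or API keys.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlencode

WEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(lat: float, lon: float, api_key: str) -> str:
    """Step 2: the OpenWeatherMap request for the given coordinates."""
    return f"{WEATHER_URL}?{urlencode({'lat': lat, 'lon': lon, 'appid': api_key})}"


def build_prompt(gesture: str, lat: float, lon: float, weather: Dict[str, Any]) -> str:
    """Step 3: the text handed to the model (hypothetical wording)."""
    return (
        f'Recognized gesture: "{gesture}". '
        f"Location coordinates: {lat}, {lon}. "
        f"The weather API returned: {weather}. "
        "Map the coordinates to a city and write a user-friendly sentence."
    )


def weather_flow(gesture: str, lat: float, lon: float, api_key: str,
                 fetch_json: Callable[[str], Dict[str, Any]],
                 generate: Callable[[str], str]) -> str:
    """Step 1 receives the inputs; steps 2 and 3 run in order."""
    weather = fetch_json(build_weather_url(lat, lon, api_key))
    return generate(build_prompt(gesture, lat, lon, weather))


# Stubs standing in for the real HTTP client and for Gemini.
reply = weather_flow(
    "weather", 41.15, -8.61, "KEY",
    fetch_json=lambda url: {"temp": 18},
    generate=lambda prompt: f"[model reply to {len(prompt)} characters]",
)
print(reply)
```

Injecting the two callables keeps the flow logic testable on its own; in the real service they would be replaced by the HTTP request and the Gemini call.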

Firebase Genkit
Inputs: gesture, latitude, longitude. Outputs: temperature, text (response).

import { defineFlow } from 'firebase-genkit'
import { geminiPro } from '@genkit-ai/vertexai'
import fetch from 'node-fetch'

export const generateWeatherMessage = defineFlow(
  {
    name: 'generateWeatherMessage',
    inputSchema: {
      gesture: 'string',
      latitude: 'string',
      longitude: 'string',
    },
    outputSchema: 'string'
  },
  // Parameter names match the input schema above
  async ({ gesture, latitude, longitude }) => {
    const apiKey = OPENWEATHER_API_KEY.value()
    const response = await fetch(
      `https://api.openweathermap.org/data/2.5/weather?lat=${latitude}&lon=${longitude}&appid=${apiKey}`
    )
    const weatherJson = await response.json()

    // Prompt (in Portuguese): "Recognized gesture: ... Location coordinates: ...
    // The weather API returned the following data: ... Based on this data,
    // write a natural sentence and ..." (truncated on the slide)
    const prompt = `
      Gesto reconhecido: "${gesture}".
      Coordenadas do local: ${latitude}, ${longitude}.
      API do tempo retornou os seguintes dados: ${JSON.stringify(weatherJson)}.
      Com base nestes dados, escreve uma frase natural e …

GestuAl
Android
Request: gesture, latitude, longitude. Response: text.

interface GenkitApi {
    @POST("generateWeatherMessage")
    suspend fun generateMessage(
        @Body body: WeatherRequest
    ): WeatherResponse
}

val provider = genkitApi.generateMessage(
    WeatherRequest(
        RequestData(
            gest = label,
            lat = location.value?.latitude ?: 0.0,
            lon = location.value?.longitude ?: 0.0
        )
    )
)

GestuAl
Android

tts = TextToSpeech(context) { _ ->
    val localePt = Locale.forLanguageTag("pt-PT")
    val availability = tts.setLanguage(localePt)
    // Validation checks
    tts.speak(message, TextToSpeech.QUEUE_FLUSH, null, "id")
}

GestuAl
• Training
  • Android: records landmarks on device
  • Colab: trains the model (transfer learning)
• Analysis
  • Android: recognizes the gesture
  • Cloud Functions: hosts the application
  • Genkit: talks with OpenWeatherMap/Gemini
  • Gemini: generates human-readable text

Gracias. @cafonsomota
