[GDG Mien Trung - DevFest 2024] Build AI-Powered Apps with Gemini API using Vertex AI in Firebase

This is a TechTalk session from DevFest 2024, organized by GDG Mien Trung.
Event link: https://gdg.community.dev/events/details/google-gdg-mientrung-presents-gdg-devfest-mientrung-2024/

We explored building AI-powered apps using Google's Gemini API and Vertex AI in Firebase. Here's a snapshot of what we covered:

🔹 Gemini API: Access advanced AI capabilities for text, images, video, and audio with powerful models like Gemini 1.5 Pro and the latest Gemini 2.0 Flash.
🔹 Integration Made Easy: Learn how to implement Gemini API with the Google AI Client SDK or Vertex AI through Firebase SDKs.
🔹 Prompt Design & Model Configurations: Master techniques for controlling AI output, designing effective prompts, and ensuring safety settings.
🔹 Why Vertex AI?: Ideal for secure, production-ready apps with seamless Firebase ecosystem integration.

Hoàng Nguyễn

December 14, 2024

Transcript

  1. Build AI-Powered Apps with Gemini API using Vertex AI in Firebase
     GDG MienTrung
     Hoàng Nguyễn - Technical Lead @ Nimble
  2. What We’ll Cover Today
     - Introduction to Gemini API and Vertex AI
     - Gemini model capabilities and use cases
     - Integrating the Google AI client SDK vs. the Vertex AI in Firebase SDK
     - Controlling output generation
  3. Gemini model variants and Use-cases
     ( 1 ) gemini-1.0-pro (deprecated on 15/02/2025): Natural language tasks, multi-turn text and code chat, and code generation
     ( 2 ) gemini-1.5-pro: Complex reasoning tasks requiring more intelligence
     ( 3 ) gemini-1.5-flash-8b: High-volume and lower-intelligence tasks
  4. Gemini model variants and Use-cases (cont.)
     ( 4 ) gemini-1.5-flash: Fast and versatile performance across a diverse variety of tasks
     ( 5 ) gemini-2.0-flash-exp (released on 11/12/2024): Next-generation features, speed, and multimodal generation for a diverse variety of tasks
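
In practice, picking a variant comes down to mapping the task profile above to a model name string. A minimal Kotlin sketch; the enum and helper below are illustrative only and not part of any SDK:

    // Illustrative helper: map a task profile to the model name strings listed above.
    enum class TaskProfile { COMPLEX_REASONING, HIGH_VOLUME, GENERAL_PURPOSE, NEXT_GEN_MULTIMODAL }

    fun modelNameFor(profile: TaskProfile): String = when (profile) {
        TaskProfile.COMPLEX_REASONING -> "gemini-1.5-pro"
        TaskProfile.HIGH_VOLUME -> "gemini-1.5-flash-8b"
        TaskProfile.GENERAL_PURPOSE -> "gemini-1.5-flash"
        TaskProfile.NEXT_GEN_MULTIMODAL -> "gemini-2.0-flash-exp"
    }
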
  5. Gemini API
     Gemini API is a powerful tool that allows developers to access and utilize Google's advanced Gemini models.
     • Text • Images • Video • Audio • Documents
  6. Gemini API capabilities
     - Generate text (text-only input)
     - Generate text (multimodal input)
     - Generate structured output (JSON)
     - Multi-turn chat
     - Function calling
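
Multi-turn chat, for example, is listed here but not shown in code later in the deck. A minimal sketch with the Google AI client SDK for Kotlin, assuming a GenerativeModel has already been created; the history and prompts are placeholders:

    import com.google.ai.client.generativeai.GenerativeModel
    import com.google.ai.client.generativeai.type.content

    suspend fun chatExample(generativeModel: GenerativeModel) {
        // Start a chat session, optionally seeded with prior turns.
        val chat = generativeModel.startChat(
            history = listOf(
                content(role = "user") { text("Hello, I am preparing a DevFest talk.") },
                content(role = "model") { text("Great! Which topic are you covering?") },
            )
        )
        // Each sendMessage call carries the accumulated conversation context.
        val response = chat.sendMessage("Suggest a title about Gemini API and Firebase.")
        println(response.text)
    }
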
  7. // ViewModel.kt
     val prompt = "Write a story about a magic backpack."
     var response = ""
     generativeModel.generateContentStream(prompt).collect { chunk ->
         print(chunk.text)
         response += chunk.text
     }
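
Since the slide labels this ViewModel.kt, the stream would typically be collected from a coroutine owned by the ViewModel. A hedged sketch of that wrapper; the class and function names are illustrative:

    import androidx.lifecycle.ViewModel
    import androidx.lifecycle.viewModelScope
    import com.google.ai.client.generativeai.GenerativeModel
    import kotlinx.coroutines.launch

    // Illustrative wrapper: collect the stream inside viewModelScope so generation
    // is cancelled together with the screen that owns this ViewModel.
    class StoryViewModel(private val generativeModel: GenerativeModel) : ViewModel() {
        var response: String = ""
            private set

        fun generateStory() {
            viewModelScope.launch {
                generativeModel.generateContentStream("Write a story about a magic backpack.")
                    .collect { chunk -> response += chunk.text.orEmpty() }
            }
        }
    }
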
  8. Gemini API using Vertex AI
     • Text • Images • Video • Audio • Documents
     • It is a set of endpoints within the larger Vertex AI API surface: `aiplatform.googleapis.com`
     • For Vertex AI in Firebase SDKs, the total request size limit is 20 MB. (An alternative is to reference large files via Cloud Storage URLs.)
  9. // An API key is no longer needed to initialize the model instance
     val model = Firebase.vertexAI.generativeModel(
         modelName = "gemini-1.5-flash",
         [...]
     )
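
For context, a minimal end-to-end call with the Vertex AI in Firebase Kotlin SDK might look like the sketch below; the prompt and surrounding function are illustrative, not from the slides:

    import com.google.firebase.Firebase
    import com.google.firebase.vertexai.vertexAI

    // Requests are routed through your Firebase project, so no API key ships in the app.
    suspend fun generateGreeting(): String? {
        val model = Firebase.vertexAI.generativeModel(modelName = "gemini-1.5-flash")
        val response = model.generateContent("Write a one-line greeting for DevFest attendees.")
        return response.text
    }
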
  10. GDG MienTrung
      “Prompt design is the process of creating prompts that elicit the desired response from language models.”
  11. Generate text (text-only input)
      val prompt = "Tell me about GDG DevFest"
      var response = ""
      generativeModel.generateContentStream(prompt).collect { chunk ->
          print(chunk.text)
          response += chunk.text
      }
  12. Generate text (multimodal input) - Single image
      // Loads an image from the app/res/drawable/ directory
      val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
      // Provide a prompt that includes the image specified above and text
      val prompt = content {
          image(bitmap)
          text("What developer tool is this mascot from?")
      }
      [...]
  13. Generate text (multimodal input) - Multiple images
      // Loads images from the app/res/drawable/ directory
      val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
      val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)
      // Provide a prompt that includes the images specified above and text
      val prompt = content {
          image(bitmap1)
          image(bitmap2)
          text("What's different between these pictures?")
      }
      [...]
  14. Generate text (multimodal input) - Video
      val contentResolver = applicationContext.contentResolver
      contentResolver.openInputStream(videoUri).use { stream ->
          stream?.let {
              val bytes = stream.readBytes()
              // Provide a prompt that includes the video specified above and text
              val prompt = content {
                  inlineData(bytes, "video/mp4")
                  text("What is in the video?")
              }
              [...]
          }
      }
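
Slides 12-14 elide the actual request. Under the same setup, sending one of these multimodal prompts could look like the non-streaming sketch below; the function is illustrative, and the types assume the Google AI client SDK (the Vertex AI in Firebase SDK exposes equivalent ones):

    import com.google.ai.client.generativeai.GenerativeModel
    import com.google.ai.client.generativeai.type.Content

    // Illustrative continuation: send the multimodal `prompt` built above and
    // read back the model's text answer.
    suspend fun describe(generativeModel: GenerativeModel, prompt: Content): String? {
        val response = generativeModel.generateContent(prompt)
        return response.text
    }
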
  15. Control content generation
      - Prompt design
      - Model configuration and parameters
      - Safety Settings
      - System instructions
      - Structured output using response schema
  16. // Model configuration
      val model = GenerativeModel(
          generationConfig = generationConfig {
              // [0..2] Higher temperature will make outputs more random and diverse
              temperature = 0.15f
              // [1..++] Lower top-k concentrates sampling on the highest-probability tokens for each step. Typically 50-100
              topK = 32
              // [0..1] Lower top-p values reduce diversity and focus on more probable tokens.
              topP = 1f
              [...]
          },
      )
  17. // Model configuration
      val model = GenerativeModel(
          [...]
          generationConfig = generationConfig {
              [...]
              // 1 token ~ 4 characters
              maxOutputTokens = 4096
              [...]
          },
          [...]
      )
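
As a rough check on the "1 token ~ 4 characters" rule of thumb, the SDK exposes a token-counting call that can be used before sending a prompt. A minimal sketch; the function and prompt are illustrative, and the exact countTokens overload used is an assumption:

    import com.google.ai.client.generativeai.GenerativeModel

    // Illustrative: inspect token usage before sending, e.g. to stay within limits.
    suspend fun logTokenCount(generativeModel: GenerativeModel, prompt: String) {
        val count = generativeModel.countTokens(prompt)
        println("Prompt is ${prompt.length} characters, ${count.totalTokens} tokens")
    }
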
  18. // Safety Settings
      val model = GenerativeModel(
          [...]
          safetySettings = listOf(
              SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
              // HarmCategory(UNKNOWN, HARASSMENT, HATE_SPEECH, SEXUALLY_EXPLICIT, DANGEROUS_CONTENT)
              // BlockThreshold(UNSPECIFIED, LOW_AND_ABOVE, MEDIUM_AND_ABOVE, ONLY_HIGH, NONE)
          ),
          [...]
      )
  19. // System instructions
      val model = GenerativeModel(
          [...]
          systemInstruction = content {
              text("Your name is Nobita. And I am Doraemon, talking to you.")
          }
      )
  20. // Structured output using response schema
      val config = generationConfig {
          responseMimeType = "application/json"
          responseSchema = [...]
      }
  21. // Structured output using response schema
      responseSchema = Schema.arr(
          name = "characters",
          description = "List of characters",
          items = Schema.obj(
              name = "character",
              description = "A character",
              contents = arrayOf(
                  Schema.str("name", "Name of the character"),
                  Schema.int("age", "Age of the character"),
                  Schema.str("species", "Species of the character"),
                  Schema.enum(
                      name = "accessory",
                      description = "Accessory of the character",
                      values = listOf("hat", "glasses", "scarf"),
                  ),
                  [...]
  22. Why migrate to use Vertex AI?
      ( 1 ) Prototyping vs Production/Enterprise-scale: Google AI client SDKs are useful for getting started with the Gemini API and prototyping
      ( 2 ) Security features for mobile and web apps: use Firebase App Check to verify that API calls come from your actual app
      ( 3 ) An ecosystem built for mobile and web apps: richer integration with Cloud Storage, Cloud Firestore, and Remote Config for Firebase
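
Enabling App Check typically means installing a provider factory at app startup. A minimal sketch with the Play Integrity provider; placing the initialization in an Application subclass is an assumption for illustration:

    import android.app.Application
    import com.google.firebase.FirebaseApp
    import com.google.firebase.appcheck.FirebaseAppCheck
    import com.google.firebase.appcheck.playintegrity.PlayIntegrityAppCheckProviderFactory

    class MyApp : Application() {
        override fun onCreate() {
            super.onCreate()
            FirebaseApp.initializeApp(this)
            // Attest requests with Play Integrity so calls made through Firebase
            // (including Vertex AI in Firebase) can be verified as coming from this app.
            FirebaseAppCheck.getInstance().installAppCheckProviderFactory(
                PlayIntegrityAppCheckProviderFactory.getInstance()
            )
        }
    }
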
  23. Recap
      - Gemini API and Vertex AI capabilities
      - Model configuration and parameters
      - Prompt design and control of output generation