Slide 1

Slide 1 text

Build AI-Powered Apps with the Gemini API using Vertex AI in Firebase
GDG MienTrung
Hoàng Nguyễn - Technical Lead @ Nimble

Slide 2

Slide 2 text

What We’ll Cover Today
- Introduction to the Gemini API and Vertex AI
- Gemini model capabilities and use cases
- Integration: Google AI client SDK vs. Vertex AI in Firebase SDK
- Controlling output generation

Slide 3

Slide 3 text

Understanding Gemini models and Gemini API GDG MienTrung

Slide 4

Slide 4 text

What are Gemini models?
GDG MienTrung

Slide 5

Slide 5 text

Gemini models
- Gemini 1.0 Pro
- Gemini 1.5 Pro
- Gemini 1.5 Flash
- Gemini 2.0 Flash

Slide 6

Slide 6 text

Gemini model variants and use cases
( 1 ) gemini-1.0-pro (deprecated on 15/02/2025): natural language tasks, multi-turn text and code chat, and code generation
( 2 ) gemini-1.5-pro: complex reasoning tasks requiring more intelligence
( 3 ) gemini-1.5-flash-8b: high-volume, lower-intelligence tasks

Slide 7

Slide 7 text

Gemini model variants and use cases (cont.)
( 4 ) gemini-1.5-flash: fast and versatile performance across a diverse variety of tasks
( 5 ) gemini-2.0-flash-exp (released on 11/12/2024): next-generation features, speed, and multimodal generation for a diverse variety of tasks

Slide 8

Slide 8 text

What is the Gemini API?
GDG MienTrung

Slide 9

Slide 9 text

Gemini API
The Gemini API is a powerful tool that lets developers access and use Google's advanced Gemini models. Supported input types:
● Text
● Images
● Video
● Audio
● Documents

Slide 10

Slide 10 text

Gemini API capabilities
- Generate text (text-only input)
- Generate text (multimodal input)
- Generate structured output (JSON)
- Multi-turn chat
- Function calling
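Multi-turn chat is listed here but not shown in the later code slides. A minimal sketch with the Google AI client SDK (the model name, history contents, and the `BuildConfig.apiKey` field are illustrative assumptions) might look like:

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

suspend fun chatDemo(apiKey: String) {
    val model = GenerativeModel(
        modelName = "gemini-1.5-flash",
        apiKey = apiKey, // assumed to come from BuildConfig or Remote Config
    )

    // The chat object keeps the conversation history between turns
    val chat = model.startChat(
        history = listOf(
            content(role = "user") { text("Hello, I'm planning a trip to Da Nang.") },
            content(role = "model") { text("Great! How can I help with your trip?") },
        ),
    )

    // Each sendMessage() call automatically includes the accumulated history
    val response = chat.sendMessage("What should I eat there?")
    print(response.text)
}
```

Because the `Chat` object owns the history, the app only needs to forward each new user message rather than resend the whole transcript.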

Slide 11

Slide 11 text

Gemini model capabilities GDG MienTrung

Slide 12

Slide 12 text

Supported input and output

Slide 13

Slide 13 text

Supported capabilities and general features

Slide 14

Slide 14 text

Supported capabilities and general features

Slide 15

Slide 15 text

Integrate the Gemini API using the Google AI client SDK
GDG MienTrung

Slide 16

Slide 16 text

Google AI integration architecture

Slide 17

Slide 17 text

Setup API Key

Slide 18

Slide 18 text

// build.gradle.kts (:app)
dependencies {
    [...]
    implementation("com.google.ai.client.generativeai:generativeai:0.9.0")
}

Slide 19

Slide 19 text

// ViewModel.kt
val model = GenerativeModel(
    modelName = "gemini-1.5-flash",
    apiKey = BuildConfig.apiKey,
    [...]
)

Slide 20

Slide 20 text

// ViewModel.kt
val prompt = "Write a story about a magic backpack."
var response = ""
model.generateContentStream(prompt).collect { chunk ->
    print(chunk.text)
    response += chunk.text
}

Slide 21

Slide 21 text

Vertex AI in Firebase GDG MienTrung

Slide 22

Slide 22 text

Gemini API using Vertex AI
● Input types: Text, Images, Video, Audio, Documents
● It is a set of endpoints within the larger Vertex AI API surface: `aiplatform.googleapis.com`
● For Vertex AI in Firebase SDKs, the total request size limit is 20 MB. (An alternative is to use Cloud Storage URLs.)
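The Cloud Storage alternative mentioned above can be sketched as follows: instead of sending raw bytes inline (capped at 20 MB per request), the prompt references a file already uploaded to Cloud Storage. The bucket path and model name here are illustrative assumptions.

```kotlin
import com.google.firebase.Firebase
import com.google.firebase.vertexai.type.content
import com.google.firebase.vertexai.vertexAI

suspend fun describeLargeVideo() {
    val model = Firebase.vertexAI.generativeModel(modelName = "gemini-1.5-flash")

    val prompt = content {
        // fileData() takes a Cloud Storage URI and a MIME type, so the
        // file's size does not count against the 20 MB inline request limit
        fileData("gs://your-bucket/videos/demo.mp4", "video/mp4")
        text("Summarize what happens in this video.")
    }

    val response = model.generateContent(prompt)
    print(response.text)
}
```

This pattern pairs naturally with the Firebase ecosystem: the app can upload media with the Cloud Storage for Firebase SDK and pass the resulting `gs://` URI to the model.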

Slide 23

Slide 23 text

Integrate Vertex AI in Firebase GDG MienTrung

Slide 24

Slide 24 text

Vertex AI in Firebase integration architecture

Slide 25

Slide 25 text

Setup Firebase project

Slide 26

Slide 26 text

// build.gradle.kts (:app)
dependencies {
    [...]
    implementation("com.google.firebase:firebase-vertexai:16.0.2")
}

Slide 27

Slide 27 text

// An API key is no longer needed to initialize the model instance
val model = Firebase.vertexAI.generativeModel(
    modelName = "gemini-1.5-flash",
    [...]
)

Slide 28

Slide 28 text

Control content generation
- Prompt design
- Model configuration
- Safety settings
- System instructions
- Structured output using a response schema

Slide 29

Slide 29 text

What is prompt design?
GDG MienTrung

Slide 30

Slide 30 text

“Prompt design is the process of creating prompts that elicit the desired response from language models.”
GDG MienTrung

Slide 31

Slide 31 text

Components of a prompt
- Task (required)
- System instruction (optional)
- Few-shot examples (optional)
- Contextual information (optional)
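The components above can be illustrated with a small helper that assembles them into a single prompt string; this is a sketch of the idea, not SDK API (the function and its parameters are hypothetical):

```kotlin
// Sketch: composing a prompt from its components.
// The task is required; few-shot examples and context are optional.
fun buildPrompt(
    task: String,
    fewShotExamples: List<Pair<String, String>> = emptyList(),
    context: String? = null,
): String = buildString {
    // Contextual information first, so the model reads it before the task
    context?.let { appendLine("Context: $it") }
    // Few-shot examples demonstrate the expected input/output format
    fewShotExamples.forEach { (input, output) ->
        appendLine("Input: $input")
        appendLine("Output: $output")
    }
    // The task itself comes last
    append(task)
}

fun main() {
    val prompt = buildPrompt(
        task = "Classify: 'The venue was packed!'",
        fewShotExamples = listOf("Great talk!" to "positive"),
        context = "You classify attendee feedback as positive or negative.",
    )
    println(prompt)
}
```

The system instruction component is usually passed separately (via `systemInstruction` on the model, shown on a later slide) rather than embedded in the prompt text.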

Slide 32

Slide 32 text

Generate text (text-only input)
val prompt = "Tell me about GDG DevFest"
var response = ""
model.generateContentStream(prompt).collect { chunk ->
    print(chunk.text)
    response += chunk.text
}

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Generate text (multimodal input) - Single image
// Load an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
    image(bitmap)
    text("What developer tool is this mascot from?")
}
[...]

Slide 35

Slide 35 text

Generate text (multimodal input) - Multiple images
// Load images from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
    image(bitmap1)
    image(bitmap2)
    text("What's different between these pictures?")
}
[...]

Slide 36

Slide 36 text

Generate text (multimodal input) - Video
val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
    stream?.let {
        val bytes = stream.readBytes()

        // Provide a prompt that includes the video specified above and text
        val prompt = content {
            inlineData(bytes, "video/mp4")
            text("What is in the video?")
        }
        [...]
    }
}

Slide 37

Slide 37 text

Control content generation
- Prompt design
- Model configuration and parameters
- Safety settings
- System instructions
- Structured output using a response schema

Slide 38

Slide 38 text

// Model configuration
val model = GenerativeModel(
    generationConfig = generationConfig {
        // [0..2] Higher temperature makes outputs more random and diverse
        temperature = 0.15f
        // [1..) Lower top-k concentrates sampling on the highest-probability tokens at each step; typically 50-100
        topK = 32
        // [0..1] Lower top-p values reduce diversity and focus on more probable tokens
        topP = 1f
        [...]
    },
)

Slide 39

Slide 39 text

// Model configuration
val model = GenerativeModel(
    [...]
    generationConfig = generationConfig {
        [...]
        // 1 token ~ 4 characters
        maxOutputTokens = 4096
        [...]
    },
    [...]
)
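Because the "1 token ≈ 4 characters" rule is only a rough guide, the SDK's `countTokens()` call can be used to check a prompt's actual size before sending it; a minimal sketch (model setup as on the earlier slides):

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

suspend fun checkPromptSize(model: GenerativeModel) {
    val prompt = "Write a story about a magic backpack."

    // countTokens() asks the service how many tokens the content occupies,
    // without generating a response
    val tokenCount = model.countTokens(content { text(prompt) })
    println("Prompt uses ${tokenCount.totalTokens} tokens")
}
```

This is useful for guarding against prompts that would exceed the model's context window, or for estimating cost before a batch of requests.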

Slide 40

Slide 40 text

Control content generation
- Prompt design
- Model configuration
- Safety settings
- System instructions
- Structured output using a response schema

Slide 41

Slide 41 text

// Safety settings
val model = GenerativeModel(
    [...]
    safetySettings = listOf(
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
        // HarmCategory: UNKNOWN, HARASSMENT, HATE_SPEECH, SEXUALLY_EXPLICIT, DANGEROUS_CONTENT
        // BlockThreshold: UNSPECIFIED, LOW_AND_ABOVE, MEDIUM_AND_ABOVE, ONLY_HIGH, NONE
    ),
    [...]
)

Slide 42

Slide 42 text

Control content generation
- Prompt design
- Model configuration
- Safety settings
- System instructions
- Structured output using a response schema

Slide 43

Slide 43 text

// System instructions
val model = GenerativeModel(
    [...]
    systemInstruction = content { text("Your name is Nobita. I am Doraemon, and I am talking to you.") }
)

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Control content generation
- Prompt design
- Model configuration
- Safety settings
- System instructions
- Structured output using a response schema

Slide 46

Slide 46 text

// Structured output using a response schema
val config = generationConfig {
    responseMimeType = "application/json"
    responseSchema = [...]
}

Slide 47

Slide 47 text

// Structured output using a response schema
responseSchema = Schema.arr(
    name = "characters",
    description = "List of characters",
    items = Schema.obj(
        name = "character",
        description = "A character",
        contents = arrayOf(
            Schema.str("name", "Name of the character"),
            Schema.int("age", "Age of the character"),
            Schema.str("species", "Species of the character"),
            Schema.enum(
                name = "accessory",
                description = "Accessory of the character",
                values = listOf("hat", "glasses", "scarf"),
            ),
            [...]

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

Why migrate to Vertex AI?
GDG MienTrung

Slide 50

Slide 50 text

Why migrate to Vertex AI?
( 1 ) Prototyping vs. production/enterprise scale: the Google AI client SDKs are useful for getting started with the Gemini API and for prototyping
( 2 ) Security features for mobile and web apps: use Firebase App Check to verify that API calls come from your actual app
( 3 ) Ecosystem built for mobile and web apps: deeper integration with Cloud Storage, Cloud Firestore, and Remote Config for Firebase
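The App Check point above can be sketched in code: the app registers an attestation provider at startup so Firebase can verify requests. This assumes the `firebase-appcheck-playintegrity` dependency is added; run it once, e.g. in `Application.onCreate()`.

```kotlin
import android.app.Application
import com.google.firebase.FirebaseApp
import com.google.firebase.appcheck.FirebaseAppCheck
import com.google.firebase.appcheck.playintegrity.PlayIntegrityAppCheckProviderFactory

class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        FirebaseApp.initializeApp(this)

        // Register Play Integrity as the attestation provider; with App Check
        // enforced, Vertex AI in Firebase requests from unverified clients
        // (emulators, tampered builds, scripts) are rejected by the backend
        FirebaseAppCheck.getInstance().installAppCheckProviderFactory(
            PlayIntegrityAppCheckProviderFactory.getInstance(),
        )
    }
}
```

This is one of the main practical differences from the Google AI client SDK, where the API key ships inside the app binary and can be extracted and abused.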

Slide 51

Slide 51 text

Recap
- Gemini API and Vertex AI capabilities
- Model configuration and parameters
- Prompt design and controlling output generation

Slide 52

Slide 52 text

Q&A
GDG MienTrung

Slide 53

Slide 53 text

Thank you 👋
hoangnl.dev 🚀
GDG MienTrung