[GDG Mien Trung - DevFest 2024] Build AI-Powered Apps with Gemini API using Vertex AI in Firebase

This is a TechTalk session from DevFest 2024, organized by GDG Mien Trung.
Event link: https://gdg.community.dev/events/details/google-gdg-mientrung-presents-gdg-devfest-mientrung-2024/

We explored building AI-powered apps using Google's Gemini API and Vertex AI in Firebase. Here's a snapshot of what we covered:

🔹 Gemini API: Access advanced AI capabilities for text, images, video, and audio with powerful models like Gemini 1.5 Pro and the latest Gemini 2.0 Flash.
🔹 Integration Made Easy: Learn how to implement Gemini API with the Google AI Client SDK or Vertex AI through Firebase SDKs.
🔹 Prompt Design & Model Configurations: Master techniques for controlling AI output, designing effective prompts, and ensuring safety settings.
🔹 Why Vertex AI?: Ideal for secure, production-ready apps with seamless Firebase ecosystem integration.

Hoàng Nguyễn

December 14, 2024

Transcript

  1. Build AI-Powered Apps with Gemini API using Vertex AI in Firebase
     GDG MienTrung
     Hoàng Nguyễn - Technical Lead @ Nimble
  2. What We’ll Cover Today
     - Introduction to Gemini API and Vertex AI
     - Gemini model capabilities and use cases
     - Integrating the Google AI client SDK vs. the Vertex AI in Firebase SDK
     - Controlling output generation
  3. Gemini model variants and Use-cases
     ( 1 ) gemini-1.0-pro (deprecated on 15/02/2025): Natural language tasks, multi-turn text and code chat, and code generation
     ( 2 ) gemini-1.5-pro: Complex reasoning tasks requiring more intelligence
     ( 3 ) gemini-1.5-flash-8b: High-volume and lower-intelligence tasks
  4. Gemini model variants and Use-cases (cont.)
     ( 4 ) gemini-1.5-flash: Fast and versatile performance across a diverse variety of tasks
     ( 5 ) gemini-2.0-flash-exp (released on 11/12/2024): Next-generation features, speed, and multimodal generation for a diverse variety of tasks
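
In practice, picking a variant comes down to mapping the task profile above to a model name string. A minimal Kotlin sketch; the enum and helper below are illustrative only and not part of any SDK:

    // Illustrative helper: map a task profile to the model name strings listed above.
    enum class TaskProfile { COMPLEX_REASONING, HIGH_VOLUME, GENERAL_PURPOSE, NEXT_GEN_MULTIMODAL }

    fun modelNameFor(profile: TaskProfile): String = when (profile) {
        TaskProfile.COMPLEX_REASONING -> "gemini-1.5-pro"
        TaskProfile.HIGH_VOLUME -> "gemini-1.5-flash-8b"
        TaskProfile.GENERAL_PURPOSE -> "gemini-1.5-flash"
        TaskProfile.NEXT_GEN_MULTIMODAL -> "gemini-2.0-flash-exp"
    }
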
  5. Gemini API
     Gemini API is a powerful tool that allows developers to access and utilize Google's advanced Gemini models.
     • Text • Images • Video • Audio • Documents
  6. Gemini API capabilities
     - Generate text (text-only input)
     - Generate text (multimodal input)
     - Generate structured output (JSON)
     - Multi-turn chat
     - Function calling
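
Multi-turn chat, for example, is listed here but not shown in code later in the deck. A minimal sketch with the Google AI client SDK for Kotlin, assuming a GenerativeModel has already been created; the history and prompts are placeholders:

    import com.google.ai.client.generativeai.GenerativeModel
    import com.google.ai.client.generativeai.type.content

    suspend fun chatExample(generativeModel: GenerativeModel) {
        // Start a chat session, optionally seeded with prior turns.
        val chat = generativeModel.startChat(
            history = listOf(
                content(role = "user") { text("Hello, I am preparing a DevFest talk.") },
                content(role = "model") { text("Great! Which topic are you covering?") },
            )
        )
        // Each sendMessage call carries the accumulated conversation context.
        val response = chat.sendMessage("Suggest a title about Gemini API and Firebase.")
        println(response.text)
    }
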
  7. // ViewModel.kt
     val prompt = "Write a story about a magic backpack."
     var response = ""
     generativeModel.generateContentStream(prompt).collect { chunk ->
         print(chunk.text)
         response += chunk.text
     }
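
Since the slide labels this ViewModel.kt, the stream would typically be collected from a coroutine owned by the ViewModel. A hedged sketch of that wrapper; the class and function names are illustrative:

    import androidx.lifecycle.ViewModel
    import androidx.lifecycle.viewModelScope
    import com.google.ai.client.generativeai.GenerativeModel
    import kotlinx.coroutines.launch

    // Illustrative wrapper: collect the stream inside viewModelScope so generation
    // is cancelled together with the screen that owns this ViewModel.
    class StoryViewModel(private val generativeModel: GenerativeModel) : ViewModel() {
        var response: String = ""
            private set

        fun generateStory() {
            viewModelScope.launch {
                generativeModel.generateContentStream("Write a story about a magic backpack.")
                    .collect { chunk -> response += chunk.text.orEmpty() }
            }
        }
    }
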
  8. Gemini API using Vertex AI
     • Text • Images • Video • Audio • Documents
     • It is a set of endpoints within the larger Vertex AI API surface: `aiplatform.googleapis.com`
     • For Vertex AI in Firebase SDKs, the total request size limit is 20 MB. (An alternative is to reference large files via Cloud Storage URLs.)
  9. // An API key is no longer needed to initialize the model instance
     val model = Firebase.vertexAI.generativeModel(
         modelName = "gemini-1.5-flash",
         [...]
     )
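
For context, a minimal end-to-end call with the Vertex AI in Firebase Kotlin SDK might look like the sketch below; the prompt and surrounding function are illustrative, not from the slides:

    import com.google.firebase.Firebase
    import com.google.firebase.vertexai.vertexAI

    // Requests are routed through your Firebase project, so no API key ships in the app.
    suspend fun generateGreeting(): String? {
        val model = Firebase.vertexAI.generativeModel(modelName = "gemini-1.5-flash")
        val response = model.generateContent("Write a one-line greeting for DevFest attendees.")
        return response.text
    }
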
  10. GDG MienTrung
      “Prompt design is the process of creating prompts that elicit the desired response from language models.”
  11. Generate text (text-only input)
      val prompt = "Tell me about GDG DevFest"
      var response = ""
      generativeModel.generateContentStream(prompt).collect { chunk ->
          print(chunk.text)
          response += chunk.text
      }
  12. Generate text (multimodal input) - Single image
      // Loads an image from the app/res/drawable/ directory
      val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
      // Provide a prompt that includes the image specified above and text
      val prompt = content {
          image(bitmap)
          text("What developer tool is this mascot from?")
      }
      [...]
  13. Generate text (multimodal input) - Multiple images
      // Loads images from the app/res/drawable/ directory
      val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
      val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)
      // Provide a prompt that includes the images specified above and text
      val prompt = content {
          image(bitmap1)
          image(bitmap2)
          text("What's different between these pictures?")
      }
      [...]
  14. Generate text (multimodal input) - Video
      val contentResolver = applicationContext.contentResolver
      contentResolver.openInputStream(videoUri).use { stream ->
          stream?.let {
              val bytes = stream.readBytes()
              // Provide a prompt that includes the video specified above and text
              val prompt = content {
                  inlineData(bytes, "video/mp4")
                  text("What is in the video?")
              }
              [...]
          }
      }
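
Slides 12-14 elide the actual request. Under the same setup, sending one of these multimodal prompts could look like the non-streaming sketch below; the function is illustrative, and the types assume the Google AI client SDK (the Vertex AI in Firebase SDK exposes equivalent ones):

    import com.google.ai.client.generativeai.GenerativeModel
    import com.google.ai.client.generativeai.type.Content

    // Illustrative continuation: send the multimodal `prompt` built above and
    // read back the model's text answer.
    suspend fun describe(generativeModel: GenerativeModel, prompt: Content): String? {
        val response = generativeModel.generateContent(prompt)
        return response.text
    }
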
  15. Control content generation
      - Prompt design
      - Model configuration and parameters
      - Safety Settings
      - System instructions
      - Structured output using response schema
  16. // Model configuration
      val model = GenerativeModel(
          generationConfig = generationConfig {
              // [0..2] Higher temperature will make outputs more random and diverse
              temperature = 0.15f
              // [1..++] Lower top-k concentrates sampling on the highest-probability tokens for each step. Typically 50-100
              topK = 32
              // [0..1] Lower top-p values reduce diversity and focus on more probable tokens.
              topP = 1f
              [...]
          },
      )
  17. // Model configuration
      val model = GenerativeModel(
          [...]
          generationConfig = generationConfig {
              [...]
              // 1 token ~ 4 characters
              maxOutputTokens = 4096
              [...]
          },
          [...]
      )
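
As a rough check on the "1 token ~ 4 characters" rule of thumb, the SDK exposes a token-counting call that can be used before sending a prompt. A minimal sketch; the function and prompt are illustrative, and the exact countTokens overload used is an assumption:

    import com.google.ai.client.generativeai.GenerativeModel

    // Illustrative: inspect token usage before sending, e.g. to stay within limits.
    suspend fun logTokenCount(generativeModel: GenerativeModel, prompt: String) {
        val count = generativeModel.countTokens(prompt)
        println("Prompt is ${prompt.length} characters, ${count.totalTokens} tokens")
    }
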
  18. // Safety Settings
      val model = GenerativeModel(
          [...]
          safetySettings = listOf(
              SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
              // HarmCategory(UNKNOWN, HARASSMENT, HATE_SPEECH, SEXUALLY_EXPLICIT, DANGEROUS_CONTENT)
              // BlockThreshold(UNSPECIFIED, LOW_AND_ABOVE, MEDIUM_AND_ABOVE, ONLY_HIGH, NONE)
          ),
          [...]
      )
  19. // System instructions
      val model = GenerativeModel(
          [...]
          systemInstruction = content {
              text("Your name is Nobita. And I am Doraemon, talking to you.")
          }
      )
  20. // Structured output using response schema
      val config = generationConfig {
          responseMimeType = "application/json"
          responseSchema = [...]
      }
  21. // Structured output using response schema
      responseSchema = Schema.arr(
          name = "characters",
          description = "List of characters",
          items = Schema.obj(
              name = "character",
              description = "A character",
              contents = arrayOf(
                  Schema.str("name", "Name of the character"),
                  Schema.int("age", "Age of the character"),
                  Schema.str("species", "Species of the character"),
                  Schema.enum(
                      name = "accessory",
                      description = "Accessory of the character",
                      values = listOf("hat", "glasses", "scarf"),
                  ),
                  [...]
  22. Why migrate to use Vertex AI?
      ( 1 ) Prototyping vs Production/Enterprise-scale: Google AI client SDKs are useful for getting started with the Gemini API and prototyping
      ( 2 ) Security features for mobile and web apps: use Firebase App Check to verify that API calls come from your actual app
      ( 3 ) An ecosystem built for mobile and web apps: richer integration with Cloud Storage, Cloud Firestore, and Remote Config for Firebase
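
Enabling App Check typically means installing a provider factory at app startup. A minimal sketch with the Play Integrity provider; placing the initialization in an Application subclass is an assumption for illustration:

    import android.app.Application
    import com.google.firebase.FirebaseApp
    import com.google.firebase.appcheck.FirebaseAppCheck
    import com.google.firebase.appcheck.playintegrity.PlayIntegrityAppCheckProviderFactory

    class MyApp : Application() {
        override fun onCreate() {
            super.onCreate()
            FirebaseApp.initializeApp(this)
            // Attest requests with Play Integrity so calls made through Firebase
            // (including Vertex AI in Firebase) can be verified as coming from this app.
            FirebaseAppCheck.getInstance().installAppCheckProviderFactory(
                PlayIntegrityAppCheckProviderFactory.getInstance()
            )
        }
    }
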
  23. Recap
      - Gemini API and Vertex AI capabilities
      - Model configuration and parameters
      - Prompt design and control of output generation