Slide 1

Slide 1 text

Leverage your skills & apps with new AI/ML tools for Android Julien Salvi - Android GDE | Android @ Aircall droidcon Lisbon 2024 @JulienSalvi Let’s use AI/ML wisely πŸ€“

Slide 2

Slide 2 text

Julien Salvi Lead Android Engineer @ Aircall Android GDE PAUG, Punk & IPAs! @JulienSalvi Bonjour !

Slide 3

Slide 3 text

The age of AI/ML A little bit of context… It’s there, we cannot escape 🫣

Slide 4

Slide 4 text

The Age of AI/ML AI context in 2024 ● AI/ML has been booming for the past 2 years, with many tools that are more and more accessible to everyone ● We’ve seen the rise of Generative AI (ChatGPT, Gemini, MistralAI…) and a push in other ML tools (MediaPipe, TensorFlow, ML Kit…) ● Android isn’t escaping the AI trend! ● We now have a large set of tools to leverage our skills and apps πŸš€

Slide 5

Slide 5 text

The Age of AI/ML AI/ML on Android ● We can identify 2 categories of tools: β—‹ AI for developers β—‹ AI for apps ● The first will help developers build great apps by being a daily assistant that leverages their skills ● The second provides a set of tools to build better apps ● Each tool has its own learning curve, cost and set of features πŸš€

Slide 6

Slide 6 text

The Age of AI/ML Learning curve Tools The learning curve

Slide 7

Slide 7 text

AI/ML tools to leverage your skills A little bit of context… Let AI assist you πŸ€–

Slide 8

Slide 8 text

AI assistants Gemini, Copilot, JetBrains AI…

Slide 9

Slide 9 text

AI Assistants Gemini, Copilot, JetBrains AI… ● AI assistants should be seen as pair programmers πŸ€“ Use them to boost productivity, explore new ideas, and learn new techniques. ● DON’T blindly accept every suggestion! 🫣 ● Security & Privacy matter a lot! πŸ” Check your company policy before using a new AI companion. Be mindful of the code you share with AI assistants.

Slide 10

Slide 10 text

Let’s see GitHub Copilot & JetBrains AI in action πŸ€“

Slide 11

Slide 11 text

AI chats ChatGPT, Mistral AI…

Slide 12

Slide 12 text

AI chats The art of prompting ● When using LLM-based chats, you must provide the best context possible to get the best answers πŸ“ ● You're not chatting with a human! πŸ€– Be clear, specific, and avoid ambiguity. ● Context is key: provide relevant information about your code, goals, and desired outcome. β—‹ Instead of: "Make this better" β—‹ Try: "Simplify this Kotlin function and explain the changes"

Slide 13

Slide 13 text

AI chats The art of prompting ● Keywords are your best friends: Use relevant technical terms ("Jetpack Compose," "coroutines," "Room database") ● Structure for success: β—‹ State the task clearly: "Write a..." "Explain how..." "Find errors in..." β—‹ Provide code snippets or context: (Use the "Insert code" button in Gemini) β—‹ Specify desired format: "Kotlin code", "Bulleted list", "Slides"... ● If the first response isn't perfect, rephrase or refine your prompt!
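The task / context / format structure above can be pictured as a tiny helper. This is purely illustrative (`buildPrompt` is a hypothetical function, not part of any SDK):

```kotlin
// Hypothetical helper: assembles a prompt following the
// task / context / format structure described in the slides.
fun buildPrompt(task: String, context: String? = null, format: String? = null): String =
    buildString {
        appendLine(task.trim())
        // Only add the optional sections when they are provided
        context?.let { appendLine("Context: ${it.trim()}") }
        format?.let { appendLine("Answer format: ${it.trim()}") }
    }.trim()
```

A prompt like `buildPrompt("Simplify this Kotlin function and explain the changes", format = "Kotlin code")` then states the task first and the desired format last, instead of a vague "Make this better".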

Slide 14

Slide 14 text

AI chats Gemini in Android Studio ● Gemini is directly built into Android Studio πŸ›  ● It can answer coding questions, generate code or help you debug some part of your code πŸ€“ ● You can control the data/code shared with Gemini 🧐 ● Fine-grained control of the files you share with Gemini using a .aiexclude file πŸ”
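As a sketch, an .aiexclude file uses a gitignore-style syntax; the file and folder names below are hypothetical examples:

```
# Hypothetical .aiexclude — keep sensitive files out of Gemini's context
apikeys.properties
secrets/
*.pem
```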

Slide 15

Slide 15 text

AI chats Gemini in Android Studio ● Get quick answers: Ask about Android APIs, libraries, best practices, or even general coding concepts. β—‹ "How do I use Room to store data?" β—‹ "How can I make my app more accessible?" β—‹ "What's the difference between ViewModel and SavedStateHandle?"

Slide 16

Slide 16 text

AI chats Gemini in Android Studio ● Generate different code options: Describe the functionality you need, and Gemini will suggest code snippets. β—‹ "Create a function to fetch data from this API endpoint using Retrofit" β—‹ "Write a composable function that displays a list of items in a lazy grid"

Slide 17

Slide 17 text

AI chats Gemini in Android Studio ● Improve existing code: Ask Gemini to review your code for potential issues, optimizations, or improvements. β—‹ β€œCan you help me simplify this code?” β—‹ β€œIs there a more efficient way to implement this feature?”

Slide 18

Slide 18 text

Main Usage: Offer suggestions and automate repetitive tasks Cost: $0* Learning Curve: Fast Pros Cons ● Integrated within Android Studio ● Trained for Android dev ● No additional cost ● Privacy controls ● Still in development ● No offline support ● Privacy concerns *depending on how you value your code πŸ˜… in Android Studio

Slide 19

Slide 19 text

Main Usage: Offer suggestions and automate repetitive tasks Cost: From $10/month to $39/seat per month Learning Curve: Fast Pros Cons ● Official Plugin for Android Studio ● Easy connect with your GitHub account ● Lots of features if fully integrated with GitHub ● Privacy controls ● Context awareness ● Non-negligible cost ● No offline support ● Privacy concerns GitHub Copilot in Android Studio

Slide 20

Slide 20 text

Main Usage: Offer suggestions and automate repetitive tasks in IntelliJ Cost: €8.33 per month Learning Curve: Fast Pros Cons ● Plugin for IntelliJ ● Efficient code completion and generation ● Can generate commit messages, explain errors, generate documentation ● Privacy controls ● Customer data not used to train the models ● Still in development ● No offline support ● Privacy concerns JetBrains AI β„Ή JetBrains AI is using LLMs from OpenAI and Google

Slide 21

Slide 21 text

Let’s have a real look πŸ€“

Slide 22

Slide 22 text

AI/ML tools for your apps A little bit of context… Build smarter & richer apps

Slide 23

Slide 23 text

for Android GenAI for your apps

Slide 24

Slide 24 text

In a nutshell ● Gemini easily enables generative AI capabilities in your apps to build enhanced features like sentiment analysis, smart bots, text summarization and more ● ⚠ Only use the Google AI Client SDK for prototyping, as you can leak your API key if it’s embedded in your app ● Prefer using Gemini in Firebase with Vertex AI, or having your own gateway, for safe usage for Android

Slide 25

Slide 25 text

In a nutshell ● Gemini on-device with Nano is still in private preview πŸ₯² ● The more context (text and images) you give, the more accurate your response will be! ● Experiment with the parameters to get the desired output for Android
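As a toy illustration of what the generation parameters shown later do (this is not the SDK's actual sampler): temperature rescales the model's scores, topK keeps the K most likely tokens, and topP keeps the smallest set of tokens whose cumulative probability reaches P.

```kotlin
import kotlin.math.exp

// Toy sketch (not the real Gemini sampler): given raw token scores,
// apply temperature scaling, then top-K and top-P (nucleus) filtering
// to obtain the set of candidate tokens the model could sample from.
fun candidateTokens(
    logits: Map<String, Double>,
    temperature: Double,
    topK: Int,
    topP: Double,
): List<String> {
    // Lower temperature sharpens the distribution, higher flattens it
    val scaled = logits.mapValues { exp(it.value / temperature) }
    val total = scaled.values.sum()
    // Keep only the K most probable tokens
    val sorted = scaled.mapValues { it.value / total }
        .entries.sortedByDescending { it.value }
        .take(topK)
    // Keep the smallest prefix whose cumulative probability reaches topP
    var cumulative = 0.0
    val kept = mutableListOf<String>()
    for ((token, p) in sorted) {
        if (cumulative >= topP) break
        kept += token
        cumulative += p
    }
    return kept
}
```

With `topP = 1f` (as in the slides' config) the nucleus filter is effectively disabled and only `topK` limits the candidates.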

Slide 26

Slide 26 text

Google AI Client SDK for Android https://developer.android.com/ai/google-ai-client-sdk

Slide 27

Slide 27 text

Vertex AI with Firebase for Android https://developer.android.com/ai/vertex-ai-firebase

Slide 28

Slide 28 text

dependencies {
    // Google AI Client SDK for Android (πŸ§ͺ prototyping only)
    implementation 'com.google.ai.client.generativeai:generativeai:0.9.0'

    // Vertex AI in Firebase
    implementation 'com.google.firebase:firebase-vertexai:16.0.0-beta04'
}

Slide 29

Slide 29 text

// With Google AI SDK on Android
val model = GenerativeModel(
    model = "gemini-1.5-flash",
    apiKey = "",
    generationConfig = generationConfig {
        temperature = 0.15f
        topK = 32
        topP = 1f
        maxOutputTokens = 4096
    },
    safetySettings = listOf(
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE),
    )
)

Slide 30

Slide 30 text

// With Vertex AI in Firebase
val model = Firebase.vertexAI.generativeModel(
    model = "gemini-1.5-flash",
    generationConfig = generationConfig {
        temperature = 0.15f
        topK = 32
        topP = 1f
        maxOutputTokens = 4096
    },
    safetySettings = listOf(
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
        SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE),
    )
)

Slide 31

Slide 31 text

// Text generation with a simple prompt
scope.launch {
    val response = model.generateContent("Give a recipe with the best Portuguese ingredients")
}

// Use an image and a prompt
scope.launch {
    val response = model.generateContent(
        content {
            image(bitmap)
            text("Is there some carrot in this picture?")
        }
    )
}

// Text generation as a stream thanks to Flow
scope.launch {
    var outputContent = ""
    model.generateContentStream("My awesome prompt").collect { response ->
        outputContent += response.text
    }
}

Slide 32

Slide 32 text

for Android Main Usage: Enhance your apps with GenAI-based features Cost: from $0 to $21/1 million tokens (output)* Learning Curve: Fast Pros Cons ● Fast integration with Android ● Proxy with Firebase ● Fast learning curve ● Text and image as input ● On-device capabilities with Gemini Nano ● Lots of things still in preview ● Heavy processing can be costly ● High risk of leaking your API key if you embed the SDK in your app *depending on the Gemini model and/or Firebase cost

Slide 33

Slide 33 text

ML Kit NLP and Video/Image analysis

Slide 34

Slide 34 text

In a nutshell ML Kit on Android ● ML Kit brings powerful and easy-to-use ML features, optimized for Android and iOS, with minimal coding and resources. ● It provides pre-built and customizable models for common use cases such as image and text recognition, face detection, barcode scanning… ● ML Kit also allows developers to deploy custom models trained on their own data.

Slide 35

Slide 35 text

Model installation ML Kit on Android ● Models in ML Kit APIs can be installed in 3 different ways: β—‹ Unbundled: Models are downloaded and managed via Google Play Services. β—‹ Bundled: Models are statically linked to your app at build time. β—‹ Dynamically downloaded: Models are downloaded on demand. ● By using ML Kit you will increase your app size (2 to 10 MB per model).
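For example, for Text Recognition the bundled artifact is `com.google.mlkit:text-recognition` (as shown later in this deck), while the unbundled one goes through Google Play services; the unbundled version number below is indicative and may differ:

```groovy
dependencies {
    // Bundled: the model ships inside the APK (bigger app, works immediately)
    implementation 'com.google.mlkit:text-recognition:16.0.0'

    // Unbundled: the model is downloaded and managed by Google Play services
    // (version indicative)
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition:19.0.0'
}
```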

Slide 36

Slide 36 text

Vision libraries ML Kit on Android Text Recognition v2 Face Detection Face Mesh Detection (beta) Object Detection Image Labeling Document Scanning (beta) Pose Detection (beta) Barcode Scanning Digital Ink Recognition (beta) Selfie and subject segmentation (beta)

Slide 37

Slide 37 text

NLP libraries ML Kit on Android Language identification Smart Reply Translation Entity extraction (beta)

Slide 38

Slide 38 text

Text Recognition v2 ML Kit on Android ● Text Recognition v2 allows us to extract text from images (camera or static images) ● Trained to recognize text in over 100 languages, including Latin-based scripts and non-Latin scripts such as Japanese or Chinese. ● The Text Recognizer segments text into blocks, lines, elements and symbols. Bundled model: +4 MB per architecture
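The block / line / element hierarchy can be pictured with a simplified model. These are not the actual ML Kit types, just an illustration of how the recognizer structures its output:

```kotlin
// Simplified model of the Text Recognizer hierarchy (illustrative only):
// a block contains lines, a line contains elements (roughly, words).
data class Element(val text: String)

data class Line(val elements: List<Element>) {
    // A line's text is its elements joined by spaces
    val text get() = elements.joinToString(" ") { it.text }
}

data class Block(val lines: List<Line>) {
    // A block's text is its lines, one per row
    val text get() = lines.joinToString("\n") { it.text }
}

// Equivalent of visionText.text: all blocks concatenated
fun fullText(blocks: List<Block>): String = blocks.joinToString("\n\n") { it.text }
```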

Slide 39

Slide 39 text

Text Recognition v2 ML Kit on Android [Diagram: sample text annotated with the Block / Line / Element hierarchy]

Slide 40

Slide 40 text

dependencies {
    // To recognize Latin script
    implementation 'com.google.mlkit:text-recognition:16.0.0'
    // To recognize Chinese script
    implementation 'com.google.mlkit:text-recognition-chinese:16.0.0'
    // To recognize Devanagari script
    implementation 'com.google.mlkit:text-recognition-devanagari:16.0.0'
    // To recognize Japanese script
    implementation 'com.google.mlkit:text-recognition-japanese:16.0.0'
    // To recognize Korean script
    implementation 'com.google.mlkit:text-recognition-korean:16.0.0'
}

Slide 41

Slide 41 text

// Init TextRecognition client (here Latin languages)
val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

// Load a bitmap image for instance
val image = InputImage.fromBitmap(bitmap, 0)

// Use the client to process the image
val result = recognizer.process(image)
    .addOnSuccessListener { visionText ->
        // Get text from the image and info about where it is located
        val allText = visionText.text
        val blocks = visionText.textBlocks
        // ...
    }
    .addOnFailureListener { e ->
        // Task failed with an exception
    }

Slide 42

Slide 42 text

ML Kit for Android Main Usage: Build computer vision and NLP features with pre-built models Cost: $0 Learning Curve: Quite fast Pros Cons ● Fast integration with Android and on-device ● Free to use ● Pre-built models for various use cases ● Custom model deployment ● Optimized for mobile usage ● Black box with pre-built models ● Limited model customization ● Some features require Google Play services ● Computer vision performance can be limited

Slide 43

Slide 43 text

Let’s explore some real life examples πŸš€

Slide 44

Slide 44 text

Google AI Edge MediaPipe & TensorFlow Lite (LiteRT)

Slide 45

Slide 45 text

In a nutshell TensorFlow Lite or LiteRT ● LiteRT (formerly TensorFlow Lite) is a high-performance cross-platform runtime for on-device AI ● Convert or use existing models that suit your use cases, or build your own! ● LiteRT is optimized for mobile with a focus on privacy, size and performance ● The learning curve of building your own TFLite models can be quite steep and can require Python knowledge https://ai.google.dev/edge/litert

Slide 46

Slide 46 text

In a nutshell TensorFlow Lite or LiteRT ● You can take advantage of the Play Services to have a lighter app and use a high-level API in Java/Kotlin (recommended way) ● The high-level API lets you run inferences through the Interpreter API exposed by the library ● You will have control over the input, output and learning parts ● Otherwise, you’ll have to deal with a C/C++ API! https://ai.google.dev/edge/litert

Slide 47

Slide 47 text

TensorFlow Lite or LiteRT https://developers.googleblog.com/en/tensorflow-lite-is-now-litert/

Slide 48

Slide 48 text

LiteRT / TensorFlow Lite Main Usage: Build and deploy ML models on Android to bring on-device ML features Cost: $0* Learning Curve: High Pros Cons ● Build your own models or use existing ones ● Optimized for on-device ML ● Offline support ● Low latency and real-time performance ● Full control of the flow ● Steep learning curve for Android devs ● Python knowledge mandatory ● Model conversion to .tflite format ● Requires strong ML knowledge *building and hosting your models can be non-negligible

Slide 49

Slide 49 text

In a nutshell MediaPipe Framework ● MediaPipe Framework is a low-level tool to build on-device ML pipelines ● It requires NDK/C++ to run the pipelines on Android, and familiarity with several Framework concepts (Packets, Graphs, Calculators) ● The learning curve is steep and it can take some time to master the entire flow https://ai.google.dev/edge/mediapipe/framework

Slide 50

Slide 50 text

In a nutshell MediaPipe Solutions ● MediaPipe brings a cross-platform and easy-to-use ML solution, optimized for mobile, with minimal coding and resources. ● It provides pre-built models for multiple fields such as vision, text, audio or GenAI… or you can build and evaluate your own models with the Model Maker & Studio tools ● You must add the models to the app resources before using the MediaPipe libraries https://ai.google.dev/edge/mediapipe/solutions/guide

Slide 51

Slide 51 text

MediaPipe Solutions Vision: Gesture recognition, Image classification, Face stylization, Object detection, Interactive segmentation, Hand detection, Face detection Text: Text classification, Text embedding, Language identification Audio: Audio classification GenAI (experimental): Image generation, LLM inference

Slide 52

Slide 52 text

Hand landmarks detection MediaPipe Solutions ● Hand landmarks detection identifies the key points of the hand ● The input can be a static image, a decoded video frame or a live stream ● The library offers many configurable options ● ⚠ Embedding models in your app will have an impact on your app size
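As a hypothetical post-processing sketch: the detector returns 21 normalized (x, y) landmarks per hand, where index 4 is the thumb tip and index 8 the index fingertip; the distance between them can suggest a "pinch" gesture. The `Landmark` type and threshold below are illustrative, not the actual MediaPipe API:

```kotlin
import kotlin.math.hypot

// Illustrative landmark type (MediaPipe's real result types differ)
data class Landmark(val x: Float, val y: Float)

// A small thumb-tip / index-tip distance suggests a pinch gesture.
// Coordinates are assumed normalized to [0, 1]; threshold is arbitrary.
fun isPinching(landmarks: List<Landmark>, threshold: Float = 0.05f): Boolean {
    require(landmarks.size == 21) { "Expected 21 hand landmarks" }
    val thumbTip = landmarks[4]
    val indexTip = landmarks[8]
    return hypot(thumbTip.x - indexTip.x, thumbTip.y - indexTip.y) < threshold
}
```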

Slide 53

Slide 53 text

dependencies {
    // To recognize hand landmarks
    implementation 'com.google.mediapipe:tasks-vision:0.10.15'
}

// Download the pre-built model
// Add it to your app assets: /src/main/assets

Slide 54

Slide 54 text

// Pass the model path to the library
val baseOptions = BaseOptions.builder().setModelAssetPath("path_to_model").build()

// Configure the hand landmarks detection here
val optionsBuilder = HandLandmarker.HandLandmarkerOptions.builder()
    .setBaseOptions(baseOptions)
    .setMinHandDetectionConfidence(minHandDetectionConfidence)
    .setMinTrackingConfidence(minHandTrackingConfidence)
    .setMinHandPresenceConfidence(minHandPresenceConfidence)
    .setNumHands(maxNumHands)
    .setRunningMode(RunningMode.IMAGE)

// Start detecting!
val handLandmarker = HandLandmarker.createFromOptions(context, optionsBuilder.build())
val mediaPipeImage = BitmapImageBuilder(image).build() // image is a Bitmap
val result = handLandmarker.detect(mediaPipeImage)


Slide 56

Slide 56 text

MediaPipe (Solutions & Framework) Main Usage: Build ML-based features (vision, text, audio) with turnkey models or your own Cost: $0* Learning Curve: Medium-high Pros Cons ● Built-in models for MediaPipe Tasks ● Lots of use cases covered ● On-device ML ● Customize your own models ● Cross-platform ● Steeper learning curve ● MediaPipe Framework requires NDK knowledge ● Python recommended to build ML models *building and hosting your models can be non-negligible

Slide 57

Slide 57 text

MediaPipe live from the studio πŸš€

Slide 58

Slide 58 text

AI assistants Gemini in Android Studio https://developer.android.com/studio/preview/gemini GitHub Copilot https://docs.github.com/en/copilot JetBrains AI https://www.jetbrains.com/help/idea/ai-assistant.html

Slide 59

Slide 59 text

AI/ML tools ML Kit https://developers.google.com/ml-kit MediaPipe https://ai.google.dev/edge/mediapipe/solutions/guide Gemini on Android https://developer.android.com/ai/generativeai ML/AI Codelabs https://codelabs.developers.google.com/?category=aiandmachinelearning&product=android

Slide 60

Slide 60 text

Julien Salvi droidcon Lisbon 2024 @JulienSalvi Obrigado! Take good advantage of AI