Slide 1

Slide 1 text

Gen AI for Android Developers Sa-ryong Kang Developer Relations Engineer

Slide 2

Slide 2 text

Why Should I use GenAI on Android? How Can I Use GenAI in My App? How Does Gemini Work? Prompt Engineering Deep Dive What’s Next?

Slide 3

Slide 3 text

Why Should I Use GenAI on Android?

Slide 4

Slide 4 text

● Gemini Nano on Android: Building with on-device gen AI ○ On-device GenAI use cases ○ GenAI APIs powered by Gemini Nano ○ Actual apps leveraging Gemini Nano ○ youtu.be/mP9QESmEDls — we already covered this at I/O ’25

Slide 5

Slide 5 text

Let’s look at a real example

Slide 6

Slide 6 text

61

Slide 7

Slide 7 text


Slide 8

Slide 8 text

How Can I Use GenAI in My App?

Slide 9

Slide 9 text

The most efficient on-device AI model on Android

Slide 10

Slide 10 text

Evolution of Gemini Nano ● Starting Pixel 8 Pro: 1.8B or 3.25B params* ● Starting Pixel 9 series: internationalization, image input ● Starting Pixel 10 series: even better i18n, MatFormer * Gemini: A Family of Highly Capable Multimodal Models - arxiv.org/abs/2312.11805

Slide 11

Slide 11 text

New ● Performance & efficiency: Per Layer Embeddings, etc. ● Many-in-1 Flexibility: MatFormer

Slide 12

Slide 12 text

● Confident Adaptive Language Modeling (aka Early Exit) ● Speculative Decoding ● Prefix Caching (coming) ● MatFormer Sub-model AICore Optimization

Slide 13

Slide 13 text

● Summarization ● Proofreading ● Rewrite ● Image Description ● Automatic Speech Recognition (coming soon) GenAI APIs built for on-device tasks

Slide 14

Slide 14 text

Currently supporting 33 devices: Pixel 10, Pixel 10 Pro / XL; Pixel 9, Pixel 9 Pro / Pro XL / Pro Fold; Galaxy Z Fold7; Galaxy S25, Galaxy S25+, Galaxy S25 Ultra; Magic 7 Pro, Magic 7; iQOO 13; Razr 60 Ultra; OnePlus 13, OnePlus 13s; Find N5, Find X8, Find X8 Pro; POCO F7 Ultra; realme GT 7 Pro; vivo X200, vivo X200 Pro; Xiaomi 15 Ultra, Xiaomi 15

Slide 15

Slide 15 text

Prompt API ● Currently experimental, available through the Google AI Edge SDK ● Beta release coming soon! ○ Production-ready ○ As part of the ML Kit SDK

Slide 16

Slide 16 text

Want to experiment with open models?

Slide 17

Slide 17 text

Gemma 3n + MediaPipe LLM Inference API

Slide 18

Slide 18 text

Want to use the power of cloud-based Gemini?

Slide 19

Slide 19 text

Firebase AI Logic Firebase SDKs Gemini API in Vertex AI Gemini Developer API

Slide 20

Slide 20 text

How Does Gemini Work?

Slide 21

Slide 21 text

Quick Quiz

Slide 22

Slide 22 text

Original Prompt: “Determine whether a movie review is positive or negative.”

Slide 23

Slide 23 text

“Determine whether a movie review is positive or negative. This is very important to my career.” Q: Is the accuracy improved?

Slide 24

Slide 24 text

The answer is yes. LLMs think differently than humans do.

Slide 25

Slide 25 text

● Tokenization ● Prefix ● Stringify ● Input Embedding ● Attention Layer ● Sampling How Gemini Works

Slide 26

Slide 26 text

1. Preprocessing - Tokenization ● The token lookup table depends on the NPU, etc. ● In Gemini Nano: ○ 1 token = avg. 1.3 - 2 Japanese characters ○ (A TokenCounter API is coming to the Nano Prompt API later) “Hello! How are you doing today?” Hello ! _How _are _you _doing _today ? 4521 235341 2250 708 692 3900 3646 235336
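A toy sketch of the lookup step described above: subword pieces are mapped to integer IDs through a vocabulary table. The pieces and IDs mirror the slide's example; real tokenizers (e.g. SentencePiece) learn their vocabulary from data, and this tiny table is purely illustrative.

```python
# Toy tokenization: each subword piece maps to an integer ID.
# Vocabulary copied from the slide's example; real vocabularies
# contain hundreds of thousands of entries.
VOCAB = {
    "Hello": 4521, "!": 235341, "_How": 2250, "_are": 708,
    "_you": 692, "_doing": 3900, "_today": 3646, "?": 235336,
}

def tokenize(pieces):
    """Map already-split subword pieces to their token IDs."""
    return [VOCAB[p] for p in pieces]

ids = tokenize(["Hello", "!", "_How", "_are", "_you", "_doing", "_today", "?"])
print(ids)
```

Splitting text into pieces is the hard part (and is model-specific); once the pieces are known, the ID mapping is a plain dictionary lookup.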

Slide 27

Slide 27 text

1. Preprocessing - Embedding ● What is embedding? ○ Multidimensional vector ○ Can perform mathematical operations ● For example, “Paris is to France as London is to _____.”

Slide 28

Slide 28 text

Embeddings [figure: 2-D plot (x, y axes) showing Paris, France, and London as points in embedding space]

Slide 29

Slide 29 text

Embeddings [figure: same 2-D plot] embd(???) = embd(France) - embd(Paris) + embd(London)

Slide 30

Slide 30 text

Embeddings [figure: same 2-D plot, now with England added] embd(England) = embd(France) - embd(Paris) + embd(London)
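The vector arithmetic on the slides can be sketched with made-up 2-D coordinates (real embeddings have hundreds of dimensions, and these toy values are chosen only so the analogy works out exactly):

```python
# Toy 2-D embeddings illustrating vector arithmetic on word meanings.
# Coordinates are invented: one axis loosely encodes "city vs country",
# the other "France vs England".
import math

embd = {
    "Paris":   (2.0, 5.0),
    "France":  (2.0, 1.0),
    "London":  (6.0, 5.0),
    "England": (6.0, 1.0),
}

# embd(France) - embd(Paris) is the "city -> country" direction;
# adding it to embd(London) should land near embd(England).
fx, fy = embd["France"]
px, py = embd["Paris"]
lx, ly = embd["London"]
result = (fx - px + lx, fy - py + ly)

def nearest(vec):
    """Return the vocabulary word whose embedding is closest to vec."""
    return min(embd, key=lambda w: math.dist(vec, embd[w]))

print(nearest(result))  # -> England
```

With real embeddings the result is only *near* the target word, so a nearest-neighbor lookup like `nearest` is how the analogy answer is actually read off.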

Slide 31

Slide 31 text

2. Decode ● The decoder is an effective text auto-completer ○ The LLM takes text as input and generates probabilities for what “word” (token) comes next ○ The next “word” is selected (“sampled”) from this probability distribution and appended to the input

Slide 32

Slide 32 text

Example Prompt: Translate this to Japanese, using a casual tone. User Input: What do you like? Answer:

Slide 33

Slide 33 text

Example Prompt: Translate this to Japanese, using a casual tone. User Input: What do you like? Answer: 何

Slide 34

Slide 34 text

Example Prompt: Translate this to Japanese, using a casual tone. User Input: What do you like? Answer: 何が

Slide 35

Slide 35 text

Example Prompt: Translate this to Japanese, using a casual tone. User Input: What do you like? Answer: 何が好き

Slide 36

Slide 36 text

Example Prompt: Translate this to Japanese, using a casual tone. User Input: What do you like? Answer: 何が好き?
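The token-by-token generation shown in the slides above can be sketched as a loop. `fake_model` below is a stand-in that plays back the Japanese translation one token at a time; a real LLM would return a probability distribution to sample from at each step.

```python
# Sketch of the decode loop: the model repeatedly predicts the next
# token, which is appended to the context and fed back in.
ANSWER = ["何", "が", "好き", "?"]  # tokens from the slide's example

def fake_model(context):
    """Stand-in for the LLM: return the next token given prior tokens."""
    produced = len(context)
    if produced < len(ANSWER):
        return ANSWER[produced]
    return "<eos>"  # end-of-sequence marker: stop generating

def decode(max_tokens=16):
    context = []
    for _ in range(max_tokens):
        token = fake_model(context)
        if token == "<eos>":
            break
        context.append(token)  # output becomes part of the next input
    return "".join(context)

print(decode())  # -> 何が好き?
```

The essential point from the slides survives even in this toy: generation is iterative, each emitted token conditions the next prediction, and decoding stops at an end-of-sequence token or a length limit.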

Slide 37

Slide 37 text

Decoder - Self-Attention ● The self-attention layer is responsible for seeing how embeddings relate to one another. For example, ○ Clarify words with multiple meanings. E.g., 最中 (the sweet “monaka” vs. “in the middle of”), queen ○ “Flavor” the meaning. E.g., 仕事の最中 (“in the middle of work”), Queen of England ○ Conversely, incorporate preceding words

Slide 38

Slide 38 text

● Confident Adaptive Language Modeling (aka Early Exit) ● Speculative Decoding ● Prefix Caching (coming) ● MatFormer Sub-model AICore Optimization (again)

Slide 39

Slide 39 text

Decode - Output Probabilities ● Aka logits ● Embeddings are converted to tokens with probabilities ● Then sampled to produce the output token
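A minimal sketch of this final step: raw logits are turned into a probability distribution with softmax, then a token is chosen either greedily (most probable) or by sampling. The logits and mini-vocabulary are invented for illustration.

```python
# Logits -> softmax -> probability distribution -> sampled token.
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["好き", "嫌い", "何", "?"]       # toy 4-token vocabulary
logits = [3.2, 0.1, 1.5, 0.4]             # invented raw model scores

probs = softmax(logits)                    # sums to 1.0

# Greedy decoding picks the single most probable token;
# random sampling draws from the whole distribution.
greedy = vocab[probs.index(max(probs))]
sampled = random.choices(vocab, weights=probs, k=1)[0]
print(greedy)
```

Greedy decoding is deterministic; sampling introduces the randomness that the temperature and Top-K settings (discussed later in the deck) control.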

Slide 40

Slide 40 text

Prompt Engineering for Android Developers

Slide 41

Slide 41 text

Tip 0: Start small

Slide 42

Slide 42 text

Don't panic ● Let’s assume: ○ The LLM is an employee who is naturally smart but lacks experience ○ You’re a kind, patient manager who is good at micromanagement ● 1st try: “Summarize the meeting notes.” ● Observe, reflect: “Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.”

Slide 43

Slide 43 text

Tip 1: Test, test, and test!

Slide 44

Slide 44 text

Untested Prompts are the Root of All Evil ● Do not (completely) trust your intelligence ● 1. Prepare evaluation samples ○ Recommendation: 200+ (with diversity) ● 2. Decide on metrics to evaluate quality ○ E.g., accuracy, precision, recall, F1 score ● 3. Evaluate ⇒ Refine ⇒ Iterate
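The metrics from step 2 can be computed over a labeled evaluation set like so. The five gold/predicted labels below are a tiny invented example; as the slide recommends, a real evaluation set should have 200+ diverse samples.

```python
# Accuracy, precision, recall, and F1 for a binary classifier prompt
# (e.g. the earlier "positive or negative movie review" task).
def evaluate(gold, predicted, positive="positive"):
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    correct = sum(g == p for g, p in zip(gold, predicted))

    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

gold      = ["positive", "negative", "positive", "negative", "positive"]
predicted = ["positive", "positive", "positive", "negative", "negative"]
metrics = evaluate(gold, predicted)
print(metrics)
```

Re-running `evaluate` after each prompt revision gives the evaluate ⇒ refine ⇒ iterate loop a concrete number to improve.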

Slide 45

Slide 45 text

Tip 2: Tune Your Configuration

Slide 46

Slide 46 text

Inference Parameters ● Temperature: controls the degree of randomness in token selection. Range: [0.0f, 1.0f]. For accuracy, lower is better (closer to 0); set it higher for more random, creative responses ● Top-K: determines how many tokens to select from when choosing the next token. Range: [1, vocab size]. A value of 1 always chooses the most probable token; a value like 40 samples from a larger pool of possible tokens ● Output Token Limit: maximum amount of text output from one prompt
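A sketch of how these two knobs reshape the distribution before sampling; the logit values are invented, and Gemini Nano's actual implementation may differ in detail.

```python
# Temperature scales logits before softmax; Top-K keeps only the K
# highest-scoring tokens as sampling candidates.
import math

def sample_distribution(logits, temperature=0.5, top_k=40):
    # Near-zero temperature -> sharply peaked (almost deterministic);
    # higher temperature -> flatter (more random). Guard against t == 0.
    t = max(temperature, 1e-6)
    scaled = [x / t for x in logits]

    # Keep only the indices of the K largest scaled logits.
    kept = sorted(range(len(scaled)),
                  key=lambda i: scaled[i], reverse=True)[:top_k]

    m = max(scaled[i] for i in kept)
    exps = {i: math.exp(scaled[i] - m) for i in kept}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

logits = [2.0, 1.0, 0.5, -1.0]

cold = sample_distribution(logits, temperature=0.1, top_k=4)
hot = sample_distribution(logits, temperature=1.0, top_k=4)
# Lower temperature concentrates probability mass on the top token.
print(round(cold[0], 3), round(hot[0], 3))

# top_k=1 is effectively greedy: all mass on the most likely token.
greedy = sample_distribution(logits, top_k=1)
```

Comparing `cold[0]` and `hot[0]` shows why low temperature suits format-sensitive tasks and higher temperature suits open-ended ones.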

Slide 47

Slide 47 text

Temperature ● A metaphor borrowed from thermodynamics ● Temperature controls the degree of randomness in token selection

Slide 48

Slide 48 text

OK. Then I'd set it to 0 to avoid hallucinations

Slide 49

Slide 49 text

No ● Hallucinations occur for various reasons ● Note that an LLM’s accuracy is never 100% ● When repetition occurs, greedy decoding makes it worse ● Recommended starting points: ○ 0.2, if the output format is important ○ 0.8, for summarization ■ Side effect: broken output format ● But don’t worry; we have constrained decoding

Slide 50

Slide 50 text

Recommendations ● Temperature: start with 0.5 ● Top-K: start with 40 ● Output Token Limit: start with 100 (Nano only)

Slide 51

Slide 51 text

Tip 3: Enhance by emotional stimuli

Slide 52

Slide 52 text

Really? ● Yes. This is counter-intuitive, but it generally leads to better performance. ○ Emotional stimuli can enrich the original prompt’s representation. ● Give it a try with “This is very important to my career.” in your prompts ;) Source: LLMs Understand and Can Be Enhanced by Emotional Stimuli - arxiv.org/abs/2307.11760

Slide 53

Slide 53 text

Tip 4: Reframe your instructions so the model understands you effectively

Slide 54

Slide 54 text

When you need to reframe your prompt ● Negative instructions ○ See Language Models are Not Naysayers ● Regression problems (scoring) Source: arxiv.org/abs/2306.08189

Slide 55

Slide 55 text

Tip 5: Consider Common Practices

Slide 56

Slide 56 text

Common Practices ● Set a proper role / persona ● Premise order matters ○ The attention layer is optimized for ordered processing ○ See Premise Order Matters in Reasoning ● Add delimiters (especially for Nano) ○ E.g., “###” Source: arxiv.org/abs/2402.08939

Slide 57

Slide 57 text

Common Practices (cont.) ● Write Prompts in English ○ But, don’t panic. Use Gemini to translate your prompt. ○ See Do Multilingual Language Models Think Better in English? Source: arxiv.org/abs/2308.01223

Slide 58

Slide 58 text

Common Practices (cont.) ● Use Chain-of-Thought prompting ○ One- or few-shot examples that explain the thought process ○ See Chain-of-Thought Prompting Elicits Reasoning ● Additionally, have it describe its thought process ○ "Additionally, briefly explain the main reasons supporting your decision to help me understand your thought process." Source: arxiv.org/abs/2201.11903

Slide 59

Slide 59 text

Last, but Most Important Tip: Use Many-Shot Examples

Slide 60

Slide 60 text

Why? ● LLMs are very effective at learning from examples

Slide 61

Slide 61 text

In case of few-shot examples Extract the technical specifications from the text below in a JSON format. INPUT: Google Nest Wifi, network speed up to 1200Mpbs, 2.4GHz and 5GHz frequencies, WP3 protocol OUTPUT: { "product":"Google Nest Wifi", "speed":"1200Mpbs", "frequencies": ["2.4GHz", "5GHz"], "protocol":"WP3" } Google Pixel 7, 5G network, 8GB RAM, Tensor G2 processor, 128GB of storage, Lemongrass

Slide 62

Slide 62 text

In case of few-shot examples { "product": "Google Pixel 7", "network": "5G", "ram": "8GB", "processor": "Tensor G2", "storage": "128GB", "color": "Lemongrass" }
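Few-shot prompts like the one above are usually assembled programmatically from (input, output) example pairs. A minimal sketch, reusing the slide's text verbatim (including its original spellings); the `build_prompt` helper is our own illustration, not an API from any SDK:

```python
# Assemble a few-shot prompt: instruction, then example INPUT/OUTPUT
# pairs, then the real query with a trailing OUTPUT: for the model
# to complete.
def build_prompt(instruction, examples, query):
    parts = [instruction]
    for inp, out in examples:
        parts.append(f"INPUT: {inp}\nOUTPUT: {out}")
    parts.append(f"INPUT: {query}\nOUTPUT:")
    return "\n\n".join(parts)

instruction = ("Extract the technical specifications from the text below "
               "in a JSON format.")
examples = [(
    "Google Nest Wifi, network speed up to 1200Mpbs, 2.4GHz and 5GHz "
    "frequencies, WP3 protocol",
    '{"product":"Google Nest Wifi", "speed":"1200Mpbs", '
    '"frequencies": ["2.4GHz", "5GHz"], "protocol":"WP3"}',
)]
query = ("Google Pixel 7, 5G network, 8GB RAM, Tensor G2 processor, "
         "128GB of storage, Lemongrass")

prompt = build_prompt(instruction, examples, query)
print(prompt)
```

Keeping examples as data makes it easy to grow a one-shot prompt into the many-shot prompts the next slide recommends, and to swap in examples drawn from observed common mistakes.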

Slide 63

Slide 63 text

In-Context Learning: the most powerful prompting skill ● LLMs are very effective at learning from examples ● Recommendation: pick common mistakes as examples ● See Many-Shot In-Context Learning Source: arxiv.org/abs/2404.11018

Slide 64

Slide 64 text

Sounds good. But should I do that manually?

Slide 65

Slide 65 text

Let Gemini Improve Your Prompt! ● Automated prompt optimization ○ See A Systematic Survey of Automatic Prompt Optimization Techniques ● We are working on a way to help developers achieve automated prompt optimization! Source: arxiv.org/abs/2502.16923

Slide 66

Slide 66 text

Let’s Revisit Kakao T’s Use Case

Slide 67

Slide 67 text

Case Study – Kakao T: Old Prompt (highly optimized by internal/external engineering teams) "Given a message, extract the recipient's basic address, detail address, name, and phone number. - Output ONLY a single, valid JSON object. - Use the following structure: { "name": "extracted_name" or null, "phone": "extracted_phone_number" or null, "basic_address": "extracted_basic_address" or null, "detail_address": "extracted_detail_address" or null } - Name is the recipient's name. If multiple names are present, choose the recipient's only. - Phone number is the recipient's phone number. If multiple phone numbers are present, choose the recipient's only. - Retain the original spelling and format from the message. - Recipient is sometimes marked as: [ ... ] - Basic address consists of province, city and street. - Detail address is the remainder of the basic address. Apartment name, unit, suite, and floor number should be included in detail address, not in basic address. - The following are examples of apartment names which should be included in detail address: [ ... ] - If the information for a field is missing, set it as null. Note that the field should be null, not the string "null". Here is the message to extract: {input} " Common Mistakes ● Extracts sender information instead of recipient ● Incorrectly splits basic and detailed address components

Slide 68

Slide 68 text

Case Study – Kakao T: New Prompt ● More detailed step-by-step processing instructions ● More concrete explanations of address splitting logic You are a ... highly accurate data extraction AI, specializing in Korean logistics and contact information... Please follow these instructions with extreme care. # OUTPUT SPECIFICATION You MUST output ONLY a single, valid JSON object ... # CORE LOGIC & PROCESSING STEPS ### **STEP 1: IDENTIFY THE RECIPIENT (CRITICAL FIRST STEP)** Your primary goal is to find the recipient. Use this hierarchy of rules: * **Rule A: Explicit Recipient First** ... * **Rule B: Implied Recipient (The Exception Rule)** ... * **Rule C: Sequential Information** ... ### **STEP 2: EXTRACT & CLEAN EACH FIELD FOR THE IDENTIFIED RECIPIENT** Once you have identified the recipient, extract their information precisely as follows. * **`"name"` Extraction & Cleaning:** ... * **`"phone"` Extraction:** ... * **`"basic_address"` and `"detail_address"` Splitting Logic:** ... * **`basic_address` Definition:** This is the standard Korean "Road Name Address" ... * **`detail_address` Definition:** This is **ABSOLUTELY EVERYTHING** that comes after the building number in the full address string. ... --- Here is the message to extract. Analyze it carefully and provide ONLY the final JSON object. {input} After this, we added 2-shot examples (omitted)
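When a prompt demands strict JSON like the one above, it pays to validate the model's output before using it. A minimal sketch of such a check, matching the four fields the Kakao T prompt specifies; `parse_extraction` and the sample strings are our own illustration, not Kakao T's actual code.

```python
# Validate model output against the schema the prompt demands:
# one JSON object with exactly these keys, each a string or null.
import json

EXPECTED_KEYS = {"name", "phone", "basic_address", "detail_address"}

def parse_extraction(raw):
    """Parse model output; raise ValueError if it violates the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    for key, value in data.items():
        # The prompt requires JSON null, not the string "null".
        if value == "null" or not isinstance(value, (str, type(None))):
            raise ValueError(f"bad value for {key!r}: {value!r}")
    return data

ok = parse_extraction(
    '{"name": "Kim", "phone": "010-1234-5678", '
    '"basic_address": "Seoul, Gangnam-gu, Teheran-ro 1", '
    '"detail_address": null}'
)
print(ok["name"])
```

A failed validation is also a cheap signal for the evaluate ⇒ refine ⇒ iterate loop: log the rejected outputs and fold the common failure modes back into the prompt as few-shot examples.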

Slide 69

Slide 69 text

What’s Next?

Slide 70

Slide 70 text

What’s Next? ● Android’s answer to MCP? Magic Cue

Slide 71

Slide 71 text

Thank you very much! ● Start your GenAI journey with.. ○ d.android.com/ai , kaggle.com/whitepaper-prompt-engineering ● Feel free to reach out to me if you have… ○ Any good idea / plan for agentive AI ○ Any plan to implement a GenAI use case on Android ○ Interest in early evaluation of vertical subtitles in ExoPlayer ○ [email protected]