Hey Google, I want to use AI in my Android App but I'm poor. What are my options?

Fabio Catinella, Senior Android Developer

AI is expensive
• GPT-5.4: Input: $2.50 / 1M tokens
• Gemini 3.1: Input: $2 (< 200K tokens), $4 (>= 200K tokens)
• Claude Opus 4.7: Input: $5 / 1M tokens
*Prices as of April 26
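To put these prices in perspective, here is a rough back-of-the-envelope sketch in Kotlin. It assumes the listed prices are per 1M input tokens; the workload numbers are made up, and both helpers are hypothetical, not part of any SDK.

```kotlin
// Hypothetical helper: input cost in USD for a given number of tokens,
// assuming the price is quoted per 1M input tokens.
fun inputCostUsd(tokens: Long, pricePerMillionUsd: Double): Double =
    tokens / 1_000_000.0 * pricePerMillionUsd

// Hypothetical helper: total input tokens sent over a billing period.
fun monthlyInputTokens(requestsPerDay: Long, tokensPerRequest: Long, days: Int = 30): Long =
    requestsPerDay * tokensPerRequest * days
```

With an assumed 10,000 requests per day at 2,000 input tokens each, `monthlyInputTokens(10_000, 2_000)` gives 600M tokens, and `inputCostUsd(600_000_000, 2.50)` comes to $1,500 per month for input alone.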

Why are they so expensive?
• ROI: The tech giants that spent billions creating these foundational models need to recover their massive research and development (R&D) investments.
• The "Large" in LLM: These models are mathematically gargantuan. Running inference requires massive clusters of high-end AI hardware (GPUs/TPUs).
• Energy Consumption: Processing millions of tokens requires an immense amount of electricity, creating a direct link between AI computation and rising utility expenses.

Wouldn’t it be great if we could find a way to fit a model inside our devices to cut the cost?

Small Language Model (SLM)
SLMs are lightweight versions of traditional language models designed to operate efficiently in resource-constrained environments such as smartphones, embedded systems, or low-power computers.
Source: https://huggingface.co/blog/jjokah/small-language-model

How are they made small?
• Knowledge distillation: Training a smaller "student" model using knowledge transferred from a larger "teacher" model.
• Pruning: Removing redundant or less important parameters within the neural network architecture.
• Quantization: Reducing the precision of numerical values used in calculations (e.g. converting floating-point numbers to integers).
Source: https://huggingface.co/blog/jjokah/small-language-model
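To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit linear quantization. This is one common scheme chosen for illustration, not necessarily what any particular SLM uses, and `QuantizedTensor`, `quantize`, and `dequantize` are hypothetical names.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// 8-bit values plus the scale needed to map them back to floats.
class QuantizedTensor(val values: ByteArray, val scale: Float)

// Symmetric linear quantization: the largest absolute weight maps to 127.
fun quantize(weights: FloatArray): QuantizedTensor {
    val maxAbs = weights.maxOfOrNull { abs(it) } ?: 0f
    val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
    val quantized = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return QuantizedTensor(quantized, scale)
}

// Approximate reconstruction of the original floats.
fun dequantize(tensor: QuantizedTensor): FloatArray =
    FloatArray(tensor.values.size) { i -> tensor.values[i] * tensor.scale }
```

Each weight now takes one byte instead of four, at the price of a small rounding error of roughly half a `scale` step per weight.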

Source: https://www.superannotate.com/blog/small-language-models

How to use them on Android?

Android AICore
The easiest way to use an SLM on Android is through Android AICore, a system service introduced in Android 14 that provides easy access to Gemini Nano, the SLM version of Gemini.
Source: https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html

Android AICore Source: https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html 🤔

Gemini Nano is probably already installed on your device

Android AICore (Device Support)
As with Google Chrome, AICore (and Gemini Nano along with it) will be installed on your device the first time it is needed*.
* If the developer did their job right.
Source: https://developer.android.com/ai/gemini-nano

Android AICore (Capabilities)
• Summarization: Summarize articles or conversations as a bulleted list.
• Proofreading: Proofread short chat messages.
• Rewriting: Rewrite short chat messages in different tones or styles.
• Image Description: Generate a short description of a given image.
• Speech Recognition: Transcribe spoken audio to text.
• Prompt (beta): Generate text content based on a custom text-only or multimodal prompt.
Source: https://developer.android.com/ai/gemini-nano

How to use it
Let’s take a movie application where users can watch trailers and read information and reviews about a movie.

How to use it
Imagine we want to use AI to summarize all the reviews of a single movie using local models.

How to use it
//libs.versions.toml
module = "com.google.mlkit:genai-summarization"

How to use it
private val summarizer by lazy {
    val options = SummarizerOptions.builder(context)
        .setInputType(SummarizerOptions.InputType.ARTICLE)
        .setOutputType(SummarizerOptions.OutputType.ONE_BULLET)
        .setLongInputAutoTruncationEnabled(true)
        .build()
    Summarization.getClient(options)
}
For our goal we will use the Summarization feature that Android AICore offers. To do that, we need to create an instance of Summarizer.
*Context limits: Gemini Nano has a context window of 4,096 tokens.
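Given the 4,096-token context window, it can help to decide up front which reviews go into the request instead of relying only on automatic truncation. The sketch below is an illustration under a loud assumption: it estimates tokens with a rough "about 4 characters per token" heuristic, which real tokenizers will not match exactly. `estimateTokens` and `takeWithinBudget` are hypothetical helpers, not ML Kit APIs.

```kotlin
// Rough estimate (assumption): about 4 characters per token, rounded up.
fun estimateTokens(text: String, charsPerToken: Int = 4): Int =
    (text.length + charsPerToken - 1) / charsPerToken

// Keeps reviews, in order, until the estimated token budget would be exceeded.
fun takeWithinBudget(reviews: List<String>, maxTokens: Int): List<String> {
    var used = 0
    return reviews.takeWhile { review ->
        used += estimateTokens(review)
        used <= maxTokens
    }
}
```

Sorting the reviews first (for example by date or helpfulness) lets the app choose what gets dropped, rather than losing whatever happens to be at the end of the input.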

//SummarizerOptions.class
public @interface InputType {
    int ARTICLE = 1;
    int CONVERSATION = 2;
}
public @interface Language {
    int ENGLISH = 0;
    int JAPANESE = 1;
    int KOREAN = 2;
}
public @interface OutputType {
    int ONE_BULLET = 1;
    int TWO_BULLETS = 2;
    int THREE_BULLETS = 3;
}

How to use it
val featureStatus = summarizer.checkFeatureStatus().get()
when (featureStatus) {
    FeatureStatus.AVAILABLE -> {...}
    FeatureStatus.DOWNLOADING -> {...}
    // downloadCallback: a DownloadCallback defined elsewhere to track progress
    FeatureStatus.DOWNLOADABLE -> { summarizer.downloadFeature(downloadCallback) }
    FeatureStatus.UNAVAILABLE -> {...}
}
Then we need to check whether the model is already installed on the phone and, if it is not, download it.

How to use it
override suspend fun summarize(reviews: List<Review>): String {
    val text = reviews.joinToString("\n\n") { it.content }
    val request = SummarizationRequest.builder(text).build()
    return try {
        // runInference(...).get() blocks, so run it off the main thread
        val result = withContext(Dispatchers.IO) { summarizer.runInference(request).get() }
        // Drop the leading "*" bullet marker, if present
        result.summary.substringAfter("*").trim()
    } catch (e: Exception) {
        "Failed to summarize reviews: ${e.message}"
    }
}
Once the model is downloaded and AICore is ready, we can finally summarize the text.
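The `substringAfter("*")` call above only handles a single bullet. If the app switched to TWO_BULLETS or THREE_BULLETS, the summary would need to be split instead. A minimal sketch, assuming each bullet arrives on its own line and starts with "*" (an assumption inferred from the code above, not a documented output format); `parseBullets` is a hypothetical helper.

```kotlin
// Assumption: one bullet per line, each starting with "*".
// Lines without the marker are kept as-is; blank lines are dropped.
fun parseBullets(summary: String): List<String> =
    summary.lines()
        .map { it.trim().removePrefix("*").trim() }
        .filter { it.isNotEmpty() }
```

The resulting list can then be rendered as separate items in the UI instead of one raw string.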

Result

Everything’s good but…
Unfortunately, Android AICore comes with some important downsides:
• Availability: AICore is not available on every device, only on a small set of them.
• Discrepancy: Depending on the user’s device, AICore might use different versions of Gemini Nano.
Supporting only a few Android device models is quite limiting. It can be nice when you want to offer “that” extra sparkle in your app, but you certainly can’t build a main feature on it.
Source: https://developer.android.com/ai/gemini-nano

How can we make AI available on more devices?

Google AI Edge
Google AI Edge is Google's official suite of tools, SDKs, and runtimes designed to enable developers to run Machine Learning and Artificial Intelligence models directly on end-user devices (on-device / on edge), such as smartphones (Android and iOS), web browsers, PCs, and embedded systems.
Source: https://ai.google.dev/edge

LiteRT-LM
LiteRT-LM is a framework optimized to run Language Models on device.
• KV-Cache Management: Prevents the smartphone from recalculating the entire chat history with every new typed word by saving context data in memory.
• Tokenization: Automatically converts user text into numbers (tokens).
• Multimodality: Natively supports combined input of text, images, and audio directly on-device.
• And more…
Source: https://ai.google.dev/edge/litert-lm

How to use it
Unlike Android AICore, in this case the user (or the app) is responsible for downloading the model. For this example we will use Gemma 4 E2B.

How to use it
//libs.versions.toml
module = "com.google.ai.edge.litertlm:litertlm-android"

How to use it
private suspend fun initializeEngine(modelFile: File) {
    val config = EngineConfig(modelPath = modelFile.absolutePath)
    val newEngine = Engine(config)
    newEngine.initialize()
}
The first thing to do is initialize the Engine by loading the local model from file.

How to use it
override suspend fun summarize(reviews: List<Review>): String {
    ...
    // Each part ends with a separator so the concatenated sentences do not run together
    val prompt = "Please summarize the following reviews:\n\n${reviews.joinToString("\n\n") { it.content }}\n\n" +
        "Keep only the answer without saying anything about the prompt. Use max 100 words. Show only the summary. " +
        "Show at the end the average rating for that movie. Here are the ratings: ${reviews.joinToString(", ") { it.rating.toString() }}"
    return try {
        currentConversation = currentEngine.createConversation(
            conversationConfig = ConversationConfig(
                tools = listOf(tool(AverageToolSet())),
            )
        )
        currentConversation?.sendMessageAsync(prompt)?.collect {
            _conversation.value += it.toString()
        }
        currentConversation?.close()
        ...
    } catch (e: Exception) {
        ...
    }
}
This time there is no ready-made Summarizer, so we also need to write the prompt explicitly.
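Since the prompt is now our responsibility, it can be worth pulling it into a small pure function that is easy to unit test. The sketch below is one possible shape, not the talk's implementation: the `Review` class is a hypothetical stand-in for the app's own type (assumed to have a `content` string and a numeric `rating`), and `buildSummaryPrompt` is a made-up name.

```kotlin
// Hypothetical stand-in for the app's Review type.
data class Review(val content: String, val rating: Double)

// Builds the summarization prompt with explicit separators between sections.
fun buildSummaryPrompt(reviews: List<Review>, maxWords: Int = 100): String = buildString {
    appendLine("Please summarize the following reviews:")
    appendLine()
    appendLine(reviews.joinToString("\n\n") { it.content })
    appendLine()
    appendLine(
        "Keep only the answer without saying anything about the prompt. " +
            "Use max $maxWords words. Show only the summary."
    )
    append("Show at the end the average rating for that movie. Here are the ratings: ")
    append(reviews.joinToString(", ") { it.rating.toString() })
}
```

Keeping the prompt out of the engine code makes it possible to check the exact text sent to the model without loading a 2GB model in tests.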

Tool Calling
The library allows us to define tools using the @Tool annotation, which under the hood generates a schema identical to OpenAI's tool definition.
class AverageToolSet : ToolSet {
    @Tool(description = "Calculates the arithmetic mean of a list of numeric movie ratings. Use this tool when you need to provide a precise average rating based on the provided review scores.")
    fun getAverageRating(
        @ToolParam(description = "A list of floating-point numbers representing individual review scores (e.g., [8.5, 7.0, 9.0]). Ratings typically range from 1.0 to 10.0.")
        ratings: List<Double> = emptyList(),
    ): Double {
        return ratings.average()
    }
}
Source: https://ai.google.dev/edge/litert-lm/android#defining_and_using_tools
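One detail to watch in the tool above: the parameter defaults to an empty list, and Kotlin's `average()` returns NaN for an empty collection. How the model reacts to a NaN tool result is not something the source covers, so a defensive variant may be safer. A minimal sketch, with `averageOrNull` as a hypothetical helper:

```kotlin
// Returns null instead of NaN when there are no ratings to average.
fun averageOrNull(ratings: List<Double>): Double? =
    if (ratings.isEmpty()) null else ratings.average()
```

The tool function could then map the null case to an explicit message or a sentinel value of the app's choosing before handing the result back to the model.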

Result

Again, this all seems great, but...
By integrating a local model into our app, we solved the cross-device availability issue of Android AICore. However, in doing so, we introduced a new challenge: the model has to be downloaded, and it's not light. The Gemma 4 E2B model is over 2GB. To convince a user to download that much data, you need a highly compelling use case, and summarizing movie reviews is certainly not one. In the medical field, however, this approach makes perfect sense. Patient privacy is paramount, making it entirely reasonable to accept a massive app footprint in exchange for a fully local model.

Recap
Android AICore
PROS:
- Zero cost for both the developer and users because the model is shared between the app and the operating system.
CONS:
- It is not available on all Android devices.
- Limited in functionality.
LiteRT-LM
PROS:
- Greater flexibility in model selection.
- Can execute tools and functions.
CONS:
- The model is not shared and must be downloaded by the app.

What’s next

What’s next
At the last I/O, Google announced that Android is moving from a classic Operating System to an Intelligence System. This means there will be more focus on the agentic side.
Source: https://developer.android.com/ai

AppFunctions
“AppFunctions serve as the mobile equivalent of tools within the Model Context Protocol (MCP). While MCP traditionally standardizes how agents connect to server-side tools, AppFunctions provide the same mechanism for Android apps. This lets you expose your app's capabilities as orchestratable "tools" that authorized apps (callers) can discover and execute to fulfill user intents.”
Source: https://developer.android.com/ai/appfunctions

How do they work? Source: https://developer.android.com/ai/appfunctions

How to declare an App Function
dependencies {
    implementation("androidx.appfunctions:appfunctions:1.0.0-alpha09")
    implementation("androidx.appfunctions:appfunctions-service:1.0.0-alpha09")
    ksp("androidx.appfunctions:appfunctions-compiler:1.0.0-alpha09")
}
Source: https://developer.android.com/ai/appfunctions

How to declare an App Function
/**
 * Create a new task or reminder with a title, due time, and location.
 *
 * @param context The execution context provided by the system.
 * @param title The descriptive title of the task (e.g., "Pick up my package").
 * @param dueDateTime The specific date and time when the task should be completed.
 * @param location The physical location associated with the task (e.g., "Work").
 * @return The created Task
 */
@AppFunction(isDescribedByKDoc = true)
suspend fun createTask(
    context: AppFunctionContext,
    title: String,
    dueDateTime: LocalDateTime? = null,
    location: String? = null
): Task {...}
Source: https://developer.android.com/ai/appfunctions

Verify AppFunction integration
adb shell cmd app_function list-app-functions | grep --after-context 10 $myPackageName
Since AppFunctions cannot yet be tested except by select users on specific apps, our only option for verifying that our app is ready is ADB. By running ADB commands, we can inspect the AppFunctions declared on our device and filter them by package name.
Source: https://developer.android.com/ai/appfunctions

Conclusions

Conclusions
• AI is expensive.
• AICore comes to help, but it is limited to a few Android device models.
• LiteRT-LM allows us to run local models in our application, but it still needs a large amount of storage because of the model itself.
• AppFunctions give us the possibility to expose functions to models/agents that we don’t own (hence we don’t pay).

Google AI Edge Gallery

Thank you!
Fabio Catinella, Senior Android Developer
@FabioCati
